-| 12 | .t7 style transfer models are 2017-era | `style_transfer.py` | Research AesPA-Net / InST for temporal consistency | TODO |
+| 12 | .t7 style transfer models are 2017-era | `style_transfer.py` | Research AesPA-Net / InST for temporal consistency | RESEARCHED -- Best path: ONNX migration (same models, GPU accel via onnxruntime). Magenta arbitrary style ONNX model (~12MB) would replace LAB histogram hack with real neural AdaIN. AesPA-Net/InST too heavy for video. See research notes below. |
 | 13 | Stable Audio Open vs MusicGen | `music_ai.py` | Add as backend option | DONE (2026-04-06) -- `generate_music_stable_audio()` + route + queue allowlist |
-| 15 | Gyroflow integration for camera stabilization | `video_fx.py` | Research integration path | TODO |
-| 16 | Caption rendering performance (Pillow bottleneck) | `styled_captions.py` | Research skia-python or FFmpeg drawvg (Cairo) | TODO |
+| 15 | Gyroflow integration for camera stabilization | `video_fx.py` | Research integration path | DEFERRED -- Gyroflow v1.6.3 requires camera gyro data (GoPro, Sony, Insta360). Niche use case. vid.stab remains best for general stabilization. |
+| 16 | Caption rendering performance (Pillow bottleneck) | `styled_captions.py` | Research skia-python or FFmpeg drawvg (Cairo) | RESEARCHED -- skia-python is clear winner: 10/10 style coverage, 2-5x CPU / 10-50x GPU speedup, excellent cross-platform. drawvg has NO text support. Cairo marginal (1.5x, bad Windows). Wand slower than Pillow. See research notes below. |

 ### TIER 3 -- Verify & Maintain (Working Well)

 | # | Module | Action | Status |
 |---|--------|--------|--------|
-| 17 | `captions.py` | Verify faster-whisper>=1.1 compat, test turbo-ct2 model | TODO |
-| 18 | `diarize.py` | Verify pyannote 3.1 still latest | TODO |
-| 19 | `audio_enhance.py` | Verify Resemble Enhance + ClearerVoice current | TODO |
-| 21 | `object_removal.py` | Verify SAM2 + ProPainter current | TODO |
+| 17 | `captions.py` | Verify faster-whisper>=1.1 compat, test turbo-ct2 model | DONE (2026-04-06) -- All current, model names in VALID_WHISPER_MODELS up to date |
+| 18 | `diarize.py` | Verify pyannote 3.1 still latest | DONE (2026-04-06) -- 3.1 still latest. Fixed `use_auth_token` -> `token` for pyannote 4.0+ |
+| 19 | `audio_enhance.py` | Verify Resemble Enhance + ClearerVoice current | DONE (2026-04-06) -- Both current, imports and model names verified |
+| 21 | `object_removal.py` | Verify SAM2 + ProPainter current | DONE (2026-04-06) -- Updated SAM 2.0->2.1 model IDs, ProPainter/LaMA current |
 | 22-35 | Remaining modules | No external dep changes needed | OK |
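
Row 18 mentions renaming the `use_auth_token` keyword to `token`. The diff does not show how `diarize.py` handles this, so the following is only a minimal sketch of one way to tolerate both keyword names. The helper name `load_with_token` is hypothetical, and the fallback assumes the older loader raises `TypeError` on an unknown keyword, which will not hold for loaders that accept arbitrary `**kwargs`.

```python
from typing import Any, Callable, Optional


def load_with_token(loader: Callable[..., Any], checkpoint: str,
                    hf_token: Optional[str]) -> Any:
    """Call `loader` with the newer `token=` keyword, else fall back.

    `loader` is any callable shaped like a `from_pretrained` classmethod.
    Assumption: an older loader rejects the unknown `token` keyword with
    a TypeError, at which point `use_auth_token=` is tried instead.
    """
    try:
        return loader(checkpoint, token=hf_token)
    except TypeError:
        # Older keyword name, as used before the rename noted in row 18.
        return loader(checkpoint, use_auth_token=hf_token)
```

With pyannote installed, the call would presumably look like `load_with_token(Pipeline.from_pretrained, "pyannote/speaker-diarization-3.1", hf_token)`. Pinning the dependency and using a single keyword is the simpler option if only one pyannote version needs to be supported.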

 ---
@@ -188,4 +188,27 @@ All pure Python, no external dependencies. Includes: zoom.py, auto_zoom.py (cv2)

 ---

+## Research Notes (2026-04-06)
+
+### Style Transfer -- ONNX Migration Path
+- Current .t7 models (Johnson et al. 2016) work but use dated OpenCV DNN runtime
+- **Recommended**: Swap to ONNX versions of same models (available on HuggingFace onnxmodelzoo). Drop-in replacement using `onnxruntime.InferenceSession()` instead of `cv2.dnn.readNetFromTorch()`. GPU acceleration via onnxruntime-gpu (already an optional dep).
+- **Arbitrary style upgrade**: Replace LAB histogram matching with Magenta arbitrary style ONNX model (~12MB total). Real neural AdaIN vs current color-only transfer. Source: [pdn-styletransfer](https://github.com/patlevin/pdn-styletransfer)
+- **Skip**: AesPA-Net (no pip package, Windows .t7 compat issues), InST (too slow for video, 2-5s/frame), SD img2img (way too heavy)
+- **Future**: EFDM from ComfyUI-StyleTransferPlus is a promising fast arbitrary style method
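
The recommended swap from `cv2.dnn.readNetFromTorch()` to `onnxruntime.InferenceSession()` could look roughly like the sketch below. It is not the code in `style_transfer.py`. The function names are hypothetical, and the tensor layout (float32, 1x3xHxW, values in 0-255, no mean subtraction, channel order left unchanged) is an assumption that has to be checked against each exported model.

```python
import numpy as np


def to_nchw(frame: np.ndarray) -> np.ndarray:
    """uint8 HxWx3 frame -> float32 1x3xHxW tensor (no mean/scale applied)."""
    chw = frame.astype(np.float32).transpose(2, 0, 1)
    return chw[np.newaxis, ...]


def from_nchw(output: np.ndarray) -> np.ndarray:
    """float 1x3xHxW tensor -> uint8 HxWx3 frame, clipped to [0, 255]."""
    hwc = output[0].transpose(1, 2, 0)
    return np.clip(hwc, 0, 255).astype(np.uint8)


def make_session(model_path: str, use_gpu: bool = False):
    """Create the session once and reuse it for every frame."""
    import onnxruntime as ort  # imported lazily: optional dependency

    providers = ["CPUExecutionProvider"]
    if use_gpu:
        # Needs onnxruntime-gpu; CPU stays in the list as a fallback.
        providers.insert(0, "CUDAExecutionProvider")
    return ort.InferenceSession(model_path, providers=providers)


def stylize(session, frame: np.ndarray) -> np.ndarray:
    """Run one frame through the model and return a uint8 frame."""
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: to_nchw(frame)})
    return from_nchw(outputs[0])
```

Building the session per video rather than per frame matters here, since model loading is usually far more expensive than a single inference call.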
+- **Skip**: FFmpeg drawvg (zero text support), Wand/ImageMagick (slower than Pillow), pycairo (marginal speedup, bad Windows font handling)
+- **Also**: `animated_captions.py` should switch from OpenCV decode/encode per frame to transparent overlay + FFmpeg composite (architectural win independent of rendering engine)
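
The overlay-plus-composite idea in the last bullet can be sketched as a single FFmpeg invocation that layers a pre-rendered caption stream over the source video. This is an illustration only. The function name and file names are hypothetical, and it assumes the caption overlay has already been rendered to a format that carries an alpha channel.

```python
from typing import List


def build_overlay_command(base_video: str, overlay_video: str,
                          output: str) -> List[str]:
    """Build an FFmpeg command that composites a transparent overlay.

    Assumption: `overlay_video` carries alpha (for example a PNG
    sequence or an alpha-capable codec), so FFmpeg's `overlay` filter
    blends it over the base video without any per-frame Python work.
    """
    return [
        "ffmpeg", "-y",
        "-i", base_video,
        "-i", overlay_video,
        # Place the overlay at the top-left corner; stop at the shorter input.
        "-filter_complex", "[0:v][1:v]overlay=0:0:shortest=1[v]",
        "-map", "[v]",
        "-map", "0:a?",   # keep the source audio if there is any
        "-c:a", "copy",
        output,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Only the caption frames then go through Python, and the base video is decoded and encoded once by FFmpeg.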
+
+### Gyroflow -- Deferred
+- Gyroflow v1.6.3 requires camera gyroscope data (GoPro, Sony, Insta360 specific)
+- Not applicable to general video stabilization workflow
+- vid.stab remains the best scriptable stabilization option for arbitrary video
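
For reference, vid.stab is typically driven as two FFmpeg passes: `vidstabdetect` writes a transforms file, and `vidstabtransform` applies it. The sketch below only builds the two command lines. The function name and default file name are hypothetical, the parameter values are the filters' documented defaults, and it assumes an FFmpeg build with libvidstab enabled. It is not taken from `video_fx.py`.

```python
from typing import List, Tuple


def build_vidstab_commands(src: str, dst: str,
                           transforms: str = "transforms.trf",
                           smoothing: int = 10) -> Tuple[List[str], List[str]]:
    """Return the (detect, transform) FFmpeg commands for vid.stab.

    Note: paths containing ':' (e.g. Windows drive letters) need
    escaping inside filter arguments; that is not handled here.
    """
    detect = [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"vidstabdetect=shakiness=5:result={transforms}",
        "-f", "null", "-",   # pass 1 produces no video, only the .trf file
    ]
    transform = [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"vidstabtransform=input={transforms}:smoothing={smoothing}",
        "-c:a", "copy",
        dst,
    ]
    return detect, transform
```

Running the two commands in order with `subprocess.run(..., check=True)` needs no camera metadata, which is why this path works for arbitrary footage while Gyroflow does not.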
+
+---
+
 *Update this document when completing any modernization task. Mark status as DONE with date.*