Skip to content

Latest commit

 

History

History
615 lines (447 loc) · 48.2 KB

File metadata and controls

615 lines (447 loc) · 48.2 KB

OpenCut — Implementation Roadmap

Version: 3.0 Updated: 2026-04-13 Baseline: v1.9.26 (254 routes, 68 core modules, 17 blueprints, 867 tests) Feature Plan: 302 features across 62 categories (see features.md)

⚡ Active work (2026-Q2 → 2026-Q4) lives in ROADMAP-NEXT.md — Wave A/B/C/D/E covering v1.18.0 → v1.23.0+ from the cross-project OSS research pass. This file is preserved as the long-range 302-feature plan and the v1.14.0 strategic-gap analysis below.


Guiding Principles

  1. Never break what works — Every wave ships a working product. No "rewrite everything then test."
  2. Incremental migration — New code coexists with old. Feature flags gate rollout. Old paths removed only after new paths are proven.
  3. User-facing value first — Each wave delivers visible improvements, not just internal refactors.
  4. Measure before optimizing — Add telemetry/logging before assuming bottlenecks.
  5. Shared infrastructure first — When multiple features need the same foundation (e.g., object tracking, spectral analysis), build the foundation once, then fan out.
  6. One new dependency per feature maximum — Avoid dep explosion. Prefer extending existing deps (OpenCV, FFmpeg, Pillow) over adding new ones.

Completed work (v1.0 - v1.9.26) moved to ROADMAP-COMPLETED.md.

Implementation Waves

Features are organized into 7 waves based on dependency chains, shared infrastructure, and priority. Each wave is independently shippable. Feature numbers reference features.md.

Dependency Legend

Symbol Meaning
FFmpeg Pure FFmpeg filter — no Python deps beyond subprocess
Pillow Image composition — already installed
OpenCV Computer vision — already installed (opencv-python-headless)
Existing AI Uses models already in the codebase (Whisper, Demucs, face detection, etc.)
New dep Requires a new pip dependency
New model Requires downloading a new AI model (potentially large)
Pipeline Orchestrates existing modules — no new deps

Wave 1: Quick Wins — No New Dependencies

Goal: Ship 40+ features using only existing FFmpeg filters, Pillow, NumPy, and current AI models. Maximum user value with minimum risk.

Timeline: 4-6 weeks New deps: Zero New routes: ~35

1A — FFmpeg Filter Features (14 features)

These are pure FFmpeg filter additions — each is a new route calling run_ffmpeg() with a new filter graph.

# Feature Effort Dep Detail
53.2 Adaptive Deinterlacing S FFmpeg yadif/bwdif filter. Auto-detect via ffprobe field_order or idet filter.
52.1 Lens Distortion Correction M FFmpeg lenscorrection filter with k1/k2 coefficients. Ship camera profile JSON (source: lensfun).
52.3 Chromatic Aberration Removal S FFmpeg chromanr filter or per-channel scale via split/scale/merge.
53.5 Frame Rate Conversion (Optical Flow) M FFmpeg minterpolate filter for up/down conversion. Preset modes.
44.1 Timecode Burn-In Overlay S FFmpeg drawtext with %{pts\:hms} or timecode option. Configurable position/font.
45.2 AV1 Encoding Support M FFmpeg libaom-av1 or libsvtav1 encoder. Add to export presets and social platform presets.
45.1 ProRes Export on Windows M FFmpeg prores_ks encoder. Profile selector (Proxy/LT/422/HQ/4444).
32.1 Hardware-Accelerated Encoding M FFmpeg Detect NVENC/QSV/AMF. Add h264_nvenc/hevc_nvenc codec options in export.
20.4 Photosensitive Seizure Detection S FFmpeg Frame-to-frame luminance delta analysis. Flag >3 flashes/sec per ITU-R BT.1702.
38.1 GIF / WebP / APNG Export S FFmpeg gif/libwebp_anim output format. Palette optimization via palettegen/paletteuse.
3.10 Film Grain & Vignette (Enhanced) S FFmpeg noise + vignette filters with presets (Super 8, 16mm, 35mm, VHS).
25.1 Dialogue De-Reverb M FFmpeg arnndn or afftdn with speech-optimized profile.
42.2 Timelapse Deflicker M FFmpeg deflicker filter or rolling-average luminance normalization per frame.
30.3 Freeze Frame Insert S FFmpeg Extract frame at timestamp, generate still clip of configurable duration, splice into sequence.

1B — Pillow/Canvas Overlay Features (10 features)

Image composition overlays using existing Pillow renderer.

# Feature Effort Dep Detail
61.1 Composition Guide Overlay S Pillow Rule-of-thirds, golden ratio, center cross, safe areas on preview frame. Display-only.
36.1 Platform Safe Zone Overlay S Pillow TikTok/YouTube/Instagram UI element overlays on preview frame. JSON-driven zone definitions.
34.1 Scrolling Credits Generator M Pillow Bottom-to-top scroll rendered as video via Pillow frame sequence + FFmpeg encode.
34.3 Lower Third Generator M Pillow Name/title bar with configurable style presets. Burn into video at timestamp range.
20.3 Color Blind Simulation Preview S Pillow Apply CVD color matrix (deuteranopia, protanopia, tritanopia) to preview frame.
11.2 Click & Keystroke Overlay M Pillow Parse click/key logs → render ripple animations and key badges as overlay frames.
11.3 Callout & Annotation Generator M Pillow Numbered callouts, spotlight boxes, blur regions, arrows at timestamps.
18.2 Retro VHS / CRT Effect M Pillow+FFmpeg Scanlines, chroma shift, noise, tracking artifacts, date stamp. Preset chain.
18.3 Glitch Effect Pack M Pillow+FFmpeg Datamosh, RGB shift, block displacement, scan distortion. Per-frame render.
48.1 Highlight Reel Auto-Assembly M Pipeline Score clips by audio energy + motion → select top N → assemble with transitions + music.

1C — Existing AI Extensions (10 features)

Features that extend already-installed AI models with new analysis modes.

# Feature Effort Dep Detail
55.3 Profanity Bleep Automation S Existing AI Whisper word timestamps + configurable word list → 1kHz tone or silence at flagged words.
61.2 Shot Type Auto-Classification M Existing AI Face size relative to frame (MediaPipe) → ECU/CU/MCU/MS/WS classification per scene.
29.1 Shot Type Search & Tagging M Existing AI Store shot type in footage index (FTS5). Enable search by shot type.
56.4 Room Tone Auto-Generation M NumPy Analyze quiet segments → spectral envelope → shape white noise to match → fill cuts.
61.3 Intelligent Pacing Analysis M Existing AI Scene detection cut points → mean/median/stddev shot lengths → genre benchmark comparison.
28.1 Black Frame / Frozen Frame Detection S FFmpeg+OpenCV blackdetect filter + frame differencing for frozen frames. Report timestamps.
28.2 Audio Phase & Silence Gap Check S FFmpeg aphasemeter + silence detection. Flag phase issues and unnatural gaps.
4.8 Best Take Selection M Existing AI Per-take scoring: audio quality (SNR), face visibility, sharpness, duration. Rank takes.
11.5 Dead-Time Detection & Speed Ramp S Existing AI Frame differencing (scene_detect) + silence detection → speed-ramp or cut dead time.
52.4 Lens Profile Auto-Detection S FFmpeg Parse camera model from ffprobe metadata → look up in lensfun JSON database.

1D — Split-Screen & Comparison (6 features)

New composite video modes using FFmpeg overlay/hstack/vstack.

# Feature Effort Dep Detail
57.1 Split-Screen Layout Templates M FFmpeg JSON layout definitions (cells with x/y/w/h %). Composite via overlay filter chain.
57.2 Reaction Video Template M FFmpeg Main content + PiP webcam. Auto-sync via audio cross-correlation. Audio ducking.
57.3 Before/After Comparison Export M FFmpeg hstack/vstack, animated wipe via overlay + keyframed crop. Label overlay.
57.4 Multi-Cam Grid View Export M FFmpeg 2x2 to 4x4 grid. Optional active-speaker highlight border from diarization data.
6.3 Side-by-Side Before/After Preview M FFmpeg Preview modal showing original vs processed frame. Slider wipe in panel.
3.9 Multi-Camera Audio Sync M FFmpeg+NumPy Audio fingerprint cross-correlation for time offset detection. Multicam XML output.

Wave 1 Total: ~40 features, 0 new dependencies, ~35 new routes


Wave 2: Pipeline Orchestration — Chain Existing Modules

Goal: Build high-value composite workflows that chain existing modules into new products. These are the features that competitors charge monthly subscriptions for.

Timeline: 3-5 weeks (can overlap with Wave 1) New deps: Zero (all existing) New routes: ~20

2A — Content Repurposing Pipelines (5 features)

# Feature Effort Dep Detail
58.1 Long-Form to Multi-Short Extraction L Pipeline Transcribe → LLM highlights (N clips) → per-clip: trim + face-reframe 9:16 + burn captions + export. Folder of numbered shorts + metadata CSV.
58.4 Podcast Episode Bundle M Pipeline Denoise + normalize → clean audio export → transcribe → chapters → highlight clips → audiogram → show notes → transcript. All outputs in timestamped folder.
54.4 AI Video Summary / Condensed Recap M Pipeline Scene detect → transcript LLM analysis → engagement scoring → select top N shots → trim 3-5s each → assemble with crossfades. Configurable target duration.
58.2 Video-to-Blog-Post Generator M Pipeline Transcribe → LLM structured article with section headings → extract key frames at section boundaries → assemble markdown + images folder.
58.3 Social Media Caption Generator S Pipeline Per-exported-clip: extract transcript → LLM generates platform-optimized post caption (char limits, hashtags, tone). JSON output alongside each clip.

2B — Advanced Workflow Presets (8 features)

# Feature Effort Dep Detail
53.3 Old Footage Restoration Pipeline L Pipeline Stabilize → deinterlace (53.2) → denoise (temporal) → upscale (Real-ESRGAN) → color restore → frame rate conversion. VHS/8mm/Early Digital presets.
40.3 Video Podcast to Audio-Only S FFmpeg Extract audio track, normalize, denoise, export as podcast-ready MP3/WAV with ID3 tags.
40.4 Podcast Show Notes Generator M Pipeline Transcribe → LLM: summary, key topics with timestamps, pull quotes, mentioned resources, chapter markers. Markdown/HTML output.
12.3 Auto Montage Builder M Pipeline Score clips (audio energy + motion) → select top N → detect beats in music track → trim clips to beat intervals → concatenate with transitions.
14.1 Paper Edit / Script Sync L Pipeline Import script text → fuzzy-match against transcript → generate organized clip assembly with confidence scores.
4.1 Watch Folder / Hot Folder M Pipeline Monitor directory for new files → auto-run configured workflow → output to destination folder. Background polling with configurable interval.
4.2 Render Queue M Pipeline Queue multiple export jobs with different settings. Sequential execution with progress tracking. Notification on batch completion.
5.1 Multi-Platform Batch Publish L Pipeline Single source → batch export for YouTube + TikTok + Instagram + LinkedIn with per-platform reframe, caption style, loudness target, and metadata.

2C — Composite Feature Enhancements (4 features)

# Feature Effort Dep Detail
24.1 Shot-Change-Aware Subtitle Timing M Pipeline Scene detection (existing) → post-process captions: split at cut boundaries with minimum gap. Integrate into caption generation pipeline.
16.1 Beat-Synced Auto-Edit L Pipeline Detect beats (existing librosa) → scene detect → align cuts to nearest beat → assemble. Music video editing automation.
36.4 Vertical-First Intelligent Reframe M Pipeline Saliency detection + face tracking → auto-crop to 9:16 with smooth path. Better than center-crop for non-face content.
30.1 Ripple Trim / Gap Close M ExtendScript After cut application, auto-close gaps by rippling subsequent clips. ExtendScript removeEmptyTrackItems().

Wave 2 Total: ~17 features, 0 new dependencies, ~20 new routes


Wave 3: Architecture & Infrastructure

Goal: Complete the remaining architectural work that enables heavy AI features in Waves 4-7. These are not user-facing but are prerequisites for scale.

Timeline: 6-10 weeks (runs in parallel with Waves 1-2) Dependencies: Internal refactoring

3A — Process Isolation for GPU Workers (P0)

The single most important infrastructure change. Every AI feature in Waves 4-7 benefits from this.

Task Detail
Worker pool architecture opencut/workers/ with WorkerManager. Workers are separate Python processes per model family (whisper, demucs, realesrgan, depth, generation).
IPC protocol Workers communicate via localhost HTTP (minimal Flask on random port) or multiprocessing.Queue. Job dispatcher routes by type.
GPU memory management Worker reports VRAM on startup. Dispatcher checks available VRAM against model's known requirement before scheduling. Workers exit after 5-min idle to free VRAM.
Graceful degradation GPU OOM → specific guidance ("Model needs 4GB VRAM, you have 2GB. Switching to CPU.") → optional CPU re-dispatch.
Model registry models.json mapping model name → VRAM requirement, download size, expected load time. UI shows this info.

Deliverable: No more OOM crashes from model conflicts. GPU utilization visible in status bar.

3B — UXP Full Parity & CEP Migration (P0)

CEP end-of-life is approximately September 2026. UXP must be production-ready before then.

Task Detail
Shared component library extension/shared/ with framework-agnostic components. Both CEP and UXP import from here. Build system outputs two bundles.
Feature registry features.json defines every feature: id, label, endpoint, params schema, requires. Both panels auto-generate UI from this. Adding a feature = one JSON entry + one backend route.
UXP feature gap closure Port remaining ~15% of CEP features to UXP. Mostly: workflow builder, full settings panel, plugin UI.
Native UXP timeline access Replace ExtendScript evalScript() with direct premierepro UXP module for timeline read/write. 10x faster.
Premiere menu integration Right-click → "OpenCut: Remove Silence" / "Add Captions" / "Normalize Audio" via UXP API.
CEP deprecation plan Mark CEP panel as "legacy" in docs. Freeze CEP feature additions. All new features UXP-only after Wave 3.

Deliverable: UXP panel at 100% parity. CEP can be removed when Adobe enforces it.

3C — FastAPI Migration (P3 — Deferred)

Low priority. Flask works fine at current scale. Migrate only if:

  • Request validation boilerplate becomes unmanageable (>300 routes)
  • WebSocket needs outgrow the current websockets library
  • Auto-generated OpenAPI docs become essential for plugin developers

If triggered, migrate one blueprint at a time (system → settings → search → nlp → timeline → jobs → captions → audio → video). Pydantic models replace safe_float()/safe_int() hand-validation.

3D — TypeScript Migration (P3 — Incremental)

Continue incremental migration as files are touched. Priority order:

  1. API layer (src/api/types.ts from OpenAPI schema)
  2. Store/state management
  3. Tab modules as they're refactored for new features

No dedicated sprint. Piggyback on feature work.


Wave 4: New Feature Domains — Moderate Dependencies

Goal: Add new feature domains that require 1-2 new dependencies each but significantly expand OpenCut's capability.

Timeline: 6-8 weeks (after Wave 1, can overlap with Wave 3) New deps: 4-6 new pip packages New routes: ~30

4A — Privacy & Content Redaction (5 features)

Shared infrastructure: object detection framework, tracking pipeline, audio masking.

# Feature Effort New Dep Detail
55.1 License Plate Detection & Blur M paddleocr or YOLO plate model Detect plates per frame → track with IoU → Gaussian blur on tracked regions.
55.3 Profanity Bleep Automation S None (done in Wave 1)
55.2 OCR-Based PII Redaction L paddleocr (shared with 55.1) OCR → regex PII patterns (SSN, phone, email, CC) → NER for names → track text regions → blur.
55.4 Document & Screen Redaction M OpenCV (existing) Edge detection → perspective transform → classify as screen/document/whiteboard → blur surface.
55.5 Audio Speaker Anonymization M Existing (pedalboard) Diarize → target speaker segments → pitch shift + formant shift or TTS resynthesis.

New dependency: paddleocr (or reuse existing Tesseract if sufficient). One dep serves 55.1 + 55.2.

4B — Camera & Lens Correction (3 remaining features)

# Feature Effort New Dep Detail
52.2 Rolling Shutter Correction L gyroflow CLI (subprocess) Integrate Gyroflow as subprocess with lens profiles. Parse gyro metadata from GoPro/DJI.
13.4 LOG / Camera Profile Pipeline M None Auto-detect LOG profile from ffprobe metadata → apply bundled technical LUT (free Sony/Canon/Panasonic LUTs).
43.4 Color Space Auto-Detection M None Read color_primaries/transfer_characteristics from ffprobe → auto-apply correct input transform.

New dependency: gyroflow CLI (optional, subprocess only — not a pip package).

4C — Spectral Audio Editing (4 features)

Shared infrastructure: STFT analysis/resynthesis pipeline.

# Feature Effort New Dep Detail
56.4 Room Tone Auto-Generation M None (done in Wave 1)
56.3 AI Environmental Noise Classifier M tensorflow-lite or onnxruntime (existing) YAMNet model (521 sound classes, TFLite). Classify → selective removal via spectral masking.
56.2 Spectral Repair / Frequency Removal M librosa (existing) STFT → identify persistent spectral peaks (hum/buzz) → attenuate → inverse STFT. Auto-detect mode.
56.1 Visual Spectrogram Editor L librosa (existing) FFmpeg showspectrumpic or librosa STFT → zoomable canvas in panel → brush tool mask → inverse STFT reconstruction.

New dependency: None if using onnxruntime (already installed) for YAMNet. Otherwise tflite-runtime (lightweight).

4D — Proxy & Media Management (4 features)

# Feature Effort New Dep Detail
60.1 Auto Proxy Generation L None Detect clips >1080p → FFmpeg scale to target res + CRF 28 → store in ~/.opencut/proxies/ with manifest. Background job.
60.2 Proxy-to-Full-Res Swap on Export S None Query timeline clip paths via ExtendScript → check against proxy manifest → verify originals exist → report.
60.3 Media Relinking Assistant M None ExtendScript: enumerate offline items. Python: recursive search by filename + size matching. Batch relink UI.
60.4 Duplicate Media Detection M None File size grouping → partial hash (first+last 64KB) → full hash for matches. Optional pHash for content matches.

New dependency: None.

4E — Pro Color Science — First Pass (4 features)

# Feature Effort New Dep Detail
13.1 Real-Time Color Scopes L FFmpeg+Pillow FFmpeg waveform, vectorscope, histogram filters render scope images. Display as image grid in panel.
13.5 Film Stock Emulation M None Custom 3D LUTs per stock (Kodak/Fuji) + grain overlay + gate weave + halation via blend. Preset package.
13.4 LOG Camera Profile Pipeline M None (listed in 4B)
43.1 ACES Color Pipeline L None ACES IDT/ODT via FFmpeg colorspace + lut3d. Bundled ACES LUTs (free from AMPAS).

New dependency: None (FFmpeg + bundled LUT files).

Wave 4 Total: ~18 features (excluding duplicates from Wave 1), 1-2 new deps, ~30 new routes


Wave 5: AI Dubbing & Voice Translation

Goal: Build the end-to-end AI dubbing pipeline — the single highest-value new AI capability. This is what ElevenLabs, HeyGen, and Rask.ai charge $50-100/month for.

Timeline: 4-6 weeks (after Wave 3A process isolation is ready) Prerequisite: Wave 3A (GPU process isolation) — dubbing loads multiple large models sequentially New deps: Minimal (leverages existing Chatterbox, Whisper, Demucs, SeamlessM4T)

# Feature Effort Detail
62.1 End-to-End AI Dubbing Pipeline XL Transcribe → translate (SeamlessM4T) → voice-clone TTS (Chatterbox) with duration constraints → stem-separate original (Demucs, remove dialogue, keep music/SFX) → mix dubbed dialogue + original music/SFX → export.
62.2 Isochronous Translation L LLM-assisted translation constrained by segment duration. Iterate: translate → estimate TTS duration from syllable count → if too long, ask LLM to rephrase shorter → if too short, expand. Target +-10% of original.
62.3 Multi-Language Audio Track Management M FFmpeg -map to mux multiple audio streams with language metadata. Panel UI: track list with language dropdown, add/remove, default flag. Export multi-track MKV/MP4 or per-language files.
62.4 Emotion-Preserving Voice Translation L Extract prosody (F0 contour via librosa, RMS energy, speaking rate) from original → generate TTS with neutral prosody → transfer original prosody shape to dubbed audio via WORLD vocoder or pitch manipulation.

Workflow chain: The dubbing pipeline calls 5 existing modules in sequence. The key new code is the orchestrator (core/dubbing.py) and the isochronous translation loop (core/isochron_translate.py).

New dependency: Potentially pyworld for vocoder-based prosody transfer (62.4). Everything else is already installed.

Wave 5 Total: 4 features, 0-1 new deps, ~8 new routes


Wave 6: Advanced Professional Features

Goal: Deep features for professional editors, colorists, and post-production specialists. These differentiate OpenCut from casual tools.

Timeline: 8-12 weeks (can be worked on in parallel tracks) New deps: 2-4

6A — Composition & Framing Intelligence (3 features)

# Feature Effort Detail
61.4 Saliency-Guided Auto-Crop M Face regions (high weight) + motion regions (frame diff) + text regions (OCR) + high-contrast edges → weighted heat map → place crop to maximize saliency.
13.2 Three-Way Color Wheels L SVG color wheel widgets in panel → map wheel positions to FFmpeg colorbalance filter values (lift/gamma/gain). Preview via frame extraction.
13.3 HSL Qualifier / Secondary Correction L OpenCV HSV range masking with feathered edges → apply corrections to masked region only → composite. Preview matte in panel.

6B — Pre-Production Tools (4 features)

# Feature Effort Detail
59.4 Script-to-Rough-Cut Assembly XL Batch transcribe all footage → fuzzy-match transcript segments against script text → rank matches by similarity + audio quality + face visibility → assemble best take per segment as OTIO/Premiere XML.
59.2 Shot List Generator from Screenplay M Parse screenplay format (INT./EXT., ACTION, DIALOGUE) → LLM suggests shot count and camera angles per scene → export as CSV.
59.1 AI Storyboard Generation from Script L Parse script into shots → generate one image per shot via Stable Diffusion or external API → layout as storyboard grid with descriptions → export PDF.
59.3 Mood Board Generator from Footage M Extract keyframes → k-means color clustering → style tags (warm/cold, contrast, saturation) → suggest matching LUTs → compile as visual reference image.

6C — Video Repair & Restoration (3 remaining features)

# Feature Effort Detail
53.1 Corrupted File Recovery M Detect corruption type (missing moov, truncated stream). For missing moov: untrunc algorithm with reference file. For truncated: ffmpeg -err_detect ignore_err salvage. Report recovery stats.
53.4 SDR-to-HDR Upconversion L FFmpeg zscale (bt709 → bt2020) + inverse tone mapping. Apply PQ/HLG transfer function. Embed ST.2086 metadata.
13.6 Power Windows with Tracking L Shape masks (circle, rect, polygon) in panel → track via MediaPipe (face) or SAM2 (object) → apply corrections inside/outside mask via per-frame FFmpeg filter.

6D — Forensic & Legal (3 features)

# Feature Effort Detail
35.1 Selective Redaction Tool M Click-to-select regions in preview → track across frames → blur/pixelate/black. Export redaction log for audit trail.
35.2 Chain of Custody Metadata S SHA-256 hash of original + all operations applied + timestamps → embed as metadata or export as sidecar JSON.
35.3 Forensic Enhancement M Stabilize + denoise + sharpen + contrast stretch + frame interpolation for low-quality surveillance footage.

6E — Accessibility & Compliance (3 features)

# Feature Effort Detail
20.1 Caption Compliance Checker M Parse captions → check against rulesets (Netflix <=42 CPL, FCC <=32 CPL, BBC <=160 WPM, min duration, CPS). Flag violations with auto-fix suggestions.
20.2 Audio Description Track Generator L Detect dialogue pauses (existing VAD) → extract key frames during pauses → describe via LLM vision → TTS synthesis → mix into gaps → export as AD track.
27.1 C2PA Content Credentials M Embed Content Authenticity Initiative metadata (origin, edit history, AI disclosure). c2pa-python library.

Wave 6 Total: ~16 features, 2-3 new deps, ~25 new routes


Wave 7: AI Generation, 360, & Emerging Tech

Goal: Forward-looking AI capabilities and niche professional features. These are differentiators, not table-stakes.

Timeline: Ongoing (8-16 weeks, lowest priority) New deps: Several (heavy AI models) Prerequisite: Wave 3A (GPU process isolation) essential for multiple large models

7A — AI Video Generation & Synthesis (5 features)

# Feature Effort New Dep Detail
54.2 Image-to-Video Animation L diffusers (existing) SVD or CogVideoX with image conditioning → 2-6s clip from still image + motion prompt.
54.5 AI Background Replacement L diffusers (existing) RVM foreground extraction + Stable Diffusion background from text prompt → composite.
54.1 AI Outpainting / Frame Extension L diffusers (existing) Extend canvas to target aspect ratio → inpaint borders via ProPainter or SD. Keyframe-based for temporal consistency.
54.3 AI Scene Extension XL diffusers (existing) Feed last N frames to video prediction model → generate continuation. Best for static scenes.
21.1 Multimodal Timeline Copilot XL LLM API (existing) Chat interface backed by multimodal AI that sees video + audio + transcript. Navigate, select, edit via natural language.

7B — 360 / VR / Immersive (4 features)

# Feature Effort Detail
51.2 Equirectangular to Flat Projection M FFmpeg v360 filter. Keyframeable yaw/pitch/roll for virtual camera paths.
51.3 FOV Region Extraction from 360 M Face detection in equirectangular space → per-speaker flat extraction with smooth tracking → multicam XML.
51.1 360 Video Stabilization L Parse gyro metadata (GoPro GPMF, Insta360) → apply inverse rotation via FFmpeg v360.
51.4 Spatial Audio Alignment L Map speaker positions from face detection → route mono dialogue to correct ambisonic channel. First-order ambisonics output.

7C — Niche Professional Features

# Feature Effort Detail
41.1 DJI Telemetry Data Overlay M Parse DJI SRT files → render altitude, speed, GPS, battery as configurable overlay.
42.1 Image Sequence Import & Assembly M Import folder of images (TIFF, EXR, DPX, PNG) → assemble as video with configurable FPS and transitions.
39.1 Elgato Stream Deck Integration M WebSocket/HTTP listener for Stream Deck commands → map buttons to OpenCut operations. Plugin for Stream Deck SDK.
12.1 Gaming Highlight / Kill Detection L Multi-signal fusion: audio peaks + motion intensity + optional OCR on kill feed → score segments → extract top clips.
33.1 Lecture Recording Auto-Split M Scene detection + chapter generation → split lecture by topic → generate per-topic clips with title cards.
46.1 Multi-Step Autonomous Editing Agent XL LLM plans editing steps from high-level instruction → executes via OpenCut API → iterates on result quality. Full agent loop with human review checkpoints.

Wave 7 Total: ~15 features, 0-2 new deps (most already installed), ~20 new routes


Implementation Order & Dependencies

Wave 1 (Quick Wins)          |=============================|
Wave 2 (Pipelines)           |=======================|
Wave 3A (GPU Isolation)           |========================|
Wave 3B (UXP Parity)              |=====================|
Wave 4 (New Domains)                   |========================|
Wave 5 (AI Dubbing)                         |================|
Wave 6 (Pro Features)                            |===========================|
Wave 7 (Emerging)                                      |=========================>
                              Wk 1    Wk 6    Wk 12   Wk 18   Wk 24   Wk 30+

Critical path: Wave 3A (GPU isolation) must land before Waves 5 and 7A (heavy AI features).

Parallel tracks:

  • Wave 1 + Wave 2 can run simultaneously (different developers or even same developer — no conflicts)
  • Wave 3A + Wave 3B are independent
  • Wave 4 can start as soon as Wave 1 is done (shares no code)
  • Wave 6 features are independent of each other (can be cherry-picked)

Route Growth Projection

Milestone Routes Core Modules Tests (est.)
Current (v1.9.26) 254 68 867
After Wave 1 ~290 ~78 ~1,050
After Wave 2 ~310 ~85 ~1,200
After Wave 4 ~340 ~95 ~1,400
After Wave 5 ~348 ~99 ~1,500
After Wave 6 ~373 ~110 ~1,700
After Wave 7 ~393 ~120 ~1,900

Priority Matrix (Updated)

P0 — Critical (Do First)

# Feature Wave Effort Why Critical
3A GPU Process Isolation 3 XL Prerequisite for all heavy AI features. Eliminates OOM crashes.
3B UXP Full Parity 3 XL CEP end-of-life ~Sept 2026. Must be ready before then.
32.1 Hardware-Accelerated Encoding 1 M Users with GPUs expect NVENC/QSV. Every other tool has this.
58.1 Long-Form to Multi-Short Extraction 2 L $228/year competitor (Opus Clip). Highest-value pipeline.

P1 — High Impact (Next Priority)

# Feature Wave Effort Why High Impact
62.1 End-to-End AI Dubbing 5 XL $50-100/month competitor category. Uses all existing modules.
57.1 Split-Screen Templates 1 M CapCut/iMovie table-stakes. Massive content category.
55.1 License Plate Blur 4 M Privacy law compliance. Every content creator needs this.
55.3 Profanity Bleep 1 S Broadcast requirement. Trivial to build.
53.2 Adaptive Deinterlacing 1 S Every NLE has this. Legacy footage is common.
52.1 Lens Distortion Correction 1 M Standard camera correction. lensfun database is free.
56.4 Room Tone Auto-Generation 1 M iZotope RX feature. Makes silence removal sound professional.
60.1 Auto Proxy Generation 4 L Premiere/Resolve/FCPX all have this. 4K editing prerequisite.
61.2 Shot Type Classification 1 M Enables intelligent editing decisions and footage search.
45.2 AV1 Encoding 1 M Modern codec with 30-50% bitrate savings. YouTube prefers it.
45.1 ProRes Export (Windows) 1 M Professional delivery format. Resolve offers this on Windows.
13.1 Real-Time Color Scopes 6 L Every colorist needs scopes. Color tools are blind without them.
59.4 Script-to-Rough-Cut 6 XL Biggest time saver in post-production. Avid ScriptSync competitor.
20.1 Caption Compliance Checker 6 M Netflix/FCC/BBC requirements. Prevents platform rejection.
24.1 Shot-Change-Aware Subtitle Timing 2 M Broadcast QC standard. Simple post-processing.

P2 — Valuable (Scheduled)

All remaining Wave 1-6 features not listed above (~60 features).

P3 — Future (Backlog)

All Wave 7 features + FastAPI migration + TypeScript + niche items (~40 features).


Success Metrics (Updated)

Metric v1.9.26 (Current) After Waves 1-2 After Waves 3-5 After Waves 6-7
API routes 254 ~310 ~348 ~393
Core modules 68 ~85 ~99 ~120
Tests 867 ~1,200 ~1,500 ~1,900
Time to first useful action ~30s (workflow) ~15s (pipeline) ~10s (context + agent) ~5s (copilot)
Install success rate ~90% ~92% ~95% (isolation) ~99% (Docker)
Competitor features covered ~60% ~75% ~85% ~95%
Features available in UXP ~85% ~90% 100% 100%
New deps added 0 0 1-2 4-6

Risk Register

Risk Impact Mitigation
CEP deprecation before UXP ready High Wave 3B is P0. Start immediately. Freeze CEP feature additions.
GPU process isolation complexity High Start with simple subprocess model. Upgrade to full worker pool later. Ship incremental improvements.
AI model download sizes Medium Models are optional. Clear size warnings in UI. Pre-download in installer. Offer cloud API fallback where possible.
Too many features → quality regression High Every new feature gets a smoke test before merge. Ruff lint on CI. No feature without a test.
Dependency conflicts from new packages Medium One new dep per feature max. Pin versions. Test in isolated venv before adding to pyproject.toml.
Scope creep from 302-feature plan Medium Waves are independently shippable. Only commit to one wave at a time. Review and reprioritize between waves.

This roadmap should be reviewed at the start of each wave and reprioritized based on user feedback, competitive landscape changes, and lessons learned from the previous wave.


Research & Strategic Gaps (Auto-Generated Analysis)

Auditor: Principal Systems Architect analysis Date: 2026-04-14 Baseline: v1.14.0 (1,088 routes, 408 core modules, 83 blueprints, 87 test files, 6,925 tests) Method: Full codebase scan, security audit, architecture bottleneck analysis, test/CI pipeline review

Context: This roadmap was authored at v1.9.26 (254 routes, 68 modules). The codebase has since grown 4.3x in routes and 6x in modules. The Wave 1-7 structure and growth projections are now obsolete — the "After Wave 7" target of ~393 routes was surpassed at v1.10.5. This analysis identifies the gaps that the rapid feature expansion has opened.


HIGH Priority — Blocking Issues

  • GPU process isolation is still unimplemented (Wave 3A). This was marked P0 and remains the single most critical infrastructure gap. MAX_CONCURRENT_JOBS = 10 in opencut/jobs.py:42 allows 10 simultaneous ML model loads into VRAM. PyTorch models (Demucs, Real-ESRGAN, InsightFace, SAM2, CLIP, etc.) each consume 500MB-4GB VRAM. Concurrent loads will OOM on consumer GPUs. No memory reservation, no model-aware scheduling, no graceful degradation path exists. Every AI feature added since v1.10 has widened this gap. The 408-module codebase now has 40+ modules that load GPU models — 6x more than when Wave 3A was planned.

    • Recommended action: Implement a GPU memory budget system immediately. At minimum: reduce MAX_CONCURRENT_JOBS to 3 for GPU-tagged routes, add a @gpu_exclusive decorator that serializes GPU model access behind a semaphore, and report VRAM usage in /system/status.
  • Rate limiting covers 4% of async routes. Security audit found 597 async route handlers but only 23 rate-limit calls. The require_rate_limit() decorator exists and works, but was only applied to model-install and a handful of AI routes. All 574 unprotected async routes accept concurrent requests limited only by MAX_CONCURRENT_JOBS=10. A single client can trivially exhaust all 10 job slots with expensive operations (batch rendering, video processing, ML inference), starving other requests.

    • Recommended action: Introduce rate-limit categories (gpu_heavy, cpu_heavy, io_bound, light) and apply to all async routes. GPU-heavy operations should share a pool of 2-3 concurrent slots. CPU-heavy should cap at 4-6.
  • Test coverage is broad but shallow. 87 test files exist with 6,925 test functions, but the architecture audit reveals 97% of the 408 core modules lack dedicated behavioral tests — they're only exercised indirectly through route smoke tests. The smoke tests in test_route_smoke.py use broad status code assertions like assert resp.status_code in (200, 400, 429) which pass regardless of whether the feature works correctly. CI enforces only 50% line coverage (--cov-fail-under=50 in build.yml), which is insufficient for a codebase of this size and complexity.

    • Recommended action: Raise CI coverage threshold to 65% (target 80% over 2 sprints). Add schema validation for route responses (JSON structure, not just "is JSON"). Prioritize integration tests for the 40 GPU-model-loading modules — these are the highest-risk code paths with the least coverage.
  • Roadmap growth projections are 3x out of date. The "Route Growth Projection" table estimates 393 routes after all 7 waves. Actual count is 1,088 — a 2.8x overshoot. The "Success Metrics" table, "Completed Work" section, and wave feature lists don't reflect v1.10-v1.14 additions (categories 63-77, 155 new core modules, 20 new route blueprints). The roadmap should be rebased to reflect current reality so it can be trusted for planning.

    • Recommended action: Rebase all tables to v1.14.0 actuals. Mark Wave 1-2 features that were implemented in v1.10-v1.14 as DONE. Update dependency legend with new module families. Revise success metrics to reflect 1,088-route baseline.

MEDIUM Priority — Technical Debt & Infrastructure

  • helpers.py is a god module (350 imports). Every core module and most route files import from opencut/helpers.py. It contains FFmpeg execution, video probing, output path logic, temp file cleanup, package installation, and progress utilities — responsibilities that span 6+ concerns. This makes it a merge conflict magnet, impossible to test in isolation, and a startup bottleneck (every import chain pulls in the entire module).

    • Recommended action: Decompose into helpers/ffmpeg.py, helpers/video_probe.py, helpers/paths.py, helpers/cleanup.py, helpers/packages.py. Re-export from helpers/__init__.py for backward compat. Do this incrementally during feature work, not as a dedicated refactor sprint.
  • UXP migration has 5 months remaining. CEP end-of-life is approximately September 2026. The roadmap states UXP is at ~85% feature parity (Wave 3B). The UXP panel (extension/com.opencut.uxp/) has 7 tabs vs. CEP's 8, and the UXP main.js is 1,523 lines vs. CEP's 7,730 — indicating significant feature gaps in the frontend. No UXP-specific tests exist in CI. The CEP panel continues to receive features (v1.14.0 version bumps touch CEP files), violating the roadmap's "freeze CEP feature additions" directive.

    • Recommended action: Audit UXP vs. CEP parity at the feature level (not tab level). Add UXP smoke test to CI. Enforce CEP freeze — new frontend features go to UXP only.
  • No type checking in CI. 523 Python files with no mypy or pyright enforcement. Type errors (None where str expected, dict where dataclass expected, wrong callback signature) are caught at runtime — if at all. The on_progress callback pattern is already documented in CLAUDE.md as a gotcha (core modules call with 1 arg, routes define closures with 2 args), which is exactly the class of bug static typing catches.

    • Recommended action: Add mypy --ignore-missing-imports opencut/ to CI. Start with --no-strict and fix errors incrementally. Target: 0 type errors in opencut/core/ within 2 sprints.
  • Untracked subprocesses can orphan on cancel. The @async_job decorator registers the job's main thread for cancellation, and _register_job_process() tracks Popen handles. But 158 subprocess calls across core modules call subprocess.run() directly — these finish synchronously within the job thread but can't be interrupted mid-execution. If a user cancels a job while FFmpeg is mid-render (a 30-minute operation), the FFmpeg process runs to completion even though the job is marked cancelled. The process exit code is then silently discarded.

    • Recommended action: Wrap long-running subprocess calls in a pattern that checks job_cancelled flag and sends SIGTERM to the child process. Alternatively, refactor run_ffmpeg() in helpers.py to accept a job_id parameter and auto-register the Popen for cancellation.
  • No security scanning in CI pipeline. The build.yml workflow runs ruff lint and pytest but has no security tooling: no bandit (Python security linter), no CodeQL (GitHub's code scanning), no dependabot/Snyk (dependency vulnerability scanning), no SBOM generation. For a project that executes FFmpeg subprocesses, runs pip install at runtime via safe_pip_install(), and loads ML models from external sources, this is a meaningful gap.

    • Recommended action: Add bandit -r opencut/ -ll to CI (catches high-confidence security issues). Enable GitHub Dependabot for dependency alerts (zero-effort, just add dependabot.yml). Add CodeQL for deeper analysis.
  • Temp file accumulation under load. 93 modules create temp files via tempfile.mkstemp() or NamedTemporaryFile(). The deferred cleanup mechanism (_schedule_temp_cleanup() in helpers.py) uses a 5-second delay with 3 retries. Under concurrent load (10 video processing jobs), this means hundreds of multi-GB temp files (intermediate FFmpeg outputs, extracted frames, model outputs) can accumulate before cleanup fires. No disk quota, no max-temp-size check, no cleanup-on-startup sweep.

    • Recommended action: Add a startup sweep of tempfile.gettempdir() for stale opencut_* temp files. Add a periodic (60s) background cleanup for files older than 10 minutes. Log temp disk usage in /system/status.
  • 25+ tests use time.sleep() creating flaky CI. Tests in test_batch_executor.py, test_batch_parallel.py, test_boolean_coercion.py, test_integration_ffmpeg.py, and test_preview_realtime.py contain sleeps ranging from 10ms to 500ms. These are timing-dependent and will intermittently fail on slow CI runners, Windows VMs, or under load. Additionally, test_solver_agent.py uses random.seed(42) but other tests don't seed, introducing non-determinism.

    • Recommended action: Replace time.sleep() in tests with event-based synchronization (threading.Event, condition variables). For async result tests, poll with timeout rather than fixed sleep. Audit and seed all random usage.

LOW Priority — Future Investment

  • No auto-generated API documentation. With 1,088 routes across 83 blueprints, there is no OpenAPI/Swagger spec, no auto-generated endpoint catalog, and no machine-readable API schema. Plugin developers and external integrators must read route source code. The roadmap's Wave 3C notes FastAPI migration (which brings auto-generated OpenAPI) but defers it. The original trigger — "if >300 routes" — was passed long ago.

    • Recommended action: Generate an OpenAPI spec from Flask routes using flask-smorest or apispec without migrating to FastAPI. Serve Swagger UI at /api/docs for development mode only. This is a 1-day effort that unlocks plugin ecosystem development.
  • Blueprint registration is sequential and eager. register_blueprints() in routes/__init__.py performs 83 sequential import statements at app startup. Each import may trigger module-level initialization (cache setup, constant computation, availability checks). Measured impact is 2-5 seconds on startup — not a production issue but noticeable during development when the server auto-restarts on file changes.

    • Recommended action: No immediate action needed. If dev-cycle time becomes a complaint, implement lazy blueprint registration (register on first request to URL prefix).
  • No performance regression detection. No benchmarks, no load tests, no response-time tracking in CI. With 1,088 routes and 408 modules, a single change to helpers.py or jobs.py could degrade performance across hundreds of endpoints with no visibility.

    • Recommended action: Add a simple benchmark suite (10 representative endpoints, measure p50/p95 response time) that runs in CI and fails on >20% regression. Use pytest-benchmark or custom timing.
  • Missing production governance files. No SECURITY.md (vulnerability disclosure process), no CODE_OF_CONDUCT.md, no CONTRIBUTING.md with architecture guide, no SBOM (software bill of materials). For an open-source project with 408 modules and ML model downloads, these are expected by enterprise adopters.

    • Recommended action: Add SECURITY.md with disclosure process and supported-versions table. Generate SBOM from pyproject.toml deps.
  • FastAPI migration trigger has been reached. The roadmap defers FastAPI migration until ">300 routes" with the rationale that validation boilerplate would become unmanageable. Current state: 1,088 routes, 879 mutation endpoints, manual safe_float()/safe_int()/safe_bool() validation in every handler. Pydantic models would eliminate ~60% of per-route validation boilerplate and provide automatic request/response schema generation.

    • Recommended action: This remains low priority because Flask works and migration risk is high with 83 blueprints. However, the original deferral rationale no longer holds. If a major refactor is planned (e.g., helpers.py decomposition), consider migrating 1-2 blueprints to FastAPI as a proof-of-concept to measure the cost/benefit.

Summary Matrix

Finding Priority Effort Impact Status
GPU process isolation (Wave 3A) HIGH XL Eliminates OOM crashes Not started
Rate limiting expansion HIGH M Prevents DoS / resource exhaustion 4% coverage
Test depth & coverage threshold HIGH L Catches regressions before release 50% threshold
Roadmap rebase to v1.14.0 HIGH S Accurate planning Stale since v1.9.26
helpers.py decomposition MEDIUM M Reduces coupling, merge conflicts 350 imports
UXP full parity (Wave 3B) MEDIUM L CEP EOL Sept 2026 ~85% parity
Type checking in CI MEDIUM M Catches type bugs statically Not started
Subprocess cancellation MEDIUM M Clean job cancel behavior 158 untracked calls
Security scanning in CI MEDIUM S Catches vulnerabilities Not started
Temp file disk management MEDIUM S Prevents disk exhaustion No quota
Flaky test elimination MEDIUM S Reliable CI 25+ sleep-based tests
Auto-generated API docs LOW S Enables plugin ecosystem No spec exists
Performance benchmarks in CI LOW M Detects regressions Not started
Production governance files LOW S Enterprise readiness Missing
FastAPI migration evaluation LOW XL Reduces boilerplate at scale Deferred