fika-dev/sequence-to-video

Sequence to Video

A video generation pipeline built on domain-driven design (DDD), turning sequence planning data into finished videos.

Features

  • Sequencing Domain: Generate sequence JSON from scripts using Gemini
  • Planning Domain: Parse sequence JSON into structured scenarios
  • Library Domain: Index and search video footage, manage and analyze reference ads
  • Studio Domain: Generate TTS (Chirp v3), images (Gemini 3 Pro), videos (Veo 3.1), text animations, Lottie overlays, sound effects
  • Editing Domain: Compose final video with FFmpeg, apply effects and transitions

Installation

uv sync
uv run playwright install chromium

Configuration

cp .env.example .env

Required environment variables:

GOOGLE_PROJECT_ID=your-project-id
GCS_BUCKET=your-gcs-bucket

Usage

Full Pipeline

# 1. Generate sequence from script
uv run python main.py sequence examples/script.txt -o sequence.json -v

# 2. Render video from sequence
uv run python main.py render sequence.json -o output.mp4 -v

# Or directly from existing sequence JSON
uv run python main.py render examples/sample_sequence.json -o output.mp4

Sequencing Strategies

The sequence generator supports multiple strategies, each taking a different approach to utilizing available footage.

# Default: Script-driven, ignores footage library
uv run python main.py sequence script.txt -o seq.json -v

# Footage-Aware: Considers existing footage while following the script
uv run python main.py sequence script.txt -o seq.json --strategy footage_aware -v

# Appeal-First: Analyzes footage appeal points first, then builds narrative (2-step)
uv run python main.py sequence script.txt -o seq.json --strategy appeal_first -v

# Control scene count (default: 10)
uv run python main.py sequence script.txt --strategy appeal_first --scene-count 8 -v

Strategy Comparison

| Strategy | Approach | Best For |
|---|---|---|
| default | Script-driven, footage-agnostic | AI-generated content, no existing footage |
| footage_aware | Script-driven with footage recommendations | Matching script to available UGC |
| appeal_first | Footage-driven, 2-step generation | Maximizing impact of existing UGC |
| reference_guided | Reference ads + UGC, 3-step generation | Following proven ad structures with available footage |

How Each Strategy Works

default

  • Single LLM call
  • Generates sequence purely from script content
  • No awareness of footage library

footage_aware

  • Single LLM call
  • Injects footage catalog + product context into prompt
  • Recommends candidate_clips for each scene based on script requirements
  • Output includes clip IDs that best match each scene

appeal_first (Multi-step)

  1. Step 1 - Appeal Analysis: Analyzes footage catalog to identify high-impact clips and their marketing appeal
  2. Step 2 - Sequence Generation: Uses the content strategy from Step 1 to build a narrative that showcases the best footage
Footage Catalog → Content Strategy → Sequence JSON
     ↓                   ↓                ↓
 (appeal points)    (narrative arc)   (candidate_clips)

Output: candidate_clips

When using footage_aware or appeal_first, scenes include candidate_clips:

{
  "scene_id": "s01",
  "visual_layer": {
    "type": "existing_footage",
    "prompt": "Woman checking bloated belly in mirror",
    "candidate_clips": ["비포 컷_clip_000", "비포컷_직장인_clip_001"],
    "query_tags": ["bloating", "body concern"],
    "fallback_gen_prompt": "..."
  }
}

During rendering, the FootageSelector prioritizes these candidate clips when selecting footage for each scene.
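As a rough sketch, that prioritization can be modeled as a stable re-ranking that moves a scene's candidate_clips ahead of other search hits. This is illustrative only; the actual FootageSelector in domains/library/selector.py combines this with LLM-based ranking:

```python
def prioritize_candidates(candidate_clips, search_results):
    """Move a scene's candidate clips to the front, keeping relative order."""
    candidates = set(candidate_clips)
    # sorted() is stable: False (candidate) sorts before True (non-candidate),
    # and ties keep the similarity order produced by the search stage.
    return sorted(search_results, key=lambda clip_id: clip_id not in candidates)

ranked = prioritize_candidates(
    ["clip_000", "clip_001"],
    ["clip_042", "clip_001", "clip_007", "clip_000"],
)
```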

reference_guided (Multi-step, Reference + UGC)

Uses analyzed reference ads to guide sequence structure while utilizing available UGC footage:

  1. Step 1 - Blueprint: Finds similar reference ads, extracts framework/pacing/patterns, creates beat plan
  2. Step 2 - Casting: Maps each beat to available UGC clips via embedding search + LLM selection
  3. Step 3 - Sequence: Generates final sequence JSON with candidate_clips and style rules
Reference Ads → Blueprint → Beat Casting → Sequence JSON
     ↓              ↓            ↓              ↓
 (structure)    (beat plan)  (clip matches) (final output)

Prerequisites:

  • Analyzed reference ads in assets/reference_index/ (use analyze_references.py)
  • Indexed UGC footage in assets/library_index/
uv run python main.py sequence script.txt --strategy reference_guided -v

Index Raw Footage

# Index all videos in assets/raw_footage (generic analysis)
uv run python main.py index -v

# Index as product UGC (with marketing-focused analysis)
uv run python main.py index --type product_ugc -v

# Index with explicit context file
uv run python main.py index --type product_ugc --context path/to/context.txt -v

# Index single file with context
uv run python main.py index path/to/video.mp4 --type product_ugc --context context.txt -v

# Force re-index all
uv run python main.py index --force -v

Product Context for UGC Analysis

When using --type product_ugc, provide product context via:

  1. --context flag (explicit path):
uv run python main.py index --type product_ugc --context products/serum_info.txt -v
  2. Auto-detection (place in footage folder):
assets/raw_footage/
├── my_product/
│   ├── context.txt      ← Auto-detected
│   ├── ugc_video1.mp4
│   └── ugc_video2.mp4

Auto-detected filenames (checked in order): context.txt, product.txt, info.txt

context.txt example:

Product: VitaC Brightening Serum
Category: Skincare / Serum
Key Ingredients: 15% Vitamin C, Niacinamide, Hyaluronic Acid
Benefits: Brightening, dark spot care, hydration
Target: 20-40s with dull skin tone concerns
USP: Fast absorption, non-sticky texture

The analyzer uses this context to:

  • Generate product-relevant tags and descriptions
  • Identify appeal points specific to the product's benefits
  • Better match clips to scene requirements during rendering

Reference Ads Management

Download, organize, and analyze competitor/reference ads from Facebook Ads Library or YouTube to inform sequence generation.

Prerequisites

Install the patched yt-dlp with Facebook challenge bypass:

pipx install "git+https://github.com/legraphista/yt-dlp.git@fix/15577-facebook-ads-extractor-callenge" --suffix="-fb"

Step 1: Download Reference Ads

Create a links file with ad URLs (one per line), then download as a collection:

# Create links file
cat > fb_links.txt << 'EOF'
https://www.facebook.com/ads/library/?id=2015764748993699
https://www.facebook.com/ads/library/?id=719057414376082
# Comments are supported
https://www.facebook.com/ads/library/?id=312304267031140
EOF

# Download all ads into a collection
uv run python main.py reference-download fb_links.txt \
  --title "Supplement Ads Q1" \
  --domain "ads/supplements" \
  -v

# Options: --source selects facebook (default) or youtube; --tags adds optional tags
uv run python main.py reference-download fb_links.txt \
  --title "Campaign Name" \
  --domain "ads/cosmetics" \
  --source youtube \
  --tags "korean" "ugc" \
  -v

Step 2: Analyze Reference Ads

Analyze a collection using Gemini to extract narrative structure, pacing, hooks, and reusable patterns:

# Analyze by collection title, ID, or slug
uv run python main.py reference-analyze "Supplement Ads Q1" -v

# Force re-analyze already analyzed items
uv run python main.py reference-analyze "Supplement Ads Q1" --force -v

Step 3: List and Manage Collections

# List all collections
uv run python main.py reference-list

# Filter by domain
uv run python main.py reference-list --domain "ads/supplements"
uv run python main.py reference-list --domain "ads/*"  # All ads domains

# Sync to GCS (backup/team sharing)
uv run python main.py reference-sync --push -v

# Migrate legacy structure (if upgrading from old format)
uv run python main.py reference-migrate --dry-run -v
uv run python main.py reference-migrate -v

Directory Structure

assets/references/
├── index.json                    # Collection summaries
├── .gcs_state.json               # Sync state (gitignored)
└── collections/
    └── supplement_ads_q1/        # Collection slug
        ├── collection.json       # Collection metadata
        ├── source_links.txt      # Original links
        └── items/
            └── fb_2015764748993699/  # {source}_{source_id}
                ├── item.json         # Item metadata
                ├── video.mp4
                ├── thumbnail.jpg
                ├── source_meta.json  # Platform metadata
                ├── analysis.json     # Gemini analysis
                ├── fingerprint.npy   # Ad-level embedding
                └── b01.npy, b02.npy  # Beat embeddings

Analysis Schema

Each reference ad is analyzed for:

| Field | Description |
|---|---|
| framework | Ad structure (hook_body_cta, aida, pas, etc.) |
| one_sentence_positioning | Core message of the ad |
| target_audience | Who the ad targets |
| core_pain_point | Problem being addressed |
| core_promise | Solution/benefit offered |
| pacing | cuts_per_minute, hook_duration, first_cta_time |
| beats | 2-6 second segments with narrative_role, visual_summary, on_screen_text |
| reusable_patterns | Rules for generating similar ads |
| style_guide | tone, text_overlay_style, music_mood |

Example analysis output:

{
  "framework": "hook_body_cta",
  "one_sentence_positioning": "Fiber supplement that resolves fatigue",
  "pacing": {
    "hook_duration": 3.2,
    "first_product_reveal_time": 3.9,
    "first_cta_time": 13.0
  },
  "beats": [
    {
      "beat_id": "b01",
      "narrative_role": "hook",
      "hook_technique": "question",
      "on_screen_text": ["영양제 잔뜩 먹어도 피곤한 이유"]
    }
  ],
  "reusable_patterns": [
    "Start with a relatable problem or question",
    "Show product by ~3-4 seconds",
    "Include testimonials before CTA"
  ]
}

Domain Categories

| Domain | Description |
|---|---|
| ads/supplements | Health supplements advertising |
| ads/cosmetics | Beauty and skincare products |
| ads/food | Food and beverage products |
| ads/etc | Other advertising categories |
| content/education | Educational content |
| content/medical | Medical/health content |
| content/entertainment | Entertainment content |

Scene-Level Editing

Individual scenes are automatically saved to assets/review_output/{project_id}/scenes/ during rendering.

# Re-render a specific scene (e.g., after modifying sequence.json)
uv run python main.py rerender-scene sequence.json s03 -v

# Reassemble final video from existing scene files
uv run python main.py reassemble sequence.json -o output_v2.mp4 -v

Workflow example:

# 1. Initial render (scenes saved individually)
uv run python main.py render sequence.json -o output.mp4 -v

# 2. Review output, identify issues with scene s03

# 3. Modify sequence.json for s03, then re-render only that scene
uv run python main.py rerender-scene sequence.json s03 -v

# 4. Reassemble all scenes into final video
uv run python main.py reassemble sequence.json -o output_v2.mp4 -v

Web Viewer

Launch an interactive web UI to preview and edit sequence JSON files. The viewer allows real-time editing of scenes, regeneration of individual assets, and scene re-rendering.

# Launch viewer with a sequence file pre-loaded
uv run python main.py viewer examples/sample_sequence.json

# Launch empty viewer (load file via UI)
uv run python main.py viewer

# Custom host/port
uv run python main.py viewer sequence.json --host 0.0.0.0 --port 8080

Open http://127.0.0.1:8765 in your browser. The viewer provides:

  • Scene List: Navigate between scenes, see asset status badges
  • Scene Editor: Edit audio script, visual layer, text overlay, timing, effects
  • Text Overlay: Toggle on/off, edit content/style/animation, customize font and background colors
  • Lottie Overlays: Add/remove/edit Lottie animations with position, scale, and timing
  • Sound Effects: Add/remove/edit SFX with volume, timing, and fade controls
  • Asset Preview: View generated assets, regenerate individual components
  • Re-render: Re-render individual scenes or the entire video

Changes are saved to the sequence JSON file when you click "Save Changes".

Generate Embeddings

Embeddings enable semantic search for footage selection. They are automatically generated during indexing, but you can regenerate them separately:

# Generate embeddings for all existing indexes (without re-analyzing videos)
uv run python main.py embed -v

Use this when:

  • You updated the embedding model
  • Embeddings were missing from older indexes
  • You want to refresh embeddings without full re-indexing

Convert MOV to MP4

uv run python scripts/convert_mov_to_mp4.py -v
uv run python scripts/convert_mov_to_mp4.py -v --delete  # Delete original after conversion

Individual Domain Testing

Each domain can be tested independently for development and debugging.

TTS Generation

# Generate TTS audio
uv run python scripts/test_tts.py "안녕하세요, 테스트입니다."

# With custom preset and speed
uv run python scripts/test_tts.py "텍스트" -p chirp_v3_korean_female_confident -s 1.2

# List available voice presets
uv run python scripts/test_tts.py --list-presets

Image Generation

# Generate image with Gemini 3 Pro
uv run python scripts/test_image.py "A modern Korean skincare product on white background"

# Custom size and context
uv run python scripts/test_image.py "Product shot" -W 720 -H 1280 --context "skincare advertisement"

Video Generation (Veo 3.1)

# Generate video (takes several minutes)
uv run python scripts/test_video.py "A woman applying skincare product" -d 5

# Custom duration (max 8 seconds)
uv run python scripts/test_video.py "Motion graphic animation" -d 8

Text Overlay

# Generate text overlay video
uv run python scripts/test_text_overlay.py "강력한 효과!" -d 3

# With custom style and animation
uv run python scripts/test_text_overlay.py "텍스트" -s bold_impact_red -a bounce

# List available styles and animations
uv run python scripts/test_text_overlay.py --list-styles
uv run python scripts/test_text_overlay.py --list-animations

Sequence Generation

# Generate sequence JSON from script file
uv run python scripts/test_sequence.py examples/script.txt -v

# Generate from inline text
uv run python scripts/test_sequence.py "첫 번째 장면: 제품 클로즈업. 두 번째 장면: 사용 후기" -v

Adding New Sequencing Strategies

To add a custom strategy:

  1. Create a new file in domains/sequencing/strategies/:
# domains/sequencing/strategies/my_strategy.py
from typing import Any
from domains.sequencing.strategies.base import SequencingStrategy

class MyStrategy(SequencingStrategy):
    @property
    def name(self) -> str:
        return "my_strategy"

    @property
    def is_multi_step(self) -> bool:
        return False  # Set True for multi-step strategies

    def build_prompt(
        self,
        script: str,
        product_context: str | None = None,
        footage_catalog: str | None = None,
        lottie_presets_section: str = "",
        sfx_presets_section: str = "",
        scene_count: int = 10,
        step_context: dict[str, Any] | None = None,
    ) -> str:
        # Build your custom prompt here
        return f"Your prompt with {script}, {footage_catalog}, etc."
  2. Register in domains/sequencing/strategies/__init__.py:
from domains.sequencing.strategies.my_strategy import MyStrategy

STRATEGIES = {
    "default": DefaultStrategy,
    "footage_aware": FootageAwareStrategy,
    "appeal_first": AppealFirstStrategy,
    "my_strategy": MyStrategy,  # Add here
}
  3. Update CLI choices in main.py (optional, for tab completion):
sequence_parser.add_argument(
    "--strategy",
    choices=["default", "footage_aware", "appeal_first", "my_strategy"],
)
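Once registered, dispatching the CLI flag to a strategy is a dictionary lookup. A self-contained sketch with a toy registry (the real registry lives in strategies/__init__.py, and actual wiring in the generator may differ):

```python
# Toy stand-ins mirroring the registry pattern above; the real classes
# subclass SequencingStrategy and carry prompt-building logic.
class DefaultStrategy:
    name = "default"

class MyStrategy:
    name = "my_strategy"

STRATEGIES = {"default": DefaultStrategy, "my_strategy": MyStrategy}

def get_strategy(name: str):
    """Resolve a --strategy flag value to a strategy instance."""
    try:
        return STRATEGIES[name]()
    except KeyError:
        valid = ", ".join(sorted(STRATEGIES))
        raise ValueError(f"Unknown strategy '{name}' (valid: {valid})") from None
```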

Lottie Overlay

Lottie animations (checkmark, error, warning, etc.) are pre-rendered to ProRes 4444 MOV files with alpha channel for efficient runtime use.

# List available presets
uv run python scripts/test_lottie.py --list-presets

# Load a preset overlay
uv run python scripts/test_lottie.py success

Sound Effects (SFX)

Pre-downloaded sound effects from Freesound for common UI feedback sounds.

# List available presets
uv run python scripts/test_sfx.py --list-presets

# Test a preset
uv run python scripts/test_sfx.py success

Available presets: success, error, warning, loading, whoosh, click

Using SFX in Sequence JSON

{
  "scenes": [
    {
      "scene_id": "s01",
      "sound_effects": [
        {"preset_name": "success", "volume": 0.5, "start_time": 0.5},
        {"preset_name": "whoosh", "volume": 0.3, "start_time": 1.0, "fade_out": 0.2}
      ]
    }
  ]
}

SFX options:

  • preset_name: Sound effect preset name (required)
  • volume: Volume level 0.0-1.0 (default: 0.5)
  • start_time: Start time in seconds (default: 0.0)
  • fade_in: Fade in duration in seconds (default: 0.0)
  • fade_out: Fade out duration in seconds (default: 0.0)
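The fields and defaults above can be captured in a small model. This is an illustrative sketch, not the pipeline's actual schema, which may be stricter:

```python
from dataclasses import dataclass

@dataclass
class SoundEffect:
    """One entry of a scene's sound_effects list, with documented defaults."""
    preset_name: str
    volume: float = 0.5
    start_time: float = 0.0
    fade_in: float = 0.0
    fade_out: float = 0.0

    def __post_init__(self):
        if not 0.0 <= self.volume <= 1.0:
            raise ValueError("volume must be within 0.0-1.0")
        if self.start_time < 0:
            raise ValueError("start_time must be non-negative")

sfx = SoundEffect(preset_name="whoosh", volume=0.3, start_time=1.0, fade_out=0.2)
```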

Adding New Lottie Animations

  1. Download Lottie JSON files from LottieFiles or IconScout
  2. Place JSON files in assets/stock/lottie/
  3. Convert to MOV:
# Convert all Lottie JSON files to MOV
uv run python scripts/convert_lottie_to_mov.py -v

# Convert with custom size
uv run python scripts/convert_lottie_to_mov.py -W 1080 -H 1080 -v

# Force re-convert existing files
uv run python scripts/convert_lottie_to_mov.py --force -v

Output MOV files are saved to assets/stock/lottie_mov/ and can be used as overlays in video composition.
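For reference, compositing one of these alpha MOVs over a base video is typically a single FFmpeg overlay filter. A hedged sketch of the kind of command involved (the project's renderer builds its own filtergraph, which may differ):

```python
def overlay_cmd(base_video, lottie_mov, output, x=0, y=0, start=0.0):
    """Build an ffmpeg argv that composites an alpha MOV over a base video."""
    return [
        "ffmpeg", "-y",
        "-i", base_video,
        "-itsoffset", str(start), "-i", lottie_mov,  # delay the overlay's start
        # eof_action=pass keeps the base video playing after the overlay ends
        "-filter_complex", f"[0:v][1:v]overlay={x}:{y}:eof_action=pass",
        "-c:a", "copy",
        output,
    ]

cmd = overlay_cmd("scene.mp4", "assets/stock/lottie_mov/success.mov", "out.mp4",
                  x=100, y=200)
```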

Managing Presets

Preset definitions for Lottie overlays and sound effects are stored in assets/presets/. The sequence generator reads these files to include available presets in the generation prompt.

assets/presets/
├── lottie_presets.json   # Lottie animation presets with descriptions
└── sfx_presets.json      # Sound effect presets with descriptions

Each preset includes:

  • description: What the preset is (shown to LLM during sequence generation)
  • use_case: When to use it (helps LLM make appropriate selections)

Syncing Presets

When you add new Lottie animations or sound effects to the providers, sync the preset files:

# Check for differences without modifying files
uv run python scripts/update_presets.py --check

# Sync preset files (adds new presets, removes stale ones)
uv run python scripts/update_presets.py

The script will:

  • Add new presets with placeholder descriptions (edit manually)
  • Remove presets that no longer exist in providers
  • Preserve existing descriptions for unchanged presets

After syncing, edit the JSON files to add meaningful descriptions for new presets.
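The merge behavior described above amounts to a pure function over preset dictionaries. An illustrative sketch (update_presets.py may track more fields):

```python
PLACEHOLDER = {"description": "TODO: describe this preset", "use_case": "TODO"}

def sync_presets(existing: dict, provider_names: set) -> dict:
    """Add new presets with placeholders, drop stale ones, keep the rest."""
    return {name: existing.get(name, dict(PLACEHOLDER))
            for name in sorted(provider_names)}

synced = sync_presets(
    existing={
        "success": {"description": "Checkmark pop", "use_case": "Positive feedback"},
        "old_fx": {"description": "Stale", "use_case": "-"},
    },
    provider_names={"success", "whoosh"},
)
```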

Project Structure

domains/
├── sequencing/  # Script → Sequence JSON (Gemini 3 Flash)
│   ├── generator.py           # SequenceGenerator with strategy support
│   └── strategies/            # Pluggable sequencing strategies
│       ├── base.py            # SequencingStrategy ABC
│       ├── default.py         # Script-driven (no footage awareness)
│       ├── footage_aware.py   # Script + footage catalog integration
│       └── appeal_first.py    # 2-step: appeal analysis → sequence
├── planning/    # Sequence JSON parsing
├── library/     # Video asset indexing & search + reference ads (Gemini 3 Flash)
│   ├── analyzer.py            # VideoContentAnalyzer for raw footage
│   ├── repository.py          # AssetRepository: index storage + embedding search
│   ├── selector.py            # FootageSelector: LLM-based clip ranking
│   ├── catalog.py             # FootageCatalog for LLM-friendly summaries
│   ├── reference_store.py     # ReferenceStore: collection-based reference management
│   ├── reference_analyzer.py  # Gemini-based reference ad analyzer
│   ├── reference_repository.py# Reference embedding search
│   └── reference_models.py    # ReferenceAdAnalysis, ReferenceAdBeat models
├── studio/      # Content generation
│   ├── tts_generator.py       # Google Cloud TTS (Chirp v3)
│   ├── image_generator.py     # Gemini 3 Pro Image
│   ├── video_generator.py     # Veo 3.1 (us-central1 only)
│   ├── text_renderer.py       # Playwright + FFmpeg text overlays
│   ├── lottie_renderer.py     # Pre-rendered Lottie MOV loader
│   ├── sfx_provider.py        # Pre-downloaded sound effects
│   └── fallback_generator.py  # Black screen fallback
└── editing/     # FFmpeg composition
    ├── composer.py   # Scene orchestration
    ├── renderer.py   # FFmpeg rendering
    └── effects.py    # Camera movements, transitions

infrastructure/
├── config.py    # Configuration management
├── cache.py     # Asset caching
└── metadata.py  # Generation metadata tracking

viewer/
└── server.py    # FastAPI web viewer for sequence editing

scripts/
├── test_tts.py              # TTS testing
├── test_image.py            # Image generation testing
├── test_video.py            # Video generation testing
├── test_text_overlay.py     # Text overlay testing
├── test_lottie.py           # Lottie overlay testing
├── test_sfx.py              # Sound effects testing
├── test_sequence.py         # Sequence generation testing
├── convert_lottie_to_mov.py # Lottie JSON → MOV conversion
├── convert_mov_to_mp4.py    # Format conversion
├── update_presets.py        # Sync preset JSON files from providers
├── download_fb_ads.py       # Download Facebook Ads Library videos
└── analyze_references.py    # Analyze reference ads with Gemini

assets/
├── presets/          # Preset definitions for sequence generation
│   ├── lottie_presets.json
│   └── sfx_presets.json
├── references/       # Reference ads (collection-based management)
│   ├── index.json    # Collection summaries
│   └── collections/  # Per-collection folders
│       └── {slug}/   # collection.json, items/{id}/video.mp4, analysis.json, etc.
└── stock/
    ├── lottie/       # Source Lottie JSON files
    ├── lottie_mov/   # Pre-rendered MOV files (ProRes 4444 with alpha)
    └── sfx/          # Pre-downloaded sound effect MP3 files

Model Configuration

| Purpose | Model | Location |
|---|---|---|
| Sequence Generation | gemini-3-flash-preview | global |
| Video Analysis | gemini-3-flash-preview | global |
| Reference Ad Analysis | gemini-3-flash-preview | global |
| Footage Selection | gemini-2.5-flash-lite | global |
| Embedding | text-embedding-005 | global |
| Image Generation | gemini-3-pro-image-preview | global |
| Video Generation | veo-3.1-generate-preview | us-central1 (required) |
| TTS | Chirp3-HD (ko-KR) | - |

Footage Selection

When rendering with existing footage (video_type: ugc_centered or mixed), the system uses a two-stage selection process:

Stage 1: Embedding-Based Search

Clips are pre-filtered using semantic similarity:

  • Each clip has an embedding generated from its description, appeal points, and tags
  • Scene requirements (narration + visual prompt + query tags) are embedded as a query
  • Top candidates are retrieved using cosine similarity
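A pure-Python sketch of this pre-filter, assuming embeddings are plain float vectors (the repository stores them as .npy files; exact shapes and storage may differ):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k_clips(query_emb, clip_embs, k=5):
    """clip_embs: mapping of clip_id -> embedding; returns best-matching IDs."""
    ranked = sorted(clip_embs, key=lambda cid: cosine(query_emb, clip_embs[cid]),
                    reverse=True)
    return ranked[:k]
```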

Stage 2: LLM-Based Selection

The LLM evaluates the top candidates and selects the best match:

  1. Indexing: Footage is analyzed and tagged with descriptions, appeal points, content types
  2. Selection: For each scene, LLM evaluates candidate clips against scene requirements (narration, visual prompt, tags)
  3. Fallback: If no suitable clip is found, the scene falls back to AI image generation

This two-stage approach balances speed (embedding search) with accuracy (LLM reasoning).

Caching

Generated assets are cached by default in assets/generated/.cache/.

# Disable cache for render
uv run python main.py render input.json -o output.mp4 --no-cache

Metadata

Generation metadata (prompts, parameters) is saved in assets/generated/.metadata/ for each generated asset.

Composition metadata is saved alongside output videos as {output}.meta.json, containing:

  • Project settings (resolution, fps, total duration)
  • Per-scene details (sources, effects, rendered file paths)
  • Creation timestamp
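For example, the composition metadata can be inspected with a few lines of Python. The field names here follow the list above but are assumptions; check a generated .meta.json for the exact schema:

```python
import json
from pathlib import Path

def summarize_meta(output_path: str) -> str:
    """Read {output}.meta.json and summarize project-level settings."""
    meta = json.loads(Path(f"{output_path}.meta.json").read_text())
    project = meta.get("project", {})  # assumed key; verify against real output
    scenes = meta.get("scenes", [])
    return f"{project.get('resolution')} @ {project.get('fps')} fps, {len(scenes)} scenes"
```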
