Add words to transcript via ElevenLabs TTS + freeze frame

## Idea

Use ElevenLabs Text-to-Speech to **add new words** to the transcript, not just remove them. When the user types new words into the transcript, generate audio for those words during export.

## Video sync challenge

Added words have no matching video footage. Two approaches:

1. **Freeze frame**: hold the last frame of the preceding clip while the added words play. Doable with FFmpeg's `tpad` filter (`stop_mode=clone`) or the `loop` filter. Looks like a brief pause in the video.
2. **Audio-only insert**: insert the generated audio with no video handling at all. Only works for podcast/audio-first content.

Freeze frame is the more practical approach.
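
A minimal sketch of the freeze-frame step, assuming each preceding clip is already cut to its own file and the hold length equals the duration of the generated audio. The function name and file paths are hypothetical; `tpad` with `stop_mode=clone` and `stop_duration` is the FFmpeg filter that repeats the final frame.

```python
from typing import List


def freeze_frame_command(clip_path: str, out_path: str, hold_seconds: float) -> List[str]:
    """Build an ffmpeg command that extends a clip by holding its last frame.

    Hypothetical helper: it only builds the argument list, it does not run ffmpeg.
    """
    if hold_seconds <= 0:
        raise ValueError("hold_seconds must be positive")
    # stop_mode=clone repeats the last frame; stop_duration is in seconds.
    video_filter = f"tpad=stop_mode=clone:stop_duration={hold_seconds:.3f}"
    return [
        "ffmpeg", "-y",
        "-i", clip_path,
        "-vf", video_filter,
        "-an",  # drop audio here; the audio track is assumed to be assembled separately
        out_path,
    ]


cmd = freeze_frame_command("clip_003.mp4", "clip_003_held.mp4", 1.25)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Dropping audio with `-an` is a design assumption: it keeps the padded video and the TTS audio independent until the final stitch.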

## Flow

1. User adds words in the transcript editor (new editing capability needed)
2. During export, detect which words are "added" (no original audio)
3. Generate audio for added words via ElevenLabs TTS API
4. At each insertion point, freeze the last frame of the preceding clip
5. Stitch everything together with FFmpeg
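
Steps 2 and 4 could be sketched as below. The `Word` and `Insertion` types are assumptions, not the app's real data model: an added word is modelled here as one with no source timestamps, and consecutive added words are grouped into one insertion so each group needs only one TTS call and one freeze frame.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    start: Optional[float] = None  # seconds in the source; None for added words
    end: Optional[float] = None

    @property
    def is_added(self) -> bool:
        return self.start is None


@dataclass
class Insertion:
    text: str   # text to send to TTS
    at: float   # source timestamp where the freeze frame begins


def find_insertions(words: List[Word]) -> List[Insertion]:
    """Group runs of added words and anchor each run at the end of the previous original word."""
    insertions: List[Insertion] = []
    pending: List[str] = []
    last_end = 0.0  # added words before any original word anchor at 0.0
    for word in words:
        if word.is_added:
            pending.append(word.text)
            continue
        if pending:
            insertions.append(Insertion(" ".join(pending), last_end))
            pending = []
        last_end = word.end
    if pending:  # added words at the very end of the transcript
        insertions.append(Insertion(" ".join(pending), last_end))
    return insertions


transcript = [
    Word("Hello", 0.0, 0.4),
    Word("brave"),
    Word("new"),
    Word("world", 0.5, 0.9),
    Word("again"),
]
result = find_insertions(transcript)
```

One edge case this sketch leaves open: words added before the first original word have no preceding clip, so the freeze would have to use the first frame of the video instead.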

## Complexity

This is a significant expansion. It requires:
- Transcript editing (currently only deletion is supported)
- Per-segment TTS generation with the selected voice
- Timeline manipulation to insert freeze frames at precise timestamps
- Matching the voice/tone of the surrounding audio
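
The timeline manipulation mostly comes down to shifting every later timestamp by the total length of the freeze frames inserted before it. A rough sketch, assuming each hold is a hypothetical `(source_time, duration)` pair where the duration is the length of the generated audio:

```python
from typing import List, Tuple

Hold = Tuple[float, float]  # (source timestamp of the insertion, hold duration in seconds)


def to_output_time(source_time: float, holds: List[Hold]) -> float:
    """Map the start time of original content to its position in the exported timeline.

    Content that starts at or after an insertion point is pushed back by that hold.
    """
    shift = sum(duration for at, duration in holds if at <= source_time)
    return source_time + shift


holds = [(0.4, 1.0), (0.9, 0.5)]
before_first = to_output_time(0.2, holds)    # not shifted
between = to_output_time(0.5, holds)         # shifted by the first hold only
after_both = to_output_time(1.0, holds)      # shifted by both holds
```

The same mapping would apply to anything else keyed to source time, such as captions or cut points.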

## Notes

- Came up during ElevenLabs voice recreation implementation
- The current STS (speech-to-speech) flow replaces the entire audio track; this would need per-word generation
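
For per-insertion generation, a request could look roughly like the sketch below. It only builds the HTTP request and does not send it. The endpoint path, the `xi-api-key` header, and the `text` / `model_id` body fields follow the ElevenLabs text-to-speech API as understood at the time of writing and should be checked against the current API reference; the function name, the default model, and the placeholder voice ID and key are assumptions.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(
    text: str,
    voice_id: str,
    api_key: str,
    model_id: str = "eleven_multilingual_v2",  # assumed default; pick per project
) -> urllib.request.Request:
    """Build (but do not send) a TTS request for one group of added words."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=body,
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_tts_request("brave new", voice_id="VOICE_ID", api_key="API_KEY")
```

Sending it with `urllib.request.urlopen(req)` would return the audio bytes for that insertion, whose duration then determines the freeze-frame length.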
