feat(transcribe): add local Whisper transcription helper #60

Open
abman4444 wants to merge 1 commit into browser-use:main from abman4444:add-local-whisper-transcriber

@abman4444 abman4444 commented Jun 8, 2026

What

Adds helpers/transcribe_whisper.py — a drop-in alternative to the ElevenLabs Scribe transcribe.py helper for users who don't have an API key or prefer a free, fully offline option.

Why

The existing pipeline requires an ElevenLabs API key for transcription. Users on free-tier accounts or those who want a zero-cost setup have no fallback. Local Whisper fills that gap.

How it works

  • Uses openai-whisper with word_timestamps=True to produce word-level timestamps
  • Output JSON matches the Scribe envelope shape (words, text, language_code, alignment) so pack_transcripts.py, render.py, and the rest of the pipeline work unchanged
  • Caches transcripts per source file — skips re-transcription on repeat runs (same behaviour as Scribe helper)
  • Extracts mono 16 kHz WAV via ffmpeg before passing to Whisper (same as the Scribe path)
  • Supports all Whisper model sizes (tiny → large); defaults to medium (good balance of speed and accuracy for English)
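The conversion from Whisper's output to a Scribe-style envelope might look roughly like the sketch below. It is not taken from helpers/transcribe_whisper.py. The input shape (segments containing words with word, start, and end keys) is what openai-whisper returns when word_timestamps=True. The function name whisper_to_envelope and the per-word "type" field are assumptions made for illustration, and the alignment key is left out because its contents are not described in this PR.

```python
from typing import Any


def whisper_to_envelope(result: dict[str, Any]) -> dict[str, Any]:
    """Flatten a Whisper result dict into a Scribe-like envelope.

    Hypothetical helper: the exact fields the pipeline expects are an
    assumption here, not something confirmed by the PR text.
    """
    words = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            words.append(
                {
                    # Whisper prefixes most words with a leading space.
                    "text": w["word"].strip(),
                    "start": round(float(w["start"]), 3),
                    "end": round(float(w["end"]), 3),
                    "type": "word",  # assumed per-word field
                }
            )
    return {
        "language_code": result.get("language", ""),
        "text": result.get("text", "").strip(),
        "words": words,
    }
```

With the real library, the input would come from something like whisper.load_model("medium").transcribe(wav_path, word_timestamps=True).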

Usage

# Basic
python helpers/transcribe_whisper.py my_video.mp4

# Custom model and language
python helpers/transcribe_whisper.py my_video.mp4 --model large --language en

# Custom output directory
python helpers/transcribe_whisper.py my_video.mp4 --edit-dir /path/to/edit
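For reference, a command-line interface matching the three invocations above could be declared as in the following sketch. The flag names --model, --language, and --edit-dir and the medium default come from this description. The function name build_parser, the positional argument name video, and the list of accepted model names are assumptions, so the actual helper may differ.

```python
import argparse
from pathlib import Path

# The five core openai-whisper sizes; the real helper may accept more variants.
WHISPER_MODELS = ["tiny", "base", "small", "medium", "large"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the usage examples (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Transcribe a video locally with Whisper."
    )
    parser.add_argument("video", type=Path, help="source video or audio file")
    parser.add_argument(
        "--model",
        choices=WHISPER_MODELS,
        default="medium",
        help="Whisper model size (default: medium)",
    )
    parser.add_argument(
        "--language",
        default=None,
        help="language code such as 'en'; omit to let Whisper detect it",
    )
    parser.add_argument(
        "--edit-dir",
        type=Path,
        default=None,
        help="output directory for the transcript",
    )
    return parser
```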

Reviewer notes

  • No changes to existing files — purely additive
  • Requires openai-whisper (pip install openai-whisper) and ffmpeg on PATH, both already listed as setup dependencies
  • Word-level timestamp accuracy is slightly lower than Scribe (Whisper drifts ~50–100ms) but within the cut-padding working window defined in SKILL.md
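To make the ffmpeg dependency concrete, the mono 16 kHz WAV extraction step could be sketched as follows. The function names ffmpeg_wav_command and extract_wav are hypothetical, and the exact flags used by the helper are not shown in this PR, so this is one plausible invocation rather than the actual one.

```python
import shutil
import subprocess
from pathlib import Path


def ffmpeg_wav_command(source: Path, wav_out: Path) -> list[str]:
    """Return an ffmpeg argv that writes mono 16 kHz 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it exists
        "-i", str(source),
        "-vn",             # drop any video stream
        "-ac", "1",        # one audio channel (mono)
        "-ar", "16000",    # 16 kHz sample rate
        "-c:a", "pcm_s16le",
        str(wav_out),
    ]


def extract_wav(source: Path, wav_out: Path) -> None:
    """Run the extraction, failing early if ffmpeg is not on PATH."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(
        ffmpeg_wav_command(source, wav_out), check=True, capture_output=True
    )
```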

Summary by cubic

Add a local Whisper transcription helper that outputs Scribe-compatible JSON, enabling offline, zero-API-key transcription without changing the rest of the pipeline. Adds caching and model selection for better control and speed.

  • New Features

    • Added helpers/transcribe_whisper.py using openai-whisper with word_timestamps=True.
    • Emits Scribe-compatible JSON (words, text, language_code, alignment) so pack_transcripts.py and render.py work unchanged.
    • Extracts mono 16 kHz WAV via ffmpeg before transcription.
    • Caches transcripts per source file to skip repeat runs.
    • Supports all Whisper models (tiny → large); defaults to medium. Optional --language.
  • Dependencies

    • Requires openai-whisper and ffmpeg available on PATH.
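The per-source caching listed above could be implemented along these lines. This is a minimal sketch under stated assumptions: the function name load_or_transcribe, the cache file naming (source stem plus .json), and the injected transcribe callable are all hypothetical and are not taken from the PR's code.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_transcribe(
    source: Path,
    cache_dir: Path,
    transcribe: Callable[[Path], dict],
) -> dict:
    """Return the cached transcript for `source`, transcribing only on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: one JSON file per source, keyed by file stem.
    cache_file = cache_dir / f"{source.stem}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    envelope = transcribe(source)
    cache_file.write_text(
        json.dumps(envelope, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return envelope
```

Keying on the stem alone would collide for two sources that share a name in different directories, so a real implementation might also fold in the path or a content hash.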

Written for commit ff6884c. Summary will update on new commits.

Adds helpers/transcribe_whisper.py — a drop-in alternative to the
ElevenLabs Scribe transcribe.py helper for users who don't have an
API key or prefer a free, offline option.

- Uses openai-whisper with word_timestamps=True to produce word-level
  timestamps matching the Scribe JSON envelope shape, so pack_transcripts.py,
  render.py, and the rest of the pipeline work unchanged
- Caches transcripts per source (skips re-transcription on repeat runs)
- Supports all Whisper model sizes (tiny → large); defaults to medium
- Extracts mono 16kHz WAV via ffmpeg before transcription (same as Scribe path)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment

No issues found across 1 file
