| title | BilingualSub |
|---|---|
| emoji | 🎥 |
| colorFrom | blue |
| colorTo | indigo |
| sdk | docker |
| app_port | 7860 |
| pinned | false |
Automated bilingual subtitle generator for YouTube videos with high-quality LLM translation.
- Download YouTube videos with yt-dlp
- Automatic speech recognition using Groq Whisper (whisper-large-v3-turbo)
- High-quality LLM translation via Agno framework (default: groq:openai/gpt-oss-120b)
- Bilingual subtitle output in SRT and ASS formats
- Optional subtitle burn-in with hardware acceleration (VideoToolbox on macOS)
- Real-time progress tracking via SSE
- Web-based UI with i18n support (English/繁體中文)
- Job-based async architecture with in-memory storage
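The README does not include a sample output file. As an assumption about the layout, a bilingual SRT cue can stack the original line above its translation. This minimal Python sketch renders cues that way. `Cue`, `srt_timestamp`, and `to_bilingual_srt` are hypothetical names, not BilingualSub's API:

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """Hypothetical subtitle cue: timing in seconds plus both language lines."""
    start: float
    end: float
    source: str
    translation: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_bilingual_srt(cues: list[Cue]) -> str:
    """Render cues as SRT with the original line first and the translation below it."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}\n"
            f"{cue.source}\n"
            f"{cue.translation}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

For example, `to_bilingual_srt([Cue(0.0, 2.5, "Hello, world.", "哈囉，世界。")])` yields one cue numbered 1 with the timing line `00:00:00,000 --> 00:00:02,500`. ASS output would carry the same pairs as styled `Dialogue` events, which this sketch does not cover.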
```bash
docker build -t bilingualsub . && docker run -p 7860:7860 -e GROQ_API_KEY=your_key_here bilingualsub
```

Then open http://localhost:7860 in your browser.
Running translations through CLIProxyAPI is optional. Use it only when you want translations to go through a local CLIProxyAPI container backed by your own Antigravity/Codex/Claude OAuth login. The regular Docker flow above does not require CLIProxyAPI.
For Antigravity/agy, CLIProxyAPI can route requests out of the box once the
host has OAuth credentials. Install CLIProxyAPI on the host and log in. This
creates OAuth token files under ~/.cli-proxy-api, which are mounted read/write
into the proxy container:
```bash
cliproxyapi -antigravity-login
```

Create a local .env from the example and set at least GROQ_API_KEY:
```bash
cp .env.example .env
```

For the compose setup, use an OpenAI-compatible proxy model:
```bash
TRANSLATOR_MODEL=openai:bilingualsub-gemini-flash
# Optional: set only when your auth directory is not ~/.cli-proxy-api
# CLIPROXY_AUTH_DIR=/absolute/path/to/.cli-proxy-api
```

Then start both services:
```bash
docker compose up --build
```

BilingualSub runs at http://localhost:7860. It talks to CLIProxyAPI through the
compose network at http://cli-proxy:8317/v1, so OAuth tokens are never baked
into the image or committed to the repository.
The proxy port is bound to 127.0.0.1 only; the compose stack uses the fixed
local bearer key bilingualsub-local internally.
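Other clients on the host can reach the proxy the same way, by sending that bearer key in an `Authorization` header. This minimal standard-library sketch assumes the proxy returns the OpenAI-style model list shape (`{"data": [{"id": ...}]}`). `build_models_request`, `model_ids`, and `fetch_model_ids` are hypothetical helpers, not part of BilingualSub:

```python
import json
import urllib.request

# Values from this README: the proxy port published on 127.0.0.1 and the
# fixed local bearer key the compose stack uses.
PROXY_BASE_URL = "http://localhost:8317/v1"
PROXY_API_KEY = "bilingualsub-local"


def build_models_request(
    base_url: str = PROXY_BASE_URL, api_key: str = PROXY_API_KEY
) -> urllib.request.Request:
    """Build a GET request for the OpenAI-compatible /models endpoint."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def model_ids(payload: str) -> list[str]:
    """Extract IDs from an OpenAI-style {"data": [{"id": ...}, ...]} body."""
    return [entry["id"] for entry in json.loads(payload).get("data", [])]


def fetch_model_ids(timeout: float = 10.0) -> list[str]:
    """Query the running proxy (requires the compose stack to be up)."""
    with urllib.request.urlopen(build_models_request(), timeout=timeout) as response:
        return model_ids(response.read().decode("utf-8"))
```

Prefix any returned ID with `openai:` to use it as a TRANSLATOR_MODEL value.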
The default alias maps to Antigravity's gemini-3.5-flash-low, which is the
most consistently discoverable Flash variant in current CLIProxyAPI releases.
If the alias does not exist in your version, list the available proxy models
and set TRANSLATOR_MODEL=openai:<model-id> in .env:
```bash
curl -H "Authorization: Bearer bilingualsub-local" http://localhost:8317/v1/models
```

Prerequisites: Python 3.11+, FFmpeg, Node.js 18+, pnpm
```bash
# 1. Install backend dependencies
uv sync --dev --extra e2e

# 2. Install frontend dependencies (the subshell keeps you in the repository root)
(cd frontend && pnpm install)

# 3. Start backend server (in one terminal, from the repository root)
uv run uvicorn bilingualsub.api.app:create_app --factory --reload

# 4. Start frontend dev server (in another terminal)
cd frontend && pnpm dev
```

Backend runs at http://localhost:8000, frontend at http://localhost:5173.
| Environment Variable | Description | Default | Required |
|---|---|---|---|
| GROQ_API_KEY | Groq API key for Whisper transcription | - | Yes |
| OPENAI_API_KEY | OpenAI API key (only if using OpenAI provider) | - | No |
| TRANSCRIBER_PROVIDER | Transcription provider | groq | No |
| TRANSCRIBER_MODEL | Whisper model to use | whisper-large-v3-turbo | No |
| TRANSLATOR_MODEL | LLM model for translation | groq:openai/gpt-oss-120b | No |
| OPENAI_BASE_URL | OpenAI-compatible proxy URL | - | No |
```text
YouTube URL → Download (yt-dlp) → Extract Audio (FFmpeg) → Transcribe (Groq Whisper) →
Translate (LLM via Agno) → Bilingual Subtitles (SRT/ASS) → Optional Burn-in (FFmpeg)
```
Backend: FastAPI with job-based async architecture. Jobs are created via POST /api/jobs, processed in the background, and stream progress updates through GET /api/jobs/{id}/events using Server-Sent Events (SSE). Job data is stored in memory with a 30-minute TTL.
Frontend: React SPA built with Vite 7. State management via useJob hook (idle → submitting → processing → completed/failed). API communication handled by ApiClient singleton with REST and SSE support. Internationalization via i18next.
| Backend | Frontend |
|---|---|
| FastAPI | Vite 7 |
| Python 3.11+ | React 19 |
| yt-dlp | TypeScript 5.9 |
| FFmpeg | Tailwind CSS 4 |
| Groq Whisper | i18next |
| Agno (LLM framework) | pnpm |