Translate movie subtitles into Albanian using any LLM. Built because we couldn't find English subs for a 2004 Italian comedy at 3 AM in January.
I wanted to watch Christmas in Love (2004), a classic Italian cinepanettone with Boldi & De Sica. The movie is in Italian. English subtitles? Don't exist. Albanian subtitles? Forget about it.
This isn't a one-off problem. Thousands of movies (Italian, Turkish, Greek, Indian) are loved by Albanian audiences but have zero Albanian subtitle coverage. The existing subtitle databases (OpenSubtitles, Subscene, Podnapisi) have virtually nothing in Albanian. What does exist is often machine-translated garbage that misses cultural context, humor, and natural speech.
Albanian is one of the most underserved languages in the subtitle ecosystem.
The implications go beyond just watching movies:
- Albanian diaspora (estimated 10M+ worldwide) consumes foreign media daily with no subtitle support
- Albanian film education suffers: students can't study foreign cinema in their language
- Cultural accessibility: older generations who don't speak English are locked out of global entertainment
- The Albanian art scene: directors, screenwriters, and filmmakers lose exposure to international storytelling techniques when they can't access foreign films with quality translations
AlbSub is a CLI pipeline that takes subtitle files (.srt) in any source language and produces high-quality Albanian translations using LLMs. Not Google Translate. Not a lookup table. Actual contextual, natural, colloquial Albanian: the kind that sounds like a human translator wrote it.
- Multi-language input: Italian, English, Turkish, Greek, French, German, Spanish, and more → Albanian
- Any LLM backend: OpenAI, Anthropic, local Ollama models, or any OpenAI-compatible API
- Live progress tracking: a real-time progress bar showing ETA, blocks translated, and speed
- Line validation: automatically checks that every block has the correct number of lines (no dropped second lines, no truncated dialogue)
- Batch processing: translates in configurable batches for speed and reliability
- Auto-retry: failed blocks are automatically retried with exponential backoff
- SRT-aware: preserves timestamps, HTML tags (`<i>`, `<b>`), speaker labels (`[Name]`), and subtitle formatting
- Context-aware: sends surrounding blocks as context so the LLM understands the scene, not just isolated lines
- Validation report: a post-translation report showing block count match, line count match, and empty block detection
- Parallel workers: configurable concurrency for faster translation
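To illustrate what the validation report summarizes (block count match, line count match, empty block detection), here is a minimal Python sketch. It is not AlbSub's code. The function name `validation_report` and the input shape (each block as a plain list of text lines) are assumptions made for the example.

```python
def validation_report(original, translated):
    """Summarize a translation against its source (hypothetical helper,
    not AlbSub's actual report code).

    Both arguments are lists of blocks; each block is a list of text lines.
    """
    total = len(original)
    line_mismatches, empty_blocks = [], []
    for i, (src, out) in enumerate(zip(original, translated), start=1):
        if len(src) != len(out):
            line_mismatches.append(i)  # dropped or extra lines
        if any(s.strip() for s in src) and not any(o.strip() for o in out):
            empty_blocks.append(i)  # source had text, translation is blank
    # Source blocks with no translated counterpart also count as failures.
    missing = max(0, total - len(translated))
    failed = len(set(line_mismatches) | set(empty_blocks)) + missing
    return {
        "block_count_match": total == len(translated),
        "line_mismatches": line_mismatches,
        "empty_blocks": empty_blocks,
        "pass_rate": (total - failed) / total if total else 1.0,
    }
```

A block fails if it has the wrong line count or comes back blank where the source had text. The pass rate is the share of source blocks that pass both checks.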
```
┌─────────────┐     ┌──────────────┐     ┌─────────────────┐     ┌──────────────┐
│ Input .srt  │────▶│  SRT Parser  │────▶│  Batch Chunker  │────▶│ LLM Workers  │
│ (any lang)  │     │  (validate)  │     │ (configurable)  │     │  (parallel)  │
└─────────────┘     └──────────────┘     └─────────────────┘     └──────┬───────┘
                                                                        │
                    ┌──────────────┐     ┌─────────────────┐            │
                    │ Output .srt  │◀────│    Validator    │◀───────────┘
                    │  (Albanian)  │     │ (line matching) │
                    └──────────────┘     └─────────────────┘
```
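The "LLM Workers (parallel)" stage can be sketched in Python with the standard library's `concurrent.futures`. This is an illustration only. `translate_batches` and the `translate_batch` callable are hypothetical names, not part of AlbSub's documented interface. The property it demonstrates is that translated batches must come back in input order even when workers finish out of order, so the blocks stay aligned with their timestamps.

```python
from concurrent.futures import ThreadPoolExecutor


def translate_batches(batches, translate_batch, workers=2):
    """Translate batches concurrently and return results in input order.

    translate_batch is any callable that takes one batch and returns its
    translation (for example, a wrapper around an HTTP call to an LLM API).
    Threads fit here because the work is I/O-bound, waiting on the network.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order of the inputs, not the
        # order in which workers finish, so output batch i matches input i.
        return list(pool.map(translate_batch, batches))
```

For example, `translate_batches([["ciao"], ["grazie", "prego"]], lambda b: [s.upper() for s in b], workers=4)` returns `[["CIAO"], ["GRAZIE", "PREGO"]]`.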
- Parse: read the .srt and extract blocks (number, timestamp, text lines)
- Detect language: auto-detect the source language or accept a user override
- Chunk: group blocks into batches (default: 50 blocks per batch)
- Translate: send each batch to the configured LLM with:
  - a system prompt enforcing Albanian translation rules
  - a context window (the previous 3 blocks, for continuity)
  - a strict instruction to preserve the line count of each block
- Validate: check each translated block for:
  - line count matching the original
  - no empty lines where the original had text
  - HTML tags preserved
  - speaker labels preserved
  - timestamps unchanged
- Retry: any block that fails validation is re-translated with explicit error feedback
- Assemble: write the validated blocks to the output .srt
- Report: print a summary (total blocks, pass rate, any remaining issues)
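The parse, chunk, and assemble steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not AlbSub's implementation. It assumes well-formed .srt input (a number line, a timestamp line, text lines, blank-line separators). The names `Block`, `parse_srt`, `write_srt`, and `batches_with_context` are hypothetical. The defaults mirror the documented `batch_size` (50) and `context_window` (3).

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:01:02,345 --> 00:01:04,567".
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


@dataclass
class Block:
    index: int        # subtitle number
    timestamp: str    # timing line; copied through, never translated
    lines: list[str]  # text lines; this count must survive translation


def parse_srt(text: str) -> list[Block]:
    """Split an .srt file into blocks separated by blank lines."""
    # Drop a UTF-8 byte-order mark and normalize Windows line endings.
    text = text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    blocks = []
    for chunk in re.split(r"\n\s*\n", text):
        rows = chunk.split("\n")
        if len(rows) < 2 or not TIMESTAMP.match(rows[1]):
            raise ValueError(f"malformed block: {chunk!r}")
        blocks.append(Block(int(rows[0]), rows[1], rows[2:]))
    return blocks


def write_srt(blocks: list[Block]) -> str:
    """Reassemble blocks into .srt text with timestamps unchanged."""
    return "\n\n".join(
        "\n".join([str(b.index), b.timestamp, *b.lines]) for b in blocks
    ) + "\n"


def batches_with_context(blocks, batch_size=50, context_window=3):
    """Yield (context, batch) pairs; context is the blocks just before the batch."""
    for start in range(0, len(blocks), batch_size):
        context = blocks[max(0, start - context_window):start]
        yield context, blocks[start:start + batch_size]
```

One plausible design is to send the `context` blocks for reference only and ask the model to translate just the `batch`. The model then sees how the previous batch ended without re-translating it. How AlbSub frames context in its prompt is not specified here.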
```bash
# Basic usage: Italian to Albanian using Claude
albsub translate movie.ita.srt -o movie.alb.srt --language it --provider anthropic

# Using OpenAI
albsub translate movie.srt -o movie.alb.srt --language en --provider openai --model gpt-4o

# Using local Ollama model
albsub translate movie.srt -o movie.alb.srt --language tr --provider ollama --model llama3

# With config file
albsub translate movie.srt -o movie.alb.srt --language el --config albsub.config.yml

# Parallel workers for speed
albsub translate movie.srt -o movie.alb.srt --language it --workers 4

# Validate an existing translation
albsub validate original.srt translated.srt

# Dry run: show what would be translated without calling the API
albsub translate movie.srt -o movie.alb.srt --language it --dry-run
```

```yaml
# albsub.config.yml
provider: anthropic              # anthropic | openai | ollama | custom
model: claude-sonnet-4-20250514  # any model the provider supports
api_key: ${ANTHROPIC_API_KEY}    # env var reference
base_url: null                   # custom endpoint (for ollama, vllm, etc.)

translation:
  target: sq                     # Albanian (ISO 639-1)
  batch_size: 50                 # blocks per API call
  context_window: 3              # surrounding blocks for context
  workers: 2                     # parallel translation workers
  max_retries: 3                 # retry failed blocks

validation:
  strict_line_count: true        # enforce matching line counts
  check_empty: true              # flag empty translations
  check_tags: true               # verify HTML tag preservation
  check_labels: true             # verify speaker label preservation

style:
  formality: colloquial          # colloquial | neutral | formal
  dialect: standard              # standard | gheg | tosk
  preserve_slang: true           # attempt to find Albanian equivalents for slang
```

| Language | Code | Quality |
|---|---|---|
| Italian | it | ⭐⭐⭐⭐⭐ (tested extensively) |
| English | en | ⭐⭐⭐⭐⭐ |
| Turkish | tr | ⭐⭐⭐⭐ |
| Greek | el | ⭐⭐⭐⭐ |
| French | fr | ⭐⭐⭐⭐ |
| German | de | ⭐⭐⭐⭐ |
| Spanish | es | ⭐⭐⭐⭐ |
| Serbian | sr | ⭐⭐⭐⭐ |
| Arabic | ar | ⭐⭐⭐ |
| Hindi | hi | ⭐⭐⭐ |
Quality depends on the LLM's training data for that language pair. Italian/English → Albanian works best, since most LLMs have strong coverage of all three.
The #1 problem with LLM subtitle translation is dropped lines. A 2-line subtitle block comes back as 1 line, losing half the dialogue. AlbSub solves this:
```
Original (Italian):              Bad translation:             AlbSub output:
───────────────────              ────────────────             ──────────────
[Guido] <i>Questo sono io,</i>   [Guido] <i>This is me,</i>   [Guido] <i>Ky jam unë,</i>
<i>Guido Baldi. Ho 54 anni.</i>  (LINE MISSING!)              <i>Guido Baldi. Jam 54 vjeç.</i>
```
Every block is validated after translation. If line counts don't match, the block is automatically re-sent to the LLM with an explicit correction prompt. This is retried up to 3 times (`max_retries`) before the block is flagged for manual review.
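The validate-and-retry loop might look roughly like the Python sketch below. The checks follow the pipeline steps (line count, empty lines, HTML tags, speaker labels). Timestamps are not checked because in this sketch they never reach the model. `validate_block`, `translate_with_retry`, and the `translate(lines, feedback)` callable are hypothetical names. The regexes cover only simple tags such as `<i>` and a leading `[Name]` label. Here `max_retries=3` means one initial attempt plus three retries. Whether AlbSub counts the first attempt the same way is an assumption.

```python
import re
import time

TAG = re.compile(r"</?[a-zA-Z]+>")   # simple tags: <i>, </i>, <b>, ...
LABEL = re.compile(r"^\[[^\]]+\]")   # leading speaker label such as [Guido]


def validate_block(original: list[str], translated: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the block passes."""
    problems = []
    if len(translated) != len(original):
        problems.append(f"expected {len(original)} lines, got {len(translated)}")
    for i, (src, out) in enumerate(zip(original, translated), start=1):
        if src.strip() and not out.strip():
            problems.append(f"line {i} is empty")
        if TAG.findall(src) != TAG.findall(out):
            problems.append(f"line {i}: HTML tags changed")
        label = LABEL.match(src)
        if label and not out.startswith(label.group()):
            problems.append(f"line {i}: speaker label {label.group()} lost")
    return problems


def translate_with_retry(translate, original, max_retries=3, base_delay=1.0):
    """Call translate(lines, feedback) until the output validates.

    feedback is None on the first attempt and the previous attempt's
    problems on each retry, so the model is told exactly what to fix.
    Returns (translation, remaining_problems).
    """
    feedback = None
    for attempt in range(max_retries + 1):
        translated = translate(original, feedback)
        problems = validate_block(original, translated)
        if not problems:
            return translated, []
        feedback = problems
        if attempt < max_retries:
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    return translated, problems  # still failing: flag for manual review
```

Feeding the validator's messages back as `feedback` is what turns a blind retry into a correction prompt ("expected 2 lines, got 1").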
Google Translate for subtitles gives you:
- Literal word-for-word translation
- No understanding of humor, sarcasm, or cultural context
- Formal register when the character is being casual
- No awareness that this is dialogue, not a document
LLMs give you:
- Natural, conversational Albanian
- Humor and cultural references adapted (not just translated)
- Correct register: casual when characters are casual, formal when formal
- Context from surrounding dialogue
- Understanding of speaker labels and scene context
January, 3 AM. I wanted to watch Christmas in Love (2004), a Boldi & De Sica Italian Christmas comedy. The movie exists in Italian. English subtitles? I scraped the entire internet (OpenSubtitles, Subscene, Podnapisi, SubDL, obscure forums) and found nothing. I did find Italian .srt files, ran them through a translation pipeline I built on the spot, and had English subs in 15 minutes.
Then I thought: if English subs don't exist for a popular Italian comedy, what about Albanian? Albanian subtitles are virtually nonexistent for foreign films. Millions of Albanian speakers worldwide watch Turkish dramas, Italian comedies, and Greek films, all without subtitle support.
That's how AlbSub was born. A tool that can take any .srt file in any language and produce quality Albanian subtitles using the LLM of your choice.
PRs welcome. Especially:
- New language pair testing and quality reports
- Albanian dialect support (Gheg/Tosk)
- Performance optimizations
- Additional LLM provider integrations
All translations below were generated by AlbSub using GPT-4o with batch size 25, context window 3, and temperature 0.3. Every run had a 100% validation pass rate.
Christmas in Love (2004): classic cinepanettone with Boldi & De Sica, and the film that started this whole project.
| # | 🇮🇹 Italian (Original) | 🇦🇱 Albanian (AlbSub) |
|---|---|---|
| 4 | Questo sono io, Guido Baldi. Ho 54 anni. | Ky jam unë, Guido Baldi. Kam 54 vjeç. |
| 11 | e una moglie splendida, mai tradita. Finché non è arrivata lei. | dhe një grua e mrekullueshme, kurrë e tradhtuar. Derisa erdhi ajo. |
| 12 | Sofia, russa di Siberia, 25 anni, bella da far paura! | Sofia, ruse nga Siberia, 25 vjeç, e bukur sa të tremb! |
| 16 | Mi sono innamorato di lei come un bimbo. | U dashurova me të si një fëmijë. |
| 18 | - Tieni, amore. - Cos'è? | - Ja, dashuri. - Çfarë është? |
| 19 | - Buon compleanno! L'ho ricordato. | - Gëzuar ditëlindjen! E mbajta mend. |
| # | 🇬🇧 English (Original) | 🇦🇱 Albanian (AlbSub) |
|---|---|---|
| 1 | Good morning everyone! | Mirëmëngjes të gjithëve! |
| 3 | I can't believe this happened. | Nuk mund ta besoj që ndodhi kjo. |
| 5 | I went to the market with my mother. | Shkova në treg me mamanë time. |
| 9 | Life is beautiful, but also difficult. | Jeta është e bukur, por edhe e vështirë. |
| 14 | [Julia] I really hope so. With all my heart. | [Julia] Shpresoj shumë. Me gjithë zemër. |
| 16 | Don't forget the keys! | Mos harro çelësat! |
The same dialogue was translated from both Italian and English sources. The Albanian output keeps the same meaning regardless of source language, while the exact phrasing varies naturally.
| 🇮🇹 Italian | 🇬🇧 English | 🇦🇱 from Italian | 🇦🇱 from English |
|---|---|---|---|
| Buongiorno a tutti! | Good morning everyone! | Mirëmëngjes të gjithëve! | Mirëmëngjes të gjithëve! |
| Come stai oggi? Tutto bene? | How are you today? Everything okay? | Si je sot? Çdo gjë mirë? | Si jeni sot? Gjithçka në rregull? |
| Sono andato al mercato con mia madre. | I went to the market with my mother. | Shkova në treg me mamin. | Shkova në treg me mamanë time. |
| Grazie di tutto, amico mio. | Thank you for everything, my friend. | Faleminderit për gjithçka, miku im. | Faleminderit për gjithçka, miku im. |
Consistent meaning across source languages · Natural phrasing variation · Speaker labels & HTML tags preserved
Let's be real: no AI will ever match a native Albanian speaker. AlbSub gets you 90% of the way there, fast. It handles structure, context, and formatting, and produces surprisingly natural Albanian. But it's still an LLM at the end of the day. It might mix up gender (Kjo vs Ky), pick a slightly awkward phrasing, or miss a cultural nuance that only a native would catch.
The point isn't to replace human translators. It's to give you working subtitles in minutes when there's no human translator available, which for Albanian is almost always the case. Watch the movie tonight, not next month.
For production-quality subtitles, run AlbSub first, then have a native speaker do a quick pass. You'll save hours compared to translating from scratch.
MIT
Made with 🔥 by Irdi Zeneli
Because every language deserves subtitles.