Karaoke Maker is a local, source-first pipeline for turning a song plus lyrics into a karaoke video.
It prepares stems, drafts lyric timings, lets you review and correct those timings in a browser editor, and renders the final video with the bundled `video-gen/` Remotion app.
- Current release target: open-source v1
- Supported platform: macOS only
- Distribution model: clone the repo and run it locally
- Primary interface: `uv run karaoke-maker ...`

Prerequisites:

- Python 3.10
- uv
- Node.js 24.14.0
- npm 11.9.0
- FFmpeg / ffprobe on PATH
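Before a first run, it can save time to confirm the external tools the pipeline shells out to are actually discoverable. A minimal sketch (not part of the CLI itself):

```python
# Sanity-check that the external executables Karaoke Maker relies on
# are discoverable on PATH before running the pipeline.
import shutil

REQUIRED_TOOLS = ["ffmpeg", "ffprobe", "node", "npm", "uv"]

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of required executables not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```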
- Clone the repository:

  ```
  git clone https://github.com/nitsh/karaoke-maker.git
  cd karaoke-maker
  ```

- Install dependencies:

  ```
  uv sync
  npm --prefix video-gen install
  ```
- Run the demo pipeline:

  ```
  uv run karaoke-maker generate \
    --audio examples/demo/song.wav \
    --lyrics examples/demo/lyrics.txt \
    --alignment-provider detect-cues \
    --skip-render
  ```

This stages a complete demo run without invoking the final Remotion render. Generated artifacts will appear under:

- `output/alignment/`
- `video-gen/public/`

If you want the final video too, rerun the same command without `--skip-render`.
If you want a recognizable song instead of the tiny synthetic smoke test, the repository also ships two public-domain demo packs under examples/demo/real-songs/:
- `twinkle-twinkle-little-star`
- `mary-had-a-little-lamb`
Each pack includes:
- `song.wav`
- `vocals.wav`
- `instruments.wav`
- `lyrics.txt`
- `manual_aligned_lyrics.txt`
Example:
```
uv run karaoke-maker generate \
  --audio examples/demo/real-songs/twinkle-twinkle-little-star/song.wav \
  --lyrics examples/demo/real-songs/twinkle-twinkle-little-star/lyrics.txt \
  --alignment-provider detect-cues \
  --skip-render
```

These recordings were synthesized locally from public-domain melodies and lyrics, so the demo stays source-first and rights-clean.
You can either place files at the default runtime locations:
- `input/song.mp3`
- `input/lyrics.txt`
or pass explicit paths:
```
uv run karaoke-maker generate --audio /path/to/song.wav --lyrics /path/to/lyrics.txt
```

To run only the preparation stage:

```
uv run karaoke-maker prepare \
  --audio examples/demo/song.wav \
  --lyrics examples/demo/lyrics.txt \
  --alignment-provider detect-cues
```

This command:
- runs Demucs to split vocals and instrumentals
- creates an alignment draft
- stages review files into `video-gen/public/`
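The stem-splitting step can be pictured as a subprocess call to Demucs. The helper below is a sketch only, not the project's actual code; it assumes the standard `demucs` CLI with its two-stem separation option:

```python
# Sketch of how a prepare-style stage might invoke Demucs for a
# two-stem (vocals / everything-else) split. Hypothetical helper;
# the real pipeline's invocation may differ.
from pathlib import Path

def demucs_command(audio: Path, out_dir: Path) -> list[str]:
    """Build the argv for a vocals/instrumental split via the demucs CLI."""
    return [
        "demucs",
        "--two-stems", "vocals",  # produce vocals + accompaniment stems
        "-o", str(out_dir),
        str(audio),
    ]
```

Such a command list would typically be handed to `subprocess.run(...)` with error checking.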
To launch the browser review editor:

```
uv run karaoke-maker review
```

Then open:

http://127.0.0.1:8766
Saving from the editor writes:
- `input/manual_aligned_lyrics.txt`
- `video-gen/public/manual_aligned_lyrics.json`
- `output/alignment/corrections/<timestamp>.json`
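The timestamped corrections snapshot can be sketched as follows. This is an illustrative helper, not the editor's actual code, and the real timestamp format may differ:

```python
# Illustrative sketch of writing a corrections snapshot to
# output/alignment/corrections/<timestamp>.json. Hypothetical helper.
import json
from datetime import datetime, timezone
from pathlib import Path

def write_corrections(corrections: dict, root: Path) -> Path:
    """Persist a corrections dict under a UTC-timestamped filename."""
    out_dir = root / "output" / "alignment" / "corrections"
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = out_dir / f"{stamp}.json"
    path.write_text(json.dumps(corrections, indent=2))
    return path
```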
If you edit input/manual_aligned_lyrics.txt directly:
```
uv run karaoke-maker publish
uv run karaoke-maker generate
```

Timing source behavior:

- `--timings-source auto` prefers reviewed manual timings when `input/manual_aligned_lyrics.txt` exists, otherwise falls back to the latest draft
- `--timings-source manual` requires reviewed manual timings
- `--timings-source draft` forces the latest draft timing JSON
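The selection rules above can be sketched in a few lines. This mirrors the documented behavior only; it is not the project's actual implementation:

```python
# Sketch of the documented --timings-source resolution rules.
# Hypothetical helper; paths follow the runtime layout described above.
from pathlib import Path

def resolve_timings_source(mode: str, root: Path) -> str:
    """Return 'manual' or 'draft' according to the --timings-source rules."""
    manual = root / "input" / "manual_aligned_lyrics.txt"
    if mode == "manual":
        if not manual.exists():
            raise FileNotFoundError("reviewed manual timings required")
        return "manual"
    if mode == "draft":
        return "draft"
    if mode == "auto":
        # prefer reviewed manual timings, else fall back to the latest draft
        return "manual" if manual.exists() else "draft"
    raise ValueError(f"unknown timings source: {mode}")
```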
`uv run karaoke-maker prepare`

Useful flags:

- `--audio`
- `--lyrics`
- `--skip-demucs`
- `--skip-aligner`
- `--alignment-provider {auto,elevenlabs,stable-ts,detect-cues,whisperx}`
- `--language`

`uv run karaoke-maker review`

`uv run karaoke-maker publish`

`uv run karaoke-maker generate`

Useful flags:

- `--timings-source {auto,manual,draft}`
- `--skip-render`
| Provider | Mode | When to use it | Notes |
|---|---|---|---|
| `auto` | fallback chain | Default | Tries providers in the configured order and uses the first one available. |
| `elevenlabs` | forced alignment | Best remote quality when you have API access | Requires `ELEVENLABS_API_KEY`. |
| `stable-ts` | local forced alignment | Useful for local alignment with reference lyrics | Requires the local `stable-ts` / `stable_whisper` dependency in the environment. |
| `detect-cues` | local heuristic timing | Fast fallback for predictable line-level timings | Uses silence detection on vocals-only audio. |
| `whisperx` | local transcription + timestamps | Fallback when reference lyrics are unavailable or forced alignment is unavailable | Requires the local `whisperx` dependency in the environment. |
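The `auto` fallback chain can be sketched as "try providers in order, pick the first whose requirements are met". The availability checks below are illustrative assumptions based on the table, not the project's actual logic:

```python
# Sketch of an `auto` provider fallback chain. Provider names mirror
# the table above; the availability checks are illustrative only.
import importlib.util
import os

def provider_available(name: str) -> bool:
    """Rough availability heuristics for each alignment provider."""
    if name == "elevenlabs":
        return bool(os.environ.get("ELEVENLABS_API_KEY"))
    if name == "stable-ts":
        return importlib.util.find_spec("stable_whisper") is not None
    if name == "whisperx":
        return importlib.util.find_spec("whisperx") is not None
    # the heuristic detect-cues provider needs no extra dependencies
    return name == "detect-cues"

def pick_provider(order=("elevenlabs", "stable-ts", "detect-cues", "whisperx")):
    """Return the first available provider in the configured order."""
    for name in order:
        if provider_available(name):
            return name
    raise RuntimeError("no alignment provider available")
```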
The pipeline writes runtime artifacts to:
- `output/alignment/alignment_result.json`
- `output/alignment/manual_draft.txt`
- `output/alignment/manual_draft.json`
- `video-gen/public/aligned_lyrics.json`
- `video-gen/public/alignment_review.json`
- `video-gen/public/manual_aligned_lyrics.json`
- `video-gen/public/render_meta.json`
- `video-gen/public/instruments.wav`
- `video-gen/out/video.mov`
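After a run, a quick existence check over these paths can confirm the pipeline completed. A hypothetical convenience helper, not part of the CLI:

```python
# Report which of the expected runtime artifacts are missing under a
# repo root. Hypothetical helper built from the artifact list above.
from pathlib import Path

EXPECTED_ARTIFACTS = [
    "output/alignment/alignment_result.json",
    "output/alignment/manual_draft.txt",
    "output/alignment/manual_draft.json",
    "video-gen/public/aligned_lyrics.json",
    "video-gen/public/alignment_review.json",
    "video-gen/public/manual_aligned_lyrics.json",
    "video-gen/public/render_meta.json",
    "video-gen/public/instruments.wav",
    "video-gen/out/video.mov",
]

def missing_artifacts(root: Path) -> list[str]:
    """Return the relative paths that do not yet exist under root."""
    return [rel for rel in EXPECTED_ARTIFACTS if not (root / rel).exists()]
```

Note that `video-gen/out/video.mov` is only produced when the final render is not skipped.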
`input/` and `output/` are runtime directories. Bundled demo assets live under `examples/demo/`, including the tiny synthetic smoke clip and the public-domain song packs in `examples/demo/real-songs/`.

Canonical Python source lives in `src/karaoke_maker/`. Root files such as `main.py`, `alignment_editor.py`, `detect_cues.py`, and `manual_aligned_lyrics_parser.py` are compatibility shims only.
Validation commands:
```
uv run pytest
uv run ruff check .
npm --prefix video-gen test
```

Additional implementation notes live in `docs/architecture.md`.
- macOS is the only validated platform for this release.
- The pipeline depends on local multimedia tooling and can be heavy on CPU.
- Automatic timings are still best treated as a draft. Manual review is expected for polished output.
- Provider availability depends on your local environment and credentials.