Karaoke Maker

Karaoke Maker is a local, source-first pipeline for turning a song plus lyrics into a karaoke video.

It prepares stems, drafts lyric timings, lets you review and correct those timings in a browser editor, and renders the final video with the bundled video-gen/ Remotion app.

Status

  • Current release target: open-source v1
  • Supported platform: macOS only
  • Distribution model: clone the repo and run it locally
  • Primary interface: uv run karaoke-maker ...

Toolchain

  • Python 3.10
  • uv
  • Node.js 24.14.0
  • npm 11.9.0
  • FFmpeg / ffprobe on PATH
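
Before installing, it can help to confirm that the command-line tools listed above are reachable. The following is a minimal sketch, not part of the repository: the `missing_tools` helper and the `REQUIRED_TOOLS` tuple are hypothetical names, and the check only looks for executables on PATH without verifying the version numbers given in the list.

```python
import shutil

# Executable names assumed from the toolchain list above.
# Versions are NOT checked here, only presence on PATH.
REQUIRED_TOOLS = ("uv", "node", "npm", "ffmpeg", "ffprobe")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools found on PATH.")
```

An empty result means every listed executable was found; anything returned should be installed before running uv sync.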

Quickstart With The Synthetic Smoke Demo

  1. Clone the repository:

    git clone https://github.com/nitsh/karaoke-maker.git
    cd karaoke-maker
  2. Install dependencies:

    uv sync
    npm --prefix video-gen install
  3. Run the demo pipeline:

    uv run karaoke-maker generate \
      --audio examples/demo/song.wav \
      --lyrics examples/demo/lyrics.txt \
      --alignment-provider detect-cues \
      --skip-render

This stages a complete demo run without invoking the final Remotion render. Generated artifacts will appear under:

  • output/alignment/
  • video-gen/public/

If you want the final video too, rerun the same command without --skip-render.

Bundled Public-Domain Song Demos

If you want a recognizable song instead of the tiny synthetic smoke test, the repository also ships two public-domain demo packs under examples/demo/real-songs/:

  • twinkle-twinkle-little-star
  • mary-had-a-little-lamb

Each pack includes:

  • song.wav
  • vocals.wav
  • instruments.wav
  • lyrics.txt
  • manual_aligned_lyrics.txt

Example:

uv run karaoke-maker generate \
  --audio examples/demo/real-songs/twinkle-twinkle-little-star/song.wav \
  --lyrics examples/demo/real-songs/twinkle-twinkle-little-star/lyrics.txt \
  --alignment-provider detect-cues \
  --skip-render

These recordings were synthesized locally from public-domain melodies and lyrics, so the demo stays source-first and rights-clean.

Use Your Own Song

You can either place files at the default runtime locations:

  • input/song.mp3
  • input/lyrics.txt

or pass explicit paths:

uv run karaoke-maker generate --audio /path/to/song.wav --lyrics /path/to/lyrics.txt

Workflow

Prepare assets

uv run karaoke-maker prepare \
  --audio examples/demo/song.wav \
  --lyrics examples/demo/lyrics.txt \
  --alignment-provider detect-cues

This command:

  1. runs Demucs to split vocals and instrumentals
  2. creates an alignment draft
  3. stages review files into video-gen/public/

Review timings

uv run karaoke-maker review

Then open:

http://127.0.0.1:8766

Saving from the editor writes:

  • input/manual_aligned_lyrics.txt
  • video-gen/public/manual_aligned_lyrics.json
  • output/alignment/corrections/<timestamp>.json

Publish manual text edits

If you edit input/manual_aligned_lyrics.txt directly:

uv run karaoke-maker publish

Generate the final video

uv run karaoke-maker generate

Timing source behavior:

  • --timings-source auto uses reviewed manual timings when input/manual_aligned_lyrics.txt exists; otherwise it falls back to the latest draft
  • --timings-source manual requires reviewed manual timings
  • --timings-source draft forces the latest draft timing JSON
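
The three modes above amount to a small selection rule. The sketch below illustrates that rule only; it is not the project's implementation. The `resolve_timings` function is a hypothetical name, both paths are passed in by the caller, and raising `FileNotFoundError` when manual timings are missing is an assumption about how "requires" might be enforced.

```python
from pathlib import Path


def resolve_timings(source, manual_path, draft_path):
    """Pick a timing file for a --timings-source value (illustrative only)."""
    manual_path, draft_path = Path(manual_path), Path(draft_path)
    if source == "manual":
        # Assumption: missing reviewed timings are treated as an error.
        if not manual_path.exists():
            raise FileNotFoundError(f"reviewed timings not found: {manual_path}")
        return manual_path
    if source == "draft":
        # Always use the draft, even if reviewed timings exist.
        return draft_path
    if source == "auto":
        # Prefer reviewed timings when present, otherwise use the draft.
        return manual_path if manual_path.exists() else draft_path
    raise ValueError(f"unknown timings source: {source!r}")
```

For example, `resolve_timings("auto", "input/manual_aligned_lyrics.txt", "output/alignment/manual_draft.json")` returns the manual file once it exists and the draft path before that. Which file counts as the latest draft is assumed here.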

Command Reference

prepare

uv run karaoke-maker prepare

Useful flags:

  • --audio
  • --lyrics
  • --skip-demucs
  • --skip-aligner
  • --alignment-provider {auto,elevenlabs,stable-ts,detect-cues,whisperx}
  • --language

review

uv run karaoke-maker review

publish

uv run karaoke-maker publish

generate

uv run karaoke-maker generate

Useful flags:

  • --timings-source {auto,manual,draft}
  • --skip-render

Alignment Providers

| Provider | Mode | When to use it | Notes |
| --- | --- | --- | --- |
| auto | fallback chain | Default | Tries providers in the configured order and uses the first one available. |
| elevenlabs | forced alignment | Best remote quality when you have API access | Requires ELEVENLABS_API_KEY. |
| stable-ts | local forced alignment | Useful for local alignment with reference lyrics | Requires the local stable-ts / stable_whisper dependency in the environment. |
| detect-cues | local heuristic timing | Fast fallback for predictable line-level timings | Uses silence detection on vocals-only audio. |
| whisperx | local transcription + timestamps | Fallback when reference lyrics are unavailable or forced alignment is unavailable | Requires the local whisperx dependency in the environment. |
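
The auto mode described above is a first-available fallback chain. The sketch below shows that pattern under stated assumptions and is not the project's code: `first_available` and `AVAILABILITY_CHECKS` are hypothetical names, the checks are inferred from the Notes column, detect-cues is assumed to need no extra dependency, and the configured order is not documented here, so the caller supplies it.

```python
import importlib.util
import os

# Hypothetical availability checks, inferred from the Notes column above.
AVAILABILITY_CHECKS = {
    "elevenlabs": lambda: bool(os.environ.get("ELEVENLABS_API_KEY")),
    "stable-ts": lambda: importlib.util.find_spec("stable_whisper") is not None,
    "whisperx": lambda: importlib.util.find_spec("whisperx") is not None,
    "detect-cues": lambda: True,  # assumption: no extra dependency required
}


def first_available(order, checks=AVAILABILITY_CHECKS):
    """Return the first provider in `order` whose check passes, else None."""
    for name in order:
        check = checks.get(name)
        if check is not None and check():
            return name
    return None
```

Calling `first_available(["elevenlabs", "stable-ts", "detect-cues"])` returns the first provider usable in the current environment. The order in that call is an example, not the project's default.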

Generated Files

The pipeline writes runtime artifacts to:

  • output/alignment/alignment_result.json
  • output/alignment/manual_draft.txt
  • output/alignment/manual_draft.json
  • video-gen/public/aligned_lyrics.json
  • video-gen/public/alignment_review.json
  • video-gen/public/manual_aligned_lyrics.json
  • video-gen/public/render_meta.json
  • video-gen/public/instruments.wav
  • video-gen/out/video.mov

input/ and output/ are runtime directories. Bundled demo assets live under examples/demo/, including the tiny synthetic smoke clip and the public-domain song packs in examples/demo/real-songs/.
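
To see which of the artifacts listed above a run has produced so far, a small check like the following may be useful. It is a sketch and not part of the repository: `missing_artifacts` and `EXPECTED_ARTIFACTS` are hypothetical names, and the paths are copied from the list above. Some files are expected to be absent depending on the step reached; for instance, a run with --skip-render does not invoke the final render, so the video file would likely be reported as missing.

```python
from pathlib import Path

# Artifact paths copied from the list above, relative to the repository root.
EXPECTED_ARTIFACTS = (
    "output/alignment/alignment_result.json",
    "output/alignment/manual_draft.txt",
    "output/alignment/manual_draft.json",
    "video-gen/public/aligned_lyrics.json",
    "video-gen/public/alignment_review.json",
    "video-gen/public/manual_aligned_lyrics.json",
    "video-gen/public/render_meta.json",
    "video-gen/public/instruments.wav",
    "video-gen/out/video.mov",
)


def missing_artifacts(repo_root=".", expected=EXPECTED_ARTIFACTS):
    """Return the expected artifact paths that are not files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_artifacts():
        print("not found:", rel)
```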

Development

Canonical Python source lives in src/karaoke_maker/. Root files such as main.py, alignment_editor.py, detect_cues.py, and manual_aligned_lyrics_parser.py are compatibility shims only.

Validation commands:

uv run pytest
uv run ruff check .
npm --prefix video-gen test

Additional implementation notes live in docs/architecture.md.

Limitations

  • macOS is the only validated platform for this release.
  • The pipeline depends on local multimedia tooling and can be CPU-intensive.
  • Automatic timings are still best treated as a draft. Manual review is expected for polished output.
  • Provider availability depends on your local environment and credentials.
