Qwen3-tts.cpp fork with saving/loading Speaker Embeddings and interactive mode #12

@Topping1

@predict-woo
This fork lets you precompute speaker embeddings, save them to a file, and load them later to speed up speech generation. It also adds an interactive mode that loads the model once so you can test a voice with many prompts.

Usage: ./build/qwen3-tts-cli [options] -m <model_dir>

Options:
-m, --model Model directory (required)
-t, --text Text to synthesize (required unless interactive or saving speaker)
-i, --interactive Run in interactive loop mode (load once, generate many)
-o, --output Output WAV file (default: output.wav)
-r, --reference Reference audio for voice cloning
-s, --speaker Load precomputed speaker embedding (.spk)
--save-speaker Extract embedding from -r and save to file
--temperature Sampling temperature (default: 0.9, 0=greedy)
--top-k Top-k sampling (default: 50, 0=disabled)
--top-p Top-p sampling (default: 1.0)
--max-tokens Maximum audio tokens (default: 4096)
--repetition-penalty Repetition penalty (default: 1.05)
-l, --language Language: en,ru,zh,ja,ko,de,fr,es (default: en)
-j, --threads Number of threads (default: 4)
-h, --help Show this help

Examples:
./build/qwen3-tts-cli -m ./models -t "Hello, world!" -o hello.wav
./build/qwen3-tts-cli -m ./models -i -r reference.wav -o output.wav
./build/qwen3-tts-cli -m ./models -r ref.wav --save-speaker voice.spk
./build/qwen3-tts-cli -m ./models -s voice.spk -t "Hello, world!" -o output.wav
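The last two examples can be combined into a small batch workflow: extract the embedding once, then reuse it for every line of a text file. The following is only a sketch under some assumptions: the function names `ensure_speaker` and `synthesize_lines`, the `TTS_BIN` and `MODEL_DIR` variables, and the numbered output naming are hypothetical, and only flags listed in the help text above are passed to the CLI.

```shell
# Sketch: cache a speaker embedding once, then reuse it for many lines of text.
# Function and variable names here are illustrative, not part of the fork.

TTS_BIN="${TTS_BIN:-./build/qwen3-tts-cli}"   # binary path from the examples above
MODEL_DIR="${MODEL_DIR:-./models}"            # model directory from the examples above

# Extract the embedding from a reference WAV unless the .spk file already exists.
ensure_speaker() {
    local ref_wav="$1" spk_file="$2"
    if [ ! -f "$spk_file" ]; then
        "$TTS_BIN" -m "$MODEL_DIR" -r "$ref_wav" --save-speaker "$spk_file"
    fi
}

# Synthesize one WAV per non-empty line of a text file:
# <prefix>_001.wav, <prefix>_002.wav, ...
synthesize_lines() {
    local spk_file="$1" text_file="$2" prefix="$3"
    local n=0 line
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue
        n=$((n + 1))
        # stdin is redirected so the CLI cannot consume the loop's input file
        "$TTS_BIN" -m "$MODEL_DIR" -s "$spk_file" -t "$line" \
            -o "$(printf '%s_%03d.wav' "$prefix" "$n")" < /dev/null || return 1
    done < "$text_file"
}

# Usage (file names are placeholders):
#   ensure_speaker ref.wav voice.spk
#   synthesize_lines voice.spk prompts.txt out
```

Note that this starts a new process, and therefore reloads the model, for every line; it only saves the embedding extraction step. For many prompts in one session, the `-i` interactive mode described above avoids the repeated load.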
