@predict-woo
Qwen3-tts.cpp fork that adds saving/loading of speaker embeddings and an interactive mode.
You can precompute a speaker embedding from reference audio, save it, and load it later to speed up speech generation. Interactive mode lets you test a voice with many prompts while loading the model only once.
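The save/load round trip can be sketched in C++ as below, assuming the embedding is a flat array of floats. The file layout (a uint32_t count followed by raw float32 values), the size cap, and the function names save_speaker_embedding / load_speaker_embedding are all hypothetical; the fork's actual .spk format is not described here.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// HYPOTHETICAL .spk layout, for illustration only (not the fork's real format):
//   uint32_t count          number of floats that follow
//   float    values[count]  embedding values in host byte order
constexpr std::uint32_t kMaxSpeakerDim = 1u << 20;  // hypothetical sanity cap

bool save_speaker_embedding(const std::string& path, const std::vector<float>& emb) {
    assert(emb.size() <= kMaxSpeakerDim && "embedding larger than the loader accepts");
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    const auto count = static_cast<std::uint32_t>(emb.size());
    out.write(reinterpret_cast<const char*>(&count), sizeof(count));
    out.write(reinterpret_cast<const char*>(emb.data()),
              static_cast<std::streamsize>(count * sizeof(float)));
    return static_cast<bool>(out);
}

bool load_speaker_embedding(const std::string& path, std::vector<float>& emb) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::uint32_t count = 0;
    if (!in.read(reinterpret_cast<char*>(&count), sizeof(count))) return false;
    // Reject implausible sizes so a corrupt file cannot trigger a huge allocation.
    if (count > kMaxSpeakerDim) return false;
    std::vector<float> values(count);
    if (count > 0 &&
        !in.read(reinterpret_cast<char*>(values.data()),
                 static_cast<std::streamsize>(count * sizeof(float)))) {
        return false;
    }
    emb = std::move(values);  // only overwrite the caller's vector on success
    return true;
}
```

The point of the design is that the expensive step (running the reference audio through the encoder) happens once; later runs only read a small file.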
Usage: ./build/qwen3-tts-cli [options] -m <model_dir>
Options:
-m, --model Model directory (required)
-t, --text Text to synthesize (required unless interactive or saving speaker)
-i, --interactive Run in interactive loop mode (load once, generate many)
-o, --output Output WAV file (default: output.wav)
-r, --reference Reference audio for voice cloning
-s, --speaker Load precomputed speaker embedding (.spk)
--save-speaker Extract embedding from -r and save to file
--temperature Sampling temperature (default: 0.9, 0=greedy)
--top-k Top-k sampling (default: 50, 0=disabled)
--top-p Top-p sampling (default: 1.0)
--max-tokens Maximum audio tokens (default: 4096)
--repetition-penalty Repetition penalty (default: 1.05)
-l, --language Language: en,ru,zh,ja,ko,de,fr,es (default: en)
-j, --threads Number of threads (default: 4)
-h, --help Show this help
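The --temperature, --top-k, --top-p and --repetition-penalty options correspond to standard logit-sampling steps. The sketch below shows one common way to combine them, using the defaults listed above; it is a generic illustration with hypothetical names (SamplingParams, sample_token), not the fork's actual sampler. The repetition penalty uses the widely used CTRL-style rule (divide positive logits, multiply negative ones).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <unordered_set>
#include <vector>

struct SamplingParams {
    float temperature = 0.9f;          // 0 = greedy
    int top_k = 50;                    // 0 = disabled
    float top_p = 1.0f;                // 1.0 = keep the whole distribution
    float repetition_penalty = 1.05f;  // 1.0 = no penalty
};

// Hypothetical helper: picks the next token id from raw logits.
int sample_token(std::vector<float> logits, const std::vector<int>& history,
                 const SamplingParams& p, std::mt19937& rng) {
    assert(!logits.empty());
    // 1. Repetition penalty: make each previously generated token less likely.
    std::unordered_set<int> seen(history.begin(), history.end());
    for (int t : seen) {
        if (t < 0 || static_cast<std::size_t>(t) >= logits.size()) continue;
        float& l = logits[static_cast<std::size_t>(t)];
        l = (l > 0.0f) ? l / p.repetition_penalty : l * p.repetition_penalty;
    }
    // 2. Temperature 0 means greedy decoding: take the highest logit.
    if (p.temperature <= 0.0f) {
        return static_cast<int>(std::max_element(logits.begin(), logits.end()) - logits.begin());
    }
    // 3. Sort token ids by logit, highest first.
    std::vector<int> ids(logits.size());
    std::iota(ids.begin(), ids.end(), 0);
    std::sort(ids.begin(), ids.end(),
              [&](int a, int b) { return logits[a] > logits[b]; });
    // 4. Top-k: keep only the k best candidates (0 disables the cut).
    std::size_t keep = ids.size();
    if (p.top_k > 0) keep = std::min(keep, static_cast<std::size_t>(p.top_k));
    // 5. Temperature-scaled softmax over the kept candidates (max subtracted for stability).
    const float max_l = logits[ids[0]];
    std::vector<double> probs(keep);
    double sum = 0.0;
    for (std::size_t i = 0; i < keep; ++i) {
        probs[i] = std::exp((logits[ids[i]] - max_l) / p.temperature);
        sum += probs[i];
    }
    // 6. Top-p (nucleus): keep the smallest prefix whose probability mass reaches top_p.
    if (p.top_p < 1.0f) {
        double cum = 0.0;
        std::size_t n = 0;
        while (n < keep) {
            cum += probs[n] / sum;
            ++n;
            if (cum >= p.top_p) break;
        }
        probs.resize(n);
    }
    // 7. Draw from what remains; discrete_distribution renormalizes the weights.
    std::discrete_distribution<std::size_t> dist(probs.begin(), probs.end());
    return ids[dist(rng)];
}
```

Lower temperatures and smaller top-k/top-p values make output more deterministic; a repetition penalty above 1.0 discourages the model from looping on the same audio tokens.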
Examples:
./build/qwen3-tts-cli -m ./models -t "Hello, world!" -o hello.wav
./build/qwen3-tts-cli -m ./models -i -r reference.wav -o output.wav
./build/qwen3-tts-cli -m ./models -r ref.wav --save-speaker voice.spk
./build/qwen3-tts-cli -m ./models -s voice.spk -t "Hello, world!" -o output.wav
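Interactive mode (-i) amortizes model loading across many prompts. A minimal C++ sketch of such a loop follows, assuming a hypothetical synthesize callback that already holds the loaded model and voice. How the real CLI names its output file for each prompt is not documented here, so the sketch simply passes the prompt index through for the caller to use.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <ostream>
#include <sstream>  // std::istringstream can stand in for std::cin in tests
#include <string>

// Hypothetical interactive loop: expensive setup (model, voice) happens once in
// the caller and is captured by `synthesize`; each non-blank input line then
// triggers one generation. Returns the number of prompts processed.
int run_interactive(std::istream& in, std::ostream& out,
                    const std::function<void(const std::string&, int)>& synthesize) {
    assert(synthesize && "synthesize callback must be set");
    int count = 0;
    std::string line;
    out << "> " << std::flush;
    while (std::getline(in, line)) {
        // Skip empty or whitespace-only lines.
        if (line.find_first_not_of(" \t\r") != std::string::npos) {
            synthesize(line, count);  // count doubles as a per-prompt index
            ++count;
        }
        out << "> " << std::flush;
    }
    return count;
}
```

In a real program, `in` would be std::cin and the callback would run the TTS pipeline with the already-loaded model and speaker embedding, so only the generation cost is paid per prompt.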