Voice Morpher

Local-first voice conversion and voice cloning toolkit with a Gradio WebUI, pluggable model backends, and model download management.

Voice Morpher targets local demos, with Apple Silicon as the primary platform. It does not bundle model weights or lock the app to one model. Instead, it provides a small application layer around open-source speech models such as Seed-VC, CosyVoice3, Qwen3 TTS / MLX, Chatterbox, F5-TTS, OpenVoice, IndexTTS, and XTTS.

What It Does

Convert source audio into a target speaker's timbre while preserving source rhythm and pauses as much as possible.
Generate cloned speech from text and a target speaker reference audio.
Download supported Hugging Face or ModelScope model snapshots from the WebUI.
Clone reusable voice profiles from reference audio in the WebUI.
Use built-in WebUI flows first, with command-template backends kept for advanced setups.
Keep runtime data, external model repos, and downloaded weights outside the source tree.

Current Status

This is an MVP. The application flow, audio preprocessing, Gradio interface, voice profile library, model catalog, model download management, and backend adapters are implemented.

The default passthrough backend is intentionally model-free. It copies the preprocessed source audio to the output so the UI and pipeline can be tested before installing any model.
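
As a rough illustration, a model-free passthrough step can be as small as a file copy. The sketch below assumes the backend is handed file paths; the function name `passthrough_convert` and its signature are hypothetical and may not match the project's actual backend interface.

```python
import shutil
from pathlib import Path


def passthrough_convert(source: Path, reference: Path, output: Path) -> Path:
    """Copy the preprocessed source audio to the output path unchanged.

    `reference` is accepted only so the call shape matches a real backend;
    it is ignored here. (Hypothetical signature, for illustration only.)
    """
    # Create the output directory if it does not exist yet.
    output.parent.mkdir(parents=True, exist_ok=True)
    # Copy file contents byte-for-byte; no model is involved.
    shutil.copyfile(source, output)
    return output
```

Because the output is a byte-for-byte copy of the source, any difference heard in the UI would come from preprocessing, not from a model.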

Supported Workflows

| Workflow | Input | Output | Recommended backend |
| --- | --- | --- | --- |
| Voice conversion | Source audio + target reference audio | Converted audio | `seed_vc_cli` |
| TTS voice cloning | Target reference audio + text | Generated speech | `cosyvoice3_builtin`, `cosyvoice3_cli`, `qwen3_tts_cli`, `chatterbox_cli`, `f5_tts_cli`, `openvoice_cli`, `indextts_cli`, `xtts_cli` |
| Pipeline test | Source audio + target reference audio | Copied source audio | `passthrough` |

Supported Model Families

| Model family | Integration status | Notes |
| --- | --- | --- |
| Qwen3 TTS / MLX | Download catalog + CLI backend | Strong Apple Silicon candidate. |
| CosyVoice3 | Download catalog + built-in backend + CLI backend | Built-in backend expects the official CosyVoice Python package locally. |
| Chatterbox | Download catalog + CLI backend | Open-source TTS voice cloning with expressive controls. |
| F5-TTS | Download catalog + CLI backend | Mature zero-shot TTS; check model license for commercial use. |
| OpenVoice V2 | Download catalog + CLI backend | Lightweight MIT-licensed voice cloning candidate. |
| IndexTTS-2 | Download catalog + CLI backend | Current public weights. IndexTTS-2.5 has a technical report, but no official downloadable weights are configured yet. |
| XTTS v2 | Download catalog + CLI backend | Classic multilingual voice cloning model; check license before production use. |
| Seed-VC | CLI backend | Audio-to-audio voice conversion, not TTS. |

CLI backend means the WebUI can call the model through a configured command template. It does not mean the third-party model runtime is bundled in this repository.

Requirements

macOS, Linux, or Windows
Python 3.12+
uv
Optional model-specific runtimes for the backends you plan to use (for example Seed-VC, CosyVoice3, Qwen3 TTS / MLX, or Chatterbox)

Apple Silicon is the primary local target, but the app itself is model-agnostic.

Quick Start

uv sync
uv run python main.py

Open:

http://127.0.0.1:8000

If port 8000 is already in use:

VOICE_MORPHER_PORT=8001 uv run python main.py

WebUI

The Gradio interface includes:

Audio Voice Conversion: upload source audio and target reference audio.
Voice Clone Library: save target speaker reference audio as a reusable voice profile.
Text Voice Cloning: upload target reference audio and enter text.
Model Download: download configured Hugging Face or ModelScope snapshots into models/.
Model Configuration: inspect configured and missing CLI backends.

Model Catalog

Downloadable model metadata is stored in:

config/models.toml

Add or edit entries there instead of editing Python code. Each entry defines the display label, Hugging Face repo, local directory name, task, and description, plus an optional modelscope_id for models that can also be downloaded from ModelScope.

Downloaded weights are stored under:

models/

models/ is ignored by git.

For example, the CosyVoice3 entry downloads to:

models/Fun-CosyVoice3-0.5B-2512/

The download source can be selected in the WebUI. Hugging Face is available for every configured model. ModelScope is available only when the model entry defines modelscope_id.
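
The source-selection rule can be expressed in a few lines. This sketch treats a catalog entry as a plain dictionary; the `available_sources` helper and the display strings are hypothetical.

```python
def available_sources(entry: dict[str, str]) -> list[str]:
    """List the download sources that can be offered for one catalog entry."""
    # Hugging Face is offered for every configured model.
    sources = ["Hugging Face"]
    # ModelScope is offered only when the entry defines a non-empty modelscope_id.
    if entry.get("modelscope_id"):
        sources.append("ModelScope")
    return sources
```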

The built-in CosyVoice3 backend still expects the official CosyVoice repository to exist locally:

external/CosyVoice/
models/Fun-CosyVoice3-0.5B-2512/

The repository is not cloned by this app. Install it manually if you want to use the built-in CosyVoice3 backend.
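
Before selecting the built-in backend, it can help to confirm that both directories are present. The check below is a minimal sketch using the two paths named above; `missing_cosyvoice3_paths` is a hypothetical helper and only checks that the directories exist, not that their contents are complete.

```python
from pathlib import Path

# Directories the built-in CosyVoice3 backend expects, relative to the project root.
REQUIRED_DIRS = ("external/CosyVoice", "models/Fun-CosyVoice3-0.5B-2512")


def missing_cosyvoice3_paths(project_root: Path) -> list[str]:
    """Return the required directories that do not exist under project_root."""
    return [rel for rel in REQUIRED_DIRS if not (project_root / rel).is_dir()]
```

An empty list means both directories were found.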

Backend Configuration

Copy the example env file if you prefer file-based configuration:

cp .env.example .env

Command templates support these placeholders:

{source}: preprocessed source audio path.
{reference}: preprocessed target speaker reference audio path.
{text}: input text for TTS voice cloning.
{output}: output wav path that the backend command must create.
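
One way to expand such a template safely is to split it into arguments first and substitute placeholders afterwards, so that paths or text containing spaces stay a single argument and no shell is involved. The sketch below is an assumption about how this could be done; `build_argv` is a hypothetical helper, not the app's actual implementation.

```python
import re
import shlex

# Only the four placeholders documented above are recognized.
_PLACEHOLDER = re.compile(r"\{(source|reference|text|output)\}")


def build_argv(template: str, values: dict[str, str]) -> list[str]:
    """Turn a command template into an argument list.

    The template is tokenized before substitution, so a value with spaces
    or quotes cannot split into extra arguments. A placeholder that appears
    in the template but is missing from `values` raises KeyError.
    """

    def fill(match: re.Match[str]) -> str:
        return values[match.group(1)]

    return [_PLACEHOLDER.sub(fill, token) for token in shlex.split(template)]
```

The resulting list can be passed to `subprocess.run(argv, check=True)`, after which the caller would verify that the `{output}` file was actually created.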

Seed-VC

Install and test Seed-VC separately first, then configure:

VOICE_MORPHER_SEED_VC_COMMAND='python inference.py --source {source} --target {reference} --output {output}' \
uv run python main.py

Seed-VC may require a wrapper if its CLI writes to an output directory instead of a single wav file.
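
A wrapper for that case mainly needs to find the file the tool produced and move it to the exact `{output}` path. The sketch below assumes the tool writes one or more `.wav` files somewhere under a known directory and that the most recently modified one is the result; `collect_newest_wav` is a hypothetical helper, and the actual Seed-VC output layout should be checked before relying on it.

```python
import shutil
from pathlib import Path


def collect_newest_wav(output_dir: Path, target: Path) -> Path:
    """Move the most recently modified .wav under output_dir to target."""
    # Sort by modification time so the last element is the newest file.
    candidates = sorted(output_dir.rglob("*.wav"), key=lambda p: p.stat().st_mtime)
    if not candidates:
        raise FileNotFoundError(f"no .wav file found under {output_dir}")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(candidates[-1]), str(target))
    return target
```

A wrapper script would run the model with a temporary output directory, call this function with the `{output}` path, and exit non-zero if no file was found.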

CosyVoice3

The recommended app workflow is:

1. Save a voice in Voice Clone Library.
2. Download the CosyVoice3 model in Model Download.
3. Install the official CosyVoice repo manually under external/CosyVoice.
4. Open Text Voice Cloning, select the saved voice, and select CosyVoice3 Built-in.

The command-template backend is still available for custom setups:

VOICE_MORPHER_COSYVOICE3_COMMAND='python cosyvoice3_infer.py --prompt-audio {reference} --text {text} --output {output}' \
uv run python main.py

Qwen3 TTS / MLX

VOICE_MORPHER_QWEN3_TTS_COMMAND='python qwen3_tts.py --reference {reference} --text {text} --output {output}' \
uv run python main.py

Chatterbox

VOICE_MORPHER_CHATTERBOX_COMMAND='python chatterbox_tts.py --reference {reference} --text {text} --output {output}' \
uv run python main.py

F5-TTS

VOICE_MORPHER_F5_TTS_COMMAND='f5-tts_infer-cli --ref_audio {reference} --ref_text {prompt_text} --gen_text {text} --output_file {output}' \
uv run python main.py

OpenVoice V2

VOICE_MORPHER_OPENVOICE_COMMAND='python openvoice_infer.py --reference {reference} --text {text} --output {output}' \
uv run python main.py

IndexTTS

VOICE_MORPHER_INDEXTTS_COMMAND='python indextts_infer.py --reference {reference} --text {text} --output {output}' \
uv run python main.py

XTTS v2

VOICE_MORPHER_XTTS_COMMAND='python xtts_infer.py --speaker_wav {reference} --text {text} --output {output}' \
uv run python main.py

Project Layout

.
├── config/
│   └── models.toml
├── docs/
│   └── TECHNICAL_DESIGN.md
├── src/
│   ├── backends/
│   ├── core/
│   ├── services/
│   └── ui/
├── tests/
├── .env.example
├── LICENSE
├── README.md
└── README.zh.md

Development

uv run ruff check
uv run pytest

Roadmap

Add wrappers for common Seed-VC output layouts.
Add install helpers for external model repositories.
Add background jobs and progress reporting for long downloads and inference.
Add model-specific validation for required local files.
Add long-audio slicing, silence handling, and vocal separation.
Add ASR + TTS workflow for video translation.

Safety

Only use voices you own or are authorized to process. This project does not include consent verification, watermarking, or misuse detection. Add those controls before any public or commercial deployment.

License

This project is released under the MIT License.

Third-party model weights and model repositories have their own licenses. Check each model license before commercial use.

Contact

Maintainer: SK Studio

Email: developer@skstudio.cn

Technical Design

See docs/TECHNICAL_DESIGN.md.

For model selection details, see docs/OPEN_SOURCE_VOICE_CLONING.md.
