sihuangtech/voice-morpher

Voice Morpher

Chinese documentation: README.zh.md

Local-first voice conversion and voice cloning toolkit with a Gradio WebUI, pluggable model backends, and model download management.

Voice Morpher is built for local demos on Apple Silicon first. It does not bundle model weights or lock the app to one model. Instead, it provides a small application layer around open-source speech models such as Seed-VC, CosyVoice3, Qwen3 TTS / MLX, Chatterbox, F5-TTS, OpenVoice, IndexTTS, and XTTS.

What It Does

  • Convert source audio into a target speaker's timbre while preserving source rhythm and pauses as much as possible.
  • Generate cloned speech from text and a target speaker reference audio.
  • Download supported Hugging Face or ModelScope model snapshots from the WebUI.
  • Clone reusable voice profiles from reference audio in the WebUI.
  • Prefer the built-in WebUI flows; command-template backends remain available for advanced setups.
  • Keep runtime data, external model repos, and downloaded weights outside the source tree.

Current Status

This is an MVP. The application flow, audio preprocessing, Gradio interface, voice profile library, model catalog, model download management, and backend adapters are implemented.

The default passthrough backend is intentionally model-free. It copies the preprocessed source audio to the output so the UI and pipeline can be tested before installing any model.
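The passthrough behavior described above amounts to a plain file copy. A minimal sketch, assuming the pipeline hands the backend a preprocessed source path and an output path; the function name passthrough_convert is hypothetical and not taken from the project's code:

```python
import shutil
from pathlib import Path


def passthrough_convert(source: Path, output: Path) -> Path:
    """Hypothetical passthrough backend: copy the preprocessed source unchanged.

    No model is loaded, so uploads, preprocessing, and output handling can be
    exercised end to end before any model runtime is installed.
    """
    output.parent.mkdir(parents=True, exist_ok=True)  # create the output folder if needed
    shutil.copyfile(source, output)  # byte-for-byte copy; audio is not modified
    return output
```

Because the output is identical to the preprocessed input, any difference you hear between the upload and the result comes from preprocessing, not from a model.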

Supported Workflows

  • Voice conversion. Input: source audio + target reference audio. Output: converted audio. Recommended backend: seed_vc_cli.
  • TTS voice cloning. Input: target reference audio + text. Output: generated speech. Recommended backends: cosyvoice3_builtin, cosyvoice3_cli, qwen3_tts_cli, chatterbox_cli, f5_tts_cli, openvoice_cli, indextts_cli, xtts_cli.
  • Pipeline test. Input: source audio + target reference audio. Output: a copy of the source audio. Recommended backend: passthrough.

Supported Model Families

  • Qwen3 TTS / MLX. Integration: download catalog + CLI backend. Strong Apple Silicon candidate.
  • CosyVoice3. Integration: download catalog + built-in backend + CLI backend. The built-in backend expects a local checkout of the official CosyVoice repository.
  • Chatterbox. Integration: download catalog + CLI backend. Open-source TTS voice cloning with expressive controls.
  • F5-TTS. Integration: download catalog + CLI backend. Mature zero-shot TTS; check the model license before commercial use.
  • OpenVoice V2. Integration: download catalog + CLI backend. Lightweight, MIT-licensed voice cloning candidate.
  • IndexTTS-2. Integration: download catalog + CLI backend. Uses the current public weights; IndexTTS-2.5 has a technical report, but no official downloadable weights are configured yet.
  • XTTS v2. Integration: download catalog + CLI backend. Classic multilingual voice cloning model; check the license before production use.
  • Seed-VC. Integration: CLI backend only. Audio-to-audio voice conversion, not TTS.

CLI backend means the WebUI can call the model through a configured command template. It does not mean the third-party model runtime is bundled in this repository.

Requirements

  • macOS, Linux, or Windows
  • Python 3.12+
  • uv
  • Optional model-specific runtimes for the backends you plan to use (Seed-VC, CosyVoice3, Qwen3 TTS / MLX, Chatterbox, F5-TTS, OpenVoice, IndexTTS, or XTTS)

Apple Silicon is the primary local target, but the app itself is model-agnostic.

Quick Start

uv sync
uv run python main.py

Open:

http://127.0.0.1:8000

If port 8000 is already in use:

VOICE_MORPHER_PORT=8001 uv run python main.py

WebUI

The Gradio interface includes:

  • Audio Voice Conversion: upload source audio and target reference audio.
  • Voice Clone Library: save target speaker reference audio as a reusable voice profile.
  • Text Voice Cloning: upload target reference audio and enter text.
  • Model Download: download configured Hugging Face or ModelScope snapshots into models/.
  • Model Configuration: inspect configured and missing CLI backends.

Model Catalog

Downloadable model metadata is stored in:

config/models.toml

Add or edit entries there instead of editing Python code. Each entry defines the display label, Hugging Face repo, local directory name, task, and description.

Downloaded weights are stored under:

models/

models/ is ignored by git.

For example, the CosyVoice3 entry downloads to:

models/Fun-CosyVoice3-0.5B-2512/

The download source can be selected in the WebUI. Hugging Face is available for every configured model. ModelScope is available only when the model entry defines modelscope_id.

The built-in CosyVoice3 backend also expects the official CosyVoice repository and the downloaded model weights at these local paths:

external/CosyVoice/
models/Fun-CosyVoice3-0.5B-2512/

The repository is not cloned by this app. Install it manually if you want to use the built-in CosyVoice3 backend.
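Since the app neither clones the repository nor (per the Roadmap) validates required local files yet, a quick preflight check can save a confusing runtime error. This is a hypothetical helper, not an existing feature; it only checks that the two directories listed above exist and are non-empty:

```python
from pathlib import Path

# The two locations this README lists for the built-in CosyVoice3 backend.
REQUIRED_DIRS = (
    Path("external/CosyVoice"),
    Path("models/Fun-CosyVoice3-0.5B-2512"),
)


def missing_cosyvoice3_paths(root: Path = Path(".")) -> list[Path]:
    """Return required directories that are absent or empty under the project root."""
    missing = []
    for rel in REQUIRED_DIRS:
        path = root / rel
        # iterdir() is only evaluated when the path is a directory.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing
```

Run it from the project root; an empty result means both directories are in place, though it does not confirm that the checkout or weights are complete.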

Backend Configuration

Copy the example env file if you prefer file-based configuration:

cp .env.example .env

Command templates support these placeholders:

  • {source}: preprocessed source audio path.
  • {reference}: preprocessed target speaker reference audio path.
  • {text}: input text for TTS voice cloning.
  • {output}: output wav path that the backend command must create.
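One way to fill these placeholders safely is to split the template into arguments first and substitute per argument, so text containing spaces or quotes stays a single argument and no shell is involved. This is a sketch of the technique, not Voice Morpher's actual implementation; build_command and run_backend are hypothetical names. Note that shlex.split follows POSIX quoting (Windows paths with backslashes may need forward slashes), that a template token containing literal braces would confuse str.format, and that an unknown placeholder raises KeyError:

```python
import shlex
import subprocess
from pathlib import Path


def build_command(template: str, values: dict[str, str]) -> list[str]:
    """Split a command template into argv, then fill placeholders in each token."""
    # Splitting before substitution keeps a value like "hello, it's me"
    # inside one argument instead of being re-parsed by a shell.
    return [token.format(**values) for token in shlex.split(template)]


def run_backend(template: str, values: dict[str, str]) -> Path:
    """Run a backend command and verify that it created the {output} file."""
    output = Path(values["output"])
    subprocess.run(build_command(template, values), check=True)  # raises on non-zero exit
    if not output.is_file():
        raise FileNotFoundError(f"backend did not create {output}")
    return output
```

For example, build_command("python x.py --text {text} --output {output}", {"text": "hi, it's me", "output": "out.wav"}) yields ["python", "x.py", "--text", "hi, it's me", "--output", "out.wav"].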

Seed-VC

Install and test Seed-VC separately first, then configure:

VOICE_MORPHER_SEED_VC_COMMAND='python inference.py --source {source} --target {reference} --output {output}' \
uv run python main.py

Seed-VC may require a wrapper if its CLI writes to an output directory instead of a single wav file.

CosyVoice3

The recommended app workflow is:

  1. Save a voice in Voice Clone Library.
  2. Download the CosyVoice3 model in Model Download.
  3. Install the official CosyVoice repo manually under external/CosyVoice.
  4. Open Text Voice Cloning, select the saved voice, and select CosyVoice3 Built-in.

The command-template backend is still available for custom setups:

VOICE_MORPHER_COSYVOICE3_COMMAND='python cosyvoice3_infer.py --prompt-audio {reference} --text {text} --output {output}' \
uv run python main.py

Qwen3 TTS / MLX

VOICE_MORPHER_QWEN3_TTS_COMMAND='python qwen3_tts.py --reference {reference} --text {text} --output {output}' \
uv run python main.py

Chatterbox

VOICE_MORPHER_CHATTERBOX_COMMAND='python chatterbox_tts.py --reference {reference} --text {text} --output {output}' \
uv run python main.py

F5-TTS

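VOICE_MORPHER_F5_TTS_COMMAND='f5-tts_infer-cli --ref_audio {reference} --ref_text {prompt_text} --gen_text {text} --output_file {output}' \
uv run python main.py

This template also uses {prompt_text}, which is not among the placeholders listed under Backend Configuration. F5-TTS's --ref_text takes the transcript of the reference audio, so confirm that your Voice Morpher version supplies this placeholder before relying on the template.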

OpenVoice V2

VOICE_MORPHER_OPENVOICE_COMMAND='python openvoice_infer.py --reference {reference} --text {text} --output {output}' \
uv run python main.py

IndexTTS

VOICE_MORPHER_INDEXTTS_COMMAND='python indextts_infer.py --reference {reference} --text {text} --output {output}' \
uv run python main.py

XTTS v2

VOICE_MORPHER_XTTS_COMMAND='python xtts_infer.py --speaker_wav {reference} --text {text} --output {output}' \
uv run python main.py

Project Layout

.
├── config/
│   └── models.toml
├── docs/
│   └── TECHNICAL_DESIGN.md
├── src/
│   ├── backends/
│   ├── core/
│   ├── services/
│   └── ui/
├── tests/
├── .env.example
├── LICENSE
├── README.md
└── README.zh.md

Development

uv run ruff check
uv run pytest

Roadmap

  • Add wrappers for common Seed-VC output layouts.
  • Add install helpers for external model repositories.
  • Add background jobs and progress reporting for long downloads and inference.
  • Add model-specific validation for required local files.
  • Add long-audio slicing, silence handling, and vocal separation.
  • Add ASR + TTS workflow for video translation.

Safety

Only use voices you own or are authorized to process. This project does not include consent verification, watermarking, or misuse detection. Add those controls before any public or commercial deployment.

License

This project is released under the MIT License.

Third-party model weights and model repositories have their own licenses. Check each model license before commercial use.

Contact

Maintainer: SK Studio

Email: developer@skstudio.cn

Technical Design

See docs/TECHNICAL_DESIGN.md.

For model selection details, see docs/OPEN_SOURCE_VOICE_CLONING.md.
