GPU service for voice cloning via Retrieval-based Voice Conversion (RVC), with CUDA and ROCm support.
Part of the CRAFT content studio.
RVC converts a voice recording so that it sounds like a target speaker, using a trained voice model. In CRAFT, it is an optional post-processor for TTS: the output of any TTS service (Edge, ElevenLabs, OpenAI) can be passed through RVC to apply a custom cloned voice.

| Endpoint | Method | Description |
|---|---|---|
| /models | GET | List available voice models |
| /convert | POST | Convert audio using a voice model |
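
The two endpoints above can be called from any HTTP client. The following is a minimal Python sketch, not the service's documented client. It assumes the service listens on `http://localhost:5050` (the port published by the `docker run` commands) and that `/models` returns JSON. The request format for `/convert` is not specified here, so the `model` query parameter and the raw-bytes body in `build_convert_request` are hypothetical placeholders; check the service code for the real field names.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the service is reachable on the port published by `docker run`.
BASE_URL = "http://localhost:5050"


def endpoint(path, base_url=BASE_URL):
    """Join the base URL and an endpoint path such as "/models"."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def list_models(base_url=BASE_URL, timeout=10):
    """GET /models and decode the body, which is assumed to be JSON."""
    with urllib.request.urlopen(endpoint("/models", base_url), timeout=timeout) as resp:
        return json.load(resp)


def build_convert_request(audio_bytes, model_name, base_url=BASE_URL):
    """Build (but do not send) a POST /convert request.

    Hypothetical wire format: raw audio bytes as the body and a `model`
    query parameter. The real service may expect something different.
    """
    query = urllib.parse.urlencode({"model": model_name})
    return urllib.request.Request(
        endpoint("/convert", base_url) + "?" + query,
        data=audio_bytes,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )
```

With the container running, `list_models()` fetches the model list, and a request built by `build_convert_request(...)` can be sent with `urllib.request.urlopen(req)` once the field names have been adjusted to match the service.
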

Build and run on an NVIDIA GPU (CUDA):

    docker build -t rvc .
    docker run --gpus all -p 5050:5050 -v ./models:/app/models rvc

Place .pth voice model files in the models/ directory. The first model is auto-loaded on startup.

Build and run on an AMD GPU (ROCm):

    docker build -f Dockerfile.rocm -t rvc-rocm .
    docker run --device /dev/kfd --device /dev/dri -p 5050:5050 -v ./models:/app/models rvc-rocm

Requires an NVIDIA GPU with CUDA or an AMD GPU with ROCm.