AI-powered voice conversion API deployed on RunPod Serverless using the LinaCodec model.
LinaCodec Voice Conversion is a serverless API that transforms speech from one voice to another. Provide URLs to a source audio file (the speech content) and a reference audio file (the target style/timbre), and receive the converted audio in MP3 or WAV format. It is built to scale on RunPod's GPU infrastructure, with optional S3 integration for persistent output storage.
- Cold-Start Optimization: Lazy model loading keeps container startup fast; the model is loaded on the first request and reused by later requests
- Flexible Audio Formats: Output as MP3 (192k bitrate) or WAV (PCM_16)
- Dual Output Modes: Return audio as base64-encoded data or via S3 presigned URL
- Session Isolation: UUID-based temporary file handling for concurrent processing
- Persistent Caching: Network volume stores model cache across pod restarts
- Optional S3 Integration: Upload outputs directly to S3 with presigned URLs (1-hour expiry)
- High-Quality Output: 48kHz sample rate audio conversion
- Graceful Error Handling: Comprehensive logging and fallback mechanisms
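The dual output modes and S3 presigned URLs described above can be sketched with boto3 as follows. This is a minimal sketch, not the project's `handler.py`: the helper names `make_s3_client` and `deliver_output` and the object-key scheme (the output file name) are hypothetical; only the 1-hour expiry and the response fields come from this README.

```python
import base64
import os
from typing import Any, Optional

PRESIGNED_URL_EXPIRY_SECONDS = 3600  # 1-hour expiry, as stated in the feature list


def make_s3_client(endpoint_url: Optional[str], access_key: str,
                   secret_key: str, region: str) -> Any:
    """Create a boto3 S3 client (hypothetical helper; boto3 is imported lazily)."""
    import boto3

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,  # None means the default AWS endpoint
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        region_name=region,
    )


def deliver_output(path: str, fmt: str, s3_client: Any = None,
                   bucket: Optional[str] = None) -> dict:
    """Return the S3 presigned-URL response if S3 is configured, else the base64 response."""
    if s3_client is not None and bucket:
        key = os.path.basename(path)  # hypothetical key scheme: just the file name
        s3_client.upload_file(path, bucket, key)
        url = s3_client.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=PRESIGNED_URL_EXPIRY_SECONDS,
        )
        return {"status": "success", "format": fmt, "audio_url": url}

    # No S3: embed the encoded audio directly in the JSON response.
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return {"status": "success", "format": fmt, "audio_base64": encoded}
```

Passing the client in, rather than creating it inside `deliver_output`, keeps the base64 path free of any boto3 dependency and makes the S3 path easy to exercise with a stand-in object.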
The system is built on RunPod Serverless with the following components:
- Handler (`handler.py`): Main RunPod serverless entry point with lazy-loaded model
- Bootstrap (`bootstrap.sh`): Idempotent container initialization script
- Configuration (`config.py`): Environment-based configuration management
- Persistent Volume: Stores model cache, outputs, and Python environment across restarts
- Hugging Face Hub: Model repository (requires `HF_TOKEN`)
- AWS S3: Optional output storage (requires `S3_*` environment variables)
- GitHub: LinaCodec source code repository
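The lazy-loaded model in the handler can be sketched as a small thread-safe wrapper. `LazyModel` and `load_fn` are hypothetical names: this README does not show how LinaCodec is actually instantiated, so the loader is passed in as a function rather than guessed.

```python
import threading
from typing import Any, Callable, Optional


class LazyModel:
    """Load an expensive object on first use and reuse it afterwards (thread-safe)."""

    def __init__(self, load_fn: Callable[[], Any]) -> None:
        self._load_fn = load_fn  # hypothetical loader, e.g. a function that builds the model
        self._model: Optional[Any] = None
        self._lock = threading.Lock()

    def get(self) -> Any:
        # Fast path: already loaded, no locking needed.
        if self._model is not None:
            return self._model
        # Slow path: only one thread runs the loader; others wait, then reuse its result.
        with self._lock:
            if self._model is None:
                self._model = self._load_fn()
        return self._model
```

A handler would create one module-level `LazyModel` and call `.get()` inside each job, so the first job on a worker pays the load cost and later jobs reuse the model.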
- Input: Client sends source audio URL, reference audio URL, and format preference
- Download: Audio files are downloaded to isolated temporary directories
- Model Load: LinaCodec model is loaded (cached after first request)
- Voice Conversion: `model.convert_voice()` processes the audio at 48kHz
- Encoding: Output encoded to MP3 (FFmpeg) or WAV (Soundfile)
- Storage: Saved to persistent volume and optionally uploaded to S3
- Response: Returns S3 URL or base64-encoded audio
- Cleanup: Temporary files are removed
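The request flow above can be sketched as a single function. This is an illustration under stated assumptions, not the project's handler: the step functions `download`, `convert`, `encode`, and `deliver` are injected placeholders standing in for the HTTP download, `model.convert_voice()`, FFmpeg/Soundfile encoding, and S3/base64 delivery; `run_job`, the validation messages, and the case-insensitive `format` handling are hypothetical. For brevity the output stays in the scratch directory, whereas the service described here also saves it to the persistent volume.

```python
import shutil
import tempfile
import uuid
from pathlib import Path
from typing import Any, Callable, Dict

SUPPORTED_FORMATS = {"mp3", "wav"}


def run_job(
    job_input: Dict[str, Any],
    download: Callable[[str, Path], None],
    convert: Callable[[Path, Path], Any],
    encode: Callable[[Any, Path, str], None],
    deliver: Callable[[Path, str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Sketch of the flow: validate, download, convert, encode, deliver, clean up."""
    source_url = job_input.get("audio_url_1")
    reference_url = job_input.get("audio_url_2")
    fmt = str(job_input.get("format", "mp3")).lower()  # mp3 is the documented default
    if not source_url or not reference_url:
        return {"error": "audio_url_1 and audio_url_2 are required"}
    if fmt not in SUPPORTED_FORMATS:
        return {"error": f"Unsupported format: {fmt}"}

    # Session isolation: every job gets its own UUID-named scratch directory.
    work_dir = Path(tempfile.gettempdir()) / f"linacodec-{uuid.uuid4().hex}"
    work_dir.mkdir(parents=True)
    try:
        source_path = work_dir / "source"
        reference_path = work_dir / "reference"
        download(source_url, source_path)
        download(reference_url, reference_path)
        audio = convert(source_path, reference_path)
        out_path = work_dir / f"output.{fmt}"
        encode(audio, out_path, fmt)
        return deliver(out_path, fmt)  # evaluated before the finally block runs
    except Exception as exc:
        # Surface failures using the documented {"error": ...} response shape.
        return {"error": str(exc)}
    finally:
        # Cleanup: remove the scratch directory whatever happened above.
        shutil.rmtree(work_dir, ignore_errors=True)
```

Because each job writes only inside its own UUID directory, concurrent jobs on one worker cannot overwrite each other's files, and the `finally` block guarantees cleanup even when a step raises.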
- Docker installed
- RunPod account with GPU support
- Hugging Face API token (generated from your Hugging Face account settings)
- (Optional) S3 credentials for cloud storage
```bash
# Clone the repository
git clone https://github.com/your-username/LinaCodec.git
cd LinaCodec

# Build the Docker image
docker build -t linacodec .

# Run with GPU support
docker run -d --gpus all \
  -e HF_TOKEN=your_hf_token_here \
  linacodec
```

```bash
# Build and push to RunPod
docker build -t your-registry/linacodec:latest .
docker push your-registry/linacodec:latest

# Deploy via RunPod Console or CLI
# Set environment variables in RunPod template:
# - HF_TOKEN (required)
# - S3_ENDPOINT_URL, S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY, S3_BUCKET_NAME, S3_REGION (optional)
```

Send a POST request to your RunPod serverless endpoint:
```json
{
  "input": {
    "audio_url_1": "https://example.com/source.mp3",
    "audio_url_2": "https://example.com/reference.mp3",
    "format": "mp3"
  }
}
```

| Parameter | Type | Required | Description |
|---|---|---|---|
| `audio_url_1` | string | Yes | URL of the source audio (provides the speech content) |
| `audio_url_2` | string | Yes | URL of the reference audio (provides the target voice style/timbre) |
| `format` | string | No | Output format: `"mp3"` (default) or `"wav"` |
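A client-side call might look like the following sketch, which uses only the Python standard library. It assumes RunPod's synchronous `/runsync` route at `https://api.runpod.ai/v2/<endpoint_id>/runsync` with a Bearer API key, and that the handler's return value arrives in the response's `output` field; `build_request`, `save_output`, `request_conversion`, and the 600-second timeout are hypothetical. `save_output` handles the three response shapes documented in this README (S3 URL, base64 audio, error).

```python
import base64
import json
import urllib.request
from typing import Any, Dict

RUNPOD_API_BASE = "https://api.runpod.ai/v2"  # RunPod serverless API base URL


def build_request(endpoint_id: str, api_key: str, source_url: str,
                  reference_url: str, fmt: str = "mp3") -> urllib.request.Request:
    """Build a synchronous (/runsync) job request for the voice conversion endpoint."""
    payload = {"input": {"audio_url_1": source_url, "audio_url_2": reference_url, "format": fmt}}
    return urllib.request.Request(
        f"{RUNPOD_API_BASE}/{endpoint_id}/runsync",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def save_output(output: Dict[str, Any], dest_path: str) -> str:
    """Handle the documented handler responses; return the S3 URL or the saved file path."""
    if "error" in output:
        raise RuntimeError(output["error"])
    if "audio_url" in output:
        return output["audio_url"]  # S3 mode: fetch it before the presigned URL expires
    with open(dest_path, "wb") as f:
        f.write(base64.b64decode(output["audio_base64"]))
    return dest_path


def request_conversion(endpoint_id: str, api_key: str, source_url: str,
                       reference_url: str, dest_path: str, fmt: str = "mp3") -> str:
    req = build_request(endpoint_id, api_key, source_url, reference_url, fmt)
    with urllib.request.urlopen(req, timeout=600) as resp:  # hypothetical client timeout
        body = json.loads(resp.read())
    output = body.get("output")
    if output is None:  # e.g. the job failed or is still IN_PROGRESS
        raise RuntimeError(f"No output: status={body.get('status')}, error={body.get('error')}")
    return save_output(output, dest_path)
```

For long inputs `/runsync` may return before the job finishes; in that case submitting to the asynchronous `/run` route and polling the job status is the safer pattern.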
Success with S3 enabled:

```json
{
  "status": "success",
  "format": "mp3",
  "audio_url": "https://s3-bucket.s3.region.amazonaws.com/output.mp3"
}
```

Success without S3 (base64):

```json
{
  "status": "success",
  "format": "wav",
  "audio_base64": "UklGRiQAAABXQVZFZm10IBAAAAABAAEARKwAAIhYAQACABAAZGF0YQAAAAA..."
}
```

Error:

```json
{
  "error": "Failed to download source audio from https://example.com/source.mp3"
}
```

| Variable | Required | Default | Description |
|---|---|---|---|
| `HF_TOKEN` | Yes | - | Hugging Face API token for model access |
| `S3_ENDPOINT_URL` | No | - | Custom S3 endpoint URL |
| `S3_ACCESS_KEY_ID` | No | - | S3 access key ID |
| `S3_SECRET_ACCESS_KEY` | No | - | S3 secret access key |
| `S3_BUCKET_NAME` | No | - | S3 bucket name for output storage |
| `S3_REGION` | No | `us-east-1` | S3 region |
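A `config.py`-style loader for the variables in this table might look like the sketch below. The `Settings` dataclass, `load_settings`, and the rule for when S3 counts as enabled are assumptions; only the variable names, the required `HF_TOKEN`, and the `us-east-1` default come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    hf_token: str
    s3_endpoint_url: Optional[str]
    s3_access_key_id: Optional[str]
    s3_secret_access_key: Optional[str]
    s3_bucket_name: Optional[str]
    s3_region: str

    @property
    def s3_enabled(self) -> bool:
        # Hypothetical rule: S3 mode needs a bucket and both credentials;
        # the endpoint URL stays optional (None means the default AWS endpoint).
        return bool(self.s3_bucket_name and self.s3_access_key_id and self.s3_secret_access_key)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from environment variables, failing fast if HF_TOKEN is missing."""
    hf_token = env.get("HF_TOKEN")
    if not hf_token:
        raise RuntimeError("HF_TOKEN is required")
    return Settings(
        hf_token=hf_token,
        s3_endpoint_url=env.get("S3_ENDPOINT_URL") or None,
        s3_access_key_id=env.get("S3_ACCESS_KEY_ID") or None,
        s3_secret_access_key=env.get("S3_SECRET_ACCESS_KEY") or None,
        s3_bucket_name=env.get("S3_BUCKET_NAME") or None,
        s3_region=env.get("S3_REGION") or "us-east-1",
    )
```

Accepting the environment as a parameter lets tests pass a plain dictionary instead of mutating `os.environ`.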
The persistent volume at `/runpod-volume/LinaCodecVC/` contains:

```text
/runpod-volume/LinaCodecVC/
├── output/   # Generated audio files
├── cache/    # HuggingFace model cache (HF_HOME)
├── src/      # LinaCodec source code
└── venv/     # Python virtual environment
```
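Pointing `HF_HOME` at the volume's `cache/` directory is what lets the model download survive pod restarts. A minimal sketch, assuming the layout above; `volume_paths` and `configure_hf_cache` are hypothetical helpers. The variable should generally be set before `huggingface_hub` (or libraries built on it) is imported, because the library reads its cache location when it loads.

```python
import os
from pathlib import Path
from typing import Dict, MutableMapping

VOLUME_ROOT = Path("/runpod-volume/LinaCodecVC")


def volume_paths(root: Path = VOLUME_ROOT) -> Dict[str, Path]:
    """Return the directories of the persistent-volume layout (output, cache, src, venv)."""
    return {name: root / name for name in ("output", "cache", "src", "venv")}


def configure_hf_cache(root: Path = VOLUME_ROOT,
                       env: MutableMapping[str, str] = os.environ) -> Path:
    """Point HF_HOME at the volume cache; call before importing Hugging Face libraries."""
    cache_dir = volume_paths(root)["cache"]
    cache_dir.mkdir(parents=True, exist_ok=True)
    env.setdefault("HF_HOME", str(cache_dir))  # respect an HF_HOME that is already set
    return Path(env["HF_HOME"])
```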
The `bootstrap.sh` script runs on container start:
- Creates the directory structure on the network volume
- Checks for a first-run flag file
- First run only:
  - Creates a Python virtual environment
  - Installs PyTorch 2.9.1 with CUDA 12.8 support
  - Installs Flash Attention v2.8.3
  - Clones LinaCodec from GitHub
  - Installs the LinaCodec package in editable mode
  - Installs Python dependencies
  - Creates the first-run flag
- Subsequent runs: Activates the existing venv
- Starts the handler
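`bootstrap.sh` itself is a shell script; the check-and-mark logic that makes it idempotent is shown here in Python purely for illustration. `run_first_time_setup` and the flag file name used in practice are hypothetical, and writing the flag only after every step succeeds is a design choice of this sketch (this README does not say when the real script writes it).

```python
from pathlib import Path
from typing import Callable, Iterable


def run_first_time_setup(flag_file: Path, steps: Iterable[Callable[[], None]]) -> bool:
    """Run every setup step once; return True if setup ran, False if the flag already existed."""
    if flag_file.exists():
        return False  # subsequent runs: reuse the existing venv and skip installation
    for step in steps:
        step()  # an exception here leaves no flag, so the next start retries setup
    flag_file.parent.mkdir(parents=True, exist_ok=True)
    flag_file.touch()  # written only after every step succeeded
    return True
```

Writing the flag last means a container killed halfway through installing PyTorch or Flash Attention will redo the setup on its next start instead of activating a half-built environment.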
```text
# Core ML framework (installed in bootstrap)
torch==2.9.1
torchvision==0.24.1
torchaudio==2.9.1

# Flash Attention (installed in bootstrap)
flash_attn==2.8.3

# RunPod serverless
runpod>=1.6.0

# Audio processing
librosa
soundfile
numpy>=1.26.0

# HTTP/storage
boto3>=1.26.0
requests

# Model acceleration
hf_transfer
```

- Sample Rate: 48kHz output (LinaCodec native rate)
- MP3 Encoding: FFmpeg with 192k bitrate, constant quality
- WAV Encoding: Soundfile with PCM_16 subtype
- Input Handling: Automatic float32 to int16 conversion with clamping
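The float-to-PCM step and the MP3 encoder invocation could look like the sketch below. The scaling factor 32767, mono output (`channels=1`), and the exact FFmpeg arguments are assumptions; this README only states 48 kHz output, a 192k MP3 bitrate via FFmpeg, PCM_16 WAV via Soundfile, and clamped float32-to-int16 conversion. `float_to_int16` and `ffmpeg_mp3_command` are hypothetical helpers.

```python
import numpy as np

SAMPLE_RATE = 48_000  # LinaCodec output rate per this README


def float_to_int16(audio: np.ndarray) -> np.ndarray:
    """Clamp float audio to [-1.0, 1.0] and scale it to 16-bit PCM."""
    clipped = np.clip(np.asarray(audio, dtype=np.float32), -1.0, 1.0)
    return (clipped * 32767.0).astype(np.int16)  # truncates toward zero


def ffmpeg_mp3_command(out_path: str, sample_rate: int = SAMPLE_RATE, channels: int = 1) -> list:
    """FFmpeg arguments that read raw little-endian 16-bit PCM on stdin and write a 192k MP3."""
    return [
        "ffmpeg", "-y",
        "-f", "s16le", "-ar", str(sample_rate), "-ac", str(channels),
        "-i", "pipe:0",
        "-codec:a", "libmp3lame", "-b:a", "192k",
        out_path,
    ]
```

With `pcm = float_to_int16(audio)`, `subprocess.run(ffmpeg_mp3_command("out.mp3"), input=pcm.astype("<i2").tobytes(), check=True)` pipes the samples to FFmpeg, and `soundfile.write("out.wav", pcm, 48000, subtype="PCM_16")` produces the WAV variant. Clamping first prevents out-of-range samples from wrapping around when cast to `int16`.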
Issue: "Failed to load model at startup"
- Solution: Ensure `HF_TOKEN` is set and valid. Check network connectivity to Hugging Face.
Issue: "FFmpeg encoding failed"
- Solution: FFmpeg is installed by bootstrap.sh. Verify the installation (for example with `ffmpeg -version`) or check the input audio format.
Issue: Slow first request
- Solution: Expected behavior due to model download (~2GB). Subsequent requests are faster due to caching.
Issue: S3 upload fails, no base64 fallback
- Solution: Check S3 credentials and endpoint. Ensure bucket exists and credentials have write permissions.
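Several of the issues above can be caught before the first request with a quick preflight check. A minimal sketch; `preflight` is a hypothetical helper, and treating a partial set of S3 variables as a misconfiguration is an assumption of this sketch rather than documented behavior.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def preflight(env: Mapping[str, str] = os.environ,
              which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return human-readable configuration problems (an empty list means no problems found)."""
    problems = []
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set; the model download from Hugging Face will fail")
    if which("ffmpeg") is None:
        problems.append("ffmpeg is not on PATH; MP3 encoding will fail")
    # Hypothetical rule: these three S3 variables should be set together or not at all.
    s3_vars = ("S3_ACCESS_KEY_ID", "S3_SECRET_ACCESS_KEY", "S3_BUCKET_NAME")
    present = [name for name in s3_vars if env.get(name)]
    if present and len(present) < len(s3_vars):
        missing = [name for name in s3_vars if name not in present]
        problems.append("partial S3 configuration, missing: " + ", ".join(missing))
    return problems
```

Running this once at container start and logging any returned messages turns a later, less specific failure into an explicit startup warning.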
View logs for debugging:

```bash
# RunPod logs via console
runpodctl logs <pod_id>

# Or check container logs
docker logs <container_id>
```

- Runtime: Python 3.10+
- ML Framework: PyTorch 2.9.1 with CUDA 12.8
- Optimization: Flash Attention v2.8.3
- Model: LinaCodec by ysharma3501
- Platform: RunPod Serverless
- Container Base: runpod/base:1.0.3-cuda1281-ubuntu2404
- Audio Tools: FFmpeg, SoX, Librosa, Soundfile
- Storage: AWS S3 (optional)
This project is licensed under the MIT License - see LICENSE for details.
- LinaCodec - Voice conversion model
- RunPod - Serverless GPU platform
- Hugging Face - Model hosting and distribution