LinaCodec Voice Conversion


AI-powered voice conversion API deployed on RunPod Serverless using the LinaCodec model.

LinaCodec Voice Conversion is a serverless API that transforms speech from one voice to another. Simply provide a source audio file (content) and a reference audio file (style/timbre), and receive a converted audio file in MP3 or WAV format. Built for scalability on RunPod's GPU infrastructure with optional S3 integration for persistent storage.

Features

  • Zero-Cold-Start Optimization: Lazy model loading keeps container startup fast; the model is loaded on the first request and cached for subsequent ones
  • Flexible Audio Formats: Output as MP3 (192k bitrate) or WAV (PCM_16)
  • Dual Output Modes: Return audio as base64-encoded data or via S3 presigned URL
  • Session Isolation: UUID-based temporary file handling for concurrent processing
  • Persistent Caching: Network volume stores model cache across pod restarts
  • Optional S3 Integration: Upload outputs directly to S3 with presigned URLs (1-hour expiry)
  • High-Quality Output: 48kHz sample rate audio conversion
  • Graceful Error Handling: Comprehensive logging and fallback mechanisms

Architecture

Architecture Diagram

The system is built on RunPod Serverless with the following components:

  • Handler (handler.py): Main RunPod serverless entry point with a lazy-loaded model (see the sketch after this list)
  • Bootstrap (bootstrap.sh): Idempotent container initialization script
  • Configuration (config.py): Environment-based configuration management
  • Persistent Volume: Stores model cache, outputs, and Python environment across restarts
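
A minimal sketch of the lazy-loading pattern used by the handler is shown below. It is illustrative only: load_linacodec is a hypothetical placeholder, not the actual loader in handler.py.

# Lazy-loading sketch (hypothetical loader name; the real handler.py may differ)
import runpod

MODEL = None  # loaded on the first request, then reused by warm workers


def load_linacodec():
    # Placeholder: the real loader comes from the LinaCodec package,
    # whose exact import and constructor are not shown in this README.
    raise NotImplementedError


def get_model():
    """Load the LinaCodec model once and cache it in the worker process."""
    global MODEL
    if MODEL is None:
        MODEL = load_linacodec()
    return MODEL


def handler(job):
    model = get_model()  # first call pays the load cost; later calls reuse the cache
    # ... download inputs, run voice conversion, encode and return output ...
    return {"status": "success"}


runpod.serverless.start({"handler": handler})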

External Integrations

  • Hugging Face Hub: Model repository (requires HF_TOKEN)
  • AWS S3: Optional output storage (requires S3_* environment variables)
  • GitHub: LinaCodec source code repository

Data Flow

Data Flow Diagram

  1. Input: Client sends source audio URL, reference audio URL, and format preference
  2. Download: Audio files are downloaded to isolated temporary directories
  3. Model Load: LinaCodec model is loaded (cached after first request)
  4. Voice Conversion: model.convert_voice() processes the audio at 48kHz
  5. Encoding: Output is encoded to MP3 (FFmpeg) or WAV (Soundfile); see the sketch after this list
  6. Storage: Saved to persistent volume and optionally uploaded to S3
  7. Response: Returns S3 URL or base64-encoded audio
  8. Cleanup: Temporary files are removed
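
As a rough illustration of the encoding step (5), the converted audio can be written as WAV with soundfile and re-encoded to MP3 with FFmpeg. This is a sketch under assumed variable and path names, not the exact handler code:

# Encoding sketch (hypothetical helper; the real handler.py may differ)
import subprocess
import soundfile as sf

def encode_output(audio, sample_rate=48000, fmt="mp3",
                  out_path="/runpod-volume/LinaCodecVC/output/converted"):
    """Write converted audio as WAV (PCM_16) or MP3 (FFmpeg, 192k)."""
    wav_path = out_path + ".wav"
    sf.write(wav_path, audio, sample_rate, subtype="PCM_16")
    if fmt == "wav":
        return wav_path
    mp3_path = out_path + ".mp3"
    # Re-encode the intermediate WAV to MP3 at 192k using FFmpeg
    subprocess.run(["ffmpeg", "-y", "-i", wav_path, "-b:a", "192k", mp3_path],
                   check=True)
    return mp3_path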

Quick Start

Prerequisites

  • Docker installed
  • RunPod account with GPU support
  • Hugging Face API token (create one at https://huggingface.co/settings/tokens)
  • (Optional) S3 credentials for cloud storage

Local Development with Docker

# Clone the repository
git clone https://github.com/your-username/LinaCodec-Serverless.git
cd LinaCodec-Serverless

# Build the Docker image
docker build -t linacodec .

# Run with GPU support
docker run -d --gpus all \
  -e HF_TOKEN=your_hf_token_here \
  linacodec

Deployment to RunPod

# Build and push to RunPod
docker build -t your-registry/linacodec:latest .
docker push your-registry/linacodec:latest

# Deploy via RunPod Console or CLI
# Set environment variables in RunPod template:
# - HF_TOKEN (required)
# - S3_ENDPOINT_URL, S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY, S3_BUCKET_NAME, S3_REGION (optional)

Usage

API Request Format

Send a POST request to your RunPod serverless endpoint:

{
  "input": {
    "audio_url_1": "https://example.com/source.mp3",
    "audio_url_2": "https://example.com/reference.mp3",
    "format": "mp3"
  }
}
Parameter    Type    Required  Description
audio_url_1  string  Yes       URL of source audio (content voice)
audio_url_2  string  Yes       URL of reference audio (style/timbre voice)
format       string  No        Output format: "mp3" (default) or "wav"
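
For example, the endpoint can be called synchronously through the RunPod API. The endpoint ID and API key below are placeholders:

# Example client call (placeholder endpoint ID and API key)
import requests

ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-runpod-api-key"

payload = {
    "input": {
        "audio_url_1": "https://example.com/source.mp3",
        "audio_url_2": "https://example.com/reference.mp3",
        "format": "mp3",
    }
}

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=600,
)
print(resp.json())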

API Response Format

Success with S3 enabled:

{
  "status": "success",
  "format": "mp3",
  "audio_url": "https://s3-bucket.s3.region.amazonaws.com/output.mp3"
}

Success without S3 (base64):

{
  "status": "success",
  "format": "wav",
  "audio_base64": "UklGRiQAAABXQVZFZm10IBAAAAABAAEARKwAAIhYAQACABAAZGF0YQAAAAA..."
}

Error:

{
  "error": "Failed to download source audio from https://example.com/source.mp3"
}
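
When S3 is not configured, the base64 payload can be decoded back into a local audio file. A short example, assuming resp from the request snippet above (note that /runsync wraps the handler result in an "output" field):

# Decode a base64 response into a local audio file
import base64

data = resp.json()
result = data.get("output", data)  # unwrap the handler result if present

if "audio_base64" in result:
    out_name = "converted." + result.get("format", "wav")
    with open(out_name, "wb") as f:
        f.write(base64.b64decode(result["audio_base64"]))
elif "audio_url" in result:
    print("Download from:", result["audio_url"])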

Configuration

Environment Variables

Variable              Required  Default    Description
HF_TOKEN              Yes       -          Hugging Face API token for model access
S3_ENDPOINT_URL       No        -          Custom S3 endpoint URL
S3_ACCESS_KEY_ID      No        -          S3 access key ID
S3_SECRET_ACCESS_KEY  No        -          S3 secret access key
S3_BUCKET_NAME        No        -          S3 bucket name for output storage
S3_REGION             No        us-east-1  S3 region
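
A minimal sketch of how config.py might read these variables (illustrative; the actual module may differ):

# Environment-based configuration sketch (names match the table above)
import os

HF_TOKEN = os.environ["HF_TOKEN"]                    # required
S3_ENDPOINT_URL = os.environ.get("S3_ENDPOINT_URL")  # optional
S3_ACCESS_KEY_ID = os.environ.get("S3_ACCESS_KEY_ID")
S3_SECRET_ACCESS_KEY = os.environ.get("S3_SECRET_ACCESS_KEY")
S3_BUCKET_NAME = os.environ.get("S3_BUCKET_NAME")
S3_REGION = os.environ.get("S3_REGION", "us-east-1")

# S3 output upload is enabled only when all required S3 settings are present
S3_ENABLED = all([S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY, S3_BUCKET_NAME])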

RunPod Volume Structure

The persistent volume at /runpod-volume/LinaCodecVC/ contains:

/runpod-volume/LinaCodecVC/
├── output/    # Generated audio files
├── cache/     # HuggingFace model cache (HF_HOME)
├── src/       # LinaCodec source code
└── venv/      # Python virtual environment

Development

Bootstrap Process

The bootstrap.sh script runs on container start:

  1. Creates directory structure on network volume
  2. Checks for first-run flag file
  3. First run only:
    • Creates Python virtual environment
    • Installs PyTorch 2.9.1 with CUDA 12.8 support
    • Installs Flash Attention v2.8.3
    • Clones LinaCodec from GitHub
    • Installs LinaCodec package in editable mode
    • Installs Python dependencies
    • Creates first-run flag
  4. Subsequent runs: Activates existing venv
  5. Starts handler

Dependencies

# Core ML framework (installed in bootstrap)
torch==2.9.1
torchvision==0.24.1
torchaudio==2.9.1

# Flash Attention (installed in bootstrap)
flash_attn==2.8.3

# RunPod serverless
runpod>=1.6.0

# Audio processing
librosa
soundfile
numpy>=1.26.0

# HTTP/storage
boto3>=1.26.0
requests

# Model acceleration
hf_transfer

Audio Processing Details

  • Sample Rate: 48kHz output (LinaCodec native rate)
  • MP3 Encoding: FFmpeg with 192k bitrate, constant quality
  • WAV Encoding: Soundfile with PCM_16 subtype
  • Input Handling: Automatic float32 to int16 conversion with clamping
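
The float32 to int16 conversion mentioned above amounts to clamping samples to [-1.0, 1.0] and scaling to the 16-bit PCM range, roughly:

# Float32 -> int16 conversion with clamping (illustrative)
import numpy as np

def float32_to_int16(audio: np.ndarray) -> np.ndarray:
    """Clamp float audio to [-1, 1] and scale to 16-bit PCM range."""
    clipped = np.clip(audio, -1.0, 1.0)
    return (clipped * 32767.0).astype(np.int16)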

Troubleshooting

Common Issues

Issue: "Failed to load model at startup"

  • Solution: Ensure HF_TOKEN is set and valid. Check network connectivity to Hugging Face.

Issue: "FFmpeg encoding failed"

  • Solution: FFmpeg is installed in bootstrap.sh. Verify installation or check audio input format.

Issue: Slow first request

  • Solution: Expected behavior due to model download (~2GB). Subsequent requests are faster due to caching.

Issue: S3 upload fails, no base64 fallback

  • Solution: Check S3 credentials and endpoint. Ensure bucket exists and credentials have write permissions.
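
To isolate credential or bucket problems outside the handler, a quick sanity check along these lines (using the same S3_* environment variables from the Configuration section) can help; the test file name is arbitrary:

# Quick S3 sanity check (assumes a local test.mp3 exists)
import os
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url=os.environ.get("S3_ENDPOINT_URL"),
    aws_access_key_id=os.environ["S3_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["S3_SECRET_ACCESS_KEY"],
    region_name=os.environ.get("S3_REGION", "us-east-1"),
)
bucket = os.environ["S3_BUCKET_NAME"]

s3.upload_file("test.mp3", bucket, "test.mp3")  # fails fast on bad credentials or permissions
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": bucket, "Key": "test.mp3"},
    ExpiresIn=3600,  # 1-hour expiry, matching the handler's presigned URLs
)
print(url)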

Logs

View logs for debugging:

# RunPod logs via the runpodctl CLI
runpodctl logs <pod_id>

# Or check container logs
docker logs <container_id>

Technology Stack

  • Runtime: Python 3.10+
  • ML Framework: PyTorch 2.9.1 with CUDA 12.8
  • Optimization: Flash Attention v2.8.3
  • Model: LinaCodec by ysharma3501
  • Platform: RunPod Serverless
  • Container Base: runpod/base:1.0.3-cuda1281-ubuntu2404
  • Audio Tools: FFmpeg, SoX, Librosa, Soundfile
  • Storage: AWS S3 (optional)

License

This project is licensed under the MIT License - see LICENSE for details.

Acknowledgments

  • LinaCodec by ysharma3501
  • RunPod for the serverless GPU platform
