GPU-enabled LiveKit Agent for streaming multi-speaker transcription using NVIDIA NeMo's multitalker ASR and diarization models.

Features:

- Real-time streaming multi-speaker transcription
- Speaker diarization (up to 4 speakers)
- Multiple audio tracks per room support
- Multiple concurrent rooms via LiveKit AgentServer scaling
- Model prewarming for efficient resource usage
- Docker with NVIDIA GPU support
Requirements:

- NVIDIA GPU with CUDA 12.1+ support
- LiveKit server (cloud or self-hosted)
Build the Docker image:

```bash
docker build -t multitalker-livekit-agent:latest .
```

Verify the container can access the GPU and load the models:

```bash
docker run --gpus all --env-file .env multitalker-livekit-agent:latest \
  python -m agent.gpu_smoke_test
```

Run the agent:

```bash
docker run --gpus all --env-file .env multitalker-livekit-agent:latest
```

Copy `.env.example` to `.env` and fill in your LiveKit credentials:

```bash
cp .env.example .env
```

Environment variables (a sample `.env` follows the table):
| Variable | Description | Default |
|---|---|---|
| `LIVEKIT_URL` | LiveKit server URL | (required) |
| `LIVEKIT_API_KEY` | LiveKit API key | (required) |
| `LIVEKIT_API_SECRET` | LiveKit API secret | (required) |
| `INPUT_TRACK_NAME` | Name of audio track to transcribe | `mix` |
| `TRANSCRIPT_TOPIC` | Data topic for transcript messages | `multitalker_transcript` |
| `ROOM_PREFIX` | Only join rooms with this prefix (empty = all rooms) | (empty) |
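For reference, a filled-in `.env` might look like this (placeholder values only):

```bash
LIVEKIT_URL=wss://your-livekit-server.com
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret
INPUT_TRACK_NAME=mix
TRANSCRIPT_TOPIC=multitalker_transcript
ROOM_PREFIX=transcribe-  # optional; empty = join all rooms
```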
Test the NeMo pipeline with a local audio file:

```bash
docker run --gpus all -v $(pwd)/tests/data:/audio \
  multitalker-livekit-agent:latest \
  python -m agent.multitalker_pipeline --audio /audio/sample.wav
```

Run the unit tests:

```bash
docker run --gpus all multitalker-livekit-agent:latest \
  pytest -q
```

With the agent running and connected to LiveKit:

```bash
python scripts/multi_room_test.py
```

The agent uses LiveKit's standard AgentServer scaling model:
- Worker Processes: The AgentServer spawns worker processes
- Model Prewarming: Each worker loads NeMo models once via `prewarm()`
- Multiple Jobs: Each worker can handle multiple concurrent jobs (rooms)
- Horizontal Scaling: Deploy multiple agent containers for higher capacity
- Shared across jobs (in `proc.userdata`, sketched below):
  - `MultitalkerTranscriptionConfig`
  - `SortformerEncLabelModel` (diarization)
  - `ASRModel` (multitalker ASR)
- Per-job state (local to each room):
  - Sessions per audio track
  - Asyncio tasks per track
  - Frame count metrics
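A minimal sketch of this split, using the `livekit-agents` Python API; the checkpoint names and the exact model classes loaded here are illustrative assumptions, not this repository's verified `agent.py`:

```python
from livekit.agents import JobContext, JobProcess, WorkerOptions, cli

def prewarm(proc: JobProcess):
    # Runs once per worker process: load the heavy NeMo models into
    # proc.userdata so every job handled by this process reuses them.
    import nemo.collections.asr as nemo_asr
    proc.userdata["diar_model"] = nemo_asr.models.SortformerEncLabelModel.from_pretrained(
        "nvidia/diar_sortformer_4spk-v1"  # illustrative checkpoint name
    )
    proc.userdata["asr_model"] = nemo_asr.models.ASRModel.from_pretrained(
        "nvidia/parakeet-tdt-1.1b"  # illustrative checkpoint name
    )

async def entrypoint(ctx: JobContext):
    # Runs once per job (room): per-room state is local here, while the
    # models come from the shared worker-level userdata.
    diar_model = ctx.proc.userdata["diar_model"]
    asr_model = ctx.proc.userdata["asr_model"]
    await ctx.connect()
    # ... create per-track sessions and asyncio tasks here ...

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm))
```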
- One job (room) can handle multiple audio tracks concurrently
- Each track gets its own `MultitalkerStreamingSession` (see the sketch below)
- Transcripts include session/track identification
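A sketch of the per-track fan-out with the LiveKit `rtc` API; `MultitalkerStreamingSession` and its `push_audio` method are assumed interfaces here, not code taken from `agent/multitalker_pipeline.py`:

```python
import asyncio
from livekit import rtc

def register_track_handler(room: rtc.Room, input_track_name: str):
    @room.on("track_subscribed")
    def on_track_subscribed(track: rtc.Track,
                            publication: rtc.RemoteTrackPublication,
                            participant: rtc.RemoteParticipant):
        # One asyncio task per matching audio track; tracks run concurrently.
        if track.kind == rtc.TrackKind.KIND_AUDIO and publication.name == input_track_name:
            session_id = f"{room.name}_{publication.sid}"  # room-name_track-sid
            asyncio.create_task(transcribe_track(session_id, track))

async def transcribe_track(session_id: str, track: rtc.Track):
    session = MultitalkerStreamingSession(session_id)  # assumed constructor
    async for event in rtc.AudioStream(track):  # yields AudioFrameEvent objects
        session.push_audio(event.frame)  # assumed method on the session
```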
Transcripts are published as JSON via LiveKit data packets:
```json
{
  "type": "multitalker_transcript",
  "segments": [
    {
      "session_id": "room-name_track-sid",
      "speaker": "spk_0",
      "start_time": 1.5,
      "end_time": 3.2,
      "text": "Hello, how are you?",
      "is_final": true
    }
  ]
}
```
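On the agent side, a payload of this shape could be published with the Python SDK's `publish_data`; a minimal sketch, assuming the segment dicts are already assembled by the pipeline:

```python
import json
from livekit import rtc

async def publish_transcript(room: rtc.Room, segments: list[dict]) -> None:
    payload = json.dumps({
        "type": "multitalker_transcript",
        "segments": segments,
    }).encode("utf-8")
    # Reliable delivery on the configured data topic (TRANSCRIPT_TOPIC).
    await room.local_participant.publish_data(
        payload, reliable=True, topic="multitalker_transcript"
    )
```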
For local development without Docker:

```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Install NeMo (requires CUDA); quote the URL so the shell
# doesn't interpret the '#' or '[asr]' extras
pip install "git+https://github.com/NVIDIA/NeMo.git@main#egg=nemo_toolkit[asr]"
```
Project layout:

```
.
├── agent/
│   ├── __init__.py
│   ├── agent.py                          # LiveKit agent entrypoint
│   ├── gpu_smoke_test.py                 # GPU/model verification script
│   ├── multitalker_pipeline.py           # NeMo streaming wrapper
│   └── multitalker_transcript_config.py  # Configuration dataclass
├── scripts/
│   └── multi_room_test.py                # Integration test harness
├── tests/
│   ├── __init__.py
│   └── test_multitalker_pipeline.py      # Unit tests
├── Dockerfile
├── requirements.txt
├── .env.example
└── README.md
```
Push the image to your registry:

```bash
docker tag multitalker-livekit-agent:latest your-registry.com/multitalker-livekit-agent:latest
docker push your-registry.com/multitalker-livekit-agent:latest
```

Create a Kubernetes secret with your LiveKit credentials:

```bash
kubectl create secret generic livekit-agent-secrets \
  --from-literal=LIVEKIT_URL=wss://your-livekit-server.com \
  --from-literal=LIVEKIT_API_KEY=your-api-key \
  --from-literal=LIVEKIT_API_SECRET=your-api-secret
```
```yaml
# multitalker-agent-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: multitalker-livekit-agent
  labels:
    app: multitalker-agent
spec:
  replicas: 1
  selector:
    matchLabels:
      app: multitalker-agent
  template:
    metadata:
      labels:
        app: multitalker-agent
    spec:
      containers:
        - name: agent
          image: your-registry.com/multitalker-livekit-agent:latest
          resources:
            limits:
              nvidia.com/gpu: 1
            requests:
              nvidia.com/gpu: 1
              memory: "8Gi"
              cpu: "2"
          envFrom:
            - secretRef:
                name: livekit-agent-secrets
          env:
            - name: INPUT_TRACK_NAME
              value: "mix"
            - name: TRANSCRIPT_TOPIC
              value: "multitalker_transcript"
            - name: ROOM_PREFIX
              value: "transcribe-" # Only join rooms starting with "transcribe-"
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"
      nodeSelector:
        accelerator: nvidia-gpu
```

Apply with:
```bash
kubectl apply -f multitalker-agent-deployment.yaml
```

The agent registers with LiveKit using the Agents SDK. When a room is created, LiveKit automatically dispatches an available agent worker to join (rooms can be filtered with `ROOM_PREFIX`; see the sketch after the list). The agent:

- Subscribes to audio tracks matching `INPUT_TRACK_NAME`
- Runs real-time ASR with speaker diarization
- Publishes transcripts via the LiveKit data channel on `TRANSCRIPT_TOPIC`
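One plausible way to honor `ROOM_PREFIX` at dispatch time is the Agents SDK's `request_fnc` hook; this is a sketch under that assumption, not the project's confirmed implementation:

```python
import os
from livekit.agents import JobRequest

ROOM_PREFIX = os.getenv("ROOM_PREFIX", "")

async def request_fnc(req: JobRequest):
    # Accept only rooms whose name starts with ROOM_PREFIX
    # (an empty prefix matches every room).
    if req.room.name.startswith(ROOM_PREFIX):
        await req.accept()
    else:
        await req.reject()

# Pass request_fnc=request_fnc into WorkerOptions(...) alongside
# entrypoint_fnc/prewarm_fnc when starting the worker.
```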
JavaScript/TypeScript:
```javascript
import { Room } from 'livekit-client';

const room = new Room();

room.on('dataReceived', (payload, participant, kind, topic) => {
  if (topic === 'multitalker_transcript') {
    const transcript = JSON.parse(new TextDecoder().decode(payload));
    transcript.segments.forEach((segment) => {
      console.log(`[${segment.speaker}]: ${segment.text}`);
    });
  }
});

await room.connect(LIVEKIT_URL, accessToken);
```

Python:
```python
import json
from livekit import rtc

room = rtc.Room()

@room.on("data_received")
def on_data_received(packet: rtc.DataPacket):
    # Event callbacks are synchronous in the livekit Python SDK.
    if packet.topic == "multitalker_transcript":
        transcript = json.loads(packet.data.decode("utf-8"))
        for segment in transcript.get("segments", []):
            print(f"[{segment['speaker']}]: {segment['text']}")

await room.connect(livekit_url, token)
```

Ensure your audio track name matches `INPUT_TRACK_NAME`:
```javascript
import { createLocalAudioTrack, Track } from 'livekit-client';

const track = await createLocalAudioTrack();
await room.localParticipant.publishTrack(track, {
  name: 'mix', // Must match INPUT_TRACK_NAME
  source: Track.Source.Microphone,
});
```

For horizontal scaling, increase replicas (limited by GPU availability):
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: multitalker-agent-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: multitalker-livekit-agent
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
```bash
# Check agent logs
kubectl logs -l app=multitalker-agent -f

# Verify GPU access
kubectl exec -it deployment/multitalker-livekit-agent -- nvidia-smi

# Test agent connectivity
kubectl exec -it deployment/multitalker-livekit-agent -- python -m agent.gpu_smoke_test
```

| Issue | Solution |
|---|---|
| Agent not joining rooms | Check that `LIVEKIT_URL` uses the `wss://` protocol |
| No transcripts received | Verify the track name matches `INPUT_TRACK_NAME` |
| GPU out of memory | Reduce concurrent rooms or use a larger GPU |
| Slow startup | Expected; NeMo models are large (~2-3 GB) |
See the LICENSE file for details.