Multitalker LiveKit Agent

GPU-enabled LiveKit Agent for streaming multi-speaker transcription using NVIDIA NeMo's multitalker ASR and diarization models.

Features

  • Real-time streaming multi-speaker transcription
  • Speaker diarization (up to 4 speakers)
  • Support for multiple audio tracks per room
  • Multiple concurrent rooms via LiveKit AgentServer scaling
  • Model prewarming for efficient resource usage

Requirements

  • Docker with NVIDIA GPU support
  • NVIDIA GPU with CUDA 12.1+ support
  • LiveKit server (cloud or self-hosted)

Quick Start

Build the Docker Image

docker build -t multitalker-livekit-agent:latest .

Run the GPU Smoke Test

Verify the container can access the GPU and load the models:

docker run --gpus all --env-file .env multitalker-livekit-agent:latest \
  python -m agent.gpu_smoke_test

Run the Agent

docker run --gpus all --env-file .env multitalker-livekit-agent:latest

Configuration

Copy .env.example to .env and fill in your LiveKit credentials:

cp .env.example .env

Environment variables:

Variable             Description                                            Default
LIVEKIT_URL          LiveKit server URL                                     (required)
LIVEKIT_API_KEY      LiveKit API key                                        (required)
LIVEKIT_API_SECRET   LiveKit API secret                                     (required)
INPUT_TRACK_NAME     Name of audio track to transcribe                      mix
TRANSCRIPT_TOPIC     Data topic for transcript messages                     multitalker_transcript
ROOM_PREFIX          Only join rooms with this prefix (empty = all rooms)   (empty)
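
As illustration only, the table above maps onto a small Python sketch that reads the variables with their documented defaults (the repo's actual configuration code may differ):

# Illustrative sketch: read the documented variables with their defaults.
import os

LIVEKIT_URL = os.environ["LIVEKIT_URL"]                  # required
LIVEKIT_API_KEY = os.environ["LIVEKIT_API_KEY"]          # required
LIVEKIT_API_SECRET = os.environ["LIVEKIT_API_SECRET"]    # required
INPUT_TRACK_NAME = os.getenv("INPUT_TRACK_NAME", "mix")
TRANSCRIPT_TOPIC = os.getenv("TRANSCRIPT_TOPIC", "multitalker_transcript")
ROOM_PREFIX = os.getenv("ROOM_PREFIX", "")               # empty = join all rooms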

Testing

Offline Pipeline Test

Test the NeMo pipeline with a local audio file:

docker run --gpus all -v $(pwd)/tests/data:/audio \
  multitalker-livekit-agent:latest \
  python -m agent.multitalker_pipeline --audio /audio/sample.wav

Run Unit Tests

docker run --gpus all multitalker-livekit-agent:latest \
  pytest -q

Multi-Room Integration Test

With the agent running and connected to LiveKit:

python scripts/multi_room_test.py
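
The harness itself isn't reproduced here, but a minimal version might create a few prefixed rooms through the LiveKit server API so that several jobs get dispatched (a sketch assuming the livekit-api package; room names are hypothetical):

# Sketch: create several rooms so multiple jobs are dispatched.
import asyncio

from livekit import api


async def main() -> None:
    lkapi = api.LiveKitAPI()  # reads LIVEKIT_URL / _API_KEY / _API_SECRET from env
    for i in range(3):
        await lkapi.room.create_room(
            api.CreateRoomRequest(name=f"transcribe-test-{i}")
        )
    await lkapi.aclose()


asyncio.run(main())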

Architecture

Scaling Model

The agent uses LiveKit's standard AgentServer scaling model:

  1. Worker Processes: The AgentServer spawns worker processes
  2. Model Prewarming: Each worker loads NeMo models once via prewarm()
  3. Multiple Jobs: Each worker can handle multiple concurrent jobs (rooms)
  4. Horizontal Scaling: Deploy multiple agent containers for higher capacity

State Management

  • Shared across jobs in proc.userdata (see the sketch after this list):

    • MultitalkerTranscriptionConfig
    • SortformerEncLabelModel (diarization)
    • ASRModel (multitalker ASR)
  • Per-job state (local to each room):

    • Sessions per audio track
    • Asyncio tasks per track
    • Frame count metrics
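
A minimal sketch of this worker/job split, assuming the livekit-agents Python SDK; the NeMo loader call and checkpoint name are assumptions, not necessarily what this repo uses:

import nemo.collections.asr as nemo_asr
from livekit import agents


def prewarm(proc: agents.JobProcess):
    # Runs once per worker process: park the heavy models in
    # proc.userdata so every job in this process reuses them.
    proc.userdata["diar_model"] = nemo_asr.models.SortformerEncLabelModel.from_pretrained(
        "nvidia/diar_sortformer_4spk-v1"  # assumed checkpoint name
    )


async def entrypoint(ctx: agents.JobContext):
    # Runs once per job (room): shared models come from the process;
    # sessions, tasks, and metrics stay local to this job.
    diar_model = ctx.proc.userdata["diar_model"]
    await ctx.connect()


if __name__ == "__main__":
    agents.cli.run_app(
        agents.WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm)
    )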

Multi-Track Support

  • One job (room) can handle multiple audio tracks concurrently
  • Each track gets its own MultitalkerStreamingSession (sketched after this list)
  • Transcripts include session/track identification
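
A sketch of that per-track fan-out; MultitalkerStreamingSession's interface is assumed here, and make_session/push_audio are hypothetical helpers:

import asyncio

from livekit import rtc


async def pump_frames(stream: rtc.AudioStream, session) -> None:
    # Forward decoded audio frames from LiveKit into the NeMo session.
    async for event in stream:
        session.push_audio(event.frame.data)  # hypothetical session method


def handle_audio_track(room: rtc.Room, track: rtc.Track,
                       sessions: dict, tasks: dict) -> None:
    # One streaming session plus one asyncio task per subscribed track.
    sessions[track.sid] = make_session(f"{room.name}_{track.sid}")  # hypothetical factory
    tasks[track.sid] = asyncio.create_task(
        pump_frames(rtc.AudioStream(track), sessions[track.sid])
    )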

Transcript Format

Transcripts are published as JSON via LiveKit data packets:

{
  "type": "multitalker_transcript",
  "segments": [
    {
      "session_id": "room-name_track-sid",
      "speaker": "spk_0",
      "start_time": 1.5,
      "end_time": 3.2,
      "text": "Hello, how are you?",
      "is_final": true
    }
  ]
}
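
On the consuming side, the schema above maps onto a small typed parser; this helper is illustrative, not part of the repo:

import json
from dataclasses import dataclass


@dataclass
class Segment:
    session_id: str
    speaker: str
    start_time: float
    end_time: float
    text: str
    is_final: bool


def parse_transcript(payload: bytes) -> list[Segment]:
    message = json.loads(payload.decode("utf-8"))
    if message.get("type") != "multitalker_transcript":
        return []
    return [Segment(**segment) for segment in message["segments"]]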

Development

Local Development Setup

# Create virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Install NeMo (requires CUDA); quote the URL so the brackets aren't shell-globbed
pip install "git+https://github.com/NVIDIA/NeMo.git@main#egg=nemo_toolkit[asr]"
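
After installing, a quick sanity check that PyTorch sees the GPU and NeMo imports cleanly:

import torch
import nemo.collections.asr as nemo_asr  # noqa: F401  (import should succeed)

print("CUDA available:", torch.cuda.is_available())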

Project Structure

.
├── agent/
│   ├── __init__.py
│   ├── agent.py                      # LiveKit agent entrypoint
│   ├── gpu_smoke_test.py             # GPU/model verification script
│   ├── multitalker_pipeline.py       # NeMo streaming wrapper
│   └── multitalker_transcript_config.py  # Configuration dataclass
├── scripts/
│   └── multi_room_test.py            # Integration test harness
├── tests/
│   ├── __init__.py
│   └── test_multitalker_pipeline.py  # Unit tests
├── Dockerfile
├── requirements.txt
├── .env.example
└── README.md

Kubernetes Deployment

Push Image to Registry

docker tag multitalker-livekit-agent:latest your-registry.com/multitalker-livekit-agent:latest
docker push your-registry.com/multitalker-livekit-agent:latest

Create Kubernetes Secret

kubectl create secret generic livekit-agent-secrets \
  --from-literal=LIVEKIT_URL=wss://your-livekit-server.com \
  --from-literal=LIVEKIT_API_KEY=your-api-key \
  --from-literal=LIVEKIT_API_SECRET=your-api-secret

Deployment Manifest

# multitalker-agent-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: multitalker-livekit-agent
  labels:
    app: multitalker-agent
spec:
  replicas: 1
  selector:
    matchLabels:
      app: multitalker-agent
  template:
    metadata:
      labels:
        app: multitalker-agent
    spec:
      containers:
      - name: agent
        image: your-registry.com/multitalker-livekit-agent:latest
        resources:
          limits:
            nvidia.com/gpu: 1
          requests:
            nvidia.com/gpu: 1
            memory: "8Gi"
            cpu: "2"
        envFrom:
        - secretRef:
            name: livekit-agent-secrets
        env:
        - name: INPUT_TRACK_NAME
          value: "mix"
        - name: TRANSCRIPT_TOPIC
          value: "multitalker_transcript"
        - name: ROOM_PREFIX
          value: "transcribe-"  # Only join rooms starting with "transcribe-"
      tolerations:
      - key: "nvidia.com/gpu"
        operator: "Exists"
        effect: "NoSchedule"
      nodeSelector:
        accelerator: nvidia-gpu

Apply with:

kubectl apply -f multitalker-agent-deployment.yaml

How It Works

The agent registers with LiveKit using the Agents SDK. When a room is created, LiveKit automatically dispatches an available agent worker to join. The agent:

  1. Subscribes to audio tracks matching INPUT_TRACK_NAME
  2. Runs real-time ASR with speaker diarization
  3. Publishes transcripts via the LiveKit data channel on TRANSCRIPT_TOPIC (sketched below)
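
A simplified sketch of that flow with the livekit-agents SDK; run_asr and the exact wiring are hypothetical (see agent/agent.py for the real code):

import asyncio
import json

from livekit import agents, rtc


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()

    async def transcribe(track: rtc.Track):
        # Run streaming ASR over decoded frames, then publish each batch.
        async for event in rtc.AudioStream(track):
            segments = run_asr(event.frame)  # hypothetical NeMo call
            payload = json.dumps(
                {"type": "multitalker_transcript", "segments": segments}
            ).encode("utf-8")
            await ctx.room.local_participant.publish_data(
                payload, topic="multitalker_transcript"
            )

    @ctx.room.on("track_subscribed")
    def on_track(track: rtc.Track, pub: rtc.TrackPublication,
                 participant: rtc.RemoteParticipant):
        if track.kind == rtc.TrackKind.KIND_AUDIO and pub.name == "mix":
            asyncio.create_task(transcribe(track))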

Receiving Transcripts in Your Application

JavaScript/TypeScript:

import { Room, RoomEvent } from 'livekit-client';

const room = new Room();

room.on(RoomEvent.DataReceived, (payload, participant, kind, topic) => {
  if (topic === 'multitalker_transcript') {
    const transcript = JSON.parse(new TextDecoder().decode(payload));
    transcript.segments.forEach((segment) => {
      console.log(`[${segment.speaker}]: ${segment.text}`);
    });
  }
});

await room.connect(LIVEKIT_URL, accessToken);

Python:

import json

from livekit import rtc

room = rtc.Room()


def on_data_received(packet: rtc.DataPacket):
    # Event callbacks are synchronous in the livekit Python SDK.
    if packet.topic == "multitalker_transcript":
        transcript = json.loads(packet.data.decode("utf-8"))
        for segment in transcript.get("segments", []):
            print(f"[{segment['speaker']}]: {segment['text']}")


room.on("data_received", on_data_received)
await room.connect(livekit_url, token)

Publishing Audio for Transcription

Ensure your audio track name matches INPUT_TRACK_NAME:

import { createLocalAudioTrack, Track } from 'livekit-client';

const track = await createLocalAudioTrack();
await room.localParticipant.publishTrack(track, {
  name: 'mix',  // Must match INPUT_TRACK_NAME
  source: Track.Source.Microphone,
});
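
A Python equivalent (a sketch with the livekit rtc API; the 48 kHz mono source is an assumption):

from livekit import rtc

source = rtc.AudioSource(sample_rate=48000, num_channels=1)
track = rtc.LocalAudioTrack.create_audio_track("mix", source)  # must match INPUT_TRACK_NAME
options = rtc.TrackPublishOptions(source=rtc.TrackSource.SOURCE_MICROPHONE)
await room.local_participant.publish_track(track, options)
# Then push rtc.AudioFrame objects into `source` via await source.capture_frame(frame).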

Scaling

For horizontal scaling, increase replicas (limited by GPU availability). Note that the example below scales on CPU utilization; GPU-aware autoscaling needs a custom metrics pipeline (e.g., the DCGM exporter with a Prometheus adapter).

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: multitalker-agent-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: multitalker-livekit-agent
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Troubleshooting

# Check agent logs
kubectl logs -l app=multitalker-agent -f

# Verify GPU access
kubectl exec -it deployment/multitalker-livekit-agent -- nvidia-smi

# Test agent connectivity
kubectl exec -it deployment/multitalker-livekit-agent -- python -m agent.gpu_smoke_test

Issue                     Solution
Agent not joining rooms   Check that LIVEKIT_URL uses the wss:// protocol
No transcripts received   Verify the track name matches INPUT_TRACK_NAME
GPU out of memory         Reduce concurrent rooms or use a larger GPU
Slow startup              Normal - NeMo models are large (~2-3 GB)

License

See LICENSE file for details.
