The LiveKit Voice Agent is a modern, production-ready voice tutoring platform built with a microservices architecture. The system combines real-time communication, AI-powered conversation, and comprehensive data persistence.
┌─────────────────────────────────────────────────────────────────┐
│                          Client Layer                           │
│    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐     │
│    │   Browser    │    │    Mobile    │    │   Desktop    │     │
│    │ (React App)  │    │   (Future)   │    │   (Future)   │     │
│    └──────────────┘    └──────────────┘    └──────────────┘     │
└────────────────────────────────┬────────────────────────────────┘
                                 │
                       ┌─────────▼─────────┐
                       │   Load Balancer   │
                       │    (Nginx/ALB)    │
                       └─────────┬─────────┘
                                 │
          ┌──────────────────────┼────────────────────┐
          │                      │                    │
  ┌───────▼────────┐    ┌────────▼────────┐    ┌──────▼──────┐
  │    Frontend    │    │     Backend     │    │   LiveKit   │
  │  React + Vite  │    │     FastAPI     │    │   Server    │
  │    (Nginx)     │    │    (Uvicorn)    │    │  (WebRTC)   │
  └────────────────┘    └────────┬────────┘    └──────┬──────┘
                                 │                    │
                        ┌────────┼────────────────────┤
                        │        │                    │
                ┌───────▼────┐ ┌─▼─────────┐   ┌──────▼──────┐
                │ PostgreSQL │ │   Redis   │   │   OpenAI    │
                │  Database  │ │   Cache   │   │ Realtime API│
                └────────────┘ └───────────┘   └─────────────┘
                                 │
                         ┌───────┴──────────┐
                         │                  │
                 ┌───────▼────────┐   ┌─────▼────────┐
                 │   Prometheus   │   │   Grafana    │
                 │   (Metrics)    │   │ (Dashboards) │
                 └────────────────┘   └──────────────┘
Technology: React 18 + Vite 6 + LiveKit Components
Responsibilities:
- User interface rendering
- WebRTC connection management
- Audio visualization
- Real-time transcription display
- Token acquisition from backend
Key Files:
- `frontend/src/App.jsx` - Main application component
- `frontend/src/components/LiveKitModal.jsx` - Session management
- `frontend/src/components/SimpleVoiceAssistant.jsx` - Voice UI
Architecture Pattern: Component-based architecture with hooks
Technology: FastAPI + Uvicorn + Python 3.11
Responsibilities:
- JWT token generation for LiveKit access
- Room management (create, list, delete)
- Database operations (CRUD)
- API rate limiting
- Health monitoring
- Metrics collection
Key Files:
- `backend/server.py` - FastAPI application and API endpoints
- `backend/agent.py` - LiveKit agent entry point
- `backend/api.py` - TutorAgent with teaching tools
- `backend/db_driver_enhanced.py` - Database abstraction layer
Architecture Pattern: Layered architecture with async/await
┌─────────────────────────────────┐
│  API Layer (FastAPI)            │
│  - Routing                      │
│  - Validation (Pydantic)        │
│  - Rate Limiting                │
└────────────┬────────────────────┘
             │
┌────────────▼────────────────────┐
│  Business Logic Layer           │
│  - Token Generation             │
│  - Room Management              │
│  - User Management              │
└────────────┬────────────────────┘
             │
┌────────────▼────────────────────┐
│  Data Access Layer              │
│  - Database Driver              │
│  - Model Classes                │
└─────────────────────────────────┘
Technology: LiveKit Agents Framework + OpenAI Realtime API
Responsibilities:
- Handle real-time voice streams
- AI conversation management
- Teaching tool execution
- Conversation state tracking
- Subtopic management
Key Components:
- `TutorAgent` class - Main agent logic
- Function tools - `check_understanding`, `provide_example`, etc.
- Event handlers - User speech processing
Flow:
User Speech → LiveKit → Agent → OpenAI Realtime API → Response → LiveKit → User
Primary: PostgreSQL 16 (Production)
Fallback: SQLite (Development)
Schema:
-- Core Tables
subtopics # Educational content
conversations # Session tracking
messages # Conversation history
user_profiles # User data & preferences
session_analytics # Performance metrics
-- Indexes for Performance
idx_subtopics_topic
idx_conversations_room
idx_messages_conversation
idx_messages_timestamp
Data Model:
Conversation (1) ─── (N) Message
    │
    └───── (1) UserProfile
Subtopic ─── (N) Conversation (via topic)
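
The tables, indexes, and relationships listed above could be sketched against the SQLite development fallback roughly as follows. Only the table and index names come from the listing; every column name and type here is an illustrative assumption, not the project's actual schema.

```python
import sqlite3

# Table and index names follow the schema listing; all columns are
# hypothetical and exist only to make the relationships concrete.
SCHEMA = """
CREATE TABLE subtopics (
    id INTEGER PRIMARY KEY,
    topic TEXT NOT NULL,
    content TEXT
);
CREATE TABLE user_profiles (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE conversations (
    id INTEGER PRIMARY KEY,
    room TEXT NOT NULL,
    topic TEXT,  -- links a conversation to subtopics "via topic"
    user_profile_id INTEGER REFERENCES user_profiles(id)
);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    conversation_id INTEGER NOT NULL REFERENCES conversations(id),
    role TEXT NOT NULL,
    body TEXT NOT NULL,
    timestamp TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE session_analytics (
    id INTEGER PRIMARY KEY,
    conversation_id INTEGER REFERENCES conversations(id),
    duration_seconds REAL
);
CREATE INDEX idx_subtopics_topic ON subtopics(topic);
CREATE INDEX idx_conversations_room ON conversations(room);
CREATE INDEX idx_messages_conversation ON messages(conversation_id);
CREATE INDEX idx_messages_timestamp ON messages(timestamp);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create all tables and indexes on the given connection."""
    conn.executescript(SCHEMA)


def index_names(conn: sqlite3.Connection) -> list:
    """Return the names of the explicitly created indexes, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND name LIKE 'idx_%' ORDER BY name"
    )
    return [row[0] for row in rows]
```

Running `create_schema(sqlite3.connect(":memory:"))` is enough to experiment with the one-to-many Conversation/Message relationship without a PostgreSQL instance.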
Technology: Redis 7
Use Cases:
- Session data caching
- Rate limiting counters
- LiveKit room state
- Temporary data storage
Cache Strategy:
- TTL-based expiration
- Write-through for critical data
- Cache-aside for read-heavy operations
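
As a rough illustration of the TTL-based and cache-aside strategies above, the sketch below uses an in-memory dictionary as a stand-in for Redis so that it runs without a server. `TTLCache`, `get_subtopic`, and the `subtopic:` key prefix are hypothetical names chosen for this example, not part of the codebase.

```python
import time


class TTLCache:
    """In-memory stand-in for Redis with per-key expiry (illustration only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry outlived its TTL: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, self._clock() + ttl_seconds)


def get_subtopic(cache, load_from_db, topic, ttl_seconds=300):
    """Cache-aside read: try the cache, fall back to the loader, then populate."""
    key = f"subtopic:{topic}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(topic)
    if value is not None:
        cache.set(key, value, ttl_seconds)
    return value
```

With cache-aside, the application owns the fallback to the database, so a cache outage degrades latency rather than correctness. A write-through path would instead update the cache in the same operation that writes to the database.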
Components:
- Prometheus - Metrics collection
  - Custom metrics from FastAPI
  - System metrics
  - LiveKit metrics
- Grafana - Visualization
  - Pre-built dashboards
  - Alerting rules
  - Query interface
Metrics Collected:
- Request rates and latencies
- Error rates by endpoint
- Token generation success/failure
- Room creation statistics
- Database query performance
1. User opens app → Frontend loads
2. User enters name → Form submission
3. Frontend requests token → GET /api/getToken
4. Backend generates JWT → With room grants
5. Frontend receives token → Initializes LiveKit
6. LiveKit establishes connection → WebRTC handshake
7. Agent joins room → Welcomes user
8. Conversation begins → Real-time audio
1. User speaks → Audio captured by browser
2. LiveKit encodes → Sent to server
3. Agent receives audio → Transcribed by OpenAI
4. Text analyzed → Determine response
5. OpenAI generates → Speech + text
6. Agent sends response → Via LiveKit
7. Browser plays audio → User hears response
8. Message saved → Database for history
1. API request received → FastAPI endpoint
2. Metrics updated → Prometheus counters/histograms
3. Metrics exposed → /api/metrics endpoint
4. Prometheus scrapes → Every 15 seconds
5. Grafana queries → Prometheus data
6. Dashboards updated → Real-time visualization
7. Alerts triggered → If thresholds exceeded
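
To make the "metrics updated" and "metrics exposed" steps concrete, here is a minimal standard-library sketch of a labelled counter rendered in the Prometheus text exposition format. A real service would more likely rely on a client library; the `Counter` class and the `http_requests_total` metric name are assumptions made for illustration.

```python
from collections import defaultdict


class Counter:
    """Tiny labelled counter that renders Prometheus text exposition format."""

    def __init__(self, name, description):
        self.name = name
        self.description = description
        self._values = defaultdict(float)

    def inc(self, endpoint, status, amount=1.0):
        """Increment the series identified by (endpoint, status)."""
        self._values[(endpoint, status)] += amount

    def render(self):
        """Return the text a scrape of the metrics endpoint would receive."""
        lines = [
            f"# HELP {self.name} {self.description}",
            f"# TYPE {self.name} counter",
        ]
        for (endpoint, status), value in sorted(self._values.items()):
            lines.append(
                f'{self.name}{{endpoint="{endpoint}",status="{status}"}} {value}'
            )
        return "\n".join(lines) + "\n"
```

Each request handler would call `inc(...)` once, and the metrics endpoint would return `render()` as plain text for Prometheus to scrape on its 15-second interval.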
JWT Token Flow:
Client → Backend (/api/token) → Validate input
                              → Generate JWT with:
                                - Identity
                                - Room name
                                - Expiration (2 hours)
                                - Permissions
Backend → Client (Token)
Client → LiveKit (Token) → Validate signature
                         → Grant access
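
The token flow above can be approximated with the standard library alone. This is a sketch under stated assumptions rather than the backend's actual code: the claim layout (`iss`, `sub`, `nbf`, `exp`, and a `video` grant object) is assumed for illustration, `make_token` and `verify_token` are hypothetical helpers, and production code would more plausibly use a LiveKit server SDK.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, api_secret: str) -> str:
    digest = hmac.new(api_secret.encode(), signing_input.encode(), hashlib.sha256)
    return _b64url(digest.digest())


def make_token(api_key, api_secret, identity, room, ttl_seconds=2 * 60 * 60, now=None):
    """Build an HS256 JWT carrying identity, room, expiry, and permissions."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": api_key,                # assumed: API key as issuer
        "sub": identity,               # participant identity
        "nbf": issued,
        "exp": issued + ttl_seconds,   # 2-hour expiration by default
        "video": {"room": room, "roomJoin": True},  # assumed grant layout
    }
    parts = [
        _b64url(json.dumps(obj, separators=(",", ":")).encode())
        for obj in (header, claims)
    ]
    signing_input = ".".join(parts)
    return signing_input + "." + _sign(signing_input, api_secret)


def verify_token(token, api_secret, now=None):
    """Check signature and expiry; return the claims or raise ValueError."""
    signing_input, _, signature = token.rpartition(".")
    expected = _sign(signing_input, api_secret)
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(signing_input.split(".")[1]))
    current = time.time() if now is None else now
    if current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

Because the token is signed with a shared secret, the media server can validate it without calling back to the backend, which keeps the token endpoint stateless.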
Security Features:
- Rate limiting (10 requests/minute per IP)
- Input validation (Pydantic models)
- CORS restrictions (configurable origins)
- Token expiration
- No sensitive data in JWT
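
The per-IP limit of 10 requests per minute could be enforced with a sliding-window counter along these lines. This is a single-process sketch with a hypothetical `SlidingWindowLimiter` class. A multi-instance deployment would need the counters in shared storage such as Redis, and the actual backend may use a different algorithm or library.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within a rolling window."""

    def __init__(self, limit=10, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._hits = defaultdict(deque)  # client IP -> timestamps of recent hits

    def allow(self, client_ip):
        """Record the request and return True, or return False if over the limit."""
        now = self._clock()
        hits = self._hits[client_ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

An endpoint would call `allow(client_ip)` before doing any work and respond with HTTP 429 when it returns `False`.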
HTTPS/WSS:
- All production traffic encrypted
- TLS 1.2+ required
- Certificate validation
Headers:
- X-Frame-Options: SAMEORIGIN
- X-Content-Type-Options: nosniff
- X-XSS-Protection: 1; mode=block
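
One way to attach the headers listed above to every HTTP response is a small ASGI middleware. The sketch below depends only on the ASGI calling convention; `SecurityHeadersMiddleware` is a hypothetical name, and the project may instead set these headers in Nginx.

```python
import asyncio

# Header values taken from the list above; ASGI header names are lowercase bytes.
SECURITY_HEADERS = [
    (b"x-frame-options", b"SAMEORIGIN"),
    (b"x-content-type-options", b"nosniff"),
    (b"x-xss-protection", b"1; mode=block"),
]


class SecurityHeadersMiddleware:
    """ASGI middleware that appends security headers to HTTP responses."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Leave WebSocket and lifespan traffic untouched.
            await self.app(scope, receive, send)
            return

        async def send_with_headers(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                present = {name.lower() for name, _ in headers}
                # Do not override a header the application already set.
                headers.extend(h for h in SECURITY_HEADERS if h[0] not in present)
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_headers)
```

Since FastAPI applications are ASGI applications, a class of this shape could be registered with `app.add_middleware(SecurityHeadersMiddleware)`.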
At Rest:
- PostgreSQL encryption available
- Redis password protection
- Volume encryption (Docker/Kubernetes)
In Transit:
- WebRTC DTLS encryption
- HTTPS for API calls
- Secure WebSocket (WSS)
Backend:
- Stateless design enables multiple instances
- Load balancer distributes requests
- Database connection pooling
- Redis for shared state
Frontend:
- Static files served by CDN
- Multiple Nginx instances
- Gzip compression
- Cache headers
Database:
- PostgreSQL read replicas
- Connection pooling (50-100 connections)
- Query optimization with indexes
- Partitioning for large tables
Redis:
- Persistence enabled (AOF)
- Memory limits configured
- Eviction policies (LRU)
| Metric | Target | Current |
|---|---|---|
| API Latency (p95) | < 200ms | ~100ms |
| Token Generation | < 50ms | ~30ms |
| Database Query | < 10ms | ~5ms |
| WebRTC Connection | < 3s | ~2s |
| Concurrent Users | 1000+ | Tested to 100 |
Local Machine:
- Docker Compose (all services)
- Hot reload enabled
- Debug logging
- SQLite database
Cloud Infrastructure:
- Kubernetes cluster
- PostgreSQL managed service
- Redis managed service
- LiveKit cloud
- Lower resource limits
Cloud Infrastructure:
- Multi-zone Kubernetes
- PostgreSQL with replicas
- Redis cluster mode
- LiveKit cloud (enterprise)
- Auto-scaling enabled
- CDN for static assets
- Full monitoring stack
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | React 18 | UI framework |
| | Vite 6 | Build tool |
| | LiveKit Components | Real-time UI |
| Backend | FastAPI | API framework |
| | Uvicorn | ASGI server |
| | Python 3.11 | Language |
| Agent | LiveKit Agents | Voice handling |
| | OpenAI Realtime | AI conversation |
| Database | PostgreSQL 16 | Primary database |
| | SQLite | Development |
| Cache | Redis 7 | Session & cache |
| Monitoring | Prometheus | Metrics |
| | Grafana | Dashboards |
| Deployment | Docker | Containerization |
| | Docker Compose | Local orchestration |
| | Kubernetes | Production orchestration |
- Separation of Concerns - Clear layer boundaries
- Async-First - Non-blocking operations throughout
- Fail Fast - Validate inputs early
- Observable - Comprehensive logging and metrics
- Scalable - Horizontal scaling by design
- Secure - Security in every layer
- Testable - High test coverage
- Documented - Code and API documentation
- Event-Driven Architecture - Message queue (RabbitMQ/Kafka)
- Microservices - Split monolith into services
- Service Mesh - Istio for service-to-service communication
- GraphQL API - Alternative to REST
- WebSocket Server - Real-time updates beyond LiveKit
- Machine Learning Pipeline - Conversation analysis
- Multi-Region Deployment - Global availability
- Edge Computing - CDN for dynamic content
For implementation details, see CHANGES.md. For contribution guidelines, see CONTRIBUTING.md.