Skip to content

Latest commit

Β 

History

History
401 lines (327 loc) Β· 12.7 KB

File metadata and controls

401 lines (327 loc) Β· 12.7 KB

πŸ—οΈ Architecture Documentation

System Overview

The LiveKit Voice Agent is a modern, production-ready voice tutoring platform built with a microservices architecture. The system combines real-time communication, AI-powered conversation, and comprehensive data persistence.

High-Level Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         Client Layer                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”‚
β”‚  β”‚   Browser    β”‚  β”‚    Mobile    β”‚  β”‚   Desktop    β”‚         β”‚
β”‚  β”‚  (React App) β”‚  β”‚   (Future)   β”‚  β”‚   (Future)   β”‚         β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚    Load Balancer    β”‚
                    β”‚   (Nginx/ALB)      β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚                     β”‚                     β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”
β”‚   Frontend     β”‚   β”‚    Backend      β”‚   β”‚  LiveKit    β”‚
β”‚ React + Vite   β”‚   β”‚    FastAPI      β”‚   β”‚   Server    β”‚
β”‚   (Nginx)      β”‚   β”‚  (Uvicorn)      β”‚   β”‚   (WebRTC)  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚                     β”‚
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
                    β”‚        β”‚                     β”‚
            β”Œβ”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β” β”Œβ–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”
            β”‚ PostgreSQL β”‚ β”‚   Redis   β”‚  β”‚   OpenAI    β”‚
            β”‚  Database  β”‚ β”‚   Cache   β”‚  β”‚ Realtime APIβ”‚
            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                    β”‚
            β”Œβ”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
            β”‚                  β”‚
    β”Œβ”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚   Prometheus   β”‚  β”‚   Grafana    β”‚
    β”‚    (Metrics)   β”‚  β”‚ (Dashboards) β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Component Details

1. Frontend Layer

Technology: React 18 + Vite 6 + LiveKit Components

Responsibilities:

  • User interface rendering
  • WebRTC connection management
  • Audio visualization
  • Real-time transcription display
  • Token acquisition from backend

Key Files:

  • frontend/src/App.jsx - Main application component
  • frontend/src/components/LiveKitModal.jsx - Session management
  • frontend/src/components/SimpleVoiceAssistant.jsx - Voice UI

Architecture Pattern: Component-based architecture with hooks

2. Backend Layer

Technology: FastAPI + Uvicorn + Python 3.11

Responsibilities:

  • JWT token generation for LiveKit access
  • Room management (create, list, delete)
  • Database operations (CRUD)
  • API rate limiting
  • Health monitoring
  • Metrics collection

Key Files:

  • backend/server.py - FastAPI application and API endpoints
  • backend/agent.py - LiveKit agent entry point
  • backend/api.py - TutorAgent with teaching tools
  • backend/db_driver_enhanced.py - Database abstraction layer

Architecture Pattern: Layered architecture with async/await

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚      API Layer (FastAPI)        β”‚
β”‚  - Routing                      β”‚
β”‚  - Validation (Pydantic)        β”‚
β”‚  - Rate Limiting                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚     Business Logic Layer        β”‚
β”‚  - Token Generation             β”‚
β”‚  - Room Management              β”‚
β”‚  - User Management              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚      Data Access Layer          β”‚
β”‚  - Database Driver              β”‚
β”‚  - Model Classes                β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

3. LiveKit Agent

Technology: LiveKit Agents Framework + OpenAI Realtime API

Responsibilities:

  • Handle real-time voice streams
  • AI conversation management
  • Teaching tool execution
  • Conversation state tracking
  • Subtopic management

Key Components:

  • TutorAgent class - Main agent logic
  • Function tools - check_understanding, provide_example, etc.
  • Event handlers - User speech processing

Flow:

User Speech β†’ LiveKit β†’ Agent β†’ OpenAI Realtime API β†’ Response β†’ LiveKit β†’ User

4. Database Layer

Primary: PostgreSQL 16 (Production) Fallback: SQLite (Development)

Schema:

-- Core Tables
subtopics           # Educational content
conversations       # Session tracking
messages            # Conversation history
user_profiles       # User data & preferences
session_analytics   # Performance metrics

-- Indexes for Performance
idx_subtopics_topic
idx_conversations_room
idx_messages_conversation
idx_messages_timestamp

Data Model:

Conversation (1) ─── (N) Message
     β”‚
     └───── (1) UserProfile

Subtopic ─── (N) Conversation (via topic)

5. Caching Layer

Technology: Redis 7

Use Cases:

  • Session data caching
  • Rate limiting counters
  • LiveKit room state
  • Temporary data storage

Cache Strategy:

  • TTL-based expiration
  • Write-through for critical data
  • Cache-aside for read-heavy operations

6. Monitoring & Observability

Components:

  1. Prometheus - Metrics collection

    • Custom metrics from FastAPI
    • System metrics
    • LiveKit metrics
  2. Grafana - Visualization

    • Pre-built dashboards
    • Alerting rules
    • Query interface

Metrics Collected:

  • Request rates and latencies
  • Error rates by endpoint
  • Token generation success/failure
  • Room creation statistics
  • Database query performance

Data Flow

1. User Join Flow

1. User opens app β†’ Frontend loads
2. User enters name β†’ Form submission
3. Frontend requests token β†’ GET /api/getToken
4. Backend generates JWT β†’ With room grants
5. Frontend receives token β†’ Initializes LiveKit
6. LiveKit establishes connection β†’ WebRTC handshake
7. Agent joins room β†’ Welcomes user
8. Conversation begins β†’ Real-time audio

2. Message Flow

1. User speaks β†’ Audio captured by browser
2. LiveKit encodes β†’ Sent to server
3. Agent receives audio β†’ Transcribed by OpenAI
4. Text analyzed β†’ Determine response
5. OpenAI generates β†’ Speech + text
6. Agent sends response β†’ Via LiveKit
7. Browser plays audio β†’ User hears response
8. Message saved β†’ Database for history

3. Monitoring Flow

1. API request received β†’ FastAPI endpoint
2. Metrics updated β†’ Prometheus counters/histograms
3. Metrics exposed β†’ /api/metrics endpoint
4. Prometheus scrapes β†’ Every 15 seconds
5. Grafana queries β†’ Prometheus data
6. Dashboards updated β†’ Real-time visualization
7. Alerts triggered β†’ If thresholds exceeded

Security Architecture

1. Authentication & Authorization

JWT Token Flow:

Client β†’ Backend (/api/token) β†’ Validate input
                               β†’ Generate JWT with:
                                   - Identity
                                   - Room name
                                   - Expiration (2 hours)
                                   - Permissions
Backend β†’ Client (Token)
Client β†’ LiveKit (Token) β†’ Validate signature
                          β†’ Grant access

Security Features:

  • Rate limiting (10 requests/minute per IP)
  • Input validation (Pydantic models)
  • CORS restrictions (configurable origins)
  • Token expiration
  • No sensitive data in JWT

2. Network Security

HTTPS/WSS:

  • All production traffic encrypted
  • TLS 1.2+ required
  • Certificate validation

Headers:

  • X-Frame-Options: SAMEORIGIN
  • X-Content-Type-Options: nosniff
  • X-XSS-Protection: 1; mode=block

3. Data Security

At Rest:

  • PostgreSQL encryption available
  • Redis password protection
  • Volume encryption (Docker/Kubernetes)

In Transit:

  • WebRTC DTLS encryption
  • HTTPS for API calls
  • Secure WebSocket (WSS)

Scalability Considerations

Horizontal Scaling

Backend:

  • Stateless design enables multiple instances
  • Load balancer distributes requests
  • Database connection pooling
  • Redis for shared state

Frontend:

  • Static files served by CDN
  • Multiple Nginx instances
  • Gzip compression
  • Cache headers

Vertical Scaling

Database:

  • PostgreSQL read replicas
  • Connection pooling (50-100 connections)
  • Query optimization with indexes
  • Partitioning for large tables

Redis:

  • Persistence enabled (AOF)
  • Memory limits configured
  • Eviction policies (LRU)

Performance Targets

Metric Target Current
API Latency (p95) < 200ms ~100ms
Token Generation < 50ms ~30ms
Database Query < 10ms ~5ms
WebRTC Connection < 3s ~2s
Concurrent Users 1000+ Tested to 100

Deployment Architecture

Development

Local Machine:
  - Docker Compose (all services)
  - Hot reload enabled
  - Debug logging
  - SQLite database

Staging

Cloud Infrastructure:
  - Kubernetes cluster
  - PostgreSQL managed service
  - Redis managed service
  - LiveKit cloud
  - Lower resource limits

Production

Cloud Infrastructure:
  - Multi-zone Kubernetes
  - PostgreSQL with replicas
  - Redis cluster mode
  - LiveKit cloud (enterprise)
  - Auto-scaling enabled
  - CDN for static assets
  - Full monitoring stack

Technology Stack Summary

Layer Technology Purpose
Frontend React 18 UI framework
Vite 6 Build tool
LiveKit Components Real-time UI
Backend FastAPI API framework
Uvicorn ASGI server
Python 3.11 Language
Agent LiveKit Agents Voice handling
OpenAI Realtime AI conversation
Database PostgreSQL 16 Primary database
SQLite Development
Cache Redis 7 Session & cache
Monitoring Prometheus Metrics
Grafana Dashboards
Deployment Docker Containerization
Docker Compose Local orchestration
Kubernetes Production orchestration

Design Principles

  1. Separation of Concerns - Clear layer boundaries
  2. Async-First - Non-blocking operations throughout
  3. Fail Fast - Validate inputs early
  4. Observable - Comprehensive logging and metrics
  5. Scalable - Horizontal scaling by design
  6. Secure - Security in every layer
  7. Testable - High test coverage
  8. Documented - Code and API documentation

Future Architecture Enhancements

  1. Event-Driven Architecture - Message queue (RabbitMQ/Kafka)
  2. Microservices - Split monolith into services
  3. Service Mesh - Istio for service-to-service communication
  4. GraphQL API - Alternative to REST
  5. WebSocket Server - Real-time updates beyond LiveKit
  6. Machine Learning Pipeline - Conversation analysis
  7. Multi-Region Deployment - Global availability
  8. Edge Computing - CDN for dynamic content

For implementation details, see CHANGES.md For contribution guidelines, see CONTRIBUTING.md