Part of the aiworlds.online ecosystem
aiworlds.online is an experimental platform for testing and applying cutting-edge AI technologies in interactive experiences. The platform focuses on creating immersive voice-driven RPG games that combine multiple AI services into seamless real-time interactions.
The aiworlds.online ecosystem consists of several interconnected services:
- lk-agent (this repository) - Real-time voice agent for AI-powered RPG game narration
- story-api - Backend API for game data, user management, and turn persistence
- story-front - Frontend web application for game interface and visualization
- StoryImageGen - AI-powered image generation service for scene visualization
lk-agent is a real-time voice agent built on LiveKit Agents SDK that powers interactive RPG experiences with AI-driven game master narration. The agent processes player voice input, generates contextual responses using LLM, and delivers natural speech output in multiple languages.
aiworlds.online platform home screen with game world selection
Try Live Demo AIWorlds.online
- Real-time Voice Interaction - Seamless voice-to-voice communication with sub-second latency
- Multi-language Support - Full support for English, Russian, Dutch, French, and Spanish
- Dynamic Language Switching - Change language on-the-fly during active sessions
- AI Game Master - Context-aware RPG narration powered by GPT-4o
- Scene Visualization - Automatic generation of scene images for each turn
- Intelligent Context Management - RAG-powered world lore integration with conversation summaries
- Persistent Game State - Integration with backend API for game progression tracking
Dynamic language and speech speed settings available during gameplay
The agent implements a sophisticated voice processing pipeline:
┌─────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Player │────▶│ VAD │────▶│ STT │────▶│ LLM │
│ Audio │ │ (Silero) │ │ (Deepgram) │ │ (GPT-4o) │
└─────────────┘ └──────────────┘ └──────────────┘ └──────────────┘
│
▼
┌─────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Player │◀────│ TTS │◀────│ Turn Detect │◀────│ Response │
│ Audio │ │ (Cartesia) │ │(Multilingual)│ │ Generation │
└─────────────┘ └──────────────┘ └──────────────┘ └──────────────┘
- VAD (Voice Activity Detection) - Silero VAD detects when player starts/stops speaking
- STT (Speech-to-Text) - Deepgram nova-3 model with multilingual support
- LLM (Language Model) - OpenAI GPT-4o generates contextual RPG responses
- Turn Detection - Multilingual model manages conversation flow
- TTS (Text-to-Speech) - Cartesia sonic-2 synthesizes natural voice output
┌─────────────────────────────────────────────────────────────────┐
│ aiworlds.online │
│ │
│ ┌────────────────┐ ┌───────────────┐ ┌──────────┐ │
│ │ story-front │◀────▶│ lk-agent │◀────▶│story-api │ │
│ │ (Web UI) │ │ (Voice Agent) │ │(Backend) │ │
│ └────────────────┘ └───────────────┘ └──────────┘ │
│ │ │ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌──────────────────┐ │ │
│ └─────────────▶│ StoryImageGen │◀───────────┘ │
│ │ (Image AI) │ │
│ └──────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ LiveKit Server (Real-time Media) │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
- Game Initialization - Agent fetches game data from story-api
- Voice Input - Player speaks, audio processed through VAD → STT
- Context Building - Game state + conversation history + player input
- Response Generation - LLM generates RPG narration
- Image Generation - StoryImageGen creates scene visualization (async)
- Voice Output - TTS synthesizes response in selected language
- State Persistence - Turn saved to story-api with image URL
- Framework: LiveKit Agents SDK 1.x
- Language: Python 3.11+
- Real-time Media: LiveKit Cloud/Server
| Component | Provider | Model | Purpose |
|---|---|---|---|
| STT | Deepgram | nova-3 | Speech recognition with multilingual support |
| LLM | OpenAI | gpt-4o | RPG game master responses |
| TTS | Cartesia | sonic-2 | Natural voice synthesis (5 languages) |
| VAD | Silero | - | Voice activity detection |
| Image Gen | Custom | Stable Diffusion | Scene visualization |
The agent employs a sophisticated context management system to maintain coherent and immersive RPG narratives across extended gameplay sessions.
Each interaction with the LLM includes carefully structured context:
- System Prompt - Game master instructions with role, rules, and response style guidelines
- World Lore - Base description of the selected game world's setting, history, and atmosphere
- Latest Game Summary - Compressed narrative of previous gameplay (generated every 6 turns)
- Turn History - Detailed conversation history since the last summary
- RAG-Enhanced World Details - Relevant chunks from detailed world lore retrieved based on current context
This hierarchical approach ensures the AI has comprehensive context while staying within token limits.
A separate context pipeline manages scene visualization:
- Chat History Analysis - Current GM response and recent player actions
- Character Appearance - Persistent visual description of the player's character
- Illustration Style - Game-specific art style prompt for visual consistency
- Scene Context - Extracted key visual elements from the current narrative moment
The StoryImageGen service processes this context to generate contextually relevant scene images that match the game's aesthetic and current narrative state.
- Automatic Summarization - Every 6 turns, detailed history is compressed into narrative summaries
- Smart Context Pruning - Old turn history is replaced with summaries to prevent token overflow
- RAG Integration - Only relevant world lore chunks are included based on current gameplay
- Final Session Summary - Complete game session is summarized on disconnect for next session continuity
The platform is actively developing support for collaborative RPG experiences:
- Multi-user Sessions - Multiple players in the same game room
- Shared World State - Synchronized game progression across all participants
- Turn Management - Coordinated player turn system for group interactions
- Team Dynamics - AI game master adapts narration for group decision-making
- Individual Voice Channels - Separate voice processing per player while maintaining shared context
This feature will enable cooperative storytelling where multiple players can interact with the same AI game master in real-time.
Built with LiveKit Agents SDK | Part of aiworlds.online ecosystem
