Production-grade FastAPI backend powering the BetterAI platform — a multi-provider AI system that unifies text generation, image/video creation, voice chat, speech-to-text, text-to-speech, semantic search, health data aggregation, and agentic workflows behind a single async API.
Built as a personal project to explore and integrate the full spectrum of modern AI APIs into a cohesive, well-architected backend.
- Multi-provider AI orchestration — 40+ models across 8 providers (OpenAI, Anthropic, Google Gemini, xAI, Groq, Perplexity, DeepSeek, ElevenLabs) with a pluggable registry pattern that makes adding new providers a configuration exercise, not a rewrite
- Real-time streaming architecture — WebSocket and SSE streaming with token-based completion ownership to prevent race conditions in concurrent streams
- Clean layered architecture — strict separation into core/ (infrastructure), features/ (business logic), infrastructure/ (external integrations), and config/ (centralized configuration)
- Async-first design — everything from database queries (SQLAlchemy async) to AWS operations (aioboto3) to AI provider calls runs on async/await
- Comprehensive testing — 299 test files covering unit, integration, API, live provider, performance, and regression scenarios
- 18 self-contained feature modules, each following the same routes → services → repositories pattern
| Layer | Technologies |
|---|---|
| Framework | FastAPI, Uvicorn, Pydantic v2 |
| Language | Python 3.10+ |
| Databases | Postgres (Supabase), SQLAlchemy (async ORM), Qdrant (vector DB) |
| AI Providers | OpenAI, Anthropic, Google Gemini, xAI, Groq, Perplexity, DeepSeek, Deepgram, ElevenLabs, Stability AI, Flux, KlingAI |
| Cloud | AWS (S3, SQS) |
| Auth | JWT (python-jose) |
| Streaming | WebSockets, Server-Sent Events |
| Testing | pytest, pytest-asyncio, testcontainers |
- Chat — multi-provider text conversation with WebSocket/SSE streaming and session management
- Image Generation — OpenAI (GPT Image, DALL-E), Stability AI, Flux, Gemini, xAI Grok
- Video Generation — Gemini Veo, OpenAI Sora, KlingAI (text-to-video, image-to-video, video extension, native audio, lip-sync)
- Realtime Voice — bidirectional audio streaming via OpenAI Realtime API and Gemini Live
- Text-to-Speech — OpenAI and ElevenLabs with real-time WebSocket streaming (audio synthesis starts before text generation completes)
- Speech-to-Text — Deepgram, OpenAI Whisper, Gemini (static file and streaming transcription)
- Batch Processing — async batch API for OpenAI, Anthropic, and Gemini at 50% lower cost than synchronous requests
- Agentic Workflows — multi-iteration tool loops for browser automation, chart generation, and image/video creation
- Semantic Search — vector search with Qdrant + OpenAI embeddings, hybrid/keyword/semantic modes, tag and date filtering
- Proactive Agent — two multi-character AI frameworks: one using Openclaw/Claude Agent SDK as its brain, the other using the Claude Agent SDK with SQS-driven push notifications
- Garmin Health — sleep, activity, body composition, HRV, and training readiness data aggregation
- Blood Tracking — blood test result storage and analysis
- S3 Storage — file upload with signed URLs
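The Text-to-Speech entry above notes that audio synthesis starts before text generation completes. One way to picture that overlap is a producer/consumer pipeline in which complete sentences are handed to the synthesizer as soon as they appear. The sketch below is illustrative only: `fake_text_stream`, `produce_sentences`, `synthesize`, and `run_pipeline` are hypothetical names, no real provider is called, and the project's actual WebSocket flow may differ.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_text_stream() -> AsyncIterator[str]:
    """Stand-in for an LLM token stream (hypothetical, no network calls)."""
    for token in ["Hello ", "there. ", "How ", "are ", "you?"]:
        await asyncio.sleep(0)  # yield control, as a real stream would
        yield token


async def produce_sentences(queue: asyncio.Queue, events: List[str]) -> None:
    """Buffer tokens and enqueue each sentence as soon as it is complete."""
    buffer = ""
    async for token in fake_text_stream():
        buffer += token
        events.append(f"text:{token.strip()}")
        if buffer.rstrip().endswith((".", "?", "!")):
            await queue.put(buffer.strip())
            buffer = ""
    if buffer.strip():
        await queue.put(buffer.strip())
    await queue.put(None)  # sentinel: no more text will arrive


async def synthesize(queue: asyncio.Queue, events: List[str]) -> None:
    """Stand-in for TTS: consume sentences while text is still streaming."""
    while True:
        sentence = await queue.get()
        if sentence is None:
            break
        await asyncio.sleep(0)  # pretend to call a TTS provider
        events.append(f"audio:{sentence}")


async def run_pipeline() -> List[str]:
    """Run text generation and synthesis concurrently; return the event log."""
    events: List[str] = []
    queue: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(
        produce_sentences(queue, events),
        synthesize(queue, events),
    )
    return events
```

Calling `asyncio.run(run_pipeline())` returns an event log in which the first sentence's audio event is recorded before the last text token, which is the overlap the feature description refers to.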
storage-backend/
├── main.py # FastAPI app factory & entry point
├── core/ # Cross-cutting infrastructure
│ ├── providers/ # Pluggable AI provider registry
│ │ ├── text/ # 8 providers, 40+ models
│ │ ├── image/ # 5 providers (OpenAI, Stability, Flux, Gemini, xAI)
│ │ ├── video/ # 3 providers (Gemini, OpenAI, KlingAI)
│ │ ├── audio/ # 3 providers (Deepgram, OpenAI, Gemini)
│ │ ├── realtime/ # 2 providers (OpenAI, Gemini)
│ │ ├── tts/ # 2 providers (OpenAI, ElevenLabs)
│ │ ├── semantic/ # Vector embeddings & search
│ │ ├── batch/ # Batch processing (OpenAI, Anthropic, Gemini)
│ │ └── registry/ # Model registry & resolution
│ ├── streaming/ # Token-based completion ownership
│ ├── auth/ # JWT authentication
│ ├── clients/ # AI SDK client initialization
│ └── observability/ # Metrics, tracing, request logging
│
├── features/ # Domain-specific business logic
│ ├── chat/ # Sessions, messages, WebSocket/SSE
│ ├── realtime/ # Voice chat (OpenAI Realtime, Gemini Live)
│ ├── audio/ # Speech-to-text endpoints
│ ├── image/ # Image generation
│ ├── video/ # Video generation & extension
│ ├── tts/ # Text-to-speech with WebSocket streaming
│ ├── semantic_search/ # Vector search (Qdrant + embeddings)
│ ├── batch/ # Batch job submission & results
│ ├── proactive_agent/ # Multi-character AI framework
│ ├── garmin/ # Garmin health data integration
│ ├── db/ # Blood, Garmin, UFC data features
│ ├── storage/ # S3 file upload
│ ├── automation/ # Workflow scheduling
│ └── journal/ # Journal entries
│
├── infrastructure/ # External integrations
│ ├── db/ # MySQL session factories & migrations
│ └── aws/ # S3 & SQS clients
│
├── config/ # Centralized configuration by domain
│ ├── text/ # LLM provider configs
│ ├── audio/, tts/, image/ # Feature-specific settings
│ ├── database/ # DB connection config
│ └── aws/ # AWS credentials & endpoints
│
├── tests/ # 299 test files
│ ├── unit/ # Core & infrastructure unit tests
│ ├── integration/ # Multi-service integration tests
│ ├── api/ # Endpoint tests (httpx AsyncClient)
│ ├── features/ # Feature-specific tests
│ ├── live_api/ # Real provider API tests
│ ├── performance/ # Performance benchmarks
│ └── regression/ # Regression validation
│
└── DocumentationApp/ # 18 comprehensive handbooks
Provider Registry — AI providers are registered at import time and resolved dynamically via factory functions. Adding a new provider means defining model configs, implementing a provider class, and registering it — no changes to existing code.
# Registration
register_text_provider("openai", OpenAITextProvider)
# Resolution — factory picks the right provider based on model name
provider = get_text_provider(settings)

Token-Based Completion Ownership — the StreamingManager prevents race conditions in concurrent WebSocket/TTS streams. Only the code holding the completion token can signal stream completion, enforced at runtime.
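The ownership rule can be sketched in a few lines. Only the `StreamingManager` name comes from this README; the method names, the `CompletionOwnershipError` exception, and the one-token-per-stream policy are assumptions made for illustration, not the project's actual interface.

```python
import secrets
from typing import Optional


class CompletionOwnershipError(RuntimeError):
    """Raised when a caller without the token tries to complete the stream."""


class StreamingManager:
    """Simplified, hypothetical take on token-based completion ownership."""

    def __init__(self) -> None:
        self._owner_token: Optional[str] = None
        self._completed = False

    def create_completion_token(self) -> str:
        # Assumption: exactly one owner per stream, so a second request fails.
        if self._owner_token is not None:
            raise CompletionOwnershipError("completion token already issued")
        self._owner_token = secrets.token_hex(16)
        return self._owner_token

    def signal_completion(self, token: str) -> None:
        # Runtime check: only the holder of the issued token may complete.
        if self._owner_token is None or not secrets.compare_digest(
            token, self._owner_token
        ):
            raise CompletionOwnershipError("caller does not own this stream")
        self._completed = True

    @property
    def completed(self) -> bool:
        return self._completed
```

In this sketch a TTS task that never received the token cannot close the stream early, even if it finishes before the text task: its call to `signal_completion` raises instead of silently ending the stream.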
Feature Module Pattern — each feature is self-contained with a consistent internal structure:
features/<domain>/
├── routes.py # FastAPI router (HTTP/WebSocket)
├── services/ # Business logic
├── repositories/ # Database operations
├── schemas/ # Pydantic request/response models
└── utils/ # Feature-specific helpers
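To make the routes → services → repositories flow concrete, here is a framework-free sketch of the three layers for a journal-style feature. Every name in it (`JournalEntry`, `JournalRepository`, `InMemoryJournalRepository`, `JournalService`, `create_entry_route`) is hypothetical, and the route is a plain function standing in for a FastAPI handler; the real modules may be organized differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class JournalEntry:
    id: int
    text: str


class JournalRepository(Protocol):
    """Repository layer contract: persistence only, no business rules."""

    def add(self, text: str) -> JournalEntry: ...

    def list_all(self) -> List[JournalEntry]: ...


class InMemoryJournalRepository:
    """Stand-in for a database-backed repository."""

    def __init__(self) -> None:
        self._entries: Dict[int, JournalEntry] = {}

    def add(self, text: str) -> JournalEntry:
        entry = JournalEntry(id=len(self._entries) + 1, text=text)
        self._entries[entry.id] = entry
        return entry

    def list_all(self) -> List[JournalEntry]:
        return list(self._entries.values())


class JournalService:
    """Service layer: validation and business logic, storage-agnostic."""

    def __init__(self, repository: JournalRepository) -> None:
        self._repository = repository

    def create_entry(self, text: str) -> JournalEntry:
        cleaned = text.strip()
        if not cleaned:
            raise ValueError("journal entry must not be empty")
        return self._repository.add(cleaned)


def create_entry_route(payload: dict, service: JournalService) -> dict:
    """Route layer: translate the request into a service call and a response."""
    entry = service.create_entry(payload["text"])
    return {"id": entry.id, "text": entry.text}
```

Because the service depends only on the repository contract, a test can swap in the in-memory repository without touching a database.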
| Category | Providers | Example Models |
|---|---|---|
| Text | OpenAI, Anthropic, Google, xAI, Groq, Perplexity, DeepSeek | GPT-5, Claude Opus 4, Gemini 2.5 Pro, Grok 4, Sonar Deep Research |
| Image | OpenAI, Flux, Stability, Gemini, xAI | GPT Image 1.5, Flux 2 Pro, Stable Diffusion 3.5, Gemini Image |
| Video | Gemini, OpenAI, KlingAI | Veo 3.1, Sora 2, Kling V2.6 Pro |
| Voice | OpenAI, Google | GPT Realtime, Gemini Live |
| STT | Deepgram, OpenAI, Google | Nova-3, Whisper, Gemini 2.5 |
| TTS | OpenAI, ElevenLabs | GPT-4o TTS, 25+ custom voices |
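Since every model in the table belongs to exactly one provider, resolution by model name can be sketched as a registry lookup. The function names `register_text_provider` and `get_text_provider` appear in this README; everything else here (the `Settings` dataclass, the prefix table, the `EchoProvider` stand-in, and the model identifier format) is assumed for illustration and is not the project's real implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class TextProvider(Protocol):
    def generate(self, prompt: str) -> str: ...


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object; only the model name matters here."""
    model: str


# Provider name -> factory that builds a provider from settings.
_TEXT_PROVIDERS: Dict[str, Callable[[Settings], TextProvider]] = {}

# Hypothetical model-name prefixes used to pick a provider.
_MODEL_PREFIXES = {"gpt-": "openai", "claude-": "anthropic"}


def register_text_provider(
    name: str, factory: Callable[[Settings], TextProvider]
) -> None:
    """Add a provider without modifying existing registrations."""
    _TEXT_PROVIDERS[name] = factory


def get_text_provider(settings: Settings) -> TextProvider:
    """Resolve the provider for settings.model, or raise LookupError."""
    for prefix, name in _MODEL_PREFIXES.items():
        if settings.model.startswith(prefix) and name in _TEXT_PROVIDERS:
            return _TEXT_PROVIDERS[name](settings)
    raise LookupError(f"no provider registered for model {settings.model!r}")


class EchoProvider:
    """Stand-in provider that makes no network calls."""

    def __init__(self, settings: Settings) -> None:
        self.model = settings.model

    def generate(self, prompt: str) -> str:
        return f"[{self.model}] {prompt}"


register_text_provider("openai", EchoProvider)
```

With this shape, supporting a new provider means adding one prefix entry and one `register_text_provider` call; callers of `get_text_provider` stay unchanged.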
Four isolated MySQL databases, each served by async SQLAlchemy sessions:
| Database | Purpose |
|---|---|
| Main | Chat sessions, messages, user data |
| Garmin | Health metrics (sleep, activity, body composition, HRV, training status) |
| Blood | Blood test results and analysis |
| UFC | Fighter data and subscriptions |
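Keeping the four databases isolated usually starts with one connection URL per database. The sketch below shows one possible way to load and validate them; the environment variable names (`MAIN_DB_URL` and so on), `DatabaseConfig`, and `load_database_configs` are assumptions for illustration and may not match the project's config/database/ module.

```python
import os
from dataclasses import dataclass
from typing import Dict, List, Mapping

# Logical database names taken from the table above.
DATABASES = ("main", "garmin", "blood", "ufc")


@dataclass(frozen=True)
class DatabaseConfig:
    name: str
    url: str


def load_database_configs(
    env: Mapping[str, str] = os.environ,
) -> Dict[str, DatabaseConfig]:
    """Read one URL per database; fail fast and list every missing variable."""
    configs: Dict[str, DatabaseConfig] = {}
    missing: List[str] = []
    for name in DATABASES:
        key = f"{name.upper()}_DB_URL"  # hypothetical variable naming scheme
        url = env.get(key)
        if url:
            configs[name] = DatabaseConfig(name=name, url=url)
        else:
            missing.append(key)
    if missing:
        raise RuntimeError("missing database URLs: " + ", ".join(missing))
    return configs
```

Each resulting `DatabaseConfig` could then feed its own async SQLAlchemy engine and session factory, so a misconfigured Garmin database is reported at startup rather than on the first health query.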
- Python 3.10+
- MySQL
- AWS account (S3, SQS)
- API keys for desired AI providers
cd storage-backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env  # Configure your API keys and database URLs

uvicorn main:app --host 0.0.0.0 --port 8000 --reload

# Unit and integration tests
pytest
# Specific test categories
pytest -m "not live_api and not requires_docker"

| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /chat/ws | WS | Main chat WebSocket (standard + realtime modes) |
| /chat/stream | POST | SSE streaming chat |
| /api/v1/audio/transcribe | POST | Static file transcription |
| /api/v1/audio/transcribe-stream | WS | Streaming transcription |
| /image/generate | POST | Image generation |
| /video/generate | POST | Video generation |
| /api/v1/tts/generate | POST | Sync text-to-speech |
| /tts/stream | WS | Streaming text-to-speech |
| /api/v1/batch/ | POST | Submit batch job |
| /api/v1/semantic/health | GET | Semantic search status |
| /api/v1/garmin/analysis/overview | GET | Aggregated health data |
| /api/v1/storage/upload | POST | S3 file upload |
The DocumentationApp/ directory contains 18 detailed handbooks covering architecture, features, database design, testing strategy, WebSocket event contracts, and troubleshooting.
- Developer Handbook — ground truth for architecture, layered structure, and development patterns
- Backend Capabilities Reference — comprehensive feature and model catalog
- AI Reference (Minimal) — token-efficient summary of all features and models
- Code Review Instructions — FastAPI-specific review checklist
- Troubleshooting Guidelines — systematic 6-step debugging framework
- WebSocket Events Handbook — complete WebSocket event catalog and frontend contract
- WebSocket TTS Streaming — real-time TTS streaming architecture (parallel text + audio)
- Semantic Search Handbook — vector search architecture, Qdrant integration, and configuration
- Semantic Search Settings Guide — search mode and filter configuration reference
- Image & Video Generation — multi-provider image/video generation systems
- Batch API Handbook — batch processing guide (OpenAI, Anthropic, Gemini)
- Garmin Integration — health data aggregation and Garmin API integration
- Deep Research Handbook — deep research workflows
- Text Providers Config — LLM provider configuration details
- Database Handbook — database design, ORM setup, and migration strategy
- Testing Guide — test strategy, markers, conventions, and E2E scripts
- Manual & Live Test Readiness — provider test readiness checks