Backend for an AI-powered technical support agent that answers field technician questions from 89+ product manuals.
Solar energy installers working on rooftops with inverters, battery storage, and EV chargers need fast, reliable answers from product documentation. The current support process fails them: call the hotline, wait on hold, get told to email photos, call back the next day, reach a different agent, and start the explanation over. Meanwhile, the installer is still on the roof.
This backend replaces that loop with a single conversation. A technician types a question (or sends a photo of an error code), and Nik (named after Nikola Tesla), a ReAct agent, searches the complete documentation library, reasons about what it found, and delivers a precise answer with document name and page number citations. The agent is not a keyword search engine. It decides autonomously which tools to call, how many times to search, when to retrieve a full document for deeper context, and when to ask the technician a clarifying question before answering. All responses are generated in German, the working language of the installers.
The system currently indexes 89 product documents across 700+ searchable chunks, with health monitoring across four external services (PostgreSQL, MinIO, ChromaDB, Cohere).
| Layer | Technology |
|---|---|
| AI orchestration | LangGraph (ReAct agent, checkpointer, interrupt/resume) |
| Generation + vision | OpenAI GPT-4o |
| Embeddings | OpenAI text-embedding-3-large (3072 dimensions) |
| Reranking | Cohere rerank-v4.0-pro cross-encoder |
| API framework | FastAPI (fully async) |
| Streaming | SSE via sse-starlette |
| Relational DB | PostgreSQL 16 (asyncpg + SQLAlchemy async) |
| Object storage | MinIO (S3-compatible) |
| Vector DB | ChromaDB (cosine similarity) |
| Migrations | Alembic (async PostgreSQL) |
| Configuration | pydantic-settings, 12-factor env vars |
| Authentication | fastapi-users v15 (JWT bearer, email verification) |
| Admin panel | SQLAdmin (web-based database management) |
| Transactional email | AWS SES via boto3 (console fallback for dev) |
| Capability | Description |
|---|---|
| Agent Intelligence | |
| ReAct agent loop | LangGraph-based autonomous agent that reasons, selects tools, observes results, and iterates, rather than following a fixed pipeline |
| Interrupt/resume clarification | Agent pauses mid-execution to ask the user a question, then resumes exactly where it left off on reply |
| Multimodal image analysis | Users upload photos of error codes, displays, or wiring; GPT-4o analyzes them inline with the conversation |
| Search and Retrieval | |
| Semantic search + reranking | ChromaDB vector search retrieves candidates; Cohere cross-encoder reranks to the most relevant chunks |
| Full document retrieval | Agent can pull a complete document with all sections, tables, image descriptions, and a PDF download link |
| Source citations | Every answer cites document name and page number; if the documentation lacks the answer, the agent says so instead of fabricating one |
| Real-Time Interaction | |
| SSE streaming | Token-by-token response streaming with live tool-call notifications via Server-Sent Events |
| Stateless one-shot mode | Separate endpoint for single questions without session overhead |
| Authentication and Access Control | |
| JWT authentication | fastapi-users with Bearer token; email verification required for login |
| Code-based verification | 8-digit verification codes via email (AWS SES) as an alternative to token links |
| Role-based access control | Three tiers (active user, verified user, superuser), each protecting different endpoint groups |
| Admin panel | Web-based SQLAdmin UI and REST API for user and data management (superuser only) |
| Data and Persistence | |
| Multi-turn chat sessions | Dual storage: LangGraph checkpointer for agent memory, PostgreSQL for app records and message history |
| Image attachments | Per-session image uploads stored in MinIO with ownership-verified presigned URL access |
| Operations | |
| Per-user rate limiting | Sliding-window in-memory limiter on chat and upload endpoints (configurable per route) |
| Health monitoring | Dependency checks for PostgreSQL, MinIO, ChromaDB, and Cohere with a Kubernetes readiness probe |
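The per-user sliding-window limiter listed above can be illustrated with a short sketch. This is not the code in `middleware/rate_limit.py`; the class name, constructor parameters, and injectable clock are assumptions chosen to keep the example self-contained.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Hypothetical in-memory sliding-window limiter keyed by user ID."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the window can be tested without sleeping
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        """Record a request and return True if the user is within the limit."""
        now = self._clock()
        hits = self._hits[user_id]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

With `SlidingWindowLimiter(10, 60.0)`, the eleventh call to `allow()` for the same user within one minute returns `False`, which matches the shape of a `10/minute` limit. Because the state lives in process memory, a limiter like this only holds per worker process.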
flowchart TD
A[User sends question] --> B[ReAct Agent]
B --> C{Reason}
C -->|search docs| D[search_knowledge]
C -->|need full context| E[get_full_document]
C -->|ambiguous query| F[ask_user_clarification]
C -->|ready to answer| G[Generate answer with citations]
D --> H{Sufficient?}
E --> H
H -->|no| C
H -->|yes| G
F -->|user replies| B
G --> I[Persist + respond]
I --> J[JSON or SSE stream]
style A fill:#dbeafe,stroke:#93c5fd,color:#1e3a5f
style B fill:#f8fafc,stroke:#cbd5e1,color:#334155
style C fill:#f8fafc,stroke:#cbd5e1,color:#334155
style D fill:#fef3c7,stroke:#fcd34d,color:#713f12
style E fill:#fef3c7,stroke:#fcd34d,color:#713f12
style F fill:#ffedd5,stroke:#fdba74,color:#7c2d12
style G fill:#d1fae5,stroke:#6ee7b7,color:#064e3b
style H fill:#f8fafc,stroke:#cbd5e1,color:#334155
style I fill:#f8fafc,stroke:#cbd5e1,color:#334155
style J fill:#d1fae5,stroke:#6ee7b7,color:#064e3b
flowchart LR
API[FastAPI Backend] --> M[MinIO]
API --> C[ChromaDB]
API --> P[PostgreSQL 16]
M --- M1[PDF documents]
M --- M2[Parsed metadata JSON]
M --- M3[User photo uploads]
C --- C1[Vector embeddings]
C --- C2[700+ searchable chunks]
P --- P1[Users + auth + sessions + messages]
P --- P2[Image references + feedback]
P --- P3[Agent checkpoints]
P --- P4[Verification codes]
style API fill:#f8fafc,stroke:#cbd5e1,color:#334155
style M fill:#fef3c7,stroke:#fcd34d,color:#713f12
style C fill:#fef3c7,stroke:#fcd34d,color:#713f12
style P fill:#dbeafe,stroke:#93c5fd,color:#1e3a5f
Each layer serves a distinct purpose: PostgreSQL handles relational queries and ACID transactions (including user authentication, verification codes, and LangGraph agent state via a dedicated psycopg3 connection pool), MinIO stores large binary files with S3-compatible presigned URL access, and ChromaDB provides vector similarity search over document embeddings.
The POST /chat/stream endpoint delivers the agent's response progressively via Server-Sent Events.
| Event | Payload | When |
|---|---|---|
| `metadata` | `{"session_id": "..."}` | Once, after session setup |
| `tool_start` | `{"tool": "search_knowledge"}` | Agent calls a tool |
| `token` | `{"content": "Die Installation..."}` | Each LLM response token |
| `sources` | `[{"filename": "...", "page_number": 8}]` | After answer completes |
| `done` | `{"success": true, "needs_clarification": false}` | Terminal event |
| `error` | `{"message": "..."}` | On failure |
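A client consuming this stream could be sketched as follows. The example assumes each event arrives in standard SSE framing (an `event:` line, a `data:` line carrying JSON, then a blank line); the helper names `parse_sse` and `collect_answer` are hypothetical and not part of this backend.

```python
import json
from typing import Any, Iterable, Iterator, List, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, Any]]:
    """Yield (event_name, decoded_json_payload) pairs from SSE text lines."""
    event, data_parts = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates one event.
            if data_parts:
                yield event, json.loads("\n".join(data_parts))
            event, data_parts = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive ping
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one optional leading space from the value.
            data_parts.append(value[1:] if value.startswith(" ") else value)


def collect_answer(lines: Iterable[str]) -> Tuple[str, List[dict], Optional[dict]]:
    """Assemble streamed tokens into the answer text, plus sources and the done payload."""
    tokens: List[str] = []
    sources: List[dict] = []
    done: Optional[dict] = None
    for event, payload in parse_sse(lines):
        if event == "token":
            tokens.append(payload["content"])
        elif event == "sources":
            sources = payload
        elif event == "done":
            done = payload
        elif event == "error":
            raise RuntimeError(payload["message"])
    return "".join(tokens), sources, done
```

In practice the `lines` iterable would come from a streaming HTTP response; a UI would typically render each `token` as it arrives instead of joining them at the end.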
| Method | Endpoint | Auth | Description |
|---|---|---|---|
| Auth | | | |
| `POST` | `/auth/register` | -- | Register a new user account |
| `POST` | `/auth/login` | -- | Login and receive a JWT access token |
| `POST` | `/auth/logout` | Bearer | Invalidate the current token |
| `POST` | `/auth/request-verify-token` | -- | Request an email verification token |
| `POST` | `/auth/verify` | -- | Verify email with a JWT token |
| `POST` | `/auth/verify-code` | -- | Verify email with an 8-digit code |
| `POST` | `/auth/forgot-password` | -- | Request a password reset email |
| `POST` | `/auth/reset-password` | -- | Reset password with a JWT token |
| `POST` | `/auth/reset-password-code` | -- | Reset password with an 8-digit code |
| `GET` | `/auth/me` | Bearer | Get current user profile |
| `PATCH` | `/auth/me` | Bearer | Update current user profile |
| Chat | | | |
| `POST` | `/chat` | Verified | Send a message and get an AI response |
| `POST` | `/chat/stream` | Verified | Stream the AI response via SSE |
| `POST` | `/chat/upload` | Verified | Upload images for a chat session |
| `GET` | `/chat/{session_id}/messages` | Verified | Get conversation history with images |
| `PATCH` | `/chat/{session_id}` | Verified | Update session title or pinned status |
| `DELETE` | `/chat/{session_id}` | Verified | Soft-delete a session |
| `GET` | `/chat/images/{object_key}` | Verified | Fetch image via presigned URL redirect |
| Users | | | |
| `GET` | `/users/me/sessions` | Verified | List chat sessions for the authenticated user |
| Agent | | | |
| `POST` | `/agent/ask` | Active | One-shot AI answer (no session, no persistence) |
| Search and Documents | | | |
| `GET` | `/search` | Active | Semantic search across document chunks |
| `GET` | `/documents` | Active | List all documents with metadata |
| `GET` | `/documents/{id}` | Active | Document details with presigned download URLs |
| Admin (User Management) | | | |
| `GET` | `/admin/users` | Superuser | List all users with session counts |
| `GET` | `/admin/users/{user_id}` | Superuser | Get user details |
| `PATCH` | `/admin/users/{user_id}` | Superuser | Update user flags or display name |
| `DELETE` | `/admin/users/{user_id}` | Superuser | Delete a user and all associated data |
| Admin (Embedding) | | | |
| `POST` | `/admin/embed` | Superuser | Trigger embedding pipeline for all documents |
| `POST` | `/admin/embed/{id}` | Superuser | Embed a single document |
| `GET` | `/admin/embed/status` | Superuser | Embedding statistics |
| Admin (Web Panel) | | | |
| -- | `/admin` | Session | SQLAdmin web UI (superuser login) |
| Health | | | |
| `GET` | `/health` | -- | System health (PostgreSQL, MinIO, ChromaDB, Cohere) |
| `GET` | `/health/ready` | -- | Kubernetes readiness probe |
Auth levels: `--` = public, `Active` = any active user, `Verified` = active + email verified, `Superuser` = active + verified + superuser, `Session` = superuser session cookie (web UI), `Bearer` = any authenticated user.
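The auth levels can be read as a small decision function over three user flags. The sketch below is illustrative only: `UserFlags` and `is_allowed` are hypothetical names, the real checks live in the fastapi-users dependencies, and the cookie-based `Session` level is left out because it applies only to the web panel.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserFlags:
    """Hypothetical stand-in for the three boolean flags on a user record."""
    is_active: bool = True
    is_verified: bool = False
    is_superuser: bool = False


def is_allowed(level: str, user: Optional[UserFlags]) -> bool:
    """Return True if `user` (None = anonymous) satisfies the given auth level."""
    if level == "public":
        return True
    if user is None:
        return False
    if level == "bearer":
        return True  # any authenticated user
    if not user.is_active:
        return False
    if level == "active":
        return True
    if level == "verified":
        return user.is_verified
    if level == "superuser":
        return user.is_verified and user.is_superuser
    raise ValueError(f"unknown auth level: {level}")
```

For example, a freshly registered account that has not yet redeemed its verification code passes `active` (so it can call `/agent/ask`) but fails `verified` (so `/chat` is rejected).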
Example: Full Chat Flow
# 1. Register a new account
curl -X POST "http://localhost:8000/auth/register" \
-H "Content-Type: application/json" \
-d '{"email": "techniker@example.com", "display_name": "Max Mustermann", "password": "SecurePass123!"}'
# 2. Verify email (use the 8-digit code from the verification email)
curl -X POST "http://localhost:8000/auth/verify-code" \
-H "Content-Type: application/json" \
-d '{"email": "techniker@example.com", "code": "12345678"}'
# 3. Login to get a JWT token
curl -X POST "http://localhost:8000/auth/login" \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "username=techniker@example.com&password=SecurePass123!"
# Response: {"access_token": "eyJ...", "token_type": "bearer"}
# 4. Send a message (starts a new session automatically)
curl -X POST "http://localhost:8000/chat" \
-H "Authorization: Bearer eyJ..." \
-H "Content-Type: application/json" \
-d '{"question": "Wie installiere ich den X3-Hybrid G4?"}'
# 5. Get conversation history
curl "http://localhost:8000/chat/SESSION_UUID/messages" \
-H "Authorization: Bearer eyJ..."
app/
├── config/settings.py          # pydantic-settings configuration
├── main.py                     # Application entry point and lifespan management
│
├── auth/                       # JWT authentication (fastapi-users)
│   ├── router.py               # Route aggregation (login, register, verify, reset, me)
│   ├── code_router.py          # Code-based verify-code and reset-password-code endpoints
│   ├── backend.py              # JWT strategy and Bearer transport configuration
│   ├── manager.py              # UserManager with email hooks (verification, password reset)
│   ├── dependencies.py         # Auth deps: current_active_user, verified_user, superuser
│   ├── email.py                # AWS SES email sending with console fallback
│   ├── verification_codes.py   # 8-digit code generation and redemption logic
│   └── schemas.py              # UserRead, UserCreate, UserUpdate
│
├── admin/                      # Admin panel and user management
│   ├── router.py               # REST API: list/get/update/delete users (superuser only)
│   ├── service.py              # AdminService with user CRUD and cascade deletion
│   ├── setup.py                # SQLAdmin mounting and view registration
│   ├── sqladmin_auth.py        # Session-based superuser authentication for web panel
│   ├── views.py                # ModelView definitions for all models
│   └── schemas.py              # AdminUserSummary, AdminUserDetail, AdminUserUpdateRequest
│
├── agent/                      # ReAct agent (LangGraph)
│   ├── graph.py                # AgentState, dual graph compilation (chat + stateless)
│   ├── tools.py                # search_knowledge, get_full_document, ask_user_clarification
│   ├── prompts.py              # German system prompt and fallback messages
│   ├── checkpointer.py         # AsyncPostgresSaver lifecycle (psycopg3 pool)
│   ├── client.py               # Cohere reranking client
│   ├── utils.py                # Source extraction from tool messages
│   ├── schemas.py
│   └── router.py
│
├── chat/                       # Persistent conversations
│   ├── service.py              # Orchestration: prepare, invoke/stream, persist
│   ├── streaming.py            # SSE event protocol over graph.astream()
│   ├── schemas.py
│   └── router.py
│
├── middleware/
│   └── rate_limit.py           # Per-user sliding-window limiter
│
├── users/
│   ├── service.py              # Session listing with search and pinning
│   ├── schemas.py
│   └── router.py               # GET /users/me/sessions (authenticated)
│
├── postgres/
│   ├── client.py               # Async engine and session factory (asyncpg)
│   ├── base.py                 # SQLAlchemy Base, UUID and Timestamp mixins
│   └── models.py               # User, Session, Message, MessageImage, Feedback, VerificationCode
│
├── minio/client.py             # S3-compatible file operations and presigned URLs
├── chroma/client.py            # ChromaDB collection management
│
├── vectorstore/
│   ├── chunking.py             # Title-based document sectioning
│   ├── service.py              # Embedding and semantic search
│   └── router.py
│
├── documents/                  # Document listing and retrieval
└── health/                     # Health check endpoints
alembic/
├── env.py                      # Async migration runner
└── versions/                   # 0001-0004: schema, pinned_at, auth fields, verification codes
scripts/
├── seed_minio.py               # Load product documents into MinIO
└── seed_chromadb.py            # Generate and store embeddings
The agent calls search_knowledge, which queries ChromaDB with the user's question using text-embedding-3-large (3072 dimensions). ChromaDB returns candidates ranked by cosine similarity, then Cohere's rerank-v4.0-pro cross-encoder rescores them to surface the most relevant chunks.
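The two-stage retrieval described above (a broad vector search followed by a more precise rerank) can be sketched in plain Python. This is not the project's `vectorstore/service.py`: the function names and the `candidates`/`top_n` parameters are assumptions, and `rerank_score` is a caller-supplied stand-in for the Cohere cross-encoder call.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query_vec: Vector,
    query_text: str,
    chunks: List[Tuple[str, Vector]],
    rerank_score: Callable[[str, str], float],
    candidates: int = 20,
    top_n: int = 5,
) -> List[str]:
    """Return the top_n chunk texts after vector retrieval and reranking."""
    # Stage 1: cheap vector search narrows the corpus to a candidate set.
    by_similarity = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, chunk[1]),
        reverse=True,
    )[:candidates]
    # Stage 2: a slower, more accurate scorer reorders only those candidates.
    reranked = sorted(
        by_similarity,
        key=lambda chunk: rerank_score(query_text, chunk[0]),
        reverse=True,
    )
    return [text for text, _ in reranked[:top_n]]
```

The split matters because a cross-encoder scores each query and chunk pair jointly and is too slow to run over the whole corpus, while embedding similarity is fast but coarser.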
The ReAct loop evaluates the search results and decides autonomously what to do next. If the chunks lack sufficient detail, the agent calls get_full_document to retrieve the complete document structure β all sections, tables, image descriptions, and a presigned PDF download link. If the query is ambiguous (e.g., multiple product models match), the agent calls ask_user_clarification, which triggers a LangGraph interrupt() to pause execution and wait for the user's reply before resuming.
Once the agent has sufficient context, GPT-4o generates a precise answer in German, citing every source by document name and page number. If the documentation does not contain the answer, the agent states this explicitly rather than fabricating information.
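The control flow of the three paragraphs above (call tools, pause for clarification, resume, answer) can be modelled without any framework. The sketch below deliberately does not use LangGraph's API; `AgentRun`, `step_until_done`, `resume`, and the scripted `decide` policy are hypothetical stand-ins for the LLM-driven graph and its checkpointed state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Action = Tuple[str, str]  # (tool name or "answer", argument)


@dataclass
class AgentRun:
    """Hypothetical saved state of one agent run (what a checkpointer would persist)."""
    question: str
    observations: List[str] = field(default_factory=list)
    pending_question: Optional[str] = None  # set while paused for the user
    answer: Optional[str] = None


def step_until_done(
    run: AgentRun,
    decide: Callable[[AgentRun], Action],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 8,
) -> AgentRun:
    """Loop reason -> act -> observe until an answer, a pause, or the step limit."""
    for _ in range(max_steps):
        name, arg = decide(run)
        if name == "answer":
            run.answer = arg
            return run
        if name == "ask_user_clarification":
            run.pending_question = arg  # pause here; the caller resumes later
            return run
        run.observations.append(tools[name](arg))
    run.answer = "No answer found in the documentation."  # placeholder fallback text
    return run


def resume(
    run: AgentRun,
    reply: str,
    decide: Callable[[AgentRun], Action],
    tools: Dict[str, Callable[[str], str]],
) -> AgentRun:
    """Feed the user's reply back in and continue from the saved state."""
    run.observations.append(f"user: {reply}")
    run.pending_question = None
    return step_until_done(run, decide, tools)
```

A paused run is identified by `pending_question` being set while `answer` is still `None`, which roughly corresponds to the `needs_clarification` flag in the `done` event of the streaming endpoint.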
- Python 3.10+
- Docker and Docker Compose
- OpenAI API key (embeddings + generation)
- Cohere API key (reranking; free tier available)
# 1. Clone and configure
cp .env.example .env
# Edit .env: add OPENAI_API_KEY, COHERE_API_KEY, and set JWT_SECRET
# (Optional: configure AWS SES credentials for email verification)
# 2. Create virtual environment
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# 3. Start infrastructure (MinIO + PostgreSQL)
docker compose up -d
# MinIO Console: http://localhost:9001 (minioadmin/minioadmin)
# PostgreSQL: localhost:5432 (postgres/postgres)
# 4. Run database migrations
alembic upgrade head
# 5. Seed product documents
python scripts/seed_minio.py
# 6. Start the API
uvicorn app.main:app --reload
# API: http://localhost:8000
# Docs: http://localhost:8000/docs
# Admin Panel: http://localhost:8000/admin (requires superuser account)
# 7. Generate embeddings
python scripts/seed_chromadb.py
# 8. Verify health
curl http://localhost:8000/health
Built in focused, mergeable increments, with each PR self-contained and deployable:
| PR | Title | What was built |
|---|---|---|
| #1 | MinIO storage | S3-compatible blob storage with FastAPI scaffold |
| #2 | Vector embeddings | ChromaDB integration with semantic search endpoint |
| #3 | PostgreSQL + Alembic | Relational models and async migrations |
| #4 | RAG pipeline | LangGraph pipeline with Cohere reranking |
| #5 | Chat persistence | Sessions and messages connected to the agent pipeline |
| #6 | ReAct agent | Autonomous agent replacing the fixed 4-stage pipeline |
| #7-8 | Interrupt/resume | ask_user_clarification tool with LangGraph interrupt flow |
| #9 | Image uploads | Multimodal support with base64 content blocks |
| #10 | Production hardening | SSE streaming, per-user rate limiting, secure image proxy |
| #18 | Session management | Pinned sessions, session update/delete, title search |
| #19 | Authentication | JWT auth with fastapi-users, email verification, code-based flows, AWS SES |
| -- | Admin panel | SQLAdmin web UI, REST user management API, superuser session auth |
The codebase uses async Python throughout with type hints on all functions. Request/response validation uses Pydantic v2, modules follow a domain-driven structure (app/{domain}/router.py, service.py, schemas.py), and services use the singleton pattern. All external dependencies have health checks, AI prompts are externalized for maintainability, and configuration follows the 12-factor app methodology via environment variables.
Key environment variables (see .env.example for the full list):
| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | Required: embeddings and answer generation |
| `OPENAI_CHAT_MODEL` | LLM model (default: gpt-4o) |
| `COHERE_API_KEY` | Required: relevance reranking |
| `POSTGRES_HOST` / `POSTGRES_DB` | PostgreSQL connection |
| `MINIO_ENDPOINT` | MinIO server address (default: localhost:9000) |
| `CHROMA_PERSIST_DIRECTORY` | ChromaDB storage path (default: ./chroma_data) |
| `RATE_LIMIT_CHAT` | Chat rate limit (default: 10/minute) |
| `RATE_LIMIT_UPLOAD` | Upload rate limit (default: 20/minute) |
| `IMAGE_PRESIGN_EXPIRES_MINUTES` | Image URL expiration (default: 5) |
| `JWT_SECRET` | Required: JWT signing key (change in production) |
| `JWT_LIFETIME_SECONDS` | Token expiry (default: 3600) |
| `AWS_ACCESS_KEY_ID` | Optional: AWS credentials for SES email |
| `AWS_SECRET_ACCESS_KEY` | Optional: AWS credentials for SES email |
| `AWS_REGION` | AWS region for SES (default: eu-central-1) |
| `SES_FROM_EMAIL` | Sender email for verification emails |
| `VERIFICATION_CODE_LIFETIME_SECONDS` | Code expiry (default: 3600) |
| `SQLADMIN_SECRET` | Session cookie key for admin panel |
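How a subset of these variables could be loaded and validated at startup is sketched below. The project itself uses pydantic-settings; this example swaps in a plain dataclass over `os.environ` so it runs without extra dependencies, and the `Settings` field names and `load_settings` helper are assumptions, not the contents of `config/settings.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("OPENAI_API_KEY", "COHERE_API_KEY", "JWT_SECRET")


@dataclass(frozen=True)
class Settings:
    """Hypothetical subset of the configuration, with defaults from the table above."""
    openai_api_key: str
    cohere_api_key: str
    jwt_secret: str
    openai_chat_model: str = "gpt-4o"
    minio_endpoint: str = "localhost:9000"
    jwt_lifetime_seconds: int = 3600
    image_presign_expires_minutes: int = 5


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on missing required keys."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        cohere_api_key=env["COHERE_API_KEY"],
        jwt_secret=env["JWT_SECRET"],
        openai_chat_model=env.get("OPENAI_CHAT_MODEL", "gpt-4o"),
        minio_endpoint=env.get("MINIO_ENDPOINT", "localhost:9000"),
        jwt_lifetime_seconds=int(env.get("JWT_LIFETIME_SECONDS", "3600")),
        image_presign_expires_minutes=int(env.get("IMAGE_PRESIGN_EXPIRES_MINUTES", "5")),
    )
```

Failing at startup when a required key is absent surfaces misconfiguration before the first request reaches OpenAI or Cohere.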
Released under the MIT License.