kiSchlag/Nik-Backend

Python FastAPI LangGraph OpenAI Cohere PostgreSQL ChromaDB MinIO Docker fastapi-users SQLAdmin

Nik: Technical Support Agent Backend

Backend for an AI-powered technical support agent that answers field technician questions from 89+ product manuals.


Overview

Solar energy installers working on rooftops with inverters, battery storage, and EV chargers need fast, reliable answers from product documentation. The current support process fails them: call the hotline, wait on hold, get told to email photos, call back the next day, reach a different agent, and start the explanation over. Meanwhile, the installer is still on the roof.

This backend replaces that loop with a single conversation. A technician types a question, or sends a photo of an error code, and Nik (named after Nikola Tesla), a ReAct agent, searches the complete documentation library, reasons about what it found, and delivers a precise answer with document name and page number citations. The agent is not a keyword search engine. It decides autonomously which tools to call, how many times to search, when to retrieve a full document for deeper context, and when to ask the technician a clarifying question before answering. All responses are generated in German, the working language of the installers.

The system currently indexes 89 product documents across 700+ searchable chunks, with health monitoring across four external services (PostgreSQL, MinIO, ChromaDB, Cohere).


Tech Stack

| Layer | Technology |
|---|---|
| AI orchestration | LangGraph (ReAct agent, checkpointer, interrupt/resume) |
| Generation + vision | OpenAI GPT-4o |
| Embeddings | OpenAI text-embedding-3-large (3072 dimensions) |
| Reranking | Cohere rerank-v4.0-pro cross-encoder |
| API framework | FastAPI (fully async) |
| Streaming | SSE via sse-starlette |
| Relational DB | PostgreSQL 16 (asyncpg + SQLAlchemy async) |
| Object storage | MinIO (S3-compatible) |
| Vector DB | ChromaDB (cosine similarity) |
| Migrations | Alembic (async PostgreSQL) |
| Configuration | pydantic-settings, 12-factor env vars |
| Authentication | fastapi-users v15 (JWT bearer, email verification) |
| Admin panel | SQLAdmin (web-based database management) |
| Transactional email | AWS SES via boto3 (console fallback for dev) |

Key Capabilities

| Capability | Description |
|---|---|
| Agent Intelligence | |
| ReAct agent loop | LangGraph-based autonomous agent that reasons, selects tools, observes results, and iterates, rather than following a fixed pipeline |
| Interrupt/resume clarification | Agent pauses mid-execution to ask the user a question, then resumes exactly where it left off on reply |
| Multimodal image analysis | Users upload photos of error codes, displays, or wiring; GPT-4o analyzes them inline with the conversation |
| Search and Retrieval | |
| Semantic search + reranking | ChromaDB vector search retrieves candidates; Cohere cross-encoder reranks to the most relevant chunks |
| Full document retrieval | Agent can pull a complete document with all sections, tables, image descriptions, and a PDF download link |
| Source citations | Every answer cites document name and page number; the agent never fabricates information |
| Real-Time Interaction | |
| SSE streaming | Token-by-token response streaming with live tool-call notifications via Server-Sent Events |
| Stateless one-shot mode | Separate endpoint for single questions without session overhead |
| Authentication and Access Control | |
| JWT authentication | fastapi-users with Bearer token; email verification required for login |
| Code-based verification | 8-digit verification codes via email (AWS SES) as alternative to token links |
| Role-based access control | Three tiers (active user, verified user, superuser), each protecting different endpoint groups |
| Admin panel | Web-based SQLAdmin UI and REST API for user and data management (superuser only) |
| Data and Persistence | |
| Multi-turn chat sessions | Dual storage: LangGraph checkpointer for agent memory, PostgreSQL for app records and message history |
| Image attachments | Per-session image uploads stored in MinIO with ownership-verified presigned URL access |
| Operations | |
| Per-user rate limiting | Sliding-window in-memory limiter on chat and upload endpoints (configurable per route) |
| Health monitoring | Dependency checks for PostgreSQL, MinIO, ChromaDB, and Cohere with a Kubernetes readiness probe |
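
As an illustration of the per-user sliding-window limiter listed above, the following is a minimal sketch under stated assumptions. The class name `SlidingWindowLimiter`, its parameters, and the injectable clock are hypothetical and are not taken from the repository's `rate_limit.py`.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """In-memory sliding-window limiter keyed by user ID (hypothetical sketch)."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the behaviour can be tested without sleeping
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        """Return True and record the request if the user is under the limit."""
        now = self._clock()
        hits = self._hits[user_id]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

Because the state lives in process memory, a limiter like this only holds per worker; a multi-process deployment would need a shared store, which is outside the scope of this sketch.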

Architecture

Agent Decision Flow

flowchart TD
    A[User sends question] --> B[ReAct Agent]
    B --> C{Reason}
    C -->|search docs| D[search_knowledge]
    C -->|need full context| E[get_full_document]
    C -->|ambiguous query| F[ask_user_clarification]
    C -->|ready to answer| G[Generate answer with citations]
    D --> H{Sufficient?}
    E --> H
    H -->|no| C
    H -->|yes| G
    F -->|user replies| B
    G --> I[Persist + respond]
    I --> J[JSON or SSE stream]

    style A fill:#dbeafe,stroke:#93c5fd,color:#1e3a5f
    style B fill:#f8fafc,stroke:#cbd5e1,color:#334155
    style C fill:#f8fafc,stroke:#cbd5e1,color:#334155
    style D fill:#fef3c7,stroke:#fcd34d,color:#713f12
    style E fill:#fef3c7,stroke:#fcd34d,color:#713f12
    style F fill:#ffedd5,stroke:#fdba74,color:#7c2d12
    style G fill:#d1fae5,stroke:#6ee7b7,color:#064e3b
    style H fill:#f8fafc,stroke:#cbd5e1,color:#334155
    style I fill:#f8fafc,stroke:#cbd5e1,color:#334155
    style J fill:#d1fae5,stroke:#6ee7b7,color:#064e3b

Data Architecture

flowchart LR
    API[FastAPI Backend] --> M[MinIO]
    API --> C[ChromaDB]
    API --> P[PostgreSQL 16]

    M --- M1[PDF documents]
    M --- M2[Parsed metadata JSON]
    M --- M3[User photo uploads]

    C --- C1[Vector embeddings]
    C --- C2[700+ searchable chunks]

    P --- P1[Users + auth + sessions + messages]
    P --- P2[Image references + feedback]
    P --- P3[Agent checkpoints]
    P --- P4[Verification codes]

    style API fill:#f8fafc,stroke:#cbd5e1,color:#334155
    style M fill:#fef3c7,stroke:#fcd34d,color:#713f12
    style C fill:#fef3c7,stroke:#fcd34d,color:#713f12
    style P fill:#dbeafe,stroke:#93c5fd,color:#1e3a5f

Each layer serves a distinct purpose: PostgreSQL handles relational queries and ACID transactions (including user authentication, verification codes, and LangGraph agent state via a dedicated psycopg3 connection pool), MinIO stores large binary files with S3-compatible presigned URL access, and ChromaDB provides vector similarity search over document embeddings.


Streaming Protocol

The POST /chat/stream endpoint delivers the agent's response progressively via Server-Sent Events.

| Event | Payload | When |
|---|---|---|
| metadata | {"session_id": "..."} | Once, after session setup |
| tool_start | {"tool": "search_knowledge"} | Agent calls a tool |
| token | {"content": "Die Installation..."} | Each LLM response token |
| sources | [{"filename": "...", "page_number": 8}] | After answer completes |
| done | {"success": true, "needs_clarification": false} | Terminal event |
| error | {"message": "..."} | On failure |
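
A client could consume these events roughly as sketched below. The sketch assumes standard SSE framing (`event:` and `data:` fields, a blank line between events) with JSON payloads; the exact wire format of this backend is not shown in this README, and the helper names `parse_sse` and `collect_answer` are hypothetical.

```python
import json
from typing import Any, Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, Any]]:
    """Yield (event_name, decoded_json_payload) pairs from raw SSE lines."""
    event: str = "message"
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            # A blank line terminates one event.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, typically a keep-alive ping
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].lstrip())


def collect_answer(lines: Iterable[str]) -> str:
    """Concatenate token events until the terminal done or error event."""
    parts: List[str] = []
    for name, payload in parse_sse(lines):
        if name == "token":
            parts.append(payload["content"])
        elif name in ("done", "error"):
            break
    return "".join(parts)
```

In practice the `lines` iterable would come from a streaming HTTP response; here it is left abstract so the parsing logic stays independent of any particular HTTP client.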

API Reference

| Method | Endpoint | Auth | Description |
|---|---|---|---|
| Auth | | | |
| POST | /auth/register | -- | Register a new user account |
| POST | /auth/login | -- | Login and receive a JWT access token |
| POST | /auth/logout | Bearer | Invalidate the current token |
| POST | /auth/request-verify-token | -- | Request an email verification token |
| POST | /auth/verify | -- | Verify email with a JWT token |
| POST | /auth/verify-code | -- | Verify email with an 8-digit code |
| POST | /auth/forgot-password | -- | Request a password reset email |
| POST | /auth/reset-password | -- | Reset password with a JWT token |
| POST | /auth/reset-password-code | -- | Reset password with an 8-digit code |
| GET | /auth/me | Bearer | Get current user profile |
| PATCH | /auth/me | Bearer | Update current user profile |
| Chat | | | |
| POST | /chat | Verified | Send a message and get an AI response |
| POST | /chat/stream | Verified | Stream the AI response via SSE |
| POST | /chat/upload | Verified | Upload images for a chat session |
| GET | /chat/{session_id}/messages | Verified | Get conversation history with images |
| PATCH | /chat/{session_id} | Verified | Update session title or pinned status |
| DELETE | /chat/{session_id} | Verified | Soft-delete a session |
| GET | /chat/images/{object_key} | Verified | Fetch image via presigned URL redirect |
| Users | | | |
| GET | /users/me/sessions | Verified | List chat sessions for the authenticated user |
| Agent | | | |
| POST | /agent/ask | Active | One-shot AI answer: no session, no persistence |
| Search and Documents | | | |
| GET | /search | Active | Semantic search across document chunks |
| GET | /documents | Active | List all documents with metadata |
| GET | /documents/{id} | Active | Document details with presigned download URLs |
| Admin (User Management) | | | |
| GET | /admin/users | Superuser | List all users with session counts |
| GET | /admin/users/{user_id} | Superuser | Get user details |
| PATCH | /admin/users/{user_id} | Superuser | Update user flags or display name |
| DELETE | /admin/users/{user_id} | Superuser | Delete a user and all associated data |
| Admin (Embedding) | | | |
| POST | /admin/embed | Superuser | Trigger embedding pipeline for all documents |
| POST | /admin/embed/{id} | Superuser | Embed a single document |
| GET | /admin/embed/status | Superuser | Embedding statistics |
| Admin (Web Panel) | | | |
| -- | /admin | Session | SQLAdmin web UI (superuser login) |
| Health | | | |
| GET | /health | -- | System health (PostgreSQL, MinIO, ChromaDB, Cohere) |
| GET | /health/ready | -- | Kubernetes readiness probe |

Auth levels: -- = public, Active = any active user, Verified = active + email verified, Superuser = active + verified + superuser, Session = superuser session cookie (web UI), Bearer = any authenticated user.
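
The tier logic described in the auth levels can be sketched as a small pure function. This is an illustration only: `UserFlags`, `highest_level`, and `can_access` are hypothetical names, and the sketch assumes the usual `is_active`, `is_verified`, and `is_superuser` flags on the user model rather than reproducing the project's dependency functions.

```python
from dataclasses import dataclass
from typing import Optional

# Ordered from least to most privileged.
LEVELS = ("public", "active", "verified", "superuser")


@dataclass(frozen=True)
class UserFlags:
    is_active: bool
    is_verified: bool
    is_superuser: bool


def highest_level(user: Optional[UserFlags]) -> str:
    """Map a user's flags to the highest tier they satisfy."""
    if user is None or not user.is_active:
        return "public"
    if not user.is_verified:
        return "active"
    if not user.is_superuser:
        return "verified"
    return "superuser"


def can_access(user: Optional[UserFlags], required: str) -> bool:
    """True if the user's tier is at least the tier an endpoint requires."""
    return LEVELS.index(highest_level(user)) >= LEVELS.index(required)
```

Note that each tier implies the ones below it, so a superuser flag on an unverified or inactive account grants nothing extra in this model, which matches the "active + verified + superuser" wording above.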

Example: Full Chat Flow
# 1. Register a new account
curl -X POST "http://localhost:8000/auth/register" \
  -H "Content-Type: application/json" \
  -d '{"email": "techniker@example.com", "display_name": "Max Mustermann", "password": "SecurePass123!"}'

# 2. Verify email (use the 8-digit code from the verification email)
curl -X POST "http://localhost:8000/auth/verify-code" \
  -H "Content-Type: application/json" \
  -d '{"email": "techniker@example.com", "code": "12345678"}'

# 3. Login to get a JWT token
curl -X POST "http://localhost:8000/auth/login" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "username=techniker@example.com&password=SecurePass123!"
# Response: {"access_token": "eyJ...", "token_type": "bearer"}

# 4. Send a message (starts a new session automatically)
curl -X POST "http://localhost:8000/chat" \
  -H "Authorization: Bearer eyJ..." \
  -H "Content-Type: application/json" \
  -d '{"question": "Wie installiere ich den X3-Hybrid G4?"}'

# 5. Get conversation history
curl "http://localhost:8000/chat/SESSION_UUID/messages" \
  -H "Authorization: Bearer eyJ..."
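
Step 2 above relies on an 8-digit verification code. One way such codes could be generated and checked is sketched below, using only the Python standard library. The function names are hypothetical, and whether the repository hashes stored codes at all is an assumption made here for illustration.

```python
import hashlib
import hmac
import secrets


def generate_code(digits: int = 8) -> str:
    """Return a zero-padded numeric code drawn from a cryptographic RNG."""
    return f"{secrets.randbelow(10 ** digits):0{digits}d}"


def hash_code(code: str) -> str:
    """Hash a code so the plain value need not be stored (assumed design)."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


def codes_match(submitted: str, stored_hash: str) -> bool:
    """Compare in constant time to avoid leaking information via timing."""
    return hmac.compare_digest(hash_code(submitted), stored_hash)
```

Zero-padding matters: without it, roughly one code in ten would be shorter than eight digits.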

Project Structure

πŸ“ app/
β”œβ”€β”€ config/settings.py          # pydantic-settings configuration
β”œβ”€β”€ main.py                     # Application entry point and lifespan management
β”‚
β”œβ”€β”€ auth/                       # JWT authentication (fastapi-users)
β”‚   β”œβ”€β”€ router.py               # Route aggregation (login, register, verify, reset, me)
β”‚   β”œβ”€β”€ code_router.py          # Code-based verify-code and reset-password-code endpoints
β”‚   β”œβ”€β”€ backend.py              # JWT strategy and Bearer transport configuration
β”‚   β”œβ”€β”€ manager.py              # UserManager with email hooks (verification, password reset)
β”‚   β”œβ”€β”€ dependencies.py         # Auth deps: current_active_user, verified_user, superuser
β”‚   β”œβ”€β”€ email.py                # AWS SES email sending with console fallback
β”‚   β”œβ”€β”€ verification_codes.py   # 8-digit code generation and redemption logic
β”‚   └── schemas.py              # UserRead, UserCreate, UserUpdate
β”‚
β”œβ”€β”€ admin/                      # Admin panel and user management
β”‚   β”œβ”€β”€ router.py               # REST API: list/get/update/delete users (superuser only)
β”‚   β”œβ”€β”€ service.py              # AdminService with user CRUD and cascade deletion
β”‚   β”œβ”€β”€ setup.py                # SQLAdmin mounting and view registration
β”‚   β”œβ”€β”€ sqladmin_auth.py        # Session-based superuser authentication for web panel
β”‚   β”œβ”€β”€ views.py                # ModelView definitions for all models
β”‚   └── schemas.py              # AdminUserSummary, AdminUserDetail, AdminUserUpdateRequest
β”‚
β”œβ”€β”€ agent/                      # ReAct agent (LangGraph)
β”‚   β”œβ”€β”€ graph.py                # AgentState, dual graph compilation (chat + stateless)
β”‚   β”œβ”€β”€ tools.py                # search_knowledge, get_full_document, ask_user_clarification
β”‚   β”œβ”€β”€ prompts.py              # German system prompt and fallback messages
β”‚   β”œβ”€β”€ checkpointer.py         # AsyncPostgresSaver lifecycle (psycopg3 pool)
β”‚   β”œβ”€β”€ client.py               # Cohere reranking client
β”‚   β”œβ”€β”€ utils.py                # Source extraction from tool messages
β”‚   β”œβ”€β”€ schemas.py
β”‚   └── router.py
β”‚
β”œβ”€β”€ chat/                       # Persistent conversations
β”‚   β”œβ”€β”€ service.py              # Orchestration: prepare, invoke/stream, persist
β”‚   β”œβ”€β”€ streaming.py            # SSE event protocol over graph.astream()
β”‚   β”œβ”€β”€ schemas.py
β”‚   └── router.py
β”‚
β”œβ”€β”€ middleware/
β”‚   └── rate_limit.py           # Per-user sliding-window limiter
β”‚
β”œβ”€β”€ users/
β”‚   β”œβ”€β”€ service.py              # Session listing with search and pinning
β”‚   β”œβ”€β”€ schemas.py
β”‚   └── router.py               # GET /users/me/sessions (authenticated)
β”‚
β”œβ”€β”€ postgres/
β”‚   β”œβ”€β”€ client.py               # Async engine and session factory (asyncpg)
β”‚   β”œβ”€β”€ base.py                 # SQLAlchemy Base, UUID and Timestamp mixins
β”‚   └── models.py               # User, Session, Message, MessageImage, Feedback, VerificationCode
β”‚
β”œβ”€β”€ minio/client.py             # S3-compatible file operations and presigned URLs
β”œβ”€β”€ chroma/client.py            # ChromaDB collection management
β”‚
β”œβ”€β”€ vectorstore/
β”‚   β”œβ”€β”€ chunking.py             # Title-based document sectioning
β”‚   β”œβ”€β”€ service.py              # Embedding and semantic search
β”‚   └── router.py
β”‚
β”œβ”€β”€ documents/                  # Document listing and retrieval
└── health/                     # Health check endpoints

πŸ“ alembic/
β”œβ”€β”€ env.py                      # Async migration runner
└── versions/                   # 0001-0004: schema, pinned_at, auth fields, verification codes

πŸ“ scripts/
β”œβ”€β”€ seed_minio.py               # Load product documents into MinIO
└── seed_chromadb.py            # Generate and store embeddings

How the Agent Works

1. Search

The agent calls search_knowledge, which queries ChromaDB with the user's question using text-embedding-3-large (3072 dimensions). ChromaDB returns candidates ranked by cosine similarity, then Cohere's rerank-v4.0-pro cross-encoder rescores them to surface the most relevant chunks.
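
The two-stage idea (cheap vector recall first, more expensive reranking second) can be sketched in plain Python. This is a toy under stated assumptions: `retrieve_then_rerank` is a hypothetical name, the `rerank_score` callable stands in for the Cohere cross-encoder, and the candidate counts are illustrative rather than the project's actual settings.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_rerank(
    query_vec: Sequence[float],
    query_text: str,
    chunks: List[Tuple[str, Sequence[float]]],
    rerank_score: Callable[[str, str], float],
    candidates: int = 20,
    top_n: int = 5,
) -> List[str]:
    """Stage 1: top candidates by cosine similarity. Stage 2: rescore and cut."""
    by_similarity = sorted(
        chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True
    )[:candidates]
    reranked = sorted(
        by_similarity, key=lambda c: rerank_score(query_text, c[0]), reverse=True
    )
    return [text for text, _ in reranked[:top_n]]
```

The point of the second stage is that a cross-encoder sees query and chunk text together, so it can reorder candidates that are close in embedding space but differ in actual relevance.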

2. Reason

The ReAct loop evaluates the search results and decides autonomously what to do next. If the chunks lack sufficient detail, the agent calls get_full_document to retrieve the complete document structure: all sections, tables, image descriptions, and a presigned PDF download link. If the query is ambiguous (e.g., multiple product models match), the agent calls ask_user_clarification, which triggers a LangGraph interrupt() to pause execution and wait for the user's reply before resuming.
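
The pause-and-resume behaviour can be illustrated with a plain Python generator. This is an analogy only and does not use LangGraph's API: `agent_run` and `drive` are hypothetical names, the second model name is a placeholder, and the real agent persists its paused state in a checkpointer rather than in a live generator object.

```python
from typing import Generator, List, Optional, Tuple


def agent_run(question: str, known_models: List[str]) -> Generator[str, str, str]:
    """Toy agent: asks for clarification unless exactly one model is named."""
    matches = [m for m in known_models if m.lower() in question.lower()]
    if len(matches) != 1:
        # Pause here: hand a clarifying question to the caller, wait for the reply.
        reply = yield "Welches Modell meinen Sie? " + ", ".join(known_models)
        matches = [m for m in known_models if m.lower() in reply.lower()]
    model = matches[0] if matches else "unbekannt"
    return f"Antwort zu {model}"


def drive(gen: Generator[str, str, str], reply: Optional[str] = None) -> Tuple[str, str]:
    """Advance the run until it pauses ('clarify') or finishes ('answer')."""
    try:
        prompt = next(gen) if reply is None else gen.send(reply)
        return ("clarify", prompt)
    except StopIteration as stop:
        return ("answer", stop.value)
```

A first call to `drive(run)` either returns the final answer or a clarifying question; in the latter case, calling `drive(run, user_reply)` continues from the exact point where execution paused.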

3. Answer

Once the agent has sufficient context, GPT-4o generates a precise answer in German, citing every source by document name and page number. If the documentation does not contain the answer, the agent states this explicitly rather than fabricating information.


Getting Started

Prerequisites

  • Python 3.10+
  • Docker and Docker Compose
  • OpenAI API key (embeddings + generation)
  • Cohere API key (reranking; free tier available)

Setup

# 1. Clone and configure
cp .env.example .env
# Edit .env: add OPENAI_API_KEY, COHERE_API_KEY, and set JWT_SECRET
# (Optional: configure AWS SES credentials for email verification)

# 2. Create virtual environment
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# 3. Start infrastructure (MinIO + PostgreSQL)
docker compose up -d
# MinIO Console: http://localhost:9001 (minioadmin/minioadmin)
# PostgreSQL: localhost:5432 (postgres/postgres)

# 4. Run database migrations
alembic upgrade head

# 5. Seed product documents
python scripts/seed_minio.py

# 6. Start the API
uvicorn app.main:app --reload
# API: http://localhost:8000
# Docs: http://localhost:8000/docs
# Admin Panel: http://localhost:8000/admin (requires superuser account)

# 7. Generate embeddings
python scripts/seed_chromadb.py

# 8. Verify health
curl http://localhost:8000/health

Development Approach

Built in focused, mergeable increments, with each PR self-contained and deployable:

| PR | Title | What was built |
|---|---|---|
| #1 | MinIO storage | S3-compatible blob storage with FastAPI scaffold |
| #2 | Vector embeddings | ChromaDB integration with semantic search endpoint |
| #3 | PostgreSQL + Alembic | Relational models and async migrations |
| #4 | RAG pipeline | LangGraph pipeline with Cohere reranking |
| #5 | Chat persistence | Sessions and messages connected to the agent pipeline |
| #6 | ReAct agent | Autonomous agent replacing the fixed 4-stage pipeline |
| #7-8 | Interrupt/resume | ask_user_clarification tool with LangGraph interrupt flow |
| #9 | Image uploads | Multimodal support with base64 content blocks |
| #10 | Production hardening | SSE streaming, per-user rate limiting, secure image proxy |
| #18 | Session management | Pinned sessions, session update/delete, title search |
| #19 | Authentication | JWT auth with fastapi-users, email verification, code-based flows, AWS SES |
| -- | Admin panel | SQLAdmin web UI, REST user management API, superuser session auth |

The codebase uses async Python throughout with type hints on all functions. Request/response validation uses Pydantic v2, modules follow a domain-driven structure (app/{domain}/router.py, service.py, schemas.py), and services use the singleton pattern. All external dependencies have health checks, AI prompts are externalized for maintainability, and configuration follows the 12-factor app methodology via environment variables.


Configuration

Key environment variables (see .env.example for the full list):

| Variable | Description |
|---|---|
| OPENAI_API_KEY | Required: embeddings and answer generation |
| OPENAI_CHAT_MODEL | LLM model (default: gpt-4o) |
| COHERE_API_KEY | Required: relevance reranking |
| POSTGRES_HOST / POSTGRES_DB | PostgreSQL connection |
| MINIO_ENDPOINT | MinIO server address (default: localhost:9000) |
| CHROMA_PERSIST_DIRECTORY | ChromaDB storage path (default: ./chroma_data) |
| RATE_LIMIT_CHAT | Chat rate limit (default: 10/minute) |
| RATE_LIMIT_UPLOAD | Upload rate limit (default: 20/minute) |
| IMAGE_PRESIGN_EXPIRES_MINUTES | Image URL expiration (default: 5) |
| JWT_SECRET | Required: JWT signing key (change in production) |
| JWT_LIFETIME_SECONDS | Token expiry (default: 3600) |
| AWS_ACCESS_KEY_ID | Optional: AWS credentials for SES email |
| AWS_SECRET_ACCESS_KEY | Optional: AWS credentials for SES email |
| AWS_REGION | AWS region for SES (default: eu-central-1) |
| SES_FROM_EMAIL | Sender email for verification emails |
| VERIFICATION_CODE_LIFETIME_SECONDS | Code expiry (default: 3600) |
| SQLADMIN_SECRET | Session cookie key for admin panel |
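
To show how a few of these variables and defaults fit together, here is a standard-library sketch of loading and validating them. The project itself uses pydantic-settings; this stand-in, including the `Settings` dataclass, `load_settings`, and the `parse_rate` helper for values such as `10/minute`, is a hypothetical illustration and covers only a subset of the variables.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Tuple

_PERIODS = {"second": 1, "minute": 60, "hour": 3600}
_REQUIRED = ("OPENAI_API_KEY", "COHERE_API_KEY", "JWT_SECRET")


def parse_rate(spec: str) -> Tuple[int, int]:
    """Turn '10/minute' into (10, 60): max requests and window in seconds."""
    count, _, period = spec.partition("/")
    return int(count), _PERIODS[period.strip().lower()]


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    cohere_api_key: str
    jwt_secret: str
    openai_chat_model: str = "gpt-4o"
    rate_limit_chat: Tuple[int, int] = (10, 60)
    jwt_lifetime_seconds: int = 3600


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping, failing fast on missing required keys."""
    missing = [key for key in _REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required variables: " + ", ".join(missing))
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        cohere_api_key=env["COHERE_API_KEY"],
        jwt_secret=env["JWT_SECRET"],
        openai_chat_model=env.get("OPENAI_CHAT_MODEL", "gpt-4o"),
        rate_limit_chat=parse_rate(env.get("RATE_LIMIT_CHAT", "10/minute")),
        jwt_lifetime_seconds=int(env.get("JWT_LIFETIME_SECONDS", "3600")),
    )
```

Passing the environment as a mapping keeps the loader testable without touching the real process environment.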

Released under the MIT License.
