A FastAPI-based backend for the Evolving Personal AI Assistant, built on the Letta Learning SDK and the Claude Agent SDK to provide persistent memory, continual learning, and agentic tool use.
Built on the Letta Learning SDK pattern (inspired by claude_research_agent):
- Claude Agent SDK: Provides native tool execution (Bash, Read, Write, Edit, Search, Glob)
- Letta Learning SDK: Wraps agent calls to provide automatic memory persistence and retrieval
- FastAPI: Exposes streaming SSE and WebSocket endpoints for real-time interaction
- Session Management: Native Claude SDK session continuity with automatic resumption
- Continual Learning: Automatic memory persistence across sessions via Letta
- Two-Layer Memory: Short-term (session history) + Long-term (Letta memory blocks)
- Streaming Chat: Real-time streaming responses via SSE and WebSocket
- Claude Agent SDK: Native tool execution (Bash, Read, Write, Edit, Search, Glob)
- Claude Code Router: Intelligent model routing (DeepSeek, Claude, Gemini, local models)
- Persistent Memory: Letta-managed memory blocks (human, persona, preferences, knowledge)
- Workspace Management: File tree browsing and operations within a sandboxed workspace
- Conversation History: SQLite-backed conversation storage
- Python 3.11+
- Anthropic API key
- Docker & Docker Compose (for containerized deployment)
- Node.js 20+ (for Claude Code Router CLI)
1. Clone and setup:

   ```bash
   cd ai-companion-server
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   pip install -r requirements.txt
   ```

2. Configure environment:

   ```bash
   cp .env.example .env
   # Edit .env and add your OPENROUTER_API_KEY and LETTA_API_KEY
   # Get an OpenRouter API key at: https://openrouter.ai/keys
   # Get a Letta API key at: https://app.letta.com
   # Note: model selection is handled by Claude Code Router (see router/config.json)
   ```

3. Run the server:

   ```bash
   python -m app.main
   # Or with uvicorn directly:
   uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
   ```

4. Access the API:

   - API docs: http://localhost:8000/docs
   - Health check: http://localhost:8000/health
Standard deployment (AI Companion Server only):

```bash
# Set your API keys in .env
cp .env.example .env
# Edit .env and add your keys

# Build and run
docker-compose up -d ai-companion
```

Full deployment with Claude Code Router:

```bash
# Run the setup script
./scripts/setup-router.sh

# Or manually:
docker-compose up -d
```

This starts:
- AI Companion Container - a single container running:
  - FastAPI backend (port 8000) - Letta + Claude Agent SDK
  - Claude Code Router (port 3000) - Model routing proxy
  - Ollama (port 11434) - Local model inference (optional)
See router/README.md for Claude Code Router setup and usage.
Chat endpoints:

- `POST /chat/stream` - Stream a chat response (SSE)
- `GET /chat/conversations` - List conversations
- `GET /chat/conversations/{id}` - Get conversation details
- `DELETE /chat/conversations/{id}` - Delete a conversation
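For example, listing stored conversations with httpx (a sketch; the response is assumed to be a JSON array of conversation objects, see /docs for the actual schema):

```python
import httpx

# List stored conversations (response shape is an assumption; see /docs)
for conv in httpx.get("http://localhost:8000/chat/conversations").json():
    print(conv)
```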
Memory endpoints:

- `GET /memory` - List memory blocks
- `POST /memory` - Create a memory block
- `GET /memory/context` - Get formatted memory context
- `GET /memory/search?q=query` - Search memories
- `POST /memory/upsert` - Create or update a memory
- `PUT /memory/{id}` - Update a memory block
- `DELETE /memory/{id}` - Delete a memory block
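A minimal sketch of upserting and then searching memory (the `label` and `content` field names are assumptions; the Pydantic schemas in app/models/schemas.py are authoritative):

```python
import httpx

BASE = "http://localhost:8000"

# Upsert a preference memory (field names are assumed, not verified)
httpx.post(f"{BASE}/memory/upsert", json={
    "label": "preferences",
    "content": "User prefers concise answers",
})

# Search memories with a free-text query
print(httpx.get(f"{BASE}/memory/search", params={"q": "concise"}).json())
```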
Workspace endpoints:

- `GET /workspace/tree` - Get the file tree
- `GET /workspace/files` - List files
- `GET /workspace/stats` - Get workspace statistics
- `GET /workspace/file?path=...` - Read a file
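For instance (a sketch; `notes.md` is a hypothetical file and the response shapes are assumptions):

```python
import httpx

BASE = "http://localhost:8000"

# Fetch the workspace file tree (response shape is an assumption)
print(httpx.get(f"{BASE}/workspace/tree").json())

# Read a single file by its path relative to the workspace root
# ("notes.md" is a hypothetical example file)
print(httpx.get(f"{BASE}/workspace/file", params={"path": "notes.md"}).text)
```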
Tool endpoints:

- `GET /tools` - List available tools
- `POST /tools/execute` - Execute a tool
- `POST /tools/bash` - Execute a bash command
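For example, running a command through the bash endpoint (a sketch; the `command` field name is an assumption, check /docs for the real request schema):

```python
import httpx

# Execute a shell command inside the sandboxed workspace
# (the "command" field is assumed; see /docs for the actual schema)
resp = httpx.post("http://localhost:8000/tools/bash", json={"command": "ls -la"})
print(resp.json())
```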
Connect to `/ws` for real-time bidirectional communication.
Actions:
{"action": "chat", "message": "Hello"}- Send a chat message{"action": "get_memory"}- Get memory context{"action": "list_files", "path": "."}- List files{"action": "ping"}- Health check
The assistant has access to Claude Agent SDK native tools:
| Tool | Description |
|---|---|
| `Bash` | Execute shell commands in the workspace |
| `Read` | Read file contents |
| `Write` | Create or overwrite files |
| `Edit` | Find and replace text in files |
| `Glob` | List files matching patterns |
| `Search` | Search for patterns in files |
Two-layer conversation continuity:
- Short-term (Claude SDK Sessions): Native conversation history, tool context, file edits
- Long-term (Letta Memory): Cross-session facts, preferences, learned knowledge
Sessions are automatically managed by default:
```bash
# First message - creates a session
curl -X POST http://localhost:8000/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"message": "My favorite color is blue", "conversation_id": "conv-1"}'
# Returns: session_id event

# Second message - auto-resumes the session
curl -X POST http://localhost:8000/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"message": "What is my favorite color?", "conversation_id": "conv-1"}'
# Agent remembers: "You said your favorite color is blue"
```

Pass `session_id` explicitly to override the automatic behavior:
```bash
# Resume a specific session
curl -X POST http://localhost:8000/chat/stream \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Continue from where we left off",
    "conversation_id": "conv-1",
    "session_id": "4dc88e4a-26d5-42f9-b58b-88bb880bbad2"
  }'

# Start fresh (fork the conversation)
curl -X POST http://localhost:8000/chat/stream \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Try a different approach",
    "conversation_id": "conv-1",
    "session_id": null
  }'
```

See docs/SESSION_MANAGEMENT.md for complete documentation.
Memory is automatically managed by Letta Learning SDK:
- Automatic Capture: All conversations are automatically captured
- Semantic Retrieval: Relevant context is injected into prompts based on memory labels
- Persistent Across Sessions: Memory persists even after server restarts
- Memory Labels: `human`, `persona`, `preferences`, `knowledge`
- Per-Agent Isolation: Each conversation ID maintains separate memory
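A quick way to verify persistence (a sketch; the exact response shapes are assumptions, see the API docs):

```python
import httpx

BASE = "http://localhost:8000"

# Inspect the formatted memory context that gets injected into prompts
print(httpx.get(f"{BASE}/memory/context").text)

# List raw memory blocks; each should carry one of the labels above
for block in httpx.get(f"{BASE}/memory").json():
    print(block)
```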
Project structure:

```
ai-companion-server/
├── app/
│   ├── __init__.py
│   ├── main.py                  # FastAPI application + WebSocket handler
│   ├── config.py                # Configuration settings
│   ├── models/
│   │   ├── __init__.py
│   │   ├── schemas.py           # Pydantic models
│   │   └── database.py          # SQLAlchemy models
│   ├── routers/
│   │   ├── __init__.py
│   │   ├── chat.py              # SSE streaming chat endpoint
│   │   ├── memory.py            # Memory endpoints
│   │   ├── workspace.py         # Workspace endpoints
│   │   └── tools.py             # Tool endpoints
│   └── services/
│       ├── __init__.py
│       ├── agent_service.py     # Simplified Claude Agent SDK + Letta wrapper
│       ├── memory_service.py    # Local memory management (SQLite)
│       └── workspace_service.py # Workspace file operations
├── router/
│   ├── config.json              # Router configuration
│   └── README.md                # Router documentation
├── scripts/
│   ├── setup-router.sh          # Router setup script
│   └── start-services.sh        # Multi-service startup script
├── workspace/                   # Sandboxed workspace directory
├── data/                        # Database storage
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
├── .env.example
└── README.md
```
Example: streaming chat from Python:

```python
import asyncio

import httpx

async def main():
    # No read timeout: SSE responses stream until the server closes them
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST",
            "http://localhost:8000/chat/stream",
            json={"message": "Hello, what can you help me with?"},
        ) as response:
            async for line in response.aiter_lines():
                if line.startswith("data: "):
                    print(line[6:])  # strip the "data: " SSE prefix

asyncio.run(main())
```

Example: WebSocket chat from the browser:

```javascript
const ws = new WebSocket("ws://localhost:8000/ws");

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  console.log(data.event, data.data);
};

ws.onopen = () => {
  ws.send(JSON.stringify({
    action: "chat",
    message: "Hello!"
  }));
};
```

License: MIT