Read the full Case Study & Architecture Breakdown here: edycu.dev/work/aegis
A multi-agent AI system that acts as a Tier-2 Support Engineer. Investigates complex issues via SQL + documentation, proposes financial/technical actions, and waits for human approval before executing.
| Feature | Description |
|---|---|
| Human-in-the-Loop (HITL) | Agent pauses execution and waits for human approval before taking destructive actions (refunds, suspensions). Non-destructive actions are auto-approved. |
| Dynamic Model Routing | Routes simple intents to Groq Llama-3 (fast, free), complex intents to Gemini 2.5 Flash, and SQL generation + action proposals to GPT-4.1 / Claude |
| Smart Customer Validation | Handles 8 edge cases: ID+name match, fuzzy name matching, typo correction, name-only search, disambiguation, suspended/cancelled accounts, not-found, and ID mismatch |
| Self-Healing SQL | Generates SQL from natural language, executes against Supabase, and auto-retries up to 3× by feeding errors back to the LLM |
| Semantic Caching | Identical queries served from Redis cache in <50ms at $0.00 cost — failures are never cached |
| Real-time Streaming | Watch the agent's thought process step-by-step via Server-Sent Events (SSE) |
| Dual-Mode ThoughtStream | Toggle between clean User mode and detailed Dev mode with color-coded agent badges |
| Observability Dashboard | Track token usage, cost per request, cache hit ratio, model distribution, and database status |
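The self-healing SQL loop from the table above can be sketched as follows. This is a minimal illustration, not the app's actual code; `generate_sql` and `run_query` are hypothetical stand-ins for the real LLM and Supabase calls:

```python
# Sketch of the self-healing SQL retry loop (illustrative names;
# generate_sql / run_query stand in for the real LLM and Supabase calls).
MAX_RETRIES = 3

def self_healing_sql(question, generate_sql, run_query):
    error = None
    for attempt in range(MAX_RETRIES):
        # On a retry, feed the previous error back to the LLM
        sql = generate_sql(question, previous_error=error)
        try:
            return run_query(sql)
        except Exception as exc:
            error = str(exc)  # captured and fed back on the next attempt
    raise RuntimeError(f"SQL failed after {MAX_RETRIES} attempts: {error}")
```

The key design point is that the error string itself becomes part of the next prompt, so the LLM can correct its own syntax or schema mistakes.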
```mermaid
flowchart TD
    A["Next.js Frontend"] -- REST + SSE --> B["FastAPI Backend"]
    B -- cache check --> D["Redis Cache"]
    B -- cache miss --> F["LangGraph Agent"]
    F --> G["Classify → Validate → Write SQL → Execute"]
    G --> G2["Search Docs → Propose → ⏸ HITL Approval"]
    G2 --> G3["Execute Action → Respond → Frontend"]
    F --> C["Model Router → LLM APIs"]
    G --> I["Supabase PostgreSQL"]
    F -. traces .-> J["LangSmith"]
```
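The "REST + SSE" edge in the diagram above carries the real-time thought stream. A hedged sketch of the event framing (the endpoint path and payload shape are illustrative, not the app's exact API):

```python
# Sketch of Server-Sent Events framing for the thought stream.
# In the real app a FastAPI StreamingResponse wraps a generator like this:
#   StreamingResponse(thought_stream(steps), media_type="text/event-stream")
import json

def format_sse(event: dict) -> str:
    # SSE frame: a "data:" line followed by a blank line
    return f"data: {json.dumps(event)}\n\n"

def thought_stream(steps):
    for step in steps:
        yield format_sse({"agent": step["agent"], "message": step["message"]})
```

Each agent node pushes one frame per step, which is what the frontend renders as the live ThoughtStream.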
🧠 Agent ThoughtStream (Real-time Processing)
Watch the agent think step-by-step: intent classification → customer validation → SQL generation → policy search → action proposal
⚡ Full Resolution Workflow
Agent resolves a ticket end-to-end: intent classification → customer validation → SQL execution → policy search → human approval → resolution
🔒 Human-in-the-Loop Approval Modal
The agent pauses and waits for human authorization before executing any action requiring strict oversight
🔧 Multi-Ticket Type Support (Technical, Billing, Upgrade, Reactivate, Suspend)
Technical ticket: Investigates API rate limiting errors with SQL queries and resolves automatically (no HITL needed)
Account reactivation: HITL approval required before restoring suspended enterprise accounts
Account suspension: HITL approval required before suspending accounts for ToS violations
✍️ Smart Customer Validation (Edge Cases)
Typo correction: "Davd Martines" fuzzy-matched to "David Martinez" (≥80% similarity)
Customer #999 not found — the agent stops gracefully with a clear error message
Name/ID mismatch: Customer #8 is David Martinez, not Sarah Chen — agent flags the security mismatch
📊 Observability Metrics
Real-time observability: total tokens, cost per request, cache hit ratio, model distribution, HITL wait times
🔭 LangSmith Traces
Built-in LangSmith traces panel with run details, latency, token counts, and status per trace
Expanded trace: node-level spans for every agent step — classify → validate → SQL → search → propose → approve → execute
LLM call detail: inputs, outputs, token counts, model name, and latency per invocation
🗄️ Database Explorer & Ticket History
Database explorer: browse Supabase tables (customers, billing, tickets, docs) directly from the dashboard
Ticket history: all processed tickets persisted in localStorage with status and response preview
Aegis organizes its workflow as 4 specialized agents collaborating in sequence. Each agent has a clear responsibility and reports its progress via the real-time thought stream:
| Agent | Role | Nodes |
|---|---|---|
| 🏷 Triage Agent | Classifies incoming tickets into billing, technical, account, or general | classify_intent |
| 🔍 Investigator Agent | Validates customer identity (8 edge cases), generates & executes SQL with self-healing retry | validate_customer, write_sql, execute_sql |
| 📚 Knowledge Agent | Searches internal docs for relevant policies, procedures, and guidelines | search_docs |
| ⚡ Resolution Agent | Proposes actions, manages HITL approval, executes approved actions, generates summary | propose_action, await_approval, execute_action, generate_response |
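The node ordering in the table above can be sketched as a plain sequential pipeline. This is an illustration only; the real app wires these nodes as a LangGraph `StateGraph` in `graph.py`, and the handler mechanism here is a stand-in:

```python
# Illustrative sketch of the agent pipeline order (the real app uses
# LangGraph; a plain list and dict of handlers stand in here).
NODES = [
    ("Triage", "classify_intent"),
    ("Investigator", "validate_customer"),
    ("Investigator", "write_sql"),
    ("Investigator", "execute_sql"),
    ("Knowledge", "search_docs"),
    ("Resolution", "propose_action"),
    ("Resolution", "await_approval"),
    ("Resolution", "execute_action"),
    ("Resolution", "generate_response"),
]

def run_pipeline(state, handlers):
    # Each handler takes and returns the shared state dict,
    # mirroring how LangGraph threads AgentState through its nodes.
    for _, node in NODES:
        state = handlers[node](state)
    return state
```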
```
[Triage] Classified intent: billing (95%)
→ [Investigator] Customer validated: #8 David Martinez (pro, active)
→ [Investigator] SQL executed successfully — found 3 records
→ [Knowledge] Found 2 relevant internal documents
→ [Resolution] Proposed action: refund — Refund $29.99 duplicate charge
→ [Resolution] ⏸ Awaiting human approval...
→ [Resolution] Action executed: Refund processed (TXN-04821)
→ [Resolution] Generated resolution summary
```
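The approval pause shown in the trace above can be sketched as a simple gate. The action names come from this README; the `ask_human` callback is a hypothetical stand-in for the frontend's approval modal:

```python
# Sketch of the HITL gate: destructive actions pause for a human decision,
# non-destructive ones are auto-approved. ask_human is a stand-in for the
# approval modal and blocks until the human responds.
DESTRUCTIVE_ACTIONS = {"refund", "suspend", "reactivate"}

def await_approval(action: str, ask_human) -> bool:
    if action not in DESTRUCTIVE_ACTIONS:
        return True  # auto-approved, no pause
    return ask_human(action)  # True = approved, False = denied
```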
The Investigator Agent handles these scenarios robustly:
| Scenario | Behavior |
|---|---|
| Customer #8 David Martinez | ✅ Direct ID+name match |
| Customer #8 Davd Martines | ✅ Fuzzy match (typo auto-corrected, ≥80% similarity) |
| Emily Davis (no ID) | ✅ Name search → exact match found |
| Customer #8 Sarah Chen (wrong name) | 🚨 ID/name mismatch → flagged as a security concern |
| Customer #999 | 🛑 Not found → agent stops gracefully with a clear error |
| Customer #5 (suspended) | ⚠️ Found, but suspended status flagged |
| Customer #20 (cancelled) | ⚠️ Found, but cancelled status flagged |
| Smith (ambiguous name) | 🔀 Multiple matches → returns candidates for disambiguation |
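The fuzzy-match behavior can be sketched with the standard library's `difflib`. The real implementation may use a different similarity measure; only the ≥80% threshold is taken from this README:

```python
# Sketch of fuzzy name matching using stdlib difflib.
# The ≥80% similarity threshold matches the README; the exact
# similarity function the app uses is an assumption.
from difflib import SequenceMatcher

def fuzzy_match(provided: str, actual: str, threshold: float = 0.8) -> bool:
    ratio = SequenceMatcher(None, provided.lower(), actual.lower()).ratio()
    return ratio >= threshold
```

For example, "Davd Martines" vs "David Martinez" scores well above 0.8 and is auto-corrected, while an unrelated name falls far below the threshold.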
| Model | Used For | Cost per Request |
|---|---|---|
| Llama-3.1-8B (Groq) | Intent classification, search, response | ~$0.00003 |
| Gemini 2.5 Flash | Fallback fast tasks | ~$0.0001 |
| GPT-4.1 / Claude | SQL generation + reasoning | ~$0.008 |
| Total avg per ticket | | ~$0.009 |
| With semantic cache hit | | $0.00 |
- Simple intents (billing_inquiry, general) → Groq Llama-3.3-70B (fast, free)
- Complex intents (refund, account, technical) → Gemini 2.5 Flash (accurate)
- SQL generation + action proposal → GPT-4.1 / Claude (smart)
- Groq unavailable? → Automatic fallback to Gemini
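The routing rules above can be condensed into a small sketch. The model identifiers mirror this README; the function shape and fallback logic are illustrative, not the app's exact `model_router.py`:

```python
# Hedged sketch of the intent/task → model routing rules.
# Model names come from the README; the function itself is illustrative.
FAST_INTENTS = {"billing_inquiry", "general"}

def route_model(intent: str, task: str = "chat", groq_available: bool = True) -> str:
    if task in ("sql", "action_proposal"):
        return "gpt-4.1"             # smart model (GPT-4.1 / Claude)
    if intent in FAST_INTENTS and groq_available:
        return "groq/llama-3.3-70b"  # fast, free
    return "gemini-2.5-flash"        # complex intents, or Groq fallback
```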
| Layer | Technology |
|---|---|
| Backend | Python 3.12+, FastAPI, LangGraph, LangChain |
| Frontend | Next.js 16, React 19, TypeScript, Tailwind CSS 4 |
| Database | Supabase (PostgreSQL) |
| Cache | Redis (semantic deduplication) |
| LLMs | Groq/Llama-3 (fast), GPT-4.1/Claude (complex), Gemini (fallback) |
| Observability | LangSmith tracing + built-in token/cost tracking |
| Testing | pytest + pytest-cov (backend), Vitest + React Testing Library (frontend) |
| CI/CD | GitHub Actions — lint, test, coverage, Docker build |
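The Redis semantic-deduplication layer listed above can be sketched as follows. A dict stands in for Redis here, and the normalization/keying scheme is an assumption, not the app's actual `semantic.py`:

```python
# Sketch of the semantic cache lookup. A dict stands in for Redis
# (the real app uses Redis with a TTL); the key scheme is illustrative.
import hashlib

_cache = {}  # stand-in for Redis

def cache_key(query: str) -> str:
    normalized = " ".join(query.lower().split())
    return "aegis:" + hashlib.sha256(normalized.encode()).hexdigest()

def get_or_compute(query: str, compute):
    key = cache_key(query)
    if key in _cache:
        return _cache[key]   # cache hit: <50ms, $0.00
    result = compute(query)
    if result is not None:   # failures are never cached
        _cache[key] = result
    return result
```

Because the key is derived from the normalized query, whitespace and case variants of the same question hit the same cache entry.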
100% coverage across both backend and frontend — fully offline, no API keys or network needed.
```sh
make test   # run backend + frontend tests
make ci     # full pipeline: lint → test → build
```

```sh
make test-backend
# or directly:
cd backend && python -m pytest tests/ --cov=app --cov-fail-under=100 -v
```

| Module | Stmts | Cover |
|---|---|---|
| `classifier.py` (Triage Agent) | 29 | 100% |
| `investigator.py` (Investigator Agent) | 138 | 100% |
| `researcher.py` (Knowledge Agent) | 14 | 100% |
| `resolver.py` (Resolution Agent) | 150 | 100% |
| `main.py` (API + SSE + HITL) | 270 | 100% |
| `model_router.py` | 43 | 100% |
| `semantic.py` (cache) | 73 | 100% |
| `tracker.py` (observability) | 71 | 100% |
| `supabase.py` | 45 | 100% |
| All other modules | 123 | 100% |
| Total | 956 | 100% |
```sh
make test-frontend
# or directly:
cd frontend && npm test -- --coverage
```

| Test Suite | Tests |
|---|---|
| `page.test.tsx` | Dashboard rendering, submission, preset buttons |
| `ApprovalModal.test.tsx` | HITL approve/deny flow, animations |
| `AnimatedNumber.test.tsx` | Number formatting, animations, cleanup requests |
| `MetricsPanel.test.tsx` | Metrics display, cache clear, DB explorer |
| `ThoughtStream.test.tsx` | Dev/User mode toggle, message simplification, idle empty states |
| `TicketHistory.test.tsx` | History persistence, clear, selection |
| `useTicketHistory.test.ts` | Hook behavior, localStorage |
| `api.test.ts` | API client, SSE connection, error handling |
- Python 3.12+
- Node.js 22+
- Docker & Docker Compose (for Redis)
- API keys (minimum 2):
- Groq — free tier, handles fast tasks (classification, docs, response)
- OpenAI or Anthropic — one is enough for complex tasks (SQL, action proposal)
- Google AI / Gemini — optional fallback
```sh
git clone https://github.com/edycutjong/aegis.git
cd aegis
```

```sh
cp backend/.env.example backend/.env
# Fill in your API keys (SUPABASE_URL, SUPABASE_KEY, GROQ_API_KEY, etc.)
```

```sh
make up
```

Starts backend (port 8000), frontend (port 3000), and Redis. Rebuilds Docker images automatically.

Run `seed.sql` in the Supabase SQL Editor to populate sample data. To reset and reseed at any time:

```sh
make db-reset   # requires SUPABASE_MANAGEMENT_KEY in backend/.env
```

Visit http://localhost:3000 and submit a support ticket.
```
aegis/
├── backend/
│   ├── app/
│   │   ├── agent/
│   │   │   ├── agents/                 # 4 specialized agents
│   │   │   │   ├── classifier.py       # Triage Agent — intent classification
│   │   │   │   ├── investigator.py     # Investigator — customer validation + SQL
│   │   │   │   ├── researcher.py       # Knowledge Agent — doc search
│   │   │   │   └── resolver.py         # Resolution Agent — actions + HITL
│   │   │   ├── graph.py                # LangGraph workflow definition
│   │   │   ├── state.py                # AgentState TypedDict
│   │   │   └── nodes.py                # Re-export shim for backward compat
│   │   ├── cache/semantic.py           # Redis semantic caching
│   │   ├── db/supabase.py              # Async Supabase client
│   │   ├── routing/model_router.py     # Dynamic LLM routing + pricing
│   │   ├── observability/tracker.py    # Token/cost tracking
│   │   ├── config.py                   # Pydantic Settings
│   │   └── main.py                     # FastAPI app + SSE endpoints
│   ├── tests/                          # 8 test files, 100% coverage
│   ├── Dockerfile
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── app/page.tsx                # Main dashboard
│   │   ├── components/                 # 6 React components
│   │   │   ├── AnimatedNumber.tsx      # Smooth animated value counter
│   │   │   ├── ApprovalModal.tsx       # HITL approval UI
│   │   │   ├── DatabaseStatus.tsx      # DB table explorer
│   │   │   ├── MetricsPanel.tsx        # Observability dashboard
│   │   │   ├── ThoughtStream.tsx       # Agent progress + Dev/User toggle
│   │   │   └── TicketHistory.tsx       # Recent tickets (localStorage)
│   │   ├── hooks/useTicketHistory.ts
│   │   └── lib/api.ts                  # API client + SSE
│   ├── src/components/__tests__/       # 8 test files (Vitest + RTL)
│   ├── src/app/__tests__/              # 1 test file (Vitest + RTL)
│   ├── Dockerfile                      # Multi-stage standalone build
│   └── package.json
├── docker-compose.yml                  # Backend + Frontend + Redis
├── seed.sql                            # Sample data for Supabase
└── .github/workflows/ci.yml            # Ruff + pytest + ESLint + Docker build
```
All day-to-day workflows are managed via `make`. Run `make help` to see the full list.
| Command | Description |
|---|---|
| `make up` | 🚀 Start full stack (backend + frontend + Redis) |
| `make down` | 🛑 Stop stack and remove images + dangling layers |
| `make restart` | 🔄 Restart stack (preserves DB state) |
| `make logs` | 📋 Tail backend logs (`make logs s=frontend` for frontend) |
| `make clean` | 🧹 Nuclear clean — remove everything including base images |
| `make db-reset` | 🗄️ Reset & reseed Supabase database |
| Command | Description |
|---|---|
| `make ci` | 🔁 Full CI pipeline: lint → test → build |
| `make test` | ✅ Run all tests (backend + frontend) |
| `make test-backend` | 🐍 Backend pytest with 100% coverage enforcement |
| `make test-frontend` | ⚛️ Frontend Vitest with coverage |
| `make lint` | 🔍 Lint backend (ruff) + frontend (eslint) |
| `make build` | 🏗️ Build Docker images (no cache) |
| Command | Description |
|---|---|
| `make screenshots` | 📸 Capture all UI screenshots (requires stack running) |
| `make ss-dashboard` | Shot 01: Dashboard overview |
| `make ss-refund` | Shots 02: Refund HITL suite |
| `make ss-technical` | Shots 03: Technical HITL suite |
| `make ss-billing` | Shot 04: Billing resolution |
| `make ss-upgrade` | Shots 05: Upgrade HITL suite |
| `make ss-reactivate` | Shot 06: Reactivate resolution |
| `make ss-suspend` | Shot 07: Suspend HITL suite |
| `make ss-edge` | Shots 09–13: All edge cases |
| `make ss-cache` | Shot 15: Semantic cache hit |
| `make ss-metrics` | Shot 18: Observability metrics |
| `make ss-traces` | Shot 19: LangSmith traces |
| `make ss-database` | Shot 21: Database explorer |
| `make ss-tickets` | Shot 23: Recent tickets |
Every LangGraph run produces a full trace in LangSmith showing the complete pipeline with token counts and latency per step:
```
classify_intent → validate_customer → write_sql → execute_sql
→ search_docs → propose_action → await_approval → execute_action → generate_response
```
- Create a free account at smith.langchain.com
- Get your API key from Settings → API Keys
- Add to your `backend/.env`:

```sh
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=lsv2_pt_...
LANGCHAIN_PROJECT=aegis
```

- Verify connectivity:

```sh
curl http://localhost:8000/api/tracing-status
# → {"enabled": true, "project": "aegis", "connected": true}
```

What gets traced:

- Node-level spans via `@traceable` decorators on all agent nodes
- LLM calls auto-traced by LangChain (input/output, token counts, model name)
- Graph execution with `run_name="aegis-support-workflow"` for easy filtering
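The node-level span idea can be illustrated with a small decorator. The real app uses LangSmith's `@traceable`; this stand-in only shows the pattern of wrapping each agent node to record per-step latency:

```python
# Illustrative span decorator (NOT LangSmith's @traceable — just a
# stand-in showing the wrap-each-node pattern it is used for).
import functools
import time

SPANS = []  # collected span records, one per node invocation

def traced(node_name):
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                SPANS.append({"node": node_name,
                              "latency_ms": (time.perf_counter() - start) * 1000})
        return inner
    return wrap

@traced("classify_intent")
def classify(ticket):
    # placeholder body; the real node calls the fast LLM
    return "billing"
```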
Copy `backend/.env.example` to `backend/.env` and configure:

| Variable | Required | Description |
|---|---|---|
| `SUPABASE_URL` | ✅ | Supabase project URL |
| `SUPABASE_KEY` | ✅ | Supabase anon/public key |
| `SUPABASE_MANAGEMENT_KEY` | ➖ | Management API key — needed for `make db-reset` |
| `GOOGLE_API_KEY` | ⚡ | Google Gemini key (recommended — free tier available) |
| `OPENAI_API_KEY` | ⚡ | OpenAI key (GPT-4.1 as smart model) |
| `ANTHROPIC_API_KEY` | ⚡ | Anthropic key (Claude as alternative smart model) |
| `GROQ_API_KEY` | ⚡ | Groq key (free, fast inference — good for classification) |
| `FAST_MODEL` | ➖ | Fast model name (default: `llama-3.1-8b-instant`) |
| `SMART_MODEL` | ➖ | Smart model name (default: `gpt-4.1`) |
| `REDIS_URL` | ➖ | Redis connection URL (default: `redis://localhost:6379`) |
| `CACHE_TTL_SECONDS` | ➖ | Cache TTL in seconds (default: 3600) |
| `FRONTEND_URL` | ➖ | CORS origin (default: `http://localhost:3000`) |
| `LANGCHAIN_TRACING_V2` | ➖ | Enable LangSmith tracing (default: true) |
| `LANGCHAIN_API_KEY` | ➖ | LangSmith API key for tracing |
| `LANGCHAIN_PROJECT` | ➖ | LangSmith project name (default: aegis) |
| `DEBUG` | ➖ | Enable debug logging (default: false) |
✅ = required, ⚡ = need at least one LLM key, ➖ = optional
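The table above maps onto a settings object. A stdlib-only sketch is shown here; the real app uses Pydantic Settings in `app/config.py`, and the defaults below mirror the table:

```python
# Stdlib-only sketch of env-driven settings (the real app uses Pydantic
# Settings; defaults here mirror the configuration table).
import os
from dataclasses import dataclass, field

@dataclass
class Settings:
    supabase_url: str = field(default_factory=lambda: os.environ["SUPABASE_URL"])
    supabase_key: str = field(default_factory=lambda: os.environ["SUPABASE_KEY"])
    fast_model: str = field(default_factory=lambda: os.getenv("FAST_MODEL", "llama-3.1-8b-instant"))
    smart_model: str = field(default_factory=lambda: os.getenv("SMART_MODEL", "gpt-4.1"))
    redis_url: str = field(default_factory=lambda: os.getenv("REDIS_URL", "redis://localhost:6379"))
    cache_ttl_seconds: int = field(default_factory=lambda: int(os.getenv("CACHE_TTL_SECONDS", "3600")))
```

Required keys raise a `KeyError` at startup if missing, while optional ones fall back to their documented defaults.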
MIT