
⛊ Aegis — Autonomous Enterprise Action Engine

Read the full Case Study & Architecture Breakdown here: edycu.dev/work/aegis

A multi-agent AI system that acts as a Tier-2 Support Engineer. It investigates complex issues via SQL + documentation, proposes financial/technical actions, and waits for human approval before executing.

✨ Key Features

| Feature | Description |
| --- | --- |
| Human-in-the-Loop (HITL) | Agent pauses execution and waits for human approval before taking destructive actions (refunds, suspensions). Non-destructive actions are auto-approved. |
| Dynamic Model Routing | Routes simple intents to Groq Llama-3 ($0.00003), complex intents to GPT-4.1/Gemini ($0.008) — with automatic fallback |
| Smart Customer Validation | Handles 8 edge cases: ID+name match, fuzzy name matching, typo correction, name-only search, disambiguation, suspended/cancelled accounts, not-found, and ID mismatch |
| Self-Healing SQL | Generates SQL from natural language, executes against Supabase, and auto-retries up to 3× by feeding errors back to the LLM |
| Semantic Caching | Identical queries served from Redis cache in <50ms at $0.00 cost — failures are never cached |
| Real-time Streaming | Watch the agent's thought process step-by-step via Server-Sent Events (SSE) |
| Dual-Mode ThoughtStream | Toggle between clean User mode and detailed Dev mode with color-coded agent badges |
| Observability Dashboard | Track token usage, cost per request, cache hit ratio, model distribution, and database status |
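The self-healing SQL loop boils down to a retry with error feedback. A minimal sketch, where `generate_sql` and `run_sql` are hypothetical stand-ins for the real LLM call and Supabase execution:

```python
def execute_with_retry(request: str, generate_sql, run_sql, max_retries: int = 3):
    """Generate SQL, execute it, and on failure feed the error back for a retry."""
    error = None
    for _ in range(max_retries):
        sql = generate_sql(request, previous_error=error)  # LLM call in the real system
        try:
            return run_sql(sql)  # Supabase execution in the real system
        except Exception as exc:
            error = str(exc)  # becomes context for the next generation attempt
    raise RuntimeError(f"SQL failed after {max_retries} attempts: {error}")
```

The essential design choice is that the raw database error text is injected into the next generation prompt, so the model can correct its own mistake.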

🏗️ Architecture

```mermaid
flowchart TD
    A["Next.js Frontend"] -- REST + SSE --> B["FastAPI Backend"]
    B -- cache check --> D["Redis Cache"]
    B -- cache miss --> F["LangGraph Agent"]
    F --> G["Classify → Validate → Write SQL → Execute"]
    G --> G2["Search Docs → Propose → ⏸ HITL Approval"]
    G2 --> G3["Execute Action → Respond → Frontend"]
    F --> C["Model Router → LLM APIs"]
    G --> I["Supabase PostgreSQL"]
    F -. traces .-> J["LangSmith"]
```

📸 Demo

Aegis Dashboard — Autonomous Enterprise Action Engine

🧠 Agent ThoughtStream (Real-time Processing)

Agent ThoughtStream — real-time step-by-step processing with intent classification, customer validation, SQL execution

Watch the agent think step-by-step: intent classification → customer validation → SQL generation → policy search → action proposal
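Under the hood, streaming is just emitting a Server-Sent Events frame as each agent step finishes. A minimal sketch of the frame format (event names here are illustrative, not the repo's actual ones):

```python
import json

def sse_format(event: str, data: dict) -> str:
    """Render one Server-Sent Events frame; the trailing blank line ends the frame."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def thought_stream(steps):
    """Yield each agent step as an SSE frame, then a terminal 'done' frame."""
    for step in steps:
        yield sse_format("thought", step)
    yield sse_format("done", {"status": "complete"})
```

In FastAPI, a generator like this would be wrapped in `StreamingResponse(..., media_type="text/event-stream")` so the browser's `EventSource` can consume it.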

⚡ Full Resolution Workflow

Aegis Dashboard — Full agent workflow with ThoughtStream, observability metrics, and model distribution

Agent resolves a ticket end-to-end: intent classification → customer validation → SQL execution → policy search → human approval → resolution

🔒 Human-in-the-Loop Approval Modal

HITL Approval Modal — Agent pauses for human authorization before executing destructive actions

The agent pauses and waits for human authorization before executing any action requiring strict oversight

🔧 Multi-Ticket Type Support (Technical, Billing, Upgrade, Reactivate, Suspend)

Technical ticket resolution — API rate limiting investigation

Technical ticket: Investigates API rate limiting errors with SQL queries and resolves automatically (no HITL needed)


Reactivation HITL — Account reactivation requires human approval

Account reactivation: HITL approval required before restoring suspended enterprise accounts


Suspension HITL — Account suspension requires human approval

Account suspension: HITL approval required before suspending accounts for ToS violations

✍️ Smart Customer Validation (Edge Cases)

Typo correction — fuzzy name matching auto-corrects misspellings

Typo correction: "Davd Martines" fuzzy-matched to "David Martinez" (≥80% similarity)


Customer not found — graceful error handling for nonexistent customers

Customer #999 not found — the agent stops gracefully with a clear error message


Name/ID mismatch — security check catches wrong name for customer ID

Name/ID mismatch: Customer #8 is David Martinez, not Sarah Chen — agent flags the security mismatch

⚡ Semantic Cache

Semantic cache hit — identical query served instantly at $0.00 cost

Identical query served from Redis cache in <50ms at $0.00 cost
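Conceptually, the cache keys on a normalized form of the query and skips writes for failed responses. A minimal sketch with a dict standing in for Redis (the real implementation would also set a TTL):

```python
import hashlib

class SemanticCache:
    """Sketch of the query cache; a dict stands in for Redis here."""
    def __init__(self):
        self.store = {}

    def _key(self, query: str) -> str:
        # Normalize case and whitespace so trivially identical queries share a key.
        normalized = " ".join(query.lower().split())
        return "aegis:cache:" + hashlib.sha256(normalized.encode()).hexdigest()

    def get(self, query: str):
        return self.store.get(self._key(query))

    def set(self, query: str, response, success: bool) -> None:
        if success:  # failures are never cached
            self.store[self._key(query)] = response
```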

📊 Observability Metrics

Observability metrics — token usage, cost tracking, model distribution, cache hit ratio

Real-time observability: total tokens, cost per request, cache hit ratio, model distribution, HITL wait times

🔭 LangSmith Traces

LangSmith traces panel — full pipeline visibility with latency and token counts

Built-in LangSmith traces panel with run details, latency, token counts, and status per trace


LangSmith trace expanded — node-level spans showing each agent step with latency

Expanded trace: node-level spans for every agent step — classify → validate → SQL → search → propose → approve → execute


LangSmith trace detail — LLM call inputs, outputs, token counts and model name

LLM call detail: inputs, outputs, token counts, model name, and latency per invocation

🗄️ Database Explorer & Ticket History

Database explorer — browse Supabase tables directly from the dashboard

Database explorer: browse Supabase tables (customers, billing, tickets, docs) directly from the dashboard


Ticket history — localStorage-persisted recent ticket log

Ticket history: all processed tickets persisted in localStorage with status and response preview

🤖 Multi-Agent Architecture

Aegis organizes its workflow as 4 specialized agents collaborating in sequence. Each agent has a clear responsibility and reports its progress via the real-time thought stream:

| Agent | Role | Nodes |
| --- | --- | --- |
| 🏷 Triage Agent | Classifies incoming tickets into billing, technical, account, or general | `classify_intent` |
| 🔍 Investigator Agent | Validates customer identity (8 edge cases), generates & executes SQL with self-healing retry | `validate_customer`, `write_sql`, `execute_sql` |
| 📚 Knowledge Agent | Searches internal docs for relevant policies, procedures, and guidelines | `search_docs` |
| Resolution Agent | Proposes actions, manages HITL approval, executes approved actions, generates summary | `propose_action`, `await_approval`, `execute_action`, `generate_response` |

Agent Execution Trace

```text
[Triage] Classified intent: billing (95%)
  → [Investigator] Customer validated: #8 David Martinez (pro, active)
  → [Investigator] SQL executed successfully — found 3 records
  → [Knowledge] Found 2 relevant internal documents
  → [Resolution] Proposed action: refund — Refund $29.99 duplicate charge
  → [Resolution] ⏸ Awaiting human approval...
  → [Resolution] Action executed: Refund processed (TXN-04821)
  → [Resolution] Generated resolution summary
```
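A trace like the one above falls out of agents mutating a shared state and reporting progress. A stripped-down sketch of that pattern, without the LangGraph dependency and with toy agents standing in for the real nodes:

```python
def run_pipeline(ticket: str, agents):
    """Run (name, agent) pairs in sequence over a shared state, collecting a trace."""
    state = {"ticket": ticket, "trace": []}
    for name, agent in agents:
        message = agent(state)  # each agent mutates state and reports progress
        state["trace"].append(f"[{name}] {message}")
    return state

# Two toy agents standing in for the real Triage and Investigator nodes.
def triage(state):
    state["intent"] = "billing"
    return f"Classified intent: {state['intent']}"

def investigator(state):
    state["customer"] = "#8 David Martinez"
    return f"Customer validated: {state['customer']}"
```

In the real system LangGraph wires the nodes and an `AgentState` TypedDict plays the role of the `state` dict.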

Customer Validation Edge Cases

The Investigator Agent handles these scenarios robustly:

| Scenario | Behavior |
| --- | --- |
| Customer #8 David Martinez | ✅ Direct ID+name match |
| Customer #8 Davd Martines | ✅ Fuzzy match (typo auto-corrected, ≥80% similarity) |
| Emily Davis (no ID) | ✅ Name search → exact match found |
| Customer #8 Sarah Chen (wrong name) | ⚠️ Name mismatch → stops with error |
| Customer #999 | ⚠️ Not found → stops with error |
| Customer #5 (suspended) | ⚠️ Proceeds with suspension warning |
| Customer #20 (cancelled) | ⚠️ Proceeds with cancellation warning |
| Smith (ambiguous name) | 🔀 Multiple matches → returns candidates for disambiguation |
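The fuzzy-matching rows can be reproduced with the standard library alone. A sketch using `difflib.SequenceMatcher` and the ≥80% similarity threshold mentioned above (the repo's actual matcher may differ in detail):

```python
from difflib import SequenceMatcher

def fuzzy_match(name: str, candidates: list[str], threshold: float = 0.80):
    """Return the closest candidate at or above the similarity threshold, else None."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, name.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None
```

With this, "Davd Martines" scores about 0.89 against "David Martinez" and is auto-corrected, while an unrelated name falls below the threshold and returns `None`.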

📊 Cost Analysis

| Model | Used For | Cost per Request |
| --- | --- | --- |
| Llama-3.1-8B (Groq) | Intent classification, search, response | ~$0.00003 |
| Gemini 2.5 Flash | Fallback fast tasks | ~$0.0001 |
| GPT-4.1 / Claude | SQL generation + reasoning | ~$0.008 |
| **Total avg per ticket** | | **~$0.009** |
| **With semantic cache hit** | | **$0.00** |

Model Routing Strategy

```text
Simple intents (billing_inquiry, general)     →  Groq Llama-3.1-8B  (fast, free)
Complex intents (refund, account, technical)  →  Gemini 2.5 Flash   (accurate)
SQL generation + action proposal              →  GPT-4.1 / Claude   (smart)

Groq unavailable?  →  Automatic fallback to Gemini
```
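As a rough sketch, the routing strategy is a small decision function over intent and task type. Model names follow the tables above; the task labels are illustrative, not the repo's actual identifiers:

```python
SIMPLE_INTENTS = {"billing_inquiry", "general"}

def route_model(intent: str, task: str = "classification", groq_available: bool = True) -> str:
    """Pick a model tier for a request (simplified sketch of the router)."""
    if task in ("sql_generation", "action_proposal"):
        return "gpt-4.1"              # smart tier: reasoning-heavy work
    if intent in SIMPLE_INTENTS and groq_available:
        return "llama-3.1-8b-instant" # fast tier: cheap classification-style work
    return "gemini-2.5-flash"         # complex intents, or fallback when Groq is down
```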

🛠 Tech Stack

| Layer | Technology |
| --- | --- |
| Backend | Python 3.12+, FastAPI, LangGraph, LangChain |
| Frontend | Next.js 16, React 19, TypeScript, Tailwind CSS 4 |
| Database | Supabase (PostgreSQL) |
| Cache | Redis (semantic deduplication) |
| LLMs | Groq/Llama-3 (fast), GPT-4.1/Claude (complex), Gemini (fallback) |
| Observability | LangSmith tracing + built-in token/cost tracking |
| Testing | pytest + pytest-cov (backend), Vitest + React Testing Library (frontend) |
| CI/CD | GitHub Actions — lint, test, coverage, Docker build |

🧪 Testing

100% coverage across both backend and frontend — fully offline, no API keys or network needed.

```bash
make test        # run backend + frontend tests
make ci          # full pipeline: lint → test → build
```

Backend (pytest)

```bash
make test-backend
# or directly:
cd backend && python -m pytest tests/ --cov=app --cov-fail-under=100 -v
```

| Module | Stmts | Cover |
| --- | --- | --- |
| `classifier.py` (Triage Agent) | 29 | 100% |
| `investigator.py` (Investigator Agent) | 138 | 100% |
| `researcher.py` (Knowledge Agent) | 14 | 100% |
| `resolver.py` (Resolution Agent) | 150 | 100% |
| `main.py` (API + SSE + HITL) | 270 | 100% |
| `model_router.py` | 43 | 100% |
| `semantic.py` (cache) | 73 | 100% |
| `tracker.py` (observability) | 71 | 100% |
| `supabase.py` | 45 | 100% |
| All other modules | 123 | 100% |
| **Total** | **956** | **100%** |
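Running fully offline means no test ever touches a real LLM API. The usual trick is to inject the client and stub it; a minimal sketch with `unittest.mock` (the function shape here is hypothetical, not the repo's actual signature):

```python
from unittest.mock import MagicMock

def classify_intent(llm, ticket: str) -> str:
    """Toy classifier that delegates to an injected LLM client (shape is hypothetical)."""
    return llm.invoke(f"Classify this support ticket: {ticket}").strip().lower()

# Offline test: stub the LLM so no API key or network is ever needed.
fake_llm = MagicMock()
fake_llm.invoke.return_value = " Billing "
assert classify_intent(fake_llm, "I was charged twice") == "billing"
```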

Frontend (Vitest + React Testing Library)

```bash
make test-frontend
# or directly:
cd frontend && npm test -- --coverage
```

| Test Suite | Tests |
| --- | --- |
| `page.test.tsx` | Dashboard rendering, submission, preset buttons |
| `ApprovalModal.test.tsx` | HITL approve/deny flow, animations |
| `AnimatedNumber.test.tsx` | Number formatting, animations, cleanup requests |
| `MetricsPanel.test.tsx` | Metrics display, cache clear, DB explorer |
| `ThoughtStream.test.tsx` | Dev/User mode toggle, message simplification, idle empty states |
| `TicketHistory.test.tsx` | History persistence, clear, selection |
| `useTicketHistory.test.ts` | Hook behavior, localStorage |
| `api.test.ts` | API client, SSE connection, error handling |

🚀 Quick Start

Prerequisites

  • Python 3.12+
  • Node.js 22+
  • Docker & Docker Compose (for Redis)
  • API keys (minimum 2):
    • Groq — free tier, handles fast tasks (classification, docs, response)
    • OpenAI or Anthropic — one is enough for complex tasks (SQL, action proposal)
    • Google AI / Gemini — optional fallback

1. Clone & Setup

```bash
git clone https://github.com/edycutjong/aegis.git
cd aegis
```

2. Configure environment

```bash
cp backend/.env.example backend/.env
# Fill in your API keys (SUPABASE_URL, SUPABASE_KEY, GROQ_API_KEY, etc.)
```

3. Start the stack

```bash
make up
```

Starts backend (port 8000), frontend (port 3000), and Redis. Rebuilds Docker images automatically.

4. Seed the database

Run seed.sql in the Supabase SQL Editor to populate sample data. To reset and reseed at any time:

```bash
make db-reset   # requires SUPABASE_MANAGEMENT_KEY in backend/.env
```

5. Open the dashboard

Visit http://localhost:3000 and submit a support ticket.

📁 Project Structure

```text
aegis/
├── backend/
│   ├── app/
│   │   ├── agent/
│   │   │   ├── agents/          # 4 specialized agents
│   │   │   │   ├── classifier.py    # Triage Agent — intent classification
│   │   │   │   ├── investigator.py  # Investigator — customer validation + SQL
│   │   │   │   ├── researcher.py    # Knowledge Agent — doc search
│   │   │   │   └── resolver.py      # Resolution Agent — actions + HITL
│   │   │   ├── graph.py         # LangGraph workflow definition
│   │   │   ├── state.py         # AgentState TypedDict
│   │   │   └── nodes.py         # Re-export shim for backward compat
│   │   ├── cache/semantic.py    # Redis semantic caching
│   │   ├── db/supabase.py       # Async Supabase client
│   │   ├── routing/model_router.py  # Dynamic LLM routing + pricing
│   │   ├── observability/tracker.py # Token/cost tracking
│   │   ├── config.py            # Pydantic Settings
│   │   └── main.py              # FastAPI app + SSE endpoints
│   ├── tests/                   # 8 test files, 100% coverage
│   ├── Dockerfile
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── app/page.tsx         # Main dashboard
│   │   ├── components/          # 6 React components
│   │   │   ├── AnimatedNumber.tsx    # Smooth animated value counter
│   │   │   ├── ApprovalModal.tsx     # HITL approval UI
│   │   │   ├── DatabaseStatus.tsx    # DB table explorer
│   │   │   ├── MetricsPanel.tsx      # Observability dashboard
│   │   │   ├── ThoughtStream.tsx     # Agent progress + Dev/User toggle
│   │   │   └── TicketHistory.tsx     # Recent tickets (localStorage)
│   │   ├── hooks/useTicketHistory.ts
│   │   └── lib/api.ts           # API client + SSE
│   ├── src/components/__tests__/ # 8 test files (Vitest + RTL)
│   ├── src/app/__tests__/       # 1 test file (Vitest + RTL)
│   ├── Dockerfile               # Multi-stage standalone build
│   └── package.json
├── docker-compose.yml           # Backend + Frontend + Redis
├── seed.sql                     # Sample data for Supabase
└── .github/workflows/ci.yml     # Ruff + pytest + ESLint + Docker build
```

⚙️ Development Commands

All day-to-day workflows are managed via make. Run make help to see the full list.

Stack

| Command | Description |
| --- | --- |
| `make up` | 🚀 Start full stack (backend + frontend + Redis) |
| `make down` | 🛑 Stop stack and remove images + dangling layers |
| `make restart` | 🔄 Restart stack (preserves DB state) |
| `make logs` | 📋 Tail backend logs (`make logs s=frontend` for frontend) |
| `make clean` | 🧹 Nuclear clean — remove everything including base images |
| `make db-reset` | 🗄️ Reset & reseed Supabase database |

Testing & Lint

| Command | Description |
| --- | --- |
| `make ci` | 🔁 Full CI pipeline: lint → test → build |
| `make test` | ✅ Run all tests (backend + frontend) |
| `make test-backend` | 🐍 Backend pytest with 100% coverage enforcement |
| `make test-frontend` | ⚛️ Frontend Vitest with coverage |
| `make lint` | 🔍 Lint backend (ruff) + frontend (eslint) |
| `make build` | 🏗️ Build Docker images (no cache) |

Screenshots

| Command | Description |
| --- | --- |
| `make screenshots` | 📸 Capture all UI screenshots (requires stack running) |
| `make ss-dashboard` | Shot 01: Dashboard overview |
| `make ss-refund` | Shots 02: Refund HITL suite |
| `make ss-technical` | Shots 03: Technical HITL suite |
| `make ss-billing` | Shot 04: Billing resolution |
| `make ss-upgrade` | Shots 05: Upgrade HITL suite |
| `make ss-reactivate` | Shot 06: Reactivate resolution |
| `make ss-suspend` | Shot 07: Suspend HITL suite |
| `make ss-edge` | Shots 09–13: All edge cases |
| `make ss-cache` | Shot 15: Semantic cache hit |
| `make ss-metrics` | Shot 18: Observability metrics |
| `make ss-traces` | Shot 19: LangSmith traces |
| `make ss-database` | Shot 21: Database explorer |
| `make ss-tickets` | Shot 23: Recent tickets |

Observability

Every LangGraph run produces a full trace in LangSmith showing the complete pipeline with token counts and latency per step:

```text
classify_intent → validate_customer → write_sql → execute_sql
  → search_docs → propose_action → await_approval → execute_action → generate_response
```

Setup

  1. Create a free account at smith.langchain.com
  2. Get your API key from Settings → API Keys
  3. Add to your `backend/.env`:

```bash
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=lsv2_pt_...
LANGCHAIN_PROJECT=aegis
```

  4. Verify connectivity:

```bash
curl http://localhost:8000/api/tracing-status
# → {"enabled": true, "project": "aegis", "connected": true}
```

What's Traced

  • Node-level spans via @traceable decorators on all agent nodes
  • LLM calls auto-traced by LangChain (input/output, token counts, model name)
  • Graph execution with run_name="aegis-support-workflow" for easy filtering
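For intuition, a `@traceable`-style decorator just wraps a node, times it, and records a span. A self-contained stand-in (the real code uses LangSmith's `@traceable`, which ships spans to the hosted service instead of a local list):

```python
import functools
import time

TRACES = []  # collected spans; LangSmith would ship these to its backend

def traceable(name):
    """Minimal stand-in for LangSmith's @traceable: time the call, record a span."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "error"
            try:
                result = fn(*args, **kwargs)
                status = "success"
                return result
            finally:
                TRACES.append({
                    "name": name,
                    "latency_ms": (time.perf_counter() - start) * 1000,
                    "status": status,
                })
        return wrapper
    return decorator

@traceable("classify_intent")
def classify_intent(ticket: str) -> str:
    return "billing"  # toy node body
```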

⚙️ Environment Variables

Copy backend/.env.example to backend/.env and configure:

| Variable | Required | Description |
| --- | --- | --- |
| `SUPABASE_URL` | ✅ | Supabase project URL |
| `SUPABASE_KEY` | ✅ | Supabase anon/public key |
| `SUPABASE_MANAGEMENT_KEY` | ➖ | Management API key — needed for `make db-reset` |
| `GOOGLE_API_KEY` | ⚡ | Google Gemini key (recommended — free tier available) |
| `OPENAI_API_KEY` | ⚡ | OpenAI key (GPT-4.1 as smart model) |
| `ANTHROPIC_API_KEY` | ⚡ | Anthropic key (Claude as alternative smart model) |
| `GROQ_API_KEY` | ⚡ | Groq key (free, fast inference — good for classification) |
| `FAST_MODEL` | ➖ | Fast model name (default: `llama-3.1-8b-instant`) |
| `SMART_MODEL` | ➖ | Smart model name (default: `gpt-4.1`) |
| `REDIS_URL` | ➖ | Redis connection URL (default: `redis://localhost:6379`) |
| `CACHE_TTL_SECONDS` | ➖ | Cache TTL in seconds (default: 3600) |
| `FRONTEND_URL` | ➖ | CORS origin (default: `http://localhost:3000`) |
| `LANGCHAIN_TRACING_V2` | ➖ | Enable LangSmith tracing (default: true) |
| `LANGCHAIN_API_KEY` | ➖ | LangSmith API key for tracing |
| `LANGCHAIN_PROJECT` | ➖ | LangSmith project name (default: `aegis`) |
| `DEBUG` | ➖ | Enable debug logging (default: false) |

✅ = required, ⚡ = need at least one LLM key, ➖ = optional
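The defaults in the table map naturally onto a settings object. A dependency-free sketch of how `config.py` might read them (the repo uses Pydantic Settings; this uses a plain dataclass purely for illustration):

```python
import os
from dataclasses import dataclass, field

def _env(name: str, default: str) -> str:
    return os.getenv(name, default)

@dataclass
class Settings:
    """Reads the variables above with their documented defaults (plain-dataclass sketch)."""
    fast_model: str = field(default_factory=lambda: _env("FAST_MODEL", "llama-3.1-8b-instant"))
    smart_model: str = field(default_factory=lambda: _env("SMART_MODEL", "gpt-4.1"))
    redis_url: str = field(default_factory=lambda: _env("REDIS_URL", "redis://localhost:6379"))
    cache_ttl_seconds: int = field(default_factory=lambda: int(_env("CACHE_TTL_SECONDS", "3600")))
    debug: bool = field(default_factory=lambda: _env("DEBUG", "false").lower() == "true")
```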

📄 License

MIT

