65 changes: 65 additions & 0 deletions demo/grade-your-agent/README.md
@@ -0,0 +1,65 @@
# Demo: Grade Your Agent

Scan an existing Python agent and audit its compliance. The audit reports an
**F grade** with 15+ violations across the OWASP LLM Top 10, model resilience,
and memory hygiene rule packs.

## What this demo shows

1. `agentspec scan` analyses `app.py` and auto-generates an `agent.yaml` manifest
2. `agentspec audit` scores the manifest against compliance rules
3. The flawed agent fails on: missing guardrails, no fallback model, API key via
`$env:` instead of `$secret:`, no PII scrubbing, no memory TTL, missing cost
controls, missing tool annotations, and no evaluation framework
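
To make two of the failures listed above concrete (no PII scrubbing, no memory TTL), here is a minimal sketch of what a remediated Redis write might look like. The helper names, the regex patterns, and the 24-hour retention window are assumptions for illustration and are not part of the demo's `app.py`. The only library behaviour relied on is that redis-py's `set()` accepts `ex=<seconds>` to expire a key.

```python
import json
import re

# Hypothetical patterns for illustration; real PII detection needs far more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

# Assumed retention window; the demo does not specify one.
HISTORY_TTL_SECONDS = 24 * 60 * 60


def scrub_pii(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def persist_history(redis_client, history_key: str, history: list[dict]) -> None:
    """Store scrubbed history with an expiry instead of keeping it forever."""
    scrubbed = [{**msg, "content": scrub_pii(msg["content"])} for msg in history]
    # ex= attaches a TTL in seconds, so the key is evicted automatically.
    redis_client.set(history_key, json.dumps(scrubbed), ex=HISTORY_TTL_SECONDS)
```

The demo agent intentionally does neither of these things, which is what the memory hygiene rules are expected to flag.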

## Prerequisites

- Node.js 20+
- `ANTHROPIC_API_KEY` environment variable (used by `scan` to analyse source code)
- AgentSpec CLI: `npm i -g @agentspec/cli` (or use `npx`)

## Recording instructions

```bash
# Terminal setup: 100x30, dark background, 14pt font
cd demo/grade-your-agent

export ANTHROPIC_API_KEY=sk-ant-...

# Step 1: Scan the Python agent to generate a manifest
npx agentspec scan --dir . --out agent.yaml

# Step 2: Audit the generated manifest
npx agentspec audit agent.yaml
```

## Expected output

**Scan** shows a spinner while Claude analyses the source code, then writes
`agent.yaml`.

**Audit** prints a coloured compliance report:

- Overall score: ~11/100
- Grade: **F**
- 15+ violations across 3 rule packs:
  - `owasp-llm-top10`: no input/output guardrails, API key not in secret
    manager, missing tool annotations, no eval/CI gate, no cost controls
  - `model-resilience`: no fallback model, no cost controls
  - `memory-hygiene`: no PII scrub fields, no memory TTL, no audit log

## Why this agent scores an F

The `app.py` file is deliberately written to exhibit the following common anti-patterns:

| Anti-pattern | Audit rules triggered |
|---|---|
| API key via `os.environ.get()` | SEC-LLM-10 |
| No guardrails imports/code | SEC-LLM-01, SEC-LLM-02 |
| No fallback model | MODEL-01 |
| Redis + Postgres memory, no PII scrub | SEC-LLM-06, MEM-01 |
| No memory TTL | MEM-02 |
| No audit log | MEM-03 |
| Tools without annotations | SEC-LLM-07, SEC-LLM-08 |
| No cost controls or rate limits | SEC-LLM-04, MODEL-03 |
| No evaluation framework | SEC-LLM-09 |
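
As a quick cross-check of the table above, the rule IDs can be tallied per rule pack. This is a sketch only: the mapping from ID prefix to pack name (`SEC-LLM` to `owasp-llm-top10`, and so on) is inferred from the naming and is an assumption, not something the CLI is confirmed to expose.

```python
from collections import Counter

# Rule IDs copied from the table above, in row order.
RULE_IDS = [
    "SEC-LLM-10",
    "SEC-LLM-01", "SEC-LLM-02",
    "MODEL-01",
    "SEC-LLM-06", "MEM-01",
    "MEM-02",
    "MEM-03",
    "SEC-LLM-07", "SEC-LLM-08",
    "SEC-LLM-04", "MODEL-03",
    "SEC-LLM-09",
]

# Assumed prefix-to-pack mapping, inferred from the ID naming.
PACK_BY_PREFIX = {
    "SEC-LLM": "owasp-llm-top10",
    "MODEL": "model-resilience",
    "MEM": "memory-hygiene",
}


def pack_for(rule_id: str) -> str:
    """Strip the trailing number and look up the pack for the prefix."""
    prefix = rule_id.rsplit("-", 1)[0]
    return PACK_BY_PREFIX[prefix]


def tally(rule_ids: list[str]) -> Counter:
    """Count how many rule IDs fall into each rule pack."""
    return Counter(pack_for(rule_id) for rule_id in rule_ids)


if __name__ == "__main__":
    for pack, count in tally(RULE_IDS).items():
        print(f"{pack}: {count}")
```

The table lists 13 distinct rule IDs, so the "15+ violations" figure in the report cannot be derived from the table alone.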
173 changes: 173 additions & 0 deletions demo/grade-your-agent/app.py
@@ -0,0 +1,173 @@
"""
Support Ticket Agent - routes and responds to customer support tickets.
Connects to Redis for conversation history and Postgres for ticket storage.
"""

import os
import json

import redis
import psycopg2
from openai import OpenAI

# ---- Model setup (no fallback, key via env var, not secret manager) ----------

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# ---- Memory (no TTL, no PII scrubbing, no audit log) ------------------------

redis_client = redis.Redis.from_url(
    os.environ.get("REDIS_URL", "redis://localhost:6379")
)
db_conn = psycopg2.connect(os.environ.get("DATABASE_URL"))

# ---- System prompt (inline, not loaded from file) ----------------------------

SYSTEM_PROMPT = """You are a customer support agent for Acme Corp.
Help users search, create, escalate, and close support tickets.
Be helpful and resolve issues as quickly as possible."""


# ---- Tools (no annotations, includes destructive operations) -----------------


def search_tickets(query: str) -> list[dict]:
    """Search existing support tickets by keyword."""
    cur = db_conn.cursor()
    cur.execute(
        "SELECT id, subject, status FROM tickets WHERE subject ILIKE %s",
        (f"%{query}%",),
    )
    return [{"id": r[0], "subject": r[1], "status": r[2]} for r in cur.fetchall()]


def create_ticket(subject: str, body: str, priority: str = "medium") -> dict:
    """Create a new support ticket."""
    cur = db_conn.cursor()
    cur.execute(
        "INSERT INTO tickets (subject, body, priority) VALUES (%s, %s, %s) RETURNING id",
        (subject, body, priority),
    )
    db_conn.commit()
    return {"id": cur.fetchone()[0], "status": "created"}


def close_ticket(ticket_id: int) -> dict:
    """Close and permanently delete a support ticket."""
    cur = db_conn.cursor()
    cur.execute("DELETE FROM tickets WHERE id = %s", (ticket_id,))
    db_conn.commit()
    return {"id": ticket_id, "status": "deleted"}


def escalate_ticket(ticket_id: int, team: str) -> dict:
    """Escalate a ticket to another team."""
    cur = db_conn.cursor()
    cur.execute(
        "UPDATE tickets SET assigned_team = %s, status = 'escalated' WHERE id = %s",
        (team, ticket_id),
    )
    db_conn.commit()
    return {"id": ticket_id, "status": "escalated", "team": team}


TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_tickets",
            "description": "Search existing support tickets by keyword",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "create_ticket",
            "description": "Create a new support ticket",
            "parameters": {
                "type": "object",
                "properties": {
                    "subject": {"type": "string"},
                    "body": {"type": "string"},
                    "priority": {"type": "string", "enum": ["low", "medium", "high"]},
                },
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "close_ticket",
            "description": "Close and permanently delete a support ticket",
            "parameters": {
                "type": "object",
                "properties": {"ticket_id": {"type": "integer"}},
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "escalate_ticket",
            "description": "Escalate a ticket to another team",
            "parameters": {
                "type": "object",
                "properties": {
                    "ticket_id": {"type": "integer"},
                    "team": {"type": "string"},
                },
            },
        },
    },
]


# ---- Chat loop (no guardrails, no cost controls, high temperature) -----------


def chat(user_message: str, session_id: str = "default") -> str:
    """Send a message to the support agent and get a response."""
    history_key = f"chat:{session_id}"
    history = json.loads(redis_client.get(history_key) or "[]")

    history.append({"role": "user", "content": user_message})

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": SYSTEM_PROMPT}] + history,
        tools=TOOLS,
        temperature=1.8,
    )

    reply = response.choices[0].message.content or ""
    history.append({"role": "assistant", "content": reply})

    # Persist to Redis (no TTL, no PII scrub)
    redis_client.set(history_key, json.dumps(history))

    # Persist to Postgres for long-term storage (no PII scrub, no audit log)
    cur = db_conn.cursor()
    cur.execute(
        "INSERT INTO conversations (session_id, role, content) VALUES (%s, %s, %s)",
        (session_id, "user", user_message),
    )
    cur.execute(
        "INSERT INTO conversations (session_id, role, content) VALUES (%s, %s, %s)",
        (session_id, "assistant", reply),
    )
    db_conn.commit()

    return reply


if __name__ == "__main__":
    print("Support Ticket Agent ready. Type 'quit' to exit.")
    while True:
        msg = input("> ")
        if msg.lower() == "quit":
            break
        print(chat(msg))
79 changes: 79 additions & 0 deletions demo/yaml-to-agent/README.md
@@ -0,0 +1,79 @@
# Demo: YAML to Running Agent

Go from a spec file to a running agent in four steps. `agentspec generate`
produces a full LangGraph Python agent with a FastAPI server, guardrails, and a
chat endpoint.

## What this demo shows

1. `agentspec validate` confirms the manifest is schema-valid
2. `agentspec generate` produces a complete Python project from the spec
3. The generated `server.py` starts a FastAPI server with a `/v1/chat` endpoint
4. `curl` sends a message and receives a response

## Prerequisites

- Node.js 20+
- Python 3.10+
- `ANTHROPIC_API_KEY` environment variable (used by `generate` for code generation)
- `OPENAI_API_KEY` environment variable (used by the generated agent at runtime)
- AgentSpec CLI: `npm i -g @agentspec/cli` (or use `npx`)

## Recording instructions

```bash
# Terminal setup: 100x30, dark background, 14pt font
cd demo/yaml-to-agent

export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...

# Step 1: Validate the manifest
npx agentspec validate agent.yaml

# Step 2: Generate a LangGraph agent
npx agentspec generate agent.yaml --framework langgraph --output ./agent/

# Step 3: Install dependencies and start the server
cd agent
pip install -r requirements.txt
uvicorn server:app --port 8000 &

# Step 4: Send a message
curl -s http://localhost:8000/v1/chat \
  -H 'Content-Type: application/json' \
  -d '{"message": "hello"}'
```
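
Step 3 backgrounds uvicorn with `&`, so the request in Step 4 can arrive before the server is listening. One way to avoid that race when scripting the demo is to poll the port first. The sketch below is an illustration using only the standard library; the function name and the default timeout values are assumptions.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is accepting on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (refused or timed out); retry after a short pause.
            time.sleep(interval)
    return False
```

Calling `wait_for_port("localhost", 8000)` before sending the first request gives the server up to ten seconds to come up.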

## Expected output

**Validate** prints a green checkmark confirming the schema is valid.

**Generate** shows a progress spinner while Claude generates the code, then
lists the created files (agent.py, server.py, tools.py, guardrails.py,
requirements.txt, etc.).

**Server** starts uvicorn on port 8000.

**Curl** returns a JSON response with the agent's reply.
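
The Step 4 request can also be issued from Python, which is convenient when checking the response programmatically. This is a minimal standard-library sketch that mirrors the `curl` call (same URL, header, and body). The function names are hypothetical, and because the response schema is not documented here, the decoded JSON is returned as-is without assuming any field names.

```python
import json
import urllib.request


def build_chat_request(message: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the same POST request that the curl command in Step 4 sends."""
    payload = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message: str, base_url: str = "http://localhost:8000") -> object:
    """Send the request and return the decoded JSON body, whatever its shape."""
    request = build_chat_request(message, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `print(send_chat("hello"))` shows the raw JSON structure returned by the generated endpoint.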

## What gets generated

The `generate` command produces a complete Python project in `./agent/`:

| File | Purpose |
|---|---|
| `agent.py` | LangGraph workflow definition |
| `server.py` | FastAPI server with `/v1/chat` endpoint |
| `guardrails.py` | Input/output guardrail enforcement |
| `requirements.txt` | Runtime dependencies |
| `.env.example` | Environment variable template |
| `README.md` | Generated project documentation |
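
When recording, it can help to confirm that the files in the table above actually exist before starting the server. The helper below is a small sketch under that assumption: the file list is copied from the table, while the function name is hypothetical and not part of the CLI.

```python
from pathlib import Path

# File names copied from the table above.
EXPECTED_FILES = [
    "agent.py",
    "server.py",
    "guardrails.py",
    "requirements.txt",
    ".env.example",
    "README.md",
]


def missing_files(project_dir: Path) -> list[str]:
    """Return the expected files that are absent from project_dir."""
    return [name for name in EXPECTED_FILES if not (project_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(Path("./agent"))
    print("all files present" if not missing else f"missing: {missing}")
```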

## Alternatives

To scaffold a new manifest from scratch instead of using the provided one:

```bash
npx agentspec init
# Follow the interactive wizard, enable "API endpoint" when prompted
```
34 changes: 34 additions & 0 deletions demo/yaml-to-agent/agent.yaml
@@ -0,0 +1,34 @@
apiVersion: agentspec.io/v1
kind: AgentSpec

metadata:
  name: my-agent
  version: 0.1.0
  description: "A helpful AI assistant that answers questions"

spec:
  model:
    provider: openai
    id: gpt-4o-mini
    apiKey: $env:OPENAI_API_KEY
    parameters:
      temperature: 0.7
      maxTokens: 1024

  prompts:
    system: $file:prompts/system.md

  guardrails:
    input:
      - type: prompt-injection
        action: reject
        message: "This request was blocked by the input guardrail."
    output:
      - type: toxicity-filter
        threshold: 0.8
        action: reject

  api:
    type: rest
    port: 8000
    streaming: true
4 changes: 4 additions & 0 deletions demo/yaml-to-agent/prompts/system.md
@@ -0,0 +1,4 @@
You are a helpful AI assistant.

Answer questions clearly and concisely. If you are unsure about something, say so
rather than guessing. Be friendly and professional.