Agent verification and trust layer — causal traces, behavioral assertions, and structured trust records for AI agent systems.
Flight evaluates whether what your agents do is correct, expected, and safe. It captures agent runs through multiple ingress paths, producing causally linked Trace v2 documents that you can assert against, compare across strategies, and summarize into a trust record.
Agents (TS SDK, Python SDK, MCP Proxy, Claude Code Hooks)
↓
TraceSession (causal attribution)
↓
~/.flight/traces/<run_id>.trace.json
↓
~/.flight/trust/<run_id>.trust.json
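The MCP proxy ingress sits between a client and an MCP server. MCP's stdio transport exchanges newline-delimited JSON-RPC 2.0 messages, so a recorder only has to watch `tools/call` requests and the responses that carry the same id. The TypeScript sketch that follows illustrates the idea under that assumption. `McpRecorder`, the `evt-N` ids, the `call_id` and `is_error` payload fields, and the `tool_call_returned` reason are hypothetical names chosen for illustration, not Flight's implementation.

```typescript
// Hypothetical sketch: recording MCP stdio JSON-RPC traffic as Trace v2-style events.
// Only event_id, kind, timestamp, causal_link and payload mirror this README's example;
// everything else is an illustrative assumption.

interface TraceEvent {
  event_id: string;
  kind: "tool_call" | "tool_result";
  timestamp: string;
  causal_link?: { caused_by_event_id: string; reason: string };
  payload: Record<string, unknown>;
}

interface JsonRpcMessage {
  jsonrpc: "2.0";
  id?: number | string;
  method?: string;
  params?: { name?: string; arguments?: unknown };
  result?: unknown;
  error?: unknown;
}

class McpRecorder {
  readonly events: TraceEvent[] = [];
  private pending = new Map<string, string>(); // JSON-RPC request id -> tool_call event_id
  private counter = 0;

  private nextEventId(): string {
    this.counter += 1;
    return `evt-${this.counter}`;
  }

  // Call once per newline-delimited message seen on either side of the proxy.
  observeLine(line: string): void {
    const msg = JSON.parse(line) as JsonRpcMessage;
    if (msg.id === undefined) return; // notifications have no id; not recorded here
    const key = String(msg.id);

    if (msg.method === "tools/call") {
      // Client -> server request: record a tool_call and remember its id.
      const eventId = this.nextEventId();
      this.pending.set(key, eventId);
      this.events.push({
        event_id: eventId,
        kind: "tool_call",
        timestamp: new Date().toISOString(),
        payload: { call_id: key, tool_name: msg.params?.name, args: msg.params?.arguments },
      });
    } else if (msg.method === undefined && this.pending.has(key)) {
      // Server -> client response with a matching id: record a linked tool_result.
      const callEventId = this.pending.get(key)!;
      this.pending.delete(key);
      this.events.push({
        event_id: this.nextEventId(),
        kind: "tool_result",
        timestamp: new Date().toISOString(),
        causal_link: { caused_by_event_id: callEventId, reason: "tool_call_returned" },
        payload: { call_id: key, is_error: msg.error !== undefined, result: msg.error ?? msg.result },
      });
    }
  }
}

const recorder = new McpRecorder();
recorder.observeLine(JSON.stringify({
  jsonrpc: "2.0", id: 7, method: "tools/call",
  params: { name: "read_file", arguments: { path: "/tmp/a.txt" } },
}));
recorder.observeLine(JSON.stringify({ jsonrpc: "2.0", id: 7, result: { content: [{ type: "text", text: "hello" }] } }));
console.log(recorder.events.map((e) => e.kind)); // [ 'tool_call', 'tool_result' ]
```

Keying pending calls by JSON-RPC id is what lets the result event point back at its call without the proxy understanding anything about the tool itself.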
Four core capabilities:
- Causal trace attribution — every event links back to what caused it
- Inline behavioral assertions — declarative YAML rules, evaluated post-event
- Comparative experiment harness — YAML spec, driven by flight experiment run
- Structured trust records — content-hashed JSON summaries, one per run
During the MathWorks M3 competition, I leaned on AI assistants for brainstorming and data lookup — only to discover, too late, that many of the "facts" and numerical results were hallucinated. The model produced confident, statistically formatted outputs. There was no way to inspect what it had actually done.
That frustration directly led to Flight. It started as a recorder — what did the agent do? — and is now evolving into a verifier: was what the agent did correct?
git clone https://github.com/lewisnsmith/flight.git
cd flight && npm install && npm run build && npm link
# Run the 2-variant, 3-repetition experiment end-to-end
flight experiment run examples/verified-agent/experiment.yaml
# Inspect a captured trace
flight trace ls
flight trace show <run_id>
# Wrap any MCP server transparently — records every call as Trace v2
flight proxy --cmd npx -- -y @modelcontextprotocol/server-filesystem /tmp
# Inspect the causal tree
flight trace show <run_id>
# Evaluate YAML behavioral rules against a trace file
flight assert check ~/.flight/traces/<run_id>.trace.json --rules flight.assertions.yaml
Example flight.assertions.yaml:
version: 1
rules:
  - name: search_before_answer
    kind: sequence
    when:
      event_kind: tool_call
      tool_name: answer
    require:
      prior_event:
        kind: tool_call
        tool_name: search
      within_span: run
# Show the trust record for a run
flight trust show <run_id>
# Run an experiment spec, then compare two runs' trust records
flight experiment run experiment.yaml
flight experiment compare <run_a> <run_b>
# Claude Code integration
git clone https://github.com/lewisnsmith/flight.git
cd flight && npm install && npm run build && npm link
# Install hooks — records every Claude Code session as Trace v2
flight claude install
The canonical schema lives at packages/flight-proxy/src/schema/trace-v2.schema.json.
Every trace contains:
- Events — ordered, with kind, payload, causal_link, and span IDs
- AssertionResults — YAML rule outcomes appended post-evaluation
- Anomalies — loop detection, error-recovery, assertion failures, schema violations
- Metadata — model, agent_id, token counts, cost
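AssertionResults come from checking rules against the ordered events. As a rough illustration of what a sequence rule such as search_before_answer checks, here is a minimal sketch. It assumes simplified event and rule shapes (`TraceEvent`, `Matcher`, `SequenceRule`, and `evaluateSequence` are hypothetical names) and reads `within_span: run` as "any earlier event in the same run"; Flight's evaluator may scope spans differently.

```typescript
// Hypothetical sketch of a `sequence` assertion like search_before_answer.
// Shapes are simplified; within_span: run is read as "any earlier event in the run".

interface TraceEvent {
  event_id: string;
  kind: string;
  payload: { tool_name?: string };
}

interface Matcher {
  kind: string;
  tool_name?: string;
}

interface SequenceRule {
  name: string;
  when: Matcher;         // events that trigger the rule
  requirePrior: Matcher; // some earlier event must match this
}

interface AssertionResult {
  rule: string;
  event_id: string;
  passed: boolean;
}

function matches(event: TraceEvent, m: Matcher): boolean {
  return event.kind === m.kind && (m.tool_name === undefined || event.payload.tool_name === m.tool_name);
}

// One result per triggering event, computed over an already-recorded trace.
function evaluateSequence(rule: SequenceRule, events: TraceEvent[]): AssertionResult[] {
  const results: AssertionResult[] = [];
  events.forEach((event, i) => {
    if (!matches(event, rule.when)) return;
    const passed = events.slice(0, i).some((prior) => matches(prior, rule.requirePrior));
    results.push({ rule: rule.name, event_id: event.event_id, passed });
  });
  return results;
}

const searchBeforeAnswer: SequenceRule = {
  name: "search_before_answer",
  when: { kind: "tool_call", tool_name: "answer" },
  requirePrior: { kind: "tool_call", tool_name: "search" },
};

const goodRun: TraceEvent[] = [
  { event_id: "e1", kind: "tool_call", payload: { tool_name: "search" } },
  { event_id: "e2", kind: "tool_call", payload: { tool_name: "answer" } },
];
const badRun: TraceEvent[] = [{ event_id: "e1", kind: "tool_call", payload: { tool_name: "answer" } }];

console.log(evaluateSequence(searchBeforeAnswer, goodRun).map((r) => r.passed)); // [ true ]
console.log(evaluateSequence(searchBeforeAnswer, badRun).map((r) => r.passed));  // [ false ]
```

In this sketch a trace with no answer call produces no results at all, so the rule passes vacuously; whether Flight reports that case the same way is not stated here.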
Every Event carries an optional causal_link:
{
"event_id": "01J1...",
"kind": "tool_call",
"timestamp": "2026-04-26T00:00:01Z",
"causal_link": {
"caused_by_event_id": "01J0...",
"reason": "llm_output_emitted"
},
"payload": { "tool_name": "search", "args": { "q": "Mars diameter" } }
}
Causal attribution rules:
- tool_result → links to the matching tool_call by call_id
- tool_call → links to the most recent llm.result
- llm.call → links to the most recent agent.decision or lifecycle.run_start
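These three attribution rules come down to a little bookkeeping: remember each tool_call by its call_id, and keep pointers to the most recent llm.result and the most recent agent.decision or lifecycle.run_start. The sketch that follows shows that bookkeeping under those assumptions; it is not the TraceSession API. `CausalTracker`, `record()`, the `evt-N` ids, and every reason string except `llm_output_emitted` (taken from the example event) are hypothetical.

```typescript
// Hypothetical sketch of the three causal attribution rules; not the real TraceSession API.

interface CausalLink {
  caused_by_event_id: string;
  reason: string;
}

interface AttributedEvent {
  event_id: string;
  kind: string;
  call_id?: string;
  causal_link?: CausalLink;
}

class CausalTracker {
  private lastLlmResult?: string;
  private lastDecisionOrStart?: string;
  private toolCallsByCallId = new Map<string, string>(); // call_id -> tool_call event_id
  private counter = 0;

  record(kind: string, callId?: string): AttributedEvent {
    const event: AttributedEvent = { event_id: `evt-${++this.counter}`, kind, call_id: callId };

    if (kind === "tool_result" && callId !== undefined) {
      // Rule 1: tool_result -> the tool_call with the same call_id.
      const callEvent = this.toolCallsByCallId.get(callId);
      if (callEvent) event.causal_link = { caused_by_event_id: callEvent, reason: "result_of_call" };
    } else if (kind === "tool_call" && this.lastLlmResult) {
      // Rule 2: tool_call -> the most recent llm.result.
      event.causal_link = { caused_by_event_id: this.lastLlmResult, reason: "llm_output_emitted" };
    } else if (kind === "llm.call" && this.lastDecisionOrStart) {
      // Rule 3: llm.call -> the most recent agent.decision or lifecycle.run_start.
      event.causal_link = { caused_by_event_id: this.lastDecisionOrStart, reason: "decision_or_run_start" };
    }

    // Update pointers only after linking, so an event never links to itself.
    if (kind === "tool_call" && callId !== undefined) this.toolCallsByCallId.set(callId, event.event_id);
    if (kind === "llm.result") this.lastLlmResult = event.event_id;
    if (kind === "agent.decision" || kind === "lifecycle.run_start") this.lastDecisionOrStart = event.event_id;
    return event;
  }
}

const tracker = new CausalTracker();
const runStart = tracker.record("lifecycle.run_start");
const llmCall = tracker.record("llm.call");
const llmResult = tracker.record("llm.result");
const toolCall = tracker.record("tool_call", "c1");
const toolResult = tracker.record("tool_result", "c1");
for (const e of [runStart, llmCall, llmResult, toolCall, toolResult]) {
  console.log(`${e.event_id} ${e.kind} <- ${e.causal_link?.caused_by_event_id ?? "none"}`);
}
// evt-1 lifecycle.run_start <- none
// evt-2 llm.call <- evt-1
// evt-3 llm.result <- none
// evt-4 tool_call <- evt-3
// evt-5 tool_result <- evt-4
```

Kinds not covered by the rules, such as llm.result here, simply stay unlinked in this sketch.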
# Ingress
flight proxy --cmd <server> -- <args> # MCP stdio proxy → Trace v2
flight serve [--port 4242] # HTTP collector → Trace v2
# Assertions
flight assert check <trace> # Run YAML assertions against a recorded trace
flight assert watch # Live-evaluate assertions on incoming events
# Trust records
flight trust show <run_id> # Print the trust record for a run
flight trust list # List all trust records
# Experiments
flight experiment run <spec.yaml> # Run a YAML experiment spec
flight experiment compare <run_a> <run_b> # Pairwise trust-record comparison
# Trace inspection
flight trace show <trace> # Inspect a Trace v2 file (causal tree)
flight trace ls # List traces under ~/.flight/traces/
# Claude Code integration
flight claude install # Install hooks + slash commands
flight claude uninstall # Remove hooks + slash commands
# Internal hooks (used by installed hooks)
flight hook session-start|session-end|user-prompt-submit|post-tool-use
Features:
- Trace v2 schema (TS types + canonical JSON Schema)
- Causal attribution via TraceSession (cause IDs + parent spans)
- Five ingress paths: MCP stdio proxy, HTTP collector, TypeScript SDK, Python SDK, Claude Code hooks
- YAML behavioral assertions (sequence, threshold, regex, precondition) with async post-event evaluation
- Content-hashed trust records, written one per run
- YAML experiment harness with flight experiment run|compare
- End-to-end verified-agent example under examples/verified-agent/
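flight experiment compare performs a pairwise trust-record comparison. The trust record schema is not spelled out in this README, so the sketch assumes a hypothetical `TrustSummary` holding assertion pass/fail counts, an anomaly count, and a cost, and reports b-minus-a deltas. The field names, `passRate()`, and `compare()` are illustrative only.

```typescript
// Hypothetical sketch of a pairwise trust-record comparison.
// TrustSummary's fields are assumptions, not Flight's trust record schema.

interface TrustSummary {
  run_id: string;
  assertions_passed: number;
  assertions_failed: number;
  anomalies: number;
  cost_usd: number;
}

interface Comparison {
  run_a: string;
  run_b: string;
  pass_rate_delta: number; // b minus a
  anomaly_delta: number;   // b minus a
  cost_delta_usd: number;  // b minus a
}

// Assumption: a run with no evaluated assertions counts as fully passing.
function passRate(r: TrustSummary): number {
  const total = r.assertions_passed + r.assertions_failed;
  return total === 0 ? 1 : r.assertions_passed / total;
}

function compare(a: TrustSummary, b: TrustSummary): Comparison {
  return {
    run_a: a.run_id,
    run_b: b.run_id,
    pass_rate_delta: passRate(b) - passRate(a),
    anomaly_delta: b.anomalies - a.anomalies,
    cost_delta_usd: b.cost_usd - a.cost_usd,
  };
}

const runA: TrustSummary = { run_id: "run_a", assertions_passed: 2, assertions_failed: 2, anomalies: 3, cost_usd: 0.5 };
const runB: TrustSummary = { run_id: "run_b", assertions_passed: 4, assertions_failed: 0, anomalies: 1, cost_usd: 0.75 };
const diff = compare(runA, runB);
console.log(diff.pass_rate_delta, diff.anomaly_delta, diff.cost_delta_usd); // 0.5 -2 0.25
```

Reporting signed deltas rather than a single score keeps the trade-off visible: in this example run b passes more assertions with fewer anomalies but costs more.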
Storage:
- ~/.flight/traces/<run_id>.trace.json — Trace v2 files
- ~/.flight/trust/<run_id>.trust.json — trust records
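Trust records are content-hashed. This README does not name the hash algorithm or the canonical form, so the sketch assumes SHA-256 (via Node's `crypto` module) over JSON whose object keys are sorted recursively, which makes the hash independent of key order. `canonicalize()` and `contentHash()` are hypothetical helpers.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch: content-hashing a trust record.
// Assumes SHA-256 over canonical JSON (object keys sorted at every level).

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function canonicalize(value: Json): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`; // arrays keep their order
  }
  if (value !== null && typeof value === "object") {
    const obj: { [key: string]: Json } = value;
    const body = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value); // strings, numbers, booleans, null
}

function contentHash(record: { [key: string]: Json }): string {
  return createHash("sha256").update(canonicalize(record)).digest("hex");
}

const h1 = contentHash({ run_id: "r1", passed: 3, failed: 0 });
const h2 = contentHash({ failed: 0, passed: 3, run_id: "r1" });
console.log(h1 === h2); // true
console.log(h1.length); // 64
```

A real record would need to leave out any field that stores the hash itself before hashing, otherwise the hash could never match its own content.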
git clone https://github.com/lewisnsmith/flight.git
cd flight && npm install && npm run build && npm link
Requires Node.js 20+. No external database or cloud service required.
For the Python SDK:
pip install -e sdk/python/
Run the test suite:
npm run test
Python SDK tests:
cd sdk/python && python3 -m pytest tests/ -v
Further reading:
- ARCHITECTURE.md — internal architecture and design decisions
- docs/refactor-plan.md — v2 refactor plan
- docs/build-narrative.md — how it was built and why
Out of scope:
- Visualization / UI / dashboards
- Cryptographic signing of trust records
- Cloud sync or multi-tenant infrastructure
- Pre-execution interception or guardrails
- Backwards compatibility with v1 .jsonl traces
License: MIT