Flight

Agent verification and trust layer — causal traces, behavioral assertions, and structured trust records for AI agent systems.

Flight evaluates whether what your agents do is correct, expected, and safe. It wraps agent runs through multiple ingress paths, producing causally linked Trace v2 documents that you can assert against, compare across strategies, and summarize into a trust record.

Agents (TS SDK, Python SDK, MCP Proxy, Claude Code Hooks)
                    ↓
            TraceSession (causal attribution)
                    ↓
     ~/.flight/traces/<run_id>.trace.json
                    ↓
     ~/.flight/trust/<run_id>.trust.json

Four core capabilities:

Causal trace attribution — every event links back to what caused it
Inline behavioral assertions — declarative YAML rules, evaluated post-event
Comparative experiment harness — YAML spec, flight experiment run
Structured trust records — content-hashed JSON summaries, one per run

Origin

During the MathWorks M3 competition, I leaned on AI assistants for brainstorming and data lookup — only to discover, too late, that many of the "facts" and numerical results were hallucinated. The model produced confident, statistically formatted outputs. There was no way to inspect what it had actually done.

That frustration directly led to Flight. It started as a recorder — what did the agent do? — and is now evolving into a verifier: was what the agent did correct?

Quick Start

Try the verified-agent example

git clone https://github.com/lewisnsmith/flight.git
cd flight && npm install && npm run build && npm link

# Run the 2-variant, 3-repetition experiment end-to-end
flight experiment run examples/verified-agent/experiment.yaml

# Inspect a captured trace
flight trace ls
flight trace show <run_id>

Record a trace via the MCP proxy

# Wrap any MCP server transparently — records every call as Trace v2
flight proxy --cmd npx -- -y @modelcontextprotocol/server-filesystem /tmp

# Inspect the causal tree
flight trace show <run_id>

Assert against a recorded trace

# Evaluate YAML behavioral rules against a trace file
flight assert check ~/.flight/traces/<run_id>.trace.json --rules flight.assertions.yaml

Example flight.assertions.yaml:

version: 1
rules:
  - name: search_before_answer
    kind: sequence
    when:
      event_kind: tool_call
      tool_name: answer
    require:
      prior_event:
        kind: tool_call
        tool_name: search
        within_span: run
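
To make the rule's semantics concrete, here is a minimal Python sketch of how a sequence rule like search_before_answer could be evaluated. It is not Flight's evaluator: the in-memory RULE dict, the helper names, and the assumption that tool_name lives under payload (as in the Trace v2 event example) are illustrative, and within_span: run is approximated by treating the whole event list as one run.

```python
from typing import Any

# Hypothetical in-memory form of the YAML rule; keys mirror the YAML fields.
RULE = {
    "name": "search_before_answer",
    "when": {"event_kind": "tool_call", "tool_name": "answer"},
    "require": {"prior_event": {"kind": "tool_call", "tool_name": "search"}},
}


def _matches(event: dict[str, Any], kind: str, tool_name: str) -> bool:
    """True if the event has the given kind and payload.tool_name (assumed location)."""
    return (
        event.get("kind") == kind
        and event.get("payload", {}).get("tool_name") == tool_name
    )


def check_sequence(events: list[dict[str, Any]], rule: dict[str, Any]) -> list[str]:
    """Return event_ids of triggering events that had no required prior event."""
    when = rule["when"]
    prior = rule["require"]["prior_event"]
    seen_prior = False
    violations: list[str] = []
    for event in events:  # events are assumed to be in run order
        if _matches(event, when["event_kind"], when["tool_name"]) and not seen_prior:
            violations.append(event["event_id"])
        if _matches(event, prior["kind"], prior["tool_name"]):
            seen_prior = True
    return violations
```

An empty result means every answer call was preceded by at least one search call in the same run.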

Inspect a trust record

flight trust show <run_id>

Run a comparison experiment

flight experiment run experiment.yaml
flight experiment compare <run_a> <run_b>

Claude Code integration

git clone https://github.com/lewisnsmith/flight.git
cd flight && npm install && npm run build && npm link

# Install hooks — records every Claude Code session as Trace v2
flight claude install

Trace v2 Schema

The canonical schema lives at packages/flight-proxy/src/schema/trace-v2.schema.json.

Every trace contains:

Events — ordered, with kind, payload, causal_link, and span IDs
AssertionResults — YAML rule outcomes appended post-evaluation
Anomalies — loop detection, error-recovery, assertion failures, schema violations
Metadata — model, agent_id, token counts, cost

Every Event carries an optional causal_link:

{
  "event_id": "01J1...",
  "kind": "tool_call",
  "timestamp": "2026-04-26T00:00:01Z",
  "causal_link": {
    "caused_by_event_id": "01J0...",
    "reason": "llm_output_emitted"
  },
  "payload": { "tool_name": "search", "args": { "q": "Mars diameter" } }
}
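
Because each event names its cause through causal_link.caused_by_event_id, a trace can be walked backwards from any event to its root. The sketch below assumes the events are already loaded as a list of dicts shaped like the example above; the function name causal_chain is hypothetical and not part of any Flight SDK.

```python
from typing import Any


def causal_chain(events: list[dict[str, Any]], event_id: str) -> list[str]:
    """Follow causal_link.caused_by_event_id from event_id back to the root.

    Returns event_ids ordered from the starting event to its earliest known cause.
    Stops at an event with no causal_link, at a dangling link, or on a cycle.
    """
    by_id = {e["event_id"]: e for e in events}
    chain: list[str] = []
    seen: set[str] = set()
    current = by_id.get(event_id)
    while current is not None and current["event_id"] not in seen:
        seen.add(current["event_id"])
        chain.append(current["event_id"])
        link = current.get("causal_link") or {}  # causal_link is optional
        current = by_id.get(link.get("caused_by_event_id"))
    return chain
```

The cycle guard is defensive; a well-formed trace should form a tree.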

Causal attribution rules:

tool_result → links to the matching tool_call by call_id
tool_call → links to the most recent llm.result
llm.call → links to the most recent agent.decision or lifecycle.run_start
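
The three rules above can be sketched as a single pass over an ordered event list. This is an illustration, not the TraceSession implementation: the function name attribute is hypothetical, call_id is assumed to live under payload, and "most recent agent.decision or lifecycle.run_start" is read as whichever of the two occurred last.

```python
from typing import Any, Optional

# Event kinds that can cause an llm.call, per the rules above.
LLM_CALL_CAUSES = {"agent.decision", "lifecycle.run_start"}


def attribute(events: list[dict[str, Any]]) -> dict[str, Optional[str]]:
    """Map each event_id to the event_id that caused it (or None)."""
    links: dict[str, Optional[str]] = {}
    tool_calls: dict[str, str] = {}  # call_id -> event_id of the tool_call
    last_llm_result: Optional[str] = None
    last_decision_or_start: Optional[str] = None

    for event in events:  # events are assumed to be in emission order
        kind, event_id = event["kind"], event["event_id"]
        call_id = event.get("payload", {}).get("call_id")  # assumed location

        if kind == "tool_result":
            links[event_id] = tool_calls.get(call_id)
        elif kind == "tool_call":
            links[event_id] = last_llm_result
            if call_id is not None:
                tool_calls[call_id] = event_id
        elif kind == "llm.call":
            links[event_id] = last_decision_or_start
        else:
            links[event_id] = None  # no rule listed for this kind

        # Update "most recent" trackers after linking the current event.
        if kind == "llm.result":
            last_llm_result = event_id
        elif kind in LLM_CALL_CAUSES:
            last_decision_or_start = event_id
    return links
```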

CLI Reference

# Ingress
flight proxy --cmd <server> -- <args>     # MCP stdio proxy → Trace v2
flight serve [--port 4242]                # HTTP collector → Trace v2

# Assertions
flight assert check <trace>               # Run YAML assertions against a recorded trace
flight assert watch                       # Live-evaluate assertions on incoming events

# Trust records
flight trust show <run_id>                # Print the trust record for a run
flight trust list                         # List all trust records

# Experiments
flight experiment run <spec.yaml>         # Run a YAML experiment spec
flight experiment compare <run_a> <run_b> # Pairwise trust-record comparison

# Trace inspection
flight trace show <trace>                 # Inspect a Trace v2 file (causal tree)
flight trace ls                           # List traces under ~/.flight/traces/

# Claude Code integration
flight claude install                     # Install hooks + slash commands
flight claude uninstall                   # Remove hooks + slash commands

# Internal hooks (used by installed hooks)
flight hook session-start|session-end|user-prompt-submit|post-tool-use

Features

Trace v2 schema (TS types + canonical JSON Schema)
Causal attribution via TraceSession (cause IDs + parent spans)
Five ingress paths: MCP stdio proxy, HTTP collector, TypeScript SDK, Python SDK, Claude Code hooks
YAML behavioral assertions (sequence, threshold, regex, precondition) with async post-event evaluation
Content-hashed trust records, written one per run
YAML experiment harness with flight experiment run|compare
End-to-end verified-agent example under examples/verified-agent/
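
As an illustration of what "content-hashed" can mean for a trust record, the sketch below hashes a JSON document in a way that does not depend on key order. Flight's actual hashing scheme is not described in this README; SHA-256, the canonicalization choices, and the content_hash name are all assumptions made for the example.

```python
import hashlib
import json
from typing import Any


def content_hash(record: dict[str, Any]) -> str:
    """Return a hex SHA-256 digest of a canonical JSON encoding of record.

    Canonical here means sorted keys and no insignificant whitespace, so two
    records with the same content hash identically regardless of key order.
    """
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A hash like this detects accidental or unrecorded changes to a summary. It is not a signature, which is consistent with cryptographic signing being listed as out of scope.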

Data Locations

~/.flight/traces/<run_id>.trace.json — Trace v2 files
~/.flight/trust/<run_id>.trust.json — trust records
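
Since both files are keyed by run_id, a small script can pair them up. The sketch below is a hypothetical helper, not part of Flight: it builds the two paths from the layout above and lists runs that have a trace but no trust record yet.

```python
from pathlib import Path

FLIGHT_HOME = Path.home() / ".flight"
TRACE_SUFFIX = ".trace.json"


def trace_path(run_id: str, home: Path = FLIGHT_HOME) -> Path:
    """Path of the Trace v2 file for a run."""
    return home / "traces" / f"{run_id}{TRACE_SUFFIX}"


def trust_path(run_id: str, home: Path = FLIGHT_HOME) -> Path:
    """Path of the trust record for a run."""
    return home / "trust" / f"{run_id}.trust.json"


def runs_missing_trust(home: Path = FLIGHT_HOME) -> list[str]:
    """Run IDs that have a trace file but no trust record, sorted by ID."""
    run_ids = sorted(
        p.name[: -len(TRACE_SUFFIX)]
        for p in (home / "traces").glob(f"*{TRACE_SUFFIX}")
    )
    return [r for r in run_ids if not trust_path(r, home).exists()]
```

The home parameter exists only so the helpers can be pointed at a temporary directory in tests.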

Install

git clone https://github.com/lewisnsmith/flight.git
cd flight && npm install && npm run build && npm link

Requires Node.js 20+. No external database or cloud service required.

For the Python SDK:

pip install -e sdk/python/

Testing

npm run test

Python SDK tests:

cd sdk/python && python3 -m pytest tests/ -v

Documentation

ARCHITECTURE.md — internal architecture and design decisions
docs/refactor-plan.md — v2 refactor plan
docs/build-narrative.md — how it was built and why

Out of Scope

Visualization / UI / dashboards
Cryptographic signing of trust records
Cloud sync or multi-tenant infrastructure
Pre-execution interception or guardrails
Backwards compatibility with v1 .jsonl traces

License

MIT
