
lewisnsmith/flight


Flight

Agent verification and trust layer — causal traces, behavioral assertions, and structured trust records for AI agent systems.

Flight evaluates whether what your agents do is correct, expected, and safe. It wraps agent runs through multiple ingress paths and produces causally linked Trace v2 documents that you can assert against, compare across strategies, and summarize into a trust record.

Agents (TS SDK, Python SDK, MCP Proxy, Claude Code Hooks)
                    ↓
            TraceSession (causal attribution)
                    ↓
     ~/.flight/traces/<run_id>.trace.json
                    ↓
     ~/.flight/trust/<run_id>.trust.json

Four core capabilities:

  1. Causal trace attribution — every event links back to what caused it
  2. Inline behavioral assertions — declarative YAML rules, evaluated post-event
  3. Comparative experiment harness — YAML spec, flight experiment run
  4. Structured trust records — content-hashed JSON summaries, one per run

Origin

During the MathWorks M3 competition, I leaned on AI assistants for brainstorming and data lookup — only to discover, too late, that many of the "facts" and numerical results were hallucinated. The model produced confident, statistically formatted outputs. There was no way to inspect what it had actually done.

That frustration directly led to Flight. It started as a recorder — what did the agent do? — and is now evolving into a verifier: was what the agent did correct?


Quick Start

Try the verified-agent example

git clone https://github.com/lewisnsmith/flight.git
cd flight && npm install && npm run build && npm link

# Run the 2-variant, 3-repetition experiment end-to-end
flight experiment run examples/verified-agent/experiment.yaml

# Inspect a captured trace
flight trace ls
flight trace show <run_id>

Record a trace via the MCP proxy

# Wrap any MCP server transparently — records every call as Trace v2
flight proxy --cmd npx -- -y @modelcontextprotocol/server-filesystem /tmp

# Inspect the causal tree
flight trace show <run_id>

Assert against a recorded trace

# Evaluate YAML behavioral rules against a trace file
flight assert check ~/.flight/traces/<run_id>.trace.json --rules flight.assertions.yaml

Example flight.assertions.yaml:

version: 1
rules:
  - name: search_before_answer
    kind: sequence
    when:
      event_kind: tool_call
      tool_name: answer
    require:
      prior_event:
        kind: tool_call
        tool_name: search
        within_span: run
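The sketch below shows how a `sequence` rule like `search_before_answer` could be evaluated over an ordered list of events. It is written in TypeScript and is not Flight's evaluator. The `SketchEvent` and `SequenceRule` shapes and the `checkSequence` function are hypothetical. The sketch also simplifies `within_span: run` by treating the whole event list as the run.

```typescript
// Hypothetical event shape for this sketch; Flight's real Trace v2 types
// (see trace-v2.schema.json) may name or nest these fields differently.
interface SketchEvent {
  event_id: string;
  kind: string;
  payload: { tool_name?: string };
}

// Mirrors the fields of the search_before_answer rule above.
interface SequenceRule {
  name: string;
  when: { event_kind: string; tool_name: string };
  require: { prior_event: { kind: string; tool_name: string } };
}

interface Violation {
  rule: string;
  event_id: string;
}

// For every event matching `when`, require that some earlier event matches
// `require.prior_event`. The whole event list is treated as the "run" span,
// so within_span scoping is not modeled here.
function checkSequence(rule: SequenceRule, events: SketchEvent[]): Violation[] {
  const prior = rule.require.prior_event;
  const violations: Violation[] = [];
  let seenPrior = false;
  for (const ev of events) {
    const isTrigger =
      ev.kind === rule.when.event_kind && ev.payload.tool_name === rule.when.tool_name;
    if (isTrigger && !seenPrior) {
      violations.push({ rule: rule.name, event_id: ev.event_id });
    }
    // Updated after the trigger check so an event cannot satisfy its own rule.
    if (ev.kind === prior.kind && ev.payload.tool_name === prior.tool_name) {
      seenPrior = true;
    }
  }
  return violations;
}

// Usage: the first "answer" call has no preceding "search"; the second does.
const rule: SequenceRule = {
  name: "search_before_answer",
  when: { event_kind: "tool_call", tool_name: "answer" },
  require: { prior_event: { kind: "tool_call", tool_name: "search" } },
};
const events: SketchEvent[] = [
  { event_id: "e1", kind: "tool_call", payload: { tool_name: "answer" } },
  { event_id: "e2", kind: "tool_call", payload: { tool_name: "search" } },
  { event_id: "e3", kind: "tool_call", payload: { tool_name: "answer" } },
];
console.log(checkSequence(rule, events)); // reports a violation for e1 only
```

Evaluating a trace after it is recorded keeps this check a single linear pass. A real evaluator would also need to restrict the search for a prior event to the span named by `within_span`.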

Inspect a trust record

flight trust show <run_id>

Run a comparison experiment

flight experiment run experiment.yaml
flight experiment compare <run_a> <run_b>

Claude Code integration

git clone https://github.com/lewisnsmith/flight.git
cd flight && npm install && npm run build && npm link

# Install hooks — records every Claude Code session as Trace v2
flight claude install

Trace v2 Schema

The canonical schema lives at packages/flight-proxy/src/schema/trace-v2.schema.json.

Every trace contains:

  • Events — ordered, with kind, payload, causal_link, and span IDs
  • AssertionResults — YAML rule outcomes appended post-evaluation
  • Anomalies — loop detection, error-recovery, assertion failures, schema violations
  • Metadata — model, agent_id, token counts, cost
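To illustrate the loop-detection anomaly, here is a minimal sketch that flags repeated identical tool calls. The heuristic is an assumption: same tool name, JSON-identical arguments, at least three consecutive calls. Every name in it (`detectLoops`, `LoopEvent`, `LoopAnomaly`) is hypothetical, and Flight's own detector may use different criteria.

```typescript
// Hypothetical sketch of a loop-detection heuristic. The event shape, the
// "identical arguments" criterion, and the default threshold of 3 are
// assumptions, not Flight's documented behavior.
interface LoopEvent {
  event_id: string;
  kind: string;
  payload: { tool_name?: string; args?: unknown };
}

interface LoopAnomaly {
  kind: "loop";
  tool_name: string;
  event_ids: string[];
}

// Flags runs of at least `minRepeats` consecutive tool_call events that share
// a tool name and JSON-identical arguments. Other event kinds (e.g.
// tool_result) are skipped, so call/result/call/result still counts as a run.
// Note: JSON.stringify is key-order sensitive, so {a,b} and {b,a} differ here.
function detectLoops(events: LoopEvent[], minRepeats = 3): LoopAnomaly[] {
  const anomalies: LoopAnomaly[] = [];
  let run: LoopEvent[] = [];
  let runKey = "";

  const flush = (): void => {
    if (run.length > 0 && run.length >= minRepeats) {
      anomalies.push({
        kind: "loop",
        tool_name: run[0].payload.tool_name ?? "",
        event_ids: run.map((e) => e.event_id),
      });
    }
  };

  for (const ev of events) {
    if (ev.kind !== "tool_call") continue;
    const key = JSON.stringify([ev.payload.tool_name ?? null, ev.payload.args ?? null]);
    if (key === runKey) {
      run.push(ev);
    } else {
      flush();
      run = [ev];
      runKey = key;
    }
  }
  flush();
  return anomalies;
}

// Usage: three identical searches followed by a different one.
const searchCall = (id: string, q: string): LoopEvent => ({
  event_id: id,
  kind: "tool_call",
  payload: { tool_name: "search", args: { q } },
});
const loopTrace = [
  searchCall("a", "Mars"),
  searchCall("b", "Mars"),
  searchCall("c", "Mars"),
  searchCall("d", "Venus"),
];
console.log(detectLoops(loopTrace)); // one anomaly covering a, b, c
```

Requiring several consecutive repeats lets a single legitimate retry pass without an anomaly. Raising `minRepeats` trades sensitivity for fewer false positives.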

Every Event carries an optional causal_link:

{
  "event_id": "01J1...",
  "kind": "tool_call",
  "timestamp": "2026-04-26T00:00:01Z",
  "causal_link": {
    "caused_by_event_id": "01J0...",
    "reason": "llm_output_emitted"
  },
  "payload": { "tool_name": "search", "args": { "q": "Mars diameter" } }
}
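The JSON event above maps directly onto TypeScript types. The sketch mirrors only the fields shown in that example. It also adds a hypothetical `causalChain` helper that follows `caused_by_event_id` pointers back toward the root, similar in spirit to the causal tree `flight trace show` prints. The canonical types come from the JSON Schema and may differ; the IDs and the `reason` value in the usage are made up.

```typescript
// Field names copied from the JSON example above. The canonical definition
// is trace-v2.schema.json, which likely has more fields (e.g. span IDs).
interface CausalLink {
  caused_by_event_id: string;
  reason: string;
}

interface TraceEvent {
  event_id: string;
  kind: string;
  timestamp: string;
  causal_link?: CausalLink; // optional: root events have no cause
  payload: Record<string, unknown>;
}

// Walks caused_by_event_id pointers from one event back toward the root.
// Stops at an event with no link, a link to an unknown ID, or a cycle.
function causalChain(events: TraceEvent[], eventId: string): string[] {
  const byId = new Map(events.map((e) => [e.event_id, e] as const));
  const chain: string[] = [];
  let current = byId.get(eventId);
  while (current !== undefined && !chain.includes(current.event_id)) {
    chain.push(current.event_id);
    const cause = current.causal_link?.caused_by_event_id;
    current = cause === undefined ? undefined : byId.get(cause);
  }
  return chain;
}

// Usage with hypothetical IDs: a tool_result caused by a tool_call, which
// was caused by an LLM result.
const ev = (event_id: string, kind: string, cause?: string): TraceEvent => ({
  event_id,
  kind,
  timestamp: "2026-04-26T00:00:01Z",
  causal_link: cause === undefined ? undefined : { caused_by_event_id: cause, reason: "example" },
  payload: {},
});
const traceEvents = [
  ev("r1", "llm.result"),
  ev("c1", "tool_call", "r1"),
  ev("t1", "tool_result", "c1"),
];
console.log(causalChain(traceEvents, "t1")); // t1, then c1, then r1
```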

Causal attribution rules:

  • tool_result → links to the matching tool_call by call_id
  • tool_call → links to the most recent llm.result
  • llm.call → links to the most recent agent.decision or lifecycle.run_start
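The three rules above can be sketched as a small stateful tracker. It remembers the latest `llm.result`, the latest `agent.decision` or `lifecycle.run_start`, and each `tool_call` by `call_id`. This is a hypothetical illustration, not `TraceSession`'s code. The `CausalTracker` class and `RawEvent` shape are invented here, and putting `call_id` at the top level of the event rather than in its payload is an assumption. Events the rules do not cover, such as `llm.result` itself, get no cause.

```typescript
// Hypothetical sketch of the three attribution rules above; it is not
// TraceSession's actual implementation. Event kinds are spelled as in the
// rules; carrying call_id at the top level is an assumption.
interface RawEvent {
  event_id: string;
  kind: string;
  call_id?: string;
}

interface AttributedEvent extends RawEvent {
  caused_by_event_id?: string;
}

class CausalTracker {
  private lastLlmResult?: string;
  private lastDecisionOrStart?: string;
  private toolCalls = new Map<string, string>(); // call_id -> tool_call event_id

  record(ev: RawEvent): AttributedEvent {
    let cause: string | undefined = undefined;
    switch (ev.kind) {
      case "tool_result": // link to the matching tool_call by call_id
        cause = ev.call_id === undefined ? undefined : this.toolCalls.get(ev.call_id);
        break;
      case "tool_call": // link to the most recent llm.result
        cause = this.lastLlmResult;
        if (ev.call_id !== undefined) this.toolCalls.set(ev.call_id, ev.event_id);
        break;
      case "llm.call": // link to the most recent agent.decision or run start
        cause = this.lastDecisionOrStart;
        break;
      case "llm.result":
        this.lastLlmResult = ev.event_id;
        break;
      case "agent.decision":
      case "lifecycle.run_start":
        this.lastDecisionOrStart = ev.event_id;
        break;
    }
    return cause === undefined ? { ...ev } : { ...ev, caused_by_event_id: cause };
  }
}

// Usage: one LLM round trip that issues a single tool call.
const tracker = new CausalTracker();
const raw: RawEvent[] = [
  { event_id: "s0", kind: "lifecycle.run_start" },
  { event_id: "l1", kind: "llm.call" },
  { event_id: "r1", kind: "llm.result" },
  { event_id: "c1", kind: "tool_call", call_id: "k1" },
  { event_id: "t1", kind: "tool_result", call_id: "k1" },
];
const attributed = raw.map((e) => tracker.record(e));
// l1 is caused by s0, c1 by r1, t1 by c1; s0 and r1 have no cause here.
```

Because attribution only looks backward at state already seen, it can run as each event is recorded without buffering the whole trace.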

CLI Reference

# Ingress
flight proxy --cmd <server> -- <args>     # MCP stdio proxy → Trace v2
flight serve [--port 4242]                # HTTP collector → Trace v2

# Assertions
flight assert check <trace> --rules <yaml> # Run YAML assertions against a recorded trace
flight assert watch                       # Live-evaluate assertions on incoming events

# Trust records
flight trust show <run_id>                # Print the trust record for a run
flight trust list                         # List all trust records

# Experiments
flight experiment run <spec.yaml>         # Run a YAML experiment spec
flight experiment compare <run_a> <run_b> # Pairwise trust-record comparison

# Trace inspection
flight trace show <trace>                 # Inspect a Trace v2 file (causal tree)
flight trace ls                           # List traces under ~/.flight/traces/

# Claude Code integration
flight claude install                     # Install hooks + slash commands
flight claude uninstall                   # Remove hooks + slash commands

# Internal hooks (used by installed hooks)
flight hook session-start|session-end|user-prompt-submit|post-tool-use

Features

  • Trace v2 schema (TS types + canonical JSON Schema)
  • Causal attribution via TraceSession (cause IDs + parent spans)
  • Five ingress paths: MCP stdio proxy, HTTP collector, TypeScript SDK, Python SDK, Claude Code hooks
  • YAML behavioral assertions (sequence, threshold, regex, precondition) with async post-event evaluation
  • Content-hashed trust records, written one per run
  • YAML experiment harness with flight experiment run|compare
  • End-to-end verified-agent example under examples/verified-agent/
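A content hash is only stable if the same record always serializes to the same string, so hashing is normally done over a canonical form. The following sketch canonicalizes JSON by sorting object keys and then hashes the result. Flight's actual hash algorithm and trust-record fields are not documented here. 32-bit FNV-1a is a non-cryptographic stand-in that keeps the example dependency-free. `canonicalize`, `fnv1a32`, `contentHash`, and the record fields in the usage are hypothetical.

```typescript
// Hypothetical sketch: canonicalize a JSON value (object keys sorted) so that
// equal content always serializes identically, then hash it.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function canonicalize(value: Json): string {
  if (Array.isArray(value)) return `[${value.map((v) => canonicalize(v)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    // Sort entries by key (UTF-16 code unit order) before serializing.
    const entries = Object.entries(value).sort(([x], [y]) => (x < y ? -1 : x > y ? 1 : 0));
    return `{${entries.map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// 32-bit FNV-1a over UTF-16 code units. NON-cryptographic stand-in only.
function fnv1a32(text: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, multiplied mod 2^32
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

function contentHash(record: Json): string {
  return fnv1a32(canonicalize(record));
}

// Usage: the same content in a different key order hashes identically.
const a = contentHash({ run_id: "run-1", passed: 3, failed: 0 });
const b = contentHash({ failed: 0, passed: 3, run_id: "run-1" });
console.log(a === b); // true
```

A production trust record would typically use a cryptographic digest such as SHA-256 over the canonical form, so that tampering with a record is detectable. The canonicalization step matters regardless of the algorithm.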

Data Locations

  • ~/.flight/traces/<run_id>.trace.json — Trace v2 files
  • ~/.flight/trust/<run_id>.trust.json — trust records

Install

git clone https://github.com/lewisnsmith/flight.git
cd flight && npm install && npm run build && npm link

Requires Node.js 20+. No external database or cloud service required.

For the Python SDK:

pip install -e sdk/python/

Testing

npm run test

Python SDK tests:

cd sdk/python && python3 -m pytest tests/ -v


Out of Scope

  • Visualization / UI / dashboards
  • Cryptographic signing of trust records
  • Cloud sync or multi-tenant infrastructure
  • Pre-execution interception or guardrails
  • Backwards compatibility with v1 .jsonl traces

License

MIT

About

MCP/tool call flight recorder | transparent STDIO proxy that logs every AI agent tool call for inspection, debugging, and research