Show and Tell: OpenAgentAudit — audit evidence layer for AI agents (spec + profiles now open) #4

telleroutlook · 2026-06-28T02:29:48Z

telleroutlook
Jun 28, 2026
Maintainer

Hey everyone,

I've been working on a recurring problem in production AI agent deployments: the gap between what LLM observability gives you and what an actual audit requires.

The short version: agent logs are not audit evidence. They're built to answer engineering questions (latency, cost, prompt version). Audit requires different things — tamper-evident records, signed policy decisions captured before tool execution, rubric-referenced severity, reproducible findings with stable IDs.

I just open-sourced the specification and tooling that fill this gap:

→ WasmAgent/open-agent-audit

What's in the repo

- Canonical evidence schema (spec/versions/v0.1/SPEC.md): hash-chained, versioned, adapter-agnostic
- JSON Schema artifacts at schemas/v0.1/
- Four regulatory mapping profiles:
  - OWASP Agentic Top 10 (2026 draft)
  - NIST AI RMF 1.0
  - ISO/IEC 42001
  - EU AI Act Annex IV documentation requirements
- Evidence Admission Score: a rubric for rating how defensible a piece of evidence is
- Adapter contracts for AEP v0.2, OpenTelemetry GenAI spans, Langfuse, LangSmith, bscode rollout traces
- Cloudflare-native reference architecture: no GPU, no Python runtime, runs on Workers + D1
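
To give a feel for what "hash-chained" buys you, here is a minimal sketch of the general technique. It is not the schema from SPEC.md: the `EvidenceRecord` shape, the field names, and the helper functions are all hypothetical, and it uses `node:crypto` for brevity where a Workers deployment would more likely use Web Crypto.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape for illustration; the real fields live in the spec.
interface EvidenceRecord {
  seq: number;
  payload: string;  // serialized event, assumed to be canonicalized already
  prevHash: string; // hash of the previous record, or GENESIS for the first one
  hash: string;     // SHA-256 over seq, prevHash, and payload
}

const GENESIS = "0".repeat(64);

function recordHash(seq: number, prevHash: string, payload: string): string {
  return createHash("sha256").update(`${seq}\n${prevHash}\n${payload}`).digest("hex");
}

// Returns a new chain with one more record linked to the current tail.
function appendRecord(chain: EvidenceRecord[], payload: string): EvidenceRecord[] {
  const last = chain[chain.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  const seq = chain.length;
  return [...chain, { seq, payload, prevHash, hash: recordHash(seq, prevHash, payload) }];
}

// Returns the index of the first record that fails verification, or -1 if intact.
function firstBrokenIndex(chain: EvidenceRecord[]): number {
  let prevHash = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const r = chain[i];
    if (!r || r.seq !== i || r.prevHash !== prevHash) return i;
    if (r.hash !== recordHash(r.seq, r.prevHash, r.payload)) return i;
    prevHash = r.hash;
  }
  return -1;
}
```

Editing any payload changes that record's hash, so the edit is detectable unless every later record is rewritten too. A chain like this only detects tampering; making the head itself trustworthy (signing it, anchoring it externally) is a separate step that this sketch does not cover.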

Why the "audit ≠ observability" framing

Four concrete gaps I keep running into:

- Spans record what happened, not what was authorized. There's no structural guarantee that the policy ran before the tool rather than after it.
- Traces are mutable. Most observability backends give you no way to detect whether a span was edited after the fact.
- Severities without rubrics are meaningless. If you can't cite the rubric, a reviewer can't challenge the severity.
- Benchmark deltas without paired statistics are not evidence. I had a +10pp claim collapse to -1pp after fixing protocol consistency and running McNemar's test on the paired results. The eval dashboard showed a pretty bar chart.
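
For readers who haven't used it, the paired check in the last point can be sketched as an exact (binomial) McNemar test. It looks only at the items where the two runs disagree. The function names and the per-item boolean layout are assumptions made for illustration, not part of the OpenAgentAudit tooling.

```typescript
// b: items the baseline got right and the candidate got wrong.
// c: items the baseline got wrong and the candidate got right.
interface PairedCounts { b: number; c: number; }

// Both arrays are assumed to hold per-item pass/fail results for the same items, in the same order.
function discordantCounts(baseline: boolean[], candidate: boolean[]): PairedCounts {
  if (baseline.length !== candidate.length) {
    throw new Error("paired runs must cover the same items");
  }
  let b = 0;
  let c = 0;
  baseline.forEach((base, i) => {
    const cand = candidate[i];
    if (base && !cand) b++;
    if (!base && cand) c++;
  });
  return { b, c };
}

// Two-sided exact McNemar p-value: under the null hypothesis, each discordant
// item is equally likely to favor either run, so min(b, c) ~ Binomial(b + c, 0.5).
function mcnemarExactP({ b, c }: PairedCounts): number {
  const n = b + c;
  if (n === 0) return 1; // the runs never disagree
  const k = Math.min(b, c);
  let logTerm = n * Math.log(0.5); // log of C(n, 0) * 0.5^n
  let tail = 0;
  for (let i = 0; i <= k; i++) {
    tail += Math.exp(logTerm);
    logTerm += Math.log(n - i) - Math.log(i + 1); // step from C(n, i) to C(n, i + 1)
  }
  return Math.min(1, 2 * tail);
}
```

A headline delta only becomes evidence once the per-item pairing is preserved and both runs follow the same protocol. If the items differ between runs, there are no discordant pairs to count in the first place.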

Status

The spec is in release-candidate shape. The TypeScript implementation packages are deliberate skeletons — code lands after the canonical model has held shape for a release-candidate window.

The production deployment is at trustavo.com — running on Cloudflare Workers.

Questions for this community

I'm most interested in hearing from people who have been on the receiving end of an AI deployment audit:

- What did your security / compliance team actually ask for? What format would have saved you a week?
- Which regulatory framework matters most to your customers right now: EU AI Act, ISO 42001, NIST AI RMF, OWASP, SOC 2?
- If this tooling could produce one artifact for a procurement reviewer, what would be most useful?

Open an issue, comment here, or email me. Star the repo if you want to follow along — it's also useful signal for whether to keep building.
