From ba31b9f83f22b130e18cb67ec1de82a851fffea1 Mon Sep 17 00:00:00 2001
From: claude-bot-go
Date: Fri, 3 Jul 2026 21:28:10 +0800
Subject: [PATCH] Fix #4: Document integration with Agent Trust Infrastructure

---
 README.md                 |  3 ++
 docs/audit-integration.md | 75 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 78 insertions(+)

diff --git a/README.md b/README.md
index 8ed8d20..6531dac 100644
--- a/README.md
+++ b/README.md
@@ -87,9 +87,12 @@ See [`docs/protocol-faep.md`](docs/protocol-faep.md) for the full schema.
 |---|---|---|
 | `wasmagent-js` | Sandbox / tool-use runtime reference | No |
 | `open-agent-audit` | Evidence record enhancement layer | Optional |
+| `agent-trust-infra` | Trust Passport & AgentBOM standards for evidence | See docs |
 | `trace-pipeline` | Export failure traces as training data | Phase 2 |
 | `bscode` | Coding task source / solver baseline | Phase 2 |
 
+See [`docs/audit-integration.md`](docs/audit-integration.md) for details on how FreshArena FAEP records map to the Trust Passport schema.
+
 ---
 
 ## License
diff --git a/docs/audit-integration.md b/docs/audit-integration.md
index c82ef8c..2fba025 100644
--- a/docs/audit-integration.md
+++ b/docs/audit-integration.md
@@ -29,3 +29,78 @@ FreshArena produces FAEP evaluation records. `open-agent-audit` can serve as an
 When connecting, FreshArena emits one `faep_record` JSONL line per evaluation run. open-agent-audit ingests it via its standard evidence adapter. No structural changes to FAEP records are required — open-agent-audit wraps them, it does not replace them.
 
 See [`docs/protocol-faep.md`](protocol-faep.md) for the full FAEP record schema.
+
+---
+
+## Agent Trust Infrastructure Integration
+
+FreshArena evaluation records are designed to align with the Trust Passport and AgentBOM specifications of **Agent Trust Infrastructure**, defined in the sibling `agent-trust-infra` repository.
+
+### Mapping FAEP Records to Trust Passport
+
+Each FreshArena `FaepRecord` serves as an evidence artifact that can be embedded within a Trust Passport. FAEP fields map to Trust Passport concepts as follows:
+
+| FAEP Field | Trust Passport Concept | Purpose |
+|---|---|---|
+| `run_id` | `evaluation_run_id` | Links the evaluation to a specific test execution |
+| `task.id` + `task.seed_hash` | `test_case_id` | Uniquely identifies the evaluated task instance |
+| `solver.id` + `solver.track` | `agent_identifier` | Identifies which agent was evaluated |
+| `solver.model_metadata_hash` | `agent.config_hash` | Links to the AgentBOM component configuration |
+| `solver.workflow_hash` | `agent.workflow_hash` | Attests to the agent's workflow/prompt configuration |
+| `solver.artifact_hash` | `agent.binary_hash` | Links to the executable artifact |
+| `score.canonical_pass` | `evaluation_result.pass` | Primary correctness verdict |
+| `score.adversarial_pass` | `evaluation_result.adversarial_check` | Post-commit robustness evidence |
+| `verifier.package` + `verifier.version` | `verifier_reference` | Identifies the deterministic verifier package and version used |
+| `verifier.result_hash` | `evidence_fingerprint` | Cryptographic fingerprint of the verification result |
+| `replay.command` + `replay.log_hash` | `reproducibility_artifact` | Enables third-party verification |
+
+### FAEP as AgentBOM Evidence Source
+
+FreshArena records provide evidence that can be referenced in an AgentBOM:
+
+1. **Component Verification**: The `solver.model_metadata_hash` and `solver.workflow_hash` provide provenance for the agent's configuration at test time.
+
+2. **Version Evidence**: The `generator.version`, `tester.version`, and `verifier.version` fields document the full evaluation stack.
+
+3. **Deterministic Verification**: The `verifier.result_hash` combined with `task.seed_hash` creates a reproducible fingerprint that can be independently verified.
+
+### Consumption Pattern
+
+To integrate FreshArena results with a Trust Passport, reference each FAEP record from an entry in the passport's `evaluations` array, for example:
+
+```json
+{
+  "trust_passport": {
+    "agent_id": "solver:my-agent-v1",
+    "evaluations": [
+      {
+        "source": "FreshArena",
+        "faep_record_ref": "faep:run_abc123_task_xyz789",
+        "task_family": "json_transform.normalize.v0",
+        "evidence_type": "deterministic_verification",
+        "timestamp": "2025-01-15T10:30:00Z",
+        "result": {
+          "canonical_pass": true,
+          "adversarial_pass": false,
+          "fresh_fixed_gap": 0.15
+        }
+      }
+    ]
+  }
+}
+```
+
+### Key Differences in Focus
+
+| Aspect | FreshArena (FAEP) | Agent Trust Infrastructure |
+|---|---|---|
+| Primary Goal | Detect overfitting via fresh task generation | Aggregate and verify agent claims across projects |
+| Evidence Type | Per-task evaluation records with adversarial checks | Cross-domain attestation and provenance |
+| Replayability | Full deterministic replay via seed + verifier package | Claim verification via linked evidence artifacts |
+| Freshness Check | Core: compares fresh vs fixed task performance | Optional: one of many evidence sources |
+
+### References
+
+- **Agent Trust Infrastructure**: https://github.com/WasmAgent/agent-trust-infra
+- **Trust Passport Spec**: Trust Passport defines the standard schema for agent evaluation evidence
+- **AgentBOM Spec**: AgentBOM defines the standard schema for agent component documentation
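The FAEP-to-Trust-Passport mapping table that this patch adds to `docs/audit-integration.md` can be exercised with a small consumer. What follows is a minimal Python sketch, not the FreshArena or `agent-trust-infra` implementation. It assumes that each `faep_record` JSONL line is a JSON object nested the way the dotted field names suggest (`solver.workflow_hash` is `record["solver"]["workflow_hash"]`). The helper names `get_path`, `iter_faep_records`, and `to_passport_concepts`, along with all sample values, are hypothetical.

```python
import json
from typing import Any, Iterable, Iterator

# Trust Passport concept -> FAEP field path(s), taken from the mapping table.
# Dotted paths are ASSUMED to address nested keys in the FAEP JSON object.
CONCEPT_TO_FAEP: dict[str, tuple[str, ...]] = {
    "evaluation_run_id": ("run_id",),
    "test_case_id": ("task.id", "task.seed_hash"),
    "agent_identifier": ("solver.id", "solver.track"),
    "agent.config_hash": ("solver.model_metadata_hash",),
    "agent.workflow_hash": ("solver.workflow_hash",),
    "agent.binary_hash": ("solver.artifact_hash",),
    "evaluation_result.pass": ("score.canonical_pass",),
    "evaluation_result.adversarial_check": ("score.adversarial_pass",),
    "verifier_reference": ("verifier.package", "verifier.version"),
    "evidence_fingerprint": ("verifier.result_hash",),
    "reproducibility_artifact": ("replay.command", "replay.log_hash"),
}


def get_path(record: dict[str, Any], dotted: str) -> Any:
    """Follow a dotted FAEP field path such as "solver.workflow_hash"."""
    value: Any = record
    for key in dotted.split("."):
        value = value[key]  # raises KeyError if the record lacks the field
    return value


def iter_faep_records(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Parse FAEP JSONL: one JSON object per non-blank line."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def to_passport_concepts(record: dict[str, Any]) -> dict[str, Any]:
    """Project one FAEP record onto the Trust Passport concept names."""
    concepts: dict[str, Any] = {}
    for concept, paths in CONCEPT_TO_FAEP.items():
        if len(paths) == 1:
            concepts[concept] = get_path(record, paths[0])
        else:
            # The docs do not say how composite fields are combined,
            # so each component is kept under its own FAEP path.
            concepts[concept] = {p: get_path(record, p) for p in paths}
    return concepts


# Made-up example record (hypothetical values, not real FreshArena output).
SAMPLE_LINE = json.dumps({
    "run_id": "run_abc123",
    "task": {"id": "task_xyz789", "seed_hash": "seed-hash-example"},
    "solver": {
        "id": "my-agent-v1",
        "track": "example-track",
        "model_metadata_hash": "model-hash-example",
        "workflow_hash": "workflow-hash-example",
        "artifact_hash": "artifact-hash-example",
    },
    "score": {"canonical_pass": True, "adversarial_pass": False},
    "verifier": {
        "package": "example-verifier",
        "version": "0.1.0",
        "result_hash": "result-hash-example",
    },
    "replay": {"command": "<replay command>", "log_hash": "log-hash-example"},
})

if __name__ == "__main__":
    for rec in iter_faep_records([SAMPLE_LINE]):
        print(to_passport_concepts(rec)["test_case_id"])
```

Composite concepts such as `test_case_id` are kept as small dicts rather than joined into strings, because the patch does not specify a join format. If the Trust Passport spec defines one, `to_passport_concepts` is the single place to apply it.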
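The `reproducibility_artifact` row (`replay.command` + `replay.log_hash`) suggests one plausible third-party check: rerun the recorded command, then compare the new log against the stored hash. Below is a minimal Python sketch of the comparison step only. It assumes that `replay.log_hash` is the hex SHA-256 digest of the raw log bytes, which this patch does not state, and that running the command is left to the caller. `sha256_hex` and `replay_log_matches` are hypothetical helpers, and the log content is made up.

```python
import hashlib
from typing import Any


def sha256_hex(data: bytes) -> str:
    """Lowercase hex SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def replay_log_matches(record: dict[str, Any], replayed_log: bytes) -> bool:
    """Compare a replayed log with the record's `replay.log_hash`.

    ASSUMPTION: `replay.log_hash` is the hex SHA-256 of the raw log bytes.
    The real FAEP encoding (see docs/protocol-faep.md) may differ.
    """
    expected = record["replay"]["log_hash"].strip().lower()
    return sha256_hex(replayed_log) == expected


if __name__ == "__main__":
    # Made-up log content standing in for the output of `replay.command`.
    original_log = b"task_xyz789: canonical PASS\n"
    record = {"replay": {"command": "<replay command>",
                         "log_hash": sha256_hex(original_log)}}
    print(replay_log_matches(record, original_log))         # True
    print(replay_log_matches(record, original_log + b"!"))  # False
```

Any byte-level difference in the replayed log, including a trailing newline, makes the check fail. This strictness is what makes a hash-based replay check meaningful, but it also means the replay must be fully deterministic.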