agents-oss
diff --git a/docs/.vitepress/config.mts b/docs/.vitepress/config.mts
Lines changed: 1 addition & 0 deletions
diff --git a/docs/RUNBOOK.md b/docs/RUNBOOK.md
Lines changed: 38 additions & 13 deletions
diff --git a/docs/concepts/compliance.md b/docs/concepts/compliance.md
Lines changed: 27 additions & 0 deletions
diff --git a/docs/concepts/opa.md b/docs/concepts/opa.md
Lines changed: 64 additions & 22 deletions
@@ -50,6 +50,7 @@ export default defineConfig({
           { text: 'Add Runtime Health', link: '/guides/add-runtime-health' },
           { text: 'Add Push Mode', link: '/guides/add-push-mode' },
           { text: 'CI Integration', link: '/guides/ci-integration' },
+          { text: 'E2E Testing', link: '/guides/e2e-testing' },
           { text: 'Migrate an Existing Agent', link: '/guides/migrate-existing-agent' },
           { text: 'Migrate GymCoach', link: '/guides/migrate-gymcoach' },
           { text: 'Migrate OpenAGI', link: '/guides/migrate-openagi' },
 
@@ -69,30 +69,34 @@ helm upgrade  my-agent ./out/ -f out/values.yaml
 | `CONTROL_PLANE_PORT` | No | `4001` | Control plane listen port |
 | `ANTHROPIC_API_KEY` | No | — | Enables LLM gap analysis (`GET /agentspec/gap`) |
 | `AUDIT_RING_SIZE` | No | `1000` | Max audit ring entries retained in memory |
-| `OPA_URL` | No | — | OPA base URL (e.g. `http://localhost:8181`). When set, `/gap` calls OPA for behavioral violations AND the proxy evaluates every request. Fails-open if OPA is unreachable. |
-| `OPA_PROXY_MODE` | No | `track` | Per-request OPA mode on the proxy (port 4000). `track` — record violations in the audit ring and add `X-AgentSpec-OPA-Violations` header, but forward the request. `enforce` — block with `403 PolicyViolation` before forwarding. `off` — disable proxy OPA checks entirely. |
+| `OPA_URL` | No | — | OPA base URL (e.g. `http://localhost:8181`). When set, `/gap` calls OPA for behavioral violations AND the proxy evaluates agent response headers. Fails-open if OPA is unreachable. |
+| `OPA_PROXY_MODE` | No | `track` | HeaderReporting OPA mode on the proxy (port 4000). `track` — record violations in audit ring + `X-AgentSpec-OPA-Violations` header, never blocks. `enforce` — replace agent response with `403 PolicyViolation` when OPA denies. `off` — disable proxy OPA checks entirely. OPA is only called when the agent sets `X-AgentSpec-*` response headers (sdk-langgraph `AgentSpecMiddleware`). |
 
 `UPSTREAM_URL` and `MANIFEST_PATH` must be set correctly. The sidecar will fail to start if `UPSTREAM_URL` is not a valid `http://` or `https://` URL, or if port values are non-integer.
 
 ---
 
 ---
 
-## OPA Request Headers
+## OPA Behavioral Observation (HeaderReporting + EventPush)
 
-When `OPA_URL` is set, the proxy reads these headers from the incoming request to populate the OPA input document. Set them from your agent code (or `GuardrailMiddleware`) to give OPA the full runtime context it needs to enforce policies accurately.
+OPA now evaluates **real agent behavior** — not honor-system client headers. There are two reporting paths:
 
-| Header | Example | Description |
-|--------|---------|-------------|
-| `X-AgentSpec-Guardrails-Invoked` | `pii-detector,toxicity-filter` | Comma-separated list of guardrail types actually run on this request |
-| `X-AgentSpec-Tools-Called` | `plan-workout,log-session` | Comma-separated list of tools invoked |
-| `X-AgentSpec-User-Confirmed` | `true` | Set to `true` if the user explicitly confirmed a destructive action |
+### HeaderReporting — Agent response headers (sdk-langgraph `AgentSpecMiddleware`)
 
-When these headers are absent, the proxy uses worst-case defaults (`guardrails_invoked: []`, `tools_called: []`). In `track` mode this records a violation. In `enforce` mode, any declared guardrail will cause a 403.
+The `agentspec-langgraph` `AgentSpecMiddleware` sets internal headers on the agent's HTTP **response** after processing:
 
-The proxy sets `X-AgentSpec-OPA-Violations` on every response where violations fired (regardless of mode), so clients and upstream tooling can observe policy gaps.
+| Response header (agent → sidecar) | Description |
+|-----------------------------------|-------------|
+| `X-AgentSpec-Guardrails-Invoked` | Comma-separated guardrail types that actually ran |
+| `X-AgentSpec-Tools-Called` | Comma-separated tool names that were called |
+| `X-AgentSpec-User-Confirmed` | `true` if user confirmed a destructive action |
 
-In `enforce` mode, the sidecar returns a structured error **before** forwarding to the upstream agent:
+The sidecar proxy reads these in its `onResponse` callback and **strips them before forwarding to the client**. Clients never see these headers. OPA is only called when at least one behavioral header is present.
+
+The proxy sets `X-AgentSpec-OPA-Violations` on every response where violations fired (regardless of mode), so clients can observe policy gaps.
+
+In `enforce` mode, when OPA denies based on agent response headers:
 
 ```
 HTTP/1.1 403 Forbidden
@@ -102,7 +106,28 @@ Content-Type: application/json
 {"error":"PolicyViolation","blocked":true,"violations":["pii_detector_not_invoked"],"message":"Request blocked by OPA policy: pii_detector_not_invoked"}
 ```
 
-When OPA is unreachable the proxy **fails open** (forwards the request with a warning log) regardless of mode. Set `OPA_PROXY_MODE=off` to silence OPA calls entirely while keeping `OPA_URL` set for `/gap`.
+> **Note:** Unlike the old implementation, `enforce` mode evaluates the agent's response headers. The upstream agent **always** processes the request. Only the client-visible response is blocked (replaced with 403).
+
+### EventPush — Out-of-band event push (sdk-langgraph `SidecarClient`)
+
+The agent pushes behavioral events after each request via `POST /agentspec/events` on the control plane (port 4001). EventPush always records regardless of `OPA_PROXY_MODE`.
+
+```bash
+curl -X POST http://localhost:4001/agentspec/events \
+  -H "Content-Type: application/json" \
+  -d '{
+    "requestId": "<x-request-id from proxy>",
+    "agentName": "gymcoach",
+    "events": [
+      {"type":"guardrail","guardrailType":"pii-detector","invoked":true,"blocked":false},
+      {"type":"tool","name":"plan-workout","success":true,"latencyMs":82}
+    ]
+  }'
+# 200 {"requestId":"...","found":true,"opaViolations":[]}
+# 202 {"requestId":"...","found":false}  ← race (no retry needed)
+```
+
+When OPA is unreachable, the proxy **fails open** (forwards the agent's response to the client unchanged) regardless of mode. Set `OPA_PROXY_MODE=off` to silence HeaderReporting OPA calls entirely while keeping `OPA_URL` set for `/gap` and EventPush.
 
 ---
 
 
@@ -30,6 +30,33 @@ Output:
     https://owasp.org/www-project-top-10-for-large-language-model-applications/
 ```
 
+## Evidence Tiers
+
+Every audit rule and gap issue carries an **evidence tier** label that tells you what kind of evidence backs the finding:
+
+| Badge | Tier | Meaning |
+|-------|------|---------|
+| `[D]` | Declarative | Manifest analysis only — we read the YAML, no I/O required |
+| `[P]` | Probed | Health check verified at infrastructure level (`agentspec health`) |
+| `[B]` | Behavioral | Runtime events confirmed actual execution (sdk-langgraph + EventPush) |
+
+All current audit rules are `[D]` — declarative. The grade (A–F) reflects manifest declarations only.
+
+The `agentspec audit` output shows `[D]` badges next to each violation:
+
+```
+  [critical] [D] SEC-LLM-06 — Sensitive data disclosure
+    Long-term memory declared without piiScrubFields
+    → Add spec.memory.hygiene.piiScrubFields: [ssn, credit_card, bank_account]
+
+  Evidence Breakdown
+    [D] Declarative  18/22  (manifest declarations)
+    [P] Probed        N/A   (run `agentspec health <file>` for live checks)
+    [B] Behavioral    N/A   (no runtime events — deploy with sdk-langgraph + EventPush)
+```
+
+See [Probe Coverage](./probe-coverage.md) for a complete field-by-field matrix of what each tier verifies.
+
 ## Compliance Packs
 
 ### `owasp-llm-top10`
 
@@ -131,15 +131,61 @@ The `agentspec-langgraph` Python package provides this for LangGraph agents. It
 
 See [LangGraph Runtime Instrumentation](../adapters/langgraph.md#runtime-behavioral-instrumentation) for the full integration guide.
 
-## Per-request proxy enforcement
+## Behavioral observation pipeline
 
-The sidecar proxy (port 4000) evaluates OPA on **every request** when `OPA_URL` is set. The mode is controlled by the `OPA_PROXY_MODE` env var:
+OPA needs to know what the agent *actually did* — which guardrails fired, which tools were called. This data comes from the `agentspec-langgraph` sub-SDK via one of two reporting paths:
 
-| Mode | Behaviour |
-|------|-----------|
-| `track` (default) | Record violations in the audit ring; add `X-AgentSpec-OPA-Violations` response header; forward the request. Safe for initial rollout — never blocks traffic. |
-| `enforce` | Block with `403 PolicyViolation` **before forwarding to the upstream agent**. Use after validating policies in `track` mode. |
-| `off` | Skip proxy OPA checks entirely. `/gap` still calls OPA if `OPA_URL` is set. |
+### HeaderReporting — Agent response headers
+
+`AgentSpecMiddleware` (FastAPI/Starlette) sets internal headers on the agent's HTTP response after each request completes:
+
+```
+X-AgentSpec-Guardrails-Invoked: pii-detector,toxicity-filter
+X-AgentSpec-Tools-Called: plan-workout
+X-AgentSpec-User-Confirmed: true
+```
+
+The sidecar proxy reads these in its `onResponse` callback, then **strips them before forwarding to the client**. Clients never see these headers.
+
+```python
+from fastapi import FastAPI
+from agentspec_langgraph import AgentSpecMiddleware
+
+app = FastAPI()
+app.add_middleware(AgentSpecMiddleware, guardrail_middleware=guardrail_mw)
+```
+
+### EventPush — Out-of-band event push
+
+`SidecarClient` pushes a batch of behavioral events to `POST /agentspec/events` after each request. This is fire-and-forget and swallows all errors.
+
+```python
+from agentspec_langgraph import GuardrailMiddleware, SidecarClient
+
+sidecar = SidecarClient(url="http://localhost:4001")
+middleware = GuardrailMiddleware(agent_name="gymcoach")
+
+async with middleware.new_request_context(
+    request_id=request.headers.get("x-request-id"),
+    sidecar_client=sidecar,
+) as ctx:
+    content = ctx.wrap("pii-detector", pii_fn)(user_input)
+# → On exit: events pushed to POST /agentspec/events
+```
+
+EventPush always records behavioral data regardless of `OPA_PROXY_MODE`. HeaderReporting (response headers) triggers OPA evaluation in the proxy.
+
+## Per-request proxy enforcement (HeaderReporting)
+
+The sidecar proxy (port 4000) evaluates OPA on agent response headers when `OPA_URL` is set. The mode is controlled by the `OPA_PROXY_MODE` env var:
+
+| Mode | Trigger | Behaviour |
+|------|---------|-----------|
+| `track` (default) | Agent response headers present | Record violations in the audit ring; add `X-AgentSpec-OPA-Violations` response header; forward the response to client. Safe for initial rollout — never blocks. |
+| `enforce` | Agent response headers present | If OPA denies: sidecar replaces agent response with `403 PolicyViolation`. Agent always processes the request; only the client-visible response is blocked. |
+| `off` | — | Skip proxy OPA checks entirely. `/gap` still calls OPA if `OPA_URL` is set. |
+
+> **Note:** If the agent does not set `X-AgentSpec-*` response headers (e.g. not using sdk-langgraph), OPA is not called and the request passes through regardless of mode. Use EventPush (`SidecarClient`) for agents that cannot use middleware.
 
 Configure globally (docker-compose or Helm):
 
@@ -163,7 +209,7 @@ Override per-pod with annotation: `agentspec.io/opa-proxy-mode: enforce`.
 
 ### 403 PolicyViolation response
 
-When `enforce` mode blocks a request, the sidecar returns before the upstream agent ever sees the request:
+When `enforce` mode blocks a request based on agent response headers, the sidecar replaces the upstream response with a 403:
 
 ```
 HTTP/1.1 403 Forbidden
@@ -180,25 +226,21 @@ Content-Type: application/json
 }
 ```
 
-### The honor system — and why it matters
-
-The sidecar builds the OPA input from **request headers**. It does not observe what the agent actually executed. OPA knows `pii-detector` was invoked only because the caller said so via a header:
-
-| Incoming request header | OPA `input` field |
-|-------------------------|-------------------|
-| `X-AgentSpec-Guardrails-Invoked: pii-detector` | `guardrails_invoked: ["pii-detector"]` |
-| `X-AgentSpec-Tools-Called: plan-workout` | `tools_called: ["plan-workout"]` |
-| `X-AgentSpec-User-Confirmed: true` | `user_confirmed: true` |
-
-If a header is **absent**, the field defaults to empty. With `pii-detector` declared in `agent.yaml` and `guardrails_invoked: []`, OPA fires `pii_detector_not_invoked` immediately — because the caller did not declare that the guardrail ran.
+### Enforcement model
 
-This is **declaration-based enforcement**, not execution-verified enforcement. A caller that sets the header without actually running the guardrail passes OPA. To close that gap, use a framework sub-SDK (`agentspec-langgraph` etc.) that sets these headers automatically from real guardrail invocations inside the agent's execution path.
+| Path | Mechanism | Real-time blocking |
+|------|-----------|-------------------|
+| `off` | No OPA calls | — |
+| `track` (HeaderReporting) | Record violations in audit ring + `X-AgentSpec-OPA-Violations` header | Never blocks |
+| `enforce` (HeaderReporting) | OPA evaluates agent response headers; if deny → 403 to client | ✅ Yes (client-side) |
+| EventPush | OPA evaluates pushed events retroactively; updates audit ring | ❌ No (observation) |
+| Agent-side | `GuardrailMiddleware.enforce_opa()` raises `PolicyViolationError` | ✅ Yes (in-process) |
 
 ## Framework sub-SDKs: the other half
 
-OPA evaluates an input document on every request. That document needs live runtime data — which guardrails were invoked, how many tokens were used, which tools were called. The sidecar builds a partial input from the manifest and probe data; for full behavioral coverage you also need a **framework sub-SDK** that intercepts the agent's execution path and sets the headers automatically.
+OPA evaluates an input document for each request that reports behavioral data. That document needs live runtime data: which guardrails were invoked, how many tokens were used, which tools were called. The sidecar builds a partial input from the manifest and probe data; for full behavioral coverage you also need a **framework sub-SDK** that intercepts the agent's execution path.
 
-The `agentspec-langgraph` Python package provides this for LangGraph agents. It intercepts tool calls, LLM calls, and guardrail invocations and sets `X-AgentSpec-*` headers on outgoing requests so that OPA receives ground truth rather than self-reported data.
+The `agentspec-langgraph` Python package provides this for LangGraph agents. It intercepts tool calls, LLM calls, and guardrail invocations and reports them via HeaderReporting (response headers) or EventPush (out-of-band event push).
 
 See [LangGraph Runtime Instrumentation](../adapters/langgraph.md#runtime-behavioral-instrumentation) for the full integration guide.
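The HeaderReporting path described in this hunk (read the agent's `X-AgentSpec-*` response headers, strip them, then apply `OPA_PROXY_MODE`) can be sketched as two pure functions. This is an illustration under stated assumptions, not the sidecar's code: `extract_behavior` and `apply_mode` are hypothetical names, the OPA call itself is omitted (the caller supplies `violations`), and the comma-joined format of `X-AgentSpec-OPA-Violations` and of the `message` field is assumed. The input field names (`guardrails_invoked`, `tools_called`, `user_confirmed`) and the 403 body shape come from the docs in this diff.

```python
import json

# Response header (lower-cased) -> OPA input field.
BEHAVIORAL_HEADERS = {
    "x-agentspec-guardrails-invoked": "guardrails_invoked",
    "x-agentspec-tools-called": "tools_called",
    "x-agentspec-user-confirmed": "user_confirmed",
}


def extract_behavior(headers):
    """Split agent response headers into (opa_input, headers_for_client).

    opa_input is None when no behavioral header is present, in which case
    OPA is not called at all. Behavioral headers never reach the client.
    """
    behavior, forwarded = {}, {}
    for name, value in headers.items():
        field = BEHAVIORAL_HEADERS.get(name.lower())
        if field is None:
            forwarded[name] = value
        elif field == "user_confirmed":
            behavior[field] = value.strip().lower() == "true"
        else:
            behavior[field] = [v.strip() for v in value.split(",") if v.strip()]
    return (behavior or None), forwarded


def apply_mode(mode, violations, status, headers, body):
    """Return the (status, headers, body) the client should see."""
    if mode == "off" or not violations:
        return status, headers, body
    joined = ",".join(violations)  # assumed header format
    if mode == "enforce":
        blocked = {
            "error": "PolicyViolation",
            "blocked": True,
            "violations": list(violations),
            "message": f"Request blocked by OPA policy: {joined}",
        }
        return (
            403,
            {"Content-Type": "application/json", "X-AgentSpec-OPA-Violations": joined},
            json.dumps(blocked),
        )
    # track: annotate the response but never block it
    return status, {**headers, "X-AgentSpec-OPA-Violations": joined}, body
```

Keeping the two steps separate mirrors the docs: stripping always happens, while blocking only happens in `enforce` mode and only after the upstream agent has already processed the request.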