RFC: aep/v0.3 — side-effect class, state-digest kind, drift evidence, approval mode

## Design principle (the only litmus test for fields)

The v0.3 schema is graded by whether a downstream reader can answer
three post-run questions from the persisted record alone, without
replaying the app:

1. **What authority was available?**
2. **What authority was exercised?**
3. **What changed between approval and execution?**

Each new field below exists to make exactly one of those answerable.
(Framing credited to [@armorer-labs](https://github.com/armorer-labs)
in [a comment on this RFC](https://github.com/WasmAgent/wasmagent-js/issues/7#issuecomment-4808624245).)

## Acceptance criterion — three-state comparability test

For every `state_digest_*` field a v0.3 record carries, a
third-party reader must be able to decide exactly **one** of three
states without further side-channel context:

1. **comparable** — the digest can be equality-checked against
   another digest of the same kind, without further assumption.
2. **not comparable** — the digest is recorded for observation
   only; equality is not asserted.
3. **comparable under a named weaker assumption** — equality
   holds only if the named assumption (e.g., "best-effort
   snapshot", "durable subset only") is accepted.

If a record forces the reader into "I don't know" rather than
one of the three above, the schema has failed at that record.
The per-kind coverage shapes in Gap 2 are graded against this
test, not against field-count exhaustiveness.
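Consumer code can make the three-state outcome explicit with a discriminated union. The sketch below is one possible shape; `Comparability` and `describeComparability` are hypothetical names, not part of the proposed schema.

```ts
// Hypothetical consumer-side type; not part of the proposed schema.
type Comparability =
  | { state: "comparable" }
  | { state: "not-comparable" }
  | { state: "comparable-under-assumption"; assumption: string };

// Renders the reader's decision. The switch is exhaustive, so adding a
// fourth "I don't know" state would require changing this function.
function describeComparability(c: Comparability): string {
  switch (c.state) {
    case "comparable":
      return "comparable";
    case "not-comparable":
      return "not comparable";
    case "comparable-under-assumption":
      return `comparable under "${c.assumption}"`;
  }
}
```

Requiring `assumption` on the third variant keeps the weaker assumption named rather than implied.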

(Acceptance criterion credited to
[@armorer-labs's fourth-round review](https://github.com/WasmAgent/wasmagent-js/issues/7#issuecomment-4810001035).)

## Summary

Draft RFC for `aep/v0.3`, motivated by external review of v0.2 from
[@armorer-labs](https://github.com/armorer-labs) on
[`WasmAgent/wasmagent` discussions #1](https://github.com/WasmAgent/wasmagent/discussions/1)
and [#2](https://github.com/WasmAgent/wasmagent/discussions/2),
followed by a second-round design review on this RFC itself.

The current `aep/v0.2` schema covers most of what an authority receipt
needs, but four field-level gaps make the record harder to audit
after the fact than it should be. This RFC proposes targeted
additions; nothing in v0.2 is removed. A fifth gap (`decision_envelope`)
is intentionally deferred to v0.4 in projection-first form (see below).

## Gap 1 — `side_effect_class` beyond a boolean

**Today:** `state_changing: boolean` on `ActionEvidence`.

**Problem:** A single bit collapses
`read-only`, `mutate-local-sandbox`, `mutate-external-service`, and
`network-egress` into the same bucket. Auditors and downstream
training filters can't recover which class a record actually was.

**Proposal:**

```ts
// On each ActionEvidence — canonical, source of truth:
side_effect_class: z.enum([
  "read",
  "mutate-local",
  "mutate-external",
  "network-egress",
  "unknown",
]).default("unknown");

// On the top-level AEPRecord — derived summary for training-pipeline
// cohort filters, never authoritative:
run_side_effect_class_max: z.enum([
  "read",
  "mutate-local",
  "mutate-external",
  "network-egress",
  "unknown",
]).optional();
```

The `unknown` value is the **default** when a tool descriptor doesn't
declare its side-effect class. It is paired with a runtime policy rule
that treats unknown side effects conservatively: they require explicit
approval one level stricter than the same tool would need had it
declared its class. (See [Gap 4](#gap-4--approval_mode--approval_extension).)

`state_changing: boolean` stays for back-compat and is derived as
`side_effect_class != "read"`, so `unknown` counts as state-changing.

`run_side_effect_class_max` is explicitly documented as derived from
`actions[].side_effect_class` (the max under the read < mutate-local
< mutate-external < network-egress < unknown ordering). Consumers
that recompute it must get the same value, or the record is
inconsistent. The summary is for cheap filtering; the actions remain
the truth.
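A minimal sketch of the derivation and the consistency check, assuming the ordering stated above. The helper names are hypothetical, and returning `undefined` for a run with no actions is an assumption (the RFC only says the summary is optional).

```ts
// Severity order taken from the RFC text:
// read < mutate-local < mutate-external < network-egress < unknown.
const SIDE_EFFECT_ORDER = [
  "read",
  "mutate-local",
  "mutate-external",
  "network-egress",
  "unknown",
] as const;
type SideEffectClass = (typeof SIDE_EFFECT_ORDER)[number];
type ActionLike = { side_effect_class: SideEffectClass };

// Hypothetical helper: recompute the run-level summary from the actions.
function runSideEffectClassMax(actions: ActionLike[]): SideEffectClass | undefined {
  let maxRank = -1;
  for (const a of actions) {
    maxRank = Math.max(maxRank, SIDE_EFFECT_ORDER.indexOf(a.side_effect_class));
  }
  return maxRank < 0 ? undefined : SIDE_EFFECT_ORDER[maxRank];
}

// Back-compat derivation: everything except "read" is state-changing.
function stateChanging(c: SideEffectClass): boolean {
  return c !== "read";
}

// A stored summary that disagrees with the recomputed value marks the
// record as inconsistent; an absent summary is accepted.
function summaryIsConsistent(record: {
  run_side_effect_class_max?: SideEffectClass;
  actions: ActionLike[];
}): boolean {
  return (
    record.run_side_effect_class_max === undefined ||
    record.run_side_effect_class_max === runSideEffectClassMax(record.actions)
  );
}
```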

## Gap 2 — `state_digest_kind` + `state_digest_coverage`

**Today:** `pre_state_digest` / `post_state_digest` are bare hex strings.

**Problem:** A consumer reading the record can't tell whether the
digest covers a git tree, a sandbox snapshot, a database row set, or
a browser DOM. **The digest can stay opaque; the coverage cannot.**
Two digests of different kinds, or of the same kind over different
scopes, are not comparable — so the record must say which kind and
over what scope.

**Design principle for coverage descriptors:**

> A digest without a coverage descriptor is not a comparable digest.
> A coverage descriptor without an identity-vs-selector split is not
> a portable coverage descriptor.

Each coverage descriptor classifies its fields into four buckets,
enforced at the type level (intersection types per kind) even though
the serialized JSON stays flat:

| Bucket | What it does | Used for equality? |
|---|---|---|
| **identity** | The "what" being observed; stable, joinable across runs (e.g., `database_id`, `base_image_digest`, `origin`+`route_path`+`frame_path`, `namespace`) | **yes — required** |
| **selector** | The subset chosen (predicates, excludes, selector roots) | yes — must match for equality |
| **boundary** | Comparability frame: snapshot id, capture phase, normalization policy | yes — when present, must match |
| **observational** | Diagnostic only (row count, file count, timestamps) | **never sufficient for equality** |

TypeScript shape (every per-kind coverage type is a 4-way intersection):

```ts
type DbRowsetCoverage =
  CoverageIdentity<"db-rowset"> &
  CoverageSelector<"db-rowset"> &
  CoverageBoundary<"db-rowset"> &
  CoverageObservational;
```
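The generic helpers used in that intersection are not spelled out in this RFC. One way they could be defined is as lookups into per-kind field maps; the sketch below covers two kinds only, and the map names (`IdentityFields`, `SelectorFields`, `BoundaryFields`) are hypothetical.

```ts
// Hypothetical per-kind field maps. Field names are copied from the
// per-kind shapes in this RFC; only two kinds are shown.
interface IdentityFields {
  "git-tree": { tree_sha: string; tree_root: string };
  "db-rowset": { database_id: string; schema: string; table: string };
}
interface SelectorFields {
  "git-tree": { path_predicate?: string; excludes?: string[] };
  "db-rowset": { rows_predicate: string; query_hash: string };
}
interface BoundaryFields {
  "git-tree": {};
  "db-rowset": { snapshot_id?: string; isolation_level?: string };
}

type Kind = keyof IdentityFields;
type CoverageIdentity<K extends Kind> = IdentityFields[K];
type CoverageSelector<K extends Kind> = SelectorFields[K];
type CoverageBoundary<K extends Kind> = BoundaryFields[K];
type CoverageObservational = { row_count?: number; file_count?: number };

type DbRowsetCoverage =
  CoverageIdentity<"db-rowset"> &
  CoverageSelector<"db-rowset"> &
  CoverageBoundary<"db-rowset"> &
  CoverageObservational;

// The intersection is a single flat object type, so the JSON stays flat.
const example: DbRowsetCoverage = {
  database_id: "orders-primary",
  schema: "public",
  table: "orders",
  rows_predicate: "status = 'open'",
  query_hash: "placeholder-hash",
  row_count: 42,
};
```

The buckets exist only at the type level; nothing in the serialized object says which bucket a field belongs to.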

`git-tree` is the gold standard — fully recomputable by an
independent reader given the same identity + selector. Other kinds
are graded by how close they get under the three-state acceptance
criterion above.

**Proposal:** Two paired fields. The kind is a closed enum:

```ts
state_digest_kind: z.enum([
  "git-tree",
  "sandbox-fs",
  "db-rowset",
  "browser-dom",
  "kv-snapshot",
  "memory-bag",
  "other",
]);
```

The coverage descriptor's shape varies by kind:

#### `git-tree` (gold standard)

```ts
{
  // identity
  tree_sha: string,            // git tree SHA
  tree_root: string,           // "/" or a subpath
  // selector
  path_predicate?: string,     // e.g. "src/**"
  excludes?: string[],         // e.g. [".git/**", "node_modules/**"]
}
```

Boring on purpose — easiest kind for any consumer to recompute
independently. Equality: tuple of all four fields.
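A sketch of the four-field equality check. Two points are assumptions because the RFC does not specify them: `excludes` is compared order-insensitively, and a missing `excludes` equals an empty list.

```ts
type GitTreeCoverage = {
  tree_sha: string;
  tree_root: string;
  path_predicate?: string;
  excludes?: string[];
};

// Assumption: order of excludes is irrelevant; missing equals empty.
function normalizeExcludes(excludes?: string[]): string {
  return JSON.stringify((excludes ?? []).slice().sort());
}

// Hypothetical helper: true when all four coverage fields match.
function gitTreeCoverageEqual(a: GitTreeCoverage, b: GitTreeCoverage): boolean {
  return (
    a.tree_sha === b.tree_sha &&
    a.tree_root === b.tree_root &&
    a.path_predicate === b.path_predicate &&
    normalizeExcludes(a.excludes) === normalizeExcludes(b.excludes)
  );
}
```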

#### `db-rowset`

```ts
{
  // identity (all required)
  database_id: string,         // operator-stable label, NOT a connection string
  schema: string,
  table: string,
  // selector
  rows_predicate: string,      // SQL or DSL
  query_hash: string,          // sha256(canonicalized predicate)
  // boundary
  snapshot_id?: string,        // Postgres txid / Oracle SCN / SQL Server LSN / ...
  isolation_level?: string,    // "snapshot" | "repeatable-read" | ...
  // observational
  row_count?: number,          // diagnostic only
}
```

**Equality rule:** two `db-rowset` digests are `comparable` iff
`(database_id, schema, table, query_hash, snapshot_id)` all match.
When `snapshot_id` is missing, they are comparable only under the
named weaker assumption "best-effort, no snapshot". The hex digest
alone is never the authority on equality; the identity tuple is.
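The equality rule maps onto the three-state criterion roughly as follows. This is a sketch: `compareDbRowset` is a hypothetical helper, and treating "missing on either side" as the weaker-assumption case is an assumption.

```ts
type Comparability =
  | { state: "comparable" }
  | { state: "not-comparable" }
  | { state: "comparable-under-assumption"; assumption: string };

type DbRowsetCoverage = {
  database_id: string;
  schema: string;
  table: string;
  rows_predicate: string;
  query_hash: string;
  snapshot_id?: string;
  isolation_level?: string;
  row_count?: number; // observational, deliberately ignored below
};

function compareDbRowset(a: DbRowsetCoverage, b: DbRowsetCoverage): Comparability {
  const sameScope =
    a.database_id === b.database_id &&
    a.schema === b.schema &&
    a.table === b.table &&
    a.query_hash === b.query_hash;
  if (!sameScope) return { state: "not-comparable" };

  // Assumption: a snapshot_id missing on either side weakens the claim.
  if (a.snapshot_id === undefined || b.snapshot_id === undefined) {
    return { state: "comparable-under-assumption", assumption: "best-effort, no snapshot" };
  }
  return a.snapshot_id === b.snapshot_id
    ? { state: "comparable" }
    : { state: "not-comparable" };
}
```

`row_count` never enters the decision, consistent with the observational bucket being diagnostic only.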

Raw connection strings are explicitly **not** used — they include
credentials, vary across replicas, and break equality under failover.

#### `sandbox-fs`

```ts
{
  // identity
  base_image_digest: string,   // OCI digest of sandbox base
  sandbox_root: string,
  // selector
  include_predicates: string[],
  exclude_predicates: string[],
  // boundary (silent-disagreement closers)
  symlink_policy: "follow" | "no-follow" | "reject",
  generated_paths_included: boolean,
  // observational
  file_count?: number,
}
```

Equality: identity + selector + boundary tuple. The base image,
symlink policy, and generated-paths inclusion are what prevent two
runtimes from producing different digests over byte-identical trees.

#### `browser-dom`

```ts
{
  // identity
  origin: string,              // https://example.com
  route_path: string,          // templated: "/orders/{id}/lines"
  frame_path: string,          // /iframe[0]/iframe[1]/...
  // selector
  selector_root: string,
  selector_kind: "css" | "xpath",
  // boundary (timing changes meaning of same selector)
  capture_phase: "pre-hydration" | "post-hydration" | "post-tool-result" | "stable-quiescent",
  capture_timestamp_ms?: number,
  attribute_order: "as-is" | "sorted-asc",
  whitespace_policy: "preserve" | "collapse",
  // observational
  text_only?: boolean,
}
```

`route_path` uses templated form (`/orders/{id}/lines`, not
`/orders/12345/lines`) so two requests to the same app route under
different IDs join naturally; the specific ID stays in the digest
itself. Fixes the same-origin-different-route failure mode (an agent
that approves on `/orders/{id}/draft` and drifts to executing on
`/orders/{id}/posted` — same origin, same frame, but different
application contract).
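To illustrate the templated form, a hypothetical helper can check whether a concrete path belongs to a templated route. It assumes a `{name}` placeholder matches exactly one non-empty path segment; the RFC does not define template matching.

```ts
// Hypothetical helper; the matching rule is an assumption.
function matchesRouteTemplate(template: string, concretePath: string): boolean {
  const templateSegments = template.split("/");
  const pathSegments = concretePath.split("/");
  if (templateSegments.length !== pathSegments.length) return false;
  return templateSegments.every((segment, i) => {
    const actual = pathSegments[i] ?? "";
    const isPlaceholder = /^\{[^{}]+\}$/.test(segment);
    return isPlaceholder ? actual.length > 0 : segment === actual;
  });
}

// Same origin and frame, different application contract: approval was
// captured on the draft route, execution happened on the posted route.
const approvedRoute = "/orders/{id}/draft";
const executedPath = "/orders/12345/posted";
const routeDrifted = !matchesRouteTemplate(approvedRoute, executedPath);
```

Comparing `route_path` values as templates makes the draft-versus-posted drift visible without the order ID entering the coverage descriptor.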

#### `memory-bag`

```ts
{
  // identity
  namespace: string,
  // selector
  key_predicate?: string,
  // boundary (one required if backend supports it)
  vector_clock?: Record<string, number>,
  read_timestamp_ms?: number,
  generation_number?: number,
  // boundary — lifecycle
  redaction_profile: string,   // references AEPRecord.redaction_profile
  durable_vs_scratch: "durable-only" | "scratch-only" | "mixed",
  // boundary — mixed-mode partition (REQUIRED when durable_vs_scratch === "mixed")
  mixed_partition?: {
    durable_keys: string[],    // or a predicate
    scratch_keys: string[],    // or a predicate
    audit_equality_basis: "durable-only" | "none",
  },
}
```

Mixed-mode digests stay emittable (some runtimes can't cleanly
partition without breaking observation) but are audit-comparable
**only when `mixed_partition` is present**. Without it, a mixed
digest is `not comparable`. With `audit_equality_basis:
"durable-only"`, it's `comparable under named weaker assumption
"durable subset only"` — maps directly onto the three-state criterion.

The `durable_vs_scratch` field is the explicit fix for "operator
intent vs ephemeral scratch state" getting pooled.
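The mixed-mode rule can be sketched as a classifier over the coverage descriptor. `classifyMixedMode` is a hypothetical helper, and treating `audit_equality_basis: "none"` as `not comparable` is an assumption, since the RFC only describes the `"durable-only"` case.

```ts
type Comparability =
  | { state: "comparable" }
  | { state: "not-comparable" }
  | { state: "comparable-under-assumption"; assumption: string };

type KvLifecycle = {
  durable_vs_scratch: "durable-only" | "scratch-only" | "mixed";
  mixed_partition?: {
    durable_keys: string[];
    scratch_keys: string[];
    audit_equality_basis: "durable-only" | "none";
  };
};

function classifyMixedMode(c: KvLifecycle): Comparability {
  if (c.durable_vs_scratch !== "mixed") {
    // The mixed-mode rule does not restrict this digest. Identity,
    // selector, and boundary equality must still be checked separately.
    return { state: "comparable" };
  }
  if (c.mixed_partition === undefined) return { state: "not-comparable" };
  if (c.mixed_partition.audit_equality_basis === "durable-only") {
    return { state: "comparable-under-assumption", assumption: "durable subset only" };
  }
  // Assumption: basis "none" means no audit equality is claimed.
  return { state: "not-comparable" };
}
```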

#### `kv-snapshot`

Same shape as `memory-bag`; the difference is operator-managed vs
agent-internal namespace conventions.

#### `other`

Escape hatch for kinds not yet in the enum, with explicit
self-policing:

```ts
{
  other_kind: string,                // DNS-style namespace, e.g. "com.example.timestream"
  coverage_schema_version: string,   // vendor-managed semver
  description: string,               // human-readable
}
```

Namespaced `other_kind` means two vendors picking the same word
don't collide. The `coverage_schema_version` lets consumers decide
comparable / not-comparable / weaker-assumption for vendor-specific
kinds without first-class support. Self-policing rule: any namespace
hitting >5% of records is a graduation signal — promote into the
closed enum on the next minor bump.
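A sketch of how the graduation signal could be computed over a sample of digest-bearing entries. The helper name is hypothetical, and using the whole sample as the denominator is an assumption, since the RFC does not define the population behind ">5% of records".

```ts
type DigestEntry = {
  state_digest_kind: string;
  state_digest_coverage?: { other_kind?: string };
};

// Hypothetical helper: namespaces of kind "other" whose share of the
// sample is strictly above the threshold (5% by default).
function graduationCandidates(entries: DigestEntry[], threshold = 0.05): string[] {
  const counts: Record<string, number> = {};
  for (const e of entries) {
    if (e.state_digest_kind !== "other") continue;
    const ns = e.state_digest_coverage?.other_kind;
    if (ns === undefined) continue;
    counts[ns] = (counts[ns] ?? 0) + 1;
  }
  return Object.keys(counts)
    .filter((ns) => (counts[ns] ?? 0) / entries.length > threshold)
    .sort();
}
```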

---

Both `pre_state_digest` and `post_state_digest` share a single
`state_digest_kind` and `state_digest_coverage` per action (they
represent the same scope before vs after).

Validation rule: any record carrying a `pre_state_digest` or
`post_state_digest` **must** also carry both `state_digest_kind` and
`state_digest_coverage`. Records that can't supply the coverage
descriptor must omit the digest entirely. This forces interpretable
state evidence; opaque digest fields without scope are disallowed.
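The validation rule reduces to a small predicate. A sketch, with `digestFieldsValid` as a hypothetical name and the coverage typed loosely as `object`:

```ts
type ActionDigestFields = {
  pre_state_digest?: string;
  post_state_digest?: string;
  state_digest_kind?: string;
  state_digest_coverage?: object;
};

// A digest without kind and coverage is rejected; an action with no
// digest at all passes, because omitting the digest is the allowed way
// out when coverage can't be supplied.
function digestFieldsValid(a: ActionDigestFields): boolean {
  const hasDigest =
    a.pre_state_digest !== undefined || a.post_state_digest !== undefined;
  if (!hasDigest) return true;
  return a.state_digest_kind !== undefined && a.state_digest_coverage !== undefined;
}
```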

## Gap 3 — `argument_drift` with one-action-per-record semantics

**Today:** `approval_context_hash` is a hash carried on the record;
the runtime checks at gate time that the call matches. If a call
gets blocked because of drift, the gate denies — but the persisted
record reveals it only implicitly.

**Problem:** A historical reader can't distinguish three cases:
(a) call matched approved args exactly,
(b) gate didn't run drift check,
(c) drift was detected and a re-approval happened.

**Proposal:** Add a typed field:

```ts
argument_drift: z.object({
  detected: z.boolean(),
  approved_args_digest: z.string(),
  observed_args_digest: z.string(),
  resolution: z.enum(["matched", "denied"]),
}).optional();
```

**One-action-per-record rule:** each `ActionEvidence` describes
exactly one semantic event. A drift that's denied produces a
self-contained record (`decision: "deny"`, `argument_drift.resolution:
"denied"`). If a re-approval then executes a changed payload, that's
a **separate** new `ActionEvidence` whose `parent_action_id` points
back to the denied one. The new record has its own
`approval_context_hash` matching the re-approval, and no
`argument_drift` field — it isn't drifting from anything.

This keeps each record describing one outcome (the denial OR the
later success, not both), and reuses the existing `parent_action_id`
linkage for the audit trail.

`argument_drift.resolution` is closed to `"matched" | "denied"`. The
old `"re-approved"` value from the v1 draft is removed — that case
is now expressed as two linked records.
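A sketch of how a historical reader might interpret these records. The helper names are hypothetical, and `action_id` is assumed to be the identifier that `parent_action_id` refers to.

```ts
type ArgumentDrift = {
  detected: boolean;
  approved_args_digest: string;
  observed_args_digest: string;
  resolution: "matched" | "denied";
};
type Action = {
  action_id: string; // assumed identifier targeted by parent_action_id
  parent_action_id?: string;
  decision: "allow" | "deny" | "ask_user" | "dry_run";
  argument_drift?: ArgumentDrift;
};

type DriftReading = "no-drift-evidence" | "checked-and-matched" | "drift-denied";

function readDrift(a: Action): DriftReading {
  if (a.argument_drift === undefined) return "no-drift-evidence";
  return a.argument_drift.resolution === "denied" ? "drift-denied" : "checked-and-matched";
}

// A re-approval is a separate record linked to the denied one, and it
// carries no argument_drift of its own.
function isReapprovalAfterDrift(child: Action, parent: Action): boolean {
  return (
    child.parent_action_id === parent.action_id &&
    parent.decision === "deny" &&
    parent.argument_drift?.resolution === "denied" &&
    child.argument_drift === undefined
  );
}

const denied: Action = {
  action_id: "act-1",
  decision: "deny",
  argument_drift: {
    detected: true,
    approved_args_digest: "aaa",
    observed_args_digest: "bbb",
    resolution: "denied",
  },
};
const reapproved: Action = { action_id: "act-2", parent_action_id: "act-1", decision: "allow" };
```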

## Gap 4 — `approval_mode` + `approval_extension`

**Today:** `decision: enum(["allow", "deny", "ask_user", "dry_run"])`.

**Problem:** `ask_user` collapses at least five behaviourally
distinct production cases (per @armorer-labs review of v0.2):

1. one-shot approval for an exact payload (payload hash bound)
2. bounded lease for a class of actions (TTL + invocation limit + scope)
3. policy allow with receipt (no prompt, but recorded reason)
4. policy deny with evidence (typed reason: identity / args / taint / scope / delegation)
5. re-approval on drift (model changed args after prior approval)

**Proposal:** Keep `decision` as the gross outcome
(`allow | deny | ask_user | dry_run`) and add an orthogonal mode as
a **closed-set enum**. Vendor-specific richness goes in a separate
extension object, NOT as `other-*` enum values — the closed enum is
what makes policy and audit queries stable across runtimes.

```ts
approval_mode: z.enum([
  "one-shot-payload",
  "bounded-lease",
  "policy-allow-with-receipt",
  "policy-deny-with-evidence",
  "re-approval-on-drift",
  "none",
]).default("none");

// Vendor / runtime-specific richness — does NOT widen the enum:
approval_extension: z.object({
  namespace: z.string(),     // vendor identifier, e.g. "armorer-labs"
  mode: z.string(),          // vendor-specific mode within that namespace
  evidence_digest: z.string(),
}).optional();

deny_reason_class: z.enum([
  "tool-identity",
  "argument",
  "tainted-input",
  "resource-scope",
  "missing-delegation",
  "policy-rule",
  "other",
]).optional();    // only when decision == "deny"
```

The `bounded-lease` mode is the value the v0.2 → v0.3 transition is
really about: it's the case that currently has no enforceable
representation between `deny` and `ask_user`, leading to either
rubber-stamp prompting or over-broad session-level approval.

A sixth mode that two runtimes agree on graduates from
`approval_extension` into the enum on the next minor bump. Until
then, vendor-specific modes don't fragment the policy query surface.
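Two consequences of this design can be sketched in consumer code: `deny_reason_class` is valid only on deny records, and policy queries can group by the closed `approval_mode` regardless of any vendor extension. Both helper names are hypothetical.

```ts
type Decision = "allow" | "deny" | "ask_user" | "dry_run";
type ApprovalMode =
  | "one-shot-payload"
  | "bounded-lease"
  | "policy-allow-with-receipt"
  | "policy-deny-with-evidence"
  | "re-approval-on-drift"
  | "none";
type ApprovalFields = {
  decision: Decision;
  approval_mode: ApprovalMode;
  approval_extension?: { namespace: string; mode: string; evidence_digest: string };
  deny_reason_class?: string;
};

// deny_reason_class may appear only when decision is "deny".
function denyReasonPlacementValid(a: ApprovalFields): boolean {
  return a.deny_reason_class === undefined || a.decision === "deny";
}

// Grouping uses the closed enum only; vendor modes never add buckets.
function countByApprovalMode(actions: ApprovalFields[]): Partial<Record<ApprovalMode, number>> {
  const counts: Partial<Record<ApprovalMode, number>> = {};
  for (const a of actions) {
    counts[a.approval_mode] = (counts[a.approval_mode] ?? 0) + 1;
  }
  return counts;
}
```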

## Gap 5 (deferred to v0.4) — `decision_envelope` as projection first

**Today:** Authority decision context is assembled across several
top-level fields: `decision`, `subject`, `resource`, `capability`,
`policy_bundle_digest`, `scope_lease_id`, `approval_context_hash`,
`result_digest`.

**Proposal:** Defer the nested record to v0.4. In v0.3, define the
envelope as a **documented projection** over the v0.3 top-level
fields, with golden-test fixtures showing the materialization.
Consumers that want a stable nested shape can materialize it from
the top-level data; the schema does not yet commit to a nested
record name or shape.

Projection target (subject to revision before v0.4 normative):

```ts
// Materialized by consumer code, not stored:
{
  action_class: <derived from side_effect_class>,
  target_boundary: <derived from state_digest_coverage>,
  principal_id: <subject>,
  run_id: <top-level run_id>,
  policy_id: <derived from policy_bundle_digest>,
  policy_version: <derived from policy_bundle_digest>,
  approval_or_lease_id: <approval_context_hash || scope_lease_id>,
  payload_digest: <argument_drift.observed_args_digest || tool_input_digest>,
  expiry_ms: <derived from scope lease, if any>,
  observed_result: <"success" | "failure" | "denied" | "skipped">,
}
```

Promotion criterion: v0.4 promotes the projection to a normative
`decision_envelope` nested record once **two** independent runtimes
emit data whose projection materializes identically on the same
inputs. Until that empirical bar is met, the projection stays in
docs + tests and the schema stays flat.

This avoids the naming-debate trap on a field that no consumer is
yet blocked on.
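A partial sketch of the materialization, covering only the four projection fields whose source is stated directly. The `FlatFields` view, the string types, and both function names are assumptions; fields described as "derived" are left out because the derivation is not specified here.

```ts
// Assumed flattened view of the v0.3 fields the projection reads.
type FlatFields = {
  subject: string;
  run_id: string;
  approval_context_hash?: string;
  scope_lease_id?: string;
  tool_input_digest?: string;
  argument_drift?: { observed_args_digest: string };
};

// Materialized by consumer code, not stored.
function projectEnvelope(r: FlatFields) {
  return {
    principal_id: r.subject,
    run_id: r.run_id,
    approval_or_lease_id: r.approval_context_hash || r.scope_lease_id,
    payload_digest: r.argument_drift?.observed_args_digest || r.tool_input_digest,
  };
}

// Golden-test style check: two emitters agree when their projections
// serialize identically. Key order is fixed by projectEnvelope.
function projectionsAgree(a: FlatFields, b: FlatFields): boolean {
  return JSON.stringify(projectEnvelope(a)) === JSON.stringify(projectEnvelope(b));
}
```

A check of this kind could back the promotion criterion above, run over the same inputs from two independent runtimes.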

## Migration

- v0.2 records parse fine under v0.3 schema (all new fields optional
  or have defaults).
- v0.3 emitters write `schema_version: "aep/v0.3"`.
- Validators accept all three versions (`Literal["aep/v0.1", "aep/v0.2", "aep/v0.3"]`).
- The 12-dimension trust score already handles missing optional
  dimensions — no change needed there.
- `state_changing: boolean` stays for v0.2 reader back-compat; v0.3
  emitters write both that field and the new `side_effect_class`.

## Non-goals

- No change to the Ed25519 signature contract from v0.2.
- No change to delegation chain fields (`parent_action_id`,
  `causal_chain_id`, `scope_lease_id`) — those already match
  @armorer-labs's "delegation chain" item.
- No change to `@wasmagent/otel-exporter` mappings — the
  existing AEP↔OTel bridge stays.
- No nested `decision_envelope` record in v0.3 (deferred — see Gap 5).
- No `other-*` escape hatch in the `approval_mode` enum (use
  `approval_extension` instead).

## Open questions (post-second-review)

The four questions in the original draft are resolved by the
second-round review. Remaining:

1. The per-kind `state_digest_coverage` shapes above are first
   drafts. Implementers who've handled a kind in production (db-rowset
   especially: cross-database joins? sharded tables?) — please
   comment with concrete examples that break the proposed shape.
2. Should `argument_drift` be allowed on `decision: "allow"` records
   (i.e., recording that drift was checked and matched), or only on
   `deny`? Default is to allow on both, but it inflates the
   common-path record.
3. The `run_side_effect_class_max` summary uses
   `read < mutate-local < mutate-external < network-egress < unknown`
   as the ordering. Is `unknown` correctly at the top
   ("treat-as-most-severe") rather than separate from the ordering?

## Related

- @armorer-labs review (motivation): [WasmAgent/wasmagent#1](https://github.com/WasmAgent/wasmagent/discussions/1) and [#2](https://github.com/WasmAgent/wasmagent/discussions/2)
- @armorer-labs second-round review on this RFC: [#7 comment](https://github.com/WasmAgent/wasmagent-js/issues/7#issuecomment-4808624245)
- Current schema: [`packages/aep/src/types.ts`](https://github.com/WasmAgent/wasmagent-js/blob/main/packages/aep/src/types.ts)
- Cross-repo contract: [`docs/aep-contract.md`](https://github.com/WasmAgent/wasmagent-js/blob/main/docs/aep-contract.md)

---

Feedback welcome from anyone running AEP-equivalent audit records in
production, particularly on the per-kind `state_digest_coverage`
shapes and the `decision_envelope` projection target shape.



