
SI-2 P1–P3 planning: Category-E contract investigation, fixture matrix, test gates #26

Merged
EsatanGW merged 7 commits into main from claude/ecc-si2-p1-investigate on May 31, 2026

Conversation

@EsatanGW
Owner

Summary

This PR completes the planning phases (P1 Investigation, P2 Plan-lock, P3 Test plan) for SI-2 Category-E implementation. It adds three evidence documents that surface the §7.1 contract requirements, per-runtime fixture/wrapper coverage, and executable verification gates. The manifest is updated to reflect maintainer dispositions on O4/O6/O8/O9 and to extend the residual-risk record with the three canonical sites that must be corrected when the S25 state-file producer lands.

Scope classification

  • Full — multiple surfaces (evidence documents, manifest updates, ROADMAP clarifications), new planning artifacts, and explicit maintainer decisions on open questions.

Surfaces touched

  • Information surface — three new evidence documents enumerate contract requirements, fixture coverage matrix, and test gates; manifest assumptions clarified with maintainer dispositions.
  • Operational surface — ROADMAP row updated to reflect P1–P3 completion and locked decisions; residual-risks extended to name the three canonical sites requiring future correction.

Rationale:

  • Information surface: The P1/P2/P3 planning documents are read-only Planner artifacts that surface what implementation must honor (contract shape, fixture scope, test gates) without making canonical edits. They feed the maintainer's decision gates and the P4 Implementer's execution plan.
  • Operational surface: The manifest and ROADMAP are updated to record the maintainer's O4/O6/O8/O9 dispositions and to clarify the scope of SI-2 (fail-safe-empty reader, real-captured fixtures only, validator deferred). The residual-risk entry is extended to name all three canonical sites that become stale under the keep-G4 decision and must be corrected together in the producer-landing slice.

Source of Truth impact

  • SoT files: ROADMAP.md (SI-2 row updated with P1–P3 completion and maintainer dispositions), .github/scripts/change-manifest.ecc-pr2-si2-category-e-implementation.yaml (manifest updated with P1–P3 evidence, O9 disposition confirmed, residual-risks extended).
  • New evidence files (non-canonical, read-only): evidence/2026-05-31-ecc-pr2-si2-category-e-implementation/p1-investigation.md, p2-fixture-wrapper-matrix.md, p3-test-plan.md.
  • No canonical edits: Per SI-2 scope guard, no changes to docs/, AGENTS.md, schemas, or skills. The three canonical sites that require future correction (§Category E ~L136, §Requirements-on-runtime-bridges item 6 ~L416, §1.9.2 ~L192) are identified in the residual-risks entry but not edited in this PR.

Evidence

P1 Investigation (p1-investigation.md):

  • Enumerates the §7.1 Category-E surface contract (6 triggers, 7 whitelisted fields, field rules, exit codes, core/wrapper split).
  • Surfaces the proposed 9th failure mode (§1.9.7 orientation-payload context injection) and recommends O8 = accept the concept + author the prose with an explicit distinctness clause.
  • Identifies the S26 fixture capture requirement (real-runtime payloads only, no hand-written ABI fixtures) and the O6 feasibility input (sandbox/donations needed).
  • Verifies O9 punt is correct against the §7.2 validator-landing trigger (no real consumer file in SI-2).
  • KEY FINDING (§5): The shipped contract promises SI-2 will specify the S25 state-file producer, but SI-2 scope defers it (G4). The "specified in SI-2" phrase appears at three canonical sites that all become stale and must be corrected together in the producer-landing slice.

P2 Plan-lock (p2-fixture-wrapper-matrix.md):

  • Locks the per-runtime S26 fixture matrix from `

https://claude.ai/code/session_01CddK7uiWupUh6ydxPJYvpP

claude added 7 commits May 31, 2026 03:08
…h mode, S26 shape, O9 trigger

Maintainer authorized SI-2 P1 ("continue with the next task"). P1 is
Planner-phase, read-only — no edits to canonical files (docs/, AGENTS.md,
schemas, skills). Deliverables: the investigation evidence note + the
ROADMAP P1 row flip (⏸ paused → ✅ passed).

Investigated the four P1 sources and recorded findings in
evidence/2026-05-31-ecc-pr2-si2-category-e-implementation/p1-investigation.md:

1. orient.sh §7.1 surface contract — fully enumerated: 6 abstract triggers,
   7 whitelisted fields, the orientation JSON shape, field rules
   (nullable/typed/repo-relative/core-sanitized), the hook_state_flags
   grammar (closed agent-protocol/ set + bounded values + 96-char/charset
   caps), 0-or-TOOL_ERROR exit semantics, and the deterministic-core +
   per-runtime-wrapper split. Enough to implement O4; no contract
   ambiguity except finding 5.
2. Proposed §1.9.7 "orientation-payload context injection" — exists only
   as a NAME in the closed SI-1 manifest (no prose drafted); additive atop
   §1.9.1–§1.9.6. Overlaps §1.9.2 (poisoned state) / §1.9.4 (branch-name
   injection) / §1.9.5 (output-channel confusion), so O8 must author the
   prose with an explicit distinctness clause.
3. S26 fixture pack — shape clear (claude-code 4 + codex 3 + gemini-legacy 2
   if S18 smoke; Cursor/Windsurf no-op; Antigravity docs-only), but each
   fixture must be CAPTURED from a real runtime (not hand-written), so O6
   needs sandbox access or maintainer fixture donations.
4. §7.2 validator-landing trigger — verbatim "lands together with the first
   real consumer file". SI-2 ships no real x-agent-protocol consumer →
   trigger not met → O9 PUNT is verified correct.

KEY CROSS-CUTTING FINDING (P2/O4 input): §7.1 (line ~136) states the
state-file producer is "specified in SI-2", but the G4 decision already in
the manifest/ROADMAP defers the S25 producer OUT of SI-2 (orient.sh is a
fail-safe-empty reader). The shipped canonical contract therefore promises
something SI-2 will not deliver. Maintainer must resolve at O4 — recommend
keep G4 + schedule a later-slice contract follow-up + extend
residual_risks[0] to name the line-136 claim.

P1 disposition recommendations (for the maintainer at the P2 gate): O4
approve-as-reader + resolve finding 5; O6 approve only if real-capture
feasible; O8 accept-concept + author distinct §1.9.7; O9 punt (verified).
No implementation, no canonical edits, no decisions resolved here.

Verification: check-internal-links exit 0; validate-change-manifests (root)
exit 0; generate-section-index --check exit 0; make verify-docs / verify-local
exit 0. Changed: ROADMAP.md (P1 row) + new evidence note.
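
The verification line above reports each gate by its subprocess exit code. A minimal sketch of collecting such codes, assuming each gate can be expressed as an argv list; `run_gates`, `failed_gates`, and the two stand-in commands are hypothetical and are not the repo's tooling:

```python
import subprocess
import sys
from typing import Dict, List


def run_gates(gates: Dict[str, List[str]]) -> Dict[str, int]:
    """Run each gate command and return its exit code, keyed by gate name."""
    results = {}
    for name, argv in gates.items():
        completed = subprocess.run(argv, capture_output=True, text=True)
        results[name] = completed.returncode
    return results


def failed_gates(results: Dict[str, int]) -> List[str]:
    """Names of gates whose exit code was non-zero, sorted for stable output."""
    return sorted(name for name, code in results.items() if code != 0)


# Stand-in commands so the sketch runs anywhere; real gates would be the
# scripts and make targets named in the commit message.
demo = {
    "always-pass": [sys.executable, "-c", "raise SystemExit(0)"],
    "always-fail": [sys.executable, "-c", "raise SystemExit(3)"],
}
demo_results = run_gates(demo)
```

Recording the raw return code rather than a pass/fail boolean keeps the evidence repeatable: a reviewer can rerun the same argv and compare numbers.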
…roducer-tension finding to all 3 canonical sites (L2)

Addresses the two fix-first findings from the maintainer's P1 review,
so the P1 gate is not carried into P2 as a weak record.

L1 (ROADMAP P1 row was non-repeatable):
- Gate column: replaced the bare "Read-only; no edits to canonical files"
  with exact repeatable checks — `git show --stat a8d603a` changes only
  ROADMAP.md + the P1 evidence note (no docs/ / AGENTS.md / schemas/ /
  skills/ path ⇒ zero canonical edits); check-internal-links.py exit 0;
  make verify-docs exit 0; the evidence note enumerates all four P1
  sources. (Verified in-commit: a8d603a touches exactly those 2 files,
  0 canonical-path edits.)
- Artifact column: now names the actual P1 artifact (the evidence note +
  this ROADMAP row), not "Manifest assumptions + sot_map".
- Commit column: concrete SHA `a8d603a` (P1 investigation commit) +
  branch, replacing the bare "_(P1 investigate branch)_" placeholder.
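
The "zero canonical edits" check in the Gate column can be sketched as a path filter over a commit's changed files. This is an illustration only: `canonical_edits` and the prefix constants are hypothetical names, and the path list is assumed to come from something like `git show --name-only --format= a8d603a`:

```python
from typing import Iterable, List

# Canonical locations named in the scope guard above.
CANONICAL_PREFIXES = ("docs/", "schemas/", "skills/")
CANONICAL_FILES = ("AGENTS.md",)


def canonical_edits(changed_paths: Iterable[str]) -> List[str]:
    """Return the changed paths that fall under a canonical location."""
    hits = []
    for path in changed_paths:
        if path in CANONICAL_FILES or path.startswith(CANONICAL_PREFIXES):
            hits.append(path)
    return hits


# The two files the commit message says a8d603a touches.
p1_commit_files = [
    "ROADMAP.md",
    "evidence/2026-05-31-ecc-pr2-si2-category-e-implementation/p1-investigation.md",
]
```

An empty result for `p1_commit_files` corresponds to the "0 canonical-path edits" claim; any non-empty result would name the offending paths.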

L2 (producer-tension finding was too narrow): the §5 finding + the
ROADMAP P1 Notes told P2 to correct only the §7.1 line-~136 claim, but
the same "specified in SI-2" promise appears in THREE canonical places:
docs/runtime-hook-contract.md §Category E (~L136), the same file's
§Requirements-on-runtime-bridges item 6 "Category E state-source
ownership" (~L416), and docs/runtime-hook-threat-model.md §1.9.2
"Poisoned state file" (~L192). All three must be corrected together when
the producer lands; fixing only one leaves stale canonical claims.
Broadened the evidence note §5 (opening finding, option (b), and the O4
summary-table row) and the ROADMAP P1 Notes to name all three sites.
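
Because the same promise lives in three places, a phrase scan is one way to confirm that a future correction reaches all of them. A minimal sketch, assuming the documents are already loaded as strings; `find_stale_claims` is a hypothetical helper and the sample text is illustrative, not the real file contents:

```python
from typing import Dict, List, Tuple

STALE_PHRASE = "specified in SI-2"


def find_stale_claims(
    documents: Dict[str, str], phrase: str = STALE_PHRASE
) -> List[Tuple[str, int]]:
    """Return (path, 1-based line number) for every line containing the phrase."""
    hits = []
    for path, text in documents.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if phrase in line:
                hits.append((path, lineno))
    return hits


# Illustrative text only; the real files carry the phrase near the
# approximate line numbers quoted in the commit message.
sample = {
    "docs/runtime-hook-contract.md": "intro\nThe producer is specified in SI-2.\n",
    "docs/runtime-hook-threat-model.md": "The producer is specified in SI-2.\nother\n",
}
```

After the producer-landing slice rewrites the wording, the same scan should return an empty list across both files.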

Still read-only: no canonical-file edits in this commit either (only
ROADMAP.md + the P1 evidence note). Verification: check-internal-links
exit 0; validate-change-manifests (root) exit 0; generate-section-index
--check exit 0; make verify-docs / verify-local exit 0.

Maintainer dispositions O4/O6/O8/O9 received; recorded into the P2
plan-lock in the following commit.
Maintainer dispositioned all four open decisions in the P1 review; this
commit records them into the planning artifacts (ROADMAP decisions list +
SI-2 manifest). No canonical-file edits, no implementation — Planner-phase
plan-lock only.

O4 (orient.sh): approve the fail-safe-empty reader; KEEP G4 (do NOT pull
S25 back into SI-2). Per the maintainer, the producer-tension record must
not be framed as only a line-~136 issue — residual_risks[0] now names all
THREE canonical sites carrying the stale "specified in SI-2" promise
(runtime-hook-contract.md §Category E ~L136 + §Requirements-on-runtime-
bridges item 6 ~L416 + runtime-hook-threat-model.md §1.9.2 ~L192) and the
planned correction path (corrected together, to "specified in a later SI
slice", in the slice that lands the producer; SI-2 makes no canonical edit).
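
The "fail-safe-empty reader" behavior approved under O4 can be illustrated as follows. This is a Python sketch of the idea, not the orient.sh implementation; the JSON state format and the name `read_state_fail_safe` are assumptions made for the example:

```python
import json
import os
import tempfile
from typing import Any, Dict


def read_state_fail_safe(path: str) -> Dict[str, Any]:
    """Return the parsed state, or {} if the file is missing, unreadable, or malformed."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            data = json.load(handle)
    except (OSError, ValueError):
        # Missing file, permission error, bad encoding, or invalid JSON.
        return {}
    # Anything other than a JSON object is treated as "no state".
    return data if isinstance(data, dict) else {}


with tempfile.TemporaryDirectory() as tmp:
    missing = read_state_fail_safe(os.path.join(tmp, "absent.json"))
    bad_path = os.path.join(tmp, "bad.json")
    with open(bad_path, "w", encoding="utf-8") as handle:
        handle.write("{not json")
    malformed = read_state_fail_safe(bad_path)
```

With no producer in SI-2, a reader of this shape always takes the empty path, which is why the stale "specified in SI-2" wording can be deferred without breaking the reader.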

O6 (S26 fixtures): approve REAL captured fixtures only — no hand-written
ABI fixtures. If no runtime sandbox / donated fixture exists for a
(runtime, trigger) pair, narrow b3 to capturable pairs and explicitly
defer the rest. Recorded in the S26 assumption's validation_plan. The
actual per-runtime matrix lock is the remaining P2 step (conditional on
capture feasibility) — surfaced to the maintainer.

O8 (9th mode): accept the concept; §1.9.7 prose MUST carry an explicit
"distinct from §1.9.2 / §1.9.4 / §1.9.5 because…" clause. Recorded in the
b2 scope + the ROADMAP decisions list.

O9 (validator): PUNT — confirmed. SI-2 ships no real x-agent-protocol
consumer file, so the §7.2 validator-landing trigger is not met. No b5
slice; SI-2 stays at 4 task_slices (b1-b4). The O9 assumption +
validation_plan now record this as DECIDED, not recommended.

ROADMAP: the O4/O6/O8 (line 80) and O9 (line 81) decision entries flip
from "pending" to dispositioned; the SI-2 P2 row flips ⏸ paused → ⏳
in_progress with the remaining P2 step named (per-runtime fixture/wrapper
matrix lock vs runtime-surface-deviations.md, conditional on O6 capture
feasibility). P4 still needs implementation authorization + a distinct
Implementer identity.

Verification (subprocess return codes; YAML re-validated after an
inner-quote fix — a first attempt put unescaped double-quotes inside the
double-quoted residual_risks scalar and the schema gate caught it):
- validate-change-manifests.py --manifest <SI-2>: exit 0 (b2 scope
  795/800)
- validate-change-manifests.py (root): exit 0
- generate-section-index.py --check: exit 0
- make verify-docs / verify-local: exit 0
- full validator agent_protocol_validate on SI-2: only the F1-waived
  sot.source_file_missing remains
- O9=punt structurally confirmed: 4 task_slices, no b5
- changed files: 2 (SI-2 manifest, ROADMAP.md)
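
The `795/800` figure above reads as a field length against a cap. A small sketch of that report, assuming the cap counts characters (the commit message does not say whether it is characters or bytes); `budget` and `within_cap` are hypothetical helpers:

```python
def budget(text: str, cap: int = 800) -> str:
    """Format a field's length against its cap, for example '795/800'."""
    return f"{len(text)}/{cap}"


def within_cap(text: str, cap: int = 800) -> bool:
    """True when the field does not exceed the cap."""
    return len(text) <= cap
```

Reporting the ratio alongside the pass/fail result shows how little headroom a field has left before the next edit trips the schema gate.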
…rom runtime-surface-deviations.md §3 (capture = P4 gate)

Per the maintainer's Option-1 disposition: lock the supported (runtime,
trigger) matrix now as P2 plan input from the SoT (docs/runtime-surface-
deviations.md §3), with fixture capture as a P4 EXECUTION GATE — not a
P2 claim that every supported pair will be fixtured.

New artifact: evidence/2026-05-31-ecc-pr2-si2-category-e-implementation/
p2-fixture-wrapper-matrix.md. For each (runtime, trigger) pair it records
the §3 classification (wrapper-only / unsupported / no-op / docs-only /
n/a / command), capture_required, and the P4 disposition (captured
fixture OR explicit deferral — never hand-written, per O6).

Locked from §3 (column order Claude Code / Codex / Cursor / Gemini CLI
legacy / Antigravity / Windsurf):
- b3 capture-required candidates: claude-code {session-start, session-end,
  pre-compact, post-compact}; codex {session-start, pre-compact,
  post-compact, turn-stop(JSON)}; gemini-cli-legacy {4 lifecycle} IFF S18
  smoke confirms. Excluded: codex session-end (unsupported), Cursor/
  Windsurf lifecycle (no-op), Antigravity (docs-only), all manual /orient
  (command — exercised by b1 --selftest, not a b3 fixture).
- b1 wrapper coverage: Claude Code + Codex ship; gemini S18-smoke-gated;
  Cursor/Windsurf no-op (no wrapper); Antigravity docs-only (no wrapper).
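
The locked b3 candidates above can be written down as data. A minimal sketch, assuming lowercase runtime identifiers (`cursor`, `windsurf`, `antigravity` are illustrative spellings); the gemini-cli-legacy column is left out because it is S18-gated and its four lifecycle triggers are not enumerated here:

```python
from typing import Dict, List, Tuple

# Capture-required candidates as listed in the commit message.
CAPTURE_REQUIRED: Dict[str, List[str]] = {
    "claude-code": ["session-start", "session-end", "pre-compact", "post-compact"],
    "codex": ["session-start", "pre-compact", "post-compact", "turn-stop"],
}

# Excluded (runtime, trigger) pairs with their classification.
EXCLUDED_PAIRS: Dict[Tuple[str, str], str] = {
    ("codex", "session-end"): "unsupported",
}

# Runtimes with no b3 fixtures at all, with their classification.
NO_FIXTURE_RUNTIMES: Dict[str, str] = {
    "cursor": "no-op",
    "windsurf": "no-op",
    "antigravity": "docs-only",
}


def capture_pairs(matrix: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Flatten the matrix into (runtime, trigger) pairs in declaration order."""
    return [(runtime, trigger) for runtime, triggers in matrix.items() for trigger in triggers]


pairs = capture_pairs(CAPTURE_REQUIRED)
```

Each pair in `pairs` is a candidate only: at P4 it ends as either a captured fixture or an explicit deferral, never a hand-written fixture.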

Two items explicitly carried to P4 capture (recorded in the note §C):
- D1: §3 classifies codex session-start wrapper-only, but the S26 README
  planned pack omitted it. §3 is the SoT and is locked; P4 capture must
  confirm Codex emits session-start or record an explicit deferral +
  reconcile the README.
- S18: the entire gemini column (fixtures + wrapper) is smoke-gated; if
  S18 is not confirmed at P4, gemini is deferred explicitly.

Manifest: the S26-fixture and wrapper-coverage assumptions' validation_plans
now point at the locked matrix note and state capture is a P4 gate (S26
validation_plan rewritten concisely to 569/800 after it accumulated over
the 800 cap). ROADMAP SI-2 P2 row flipped ⏳ in_progress -> ✅ passed; its
Gate column sharpened to a repeatable check (validate-change-manifests.py
--manifest exit 0; 4 task_slices, no b5 = O9 punt reflected; matrix locked
in the note) per the L1 auditability lesson. last_updated set to real
clock time (last_updated <= commit time).

Verification (subprocess return codes):
- validate-change-manifests.py --manifest <SI-2>: exit 0
- validate-change-manifests.py (root): exit 0
- check-internal-links.py: exit 0 (matrix-note + ROADMAP pointers resolve)
- generate-section-index.py --check: exit 0
- make verify-docs / verify-local: exit 0
- full validator agent_protocol_validate on SI-2: only F1-waived
  sot.source_file_missing remains
- SI-2 stays at 4 task_slices (O9 punt); P2 row = ✅ passed
- changed: SI-2 manifest, ROADMAP.md, new P2 matrix evidence note
…e handoff/blocker text (M2) + make P2 row auditable (M3)

Three fix-first findings from the maintainer's review of 3575e33.

M1 — b3 gate contradicted the captured-or-deferred model:
- The b3 scope/review_plan + check-source-verification-flipped.sh
  UNCONDITIONALLY required real fixtures for every pair and required
  S26/S27 to be flipped off "pending". Under P2's "captured OR explicit
  deferral" model, a deferred pair would either fail the gate or force
  false evidence.
  Reworked b3 to the consistency model: capture the capturable pairs,
  record an explicit deferral for the rest (no hand-written fixtures),
  and update S26/S27 (pipe rows + §2.6.f/§2.6.g detail) to the
  captured-or-deferred reality — confirmed ONLY for captured pairs,
  deferred-with-reason for the rest, never falsely confirmed. The
  planned check-source-verification-flipped.sh now asserts the S26/S27
  ledger is CONSISTENT with the committed fixtures + deferral records
  (not "all off pending"). b3 scope 770/800.
- The surfaces_touched note claimed S26 fixtures cover "all six
  lifecycle triggers including manual"; manual is a user-invoked
  /orient command (no runtime event ⇒ no S26 fixture; exercised by b1
  --selftest). Corrected to the five runtime-event triggers per the P2
  matrix, manual explicitly excluded.
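
The consistency model for b3 can be sketched as a check that every required pair is either captured or deferred with a reason, and that the ledger marks a pair confirmed only when a fixture exists. This is not the planned check-source-verification-flipped.sh; `ledger_problems` and the status strings `confirmed` / `deferred` are hypothetical:

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (runtime, trigger)


def ledger_problems(
    required: Set[Pair],
    captured: Set[Pair],
    deferred: Dict[Pair, str],
    ledger: Dict[Pair, str],
) -> List[str]:
    """Return a description of each inconsistency; an empty list means consistent."""
    problems = []
    for pair in sorted(required):
        has_fixture = pair in captured
        has_reason = bool(deferred.get(pair, "").strip())
        if has_fixture and has_reason:
            problems.append(f"{pair}: both captured and deferred")
            continue
        if not has_fixture and not has_reason:
            problems.append(f"{pair}: neither captured nor deferred with a reason")
            continue
        expected = "confirmed" if has_fixture else "deferred"
        if ledger.get(pair) != expected:
            problems.append(f"{pair}: ledger is {ledger.get(pair)!r}, expected {expected!r}")
    return problems
```

A deferred pair passes this check as long as it carries a reason and the ledger says so, which is the difference from the earlier unconditional gate.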

M2 — stale text after the ed82482 dispositions:
- manifest handoff_narrative said O4/O6/O8/O9 "all are pending" → now
  records them DISPOSITIONED (O4 keep-G4 reader + 3-site tension; O6
  real-captured-only; O8 accept + §1.9.7 distinctness; O9 punt) with
  the real remaining gates (P3/P4 authorization, P4 capture feasibility,
  distinct Implementer identity).
- ROADMAP P4 row said "Blocked on O4/O6/O8/O9 disposition" → now
  "dispositioned at P2; P4 gated on implementation authorization +
  capture feasibility + distinct Implementer identity."

M3 — P2 ROADMAP row auditability (ROADMAP §34 rule):
- Commit column: placeholder → concrete SHAs `ed82482` (dispositions)
  + `3575e33` (matrix lock) + review-fix follow-ups on the branch.
- Artifact column: added the P2 matrix note.

Verification (subprocess return codes):
- validate-change-manifests.py --manifest <SI-2>: exit 0
- validate-change-manifests.py (root): exit 0
- check-internal-links.py / generate-section-index.py --check: exit 0
- make verify-docs / verify-local: exit 0
- full validator agent_protocol_validate on SI-2: only F1-waived
  sot.source_file_missing remains
- residual stale-text scans = 0 ("all are pending", S26-covers-manual,
  P4 "Blocked on O4", b3 unconditional off-pending)
- changed: SI-2 manifest, ROADMAP.md
…ip P3 row ✅ (last Planner phase)

Maintainer reviewed a0481a1 (no findings) and said continue. P3 is the
last Planner-phase task — read-only, no canonical edits, no implementation.

Deliverable: evidence/2026-05-31-ecc-pr2-si2-category-e-implementation/
p3-test-plan.md — the executable verification matrix, consolidating the
gates already declared in the task_slice verification_commands into a
completeness-checked plan:

- Per-slice gates (P4): b1 = make verify-hooks-shellcheck + scripts/orient.sh
  --selftest (synthetic deterministic-core: CLI shape, fail-safe-empty,
  turn-stop JSON-only, §7.1 7-field whitelist under the 6 triggers);
  b2 = check-internal-links + generate-section-index --check +
  check-threat-model-additive.sh + grep §1.9.7 present; b3 = check-paths-exist
  + validate-runtime-event-fixtures.py --check + check-source-verification-
  flipped.sh (ledger consistent with captured-or-deferred); b4 =
  verify-hooks-shellcheck + anchored grep orient.sh in Makefile AND
  validate.yml + generate-changelog-json.py --check.
- Cross-cutting (P4→P5→P6): validate-change-manifests.py --manifest;
  make verify-docs / verify-local; check-role-consistency.py; full
  agent_protocol_validate.
- Completeness: 4/4 task_slices have a verification_command; 4/4
  evidence_plan entries map to ≥1 gate (system_interface→b1, process→b2,
  operational→b3+b4); no SI-2 surface ungated.
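
The completeness claim above can be expressed as a small check: every slice has at least one verification command, and every evidence entry maps to at least one gated slice. A sketch under assumptions; `completeness_gaps` is hypothetical, the command lists are abbreviated from the per-slice gates, and only the evidence mappings named in the text are included:

```python
from typing import Dict, List

task_slices: Dict[str, List[str]] = {
    "b1": ["make verify-hooks-shellcheck", "scripts/orient.sh --selftest"],
    "b2": ["check-internal-links", "generate-section-index --check"],
    "b3": ["check-paths-exist", "validate-runtime-event-fixtures.py --check"],
    "b4": ["verify-hooks-shellcheck", "generate-changelog-json.py --check"],
}

evidence_to_slices: Dict[str, List[str]] = {
    "system_interface": ["b1"],
    "process": ["b2"],
    "operational": ["b3", "b4"],
}


def completeness_gaps(
    slices: Dict[str, List[str]], evidence: Dict[str, List[str]]
) -> List[str]:
    """List slices without a verification command and evidence entries without a gated slice."""
    gaps = []
    for slice_id, commands in slices.items():
        if not commands:
            gaps.append(f"slice {slice_id} has no verification command")
    for entry, slice_ids in evidence.items():
        if not any(slices.get(slice_id) for slice_id in slice_ids):
            gaps.append(f"evidence {entry} maps to no gated slice")
    return gaps
```

An empty result corresponds to "no SI-2 surface ungated"; dropping a slice's commands surfaces both the slice and any evidence entry that depended on it.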

Forward-obligations P3 surfaced for P4 (recorded in the note + the P3
ROADMAP Notes):
- Create the 3 planned helper scripts or their gates fail.
- REMOVE the now-moot F1 waiver once scripts/orient.sh lands — the
  sot.source_file_missing finding disappears, so a live waiver for a
  satisfied condition would be stale evidence. (A genuine gate-lifecycle
  obligation the test plan must hand to the implementer.)
- Resolve capture feasibility / gemini S18 smoke / codex-session-start
  §3-vs-README at P4 execution.

ROADMAP P3 row flipped ⏸ paused -> ✅ passed, made auditable from the
start (per the L1/M3 lessons): exact repeatable Gate checks, the
test-plan note as Artifact, _(this change)_ commit per the repo's
self-reference convention.

Verification (subprocess return codes):
- validate-change-manifests.py --manifest <SI-2> + root: exit 0
- check-internal-links.py: exit 0 (P3 note + ROADMAP pointers resolve)
- generate-section-index.py --check: exit 0
- make verify-docs / verify-local: exit 0
- changed: ROADMAP.md + new P3 test-plan note (no manifest edit, no
  canonical edit)
…at P3 is passed

Fix-first finding from the maintainer's review of c3e6b2a. When P3 was
flipped to ✅ passed (ROADMAP + the new p3-test-plan.md), the manifest
handoff_narrative was left stale: it still listed "explicit P3/P4
authorization" among the remaining gates, implying P3 still needs
authorization. That would mislead the next session / Implementer.

This is the same cross-cutting-consumer drift I committed to avoiding —
when a phase state changes, all its downstream consumers must move in the
same commit. I flipped the P3 row but did not re-sweep the handoff. Caught
on review; fixed now and re-swept (the only stale P3-as-pending-gate
reference was this one — residual scans: "P3/P4 authorization" 0,
"explicit P3" 0).

Change: handoff "Remaining gates: explicit P3/P4 authorization, P4
fixture-capture feasibility, and a distinct Implementer identity" ->
"Remaining gates: explicit P4 implementation authorization, P4
fixture-capture feasibility, and a distinct Implementer identity". This
matches ROADMAP P3 (✅ passed), ROADMAP P4 (gated on P4 authorization +
capture feasibility + distinct Implementer identity), and the P3 note.
Bumped last_updated 12:01:19 -> 12:52:59 +08:00 (real clock; <= commit time).
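
The `last_updated <= commit time` rule can be checked by comparing offset-aware timestamps. A minimal sketch, assuming both values are ISO 8601 strings with a UTC offset; `last_updated_ok` is a hypothetical helper and the commit times in the usage note are illustrative, not taken from the repository:

```python
from datetime import datetime


def last_updated_ok(last_updated: str, commit_time: str) -> bool:
    """True when last_updated is not later than the commit time.

    Both arguments must carry a UTC offset so the comparison is
    independent of the time zone each value was written in.
    """
    return datetime.fromisoformat(last_updated) <= datetime.fromisoformat(commit_time)
```

For example, `last_updated_ok("2026-05-31T12:52:59+08:00", "2026-05-31T04:53:00+00:00")` holds because 04:53:00 UTC is 12:53:00 at +08:00, while a commit time equal to the earlier `12:01:19+08:00` value would fail the check.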

Verification (subprocess return codes):
- validate-change-manifests.py --manifest <SI-2> + root: exit 0
- check-internal-links.py / generate-section-index.py --check: exit 0
- make verify-docs / verify-local: exit 0
- full validator agent_protocol_validate on SI-2: only F1-waived
  sot.source_file_missing remains
- changed: SI-2 manifest only (handoff line + last_updated)
@EsatanGW EsatanGW merged commit ba90e68 into main May 31, 2026
18 checks passed
@EsatanGW EsatanGW deleted the claude/ecc-si2-p1-investigate branch May 31, 2026 13:34