SI-2 P1–P3 planning: Category-E contract investigation, fixture matrix, test gates #26
Merged
…h mode, S26 shape, O9 trigger
Maintainer authorized SI-2 P1 ("continue with the next task"). P1 is
Planner-phase, read-only — no edits to canonical files (docs/, AGENTS.md,
schemas, skills). Deliverables: the investigation evidence note + the
ROADMAP P1 row flip (⏸ paused → ✅ passed).
Investigated the four P1 sources and recorded findings in
evidence/2026-05-31-ecc-pr2-si2-category-e-implementation/p1-investigation.md:
1. orient.sh §7.1 surface contract — fully enumerated: 6 abstract triggers,
7 whitelisted fields, the orientation JSON shape, field rules
(nullable/typed/repo-relative/core-sanitized), the hook_state_flags
grammar (closed agent-protocol/ set + bounded values + 96-char/charset
caps), 0-or-TOOL_ERROR exit semantics, and the deterministic-core +
per-runtime-wrapper split. Enough to implement O4; no contract
ambiguity except finding 5.
2. Proposed §1.9.7 "orientation-payload context injection" — exists only
as a NAME in the closed SI-1 manifest (no prose drafted); additive atop
§1.9.1–§1.9.6. Overlaps §1.9.2 (poisoned state) / §1.9.4 (branch-name
injection) / §1.9.5 (output-channel confusion), so O8 must author the
prose with an explicit distinctness clause.
3. S26 fixture pack — shape clear (claude-code 4 + codex 3 + gemini-legacy 2
if S18 smoke; Cursor/Windsurf no-op; Antigravity docs-only), but each
fixture must be CAPTURED from a real runtime (not hand-written), so O6
needs sandbox access or maintainer fixture donations.
4. §7.2 validator-landing trigger — verbatim "lands together with the first
real consumer file". SI-2 ships no real x-agent-protocol consumer →
trigger not met → O9 PUNT is verified correct.
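The hook_state_flags grammar summarized in finding 1 (closed agent-protocol/ set, bounded values, 96-char and charset caps) could be checked roughly as follows. This is a minimal sketch, not the contract: the flag names, the `key=value` form, and the exact allowed charset are assumptions made for illustration; only the `agent-protocol/` prefix and the 96-character cap come from the note above.

```python
import re

# Hypothetical closed set: the real §7.1 list lives in the contract, not here.
ALLOWED_FLAGS = {
    "agent-protocol/example-flag": {"on", "off"},
    "agent-protocol/example-mode": {"strict", "lenient"},
}
MAX_FLAG_LEN = 96  # cap named in finding 1
SAFE_CHARS = re.compile(r"[A-Za-z0-9._/=-]+")  # assumed charset


def validate_flag(entry: str) -> bool:
    """Return True only for a known key with one of its bounded values."""
    if len(entry) > MAX_FLAG_LEN or not SAFE_CHARS.fullmatch(entry):
        return False
    key, sep, value = entry.partition("=")
    if not sep:
        return False  # assumed form is key=value
    # Unknown keys map to an empty set, so they are always rejected.
    return value in ALLOWED_FLAGS.get(key, set())
```

Rejecting anything outside the closed set, rather than sanitizing it, keeps the reader fail-safe: an unexpected flag is dropped instead of being passed into the orientation payload.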
KEY CROSS-CUTTING FINDING (finding 5; P2/O4 input): §7.1 (line ~136) states the
state-file producer is "specified in SI-2", but the G4 decision already in
the manifest/ROADMAP defers the S25 producer OUT of SI-2 (orient.sh is a
fail-safe-empty reader). The shipped canonical contract therefore promises
something SI-2 will not deliver. Maintainer must resolve at O4 — recommend
keep G4 + schedule a later-slice contract follow-up + extend
residual_risks[0] to name the line-136 claim.
P1 disposition recommendations (for the maintainer at the P2 gate): O4
approve-as-reader + resolve finding 5; O6 approve only if real-capture
feasible; O8 accept-concept + author distinct §1.9.7; O9 punt (verified).
No implementation, no canonical edits, no decisions resolved here.
Verification: check-internal-links exit 0; validate-change-manifests (root)
exit 0; generate-section-index --check exit 0; make verify-docs / verify-local
exit 0. Changed: ROADMAP.md (P1 row) + new evidence note.
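The verification line above reports subprocess return codes for each gate. A minimal sketch of collecting those codes in one place is shown here; `run_gates` and `all_passed` are hypothetical helpers, not scripts from the repo, and the gate commands are supplied by the caller.

```python
import subprocess


def run_gates(gates):
    """Run each gate command and return {gate name: return code}."""
    results = {}
    for name, argv in gates.items():
        completed = subprocess.run(argv, capture_output=True, text=True)
        results[name] = completed.returncode
    return results


def all_passed(results):
    """A gate set passes only if every command exited 0."""
    return all(code == 0 for code in results.values())
```

In the repo this might be called with entries such as `{"verify-docs": ["make", "verify-docs"]}`; recording the code per gate, rather than a single pass/fail, keeps the evidence repeatable.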
…roducer-tension finding to all 3 canonical sites (L2)

Addresses the two fix-first findings from the maintainer's P1 review, so the P1 gate is not carried into P2 as a weak record.

L1 (ROADMAP P1 row was non-repeatable):
- Gate column: replaced the bare "Read-only; no edits to canonical files" with exact repeatable checks: `git show --stat a8d603a` changes only ROADMAP.md + the P1 evidence note (no docs/ / AGENTS.md / schemas/ / skills/ path ⇒ zero canonical edits); check-internal-links.py exit 0; make verify-docs exit 0; the evidence note enumerates all four P1 sources. (Verified in-commit: a8d603a touches exactly those 2 files, 0 canonical-path edits.)
- Artifact column: now names the actual P1 artifact (the evidence note + this ROADMAP row), not "Manifest assumptions + sot_map".
- Commit column: concrete SHA `a8d603a` (P1 investigation commit) + branch, replacing the bare "_(P1 investigate branch)_" placeholder.

L2 (producer-tension finding was too narrow): the §5 finding + the ROADMAP P1 Notes told P2 to correct only the §7.1 line-~136 claim, but the same "specified in SI-2" promise appears in THREE canonical places:
- docs/runtime-hook-contract.md §Category E (~L136)
- the same file's §Requirements-on-runtime-bridges item 6 "Category E state-source ownership" (~L416)
- docs/runtime-hook-threat-model.md §1.9.2 "Poisoned state file" (~L192)

All three must be corrected together when the producer lands; fixing only one leaves stale canonical claims. Broadened the evidence note §5 (opening finding, option (b), and the O4 summary-table row) and the ROADMAP P1 Notes to name all three sites.

Still read-only: no canonical-file edits in this commit either (only ROADMAP.md + the P1 evidence note).

Verification: check-internal-links exit 0; validate-change-manifests (root) exit 0; generate-section-index --check exit 0; make verify-docs / verify-local exit 0.

Maintainer dispositions O4/O6/O8/O9 received; recorded into the P2 plan-lock in the following commit.
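Because the same "specified in SI-2" promise sits in three places, a phrase scan over all canonical documents is one way to avoid fixing only one of them. The sketch below works on already-loaded text; `stale_lines` and `scan` are hypothetical names, and it assumes the phrase is not broken across a line wrap.

```python
STALE_PHRASE = "specified in SI-2"


def stale_lines(text, phrase=STALE_PHRASE):
    """Return the 1-based line numbers whose text contains the phrase."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if phrase in line
    ]


def scan(documents, phrase=STALE_PHRASE):
    """Map each document name to its hit lines, omitting clean documents."""
    report = {}
    for name, text in documents.items():
        hits = stale_lines(text, phrase)
        if hits:
            report[name] = hits
    return report
```

An empty report after the producer lands would indicate that all sites were corrected together; any remaining entry names the file and line still carrying the stale claim.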
Maintainer dispositioned all four open decisions in the P1 review; this commit records them into the planning artifacts (ROADMAP decisions list + SI-2 manifest). No canonical-file edits, no implementation: Planner-phase plan-lock only.

O4 (orient.sh): approve the fail-safe-empty reader; KEEP G4 (do NOT pull S25 back into SI-2). Per the maintainer, the producer-tension record must not be framed as only a line-~136 issue: residual_risks[0] now names all THREE canonical sites carrying the stale "specified in SI-2" promise (runtime-hook-contract.md §Category E ~L136 + §Requirements-on-runtime-bridges item 6 ~L416 + runtime-hook-threat-model.md §1.9.2 ~L192) and the planned correction path (corrected together, to "specified in a later SI slice", in the slice that lands the producer; SI-2 makes no canonical edit).

O6 (S26 fixtures): approve REAL captured fixtures only; no hand-written ABI fixtures. If no runtime sandbox / donated fixture exists for a (runtime, trigger) pair, narrow b3 to capturable pairs and explicitly defer the rest. Recorded in the S26 assumption's validation_plan. The actual per-runtime matrix lock is the remaining P2 step (conditional on capture feasibility), surfaced to the maintainer.

O8 (9th mode): accept the concept; §1.9.7 prose MUST carry an explicit "distinct from §1.9.2 / §1.9.4 / §1.9.5 because…" clause. Recorded in the b2 scope + the ROADMAP decisions list.

O9 (validator): PUNT, confirmed. SI-2 ships no real x-agent-protocol consumer file, so the §7.2 validator-landing trigger is not met. No b5 slice; SI-2 stays at 4 task_slices (b1-b4). The O9 assumption + validation_plan now record this as DECIDED, not recommended.

ROADMAP: the O4/O6/O8 (line 80) and O9 (line 81) decision entries flip from "pending" to dispositioned; the SI-2 P2 row flips ⏸ paused → ⏳ in_progress with the remaining P2 step named (per-runtime fixture/wrapper matrix lock vs runtime-surface-deviations.md, conditional on O6 capture feasibility). P4 still needs implementation authorization + a distinct Implementer identity.

Verification (subprocess return codes; YAML re-validated after an inner-quote fix: a first attempt put unescaped double-quotes inside the double-quoted residual_risks scalar and the schema gate caught it):
- validate-change-manifests.py --manifest <SI-2>: exit 0 (b2 scope 795/800)
- validate-change-manifests.py (root): exit 0
- generate-section-index.py --check: exit 0
- make verify-docs / verify-local: exit 0
- full validator agent_protocol_validate on SI-2: only the F1-waived sot.source_file_missing remains
- O9=punt structurally confirmed: 4 task_slices, no b5
- changed files: 2 (SI-2 manifest, ROADMAP.md)
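The two structural claims in the verification list (4 task_slices with no b5, and a scope under the 800 cap) could be asserted against a loaded manifest along these lines. This is a sketch under assumptions: the `id` and `scope` keys and the character-count reading of "795/800" are guesses at the manifest shape, not taken from the schema.

```python
SCOPE_CAP = 800  # cap implied by the "795/800" figure above


def check_slices(manifest, expected_ids=("b1", "b2", "b3", "b4")):
    """Return a list of problems; an empty list means the structure holds."""
    problems = []
    slices = manifest.get("task_slices", [])
    ids = [item.get("id") for item in slices]
    if ids != list(expected_ids):
        problems.append(f"unexpected slice ids: {ids}")
    for item in slices:
        length = len(item.get("scope", ""))
        if length > SCOPE_CAP:
            problems.append(f"{item.get('id')}: scope {length}/{SCOPE_CAP}")
    return problems
```

A stray b5 slice, or a scope that has grown past the cap, would then surface as a named problem rather than a silent drift from the O9 punt.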
…rom runtime-surface-deviations.md §3 (capture = P4 gate)
Per the maintainer's Option-1 disposition: lock the supported (runtime,
trigger) matrix now as P2 plan input from the SoT (docs/runtime-surface-
deviations.md §3), with fixture capture as a P4 EXECUTION GATE — not a
P2 claim that every supported pair will be fixtured.
New artifact: evidence/2026-05-31-ecc-pr2-si2-category-e-implementation/
p2-fixture-wrapper-matrix.md. For each (runtime, trigger) pair it records
the §3 classification (wrapper-only / unsupported / no-op / docs-only /
n/a / command), capture_required, and the P4 disposition (captured
fixture OR explicit deferral — never hand-written, per O6).
Locked from §3 (column order Claude Code / Codex / Cursor / Gemini CLI
legacy / Antigravity / Windsurf):
- b3 capture-required candidates: claude-code {session-start, session-end,
pre-compact, post-compact}; codex {session-start, pre-compact,
post-compact, turn-stop(JSON)}; gemini-cli-legacy {4 lifecycle} IFF S18
smoke confirms. Excluded: codex session-end (unsupported), Cursor/
Windsurf lifecycle (no-op), Antigravity (docs-only), all manual /orient
(command — exercised by b1 --selftest, not a b3 fixture).
- b1 wrapper coverage: Claude Code + Codex ship; gemini S18-smoke-gated;
Cursor/Windsurf no-op (no wrapper); Antigravity docs-only (no wrapper).
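The locked matrix and its "captured fixture OR explicit deferral" rule can be pictured as a small lookup. The sketch below encodes only the pairs and exclusions listed above; the lowercase runtime identifiers for Cursor, Windsurf and Antigravity are assumed spellings, `p4_disposition` is a hypothetical helper, and the smoke-gated gemini column is left out on purpose.

```python
# Capture-required candidates as listed in the commit message above.
CAPTURE_CANDIDATES = {
    "claude-code": {"session-start", "session-end", "pre-compact", "post-compact"},
    "codex": {"session-start", "pre-compact", "post-compact", "turn-stop"},
}
# Exclusions with the reason stated in the matrix.
EXCLUDED_PAIRS = {("codex", "session-end"): "unsupported"}
EXCLUDED_RUNTIMES = {"cursor": "no-op", "windsurf": "no-op", "antigravity": "docs-only"}


def p4_disposition(runtime, trigger, captured):
    """Classify a pair as captured, deferred, or excluded (never hand-written)."""
    if trigger == "manual":
        return "excluded (command)"  # exercised by b1 --selftest instead
    if runtime in EXCLUDED_RUNTIMES:
        return f"excluded ({EXCLUDED_RUNTIMES[runtime]})"
    if (runtime, trigger) in EXCLUDED_PAIRS:
        return f"excluded ({EXCLUDED_PAIRS[(runtime, trigger)]})"
    if trigger in CAPTURE_CANDIDATES.get(runtime, set()):
        return "captured" if (runtime, trigger) in captured else "deferred"
    return "excluded (not in locked matrix)"
```

Note that a candidate without a real capture resolves to "deferred", not to a failure, which is the distinction the P4 execution gate relies on.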
Two items explicitly carried to P4 capture (recorded in the note §C):
- D1: §3 classifies codex session-start wrapper-only, but the S26 README
planned pack omitted it. §3 is the SoT and is locked; P4 capture must
confirm Codex emits session-start or record an explicit deferral +
reconcile the README.
- S18: the entire gemini column (fixtures + wrapper) is smoke-gated; if
S18 is not confirmed at P4, gemini is deferred explicitly.
Manifest: the S26-fixture and wrapper-coverage assumptions' validation_plans
now point at the locked matrix note and state capture is a P4 gate (S26
validation_plan rewritten concisely to 569/800 after it accumulated over
the 800 cap). ROADMAP SI-2 P2 row flipped ⏳ in_progress -> ✅ passed; its
Gate column sharpened to a repeatable check (validate-change-manifests.py
--manifest exit 0; 4 task_slices, no b5 = O9 punt reflected; matrix locked
in the note) per the L1 auditability lesson. last_updated set to real
clock time (last_updated <= commit time).
Verification (subprocess return codes):
- validate-change-manifests.py --manifest <SI-2>: exit 0
- validate-change-manifests.py (root): exit 0
- check-internal-links.py: exit 0 (matrix-note + ROADMAP pointers resolve)
- generate-section-index.py --check: exit 0
- make verify-docs / verify-local: exit 0
- full validator agent_protocol_validate on SI-2: only F1-waived
sot.source_file_missing remains
- SI-2 stays at 4 task_slices (O9 punt); P2 row = ✅ passed
- changed: SI-2 manifest, ROADMAP.md, new P2 matrix evidence note
…e handoff/blocker text (M2) + make P2 row auditable (M3)

Three fix-first findings from the maintainer's review of 3575e33.

M1 (b3 gate contradicted the captured-or-deferred model):
- The b3 scope/review_plan + check-source-verification-flipped.sh UNCONDITIONALLY required real fixtures for every pair and S26/S27 flipped off pending. Under P2's "captured OR explicit deferral" model a deferred pair would either fail the gate or force false evidence. Reworked b3 to the consistency model: capture the capturable pairs, record an explicit deferral for the rest (no hand-written fixtures), and update S26/S27 (pipe rows + §2.6.f/§2.6.g detail) to the captured-or-deferred reality: confirmed ONLY for captured pairs, deferred-with-reason for the rest, never falsely confirmed. The planned check-source-verification-flipped.sh now asserts the S26/S27 ledger is CONSISTENT with the committed fixtures + deferral records (not "all off pending"). b3 scope 770/800.
- The surfaces_touched note claimed S26 fixtures cover "all six lifecycle triggers including manual"; manual is a user-invoked /orient command (no runtime event ⇒ no S26 fixture; exercised by b1 --selftest). Corrected to the five runtime-event triggers per the P2 matrix, manual explicitly excluded.

M2 (stale text after the ed82482 dispositions):
- manifest handoff_narrative said O4/O6/O8/O9 "all are pending" → now records them DISPOSITIONED (O4 keep-G4 reader + 3-site tension; O6 real-captured-only; O8 accept + §1.9.7 distinctness; O9 punt) with the real remaining gates (P3/P4 authorization, P4 capture feasibility, distinct Implementer identity).
- ROADMAP P4 row said "Blocked on O4/O6/O8/O9 disposition" → now "dispositioned at P2; P4 gated on implementation authorization + capture feasibility + distinct Implementer identity."

M3 (P2 ROADMAP row auditability, ROADMAP §34 rule):
- Commit column: placeholder → concrete SHAs `ed82482` (dispositions) + `3575e33` (matrix lock) + review-fix follow-ups on the branch.
- Artifact column: added the P2 matrix note.

Verification (subprocess return codes):
- validate-change-manifests.py --manifest <SI-2>: exit 0
- validate-change-manifests.py (root): exit 0
- check-internal-links.py / generate-section-index.py --check: exit 0
- make verify-docs / verify-local: exit 0
- full validator agent_protocol_validate on SI-2: only F1-waived sot.source_file_missing remains
- residual stale-text scans = 0 ("all are pending", S26-covers-manual, P4 "Blocked on O4", b3 unconditional off-pending)
- changed: SI-2 manifest, ROADMAP.md
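The consistency model described in M1 (confirmed only for captured pairs, deferred-with-reason for the rest) can be sketched as a comparison of three records. This is an illustration in Python, not the planned check-source-verification-flipped.sh: the ledger, fixture, and deferral data shapes and the status strings are all assumptions.

```python
def ledger_inconsistencies(ledger, captured, deferrals):
    """Compare ledger statuses against committed fixtures and deferral records.

    ledger:    {(runtime, trigger): "confirmed" | "deferred"}
    captured:  set of (runtime, trigger) pairs with a committed fixture
    deferrals: {(runtime, trigger): reason string}
    """
    problems = []
    for pair, status in sorted(ledger.items()):
        if status == "confirmed":
            if pair not in captured:
                problems.append(f"{pair}: confirmed without a captured fixture")
        elif status == "deferred":
            if pair in captured:
                problems.append(f"{pair}: deferred but a fixture is committed")
            elif not deferrals.get(pair):
                problems.append(f"{pair}: deferred without a recorded reason")
        else:
            problems.append(f"{pair}: unexpected status {status!r}")
    for pair in sorted(captured - set(ledger)):
        problems.append(f"{pair}: fixture committed but missing from ledger")
    return problems
```

Under this model a deferred pair passes as long as its reason is recorded, so the gate no longer forces either a failure or false evidence.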
…ip P3 row ✅ (last Planner phase)

Maintainer reviewed a0481a1 (no findings) and said continue. P3 is the last Planner-phase task: read-only, no canonical edits, no implementation.

Deliverable: evidence/2026-05-31-ecc-pr2-si2-category-e-implementation/p3-test-plan.md, the executable verification matrix, consolidating the gates already declared in the task_slice verification_commands into a completeness-checked plan:
- Per-slice gates (P4):
  - b1 = make verify-hooks-shellcheck + scripts/orient.sh --selftest (synthetic deterministic-core: CLI shape, fail-safe-empty, turn-stop JSON-only, §7.1 7-field whitelist under the 6 triggers)
  - b2 = check-internal-links + generate-section-index --check + check-threat-model-additive.sh + grep §1.9.7 present
  - b3 = check-paths-exist + validate-runtime-event-fixtures.py --check + check-source-verification-flipped.sh (ledger consistent with captured-or-deferred)
  - b4 = verify-hooks-shellcheck + anchored grep orient.sh in Makefile AND validate.yml + generate-changelog-json.py --check
- Cross-cutting (P4→P5→P6): validate-change-manifests.py --manifest; make verify-docs / verify-local; check-role-consistency.py; full agent_protocol_validate.
- Completeness: 4/4 task_slices have a verification_command; 4/4 evidence_plan entries map to ≥1 gate (system_interface→b1, process→b2, operational→b3+b4); no SI-2 surface ungated.

Forward-obligations P3 surfaced for P4 (recorded in the note + the P3 ROADMAP Notes):
- Create the 3 planned helper scripts or their gates fail.
- REMOVE the now-moot F1 waiver once scripts/orient.sh lands; the sot.source_file_missing finding disappears, so a live waiver for a satisfied condition would be stale evidence. (A genuine gate-lifecycle obligation the test plan must hand to the implementer.)
- Resolve capture feasibility / gemini S18 smoke / codex-session-start §3-vs-README at P4 execution.

ROADMAP P3 row flipped ⏸ paused -> ✅ passed, made auditable from the start (per the L1/M3 lessons): exact repeatable Gate checks, the test-plan note as Artifact, _(this change)_ commit per the repo's self-reference convention.

Verification (subprocess return codes):
- validate-change-manifests.py --manifest <SI-2> + root: exit 0
- check-internal-links.py: exit 0 (P3 note + ROADMAP pointers resolve)
- generate-section-index.py --check: exit 0
- make verify-docs / verify-local: exit 0
- changed: ROADMAP.md + new P3 test-plan note (no manifest edit, no canonical edit)
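The completeness claim in the test plan (every task_slice has a verification command, and every evidence kind maps to at least one gated slice) is the sort of check that can be recomputed mechanically. A minimal sketch follows, assuming a list-of-dicts shape for slices with `id` and `verification_commands` keys; `completeness_gaps` is a hypothetical helper.

```python
def completeness_gaps(task_slices, evidence_to_slices):
    """Return gaps: slices with no gate, and evidence kinds with no gated slice."""
    gated = {item["id"] for item in task_slices if item.get("verification_commands")}
    gaps = [
        f"{item['id']}: no verification command"
        for item in task_slices
        if item["id"] not in gated
    ]
    for kind, slice_ids in evidence_to_slices.items():
        if not any(slice_id in gated for slice_id in slice_ids):
            gaps.append(f"{kind}: no gated slice")
    return gaps
```

With the mapping stated above (system_interface to b1, process to b2, operational to b3 and b4), an empty result corresponds to "no SI-2 surface ungated".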
…at P3 is passed

Fix-first finding from the maintainer's review of c3e6b2a. When P3 was flipped to ✅ passed (ROADMAP + the new p3-test-plan.md), the manifest handoff_narrative was left stale: it still listed "explicit P3/P4 authorization" among the remaining gates, implying P3 still needs authorization. That would mislead the next session / Implementer.

This is the same cross-cutting-consumer drift I committed to avoiding: when a phase state changes, all its downstream consumers must move in the same commit. I flipped the P3 row but did not re-sweep the handoff. Caught on review; fixed now and re-swept (the only stale P3-as-pending-gate reference was this one; residual scans: "P3/P4 authorization" 0, "explicit P3" 0).

Change: handoff "Remaining gates: explicit P3/P4 authorization, P4 fixture-capture feasibility, and a distinct Implementer identity" -> "Remaining gates: explicit P4 implementation authorization, P4 fixture-capture feasibility, and a distinct Implementer identity". This matches ROADMAP P3 (✅ passed), ROADMAP P4 (gated on P4 authorization + capture feasibility + distinct Implementer identity), and the P3 note. Bumped last_updated 12:01:19 -> 12:52:59 +08:00 (real clock; <= commit time).

Verification (subprocess return codes):
- validate-change-manifests.py --manifest <SI-2> + root: exit 0
- check-internal-links.py / generate-section-index.py --check: exit 0
- make verify-docs / verify-local: exit 0
- full validator agent_protocol_validate on SI-2: only F1-waived sot.source_file_missing remains
- changed: SI-2 manifest only (handoff line + last_updated)
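The "last_updated <= commit time" rule is easy to get wrong when the two timestamps carry different UTC offsets. A minimal sketch of the comparison is below, assuming both values are ISO 8601 strings with explicit offsets; `last_updated_ok` is a hypothetical helper and the date in the usage note is a placeholder, not the real commit date.

```python
from datetime import datetime


def last_updated_ok(last_updated, commit_time):
    """True if last_updated is not later than the commit time.

    Both inputs must carry a UTC offset; comparing an offset-aware value
    with a naive one raises TypeError, which is the desired loud failure.
    """
    return datetime.fromisoformat(last_updated) <= datetime.fromisoformat(commit_time)
```

For example, `last_updated_ok("2000-01-01T12:52:59+08:00", "2000-01-01T04:53:10+00:00")` holds, because 12:52:59 at +08:00 is 04:52:59 UTC.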
Summary
This PR completes the planning phases (P1 Investigation, P2 Plan-lock, P3 Test plan) for SI-2 Category-E implementation. It adds three evidence documents that surface the §7.1 contract requirements, per-runtime fixture/wrapper coverage, and executable verification gates. The manifest is updated to reflect maintainer dispositions on O4/O6/O8/O9 and to extend the residual-risk record with the three canonical sites that must be corrected when the S25 state-file producer lands.
Scope classification
Surfaces touched
Rationale:
Source of Truth impact
- ROADMAP.md (SI-2 row updated with P1–P3 completion and maintainer dispositions), .github/scripts/change-manifest.ecc-pr2-si2-category-e-implementation.yaml (manifest updated with P1–P3 evidence, O9 disposition confirmed, residual-risks extended).
- evidence/2026-05-31-ecc-pr2-si2-category-e-implementation/p1-investigation.md, p2-fixture-wrapper-matrix.md, p3-test-plan.md.
- No edits to docs/, AGENTS.md, schemas, or skills. The three canonical sites that require future correction (§Category E ~L136, §Requirements-on-runtime-bridges item 6 ~L416, §1.9.2 ~L192) are identified in the residual-risks entry but not edited in this PR.
Evidence
P1 Investigation (p1-investigation.md):
P2 Plan-lock (p2-fixture-wrapper-matrix.md):
https://claude.ai/code/session_01CddK7uiWupUh6ydxPJYvpP