Codex/ecc si2 p4 implementation#27
Conversation
Implements the b1 fail-safe-empty Category E reader at scripts/orient.sh. The core emits the §7.1 JSON shape, filters the seven whitelisted orientation fields, validates hook_state_flags, reads optional state without writing it, degrades missing/corrupt state to an empty payload, and forces Codex turn-stop hook output to JSON. This keeps the G4/S25 boundary intact: no state-file producer is added. P4 is opened on codex/ecc-si2-p4-implementation with Implementer identity ecc-pr2-implementer-identity-1. The manifest phase flips to implement, the Implementer author is recorded, b1 evidence is collected, and the F1 sot.source_file_missing waiver is removed because scripts/orient.sh now exists. Makefile and validate.yml get the explicit scripts/orient.sh shellcheck row, and verify-local now runs orient.sh --selftest so the b1 gate validates the actual script. Verification (subprocess return codes): SI2_SCHEMA=0; ROOT_MANIFESTS=0; INTERNAL_LINKS=0; SECTION_INDEX=0; CHANGELOG_JSON=0; VERIFY_DOCS=0; VERIFY_LOCAL=0 (with existing advisory that [Unreleased] is empty); FULL_VALIDATOR=0; B1_GATE make verify-hooks-shellcheck=0 and bash scripts/orient.sh --selftest=0.
Adds docs/runtime-hook-threat-model.md §1.9.7 for orientation-payload context injection with the required distinctness clause against §1.9.2, §1.9.4, and §1.9.5. This is additive to the protected threat-model bodies and keeps the §7.1 contract plus bridge item 6 untouched. Adds .github/scripts/check-threat-model-additive.sh so the b2 gate can compare §1.1-§1.8 and §1.9.1-§1.9.6 byte-for-byte against origin/main. The visible Category-E failure-mode lists and docs section index are updated, and the manifest records b2 evidence under Implementer identity ecc-pr2-implementer-identity-1. Verification (subprocess return codes): SI2_SCHEMA=0; ROOT_MANIFESTS=0; INTERNAL_LINKS=0; SECTION_INDEX=0; CHANGELOG_JSON=0; VERIFY_DOCS=0; VERIFY_LOCAL=0 (with existing advisory that [Unreleased] is empty); FULL_VALIDATOR=0; DIFF_CHECK=0; B2_GATE check-internal-links=0, generate-section-index --check=0, check-threat-model-additive=0, §1.9.7 grep=0.
Resolves the S26 runtime-event ABI fixture slice without fabricating fixtures. No real runtime hook-input payloads or maintainer-donated captures were available in this P4 workspace, so tests/fixtures/runtime-events/deferrals.json records every P2 candidate pair as deferred with reason and README.md now states captured=0/deferred=12. Adds validate-runtime-event-fixtures.py for hook-input fixture validation and check-source-verification-flipped.sh for S26/S27 ledger consistency. docs/source-verification-notes.md flips S26/S27 to captured-or-deferred reality, including codex session-start and full gemini S18 deferral, while never claiming repo-smoke-confirmed for uncaptured pairs. Verification (subprocess return codes): SI2_SCHEMA=0; ROOT_MANIFESTS=0; INTERNAL_LINKS=0; SECTION_INDEX=0; CHANGELOG_JSON=0; VERIFY_DOCS=0; VERIFY_LOCAL=0 (with existing advisory that [Unreleased] is empty); FULL_VALIDATOR=0; PY_COMPILE=0; DIFF_CHECK=0; B3_GATE check-paths-exist=0, validate-runtime-event-fixtures=0, check-source-verification-flipped=0.
Completes the operational b4 slice. The explicit scripts/orient.sh shellcheck row and verify-local selftest wiring were landed in b1 so the new reader was linted and exercised immediately; this commit records that evidence in the manifest and adds [Unreleased] CHANGELOG entries for b1, b2, and b3. Regenerates CHANGELOG.json from CHANGELOG.md. The b4 gate confirms Makefile and validate.yml contain exact scripts/orient.sh rows rather than a glob, and make verify-local now reports [Unreleased] has content yes instead of the prior advisory. Verification (subprocess return codes): SI2_SCHEMA=0; ROOT_MANIFESTS=0; INTERNAL_LINKS=0; SECTION_INDEX=0; CHANGELOG_JSON=0; VERIFY_DOCS=0; VERIFY_LOCAL=0; FULL_VALIDATOR=0; DIFF_CHECK=0; B1_SELFTEST=0; B3_GATE check-paths-exist=0, validate-runtime-event-fixtures=0, check-source-verification-flipped=0; B4_GATE make verify-hooks-shellcheck=0, Makefile grep=0, validate.yml grep=0, generate-changelog-json --check=0.
Closes Phase 4 only and advances the SI-2 manifest to phase=review with status still in_progress. The handoff narrative records the actual P4 delivery: b1 orient.sh reader and F1 waiver removal, b2 §1.9.7 plus additive checker, b3 captured=0/deferred=12 runtime-event fixture reality, and b4 explicit orient.sh CI wiring plus CHANGELOG artifacts. Updates ROADMAP P4 from in_progress to passed with the concrete b1-b4 commit chain and the repeatable final gate. P5/P6 are intentionally not attempted here; the next Reviewer identity must differ from both ecc-pr2-planner-identity-1 and ecc-pr2-implementer-identity-1. Verification (subprocess return codes): SI2_SCHEMA=0; ROOT_MANIFESTS=0; INTERNAL_LINKS=0; SECTION_INDEX=0; CHANGELOG_JSON=0; VERIFY_DOCS=0; VERIFY_LOCAL=0; FULL_VALIDATOR=0; DIFF_CHECK=0; B1_GATE=0; B2_GATE=0; B3_GATE=0; B4_GATE=0.
Persists the P5 read-only audit by distinct Reviewer identity ecc-pr2-reviewer-identity-1 (Claude Opus 4.8 1M context) into the SI-2 manifest: six review_notes (five pass + one pass_with_followup) and one AI approval, plus the reviewer added to authors[]. Verdict PASS_WITH_FOLLOWUP. The audit confirmed O4 reader-only + the three-site producer tension preserved (threat-model §1.9.2 byte-identical to origin/main), O6 captured=0/deferred=12 no-fabrication, O8 additive §1.9.7 with the distinctness clause, O9 punt (4 task_slices, no b5), and the §7.1 contract surface invariants; all 17 gates reproduced exit 0. Cross-verified independently by an external Codex review. ROADMAP SI-2 P5 row flips to passed with the review_notes/approvals artifact, and the Status + Identity-sequence lines record the now-assigned distinct Reviewer identity. No implementation files are edited in this persistence step: orient.sh, the threat model, the fixtures, and the check-paths-exist.sh:127 placeholder label are all untouched. The single non-blocking pass_with_followup (that stale label) is recorded, not fixed. Manifest phase stays review / status in_progress (pass_with_followup does not advance to signoff). P6 human sign-off and any version bump remain pending. No push. Verification (subprocess return codes): SCOPED_SCHEMA=0; ROOT_RESOLVER=0; FULL_VALIDATOR=0 (findings: []); INTERNAL_LINKS=0; DIFF_CHECK=0; changed files = manifest + ROADMAP only.
Manifest-only fix-up for the P5 persistence commit (259d69b), which updated review_notes / approvals / authors[] / last_updated and ROADMAP P5 but left the handoff_narrative still claiming "ready for Phase 5 review", "Reviewer remains unassigned", and "P5/P6 are NOT attempted" — contradicting the same file's recorded P5 audit and the ROADMAP P5=passed row. External Codex review flagged the inconsistency. Updates the narrative to state P5 PASSED (PASS_WITH_FOLLOWUP) by distinct Reviewer identity ecc-pr2-reviewer-identity-1, names the one non-blocking follow-up (the unfixed check-paths-exist.sh:127 placeholder label), and records that P6 human sign-off + version bump remain pending. No state field changed: phase stays review / status in_progress. The b1-b4 description and O4/O6/O8/O9 dispositions are preserved verbatim. Scope: manifest handoff_narrative only. No implementation files, no ROADMAP, no check-paths-exist.sh label edit, no P6, no push. Verification (subprocess return codes): SCOPED_SCHEMA=0; ROOT_RESOLVER=0; FULL_VALIDATOR=0 (findings: []); DIFF_CHECK=0; stale-phrase grep = 0 matches; changed files = manifest only.
The closed/accepted RFC-026 root manifest (shipped v1.51.0) left all three residual_risks[].accepted_by at the "unaccepted" sentinel even though it was signed off with a recorded Reviewer approval whose conditions enumerate the residual risk. The completion-audit stop hook binds to the only root-level change-manifest*.yaml, so it blocked every session's stop with completion.unaccepted-risk on this stale, internally-inconsistent record. Flips the three accepted_by fields to the P5 Reviewer of record (rfc-026-reviewer-identity-3, Reviewer Meridian); the value points to approvals[0].conditions, so it records the acceptance that already happened at P5 rather than backdating a new one. Bumped last_updated. No state change: phase stays signoff / status stays accepted; residual_risks descriptions, severities, and mitigations are unchanged. Scope: residual_risks[].accepted_by + last_updated only. Unrelated working-tree changes (the SI-2 check-paths-exist.sh label edit and the untracked docs/proposals/* drafts) are deliberately excluded. Verification (subprocess return codes): completion-audit.sh = 0 (was 1); unaccepted-risk count 3 -> 0; validate-change-manifests.py = 0; direct Draft 2020-12 schema validation = 0 errors. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 89a3223d74
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| rule = ALLOWED_FLAGS.get(name) | ||
| if rule is None: | ||
| return sep == "" |
There was a problem hiding this comment.
Drop unknown agent-protocol flags explicitly
When the state file contains an undeclared no-value flag such as agent-protocol/not-allowed, ALLOWED_FLAGS.get(name) returns None, the same value used for known no-value flags, so valid_flag() returns true and propagates arbitrary methodology-namespace tags into the Category E payload. This violates the closed-set rule in docs/runtime-hook-contract.md for hook_state_flags, and it matters because this payload can enter model context from a poisoned or stale state file.
Useful? React with 👍 / 👎.
| return | ||
| try: | ||
| repo = Path.cwd().resolve() | ||
| resolved = (repo / candidate).resolve(strict=False) |
There was a problem hiding this comment.
Reject unresolved manifest paths
For a poisoned or stale state file that names a repo-relative path that does not exist, resolve(strict=False) succeeds and the hook emits that path as active_manifest_path instead of null plus agent-protocol/manifest-rejected:unresolved. The contract says unknown/unavailable active manifests must not be guessed, and the threat model lists unresolved as a rejected path reason, so this lets an attacker or stale producer steer the agent toward a fabricated manifest path.
Useful? React with 👍 / 👎.
| for key in ("current_phase", "current_mode", "roadmap_row_id", "roadmap_row_state"): | ||
| value = source.get(key) | ||
| orientation[key] = value if isinstance(value, str) else None |
There was a problem hiding this comment.
Bound phase and roadmap fields before emitting
If the runtime-owned state file is poisoned or stale, these fields are copied as arbitrary strings, including newlines, ANSI/control characters, or instruction-shaped text, and then emitted into the Category E payload that may enter model context. This undercuts the new §1.9.7 mitigation that the core keep orientation values free of free-text instruction channels; at minimum these phase/mode/state/id fields need closed enums or the same kind of length/charset filtering used for branches and flags.
Useful? React with 👍 / 👎.
| raw_flags = source.get("hook_state_flags", []) | ||
| if isinstance(raw_flags, list): | ||
| for item in raw_flags: | ||
| if valid_flag(item): | ||
| add_flag(flags, item) | ||
| add_flag(flags, state_mtime_flag(state_path)) |
There was a problem hiding this comment.
Do not trust state-provided mtime flags
When a poisoned or stale state file already contains a syntactically valid agent-protocol/state-mtime:<timestamp>, the loop accepts it before appending the real file mtime, so the payload can contain a forged future freshness signal alongside the actual one. Since §1.9.1 relies on the core stamping state-mtime as the stale-orientation mitigation, state-sourced state-mtime should be ignored or replaced rather than propagated from untrusted input.
Useful? React with 👍 / 👎.
| # explicitly does not equal full CI parity). | ||
| run: | | ||
| shellcheck -s sh \ | ||
| scripts/orient.sh \ |
There was a problem hiding this comment.
Run the runtime-event fixture verifier in CI
I checked .github/workflows/validate.yml and Makefile: this change only adds scripts/orient.sh to shellcheck/selftest, while the new validate-runtime-event-fixtures.py and check-source-verification-flipped.sh gates are only referenced from documentation/manifest text. That means future invalid or hand-written runtime-event fixtures, or stale S26/S27 deferrals, can still pass the repository's normal CI/local gates unless a reviewer remembers to run the README commands manually.
Useful? React with 👍 / 👎.
Summary
Scope classification
Surfaces touched
Rationale for the surfaces checked (one sentence each is enough):
Source of Truth impact
docs/…/schemas/…/skills/…docs/…,skills/engineering-workflow/templates/…,reference-implementations/…Evidence
python3 .github/scripts/validate-schema-syntax.pypython3 .github/scripts/validate-templates.pysh .github/scripts/check-version-consistency.shsh reference-implementations/hooks-claude-code/selftests/selftest.sh(if hook bundle touched)pytestinreference-implementations/validator-python/(if validator touched)docs/diagrams.mdtouched)Evidence artifacts (paste relevant output or link to CI run):
Merge readiness and branch protection
not run(state why)python3 .github/scripts/verify-branch-protection.py --selftestpython3 .github/scripts/verify-branch-protection.py --live --repo <owner/name> --branch mainBreaking-change level
If ≥ L1, link the migration guidance (usually
CHANGELOG.mdunderChanged/Removedor a dedicated doc section).Rollback mode
Cross-cutting concerns
Only check items that changed. Leave unchecked if this PR does not touch them.
reference-implementations/hooks-*, CI workflow, or anything in.github/. Confirm no untrusted event inputs, no side effects, no network-in-hook.Change Manifest (Full mode)
Checklist before requesting review
CLAUDE.md§5 observed: if a cross-cutting term changed, all consumers updated.By submitting this PR you confirm: the changes are your own, MIT-compatible, and free of secrets or proprietary content.