Skip to content

Codex/ecc si2 p4 implementation#27

Closed
EsatanGW wants to merge 8 commits into
mainfrom
codex/ecc-si2-p4-implementation
Closed

Codex/ecc si2 p4 implementation#27
EsatanGW wants to merge 8 commits into
mainfrom
codex/ecc-si2-p4-implementation

Conversation

@EsatanGW
Copy link
Copy Markdown
Owner

Summary

Scope classification

  • Lean — single surface, ≤ 1 consumer, no public behavior change.
  • Full — multiple surfaces, public behavior change, new contract, or any breaking change.
  • Trivial — typo / comment / single-file doc fix / chore.

Surfaces touched

  • User surface (UI, copy, routes, a11y, i18n)
  • System-interface surface (schemas, APIs, event shapes, public contracts, hook contracts)
  • Information surface (config, flags, enums, validation rules)
  • Operational surface (CI, release process, runbook, observability)

Rationale for the surfaces checked (one sentence each is enough):

Source of Truth impact

  • SoT file(s): docs/… / schemas/… / skills/…
  • Consumer updates in this PR: docs/…, skills/engineering-workflow/templates/…, reference-implementations/…
  • Known consumers NOT updated and why:

Evidence

  • Ran python3 .github/scripts/validate-schema-syntax.py
  • Ran python3 .github/scripts/validate-templates.py
  • Ran sh .github/scripts/check-version-consistency.sh
  • Ran sh reference-implementations/hooks-claude-code/selftests/selftest.sh (if hook bundle touched)
  • Ran pytest in reference-implementations/validator-python/ (if validator touched)
  • Visual diagrams render in GitHub preview (if docs/diagrams.md touched)

Evidence artifacts (paste relevant output or link to CI run):

Merge readiness and branch protection

  • Feature branch:
  • CI run URL / check summary:
  • Reviewer approval link:
  • Intended merge strategy: squash / merge commit / rebase
  • Branch-protection verification:
    • not run (state why)
    • Offline verifier selftest passed: python3 .github/scripts/verify-branch-protection.py --selftest
    • Live verifier passed: python3 .github/scripts/verify-branch-protection.py --live --repo <owner/name> --branch main

Breaking-change level

  • L0 — additive, no consumer action required.
  • L1 — additive + deprecation, opt-in migration.
  • L2 — structural, consumers must adapt.
  • L3 — removal / renaming with migration path.
  • L4 — removal / renaming without migration path (avoid — requires explicit justification).

If ≥ L1, link the migration guidance (usually CHANGELOG.md under Changed / Removed or a dedicated doc section).

Rollback mode

  • Mode 1 — reversible by revert (most doc / code changes here).
  • Mode 2 — forward-fix (e.g. schema version bump that already shipped).
  • Mode 3 — compensation required (rare in this repo; call it out).

Cross-cutting concerns

Only check items that changed. Leave unchecked if this PR does not touch them.

  • Security — touches reference-implementations/hooks-*, CI workflow, or anything in .github/. Confirm no untrusted event inputs, no side effects, no network-in-hook.
  • Observability — touches logging / telemetry / metrics surfaces that adopters rely on.
  • Testability — adds or removes CI jobs, selftests, or validator coverage.
  • Performance — changes anything with a declared latency budget (runtime hooks < 500 ms / 2 s per contract).

Change Manifest (Full mode)

  • ROADMAP initiative:
  • Or attached manifest:
  • Or N/A (trivial / single-surface Lean):

Checklist before requesting review

  • Scope matches the ROADMAP row (or this is a trivial change with no row).
  • No drive-by refactors; no opportunistic cleanup outside the declared scope.
  • No vendor / model / framework names introduced into normative content (bridges excepted).
  • No CHANGELOG rewrites of past entries — the new entry appended, old entries left as historical fact.
  • CLAUDE.md §5 observed: if a cross-cutting term changed, all consumers updated.

By submitting this PR you confirm: the changes are your own, MIT-compatible, and free of secrets or proprietary content.

Abel and others added 8 commits May 31, 2026 23:01
Implements the b1 fail-safe-empty Category E reader at scripts/orient.sh. The core emits the §7.1 JSON shape, filters the seven whitelisted orientation fields, validates hook_state_flags, reads optional state without writing it, degrades missing/corrupt state to an empty payload, and forces Codex turn-stop hook output to JSON. This keeps the G4/S25 boundary intact: no state-file producer is added.

P4 is opened on codex/ecc-si2-p4-implementation with Implementer identity ecc-pr2-implementer-identity-1. The manifest phase flips to implement, the Implementer author is recorded, b1 evidence is collected, and the F1 sot.source_file_missing waiver is removed because scripts/orient.sh now exists. Makefile and validate.yml get the explicit scripts/orient.sh shellcheck row, and verify-local now runs orient.sh --selftest so the b1 gate validates the actual script.

Verification (subprocess return codes): SI2_SCHEMA=0; ROOT_MANIFESTS=0; INTERNAL_LINKS=0; SECTION_INDEX=0; CHANGELOG_JSON=0; VERIFY_DOCS=0; VERIFY_LOCAL=0 (with existing advisory that [Unreleased] is empty); FULL_VALIDATOR=0; B1_GATE make verify-hooks-shellcheck=0 and bash scripts/orient.sh --selftest=0.
Adds docs/runtime-hook-threat-model.md §1.9.7 for orientation-payload context injection with the required distinctness clause against §1.9.2, §1.9.4, and §1.9.5. This is additive to the protected threat-model bodies and keeps the §7.1 contract plus bridge item 6 untouched.

Adds .github/scripts/check-threat-model-additive.sh so the b2 gate can compare §1.1-§1.8 and §1.9.1-§1.9.6 byte-for-byte against origin/main. The visible Category-E failure-mode lists and docs section index are updated, and the manifest records b2 evidence under Implementer identity ecc-pr2-implementer-identity-1.

Verification (subprocess return codes): SI2_SCHEMA=0; ROOT_MANIFESTS=0; INTERNAL_LINKS=0; SECTION_INDEX=0; CHANGELOG_JSON=0; VERIFY_DOCS=0; VERIFY_LOCAL=0 (with existing advisory that [Unreleased] is empty); FULL_VALIDATOR=0; DIFF_CHECK=0; B2_GATE check-internal-links=0, generate-section-index --check=0, check-threat-model-additive=0, §1.9.7 grep=0.
Resolves the S26 runtime-event ABI fixture slice without fabricating fixtures. No real runtime hook-input payloads or maintainer-donated captures were available in this P4 workspace, so tests/fixtures/runtime-events/deferrals.json records every P2 candidate pair as deferred with reason and README.md now states captured=0/deferred=12.

Adds validate-runtime-event-fixtures.py for hook-input fixture validation and check-source-verification-flipped.sh for S26/S27 ledger consistency. docs/source-verification-notes.md flips S26/S27 to captured-or-deferred reality, including codex session-start and full gemini S18 deferral, while never claiming repo-smoke-confirmed for uncaptured pairs.

Verification (subprocess return codes): SI2_SCHEMA=0; ROOT_MANIFESTS=0; INTERNAL_LINKS=0; SECTION_INDEX=0; CHANGELOG_JSON=0; VERIFY_DOCS=0; VERIFY_LOCAL=0 (with existing advisory that [Unreleased] is empty); FULL_VALIDATOR=0; PY_COMPILE=0; DIFF_CHECK=0; B3_GATE check-paths-exist=0, validate-runtime-event-fixtures=0, check-source-verification-flipped=0.
Completes the operational b4 slice. The explicit scripts/orient.sh shellcheck row and verify-local selftest wiring were landed in b1 so the new reader was linted and exercised immediately; this commit records that evidence in the manifest and adds [Unreleased] CHANGELOG entries for b1, b2, and b3.

Regenerates CHANGELOG.json from CHANGELOG.md. The b4 gate confirms Makefile and validate.yml contain exact scripts/orient.sh rows rather than a glob, and make verify-local now reports [Unreleased] has content yes instead of the prior advisory.

Verification (subprocess return codes): SI2_SCHEMA=0; ROOT_MANIFESTS=0; INTERNAL_LINKS=0; SECTION_INDEX=0; CHANGELOG_JSON=0; VERIFY_DOCS=0; VERIFY_LOCAL=0; FULL_VALIDATOR=0; DIFF_CHECK=0; B1_SELFTEST=0; B3_GATE check-paths-exist=0, validate-runtime-event-fixtures=0, check-source-verification-flipped=0; B4_GATE make verify-hooks-shellcheck=0, Makefile grep=0, validate.yml grep=0, generate-changelog-json --check=0.
Closes Phase 4 only and advances the SI-2 manifest to phase=review with status still in_progress. The handoff narrative records the actual P4 delivery: b1 orient.sh reader and F1 waiver removal, b2 §1.9.7 plus additive checker, b3 captured=0/deferred=12 runtime-event fixture reality, and b4 explicit orient.sh CI wiring plus CHANGELOG artifacts.

Updates ROADMAP P4 from in_progress to passed with the concrete b1-b4 commit chain and the repeatable final gate. P5/P6 are intentionally not attempted here; the next Reviewer identity must differ from both ecc-pr2-planner-identity-1 and ecc-pr2-implementer-identity-1.

Verification (subprocess return codes): SI2_SCHEMA=0; ROOT_MANIFESTS=0; INTERNAL_LINKS=0; SECTION_INDEX=0; CHANGELOG_JSON=0; VERIFY_DOCS=0; VERIFY_LOCAL=0; FULL_VALIDATOR=0; DIFF_CHECK=0; B1_GATE=0; B2_GATE=0; B3_GATE=0; B4_GATE=0.
Persists the P5 read-only audit by distinct Reviewer identity
ecc-pr2-reviewer-identity-1 (Claude Opus 4.8 1M context) into the SI-2
manifest: six review_notes (five pass + one pass_with_followup) and one AI
approval, plus the reviewer added to authors[]. Verdict PASS_WITH_FOLLOWUP.
The audit confirmed O4 reader-only + the three-site producer tension preserved
(threat-model §1.9.2 byte-identical to origin/main), O6 captured=0/deferred=12
no-fabrication, O8 additive §1.9.7 with the distinctness clause, O9 punt
(4 task_slices, no b5), and the §7.1 contract surface invariants; all 17 gates
reproduced exit 0. Cross-verified independently by an external Codex review.

ROADMAP SI-2 P5 row flips to passed with the review_notes/approvals artifact,
and the Status + Identity-sequence lines record the now-assigned distinct
Reviewer identity.

No implementation files are edited in this persistence step: orient.sh, the
threat model, the fixtures, and the check-paths-exist.sh:127 placeholder label
are all untouched. The single non-blocking pass_with_followup (that stale label)
is recorded, not fixed. Manifest phase stays review / status in_progress
(pass_with_followup does not advance to signoff). P6 human sign-off and any
version bump remain pending. No push.

Verification (subprocess return codes): SCOPED_SCHEMA=0; ROOT_RESOLVER=0;
FULL_VALIDATOR=0 (findings: []); INTERNAL_LINKS=0; DIFF_CHECK=0; changed files =
manifest + ROADMAP only.
Manifest-only fix-up for the P5 persistence commit (259d69b), which updated
review_notes / approvals / authors[] / last_updated and ROADMAP P5 but left the
handoff_narrative still claiming "ready for Phase 5 review", "Reviewer remains
unassigned", and "P5/P6 are NOT attempted" — contradicting the same file's
recorded P5 audit and the ROADMAP P5=passed row. External Codex review flagged
the inconsistency.

Updates the narrative to state P5 PASSED (PASS_WITH_FOLLOWUP) by distinct
Reviewer identity ecc-pr2-reviewer-identity-1, names the one non-blocking
follow-up (the unfixed check-paths-exist.sh:127 placeholder label), and records
that P6 human sign-off + version bump remain pending. No state field changed:
phase stays review / status in_progress. The b1-b4 description and O4/O6/O8/O9
dispositions are preserved verbatim.

Scope: manifest handoff_narrative only. No implementation files, no ROADMAP, no
check-paths-exist.sh label edit, no P6, no push.

Verification (subprocess return codes): SCOPED_SCHEMA=0; ROOT_RESOLVER=0;
FULL_VALIDATOR=0 (findings: []); DIFF_CHECK=0; stale-phrase grep = 0 matches;
changed files = manifest only.
The closed/accepted RFC-026 root manifest (shipped v1.51.0) left all three
residual_risks[].accepted_by at the "unaccepted" sentinel even though it was
signed off with a recorded Reviewer approval whose conditions enumerate the
residual risk. The completion-audit stop hook binds to the only root-level
change-manifest*.yaml, so it blocked every session's stop with
completion.unaccepted-risk on this stale, internally-inconsistent record.

Flips the three accepted_by fields to the P5 Reviewer of record
(rfc-026-reviewer-identity-3, Reviewer Meridian); the value points to
approvals[0].conditions, so it records the acceptance that already happened at
P5 rather than backdating a new one. Bumped last_updated. No state change:
phase stays signoff / status stays accepted; residual_risks descriptions,
severities, and mitigations are unchanged.

Scope: residual_risks[].accepted_by + last_updated only. Unrelated working-tree
changes (the SI-2 check-paths-exist.sh label edit and the untracked
docs/proposals/* drafts) are deliberately excluded.

Verification (subprocess return codes): completion-audit.sh = 0 (was 1);
unaccepted-risk count 3 -> 0; validate-change-manifests.py = 0; direct
Draft 2020-12 schema validation = 0 errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@EsatanGW EsatanGW closed this May 31, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 89a3223d74

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/orient.sh
Comment on lines +151 to +153
rule = ALLOWED_FLAGS.get(name)
if rule is None:
return sep == ""
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Drop unknown agent-protocol flags explicitly

When the state file contains an undeclared no-value flag such as agent-protocol/not-allowed, ALLOWED_FLAGS.get(name) returns None, the same value used for known no-value flags, so valid_flag() returns true and propagates arbitrary methodology-namespace tags into the Category E payload. This violates the closed-set rule in docs/runtime-hook-contract.md for hook_state_flags, and it matters because this payload can enter model context from a poisoned or stale state file.

Useful? React with 👍 / 👎.

Comment thread scripts/orient.sh
return
try:
repo = Path.cwd().resolve()
resolved = (repo / candidate).resolve(strict=False)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject unresolved manifest paths

For a poisoned or stale state file that names a repo-relative path that does not exist, resolve(strict=False) succeeds and the hook emits that path as active_manifest_path instead of null plus agent-protocol/manifest-rejected:unresolved. The contract says unknown/unavailable active manifests must not be guessed, and the threat model lists unresolved as a rejected path reason, so this lets an attacker or stale producer steer the agent toward a fabricated manifest path.

Useful? React with 👍 / 👎.

Comment thread scripts/orient.sh
Comment on lines +231 to +233
for key in ("current_phase", "current_mode", "roadmap_row_id", "roadmap_row_state"):
value = source.get(key)
orientation[key] = value if isinstance(value, str) else None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Bound phase and roadmap fields before emitting

If the runtime-owned state file is poisoned or stale, these fields are copied as arbitrary strings, including newlines, ANSI/control characters, or instruction-shaped text, and then emitted into the Category E payload that may enter model context. This undercuts the new §1.9.7 mitigation that the core keep orientation values free of free-text instruction channels; at minimum these phase/mode/state/id fields need closed enums or the same kind of length/charset filtering used for branches and flags.

Useful? React with 👍 / 👎.

Comment thread scripts/orient.sh
Comment on lines +238 to +243
raw_flags = source.get("hook_state_flags", [])
if isinstance(raw_flags, list):
for item in raw_flags:
if valid_flag(item):
add_flag(flags, item)
add_flag(flags, state_mtime_flag(state_path))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Do not trust state-provided mtime flags

When a poisoned or stale state file already contains a syntactically valid agent-protocol/state-mtime:<timestamp>, the loop accepts it before appending the real file mtime, so the payload can contain a forged future freshness signal alongside the actual one. Since §1.9.1 relies on the core stamping state-mtime as the stale-orientation mitigation, state-sourced state-mtime should be ignored or replaced rather than propagated from untrusted input.

Useful? React with 👍 / 👎.

# explicitly does not equal full CI parity).
run: |
shellcheck -s sh \
scripts/orient.sh \
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Run the runtime-event fixture verifier in CI

I checked .github/workflows/validate.yml and Makefile: this change only adds scripts/orient.sh to shellcheck/selftest, while the new validate-runtime-event-fixtures.py and check-source-verification-flipped.sh gates are only referenced from documentation/manifest text. That means future invalid or hand-written runtime-event fixtures, or stale S26/S27 deferrals, can still pass the repository's normal CI/local gates unless a reviewer remembers to run the README commands manually.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant