Audit remediation Phase 1: F-6 exit codes, F-5a model override, F-1 yq guard by alanshurafa · Pull Request #32 · alanshurafa/co-evolution

alanshurafa · 2026-06-01T01:37:52Z

What this does

Phase 1 of the performance-audit remediation (REMEDIATION-PLAN.md, #31) -- the correctness and portability fixes. Three findings, each its own commit, each test-first.

F-6 -- `die()` now honors its exit code

die() ignored its optional second argument and always exited 1, silently collapsing ~51 call sites across 8 files that pass meaningful codes (2, 3, 5, 6, 8, 9, 10). For example, pr-emitter's "yq required" guard meant exit 2 but signalled a generic failure. Both die() definitions now exit "${2:-1}"; the default stays 1.

F-5a -- Claude model is overridable

invoke_claude hardcoded --model claude-opus-4-6 in two places. It now reads $CLAUDE_MODEL (default unchanged) with a --claude-model flag on co-evolve-bouncer.sh. Precedence: flag > CLAUDE_MODEL env > default. agent-bouncer.sh has no option parser, so it inherits the env var only.

F-1 -- reject the wrong `yq`

All three yq guards were presence-only (command -v yq), so Debian/Ubuntu's incompatible python yq (from apt install yq) passed and then failed later with opaque jq-ish errors. A shared require_mikefarah_yq helper now checks yq --version for the mikefarah signature; the two PEL guards that can't source the lib (self-containment invariants) inline the same check. The apt trap is documented in three READMEs.

Verification

Each fix has a hermetic test that fails before the change and passes after. The full existing suite was run before and after against master in a clean worktree -- exact parity, no regressions across the ~50 exit-code paths F-6 activated.

bash tests/die-exit-code-simulation.sh         # 7/7
bash tests/claude-model-override-simulation.sh # 4/4
bash tests/yq-guard-simulation.sh              # 6/6

Notes

template-proposer-simulation.sh fails from non-main worktrees. This is pre-existing (the test hardcodes a path to one checkout instead of using $REPO_ROOT) and is not introduced here; tracked separately.
F-2 (schema canonicalization) is deferred to its own change: the validator reads none of the three review-verdict.json copies and enforces the loose shape, while the strict copies feed the PowerShell runner -- a design decision with cross-runner risk, not a mechanical dedup.
shellcheck isn't wired yet (that lands in Phase 2); bash -n is clean on every touched file.

🤖 Generated with Claude Code

die() ignored its optional second argument and always exited 1, so the ~50 call sites that pass a meaningful code (2,3,5,6,8,9,10) collapsed to a generic failure -- e.g. pr-emitter.sh's "yq required" guard intended exit 2. Honor "${2:-1}" in both die() definitions (the canonical one in lib/ and the guarded fallback in evals/) so a caller's intent reaches the shell; the default stays 1. Add tests/die-exit-code-simulation.sh covering both definitions. The full existing suite shows exact baseline parity -- no path regressed on the now-activated codes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

invoke_claude hardcoded "--model claude-opus-4-6" in both the WSL and the direct branch, so the model could not change without editing source. Read $CLAUDE_MODEL instead (default unchanged at claude-opus-4-6) and add a --claude-model flag to co-evolve-bouncer.sh. Precedence: flag > CLAUDE_MODEL env > default. agent-bouncer.sh has no option parser, so it inherits the env var only; a flag there is deferred (4.3 may retire that runner). Add tests/claude-model-override-simulation.sh (default, env override, flag wiring, --help). Full suite parity holds -- only the pre-existing template-proposer failure remains, unrelated to this change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

All three yq guards did presence-only checks (command -v yq), so the incompatible python yq (Debian/Ubuntu apt install yq) passed and then failed later with opaque jq-ish errors. Add require_mikefarah_yq to lib/co-evolution.sh -- discriminate on `yq --version | grep -qi mikefarah` -- and call it from evals ensure_yq; inline the same check at the two PEL sites that cannot source lib by their self-containment invariants (proposer/policy keeps echo+exit 2, pr-emitter keeps die ... 2). Document the apt install yq trap in README, evals/README, lab/pel/README. Chose the version string over the plan's functional-probe preference: a behavioral probe is harder to make robust here, while mikefarah reliably prints "mikefarah" in --version and the python yq does not. Add tests/yq-guard-simulation.sh (helper accept/reject, drift coverage at all three sites, end-to-end python rejection at exit 2 -- which also exercises the F-6 exit-code fix). Full suite parity holds; only the pre-existing template-proposer failure remains. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b2631ad72d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T01:40:29Z

-  command -v yq >/dev/null 2>&1 || die "yq not found. Install mikefarah/yq: 'scoop install yq' (Windows), 'brew install yq' (macOS), 'apt install yq' (Linux go-install)."
+  # F-1: delegate to the shared guard in lib/co-evolution.sh (in scope at runtime
+  # via the eval-harness callers, which source both libs); rejects the python yq.
+  require_mikefarah_yq


Keep evals yq guard self-contained

When the eval library is executed directly via its supported --self-test entry point, this call now fails with require_mikefarah_yq: command not found because the self-test does not source lib/co-evolution.sh before calling ensure_yq. The main eval scripts source both libraries, but the library's own self-test and any standalone source of evals/lib/co-evolution-evals.sh now regress before reaching the actual yq check.

Useful? React with 👍 / 👎.

alanshurafa and others added 3 commits May 31, 2026 14:04

chatgpt-codex-connector Bot reviewed Jun 1, 2026

View reviewed changes

alanshurafa mentioned this pull request Jun 1, 2026

Canonicalize review-verdict.json to the loose shape (F-2) #34

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Audit remediation Phase 1: F-6 exit codes, F-5a model override, F-1 yq guard#32

Audit remediation Phase 1: F-6 exit codes, F-5a model override, F-1 yq guard#32
alanshurafa wants to merge 3 commits into
masterfrom
fix/audit-phase-1-correctness

alanshurafa commented Jun 1, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

alanshurafa commented Jun 1, 2026

What this does

F-6 -- die() now honors its exit code

F-5a -- Claude model is overridable

F-1 -- reject the wrong yq

Verification

Notes

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

F-6 -- `die()` now honors its exit code

F-1 -- reject the wrong `yq`