Skip to content

Audit remediation Phase 1: F-6 exit codes, F-5a model override, F-1 yq guard#32

Open
alanshurafa wants to merge 3 commits into
masterfrom
fix/audit-phase-1-correctness
Open

Audit remediation Phase 1: F-6 exit codes, F-5a model override, F-1 yq guard#32
alanshurafa wants to merge 3 commits into
masterfrom
fix/audit-phase-1-correctness

Conversation

@alanshurafa

Copy link
Copy Markdown
Owner

What this does

Phase 1 of the performance-audit remediation (REMEDIATION-PLAN.md, #31) -- the correctness and portability fixes. Three findings, each its own commit, each test-first.

F-6 -- die() now honors its exit code

die() ignored its optional second argument and always exited 1, silently collapsing ~51 call sites across 8 files that pass meaningful codes (2, 3, 5, 6, 8, 9, 10). For example, pr-emitter's "yq required" guard meant exit 2 but signalled a generic failure. Both die() definitions now exit "${2:-1}"; the default stays 1.

F-5a -- Claude model is overridable

invoke_claude hardcoded --model claude-opus-4-6 in two places. It now reads $CLAUDE_MODEL (default unchanged) with a --claude-model flag on co-evolve-bouncer.sh. Precedence: flag > CLAUDE_MODEL env > default. agent-bouncer.sh has no option parser, so it inherits the env var only.

F-1 -- reject the wrong yq

All three yq guards were presence-only (command -v yq), so Debian/Ubuntu's incompatible python yq (from apt install yq) passed and then failed later with opaque jq-ish errors. A shared require_mikefarah_yq helper now checks yq --version for the mikefarah signature; the two PEL guards that can't source the lib (self-containment invariants) inline the same check. The apt trap is documented in three READMEs.

Verification

Each fix has a hermetic test that fails before the change and passes after. The full existing suite was run before and after against master in a clean worktree -- exact parity, no regressions across the ~50 exit-code paths F-6 activated.

bash tests/die-exit-code-simulation.sh         # 7/7
bash tests/claude-model-override-simulation.sh # 4/4
bash tests/yq-guard-simulation.sh              # 6/6

Notes

  • template-proposer-simulation.sh fails from non-main worktrees. This is pre-existing (the test hardcodes a path to one checkout instead of using $REPO_ROOT) and is not introduced here; tracked separately.
  • F-2 (schema canonicalization) is deferred to its own change: the validator reads none of the three review-verdict.json copies and enforces the loose shape, while the strict copies feed the PowerShell runner -- a design decision with cross-runner risk, not a mechanical dedup.
  • shellcheck isn't wired yet (that lands in Phase 2); bash -n is clean on every touched file.

🤖 Generated with Claude Code

alanshurafa and others added 3 commits May 31, 2026 14:04
die() ignored its optional second argument and always exited 1, so the
~50 call sites that pass a meaningful code (2,3,5,6,8,9,10) collapsed to a
generic failure -- e.g. pr-emitter.sh's "yq required" guard intended exit
2. Honor "${2:-1}" in both die() definitions (the canonical one in lib/
and the guarded fallback in evals/) so a caller's intent reaches the
shell; the default stays 1.

Add tests/die-exit-code-simulation.sh covering both definitions. The full
existing suite shows exact baseline parity -- no path regressed on the
now-activated codes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
invoke_claude hardcoded "--model claude-opus-4-6" in both the WSL and the
direct branch, so the model could not change without editing source. Read
$CLAUDE_MODEL instead (default unchanged at claude-opus-4-6) and add a
--claude-model flag to co-evolve-bouncer.sh. Precedence: flag > CLAUDE_MODEL
env > default. agent-bouncer.sh has no option parser, so it inherits the env
var only; a flag there is deferred (4.3 may retire that runner).

Add tests/claude-model-override-simulation.sh (default, env override, flag
wiring, --help). Full suite parity holds -- only the pre-existing
template-proposer failure remains, unrelated to this change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
All three yq guards did presence-only checks (command -v yq), so the
incompatible python yq (Debian/Ubuntu apt install yq) passed and then
failed later with opaque jq-ish errors. Add require_mikefarah_yq to
lib/co-evolution.sh -- discriminate on `yq --version | grep -qi mikefarah`
-- and call it from evals ensure_yq; inline the same check at the two PEL
sites that cannot source lib by their self-containment invariants
(proposer/policy keeps echo+exit 2, pr-emitter keeps die ... 2). Document
the apt install yq trap in README, evals/README, lab/pel/README.

Chose the version string over the plan's functional-probe preference: a
behavioral probe is harder to make robust here, while mikefarah reliably
prints "mikefarah" in --version and the python yq does not.

Add tests/yq-guard-simulation.sh (helper accept/reject, drift coverage at
all three sites, end-to-end python rejection at exit 2 -- which also
exercises the F-6 exit-code fix). Full suite parity holds; only the
pre-existing template-proposer failure remains.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b2631ad72d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

command -v yq >/dev/null 2>&1 || die "yq not found. Install mikefarah/yq: 'scoop install yq' (Windows), 'brew install yq' (macOS), 'apt install yq' (Linux go-install)."
# F-1: delegate to the shared guard in lib/co-evolution.sh (in scope at runtime
# via the eval-harness callers, which source both libs); rejects the python yq.
require_mikefarah_yq

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep evals yq guard self-contained

When the eval library is executed directly via its supported --self-test entry point, this call now fails with require_mikefarah_yq: command not found because the self-test does not source lib/co-evolution.sh before calling ensure_yq. The main eval scripts source both libraries, but the library's own self-test and any standalone source of evals/lib/co-evolution-evals.sh now regress before reaching the actual yq check.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant