Single-model co-evolution: --single-model flag, loud failure UX, codex-access skill #30
alanshurafa wants to merge 4 commits into
Pins both reviewer and composer onto one agent (claude or codex) when a second model isn't available, and prepends a persona-discipline preface asking the model to deliberately diverge from its own prior turn rather than nod along to shared-weight bias. Default (cross-model) runs are byte-identical; the flag is opt-in only. A hermetic test exercises the arg-parser and preamble-builder paths.

https://claude.ai/code/session_018T89MTHJfAo1ZGgeygEC2B
A/B test on a dense paper section (2026-05-24) showed the persona-discipline preface SUPPRESSED concrete code-grounded critique: the bare same-model baseline raised 14 markers (citing specific files and line numbers), while the preface variant raised only 6 abstract methodological objections on the same input.

Hypothesis: the preface's "read as if a stranger wrote it" + "suppress default voice" framing pushes the model into pure-text-critique mode and dampens its natural impulse to investigate the codebase via the Read tool.

Change: --single-model now defaults to bare two-role mode (no preface). The preface is preserved as opt-in via the new --persona-discipline flag, which still suits compose-then-bounce of one's own draft (where shared-author bias actually applies; that case wasn't isolated in the A/B).

Test scenario F flipped from "preface present by default" to "preface absent by default"; scenario G added to cover the opt-in wiring. Empirical methodology and findings logged in notesforhumans.md.

https://claude.ai/code/session_018T89MTHJfAo1ZGgeygEC2B
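The flag split can be pictured as a small gate in front of each role prompt. This is a minimal sketch, assuming the prompt is assembled in shell and the preface lives in the template file this PR adds; `build_preamble`, `with_preamble`, and the `PREFACE_FILE` override are hypothetical names. The commit does not say whether the real script also requires `--single-model` before honoring `--persona-discipline`, so the sketch gates on `--persona-discipline` alone:

```bash
#!/usr/bin/env bash
# Sketch: emit the persona-discipline preface only when its own flag is set.
# build_preamble, with_preamble, and the PREFACE_FILE override are hypothetical;
# the default template path is the one listed in this PR.

PREFACE_FILE="${PREFACE_FILE:-templates/co-evolve/single-model-preface.md}"

# Usage: build_preamble <persona_discipline: 0|1>
# Prints the preface text, or nothing when the flag is off.
build_preamble() {
  local persona_discipline="$1"
  [[ "$persona_discipline" == "1" ]] || return 0
  if [[ ! -r "$PREFACE_FILE" ]]; then
    echo "persona-discipline preface not found: $PREFACE_FILE" >&2
    return 1
  fi
  cat "$PREFACE_FILE"
}

# Usage: with_preamble <0|1> < role_prompt
# Prepends the (possibly empty) preamble plus a blank line to the prompt on stdin.
with_preamble() {
  local persona_discipline="$1" preamble
  preamble="$(build_preamble "$persona_discipline")" || return 1
  if [[ -n "$preamble" ]]; then
    printf '%s\n\n' "$preamble"
  fi
  cat
}
```

With the flag off, `with_preamble` reduces to `cat`, which is the property that keeps bare `--single-model` and cross-vendor prompts byte-identical to the no-flag path.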
…ilable

Smoke test on a container without codex installed surfaced a silent-failure mode: when AGENT_B returned empty output on retry, the bouncer broke out of the bounce loop but still emitted "CO-EVOLVE COMPLETE", exited 0, and handed the user a half-bounced document with raw [CONTESTED]/[CLARIFY] markers embedded in the prose.

Fix:
- Track agent-failure state in the bounce loop (BOUNCE_FAILED + FAILED_AGENT).
- The closing banner now reads "CO-EVOLVE INCOMPLETE (bounce aborted)" on failure.
- A loud WARNING explains that the bounce ended early and the output may contain unresolved markers.
- A conditional HINT points the user at "--single-model <working_agent>" when they were running cross-vendor (the working agent is whichever of AGENT_A/AGENT_B was NOT the one that failed). It is suppressed under --single-model, since the user already opted in and there is no useful escape hatch to suggest.
- Exit code 2 (distinct from compose failure's exit 1 and from success's 0).

Invariant preserved: no auto-fallback to single-model. The bouncer still fails when the partner agent is unavailable; it just fails loudly now and tells the user how to recover.

Test additions: scenarios H (cross-vendor failure → exit 2 + HINT) and I (single-model failure → exit 2 without HINT). The stub now supports STUB_<AGENT>_FAILS_AFTER=N to isolate bounce failure from compose failure.

https://claude.ai/code/session_018T89MTHJfAo1ZGgeygEC2B
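A minimal sketch of the failure tracking described in this commit (bash 4+). `BOUNCE_FAILED`, `FAILED_AGENT`, the banner text, exit code 2, and the `STUB_<AGENT>_FAILS_AFTER=N` knob come from the commit message; `run_agent`, `bounce_loop`, `finish_run`, the role order, and the reading of FAILS_AFTER as "succeed N times, then return empty output" are assumptions, not code from `co-evolve-bouncer.sh`:

```bash
#!/usr/bin/env bash
# Sketch of the loud-failure path. run_agent, bounce_loop, finish_run and the
# output-file convention are hypothetical stand-ins for the real bouncer.

BOUNCE_FAILED=0
FAILED_AGENT=""
declare -A CALLS=()   # per-agent call counter for the stub knob

# Stand-in for one agent turn. With STUB_<AGENT>_FAILS_AFTER=N set,
# calls after the Nth write empty output (assumed semantics).
run_agent() {
  local agent="$1" out_file="$2" limit_var
  limit_var="STUB_${agent^^}_FAILS_AFTER"
  CALLS[$agent]=$(( ${CALLS[$agent]:-0} + 1 ))
  if [[ -n "${!limit_var:-}" ]] && (( ${CALLS[$agent]} > ${!limit_var} )); then
    : > "$out_file"
  else
    printf 'turn %d by %s\n' "${CALLS[$agent]}" "$agent" > "$out_file"
  fi
}

# Alternate the two roles; stop at the first empty turn and record who failed.
bounce_loop() {
  local agent_a="$1" agent_b="$2" rounds="$3" out_file="$4" i agent
  for (( i = 0; i < rounds; i++ )); do
    for agent in "$agent_b" "$agent_a"; do
      run_agent "$agent" "$out_file"
      if [[ ! -s "$out_file" ]]; then
        BOUNCE_FAILED=1
        FAILED_AGENT="$agent"
        return 0
      fi
    done
  done
}

# Print the banner (stdout) and WARNING/HINT (stderr); return the exit code.
# single_model is "1" when both roles are pinned to one agent.
finish_run() {
  local agent_a="$1" agent_b="$2" single_model="$3" working
  if (( BOUNCE_FAILED == 0 )); then
    echo "CO-EVOLVE COMPLETE"
    return 0
  fi
  echo "CO-EVOLVE INCOMPLETE (bounce aborted)"
  echo "WARNING: $FAILED_AGENT returned empty output; the bounce ended early and the document may still contain [CONTESTED]/[CLARIFY] markers." >&2
  if [[ "$single_model" != "1" ]]; then
    # The working agent is whichever of AGENT_A/AGENT_B did not fail.
    if [[ "$FAILED_AGENT" == "$agent_a" ]]; then working="$agent_b"; else working="$agent_a"; fi
    echo "HINT: re-run with --single-model $working" >&2
  fi
  return 2
}
```

A caller would end with `finish_run "$AGENT_A" "$AGENT_B" "$single"; exit $?`, which leaves exit 1 free for compose failure. Deriving the HINT from `FAILED_AGENT` instead of hard-coding an agent keeps the suggestion correct whichever side of the cross-vendor pair went missing.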
Centralizes the codex launch matrix that was previously inlined across 6+
files (lib/co-evolution.sh, lab/pel/*/adapter.sh, dev-review/codex/dev-review.sh).
New helper functions in lib/codex-access.sh:
- codex_available() — exit 0 iff codex is reachable via WSL bridge or
direct PATH lookup
- codex_invoke() — drop-in for the existing invoke_codex contract;
routes through WSL→cmd.exe bridge when applicable,
falls back to native codex on PATH, otherwise
writes install hint to stderr + empty output_file
- codex_install_hint() — multi-line actionable message covering npm install,
OPENAI_API_KEY auth, WSL gotcha, and the remote-
container fallback (--single-model claude)
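The three branches can be sketched as a route check plus a dispatcher. Only the three public function names come from this commit; the `_codex_on_wsl`/`_codex_route` helpers, the `(prompt_file, output_file)` argument order, the hint wording, and the use of `codex exec -` to read the prompt from stdin are assumptions to verify against the real helper and the codex CLI's own `--help`:

```bash
#!/usr/bin/env bash
# Sketch of the three-branch codex launch matrix. Public names match this PR;
# bodies, argument order, and the `codex exec -` invocation are assumptions.

# True under WSL: WSL sets WSL_DISTRO_NAME and its kernel string says "microsoft".
_codex_on_wsl() {
  [[ -n "${WSL_DISTRO_NAME:-}" ]] && return 0
  [[ -r /proc/version && "$(</proc/version)" == *[Mm]icrosoft* ]]
}

# Print which launch branch applies: wsl, native, or none.
_codex_route() {
  if _codex_on_wsl && command -v cmd.exe >/dev/null 2>&1 \
      && cmd.exe /c where codex >/dev/null 2>&1; then
    echo wsl        # Windows-side codex reached through the cmd.exe bridge
  elif command -v codex >/dev/null 2>&1; then
    echo native     # codex found on this shell's PATH
  else
    echo none
  fi
}

# Exit 0 iff codex is reachable by either branch.
codex_available() {
  [[ "$(_codex_route)" != none ]]
}

# Actionable install message (printf builtin only, so it works on a bare PATH).
codex_install_hint() {
  printf '%s\n' \
    "codex CLI not found." \
    "  Install: npm install -g @openai/codex" \
    "  Auth:    export OPENAI_API_KEY=<your key>" \
    "  WSL:     if codex is installed on the Windows side, cmd.exe must be reachable from WSL" \
    "  No codex here (e.g. a remote container)? Re-run with --single-model claude"
}

# Run codex on prompt_file and write its reply to output_file.
codex_invoke() {
  local prompt_file="$1" output_file="$2"
  case "$(_codex_route)" in
    wsl)    cmd.exe /c codex exec - < "$prompt_file" > "$output_file" ;;
    native) codex exec - < "$prompt_file" > "$output_file" ;;
    *)
      codex_install_hint >&2
      : > "$output_file"   # empty output: callers treat this as agent failure
      return 1
      ;;
  esac
}
```

A new script could `source lib/codex-access.sh`, check `codex_available` up front, and otherwise call `codex_invoke`. Trying the WSL bridge before PATH follows the order the commit describes; writing an empty output file in the fallback branch keeps the "empty output means failure" signal the bouncer already checks.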
Skill at skills/codex-access/SKILL.md documents the launch matrix, the
install paths for native + WSL + cloud cases, and is explicit about what
this skill does NOT do (no auto-install, no faking codex with claude, no
hidden fallback).
Non-invasive integration: lib/co-evolution.sh:invoke_codex is left alone
(it already implements the inline pattern). New scripts should source
lib/codex-access.sh instead of copying the inline form. A future refactor
may collapse invoke_codex onto the helper, gated on a separate change.
Test: tests/codex-access-simulation.sh exercises all three launch-matrix
branches via PATH-shadow stubs (no real codex/cmd.exe/wslpath invoked).
7/7 scenarios pass.
https://claude.ai/code/session_018T89MTHJfAo1ZGgeygEC2B
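The PATH-shadow approach used by this test can be sketched generically: put fake executables first on PATH so every lookup of `codex`, `cmd.exe`, or `wslpath` hits a stub. `make_stub`, `reset_stubs`, `with_stubs`, and the `calls.log` convention are hypothetical helpers, not the contents of `tests/codex-access-simulation.sh`:

```bash
#!/usr/bin/env bash
# Sketch of PATH-shadow stubbing for hermetic launch-matrix tests.
# make_stub, reset_stubs, with_stubs and calls.log are hypothetical.

STUB_DIR="$(mktemp -d)"
trap 'rm -rf "$STUB_DIR"' EXIT

# Create an executable stub NAME that appends "NAME args..." to calls.log,
# then runs BODY (default: exit 0).
make_stub() {
  local name="$1" body="${2:-exit 0}"
  {
    printf '#!/bin/sh\n'
    printf 'echo "%s $*" >> "%s/calls.log"\n' "$name" "$STUB_DIR"
    printf '%s\n' "$body"
  } > "$STUB_DIR/$name"
  chmod +x "$STUB_DIR/$name"
}

# Remove every stub so each scenario starts from a clean slate.
reset_stubs() {
  rm -f "$STUB_DIR"/*
}

# Run a command with the stubs ahead of the real PATH, inside a subshell
# so the PATH change cannot leak into later scenarios.
with_stubs() {
  ( PATH="$STUB_DIR:$PATH"; "$@" )
}
```

A native-branch scenario could be `make_stub codex 'echo reply'` followed by assertions on the reply and on `calls.log`. A bridge scenario would also stub `cmd.exe` and `wslpath`, and would need the code under test to believe it is running on WSL, however that detection is implemented.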
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 02c1220007
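```bash
# Excerpt from the reviewed diff: any next argument that does not start
# with "-" is taken as the single-model agent name.
if [[ $# -ge 2 && -n "${2:-}" && "${2:0:1}" != "-" ]]; then
  SINGLE_MODEL_AGENT="$2"
```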
Do not consume the task as the optional single-model agent
When users follow the documented bare form, e.g. `--single-model "Stress test this argument"`, this branch treats the positional task as the optional agent name and then fails because it is not `claude` or `codex`. Since the parser cannot distinguish a bare task from an optional positional agent, the advertised default-agent syntax is unusable unless the caller adds `--`, pipes input, or uses `--single-model=claude`; the bare flag should not consume the next non-flag argument.
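One way to address this, sketched under assumptions: keep `--single-model=<agent>` and `--` as the unambiguous forms, and let the bare flag take the next word only when it is literally `claude` or `codex`. `parse_args`, `TASK`, and the `claude` default are hypothetical; the PR does not state which agent the bare flag defaults to:

```bash
#!/usr/bin/env bash
# Sketch: a --single-model parser that never swallows the task. The bare flag
# consumes the next word only if it is an agent name. parse_args, TASK, and
# the claude default are assumptions, not the PR's code.

SINGLE_MODEL_AGENT=""
TASK=""

parse_args() {
  SINGLE_MODEL_AGENT=""
  TASK=""
  while (( $# > 0 )); do
    case "$1" in
      --single-model=*)
        SINGLE_MODEL_AGENT="${1#*=}"
        ;;
      --single-model)
        if [[ "${2:-}" == claude || "${2:-}" == codex ]]; then
          SINGLE_MODEL_AGENT="$2"
          shift                        # the agent name was consumed
        else
          SINGLE_MODEL_AGENT="claude"  # assumed default agent
        fi
        ;;
      --)
        shift
        TASK="$*"                      # everything after -- is the task
        break
        ;;
      -*)
        ;;                             # other flags are out of scope here
      *)
        TASK="$1"
        ;;
    esac
    shift
  done
  case "$SINGLE_MODEL_AGENT" in
    ""|claude|codex) return 0 ;;
    *)
      echo "--single-model expects claude or codex, got: $SINGLE_MODEL_AGENT" >&2
      return 1
      ;;
  esac
}
```

The residual ambiguity is a task that is exactly `claude` or `codex`; callers can still disambiguate with `--single-model=claude` or `--`. The stricter option the review suggests (the bare flag never consumes anything) removes even that case, at the cost of breaking the `--single-model claude` spelling used in the test plan.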
Summary
Adds opt-in same-model co-evolution to `co-evolve-bouncer.sh`, surfaces partner-agent failures loudly with an actionable HINT, and centralizes the codex launch pattern into a reusable skill + helper.

- `--single-model [claude|codex]`: pins both reviewer and composer to one agent. Defaults to bare two-role mode (no preface) based on empirical A/B testing. Opt-in only; cross-vendor (`claude,codex`) remains the default invocation.
- `--persona-discipline`: opt-in divergence preface. Decoupled from `--single-model` after the A/B showed the preface suppressed concrete code-grounded critique on dense technical documents (only 6 markers vs 14 for bare mode on the same paper section).
- Loud partner-agent failure: when the partner agent fails mid-bounce, the bouncer exits 2, prints `CO-EVOLVE INCOMPLETE`, and emits a HINT pointing at `--single-model <working_agent>`. The HINT is suppressed in `--single-model` mode (no escape hatch to suggest).
- `codex-access` skill + `lib/codex-access.sh` helper: centralizes the WSL→cmd.exe bridge pattern previously inlined across 6+ files. Three functions: `codex_available`, `codex_invoke`, `codex_install_hint`. Non-invasive integration; `invoke_codex` in `lib/co-evolution.sh` is unchanged.

Empirical findings driving the defaults
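- Input: a dense paper section (`…04-pilot-data.md`), A/B run 2026-05-24.
- Bare `--single-model`: 14 markers, citing specific files and line numbers.
- With the persona-discipline preface: 6 markers, all abstract methodological objections.

Hypothesis: the preface's "read as if a stranger wrote it" framing pushes the model into pure-text-critique mode and dampens its natural impulse to investigate the codebase via the Read tool. Full methodology + data table in `notesforhumans.md` under "Same-Model A/B (2026-05-24)".

Invariants preserved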
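- Cross-vendor remains the default (`claude,codex`). No flag = no behavior change; default runs are byte-identical.
- No auto-fallback to single-model: when the partner agent is unavailable the bouncer still fails, now loudly, with a HINT on how to recover.

Test plan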
- `tests/single-model-simulation.sh`: 9/9 scenarios cover flag parsing, banner output, byte-parity for default runs, preface gating, and the new exit-2/HINT/INCOMPLETE machinery (scenarios H + I).
- `tests/codex-access-simulation.sh`: 7/7 scenarios exercise all three launch-matrix branches via PATH-shadow stubs (no real codex/cmd.exe invoked).
- `tests/lab-routing-simulation.sh`: 4/4; regression check confirms `--lab` routing byte-parity.
- `--single-model claude --chain`: compose → critique surfaces 4 contested + 3 clarify markers → defend resolves all → tighten polishes. Exit 0.
- Cross-vendor run in a container without codex: `CO-EVOLVE INCOMPLETE` banner, a WARNING that markers may be unresolved, and a HINT suggesting `--single-model claude`.
- `lib/codex-access.sh` smoke test in a codex-less env: `codex_available` returns 1; `codex_invoke` writes the install hint (`npm install -g @openai/codex`, `OPENAI_API_KEY` setup, WSL guidance, and the `--single-model claude` fallback) to stderr.

Files
- `co-evolve-bouncer.sh`: new flags, failure tracking, INCOMPLETE banner, conditional HINT
- `templates/co-evolve/single-model-preface.md`: new (gated by `--persona-discipline`)
- `lib/codex-access.sh`: new helper with `codex_available` / `codex_invoke` / `codex_install_hint`
- `skills/codex-access/SKILL.md`: new skill documenting the launch matrix + install paths
- `tests/single-model-simulation.sh`: new, 9 scenarios
- `tests/codex-access-simulation.sh`: new, 7 scenarios
- `README.md`, `CLAUDE.md`, `notesforhumans.md`: docs + empirical methodology

https://claude.ai/code/session_018T89MTHJfAo1ZGgeygEC2B
Generated by Claude Code