Single-model co-evolution: --single-model flag, loud failure UX, codex-access skill by alanshurafa · Pull Request #30 · alanshurafa/co-evolution

alanshurafa · 2026-05-27T17:01:06Z

Summary

Adds opt-in same-model co-evolution to co-evolve-bouncer.sh, surfaces partner-agent failures loudly with an actionable HINT, and centralizes the codex launch pattern into a reusable skill + helper.

--single-model [claude|codex] — pins both reviewer and composer to one agent. Defaults to bare two-role mode (no preface) based on empirical A/B testing. Opt-in only; cross-vendor (claude,codex) remains the default invocation.
--persona-discipline — opt-in divergence preface. Decoupled from --single-model after the A/B showed the preface suppressed concrete code-grounded critique on dense technical documents (only 6 markers vs 14 for bare mode on the same paper section).
Loud failure UX — when a partner agent returns empty output on retry, the bouncer now exits 2, logs CO-EVOLVE INCOMPLETE, and emits a HINT pointing at --single-model <working_agent>. Suppressed under --single-model mode (no escape hatch to suggest).
codex-access skill + lib/codex-access.sh helper — centralizes the WSL→cmd.exe bridge pattern previously inlined across 6+ files. Three functions: codex_available, codex_invoke, codex_install_hint. Non-invasive integration; invoke_codex in lib/co-evolution.sh is unchanged.

Empirical findings driving the defaults

Input	Bare two-role markers (pass 1)	With-preface markers (pass 1)	Winner
Open prompt (~30 words)	4	7	preface (more critique, marginal quality)
Dense paper section (1277 words, `04-pilot-data.md`)	14	6	bare (concrete code-grounded findings vs abstract objections)

Hypothesis: the preface's "read as if a stranger wrote it" framing pushes the model into pure-text-critique mode and dampens its natural impulse to investigate the codebase via the Read tool. Full methodology and data table are in notesforhumans.md under "Same-Model A/B (2026-05-24)".
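For readers who want to reproduce a marker count like the ones in the table, here is a minimal sketch. It assumes markers appear literally as [CONTESTED] and [CLARIFY] in the bounced document (the two marker names mentioned elsewhere in this PR). The PR does not say how the A/B counts were actually tallied, and `count_markers` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count critique markers in a bounced document.
# Assumption: markers are the literal strings [CONTESTED] and [CLARIFY].

count_markers() {
  local file="$1"
  # grep -o prints one match per line, so several markers on the same
  # source line are each counted. "|| true" keeps a zero-match run from
  # failing under "set -e -o pipefail".
  { grep -oE '\[(CONTESTED|CLARIFY)\]' "$file" || true; } | wc -l | tr -d ' '
}
```

Usage: `count_markers pass1-output.md` prints a single integer (the file name here is illustrative).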

Invariants preserved

Default is cross-vendor adversarial review (claude,codex). No flag = no behavior change.
Single-model only via explicit flag. No auto-fallback to single-model when codex (or any partner) is unavailable. The bouncer fails loudly and points the user at the right escape hatch instead.

Test plan

tests/single-model-simulation.sh — 9/9 scenarios cover flag parsing, banner output, byte-parity for default runs, preface gating, and the new exit-2/HINT/INCOMPLETE machinery (scenarios H + I).
tests/codex-access-simulation.sh — 7/7 scenarios exercise all three launch-matrix branches via PATH-shadow stubs (no real codex/cmd.exe invoked).
tests/lab-routing-simulation.sh — 4/4, regression check confirms --lab routing byte-parity.
Real-model smoke test with --single-model claude --chain — compose → critique surfaces 4 contested + 3 clarify markers → defend resolves all → tighten polishes. Exit 0.
Real-world failure smoke test in this remote container (codex CLI unavailable): default invocation produces exit 2, CO-EVOLVE INCOMPLETE banner, WARNING that markers may be unresolved, and a HINT suggesting --single-model claude.
lib/codex-access.sh smoke test in codex-less env: codex_available returns 1, codex_invoke writes the install hint with npm install -g @openai/codex, OPENAI_API_KEY setup, WSL guidance, and --single-model claude fallback to stderr.
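The PATH-shadow stub technique used by the simulation tests can be sketched as follows. This is an illustration of the general approach, not the contents of tests/codex-access-simulation.sh; `make_stub` and `with_shadowed_path` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of PATH-shadow stubbing: a fake executable is placed
# in a temp directory that is put first on PATH, so the code under test
# finds the stub instead of a real binary.

# make_stub DIR NAME EXIT_CODE: write an executable fake command into DIR.
make_stub() {
  local dir="$1" name="$2" code="$3"
  printf '#!/bin/sh\necho "stub %s called" >&2\nexit %s\n' "$name" "$code" > "$dir/$name"
  chmod +x "$dir/$name"
}

# with_shadowed_path DIR CMD...: run CMD with DIR first on PATH.
# The subshell keeps the PATH change from leaking into the caller.
with_shadowed_path() {
  local dir="$1"
  shift
  ( PATH="$dir:$PATH"; export PATH; "$@" )
}
```

Because the stub only shadows the name on PATH, no real codex or cmd.exe needs to be installed for a scenario to exercise the "available" and "failing" branches.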

Files

co-evolve-bouncer.sh — new flags, failure tracking, INCOMPLETE banner, conditional HINT
templates/co-evolve/single-model-preface.md — new (gated by --persona-discipline)
lib/codex-access.sh — new helper with codex_available / codex_invoke / codex_install_hint
skills/codex-access/SKILL.md — new skill documenting the launch matrix + install paths
tests/single-model-simulation.sh — new, 9 scenarios
tests/codex-access-simulation.sh — new, 7 scenarios
README.md, CLAUDE.md, notesforhumans.md — docs + empirical methodology

https://claude.ai/code/session_018T89MTHJfAo1ZGgeygEC2B

Generated by Claude Code

Pins both reviewer and composer onto one agent (claude or codex) when a second model isn't available, and prepends a persona-discipline preface asking the model to deliberately diverge from its own prior turn rather than nod along to shared-weight bias. Default (cross-model) runs are byte-identical — flag is opt-in only. Hermetic test exercises arg-parser + preamble-builder paths. https://claude.ai/code/session_018T89MTHJfAo1ZGgeygEC2B

A/B test on a dense paper section (2026-05-24) showed the persona-discipline preface SUPPRESSED concrete code-grounded critique: the bare same-model baseline raised 14 markers (citing specific files and line numbers) while the preface variant raised only 6 abstract methodological objections on the same input.

Hypothesis: the preface's "read as if a stranger wrote it" + "suppress default voice" framing pushes the model into pure-text-critique mode and dampens its natural impulse to investigate the codebase via the Read tool.

Change: --single-model now defaults to bare two-role mode (no preface). The preface is preserved as opt-in via the new --persona-discipline flag, which still suits compose-then-bounce of one's own draft (where shared-author bias actually applies; that case wasn't isolated in the A/B).

Tests: scenario F flipped from "preface present by default" to "preface absent by default"; scenario G added to cover the opt-in wiring. Empirical methodology and findings logged in notesforhumans.md.

https://claude.ai/code/session_018T89MTHJfAo1ZGgeygEC2B

…ilable

Smoke test on a container without codex installed surfaced a silent-failure mode: when AGENT_B returned empty output on retry, the bouncer broke out of the bounce loop but still emitted "CO-EVOLVE COMPLETE", exited 0, and handed the user a half-bounced document with raw [CONTESTED]/[CLARIFY] markers embedded in the prose.

Fix:
- Track agent-failure state in the bounce loop (BOUNCE_FAILED + FAILED_AGENT)
- Closing banner now reads "CO-EVOLVE INCOMPLETE (bounce aborted)" on failure
- Loud WARNING explains the bounce ended early and the output may contain unresolved markers
- Conditional HINT points the user at "--single-model <working_agent>" when they were running cross-vendor (the working agent is whichever AGENT_A/B was NOT the one that failed). Suppressed under --single-model since the user already opted in and there is no useful escape hatch to suggest.
- Exit code 2 (distinct from compose-failure's exit 1, distinct from success's 0)

Invariant preserved: no auto-fallback to single-model. The bouncer still fails when the partner agent is unavailable; it just fails loudly now and tells the user how to recover.

Test additions: scenarios H (cross-vendor failure → exit 2 + HINT) and I (single-model failure → exit 2 without HINT). Stub now supports STUB_<AGENT>_FAILS_AFTER=N to isolate bounce-failure from compose-failure.

https://claude.ai/code/session_018T89MTHJfAo1ZGgeygEC2B
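The closing-banner behavior described in this commit could look roughly like the sketch below. The banner strings, the exit codes, and the BOUNCE_FAILED / FAILED_AGENT / AGENT_A / AGENT_B names come from the commit message. The function name `closing_status`, its argument order, and the exact WARNING and HINT wording are assumptions made for illustration, not the code in co-evolve-bouncer.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the closing-banner logic; message wording is illustrative.

# closing_status BOUNCE_FAILED FAILED_AGENT AGENT_A AGENT_B SINGLE_MODEL_AGENT
# Prints the closing banner and returns the exit status the script should use.
closing_status() {
  local bounce_failed="$1" failed_agent="$2" agent_a="$3" agent_b="$4" single_model="$5"

  if [[ "$bounce_failed" != "1" ]]; then
    echo "CO-EVOLVE COMPLETE"
    return 0
  fi

  echo "CO-EVOLVE INCOMPLETE (bounce aborted)"
  echo "WARNING: the bounce ended early; output may contain unresolved markers." >&2

  # Suggest an escape hatch only for cross-vendor runs. Under --single-model
  # the user has already opted in, so there is nothing useful to suggest.
  if [[ -z "$single_model" ]]; then
    local working_agent="$agent_a"
    if [[ "$failed_agent" == "$agent_a" ]]; then
      working_agent="$agent_b"
    fi
    echo "HINT: retry with --single-model $working_agent" >&2
  fi

  # Exit 2 is distinct from compose failure (1) and success (0).
  return 2
}
```

A caller would end with something like `closing_status "$BOUNCE_FAILED" "$FAILED_AGENT" "$AGENT_A" "$AGENT_B" "$SINGLE_MODEL_AGENT"; exit $?`, so scripts wrapping the bouncer can tell a half-bounced document apart from a finished one by exit status alone.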

Centralizes the codex launch matrix that was previously inlined across 6+ files (lib/co-evolution.sh, lab/pel/*/adapter.sh, dev-review/codex/dev-review.sh).

New helper functions in lib/codex-access.sh:
- codex_available(): exit 0 iff codex is reachable via WSL bridge or direct PATH lookup
- codex_invoke(): drop-in for the existing invoke_codex contract; routes through WSL→cmd.exe bridge when applicable, falls back to native codex on PATH, otherwise writes install hint to stderr + empty output_file
- codex_install_hint(): multi-line actionable message covering npm install, OPENAI_API_KEY auth, WSL gotcha, and the remote-container fallback (--single-model claude)

Skill at skills/codex-access/SKILL.md documents the launch matrix, the install paths for native + WSL + cloud cases, and is explicit about what this skill does NOT do (no auto-install, no faking codex with claude, no hidden fallback).

Non-invasive integration: lib/co-evolution.sh:invoke_codex is left alone (it already implements the inline pattern). New scripts should source lib/codex-access.sh instead of copying the inline form. A future refactor may collapse invoke_codex onto the helper, gated on a separate change.

Test: tests/codex-access-simulation.sh exercises all three launch-matrix branches via PATH-shadow stubs (no real codex/cmd.exe/wslpath invoked). 7/7 scenarios pass.

https://claude.ai/code/session_018T89MTHJfAo1ZGgeygEC2B
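As a rough illustration of the availability check, here is a minimal sketch of a two-branch probe. It is not the code in lib/codex-access.sh. The function name `codex_available_sketch`, the WSL detection via /proc/version, and the use of `cmd.exe /c "where codex"` to look for codex on the Windows side are all assumptions; the real helper may detect the bridge differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an availability probe: returns 0 iff codex looks
# reachable through a WSL->cmd.exe bridge or directly on PATH.

codex_available_sketch() {
  # Branch 1 (assumed detection): inside WSL with cmd.exe reachable, ask
  # Windows whether it can locate codex.
  if [[ -r /proc/version ]] && [[ "$(</proc/version)" == *[Mm]icrosoft* ]] \
     && command -v cmd.exe >/dev/null 2>&1; then
    cmd.exe /c "where codex" >/dev/null 2>&1 && return 0
  fi

  # Branch 2: native codex on PATH. The function's status is the status
  # of this lookup.
  command -v codex >/dev/null 2>&1
}
```

A new script could guard its codex call with `if codex_available_sketch; then ... else ... fi` and print an install hint in the else branch instead of silently substituting another agent, which matches the "no hidden fallback" stance described above.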

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 02c1220007


chatgpt-codex-connector · 2026-05-27T17:03:18Z

+      if [[ $# -ge 2 && -n "${2:-}" && "${2:0:1}" != "-" ]]; then
+        SINGLE_MODEL_AGENT="$2"


Do not consume the task as the optional single-model agent

When users follow the documented bare form, e.g. --single-model "Stress test this argument", this branch treats the positional task as the optional agent name and then fails because it is not claude or codex. Since the parser cannot distinguish a bare task from an optional positional agent, the advertised default-agent syntax is unusable unless the caller adds --, pipes input, or uses --single-model=claude; the bare flag should not consume the next non-flag argument.
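One way to address this, sketched under assumptions: consume the following argument only when it is exactly a known agent name, and otherwise leave it in place as the task. This is not the fix adopted in the PR. `parse_args`, `TASK`, and `DEFAULT_AGENT` are hypothetical names, and the default agent value is a placeholder because the PR text does not state which agent the bare flag selects.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not the parser in co-evolve-bouncer.sh.

DEFAULT_AGENT="claude"   # assumption: the real default is not stated in this PR

parse_args() {
  SINGLE_MODEL_AGENT=""
  TASK=""
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --single-model=*)
        SINGLE_MODEL_AGENT="${1#--single-model=}"
        shift
        ;;
      --single-model)
        # Consume the next argument only when it is a known agent name;
        # anything else stays in place and is parsed as the task.
        if [[ "${2:-}" == "claude" || "${2:-}" == "codex" ]]; then
          SINGLE_MODEL_AGENT="$2"
          shift 2
        else
          SINGLE_MODEL_AGENT="$DEFAULT_AGENT"
          shift
        fi
        ;;
      --)
        shift
        TASK="$*"
        break
        ;;
      *)
        TASK="$1"
        shift
        ;;
    esac
  done

  # Reject unknown agents supplied through the --single-model=VALUE form.
  case "$SINGLE_MODEL_AGENT" in
    ""|claude|codex) return 0 ;;
    *) echo "error: --single-model expects claude or codex" >&2; return 1 ;;
  esac
}
```

The trade-off is that a task consisting of the single word "claude" or "codex" would still be read as the agent name; callers in that corner case would need `--` or the `--single-model=claude` form.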


claude added 4 commits May 24, 2026 04:13

chatgpt-codex-connector Bot reviewed May 27, 2026

alanshurafa wants to merge 4 commits into master from claude/single-model-test-strategies-i3uKx
