docs(v3): README held-out finding + prior-art positioning; negative-result paper #25

Merged: waitdeadai merged 1 commit into main from distribution/v3-positioning on May 23, 2026
@waitdeadai

Owner

Distribution/positioning moves from the /leveragepath scan (companion data PR: waitdeadai/agent-closeout-bench distribution/v3-positioning).

  • README: new "Held-out validation (v3/v4)" section that leads with the contrarian finding: the no-sycophancy score of 0.667 on TRAIN does NOT survive held-out validation (per-mode recall: opener-praise 0.83, BrokenMath/SyConBench 0.0, ELEPHANT 0.08). It also reports roleplay 0.640, the honest_eta cascade 0.461, and the v4 overfit negative result, and adds a "Relation to prior work" section citing SycEval/ELEPHANT/multi-turn/BrokenMath/Silicon Mirror.
  • paper/heldout-negative-result.md: venue-agnostic (Zenodo-routed, no endorsement) ~4k-word writeup of the negative result. All arXiv citations verified live this session (2503.10728, 2502.08177, 2505.13995, 2505.23840, 2510.04721, 2604.00478).
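
For readers unfamiliar with the per-mode breakdown, here is a minimal sketch of how recall can be split by failure mode so that a pooled score does not hide modes the detector misses entirely. This is not the project's evaluation code: the function name `per_mode_recall`, the tuple layout, and the toy rows (with placeholder mode names) are assumptions made for illustration only, and none of the numbers correspond to the results reported above.

```python
from collections import defaultdict


def per_mode_recall(examples):
    """Return recall per mode.

    `examples` is an iterable of (mode, is_positive, flagged) tuples
    (hypothetical layout, not the benchmark's actual schema).
    Only positive examples count toward recall.
    """
    hits = defaultdict(int)
    positives = defaultdict(int)
    for mode, is_positive, flagged in examples:
        if not is_positive:
            continue  # negatives do not affect recall
        positives[mode] += 1
        if flagged:
            hits[mode] += 1
    return {mode: hits[mode] / positives[mode] for mode in positives}


# Toy data, invented for illustration only.
toy = [
    ("mode_a", True, True),
    ("mode_a", True, True),
    ("mode_a", True, False),
    ("mode_a", False, True),   # negative example, ignored by recall
    ("mode_b", True, False),
    ("mode_b", True, False),
]

recall = per_mode_recall(toy)

# Pooled recall over all positives, for comparison with the per-mode view.
pooled = sum(f for _, p, f in toy if p) / sum(1 for _, p, _ in toy if p)

print(recall)
print(pooled)
```

In the toy data the pooled recall looks moderate even though one mode is never detected, which is the kind of gap a per-mode report makes visible.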

Hooks unchanged (no-sycophancy/honest-eta stay v2; tuned roleplay already merged via #24). Operator follow-ups: Zenodo deposit (paper) + awesome-claude-code web form (prepared; web-UI-only, no gh).

🤖 Generated with Claude Code

…esult paper

- README: 'Held-out validation (v3/v4)' section leading with the contrarian finding (no-sycophancy 0.667 TRAIN does not survive; per-mode recall) + 'Relation to prior work' (SycEval/ELEPHANT/multi-turn/BrokenMath/Silicon Mirror)
- paper/heldout-negative-result.md: venue-agnostic (Zenodo-routed) writeup of the v3/v4 negative result; citations verified live (arXiv IDs confirmed)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@waitdeadai waitdeadai merged commit 097e828 into main May 23, 2026
2 checks passed