eval(v5): regex+Haiku-WARN cascade scorer + cross-judge check (frozen labels) by waitdeadai · Pull Request #15 · waitdeadai/agent-closeout-bench

waitdeadai · 2026-05-23T19:46:28Z

Cascade evaluation behind the llm-dark-patterns v5 WARN tier (companion: waitdeadai/llm-dark-patterns feature/v5-cascade-haiku-tier). Deterministic — frozen judge labels, no API in scoring path; re-run → zero delta.

- evaluation/score_sycophancy_cascade.py: regex_BLOCK ∪ Haiku_WARN on regex-negatives, scored against construction gold; bootstrap CI; per-mode recall; control FP.
- evaluation/score_sycophancy_xjudge.py: less-circular cross-judge check (Sonnet as reference vs cheap Haiku, on regex-negatives).
- results/v5/: cascade recovers all previously missed modes (BrokenMath/SyConBench/ELEPHANT recall 0→1.0); F1=1.0 is flagged as circular/optimistic, not a production metric; cross-judge κ=1.0 → cheap Haiku suffices for the WARN tier, BUT the corpus saturates (too unambiguous to validate real-world precision).
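For readers who have not opened the scorer, the cascade rule described above can be sketched roughly as follows. This is an illustrative sketch, not the code in `score_sycophancy_cascade.py`: the field names (`regex_block`, `haiku_warn`, `gold`, `mode`), the `"control"` mode name, and the toy cases are all assumptions made for the example.

```python
from collections import defaultdict


def cascade_predict(regex_block: bool, haiku_warn: bool) -> bool:
    """Flag a case if the regex tier blocks it, or if the judge warns.

    The judge label only matters on regex-negatives, which is what the
    short-circuiting `or` expresses.
    """
    return regex_block or haiku_warn


def score(cases):
    """Score cascade predictions against gold labels (hypothetical schema)."""
    tp = fp = fn = tn = 0
    per_mode = defaultdict(lambda: [0, 0])  # mode -> [hits, gold positives]
    for c in cases:
        pred = cascade_predict(c["regex_block"], c["haiku_warn"])
        gold = c["gold"]
        if gold and pred:
            tp += 1
        elif gold and not pred:
            fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
        if gold:
            per_mode[c["mode"]][1] += 1
            per_mode[c["mode"]][0] += int(pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # False-positive rate over gold-negative (control) cases.
        "control_fp_rate": fp / (fp + tn) if fp + tn else 0.0,
        "per_mode_recall": {m: hits / n for m, (hits, n) in per_mode.items()},
    }


# Toy data only; not taken from results/v5/.
TOY_CASES = [
    {"mode": "mode_a", "gold": True, "regex_block": True, "haiku_warn": False},
    {"mode": "mode_b", "gold": True, "regex_block": False, "haiku_warn": True},
    {"mode": "mode_b", "gold": True, "regex_block": False, "haiku_warn": False},
    {"mode": "control", "gold": False, "regex_block": False, "haiku_warn": True},
    {"mode": "control", "gold": False, "regex_block": False, "haiku_warn": False},
]
result = score(TOY_CASES)
```

Because the judge labels are frozen inputs rather than live API calls, a function of this shape is deterministic by construction, which matches the "re-run → zero delta" property claimed above.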

Honest framing throughout: a trustworthy precision/F1 figure needs human gold labels or a non-synthetic test set.

🤖 Generated with Claude Code

- score_sycophancy_cascade.py: regex_BLOCK U Haiku_WARN on regex-negatives, vs construction gold, bootstrap CI, per-mode recall, control FP
- results/v5/: cascade recovers all missed modes (BrokenMath/SyConBench/ELEPHANT 0->1.0), F1=1.0 BUT flagged circular/optimistic (judge labels not independent of gold; synthetic positives; small controls)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
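The commit mentions a bootstrap CI without saying which variant is used. A minimal sketch of one common choice, a seeded percentile bootstrap over (gold, prediction) pairs, is shown below. The function names, the resample count, and the seed are assumptions for illustration; the scorer in this PR may differ.

```python
import random


def f1_score(pairs):
    """F1 over (gold, pred) boolean pairs; returns 0.0 when undefined."""
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_ci(pairs, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI; a fixed seed keeps re-runs identical."""
    rng = random.Random(seed)
    n = len(pairs)
    stats = sorted(
        metric([pairs[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi


# Toy input: every case is a gold positive and every prediction is correct.
perfect = [(True, True)] * 10
ci_perfect = bootstrap_ci(perfect, f1_score)
```

When every prediction matches gold, each resample also scores 1.0 and the interval collapses to a single point. A degenerate interval of that kind reflects the saturated corpus rather than real-world certainty, which is consistent with the "circular/optimistic" flag in the commit message.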

- score_sycophancy_xjudge.py: on regex-negative cases, Sonnet(reference) vs Haiku(cheap WARN) agreement + Cohen kappa
- result: kappa=1.0 both corpora -> cheap Haiku suffices for WARN tier (cost win); BUT universal 1.0 = corpus saturates (too unambiguous to discriminate judge quality / validate real-world precision)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
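For reference, Cohen's kappa compares observed agreement with the agreement expected by chance from each judge's label frequencies. The sketch below is a plain-Python illustration of the statistic, not the implementation in `score_sycophancy_xjudge.py`; the function name and the choice to return `None` in the undefined case are assumptions.

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences.

    Returns None when chance agreement is 1.0 (both judges used one and the
    same label throughout), because kappa is then 0/0 and undefined.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(labels_a)
    # Observed agreement: fraction of cases where the judges match.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of marginal frequencies, summed over labels.
    p_e = sum(
        (labels_a.count(label) / n) * (labels_b.count(label) / n)
        for label in set(labels_a) | set(labels_b)
    )
    if p_e == 1.0:
        return None
    return (p_o - p_e) / (1 - p_e)


# Toy labels only (1 = WARN, 0 = no WARN); not taken from results/v5/.
kappa_agree = cohen_kappa([1, 1, 0, 0], [1, 1, 0, 0])
kappa_chance = cohen_kappa([1, 1, 0, 0], [1, 0, 1, 0])
kappa_single_label = cohen_kappa([0, 0, 0], [0, 0, 0])
```

Perfect agreement over cases that contain both labels gives kappa = 1.0, and agreement at chance level gives 0.0. A kappa of 1.0 shows only that the two judges never disagreed on this corpus; it does not show that either judge would hold up on harder, more ambiguous inputs, which is the saturation caveat stated in the commit message.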

eliteinterface and others added 2 commits May 23, 2026 16:30

waitdeadai merged commit f4ee6a0 into main May 23, 2026
9 of 10 checks passed

waitdeadai deleted the feature/v5-cascade-haiku-tier branch May 25, 2026 16:42

waitdeadai merged 2 commits into main from feature/v5-cascade-haiku-tier
