eval(v5): regex+Haiku-WARN cascade scorer + cross-judge check (frozen labels) #15

Merged
waitdeadai merged 2 commits into main from feature/v5-cascade-haiku-tier
May 23, 2026

@waitdeadai

Owner

Cascade evaluation behind the llm-dark-patterns v5 WARN tier (companion: waitdeadai/llm-dark-patterns feature/v5-cascade-haiku-tier). Scoring is deterministic: judge labels are frozen and there is no API call in the scoring path, so a re-run gives zero delta.

  • evaluation/score_sycophancy_cascade.py: scores regex_BLOCK ∪ Haiku_WARN (Haiku applied to regex-negatives) against construction gold; reports bootstrap CI, per-mode recall, and control FP.
  • evaluation/score_sycophancy_xjudge.py: less-circular cross-judge check (Sonnet as reference vs cheap Haiku, on regex-negatives).
  • results/v5/: the cascade recovers all missed modes (BrokenMath/SyConBench/ELEPHANT 0→1.0); F1=1.0 is flagged as circular/optimistic, not a production metric; cross-judge κ=1.0 → cheap Haiku suffices, BUT the corpus saturates (too unambiguous to validate real-world precision).
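
For readers who have not opened the scripts, the cascade rule described above can be sketched roughly as follows. This is not the code in evaluation/score_sycophancy_cascade.py; the field names (`mode`, `gold`, `regex_block`, `haiku_warn`) and the toy cases are hypothetical, chosen only to show the union rule, per-mode recall, and the control false-positive rate on frozen labels.

```python
from collections import defaultdict

def cascade_predict(regex_block: bool, haiku_warn: bool) -> bool:
    # Union rule: flag if the regex tier blocks, else fall back to the
    # frozen Haiku WARN label (only consulted on regex-negatives).
    return regex_block or haiku_warn

def score(cases):
    """Score cascade predictions against gold labels (hypothetical schema)."""
    tp = fp = fn = 0
    n_controls = 0
    per_mode = defaultdict(lambda: [0, 0])  # mode -> [hits, gold positives]
    for c in cases:
        pred = cascade_predict(c["regex_block"], c["haiku_warn"])
        if c["gold"]:
            per_mode[c["mode"]][1] += 1
            if pred:
                tp += 1
                per_mode[c["mode"]][0] += 1
            else:
                fn += 1
        else:
            n_controls += 1
            if pred:
                fp += 1
    denom = 2 * tp + fp + fn
    return {
        "f1": 2 * tp / denom if denom else 0.0,
        "per_mode_recall": {m: h / n for m, (h, n) in per_mode.items()},
        "control_fp_rate": fp / n_controls if n_controls else 0.0,
    }

# Made-up toy cases, not data from results/v5/.
toy_cases = [
    {"mode": "A", "gold": True, "regex_block": True, "haiku_warn": False},
    {"mode": "B", "gold": True, "regex_block": False, "haiku_warn": True},
    {"mode": "B", "gold": True, "regex_block": False, "haiku_warn": False},
    {"mode": "control", "gold": False, "regex_block": False, "haiku_warn": False},
]
report = score(toy_cases)
```

Because the labels are plain stored booleans, running `score` twice on the same input returns the same report, which is the sense in which the scoring path is deterministic.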

Honest framing throughout; the real number needs human gold or a non-synthetic test set.

🤖 Generated with Claude Code

eliteinterface and others added 2 commits May 23, 2026 16:30
- score_sycophancy_cascade.py: regex_BLOCK U Haiku_WARN on regex-negatives, vs construction gold, bootstrap CI, per-mode recall, control FP
- results/v5/: cascade recovers all missed modes (BrokenMath/SyConBench/ELEPHANT 0->1.0), F1=1.0 BUT flagged circular/optimistic (judge labels not independent of gold; synthetic positives; small controls)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
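
The bootstrap CI named in the commit above could be computed along these lines. This is a minimal percentile-bootstrap sketch under assumed inputs (a list of `(gold, pred)` boolean pairs); the function names, resample count, and seed are illustrative and not taken from the script.

```python
import random

def f1_score(pairs):
    """F1 over (gold, pred) boolean pairs; 0.0 when there is nothing to score."""
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def bootstrap_ci(pairs, metric, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample cases with replacement, recompute the metric."""
    rng = random.Random(seed)  # fixed seed keeps re-runs identical
    n = len(pairs)
    stats = sorted(
        metric([pairs[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi

# Made-up inputs: a saturated set (every case correct) and a mixed set.
saturated = [(True, True)] * 10
mixed = [(True, True)] * 6 + [(True, False)] * 2 + [(False, True)] + [(False, False)] * 3
ci_saturated = bootstrap_ci(saturated, f1_score)
ci_mixed = bootstrap_ci(mixed, f1_score)
```

On the saturated toy set every resample scores 1.0, so the interval collapses to a single point. A bootstrap can only reflect variation present in the sample, which is one reason a perfect score on an unambiguous corpus says little about real-world precision.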
- score_sycophancy_xjudge.py: on regex-negative cases, Sonnet(reference) vs Haiku(cheap WARN) agreement + Cohen kappa
- result: kappa=1.0 both corpora -> cheap Haiku suffices for WARN tier (cost win); BUT universal 1.0 = corpus saturates (too unambiguous to discriminate judge quality / validate real-world precision)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
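
As a reference for the agreement statistic in the second commit, Cohen's kappa between two judges' labels can be sketched as below. The function name and the toy label lists are hypothetical and are not the contents of score_sycophancy_xjudge.py.

```python
import math

def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    # Observed agreement: fraction of cases where both judges give the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each judge's marginal rate, summed over labels.
    labels = set(a) | set(b)
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if p_e == 1.0:
        # Both judges used one identical label everywhere: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)

# Made-up labels for illustration only.
reference = [True, True, False, False]
cheap_same = [True, True, False, False]
cheap_off = [True, False, False, False]

kappa_same = cohen_kappa(reference, cheap_same)
kappa_off = cohen_kappa(reference, cheap_off)
kappa_degenerate = cohen_kappa([True] * 3, [True] * 3)
```

Perfect agreement yields kappa of 1.0 regardless of how easy the cases are, so the statistic alone cannot distinguish a strong cheap judge from a corpus that is simply unambiguous.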
@waitdeadai waitdeadai merged commit f4ee6a0 into main May 23, 2026
9 of 10 checks passed
@waitdeadai waitdeadai deleted the feature/v5-cascade-haiku-tier branch May 25, 2026 16:42