eval(no-count-drift): committed recall probe (in-scope coverage 25/25) by waitdeadai · Pull Request #28 · waitdeadai/llm-dark-patterns

waitdeadai · 2026-05-25T21:35:48Z

Follow-up to #27. Adds a committed recall probe (evaluation/v6/recall_probe.jsonl, 25 genuine count-drift positives spanning phrasing variety) and folds recall into the scorer/RESULTS. Result: 25/25 in-scope recall alongside the 0/988 independent false positives from #27. Data-backed conclusion: no LLM-judge tier or engine is warranted — the deterministic tier covers the common phrasing space at full precision; out-of-scope forms stay abstained by design. No hook logic changed.

🤖 Generated with Claude Code

… 25/25)

Answers the open question from the v6 merge: precision was proven (0/988 independent FP) but recall was unmeasured because the MAD corpus barely contains the target pattern.

Adds evaluation/v6/recall_probe.jsonl, 25 genuine count-drift positives authored to span phrasing variety (digit/word lead-ins, number-first headings, prose prefixes, "all N passed", "there/here are N", numbered lists, "a dozen", N-of-M, fraction/percent), and folds recall into score_count_drift.py / RESULTS.md.

Result: 25/25 in-scope recall, alongside 0/988 independent FP.

Conclusion (data-backed): no LLM-judge tier or Rust engine is warranted; the deterministic tier already covers the common phrasing space at full precision. Out-of-scope forms remain abstained by design. No hook logic changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
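For readers who want to see how figures of the form "25/25 recall, 0/988 FP" can be produced, here is a minimal sketch of scoring a detector against a JSONL probe of positives and a corpus of negatives. It is not the contents of score_count_drift.py: the `text` field name, the `parse_jsonl` and `score` helpers, and the detector signature are all assumptions made for illustration.

```python
import json
from typing import Callable, Iterable


def parse_jsonl(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL content into records, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def score(
    detector: Callable[[str], bool],
    positives: list[dict],
    negatives: list[dict],
) -> dict:
    """Count detector hits on known positives and on known negatives.

    Assumes each record carries its sample under a "text" key
    (a hypothetical schema, not confirmed by the PR).
    """
    hits = sum(1 for rec in positives if detector(rec["text"]))
    false_pos = sum(1 for rec in negatives if detector(rec["text"]))
    return {
        "hits": hits,
        "positives": len(positives),
        "false_positives": false_pos,
        "negatives": len(negatives),
    }
```

Reporting the raw counts rather than a single ratio keeps the probe size visible, which matters when the positive set is as small as 25 samples.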

Bumps the plugin version and description to include no-count-drift, the count-vs-enumeration self-consistency gate (MAST FM-3.2) merged in #27/#28.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
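As an illustration of what a count-vs-enumeration self-consistency check can look like, the sketch below compares a claimed count in a "here/there are N" lead-in against the number of numbered list items that follow, and abstains when either side is missing. This is a simplified stand-in, not the hook's actual logic: the regexes, the `WORD_NUMBERS` table, and the function names are hypothetical, and it covers only one of the phrasing families listed in the probe.

```python
import re
from typing import Optional

# Hypothetical lookup for spelled-out counts; deliberately small.
WORD_NUMBERS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}

# Lead-in such as "Here are 3 ..." or "There are three ...".
LEAD_IN = re.compile(r"\b(?:here|there)\s+are\s+(\d+|[a-z]+)\b", re.IGNORECASE)
# A numbered list item at the start of a line: "1. foo" or "1) foo".
LIST_ITEM = re.compile(r"^\s*\d+[.)]\s+\S", re.MULTILINE)


def claimed_count(text: str) -> Optional[int]:
    """Return the count stated in the lead-in, or None if none is found."""
    match = LEAD_IN.search(text)
    if match is None:
        return None
    token = match.group(1).lower()
    if token.isdigit():
        return int(token)
    return WORD_NUMBERS.get(token)


def has_count_drift(text: str) -> Optional[bool]:
    """True if the claimed count differs from the enumerated items.

    Returns None (abstain) when no count or no numbered list is present.
    """
    claimed = claimed_count(text)
    listed = len(LIST_ITEM.findall(text))
    if claimed is None or listed == 0:
        return None
    return claimed != listed
```

Returning `None` instead of `False` for unmatched input separates "checked and consistent" from "out of scope", which mirrors the abstain-by-design behavior the PR describes.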

waitdeadai merged commit a275d3a into main May 25, 2026
2 checks passed

waitdeadai deleted the feature/v6-count-drift-recall branch May 25, 2026 21:36

waitdeadai merged 1 commit into main from feature/v6-count-drift-recall

2 participants
