feat(hooks): no-count-drift — count-vs-enumeration self-consistency gate (MAST FM-3.2) by waitdeadai · Pull Request #27 · waitdeadai/llm-dark-patterns

waitdeadai · 2026-05-25T21:01:51Z

`no-count-drift` — count-vs-enumeration self-consistency gate

A deterministic Stop/SubagentStop hook that blocks a message when a count stated in it contradicts the message's own enumeration or arithmetic. Proposed by @beq00000 (Brendan Quinn) on recognition-without-arrest-corpus#9 as "a final-pass diff between every count-claim in prose and its enumeration or table source," a verification gate that lives outside the writing agent's recall.

Why it's distinct from `no-fake-stats`

Orthogonal axes (the factuality-vs-faithfulness split, HalluLens ACL 2025):

- `no-fake-stats` = factuality: a precise number lacks a citation. Ignores small integers by design.
- `no-count-drift` = self-consistency / faithfulness (MAST FM-3.2 "no or incomplete verification"): a stated count contradicts the artifact's own content. A citation cannot repair an internal mismatch, and the common case ("six findings:" then five bullets; "all 5 tests pass" then four listed) uses the small integers that `no-fake-stats` skips.
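The "six findings:" then five bullets case can be sketched in a few lines of stdlib Python. This is a minimal illustration, not the code in lib/count_drift.py: the names `NUMBER_WORDS`, `LEAD_IN`, `BULLET`, and `headline_mismatch` are hypothetical, and the sketch assumes `-` or `*` bullets that start immediately after the lead-in line.

```python
import re

# Hypothetical number-word table; the real detector's vocabulary is not shown in this PR.
NUMBER_WORDS = {"two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
                "seven": 7, "eight": 8, "nine": 9, "ten": 10}

# A lead-in such as "six findings:" or "5 findings:" that ends its line.
LEAD_IN = re.compile(
    r"\b(\d+|" + "|".join(NUMBER_WORDS) + r")\s+[A-Za-z]+:\s*$", re.IGNORECASE
)
# A top-level bullet: "-" or "*" in column 0.
BULLET = re.compile(r"^[-*]\s+\S")


def headline_mismatch(text):
    """Return (stated, listed) when a lead-in count contradicts the bullets
    directly below it; return None when nothing contradicts."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        m = LEAD_IN.search(line)
        if not m:
            continue
        token = m.group(1).lower()
        stated = int(token) if token.isdigit() else NUMBER_WORDS[token]
        listed = 0
        for follow in lines[i + 1:]:
            if follow[:1].isspace():
                continue  # indented line: nested item or continuation, not top-level
            if BULLET.match(follow):
                listed += 1
            else:
                break  # blank line or prose ends the adjacent list
        if listed and listed != stated:
            return stated, listed
    return None
```

A citation would not change the outcome here: the check compares the message only against itself, which is why it sits on a different axis from `no-fake-stats`.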

Landscape check (deepresearch): no existing hook in llm-dark-patterns / agent-closeout-bench / cc-safe-setup performs an in-message count-vs-enumeration check, so this fills a real gap rather than duplicating an existing hook.

Design — deterministic, abstain-on-ambiguity

Counting lives in pure-stdlib Python (lib/count_drift.py, no deps) because counting is a strength of rule-based symbolic code and a weakness of LLMs, whose counting errors are self-consistent on resample ("Sequential Enumeration in LLMs"; "Too Consistent to Detect"). Three detectors, each abstaining when scope is ambiguous:

- R1 fraction/percent recompute: `9/10 = 80%` → blocked (it's 90%); `2/35 = 5.7%` passes (correct rounding).
- R2 "N of M" bound: `5 of 3` → blocked.
- R3 headline count vs a single immediately-adjacent enumeration (list or table), top-level/depth-aware; abstains on 0 or ≥2 candidate enumerations, label/section indices ("Section 3"), nested-colon lead-ins ("3 reasons: the top 2 are:"), vague cardinality, and approximation markers.
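As an illustration of R1 and R2, the two arithmetic checks can be sketched as below. This is an assumption-laden sketch rather than the detector itself: the function names and regexes are hypothetical, and it assumes the claimed percentage is compared after half-up rounding to the number of decimals the author wrote.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical patterns; the real detector's regexes are not shown in this PR.
FRACTION_PERCENT = re.compile(r"\b(\d+)\s*/\s*(\d+)\s*=\s*(\d+(?:\.\d+)?)\s*%")
N_OF_M = re.compile(r"\b(\d+)\s+of\s+(\d+)\b")


def check_fraction_percent(text):
    """R1 sketch: recompute each 'a/b = p%' claim at the precision it was stated."""
    findings = []
    for m in FRACTION_PERCENT.finditer(text):
        num, den, claimed = int(m.group(1)), int(m.group(2)), m.group(3)
        if den == 0:
            continue  # abstain: nothing to recompute
        places = len(claimed.split(".")[1]) if "." in claimed else 0
        quantum = Decimal(10) ** -places  # 1 for "80", 0.1 for "5.7"
        actual = (Decimal(num) * 100 / Decimal(den)).quantize(
            quantum, rounding=ROUND_HALF_UP
        )
        if actual != Decimal(claimed):
            findings.append(f"{m.group(0)} (recomputed: {actual}%)")
    return findings


def check_n_of_m(text):
    """R2 sketch: in 'N of M', N may not exceed M."""
    return [m.group(0) for m in N_OF_M.finditer(text)
            if int(m.group(1)) > int(m.group(2))]
```

With these definitions, `check_fraction_percent("9/10 = 80%")` reports the claim alongside the recomputed 90%, while `"2/35 = 5.7%"` produces no finding because 5.714… rounds to 5.7 at one decimal. Using `Decimal` avoids binary floating-point surprises at rounding boundaries.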

It is a blocking gate, so it fires only on unambiguous, self-contained mismatches and otherwise passes. It also fails open when jq or python3 is unavailable.
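The fail-open behaviour can be sketched as a small decision function. Everything here is an assumption for illustration: `decide_exit_code` is a hypothetical name, the `"message"` payload key is a placeholder rather than a documented field, and the actual wrapper is the shell script hooks/no-count-drift.sh. Only the exit code 2 for a block comes from the PR.

```python
import json


def decide_exit_code(raw_stdin, detect):
    """Return 2 to block and 0 to pass.

    Any failure to parse the payload or to run the detector returns 0,
    so a broken environment never blocks the agent (fail-open).
    """
    try:
        payload = json.loads(raw_stdin)
        text = payload["message"]  # hypothetical field name, not the real schema
        findings = detect(text)
    except Exception:
        return 0  # fail open: abstain rather than block on an internal error
    return 2 if findings else 0
```

The design choice is that only a positive, fully computed finding can block; every other path, including malformed input, resolves to a pass.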

Verification

- Precision 1.000 / 0 false positives on a 15-case adversarial negative set (the negatives are authored to break it: section indices, label words, nested-colon traps, approx markers, ambiguous multi-list scope, nested-list depth).
- tests/test-count-drift.sh → 9/9 PASS (block/pass/abstain, fail-open, re-entrancy, determinism).
- hooks/hooks.json valid; hook fires end-to-end (exit 2) via CLAUDE_PLUGIN_ROOT.

Honesty caveat (in `evaluation/v6/RESULTS.md`)

F1 = 1.000 here is a co-evolved-corpus number (the same author wrote the detector and the fixtures), not a field-generalization claim, and citing it as one would overstate the result. The load-bearing, generalizable metric is precision, i.e. zero false positives on the adversarial negatives. Per the statcheck precedent (deterministic internal-consistency check: ~96–100% specificity, ~61% recall in the wild), real-world recall will be far below 1.0, bounded by structural-extraction coverage. That trade is intentional: abstain rather than false-fire.
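To make the trade concrete, the metrics involved can be computed from confusion counts as in the sketch below. The function name is hypothetical and the counts in the usage note are invented purely for illustration; they are not results from this PR.

```python
def gate_metrics(tp, fp, tn, fn):
    """Return (precision, recall, specificity); None where a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else None      # of blocks, how many were right
    recall = tp / (tp + fn) if tp + fn else None          # of real mismatches, how many were blocked
    specificity = tn / (tn + fp) if tn + fp else None     # of clean messages, how many passed
    return precision, recall, specificity
```

For example, with illustrative counts of 61 true positives, 0 false positives, 100 true negatives, and 39 false negatives, `gate_metrics(61, 0, 100, 39)` gives precision 1.0 and specificity 1.0 with recall 0.61. An abstaining gate keeps the first two at 1.0 while recall absorbs every case it declines to judge.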

Files

lib/count_drift.py, hooks/no-count-drift.sh, evaluation/v6/{SPEC.md,RESULTS.md,fixtures.jsonl,score_count_drift.py}, tests/test-count-drift.sh; wired into hooks/hooks.json (Stop + SubagentStop); README.md MAST table + catalog updated.

🤖 Generated with Claude Code

@beq00000

…cy gate

A deterministic Stop/SubagentStop hook that blocks a count stated in a message when it contradicts the message's own enumeration or arithmetic (e.g. "six findings:" + a 5-item list; "9/10 = 80%"). Self-consistency / MAST FM-3.2 axis, orthogonal to no-fake-stats (citation presence).

Counting lives in pure-stdlib Python (lib/count_drift.py) because counting is a symbolic strength and an LLM weakness. Three detectors (fraction/percent recompute, "N of M" bound, headline-vs-single-enumeration), each abstaining on ambiguous scope. High-precision blocking gate; fail-open without jq/python3.

Verified: precision 1.000 / 0 false positives on 15 adversarial negatives; harness 9/9. F1=1.0 on the hand-authored corpus is a co-evolved-corpus number (caveated in RESULTS.md), not a field-generalization claim.

Proposed by @beq00000 (Brendan Quinn) on recognition-without-arrest-corpus#9.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ndent MAD eval

Testing count_drift over corpora it was NOT authored against (660 real LLM responses in evaluation/raw_results.jsonl + 328 stress fixtures for the other hooks) surfaced 17 false positives the hand-authored (co-evolved) fixtures could not see:

1. R3 lead-in was too loose: "...favor one side. Instead:" / "one of four quadrants:" matched because a number+noun merely co-occurred with a sentence-colon on the line. Fix: colon must be adjacent to the noun phrase (no intervening punctuation/number), count >= 2, lists only (a 2x2 table has 4 cells but 2 rows).
2. Number words lacked a leading \b, so "of-ten"/"writ-ten" parsed as "ten". Fix: \b before the number in the lead-in.

Independent false-positive rate now 0 / 988 texts. The two FP classes are locked in as regression negatives in fixtures.jsonl.

Adds evaluation/v6/independent_eval.py (reproducible non-circular check) and folds its result into RESULTS.md, so the load-bearing precision number is the independent one, not the hand-authored F1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

waitdeadai · 2026-05-25T21:19:28Z

Follow-up before merge: ran an independent, non-circular precision check.

Tested count_drift over 988 texts it was NOT authored against: 660 real LLM model_response/prompt_text entries from evaluation/raw_results.jsonl (the DarkBench/MAD eval inputs) plus 328 stress fixtures written for the other hooks. The first pass surfaced 17 false positives that the hand-authored fixtures could not see (the co-evolved-corpus blind spot), in two classes:

- R3 lead-in too loose: "...favor one side. Instead:" and "one of four quadrants:" matched because a number+noun merely co-occurred with a sentence colon on the line.
- Number words lacked a leading word boundary, so "of-ten" / "writ-ten" parsed as "ten".

Both fixed in afb27d3 (colon must be adjacent to the noun phrase, count ≥ 2, lists-only, \b before number words), and locked in as regression negatives. Independent false-positive rate is now 0 / 988. evaluation/v6/independent_eval.py makes it reproducible, and RESULTS.md now leads with this non-circular number rather than the hand-authored F1 (which is a co-evolved-corpus 1.0 and caveated as such). CI green.
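The two false-positive classes and their fixes can be reproduced with a pair of regexes. This is a sketch under stated assumptions: `LOOSE_LEAD_IN`, `STRICT_LEAD_IN`, and `strict_lead_in_count` are hypothetical names, the patterns only approximate the rules described above (leading `\b`, colon adjacent to the noun phrase, count ≥ 2), and the lists-only rule is not modelled.

```python
import re

# Hypothetical vocabulary; not taken from lib/count_drift.py.
WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
ALT = r"(\d+|" + "|".join(WORDS) + r")"

# Before the fix: no leading \b, and the colon may sit anywhere later on the line.
LOOSE_LEAD_IN = re.compile(ALT + r"\s+[A-Za-z]+.*:\s*$", re.IGNORECASE)

# After the fix: leading \b, and the colon directly follows a one- or two-word noun phrase.
STRICT_LEAD_IN = re.compile(
    r"\b" + ALT + r"\s+[A-Za-z]+(?:\s+[A-Za-z]+)?:\s*$", re.IGNORECASE
)


def strict_lead_in_count(line):
    """Return the stated count for a tight lead-in, or None to abstain."""
    m = STRICT_LEAD_IN.search(line)
    if not m:
        return None
    token = m.group(1).lower()
    count = int(token) if token.isdigit() else WORDS[token]
    return count if count >= 2 else None  # a count of 1 is too often an idiom
```

Under these patterns, "They often come in these forms:" matches the loose regex (via the "ten" inside "often") but not the strict one, and "...favor one side. Instead:" is rejected because a period separates the noun from the colon. A genuine lead-in such as "Six findings:" still yields 6.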

Bumps the plugin version and description to include no-count-drift, the count-vs-enumeration self-consistency gate (MAST FM-3.2) merged in #27/#28. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

eliteinterface and others added 2 commits May 25, 2026 18:01

waitdeadai merged commit 41402e0 into main May 25, 2026
2 checks passed

waitdeadai deleted the feature/v6-count-drift branch May 25, 2026 21:19

This was referenced May 25, 2026

- case: RUSE applied to memory-relevance judgment under work-character shift, beq00000/recognition-without-arrest-corpus#9 (Open)
- eval(no-count-drift): committed recall probe (in-scope coverage 25/25), #28 (Merged)
