feat(hooks): no-count-drift — count-vs-enumeration self-consistency gate (MAST FM-3.2)#27

Merged
waitdeadai merged 2 commits into main from feature/v6-count-drift on May 25, 2026

@waitdeadai


no-count-drift — count-vs-enumeration self-consistency gate

A deterministic Stop/SubagentStop hook that blocks a count stated in a message when it contradicts the message's own enumeration or arithmetic. Proposed by @beq00000 (Brendan Quinn) on recognition-without-arrest-corpus#9 — "a final-pass diff between every count-claim in prose and its enumeration or table source," a verification gate that lives outside the writing agent's recall.

Why it's distinct from no-fake-stats

Orthogonal axes (the factuality-vs-faithfulness split, HalluLens ACL 2025):

  • no-fake-stats = factuality: a precise number lacks a citation. Ignores small integers by design.
  • no-count-drift = self-consistency / faithfulness (MAST FM-3.2 "no or incomplete verification"): a stated count contradicts the artifact's own content. A citation cannot repair an internal mismatch, and the common case ("six findings:" then five bullets; "all 5 tests pass" then four listed) uses the small integers no-fake-stats skips.

Landscape check (deepresearch): no existing hook in llm-dark-patterns / agent-closeout-bench / cc-safe-setup performs an in-message count-vs-enumeration check, so this hook fills a real gap rather than duplicating an existing one.

Design — deterministic, abstain-on-ambiguity

Counting lives in pure-stdlib Python (lib/count_drift.py, no deps) because counting is a rule-based-symbolic strength and an LLM weakness whose errors are self-consistent on resample ("Sequential Enumeration in LLMs"; "Too Consistent to Detect"). Three detectors, each abstaining when scope is ambiguous:

  • R1 fraction/percent recompute: 9/10 = 80% → blocked (it's 90%); 2/35 = 5.7% passes (correct rounding).
  • R2 "N of M" bound: 5 of 3 → blocked.
  • R3 headline count vs a single immediately-adjacent enumeration (list or table), top-level/depth-aware; abstains on 0 or ≥2 candidate enumerations, label/section indices ("Section 3"), nested-colon lead-ins ("3 reasons: the top 2 are:"), vague cardinality, and approximation markers.
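The two arithmetic detectors can be illustrated with a minimal sketch. This is not the code in lib/count_drift.py: the regular expressions, the function names `check_fraction_percent` and `check_n_of_m`, and the choice of half-up rounding to the precision the author wrote are all assumptions made for illustration.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical pattern: "A/B = P%" with an optional decimal part on P.
FRACTION_PERCENT = re.compile(r"\b(\d+)\s*/\s*(\d+)\s*=\s*(\d+(?:\.\d+)?)\s*%")
# Hypothetical pattern: "N of M" written with plain digits.
N_OF_M = re.compile(r"\b(\d+)\s+of\s+(\d+)\b")


def check_fraction_percent(text):
    """R1-style check: flag 'A/B = P%' when P is not A/B rounded to P's precision."""
    problems = []
    for match in FRACTION_PERCENT.finditer(text):
        numerator, denominator = int(match.group(1)), int(match.group(2))
        stated = Decimal(match.group(3))
        if denominator == 0:
            continue  # abstain: nothing to recompute
        # Round the recomputed value to the number of decimals the author used.
        quantum = Decimal(1).scaleb(stated.as_tuple().exponent)
        actual = (Decimal(numerator) * 100 / Decimal(denominator)).quantize(
            quantum, rounding=ROUND_HALF_UP
        )
        if actual != stated:
            problems.append(f"{match.group(0)}: recomputed {actual}%")
    return problems


def check_n_of_m(text):
    """R2-style check: flag 'N of M' when N exceeds M."""
    problems = []
    for match in N_OF_M.finditer(text):
        n, m = int(match.group(1)), int(match.group(2))
        if n > m:
            problems.append(f"{match.group(0)}: {n} exceeds {m}")
    return problems


print(check_fraction_percent("9/10 = 80%"))    # ['9/10 = 80%: recomputed 90%']
print(check_fraction_percent("2/35 = 5.7%"))   # []
print(check_n_of_m("5 of 3"))                  # ['5 of 3: 5 exceeds 3']
```

Using `Decimal` rather than floats keeps the comparison exact at the stated precision, which suits a gate that is meant to be deterministic.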

It is a blocking gate, so it fires only on unambiguous self-contained mismatches and otherwise passes (fail-open without jq/python3).
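A rough sketch of what "fire only on an unambiguous mismatch, otherwise pass" might look like for the headline-count detector (R3). Everything here is hypothetical: the patterns, the label and approximation word lists, and the function name `headline_mismatch` are illustrative, and the sketch covers only some of the abstention rules listed above.

```python
import re

# Hypothetical number-word table and patterns, for illustration only.
WORDS = {"two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
         "seven": 7, "eight": 8, "nine": 9, "ten": 10}
LEAD_IN = re.compile(
    r"\b(\d+|" + "|".join(WORDS) + r")\s+([A-Za-z]+(?:\s+[A-Za-z]+)?):\s*$",
    re.IGNORECASE,
)
TOP_LEVEL_ITEM = re.compile(r"^(?:[-*•]|\d+[.)])\s+\S")
NESTED_ITEM = re.compile(r"^\s+(?:[-*•]|\d+[.)])\s+\S")
APPROX = re.compile(r"\b(?:about|around|roughly|approximately|nearly)\b", re.IGNORECASE)
LABEL_WORDS = {"section", "step", "figure", "table", "chapter"}


def headline_mismatch(text):
    """Return (stated, counted) on a clear mismatch, else None (pass or abstain)."""
    lines = text.splitlines()
    candidates = [i for i, line in enumerate(lines) if LEAD_IN.search(line)]
    if len(candidates) != 1:
        return None  # abstain: zero or several candidate lead-ins
    index = candidates[0]
    lead = lines[index]
    if APPROX.search(lead) or lead.count(":") > 1:
        return None  # abstain: approximation marker or nested-colon lead-in
    match = LEAD_IN.search(lead)
    token = match.group(1)
    stated = int(token) if token.isdigit() else WORDS[token.lower()]
    preceding = lead[:match.start()].split()
    if preceding and preceding[-1].lower() in LABEL_WORDS:
        return None  # abstain: "Section 3 ..." is an index, not a count
    if stated < 2:
        return None
    counted = 0
    for line in lines[index + 1:]:
        if TOP_LEVEL_ITEM.match(line):
            counted += 1          # count top-level items only
        elif NESTED_ITEM.match(line):
            continue              # nested items belong to their parent
        elif not line.strip() and counted == 0:
            continue              # allow a blank line before the list
        else:
            break                 # the enumeration has ended
    if counted == 0:
        return None  # abstain: no adjacent enumeration found
    return (stated, counted) if stated != counted else None


print(headline_mismatch("six findings:\n- a\n- b\n- c\n- d\n- e"))  # (6, 5)
print(headline_mismatch("3 reasons:\n- a\n  - a1\n- b\n- c"))        # None
print(headline_mismatch("3 reasons: the top 2 are:\n- a\n- b\n- c")) # None
```

Every early `return None` is an abstention, so the only path that reports a mismatch is the one where a single lead-in and a single adjacent list were both found.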

Verification

  • Precision 1.000 / 0 false positives on a 15-case adversarial negative set (the negatives are authored to break it: section indices, label words, nested-colon traps, approx markers, ambiguous multi-list scope, nested-list depth).
  • tests/test-count-drift.sh: 9/9 PASS (block/pass/abstain, fail-open, re-entrancy, determinism).
  • hooks/hooks.json valid; hook fires end-to-end (exit 2) via CLAUDE_PLUGIN_ROOT.

Honesty caveat (in evaluation/v6/RESULTS.md)

F1 = 1.000 here is a co-evolved-corpus number (the same author wrote the detector and the fixtures), not a field-generalization claim, and citing it as one would overstate the result. The load-bearing, generalizable metric is precision, that is, zero false positives on the adversarial negatives. Per the statcheck precedent (a deterministic internal-consistency check with ~96–100% specificity and ~61% recall in the wild), real-world recall will be far below 1.0, bounded by structural-extraction coverage. That trade is intentional: abstain rather than false-fire.
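For reference, the metrics named in this caveat can be computed from confusion counts as in the sketch below. The function name `gate_metrics` and the example counts are hypothetical and are not results from this PR; they only illustrate how a gate can have perfect precision while recall stays well below 1.0.

```python
def gate_metrics(tp, fp, tn, fn):
    """Return precision, recall, specificity and F1; None where a ratio is undefined."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "precision": ratio(tp, tp + fp),      # of the blocks issued, how many were right
        "recall": ratio(tp, tp + fn),         # of the real mismatches, how many were caught
        "specificity": ratio(tn, tn + fp),    # of the clean texts, how many passed
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical counts: no false positives, but 39 of 100 real mismatches missed.
metrics = gate_metrics(tp=61, fp=0, tn=100, fn=39)
print(metrics["precision"], metrics["recall"], metrics["specificity"])  # 1.0 0.61 1.0
```

On a set that contains only negatives, precision is undefined (there are no blocks to score), so the informative figure there is specificity, or equivalently the false-positive rate.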

Files

lib/count_drift.py, hooks/no-count-drift.sh, evaluation/v6/{SPEC.md,RESULTS.md,fixtures.jsonl,score_count_drift.py}, tests/test-count-drift.sh; wired into hooks/hooks.json (Stop + SubagentStop); README.md MAST table + catalog updated.

🤖 Generated with Claude Code

eliteinterface and others added 2 commits May 25, 2026 18:01
…cy gate

A deterministic Stop/SubagentStop hook that blocks a count stated in a message
when it contradicts the message's own enumeration or arithmetic (e.g. "six
findings:" + a 5-item list; "9/10 = 80%"). Self-consistency / MAST FM-3.2 axis,
orthogonal to no-fake-stats (citation presence).

Counting lives in pure-stdlib Python (lib/count_drift.py) because counting is a
symbolic strength and an LLM weakness. Three detectors (fraction/percent
recompute, "N of M" bound, headline-vs-single-enumeration), each abstaining on
ambiguous scope. High-precision blocking gate; fail-open without jq/python3.

Verified: precision 1.000 / 0 false positives on 15 adversarial negatives;
harness 9/9. F1=1.0 on the hand-authored corpus is a co-evolved-corpus number
(caveated in RESULTS.md), not a field-generalization claim.

Proposed by @beq00000 (Brendan Quinn) on recognition-without-arrest-corpus#9.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ndent MAD eval

Testing count_drift over corpora it was NOT authored against — 660 real LLM
responses in evaluation/raw_results.jsonl + 328 stress fixtures for the other
hooks — surfaced 17 false positives the hand-authored (co-evolved) fixtures
could not see:
  1. R3 lead-in was too loose: "...favor one side. Instead:" / "one of four
     quadrants:" matched because a number+noun merely co-occurred with a
     sentence-colon on the line. Fix: colon must be adjacent to the noun phrase
     (no intervening punctuation/number), count >= 2, lists only (a 2x2 table
     has 4 cells but 2 rows).
  2. Number words lacked a leading \b, so "of-ten"/"writ-ten" parsed as "ten".
     Fix: \b before the number in the lead-in.

Independent false-positive rate now 0 / 988 texts. The two FP classes are locked
in as regression negatives in fixtures.jsonl. Adds evaluation/v6/independent_eval.py
(reproducible non-circular check) and folds its result into RESULTS.md, so the
load-bearing precision number is the independent one, not the hand-authored F1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@waitdeadai


Follow-up before merge: ran an independent, non-circular precision check.

Tested count_drift over 988 texts it was NOT authored against — 660 real LLM model_response/prompt_text from evaluation/raw_results.jsonl (the DarkBench/MAD eval inputs) plus 328 stress fixtures written for the other hooks. The first pass surfaced 17 false positives the hand-authored fixtures could not see (the co-evolved-corpus blind spot):

  • R3 lead-in too loose — "...favor one side. Instead:", "one of four quadrants:" matched because a number+noun merely co-occurred with a sentence colon on the line.
  • number words lacked a leading word boundary, so "of-ten" / "writ-ten" parsed as "ten".

Both fixed in afb27d3 (colon must be adjacent to the noun phrase, count ≥ 2, lists-only, \b before number words), and locked in as regression negatives. Independent false-positive rate is now 0 / 988. evaluation/v6/independent_eval.py makes it reproducible, and RESULTS.md now leads with this non-circular number rather than the hand-authored F1 (which is a co-evolved-corpus 1.0 and caveated as such). CI green.
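The word-boundary fix can be reproduced in isolation. The sketch below is not the pattern from lib/count_drift.py; `NUMBER_WORDS`, `LOOSE`, and `STRICT` are hypothetical names that only contrast a number-word pattern with and without a leading `\b`.

```python
import re

# Hypothetical number-word alternation, for illustration only.
NUMBER_WORDS = "two|three|four|five|six|seven|eight|nine|ten"
LOOSE = re.compile(rf"(?:{NUMBER_WORDS})\b", re.IGNORECASE)     # no leading boundary
STRICT = re.compile(rf"\b(?:{NUMBER_WORDS})\b", re.IGNORECASE)  # leading \b added

for sample in ["often wrong", "written below", "ten items"]:
    print(f"{sample!r}: loose={bool(LOOSE.search(sample))} strict={bool(STRICT.search(sample))}")
# 'often wrong': loose=True strict=False
# 'written below': loose=True strict=False
# 'ten items': loose=True strict=True
```

Without the leading boundary, the trailing letters of "often" and "written" satisfy the pattern; with it, only a standalone "ten" does.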

@waitdeadai waitdeadai merged commit 41402e0 into main May 25, 2026
2 checks passed
@waitdeadai waitdeadai deleted the feature/v6-count-drift branch May 25, 2026 21:19
waitdeadai pushed a commit that referenced this pull request May 25, 2026
Bumps the plugin version and description to include no-count-drift, the
count-vs-enumeration self-consistency gate (MAST FM-3.2) merged in #27/#28.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>