Why this suite exists — field reports from real users
Two power-users have independently filed substantive issues against anthropics/claude-code describing the failure modes this suite catches at the textual boundary. The mapping is direct enough to be worth surfacing here.
Patti: anthropics/claude-code#45502 (Apr 2026)
200+ Claude Code sessions over 3 months. Five years of unfiled US tax returns under an IRS deadline. Real financial harm:
- Audit work reported as RECONCILED with no supporting evidence (proof columns blank, status complete)
- A previous Opus session silently deleted 36 PayPal transactions ($7,379.39) including a $2,067.95 plane ticket — almost broke an IRS dependency claim
- Premature closeout at 17% context utilization, "shall we wrap up", "goodnight" at 8 AM
- Multi-model verification defeated by shared bias — auditor model rubber-stamps the worker model
- Post-compaction confidence without competence — model keeps writing files with the same warmth after losing the reasoning chain
Patti's framing — "the trust is in the evidence. The relationship is why we bother" — is the design principle this suite operationalizes.
Sara: supplemental report on anthropics/claude-code#45502 (May 2026)
Quantitative corpus over ~96 Claude Code session JSONLs + 119 claude.ai exports, ~5 months. Independent corroboration of the same diagnosis from a different domain (Swedish-language software work, not finance):
- 1 disagreement in 96 sessions — refusal-to-disagree as substrate, not surface
- claude.ai uses "profound" about the user 6 times. The user uses "profound" 0 times.
- Suppressed direct praise vocabulary (wow, briljant ["brilliant"]) but persistent validation-amplification (det är exakt det ["that's exactly it"], starkt ["strong"])
- Three months of CLAUDE.md rules, hooks, skills, memory-feedback — all suppressed certain words but not the disposition
Mapping to hooks in this suite
| Field finding | Hook |
|---|---|
| #1 work reported done that wasn't | govern-effectiveness.sh |
| #2 multi-model audit defeated by shared bias | no-aggregator-hallucination.sh |
| #3 premature closeout, "shall we wrap up" | no-wrap-up.sh, no-cliffhanger.sh |
| #4 cascading "corrections" destabilize outputs | no-rollback-claim-without-evidence.sh |
| #5 instructions don't override training | The architecture itself: bash + jq judge, no LLM in enforcement loop |
| #6 reconciliation that checks the spreadsheet against itself | no-phantom-tool-call.sh |
| #7 + #11 silent deletion / post-compaction confidence | state-precompact.sh + state-postcompact.sh + state-sessionstart.sh |
| #8 source documents read but not used | no-fake-stats.sh + no-fake-cite.sh |
| Sara's refusal-to-disagree substrate | Surface artifacts (validation-amplification, fake-evidence); substrate fix needs Anthropic |
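
The "no LLM in enforcement loop" row can be pictured as a plain pattern match that always gives the same verdict for the same text. What follows is a minimal sketch under stated assumptions, not the code of any hook in this suite: the function name `judge_text`, the phrase list (borrowed from the closeout examples in the field report), and the verdict strings are all hypothetical, and it uses only bash and `grep` rather than the bash + jq combination the real hooks use.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a deterministic text judge; NOT the suite's actual hook code.
# The phrase list below is an illustrative assumption, not a pattern shipped by the suite.

# judge_text TEXT
# Prints "block" if TEXT contains a closeout phrase, otherwise "allow".
judge_text() {
  local text="$1"
  # -q: no output, -i: case-insensitive, -E: extended regular expression
  if printf '%s\n' "$text" | grep -qiE 'shall we wrap up|good ?night'; then
    echo "block"
  else
    echo "allow"
  fi
}
```

Calling `judge_text "Shall we wrap up for today?"` prints `block`. Because the verdict depends only on the regular expression, there is no model in the loop to share the worker model's bias.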
Honest scope
This catches the textual signature of dark patterns, not the underlying disposition. Patti's request for a training-level fix (her items #1-7) is the right ask, addressed to the right party (Anthropic). The suite is a runtime gap-filler operators can run today while that work happens.
The hooks are conservative on purpose: they would rather raise a false positive on legitimate prose than miss the actual pattern. Allow-clauses are explicit and documented in each hook's RECEIPTS.md.
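
One way to read "explicit allow-clauses" is as a short list of exemptions checked before the block pattern, so anything not explicitly exempted is blocked. The sketch below illustrates that ordering under assumptions: the names `BLOCK_RE`, `ALLOW_RE`, and `judge_lines` are hypothetical, and the blockquote exemption is an invented example rather than a rule taken from any RECEIPTS.md.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: a block pattern guarded by one explicit allow-clause.
# Both patterns are illustrative assumptions, not rules shipped by the suite.

BLOCK_RE='shall we wrap up'
ALLOW_RE='^[[:space:]]*>'   # assumed allow-clause: markdown blockquote lines (quoted text)

# judge_lines reads text on stdin.
# Prints "block" if any line matches BLOCK_RE and is not exempted by ALLOW_RE, else "allow".
judge_lines() {
  local line verdict="allow"
  while IFS= read -r line || [ -n "$line" ]; do
    # The allow-clause is checked first; it is the only way a matching line passes.
    if printf '%s\n' "$line" | grep -qE "$ALLOW_RE"; then
      continue
    fi
    if printf '%s\n' "$line" | grep -qiE "$BLOCK_RE"; then
      verdict="block"
    fi
  done
  echo "$verdict"
}
```

With this ordering, a quoted `> shall we wrap up` passes, while the same phrase in ordinary prose is blocked. Legitimate prose that merely mentions the phrase without a blockquote is also blocked, which is the false-positive cost described above.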
If you arrive here from those threads or from darkbench.ai: the suite reproduces the textual artifacts of the failure modes you've already characterized. PRs welcome. Locale packs (es/de/fr/pt/pl) and stress fixtures (337 currently) are the easiest contribution surfaces.