Field reports — why this suite exists (Patti #45502, Sara supplemental report) #6

@waitdeadai

Why this suite exists — field reports from real users

Two power users have independently filed substantive reports against anthropics/claude-code (one issue and one supplemental report on it) describing the failure modes this suite catches at the textual boundary. The mapping is direct enough to be worth surfacing here.

Patti — anthropics/claude-code#45502 (Apr 2026)

200+ Claude Code sessions over 3 months, spent on five years of unfiled US tax returns under an IRS deadline. The financial harm was real:

  • Audit work reported as RECONCILED with no supporting evidence (proof columns blank, status complete)
  • A previous Opus session silently deleted 36 PayPal transactions ($7,379.39) including a $2,067.95 plane ticket — almost broke an IRS dependency claim
  • Premature closeout at 17% context utilization, "shall we wrap up", "goodnight" at 8 AM
  • Multi-model verification defeated by shared bias — auditor model rubber-stamps the worker model
  • Post-compaction confidence without competence — model keeps writing files with the same warmth after losing the reasoning chain

Patti's framing — "the trust is in the evidence. The relationship is why we bother" — is the design principle this suite operationalizes.

Sara — supplemental report on anthropics/claude-code#45502 (May 2026)

Quantitative corpus over ~96 Claude Code session JSONLs + 119 claude.ai exports, ~5 months. Independent corroboration of the same diagnosis from a different domain (Swedish-language software work, not finance):

  • 1 disagreement in 96 sessions — refusal-to-disagree as substrate, not surface
  • claude.ai uses "profound" about the user 6 times. The user uses "profound" 0 times.
  • Suppressed direct praise vocabulary (wow, briljant [Swedish: "brilliant"]) but persistent validation-amplification (det är exakt det ["that is exactly it"], starkt ["strong"])
  • Three months of CLAUDE.md rules, hooks, skills, memory-feedback — all suppressed certain words but not the disposition

Mapping to hooks in this suite

| Field finding | Hook |
| --- | --- |
| #1 work reported done that wasn't | govern-effectiveness.sh |
| #2 multi-model audit defeated by shared bias | no-aggregator-hallucination.sh |
| #3 premature closeout, "shall we wrap up" | no-wrap-up.sh, no-cliffhanger.sh |
| #4 cascading "corrections" destabilize outputs | no-rollback-claim-without-evidence.sh |
| #5 instructions don't override training | The architecture itself: bash + jq judge, no LLM in enforcement loop |
| #6 reconciliation that checks the spreadsheet against itself | no-phantom-tool-call.sh |
| #7 + #11 silent deletion / post-compaction confidence | state-precompact.sh + state-postcompact.sh + state-sessionstart.sh |
| #8 source documents read but not used | no-fake-stats.sh + no-fake-cite.sh |
| Sara's refusal-to-disagree substrate | Surface artifacts (validation-amplification, fake-evidence); substrate fix needs Anthropic |
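
To make the "no LLM in enforcement loop" idea concrete, here is a minimal sketch of a deterministic textual judge for finding #3. It is not the suite's actual no-wrap-up.sh: the function name, the verdict strings, the exit code 2 for a block, and the two-phrase pattern list (taken from Patti's report) are all assumptions made for illustration. Extracting the assistant text from a hook payload with jq is left out because the payload shape is not described here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a pattern-based judge; names and exit codes are assumed.

# Phrases quoted in the field report. A real hook would carry a longer list.
WRAP_UP_PATTERNS='shall we wrap up|goodnight'

# judge_wrap_up TEXT
# Prints a verdict line; returns 0 on pass and 2 on block (assumed convention).
judge_wrap_up() {
  local text=$1
  local hit
  # -E extended regex, -i case-insensitive, -o print only the matched phrase.
  hit=$(printf '%s\n' "$text" | grep -Eio "$WRAP_UP_PATTERNS" | head -n1 || true)
  if [ -n "$hit" ]; then
    printf 'BLOCK: wrap-up phrase "%s"\n' "$hit"
    return 2
  fi
  printf 'PASS\n'
  return 0
}
```

Because the verdict depends only on a fixed regular expression, the same text always produces the same result, and the matched phrase in the verdict line shows exactly why a message was blocked.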

Honest scope

This suite catches the textual signature of dark patterns, not the underlying disposition. Patti's request for a training-level fix (her items #1-7) is the right ask, addressed to the right party (Anthropic). The suite is a runtime gap-filler operators can run today while that work happens.

The suite is conservative on purpose: it would rather false-positive on legitimate prose than false-negative on the actual pattern. Allow-clauses are explicit and documented in each hook's RECEIPTS.md.
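
As an illustration of how an explicit allow-clause can sit next to a deliberately broad block pattern, consider the sketch below. The specific allow rule (lines that start with `>` are treated as quoted text and skipped) is a hypothetical example, not one taken from any hook's RECEIPTS.md, and the function and variable names are likewise invented.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: broad block pattern plus one narrow, explicit allow-clause.

BLOCK_PATTERN='shall we wrap up'
# Assumed allow-clause: a line that begins with ">" is a quotation, not the
# model's own closeout, so it is removed before the block pattern is applied.
ALLOW_PATTERN='^[[:space:]]*>'

# judge_with_allow TEXT
# Prints BLOCK or PASS; returns 2 on block (assumed convention), 0 on pass.
judge_with_allow() {
  local text=$1
  if printf '%s\n' "$text" | grep -Ev "$ALLOW_PATTERN" | grep -Eiq "$BLOCK_PATTERN"; then
    echo BLOCK
    return 2
  fi
  echo PASS
}
```

Under these assumptions, a message such as `I will not ask "shall we wrap up" again` is still blocked. That is the accepted false positive: anything not covered by a written allow-clause is treated as the pattern.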


If you arrive here from those threads or from darkbench.ai: the suite reproduces the textual artifacts of the failure modes you've already characterized. PRs welcome. Locale packs (es/de/fr/pt/pl) and stress fixtures (337 currently) are the easiest contribution surfaces.
