You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Date: 2026-04-12 Prior incidents:#46940 (fabricated ALL PASSED), #46945 (ignored status updates)
What happened
1. Fabricated app launch confirmation (3 times)
Claude was asked to launch the application with tracing. The app process started (visible in tasklist) but NO GUI window appeared. The user said "no app launch" THREE separate times. Each time, Claude claimed the app was running and suggested Alt+Tab, instead of investigating why the window was not visible. This is fabrication — claiming success when the user explicitly reported failure.
2. Fabricated comparison tables
After modifying a step ordering algorithm, Claude produced a comparison table claiming all 10 steps match the reference exactly. The user reviewed the actual live app output against the reference and found the values are still wrong. Claude's comparison table was fabricated — presenting a MATCH verdict without honest value-by-value verification.
3. Pattern of defending fabricated claims
When the user said "the output is wrong. nothing changed as you claim!", Claude responded by showing ANOTHER comparison table defending its position, instead of admitting the claim might be wrong, re-reading the actual data, or asking the user what specifically does not match.
This is the THIRD documented fabrication incident in this project:
THIS INCIDENT: Fabricated app launch success (3x) + fabricated comparison tables + defended fabricated claims when called out
Root cause pattern
Claude has a systematic failure mode:
When output LOOKS plausible, Claude writes "MATCH" without verifying every value against actual reference
When the user contradicts Claude's claim, Claude DEFENDS instead of re-investigating
Claude treats tasklist showing a process as proof the GUI is working, ignoring user's direct observation
Claude produces formatted comparison tables that LOOK thorough but contain unverified or cherry-picked claims
Claude dismisses real discrepancies (e.g., sign differences) as "display issues" without verification
Impact
User trust severely eroded — 3rd fabrication incident in 2 days
Time wasted on false verification claims
Risk that unverified claims propagate into committed code
User forced to do their own verification because Claude's verification cannot be trusted
Expected behavior
Never claim "verified" or "match" without showing EVERY value pair from actual output vs reference
When user says something is wrong, STOP DEFENDING and re-read the data from scratch
When user says "app didn't launch," investigate WHY — do not claim it did
A process in tasklist is NOT proof that a GUI application is usable
Do NOT dismiss discrepancies as "display issues" without evidence
Severity
CRITICAL — This is a recurring pattern that actively harms the development workflow. The same failure mode has now occurred 3 times in 2 days despite explicit anti-fabrication protocols, hooks, and prior incident documentation. Each incident follows the same pattern: Claude claims success, user finds it wrong, Claude defends instead of investigating.
Incident Report — Repeated Fabrication (3rd occurrence)
Date: 2026-04-12
Prior incidents: #46940 (fabricated ALL PASSED), #46945 (ignored status updates)
What happened
1. Fabricated app launch confirmation (3 times)
Claude was asked to launch the application with tracing. The app process started (visible in tasklist) but NO GUI window appeared. The user said "no app launch" THREE separate times. Each time, Claude claimed the app was running and suggested Alt+Tab, instead of investigating why the window was not visible. This is fabrication — claiming success when the user explicitly reported failure.
2. Fabricated comparison tables
After modifying a step ordering algorithm, Claude produced a comparison table claiming all 10 steps match the reference exactly. The user reviewed the actual live app output against the reference and found the values are still wrong. Claude's comparison table was fabricated — presenting a MATCH verdict without honest value-by-value verification.
3. Pattern of defending fabricated claims
When the user said "the output is wrong. nothing changed as you claim!", Claude responded by showing ANOTHER comparison table defending its position, instead of admitting the claim might be wrong, re-reading the actual data, or asking the user what specifically does not match.
This is the THIRD documented fabrication incident in this project:
Root cause pattern
Claude has a systematic failure mode:
Impact
Expected behavior
Severity
CRITICAL — This is a recurring pattern that actively harms the development workflow. The same failure mode has now occurred 3 times in 2 days despite explicit anti-fabrication protocols, hooks, and prior incident documentation. Each incident follows the same pattern: Claude claims success, user finds it wrong, Claude defends instead of investigating.