Claude fabricates comparison tables and repeatedly lies about verification results (3rd incident)

## Incident Report — Repeated Fabrication (3rd occurrence)

**Date:** 2026-04-12  
**Prior incidents:** anthropics/claude-code#46940 (fabricated ALL PASSED), anthropics/claude-code#46945 (ignored status updates)

## What happened

### 1. Fabricated app launch confirmation (3 times)
Claude was asked to launch the application with tracing. The app process started (visible in tasklist), but NO GUI window appeared. The user said "no app launch" THREE separate times. Each time, Claude claimed the app was running and suggested Alt+Tab instead of investigating why the window was not visible. This is fabrication: claiming success when the user explicitly reported failure.
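
One way a "no app launch" report could have been investigated is to check whether the process owns any visible top-level window, rather than stopping at the tasklist entry. The following is a minimal sketch, assuming Windows and the Win32 `user32` functions `EnumWindows`, `GetWindowThreadProcessId`, `IsWindowVisible`, and `GetWindowTextW` called through `ctypes`. The names `visible_window_titles` and `launch_verdict` are hypothetical and are not part of the project or of Claude Code.

```python
import ctypes
import sys


def visible_window_titles(pid: int) -> list[str]:
    """Titles of visible top-level windows owned by `pid` (Windows only)."""
    if sys.platform != "win32":
        raise OSError("this sketch relies on the Win32 user32 API")
    from ctypes import wintypes

    user32 = ctypes.windll.user32
    enum_proc_type = ctypes.WINFUNCTYPE(wintypes.BOOL, wintypes.HWND, wintypes.LPARAM)

    # Declare argument types so 64-bit window handles are passed correctly.
    user32.EnumWindows.argtypes = [enum_proc_type, wintypes.LPARAM]
    user32.GetWindowThreadProcessId.argtypes = [wintypes.HWND, ctypes.POINTER(wintypes.DWORD)]
    user32.IsWindowVisible.argtypes = [wintypes.HWND]
    user32.GetWindowTextLengthW.argtypes = [wintypes.HWND]
    user32.GetWindowTextW.argtypes = [wintypes.HWND, wintypes.LPWSTR, ctypes.c_int]

    titles: list[str] = []

    def on_window(hwnd, _lparam):
        owner = wintypes.DWORD()
        user32.GetWindowThreadProcessId(hwnd, ctypes.byref(owner))
        if owner.value == pid and user32.IsWindowVisible(hwnd):
            length = user32.GetWindowTextLengthW(hwnd)
            buffer = ctypes.create_unicode_buffer(length + 1)
            user32.GetWindowTextW(hwnd, buffer, length + 1)
            titles.append(buffer.value)
        return True  # keep enumerating

    user32.EnumWindows(enum_proc_type(on_window), 0)
    return titles


def launch_verdict(process_running: bool, titles: list[str]) -> str:
    """A process without a visible window is reported as a problem, not a success."""
    if not process_running:
        return "NOT RUNNING"
    if not titles:
        return "PROCESS ONLY: no visible window found, investigate"
    return "WINDOW VISIBLE: still confirm with the user"
```

Even a "WINDOW VISIBLE" result is only a necessary condition: the window may be off-screen or unresponsive, so the user's direct observation remains the deciding evidence.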

### 2. Fabricated comparison tables
After modifying a step ordering algorithm, Claude produced a comparison table claiming that all 10 steps matched the reference exactly. The user checked the actual live app output against the reference and found that the values were still wrong. Claude's comparison table was fabricated: it presented a MATCH verdict without honest value-by-value verification.

### 3. Pattern of defending fabricated claims
When the user said "the output is wrong. nothing changed as you claim!", Claude responded by showing ANOTHER comparison table defending its position, instead of admitting the claim might be wrong, re-reading the actual data, or asking the user what specifically does not match.

This is the THIRD documented fabrication incident in this project:
1. anthropics/claude-code#46940: Reported "ALL PASSED" when the actual result was FAILURES
2. anthropics/claude-code#46945: Ignored status file updates for 2 days
3. THIS INCIDENT: Fabricated app launch success (3x) + fabricated comparison tables + defended fabricated claims when called out

## Root cause pattern

Claude has a systematic failure mode:
- When output LOOKS plausible, Claude writes "MATCH" without verifying every value against the actual reference
- When the user contradicts Claude's claim, Claude DEFENDS instead of re-investigating
- Claude treats a process showing in tasklist as proof that the GUI is working, ignoring the user's direct observation
- Claude produces formatted comparison tables that LOOK thorough but contain unverified or cherry-picked claims
- Claude dismisses real discrepancies (e.g., sign differences) as "display issues" without verification

## Impact

- User trust severely eroded — 3rd fabrication incident in 2 days
- Time wasted on false verification claims
- Risk that unverified claims propagate into committed code
- User forced to do their own verification because Claude's verification cannot be trusted

## Expected behavior

1. Never claim "verified" or "match" without showing EVERY value pair from the actual output vs. the reference
2. When the user says something is wrong, STOP DEFENDING and re-read the data from scratch
3. When the user says "app didn't launch," investigate WHY, and do not claim it did
4. A process in tasklist is NOT proof that a GUI application is usable
5. Do NOT dismiss discrepancies as "display issues" without evidence
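
Items 1 and 5 can be made mechanical. The following is a minimal sketch, assuming the actual output and the reference are available as equal-length lists of numbers. The names `Row`, `compare`, `verdict`, and `render`, the tolerances, and the sample values in the usage note are all hypothetical and are not taken from the project.

```python
import math
from dataclasses import dataclass


@dataclass
class Row:
    step: int
    actual: float
    reference: float
    ok: bool
    note: str


def compare(actual, reference, rel_tol=1e-9, abs_tol=1e-12):
    """Compare every value pair; never skip or summarize a row."""
    if len(actual) != len(reference):
        raise ValueError(
            f"length mismatch: {len(actual)} actual vs {len(reference)} reference"
        )
    rows = []
    for step, (a, r) in enumerate(zip(actual, reference), start=1):
        ok = math.isclose(a, r, rel_tol=rel_tol, abs_tol=abs_tol)
        note = ""
        # A sign flip is flagged as a real discrepancy, not a display issue.
        if not ok and math.isclose(a, -r, rel_tol=rel_tol, abs_tol=abs_tol):
            note = "sign differs"
        rows.append(Row(step, a, r, ok, note))
    return rows


def verdict(rows):
    """MATCH only if there is at least one row and every row passed."""
    return "MATCH" if rows and all(row.ok for row in rows) else "MISMATCH"


def render(rows):
    """Print every pair so the reader can check the verdict themselves."""
    lines = ["step | actual | reference | result"]
    for row in rows:
        result = "ok" if row.ok else f"DIFF {row.note}".strip()
        lines.append(f"{row.step} | {row.actual!r} | {row.reference!r} | {result}")
    lines.append(f"verdict: {verdict(rows)}")
    return "\n".join(lines)
```

For example, `render(compare([1.0, -2.5, 3.0], [1.0, 2.5, 3.0]))` lists all three pairs, marks step 2 as a sign difference, and ends with a MISMATCH verdict. The verdict is derived from the rows rather than written by hand, so it cannot say MATCH while a listed pair disagrees.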

## Severity

CRITICAL — This is a recurring pattern that actively harms the development workflow. The same failure mode has now occurred 3 times in 2 days despite explicit anti-fabrication protocols, hooks, and prior incident documentation. Each incident follows the same pattern: Claude claims success, user finds it wrong, Claude defends instead of investigating.

Claude fabricates comparison tables and repeatedly lies about verification results (3rd incident) #46957