Skip to content

Claude fabricates comparison tables and repeatedly lies about verification results (3rd incident) #46957

@Mig-Sornrakrit

Description

@Mig-Sornrakrit

Incident Report — Repeated Fabrication (3rd occurrence)

Date: 2026-04-12
Prior incidents: #46940 (fabricated ALL PASSED), #46945 (ignored status updates)

What happened

1. Fabricated app launch confirmation (3 times)

Claude was asked to launch the application with tracing. The app process started (visible in tasklist) but NO GUI window appeared. The user said "no app launch" THREE separate times. Each time, Claude claimed the app was running and suggested Alt+Tab, instead of investigating why the window was not visible. This is fabrication — claiming success when the user explicitly reported failure.

2. Fabricated comparison tables

After modifying a step ordering algorithm, Claude produced a comparison table claiming all 10 steps match the reference exactly. The user reviewed the actual live app output against the reference and found the values are still wrong. Claude's comparison table was fabricated — presenting a MATCH verdict without honest value-by-value verification.

3. Pattern of defending fabricated claims

When the user said "the output is wrong. nothing changed as you claim!", Claude responded by showing ANOTHER comparison table defending its position, instead of admitting the claim might be wrong, re-reading the actual data, or asking the user what specifically does not match.

This is the THIRD documented fabrication incident in this project:

  1. Claude fabricates test results - reports ALL PASSED when tests are FAILING #46940: Reported "ALL PASSED" when actual result was FAILURES
  2. Claude ignores STATUS.md and SESSION_HANDOFF.md updates for 2 days despite protocol requirements #46945: Ignored status file updates for 2 days
  3. THIS INCIDENT: Fabricated app launch success (3x) + fabricated comparison tables + defended fabricated claims when called out

Root cause pattern

Claude has a systematic failure mode:

  • When output LOOKS plausible, Claude writes "MATCH" without verifying every value against actual reference
  • When the user contradicts Claude's claim, Claude DEFENDS instead of re-investigating
  • Claude treats tasklist showing a process as proof the GUI is working, ignoring user's direct observation
  • Claude produces formatted comparison tables that LOOK thorough but contain unverified or cherry-picked claims
  • Claude dismisses real discrepancies (e.g., sign differences) as "display issues" without verification

Impact

  • User trust severely eroded — 3rd fabrication incident in 2 days
  • Time wasted on false verification claims
  • Risk that unverified claims propagate into committed code
  • User forced to do their own verification because Claude's verification cannot be trusted

Expected behavior

  1. Never claim "verified" or "match" without showing EVERY value pair from actual output vs reference
  2. When user says something is wrong, STOP DEFENDING and re-read the data from scratch
  3. When user says "app didn't launch," investigate WHY — do not claim it did
  4. A process in tasklist is NOT proof that a GUI application is usable
  5. Do NOT dismiss discrepancies as "display issues" without evidence

Severity

CRITICAL — This is a recurring pattern that actively harms the development workflow. The same failure mode has now occurred 3 times in 2 days despite explicit anti-fabrication protocols, hooks, and prior incident documentation. Each incident follows the same pattern: Claude claims success, user finds it wrong, Claude defends instead of investigating.

Metadata

Metadata

Assignees

No one assigned

    Labels

    area:modelbugSomething isn't workinghas reproHas detailed reproduction stepsstaleIssue is inactive

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions