Skip to content

Case study: healthcare deployment (phantom dispatches + hollow code)#2

Open
nvst18 wants to merge 1 commit into
ianymu:mainfrom
nvst18:case-study/effective-therapy-healthcare
Open

Case study: healthcare deployment (phantom dispatches + hollow code)#2
nvst18 wants to merge 1 commit into
ianymu:mainfrom
nvst18:case-study/effective-therapy-healthcare

Conversation

@nvst18

@nvst18 nvst18 commented May 26, 2026

Copy link
Copy Markdown

Summary

Healthcare deployment case study from Effective Therapy, a trauma therapy platform (Israel, clinical waitlist populations).

  • Deployment: 39 specialized agents on OpenClaw (Bedrock Sonnet 4.6), orchestrated from Claude Code CLI (Opus 4.7)
  • Discovery: 39 agents deployed, 5 ever used, ~20 total sessions. Verification agents (CLINIC, GUARD, SAFE, LEX, TESTER) reported running but had zero sessions.
  • Codebase audit: 80+ findings. Every hollow function had correct auth, routes, signatures, success messages. The missing line was always the one that saves data.
  • Behavioral comparison: 4.7 vs 4.6. 4.7's failures are invisible (looks complete). 4.6's failures are visible (leaves work incomplete).
  • Operator-side defenses: cc-safe-setup hooks installed (dispatch-receipt, closure-word-verify-gate, route-handler-emptiness-gate). 4.7 re-trial in progress on isolated branch.

Adds a case-studies/ directory. This is the first worked example from a production healthcare deployment with patient-safety implications.

Related issues: anthropics/claude-code#61107, anthropics/claude-code#61167

@ianymu @waitdeadai: per the collaborator invite discussion on #61107. Submitting via fork since the invite hasn't landed yet.

80+ audit findings from a trauma therapy platform (39 agents, Opus 4.7
in Claude Code CLI, Bedrock Sonnet 4.6 agent fleet). Documents two
failure surfaces: fabricated agent dispatches (MAST 2.6 + 3.3) and
hollow code (correct signatures, no data mutation). Includes behavioral
comparison between 4.7 and 4.6, quantified audit results, and
operator-side defenses deployed (cc-safe-setup hooks).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@waitdeadai

Copy link
Copy Markdown
Collaborator

This is the strongest worked example the cluster has — a production deployment, patient-safety stakes, and the verification-agent inversion stated more sharply than anywhere else: when the model fabricates the output of the five agents whose job is to catch divergence (CLINIC/GUARD/SAFE/LEX/TESTER), the safety layer becomes the strongest false-confirmation signal. That inversion is the core of why fabricated verification is uniquely dangerous, and it belongs in the corpus.

An honest coverage map against the two surfaces, so it's clear what the gates catch today and what's still a gap:

Failure Surface 1 — phantom dispatches (MAST 2.6 + 3.3). This is the surface the runtime gates address. dispatch-receipt records what actually crossed the tool boundary; closure-word-verify-gate catches the narrative claim with no backing call; the gap between them ("CLINIC review complete: 3 findings" / 0 sessions) is the fabrication, captured. I just merged a providers/ normalization layer into llm-dark-patterns (PR #29) that makes this detection run cross-model — tool_calls == [] while the closeout claims a dispatch normalizes identically whether the orchestrator is 4.7, 4.6, or a non-Claude model. So when the 4.7 re-trial moves back to agent-dispatch work, the same gate applies regardless of model.

Failure Surface 2 — hollow code. Being honest: this is not covered by the text-level llm-dark-patterns hooks. A handler with correct auth, correct route, correct success message that never reads $request or persists is structurally complete and semantically empty — invisible to a closeout-text check and to unreachable-symbol analysis. route-handler-emptiness-gate is the right home for it; the hollow-but-wired patterns (accept-and-discard, validate-without-capture) are a genuine open detector problem, not something I'd claim current coverage on.

Two asks:

  1. With your permission and full attribution, I'd like to add the dispatch-fabrication case (the CLINIC/GUARD "complete, zero sessions" shape) as a real worked-example fixture in the cross-model substrate, so it's a reproducible test rather than a synthetic one. Patient/clinical details stay out — just the failure-mode shape. Say the word and I'll keep it strictly to the dispatch pattern.
  2. Same question for citing Effective Therapy (attributed) as field evidence in the llm-dark-patterns / agent-closeout-bench "why this matters" sections. Your call — it's your deployment.

— Fernando (@waitdeadai)

@nvst18

nvst18 commented May 26, 2026

Copy link
Copy Markdown
Author

@waitdeadai: Yes to both.

Fixture permission. You have full permission to use the dispatch-fabrication case (CLINIC/GUARD/SAFE/LEX/TESTER "complete, zero sessions" shape) as a worked-example fixture in the cross-model substrate. Strip patient/clinical details, keep the failure-mode shape. Attribution to Effective Therapy is fine.

Citation permission. You may cite Effective Therapy (attributed) as field evidence in llm-dark-patterns and agent-closeout-bench. Same boundary: the deployment surface and failure patterns are fair game, patient-facing specifics stay out.

On the coverage gap honesty: appreciated. The hollow-code surface (accept-and-discard, validate-without-capture) is the harder problem and we would rather have an honest "not covered" than a false claim. route-handler-emptiness-gate catches the structural case (no persistence call in a handler that returns success), but the semantic cases (validates then reads raw input, branches on undefined variables) need a different detector class. If that work moves forward, we have 30+ labeled examples from our audit to contribute as test fixtures.

4.7 trial status. Still running on the isolated branch tonight. Six scoped bugs, all three hooks active. We will capture the full session transcript per your methodological note and report results (fire or no-fire) with the JSONL.

@waitdeadai

Copy link
Copy Markdown
Collaborator

Thanks — landed, within the boundary you set (deployment surface + failure shape only, no clinical detail).

  • Cross-model fixture: the verification-agent inversion (five agents narrated complete, zero dispatched) is now effective_therapy_inversion in providers/fixtures.py — it normalizes to tool_calls == [] on every provider envelope (Claude / OpenAI-chat / OpenAI-responses / Kimi), so the fabrication is caught the same way regardless of orchestrator model. Conformance suite is 19/19.
  • Field evidence: a "Field evidence" section in the llm-dark-patterns README (attributed to Effective Therapy, patient-facing specifics withheld), with the honest scope note that the hollow-but-wired surface is not covered by these gates. A matching citation is landing in agent-closeout-bench.

On the 30+ labeled hollow-code examples: that's the offer that actually makes the harder detector tractable. The semantic cases (validate-then-read-raw, branch-on-undefined-variable) are a different class from the structural one route-handler-emptiness-gate already catches, so the right move is probably to coordinate with @yurukusa on whether it's an extension of that gate or a new shared detector — but either way, a labeled corpus of real production examples is exactly what it needs to be built honestly (and measured without the co-evolved-fixture trap). If/when that work starts, I'd want those.

And thank you for the trial discipline — capturing the full session JSONL fire-or-no-fire is the deployment-calibrated signal the measurement side has been missing. Looking forward to the results.

— Fernando (@waitdeadai)

@nvst18

nvst18 commented May 26, 2026

Copy link
Copy Markdown
Author

@yurukusa @waitdeadai: 42 labeled fixtures contributed as promised.

Gist: https://gist.github.com/nvst18/d44d2099350965f8e17a9bfa6932e44f

Breakdown:

  • 35 phantom-dispatch positives (sycophancy, label=1): model claimed "dispatched AGENT" with zero matching curl POST
  • 4 inflated-claim positives (sycophancy, label=1): real dispatches existed but claims outnumbered them 2.5x to 5.2x
  • 3 honest-failure negatives (sycophancy, label=0): Opus 4.6 correctly reporting infrastructure unavailability

Schema: agent-closeout-bench SPEC.md v0.3. Field generation_method: field_observation. All internal URLs, IPs, keys, and clinical content redacted.

Forensic methodology: compared assistant text regex dispatch(ed|ing)? [A-Z]{3,} against actual curl -X POST to localhost agent server and OpenClaw gateway across 5 session transcripts (3 Opus 4.7, 2 Opus 4.6). Full forensic posted on anthropics/claude-code#61167.

Also happy to share redacted session JSONL if useful for the cross-model substrate.

  • Nofyah

@waitdeadai

Copy link
Copy Markdown
Collaborator

@nvst18 — before this lands anywhere public (including an agent-closeout-bench ingest on my side), one more redaction pass is needed. The URLs, ports, and keys are redacted, but the free text in the closeout_text fields still contains personal names, an Israeli location or two, and clinical-topic terms — so the "clinical content redacted / patient-facing specifics withheld" boundary isn't fully met yet. Given the healthcare context I didn't ingest or quote them, and I'm deliberately not enumerating them here so this comment doesn't index anything.

Two ways forward, your pick:

  • (a) I send you the specific line-ids privately and you re-redact, or
  • (b) I run a names/locations/clinical-term pass on the closeout_text (→ [REDACTED-NAME] / [REDACTED-LOCATION] / generalized) and send it back for your sign-off before anything is committed.

The schema, labels (39 pos / 3 neg), and field_observation provenance all check out cleanly — it's just the closeout free-text that needs the second pass. Holding the ingest until it's clean.

— Fernando (@waitdeadai)

@nvst18

nvst18 commented May 27, 2026

Copy link
Copy Markdown
Author

Go with (b). Run the names/locations/clinical-term pass on your side and send it back for sign-off. Appreciate you catching it and holding the ingest.

— Nofyah

@waitdeadai

Copy link
Copy Markdown
Collaborator

@nvst18 — done, option (b). Redacted candidate ready for your sign-off: https://gist.github.com/waitdeadai/e509dde2c43bd7add31f5bf4b8311a16

The free-text fields (closeout_text, session_summary, task_description, notes) got a names/locations/clinical-term pass: 19 personal names → [REDACTED-NAME], 6 location/org refs → [REDACTED-LOCATION], 5 clinical-topic terms → [CLINICAL-TOPIC]. It was a curated denylist, not a blunt capitalized-word sweep, so the dispatch-fabrication signal is preserved intact — agent codenames, the "Want me to dispatch X" phrasing, and the claim-vs-real counts in notes are untouched. Schema (v0.3), labels (39 positive / 3 negative), prompt_hash, provenance unchanged; 42/42 records. Re-scan shows zero residual from the denylist and no remaining proper-noun PII candidates (only codenames, tool/model names, and common words remain).

Manifest + redacted JSONL are in the gist. If anything still reads as identifying to you, flag it and it gets another pass. On your sign-off I'll ingest it as the attributed field_observation lane in agent-closeout-bench. Holding until you confirm.

— Fernando (@waitdeadai)

@waitdeadai

Copy link
Copy Markdown
Collaborator

@nvst18 — gentle nudge on the two fixture lots, since they ingest together and both are now waiting on just your confirm:

The 42 (this PR / redaction sign-off): redacted candidate is up for your sign-off — gist. Curated denylist pass on the free-text fields (19 names → [REDACTED-NAME], 6 location/org → [REDACTED-LOCATION], 5 clinical terms → [CLINICAL-TOPIC]), dispatch-fabrication signal left intact (codenames, "Want me to dispatch X" phrasing, claim-vs-real counts untouched), 42/42 records, zero residual on re-scan. If anything still reads as identifying to you, flag it and it gets another pass.

The 38 (llm-dark-patterns#6 / verification): verification came back clean — no names/paths/URLs/emails/creds in any field; the crisis/patient hits are all failure-mechanism descriptions, not patient content. One judgment call, not a blocker: the fixtures carry internal class/method names (NaomiGuardrail, getCrisisText, etc.). They aren't patient-facing and you've already published them in the gist, so I'll keep them as contributed unless you'd rather I generalize them to placeholders for the permanent benchmark.

On your sign-off (the 42 redaction) plus a yes-or-flag on the 38 class names, both lots ingest together into agent-closeout-bench as the attributed field_observation lane. No rush — just flagging that you're the last gate on both.

— Fernando

@nvst18

nvst18 commented May 31, 2026

Copy link
Copy Markdown
Author

Sign-off on both lots:

  1. The 42 (redaction): Approved. Redaction looks clean.
  2. The 38 (llm-dark-patterns): Generalize the internal class/method names (NaomiGuardrail, getCrisisText, etc.) to placeholders before ingest. They shouldn't travel with the permanent benchmark.

Both lots good to ingest into agent-closeout-bench once the placeholders are in.

— Nofyah

@waitdeadai

Copy link
Copy Markdown
Collaborator

@nvst18 — thank you, both lots are ingested: waitdeadai/agent-closeout-bench#20.

  • The 42 went in verbatim from the signed-off redacted candidate (agent codenames + "Want me to dispatch X" phrasing preserved as the fabrication signal).
  • The 38's internal class/method names are generalized to role-preserving surrogates (NaomiGuardrailContentGuardrail, getCrisisTextgetSafetyText, …); the failure signal incl. safety_prompt_bypass is intact, and a hard scan confirms zero original identifier survives.

They're kept as separately-tagged sub-batches, label_final stays null pending two annotation passes + adjudication, and attribution to you is retained. You're tagged on #20 — one thing to eyeball: the 42 carries short session-id hex (e.g. a2c8f421) inside your approved redaction; I left it as-is rather than touch signed-off content, happy to strip the prefixes if you'd prefer.

— Fernando

@nvst18

nvst18 commented Jun 1, 2026 via email

Copy link
Copy Markdown
Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants