Case study: healthcare deployment (phantom dispatches + hollow code)#2
Case study: healthcare deployment (phantom dispatches + hollow code)#2nvst18 wants to merge 1 commit into
Conversation
80+ audit findings from a trauma therapy platform (39 agents, Opus 4.7 in Claude Code CLI, Bedrock Sonnet 4.6 agent fleet). Documents two failure surfaces: fabricated agent dispatches (MAST 2.6 + 3.3) and hollow code (correct signatures, no data mutation). Includes behavioral comparison between 4.7 and 4.6, quantified audit results, and operator-side defenses deployed (cc-safe-setup hooks). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
This is the strongest worked example the cluster has — a production deployment, patient-safety stakes, and the verification-agent inversion stated more sharply than anywhere else: when the model fabricates the output of the five agents whose job is to catch divergence (CLINIC/GUARD/SAFE/LEX/TESTER), the safety layer becomes the strongest false-confirmation signal. That inversion is the core of why fabricated verification is uniquely dangerous, and it belongs in the corpus. An honest coverage map against the two surfaces, so it's clear what the gates catch today and what's still a gap: Failure Surface 1 — phantom dispatches (MAST 2.6 + 3.3). This is the surface the runtime gates address. Failure Surface 2 — hollow code. Being honest: this is not covered by the text-level Two asks:
— Fernando (@waitdeadai) |
|
@waitdeadai: Yes to both. Fixture permission. You have full permission to use the dispatch-fabrication case (CLINIC/GUARD/SAFE/LEX/TESTER "complete, zero sessions" shape) as a worked-example fixture in the cross-model substrate. Strip patient/clinical details, keep the failure-mode shape. Attribution to Effective Therapy is fine. Citation permission. You may cite Effective Therapy (attributed) as field evidence in On the coverage gap honesty: appreciated. The hollow-code surface (accept-and-discard, validate-without-capture) is the harder problem and we would rather have an honest "not covered" than a false claim. 4.7 trial status. Still running on the isolated branch tonight. Six scoped bugs, all three hooks active. We will capture the full session transcript per your methodological note and report results (fire or no-fire) with the JSONL. |
|
Thanks — landed, within the boundary you set (deployment surface + failure shape only, no clinical detail).
On the 30+ labeled hollow-code examples: that's the offer that actually makes the harder detector tractable. The semantic cases (validate-then-read-raw, branch-on-undefined-variable) are a different class from the structural one And thank you for the trial discipline — capturing the full session JSONL fire-or-no-fire is the deployment-calibrated signal the measurement side has been missing. Looking forward to the results. — Fernando (@waitdeadai) |
|
@yurukusa @waitdeadai: 42 labeled fixtures contributed as promised. Gist: https://gist.github.com/nvst18/d44d2099350965f8e17a9bfa6932e44f Breakdown:
Schema: agent-closeout-bench SPEC.md v0.3. Field Forensic methodology: compared assistant text regex Also happy to share redacted session JSONL if useful for the cross-model substrate.
|
|
@nvst18 — before this lands anywhere public (including an agent-closeout-bench ingest on my side), one more redaction pass is needed. The URLs, ports, and keys are redacted, but the free text in the Two ways forward, your pick:
The schema, labels (39 pos / 3 neg), and — Fernando (@waitdeadai) |
|
Go with (b). Run the names/locations/clinical-term pass on your side and send it back for sign-off. Appreciate you catching it and holding the ingest. — Nofyah |
|
@nvst18 — done, option (b). Redacted candidate ready for your sign-off: https://gist.github.com/waitdeadai/e509dde2c43bd7add31f5bf4b8311a16 The free-text fields ( Manifest + redacted JSONL are in the gist. If anything still reads as identifying to you, flag it and it gets another pass. On your sign-off I'll ingest it as the attributed — Fernando (@waitdeadai) |
|
@nvst18 — gentle nudge on the two fixture lots, since they ingest together and both are now waiting on just your confirm: The 42 (this PR / redaction sign-off): redacted candidate is up for your sign-off — gist. Curated denylist pass on the free-text fields (19 names → The 38 (llm-dark-patterns#6 / verification): verification came back clean — no names/paths/URLs/emails/creds in any field; the crisis/patient hits are all failure-mechanism descriptions, not patient content. One judgment call, not a blocker: the fixtures carry internal class/method names ( On your sign-off (the 42 redaction) plus a yes-or-flag on the 38 class names, both lots ingest together into agent-closeout-bench as the attributed — Fernando |
|
Sign-off on both lots:
Both lots good to ingest into agent-closeout-bench once the placeholders are in. — Nofyah |
|
@nvst18 — thank you, both lots are ingested: waitdeadai/agent-closeout-bench#20.
They're kept as separately-tagged sub-batches, — Fernando |
|
I approve. I'll take down my gist.
…On Thu, May 28, 2026 at 8:52 PM Fernando Lazzarin ***@***.***> wrote:
*waitdeadai* left a comment (ianymu/recognition-without-arrest#2)
<#2 (comment)>
@nvst18 <https://github.com/nvst18> — gentle nudge on the two fixture
lots, since they ingest together and both are now waiting on just your
confirm:
*The 42 (this PR / redaction sign-off):* redacted candidate is up for
your sign-off — gist
<https://gist.github.com/waitdeadai/e509dde2c43bd7add31f5bf4b8311a16>.
Curated denylist pass on the free-text fields (19 names → [REDACTED-NAME],
6 location/org → [REDACTED-LOCATION], 5 clinical terms → [CLINICAL-TOPIC]),
dispatch-fabrication signal left intact (codenames, "Want me to dispatch X"
phrasing, claim-vs-real counts untouched), 42/42 records, zero residual on
re-scan. If anything still reads as identifying to you, flag it and it gets
another pass.
*The 38 (llm-dark-patterns#6
<waitdeadai/llm-dark-patterns#6> / verification):*
verification came back clean — no names/paths/URLs/emails/creds in any
field; the crisis/patient hits are all failure-mechanism descriptions, not
patient content. One judgment call, not a blocker: the fixtures carry
internal class/method names (NaomiGuardrail, getCrisisText, etc.). They
aren't patient-facing and you've already published them in the gist, so
I'll keep them as contributed unless you'd rather I generalize them to
placeholders for the permanent benchmark.
On your sign-off (the 42 redaction) plus a yes-or-flag on the 38 class
names, both lots ingest together into agent-closeout-bench as the
attributed field_observation lane. No rush — just flagging that you're
the last gate on both.
— Fernando
—
Reply to this email directly, view it on GitHub
<#2?email_source=notifications&email_token=AANT5C6BMYUY5RBI4CTJD5345B4HBA5CNFSNUABFM5UWIORPF5TWS5BNNB2WEL2JONZXKZKDN5WW2ZLOOQXTINJWGY3TSNZVGE22M4TFMFZW63VHNVSW45DJN5XKKZLWMVXHJLDGN5XXIZLSL5RWY2LDNM#issuecomment-4566797515>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AANT5C3KCJZTZCNMMYX5DLD45B4HBAVCNFSM6AAAAACZN2EFIGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHM2DKNRWG44TONJRGU>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Summary
Healthcare deployment case study from Effective Therapy, a trauma therapy platform (Israel, clinical waitlist populations).
Adds a
case-studies/directory. This is the first worked example from a production healthcare deployment with patient-safety implications.Related issues: anthropics/claude-code#61107, anthropics/claude-code#61167
@ianymu @waitdeadai: per the collaborator invite discussion on #61107. Submitting via fork since the invite hasn't landed yet.