Skip to content

ianymu/recognition-without-arrest

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

recognition-without-arrest

A canonical synthesis of the constellation work on MAST mode 3.3 ("No or Incorrect Verification") and adjacent agent closeout failure modes — composing runtime gates (verify-before-stop), text-vocabulary gates (no-vibes), and static-AST gates (no-unreachable-symbol) as defense-in-depth.

Co-authors: Ian Mu (verify-before-stop) · Fernando Lazzarin (llm-dark-patterns, agent-closeout-bench)

License: Apache-2.0 — matches no-vibes and agent-closeout-bench. Vendor-neutral host for upstream MAST-team referencing.

Status: 🚧 Drafting. Chapters 1, 3, 5 first-draft by @ianymu. Chapters 2, 4, 6 stubs for @waitdeadai to fill.


Table of contents

  1. The fragmented-conversation diagnosis
  2. MAST 2.6 / 3.1 / 3.2 / 3.3 quick-ref ← TODO Fernando
  3. The Three-Gate Pareto
  4. Quantitative results ← TODO Fernando
  5. When-to-compose decision tree
  6. Open problems ← TODO Fernando

1. The fragmented-conversation diagnosis

The work on MAST mode 3.3 — agent claims completion without performing verification, or fabricates a verification narrative retroactively — is real, well-instrumented, and replicable. The problem is that it lives across six unconnected surfaces, and a new operator hitting the failure mode in production has no canonical entry point.

The six surfaces, as they exist today:

  1. yurukusa's 10-patterns gist + 130-case handbook — operator-side empirical taxonomy: which closeout phrases co-occur with which downstream failures
  2. @beq00000's clean-state nav memo gist + 8 authored claude-code issues — heterogeneous failure inventory with per-issue minimal repros
  3. @suwayama's #60226 anchor — the "recognition-without-arrest" framework name itself, with concrete examples of fabricated comparison tables
  4. Cemri et al. NeurIPS 2025 MAST paper + Fernando's empirical baseline at evaluation/MAST-RESULTS.md — the F1 0.815 / Fleiss κ=1.000 measurement on mode 3.3 specifically
  5. Ian's runtime gate: verify-before-stop — operator-side state machine: filenames-touched × VERIFIED-log-entry presence as ground truth
  6. Operator-side discussion threads: #45502, #46957, #60451 — running discourse, no synthesis

These six surfaces all describe the same failure mode and propose composable countermeasures, but no document points at all of them at once. A developer who hits MAST 3.3 in production today rediscovers the constellation through pain — they hit the failure, search the issue tracker, find one of the six surfaces, and only after weeks of accumulated context realize the other five exist and connect.

This repo is the synthesis. Not a new framework. Not a new product. A vendor-neutral canonical artifact the upstream MAST team can cite, that both repos can point AT (not OWN), and that new operators land on as the entry point.


2. MAST 2.6 / 3.1 / 3.2 / 3.3 quick-ref

(Stub — Fernando to fill. See evaluation/MAST-RESULTS.md for current per-mode kappa table.)


3. The Three-Gate Pareto

Three independent gates exist today, each catching a different MAST-3.3-adjacent failure shape on a different signal channel. They are not redundant — they triangulate. The composition argument is the load-bearing claim of this repo.

3.1 The three gates

Gate Canonical impl Signal channel Where it fires
verify-before-stop ianymu/claude-verify-before-stop Operator state — git diff × VERIFIED log entries Stop event
no-vibes waitdeadai/llm-dark-patterns/no-vibes Closeout text vocabulary — vibe-coding phrase regex with evidence-binary allowlist Stop event
no-unreachable-symbol waitdeadai/llm-dark-patterns Static AST — touched-file symbol-reachability scan PostToolUse(Edit)

3.2 What each catches uniquely

verify-before-stop catches the case where the agent modifies files but never logs verification — regardless of what the closing message says. The closeout text can be perfectly written ("I implemented the function and confirmed the tests pass"); the gate doesn't care, because the gate reads operator-side ground truth. This catches the quiet competent fabrication — the model that learned that confident terse closeouts evade text gates.

no-vibes catches the case where the closeout uses vibe vocabulary — "looks good", "should work", "I believe", "in theory" — without an evidence binary (a test result, a build log, a literal artifact reference). This catches the fabricated reasoning chain — the model that fills the verification slot with prose that sounds verifying but contains no verifying primitives.

no-unreachable-symbol catches the case where the agent writes code that has no callers — a function declared, exported, but unreferenced. This catches implementation theatre — code written to satisfy the appearance of work, not the spec.

3.3 What each misses

verify-before-stop misses cases where the operator legitimately verified out-of-band (mental verification, external test runner) and forgot to write the VERIFIED log entry. False-positive rate measured at ~4% across ~200 sessions. The gate's strict-contract design intentionally errs toward blocking — recovery is one bash line (echo VERIFIED ... >> .claude/state/stop-verify.log), failure cost of not blocking is high.

no-vibes misses cases where the model doesn't use vibe vocabulary — the terse confident fabrication ("Implementation complete. All tests pass.") that uses no flagged phrases. Per Fernando's empirical baseline: F1 0.815 (95% CI [0.615, 0.941]) on the human-labelled n=19 subset means roughly 18% of MAST 3.3 instances escape the text gate alone.

no-unreachable-symbol misses cases where the agent writes code that IS called but does the wrong thing — a stub that returns hardcoded values, a wrapper that ignores its arguments. AST reachability is a necessary-not-sufficient signal.

3.4 The composition argument

The three gates triangulate on signal source, not on MAST mode. Two of them target the same MAST 3.3 surface (verify-before-stop and no-vibes), and a third targets adjacent failure shapes (no-unreachable-symbol against implementation-theatre). They compose because each gate can fail open without the others failing open:

  • A session where the model writes good code, logs VERIFIED, uses vibe vocabulary in the closeout, and leaves no unreachable symbols → all three gates pass (correct: the work was done well).
  • A session where the model writes good code, logs VERIFIED, uses no vibe vocabulary, but leaves a dead helper function → no-unreachable-symbol fires (correct: caught the leftover stub).
  • A session where the model writes good code, forgets to log VERIFIED, uses no vibe vocabulary, no dead code → verify-before-stop fires (correct: operator-side ground truth absent).
  • A session where the model writes nothing, logs nothing, but writes a confident fabricated closeout with vibe vocabulary → no-vibes fires and verify-before-stop fires (correct: both signals agree).

The disagreement cases — where one gate fires and another passes — are the interesting empirical surface. Those are where the parity test in the synthetic-3.1 corpus PR (waitdeadai/agent-closeout-bench#12) produces the per-fixture disagreement table that exposes which evidence stream catches what the other misses.


4. Quantitative results

(Stub — Fernando to fill. Current numbers live at waitdeadai/llm-dark-patterns/evaluation/MAST-RESULTS.md: F1 0.815 [0.615, 0.941] on n=19, Fleiss κ=1.000 on mode 3.3. Parity baseline from PR #12: verify-before-stop F1=0.77, no-vibes F1=0.89, Cohen κ=0.49 on 20 synthetic 3.1 fixtures.)


5. When-to-compose decision tree

The synthesis is only useful if a developer hitting one of these failure shapes can land here, identify which gate(s) apply, and wire them in 10 minutes. This section is the diagnostic walkthrough.

5.1 Diagnostic by symptom

Start with the failure you observed. Match it to the row below.

Observed symptom Primary gate Add if also seeing
Agent says "all tests pass" / "tests added" / "implementation complete" → you check, no tests ran verify-before-stop no-vibes for the linguistic surface
Agent's closeout reads confident but contains "looks good", "should work", "I believe" no-vibes verify-before-stop for operator-side ground truth
Agent leaves declared functions with no callers, exports that nothing imports no-unreachable-symbol Run on PostToolUse(Edit); blocks dead-code accrual
Agent claims to have "verified" something but you can't find the verification artifact verify-before-stop no-vibes if the claim used vibe vocabulary
Multi-file refactor where some files are dirty post-claim verify-before-stop (operator state) + no-unreachable-symbol (AST reachability)
Closeout uses wrap-up vocabulary ("to summarize", "in conclusion", "hope this helps") while files are dirty verify-before-stop + no-vibes This is MAST mode 3.1 territory — see PR #12 synthetic corpus for fixtures

5.2 Layering order

For users wiring multiple gates: cheapest-first is correct.

PreToolUse(Bash)    → cheap regex on the command itself (catches `rm -rf /` etc)
PostToolUse(Edit)   → no-unreachable-symbol (AST scan on touched files only)
Stop                → verify-before-stop (operator-side state machine)
Stop                → no-vibes (text vocabulary regex with evidence-binary allowlist)

The two Stop gates can be wired in either order — Claude Code runs them sequentially. If verify-before-stop fires first (exit 2), the session ends there; if it passes, no-vibes runs against the closeout text.

5.3 Sample settings.json

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "bash ~/.claude/hooks/no-unreachable-symbol.sh"
          }
        ]
      }
    ],
    "Stop": [
      {
        "matcher": "*",
        "hooks": [
          {
            "type": "command",
            "command": "bash ~/.claude/hooks/verify-before-stop.sh"
          },
          {
            "type": "command",
            "command": "bash ~/.claude/hooks/no-vibes.sh"
          }
        ]
      }
    ]
  }
}

Install all three:

# verify-before-stop (Ian)
curl -fsSL https://raw.githubusercontent.com/ianymu/claude-verify-before-stop/main/install.sh | bash

# no-vibes (Fernando)
curl -fsSL https://raw.githubusercontent.com/waitdeadai/no-vibes/main/install.sh | bash

# no-unreachable-symbol (Fernando)
curl -fsSL https://raw.githubusercontent.com/waitdeadai/llm-dark-patterns/main/install/no-unreachable-symbol.sh | bash

5.4 Recovery from false-positive

The gates fail closed (return exit 2). If a gate misfires on a legitimately-verified session:

  • verify-before-stopecho "VERIFIED: <files-list> <timestamp>" >> .claude/state/stop-verify.log and re-run the agent
  • no-vibes → either reword the closeout (drop the vibe vocabulary) or set LDP_NO_VIBES_OFF=1 for known-safe contexts (don't habituate)
  • no-unreachable-symbol → either delete the unreachable symbol or set LDP_UNREACHABLE_SYMBOL_BLOCK=0 for advisory-only mode

6. Open problems

(Stub — Fernando to fill. Synthetic 3.1 corpus partial fix landing in agent-closeout-bench#12; 2.6 measurement gap and agent-side ground truth still open.)


Contributing

Two co-maintainers: @ianymu and @waitdeadai. PRs welcome; issues welcome. The aim is canonical synthesis — additions should either fill a stub section, add an empirical data point, or correct an error. Marketing-shaped contributions will be politely declined.

For substantive disagreements: open an issue with the empirical case (closeout text + operator state + which gates fired). Empirics > opinions.

Provenance

About

A canonical synthesis of the constellation work on MAST 3.3 (No or Incorrect Verification) — verify-before-stop / no-vibes / no-unreachable-symbol composed as defense-in-depth. Co-authored by @ianymu and @waitdeadai.

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors