Add adversarial-verification plugin #148
Stress-tests claims, designs, and bug findings by dispatching two isolated sub-agents (advocate + skeptic) and synthesizing their arguments into a structured verdict. Counters sycophancy and single-agent agreement bias by forcing maximal disagreement before commit.

Two modes:
- Decision mode: free-form arguments organized by evaluation dimensions for approach/design choices
- Proof mode: N null hypotheses the skeptic proves and advocate refutes, for verifying bug findings and security claims

Includes SKILL.md + 5 reference docs:
- anti-patterns.md (10 common failure modes with diagnoses)
- decision-mode.md (structure for approach selection)
- proof-mode.md (N-null-hypothesis structure for finding verification)
- prompt-templates.md (advocate/skeptic templates enforcing anti-balance)
- synthesis.md (verdict table format and recommendation structure)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7f05877 to 833dbef
@claude review once
**Why it happens:** Weak prompt. The agent defaulted to balanced reasoning because nothing stopped it.

**Fix:** Re-dispatch with a stronger prompt. See [prompt-templates.md](prompt-templates.md#re-dispatching-on-hedged-agents). The key phrase: "Do not acknowledge merit in the opposing position. Do not hedge." Do not accept a hedged response as valid output.
🟡 Nit: the four reference files chain to each other, which CLAUDE.md:178 explicitly disallows ("SKILL.md links to files, files don't chain to more files"). Specifically: anti-patterns.md:19 and synthesis.md:87 link to prompt-templates.md, and decision-mode.md:13 ↔ proof-mode.md:13 cross-link. Either inline the pointed-to content or have SKILL.md own the cross-doc nav. (Up-link to ../SKILL.md is fine.)
Extended reasoning...
What the bug is
CLAUDE.md:174-180 documents the progressive-disclosure rule for skills:
One level deep — SKILL.md links to files, files don't chain to more files.
Note: Directory depth is fine (`references/guides/topic.md`). Reference chains are not (`SKILL.md → file1.md → file2.md` where file1 references file2). The problem is chained references, not nested folders.
This new plugin has four sibling-to-sibling links inside references/ that form exactly the prohibited shape:
- `references/anti-patterns.md:19` → `prompt-templates.md#re-dispatching-on-hedged-agents` (chain: `SKILL.md → anti-patterns.md → prompt-templates.md`)
- `references/synthesis.md:87` → `prompt-templates.md#re-dispatching-on-hedged-agents` (chain: `SKILL.md → synthesis.md → prompt-templates.md`)
- `references/decision-mode.md:13` → `proof-mode.md` (chain: `SKILL.md → decision-mode.md → proof-mode.md`)
- `references/proof-mode.md:13` → `decision-mode.md` (chain: `SKILL.md → proof-mode.md → decision-mode.md`)
The fifth link, anti-patterns.md:49 → ../SKILL.md, is upward navigation back to the root and is fine.
Why this matters
A reader pulled into anti-patterns.md for a specific failure mode now has to follow another link to get the actionable prompt-fix in prompt-templates.md. Same for a reader in either mode doc who needs to switch modes. This is the second-level descent the rule is meant to prevent: the reader can't tell from SKILL.md alone what the full graph of mandatory navigation looks like.
Addressing the refutation
A counter-argument is that this pattern is already used in plugins/agentic-actions-auditor/ (multiple vector-*.md files reference foundations.md) and plugins/semgrep-rule-creator/ (workflow.md references quick-reference.md), and that CLAUDE.md's intent is really about progressive-disclosure depth (deeper-and-deeper learning chains), not navigational sideways links.
That argument has some merit — the cited examples are real, and the mutual decision-mode ↔ proof-mode link is genuinely "sideways" rather than "deeper." But it doesn't fully overcome the rule as written. CLAUDE.md doesn't say "don't make the reader descend to learn more" — it says "files don't chain to more files," with the explicit canonical example being "SKILL.md → file1.md → file2.md where file1 references file2." That is exactly what is happening in all four cases here. A new skill is the right place to apply the rule cleanly rather than perpetuate a pattern that drifted from spec elsewhere.
Step-by-step proof of the chain
Concrete walkthrough for chain #1:
- User invokes the skill. SKILL.md:106-110 says: see `anti-patterns.md` for full failure modes.
- User opens `references/anti-patterns.md`. Reads section "2. Hedged agents" at lines 13-19.
- The "Fix" line at `anti-patterns.md:19` points the user to `prompt-templates.md#re-dispatching-on-hedged-agents` for the actual remedial prompt phrasing.
- User now has to open a third file (a sibling of `anti-patterns.md`) to complete the task that started at SKILL.md.
That's the SKILL.md → file1.md → file2.md chain CLAUDE.md prohibits.
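The same walkthrough can be mechanized. Below is a minimal sketch of a sibling-link check, assuming reference files are passed in as a mapping from file name to Markdown text. The function name `find_sibling_links` and the abbreviated file contents are hypothetical and are not part of this PR or of `validate_codex_skills.py`.

```python
import re

# Matches inline Markdown links of the form [text](target).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_sibling_links(reference_docs):
    """Return sorted (source, target) pairs where one reference file links to another.

    reference_docs maps a file name inside references/ to its Markdown text.
    Links that leave the directory (such as ../SKILL.md) are ignored, matching
    the note above that up-links are fine.
    """
    names = set(reference_docs)
    chains = set()
    for source, text in reference_docs.items():
        for target in LINK_RE.findall(text):
            path = target.split("#", 1)[0]  # drop any #anchor
            if path in names and path != source:
                chains.add((source, path))
    return sorted(chains)


# Hypothetical, abbreviated file contents for illustration only.
docs = {
    "anti-patterns.md": (
        "See [prompt-templates.md](prompt-templates.md#re-dispatching-on-hedged-agents). "
        "Back to [SKILL.md](../SKILL.md)."
    ),
    "decision-mode.md": "For bug findings use [proof mode](proof-mode.md).",
    "proof-mode.md": "For design choices use [decision mode](decision-mode.md).",
    "prompt-templates.md": "Templates only, no links.",
}

for source, target in find_sibling_links(docs):
    print(f"{source} -> {target}")
```

The sketch only handles inline links, not reference-style links, which is enough for the four cases listed in this comment.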
How to fix
Two clean options:
- Inline the cross-references. The mode-switching nudge in `decision-mode.md:13` / `proof-mode.md:13` ("if claim type is X, use the other mode") is a one-liner; SKILL.md's Step 2 (lines 35-43) already has the mode-selection table, so the inline note in each mode doc is redundant and can be deleted. The "see prompt-templates.md for the re-dispatch prompt" pointers can either inline a one-sentence summary of the re-dispatch phrasing, or be removed (the prompt-templates file is already linked from SKILL.md:55).
- Route nav through SKILL.md. Add a "Related references" sub-section in SKILL.md that maps situations to the right reference doc, so the references themselves never need to point at siblings.
Severity
Nit / convention. No functional impact; the skill works fine. Worth fixing on a fresh skill where the cleanup is cheap.
## When to use

| Situation | Use this skill? |
|-----------|----------------|
| Choosing between 2+ technical approaches | YES |
| Verifying a bug finding is real (not false positive) | YES |
| Reviewing a design decision before commit | YES |
| User asks "is this correct?" on non-trivial claim | YES |
| Any claim you're inclined to agree with by default | YES — that's the tell |
| Simple factual lookup ("what version is X?") | NO |
| Obvious syntax error fix | NO |
| User has already made the decision and is executing | NO |
🟡 The new SKILL.md has a single combined ## When to use section (line 16) with a YES/NO table instead of the two separate ## When to Use and ## When NOT to Use sections that CLAUDE.md (lines 153-163) requires for every SKILL.md. This is a pre-existing convention that ~60% of existing skills follow; the YES/NO table covers the same ground informationally, so this is a style/convention nit only — split it into two sections to match the house standard.
Extended reasoning...
The convention. CLAUDE.md lines 153-163 has a ### Required Sections subsection that explicitly states "Every SKILL.md must include" both ## When to Use and ## When NOT to Use as separate top-level headings. The new SKILL.md at plugins/adversarial-verification/skills/adversarial-verification/SKILL.md line 16 has ## When to use (note the lowercase "to use") containing a single YES/NO table — there is no separate ## When NOT to Use heading.
The content is present, just structured differently. The NO rows in the table ("Simple factual lookup", "Obvious syntax error fix", "User has already made the decision and is executing") cover the When-NOT-to-Use scenarios informationally. So this is purely a structural/heading convention violation, not a missing-information issue.
Why it matters anyway. The two-heading convention is the house standard documented in the contributor guide. Splitting them out makes it easier to grep/skim across the marketplace's many skills (grep -A5 "When NOT to Use" works uniformly), and roughly 44 of the 74 existing SKILL.md files (~60%) follow the explicit two-heading convention — including recent additions like zeroize-audit, sharp-edges, and mutation-testing. New additions should match.
Why this is a nit, not a blocker. (1) The information is fully present, just in table form. (2) The CLAUDE.md convention is not validated by CI — validate_codex_skills.py does not enforce section structure, so this slipped past the test plan's validation step. (3) About 30 of the 74 existing SKILL.md files (~40%) also lack a separate When-NOT-to-Use section, so the convention is unevenly enforced in the existing tree. (4) The submitter themselves marked the severity as nit.
Step-by-step proof.
- Open `CLAUDE.md` and read lines 153-163. The `### Required Sections` block contains a markdown code block showing both `## When to Use` and `## When NOT to Use` as separate sections preceded by "Every SKILL.md must include:".
- Open `plugins/adversarial-verification/skills/adversarial-verification/SKILL.md`. Search for `## When NOT to Use`: no match.
- Search for `## When`: only one match, at line 16, `## When to use` (lowercase "to use").
- Lines 17-27 are a single table containing a YES/NO column. The NO rows are the When-NOT-to-Use content but live under the same heading as the YES rows.
- Conclusion: the convention requires two separate headings; the file has one combined heading. Convention violated.
How to fix. Cheap edit. Rename ## When to use → ## When to Use (also fixes a casing inconsistency with the convention), keep the YES rows under it, then add a sibling ## When NOT to Use section with the NO rows (or convert them to a short bulleted list / brief prose). Approximately a 5-line restructure with no semantic change.
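Since the convention is not validated by CI, a check along these lines could catch it. This is a minimal sketch only: `missing_required_headings` is a hypothetical helper, not something `validate_codex_skills.py` provides, and the exact, case-sensitive heading match is an assumption about how strictly the convention should be read.

```python
# Headings that CLAUDE.md lists as required, per the review comment above.
REQUIRED_HEADINGS = ("## When to Use", "## When NOT to Use")


def missing_required_headings(skill_md):
    """Return the required headings that do not appear as exact level-2 heading lines."""
    headings = {
        line.rstrip()
        for line in skill_md.splitlines()
        if line.startswith("## ")
    }
    return [h for h in REQUIRED_HEADINGS if h not in headings]


# Hypothetical, abbreviated SKILL.md bodies for illustration only.
combined = "# Skill\n\n## When to use\n\n| Situation | Use this skill? |\n"
split = "# Skill\n\n## When to Use\n\n- ...\n\n## When NOT to Use\n\n- ...\n"

print(missing_required_headings(combined))
print(missing_required_headings(split))
```

Because the match is case-sensitive, the combined form reports both headings as missing, which also surfaces the `to use` / `to Use` casing difference noted above.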
## Verdict rule for proof mode

| All Ps REFUTED by advocate, NOT PROVED by skeptic | **CONFIRMED** — the finding is real |
| Any P clearly PROVED by skeptic | **DISMISSED** — the finding is a false positive |
🟡 The 'Verdict rule for proof mode' block at proof-mode.md:47-51 uses GFM table syntax but is missing the required header row and |---|---| delimiter row, so it will render as plain text with literal pipe characters instead of a table. The fix is trivial: add a header line like | Condition | Verdict | followed by |-----------|---------| above the three existing rows. Severity nit — purely a documentation rendering issue, no functional impact.
Extended reasoning...
What the bug is
In plugins/adversarial-verification/skills/adversarial-verification/references/proof-mode.md, the section ## Verdict rule for proof mode (lines 47-51) contains three rows that begin with pipe characters but has no header row and no |---|---| delimiter row. Per the GFM spec, a valid table requires both a header line and a delimiter line — without them, the content is parsed as plain text and the pipes render literally.
How it manifests
When viewed on GitHub, in any standard CommonMark+GFM renderer, or in a Markdown previewer, the section displays as three lines of raw text with visible | characters rather than as a table with two columns. This makes the verdict rule visually broken and inconsistent with the well-formed tables elsewhere in the same file.
Why existing code does not prevent it
There is no Markdown linter wired into the repo to catch malformed GFM table syntax, and the file passed review because the content is still semantically parseable by an LLM consumer (which is the primary audience for skill reference files). The issue only surfaces on rendered output.
Step-by-step proof
- Open `plugins/adversarial-verification/skills/adversarial-verification/references/proof-mode.md` on GitHub (or any GFM renderer).
- Scroll to line 47, the heading `## Verdict rule for proof mode`.
- The next non-blank lines are:

      | All Ps REFUTED by advocate, NOT PROVED by skeptic | **CONFIRMED** — the finding is real |
      | Any P clearly PROVED by skeptic | **DISMISSED** — the finding is a false positive |
      | Any P in dispute (both sides plausible) | **UNCERTAIN** — gather more evidence for that specific P before committing |

- There is no preceding header row (e.g. `| Condition | Verdict |`) and no delimiter row (e.g. `|-----------|---------|`).
- Per GFM, this fails the table production rule, so the renderer treats the three lines as a paragraph and shows the pipes literally.
- Compare to the well-formed table at lines 21-22 of the same file, `| # | Null hypothesis | What proves it |` followed by `|---|----------------|----------------|`, and at lines 57-58, `| Null hypothesis | Skeptic says | Advocate says | Outcome |` followed by `|----------------|--------------|---------------|---------|`. Both render correctly because they include both required rows.
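The proof above can be approximated with a small lint. This is a rough sketch, not a GFM parser: it assumes a table is a run of consecutive lines starting with `|`, and it does not skip fenced code blocks. The name `headerless_pipe_blocks` and the sample strings are hypothetical.

```python
import re

# A GFM delimiter row: cells made of dashes with optional colons, e.g. |---|:--:|
DELIMITER_RE = re.compile(r"^\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?$")


def headerless_pipe_blocks(markdown):
    """Return 1-based line numbers of pipe-row blocks whose second line is not a delimiter row."""
    lines = markdown.splitlines()
    problems = []
    i = 0
    while i < len(lines):
        if lines[i].lstrip().startswith("|"):
            start = i
            # Consume the whole run of consecutive pipe rows.
            while i < len(lines) and lines[i].lstrip().startswith("|"):
                i += 1
            block = lines[start:i]
            if len(block) < 2 or not DELIMITER_RE.match(block[1].strip()):
                problems.append(start + 1)
        else:
            i += 1
    return problems


# Abbreviated stand-ins for the section under review (illustrative only).
broken = (
    "## Verdict rule for proof mode\n"
    "\n"
    "| All Ps REFUTED | **CONFIRMED** |\n"
    "| Any P PROVED | **DISMISSED** |\n"
)
fixed = (
    "## Verdict rule for proof mode\n"
    "\n"
    "| Condition | Verdict |\n"
    "|-----------|---------|\n"
    "| All Ps REFUTED | **CONFIRMED** |\n"
)

print(headerless_pipe_blocks(broken))
print(headerless_pipe_blocks(fixed))
```

On the broken sample the block starting at line 3 is flagged; once the header and delimiter rows are present, nothing is reported.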
Impact
Low — purely cosmetic. The reference file is consumed primarily by an LLM that can extract the verdict rule regardless of rendering, so skill behavior is unaffected. However, when a human reads the file on GitHub (e.g. during PR review or when authoring a related skill) the section appears broken and inconsistent with the rest of the document.
Fix
Add the missing header and delimiter rows immediately after the heading on line 47:
## Verdict rule for proof mode
| Condition | Verdict |
|-----------|---------|
| All Ps REFUTED by advocate, NOT PROVED by skeptic | **CONFIRMED** — the finding is real |
| Any P clearly PROVED by skeptic | **DISMISSED** — the finding is a false positive |
| Any P in dispute (both sides plausible) | **UNCERTAIN** — gather more evidence for that specific P before committing |
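For readers who prefer the rule as logic rather than a table, the three rows can be read as a small decision function. This is a sketch under stated assumptions: the outcome labels `"REFUTED"`, `"PROVED"`, and `"DISPUTED"` are hypothetical names for the three row conditions, and checking for a proved hypothesis before a disputed one is an assumed precedence that the table itself does not specify.

```python
def proof_mode_verdict(outcomes):
    """Map per-hypothesis outcomes to a proof-mode verdict.

    outcomes: one label per null hypothesis, each "REFUTED", "PROVED", or "DISPUTED"
    (hypothetical labels for the three conditions in the verdict table).
    """
    allowed = {"REFUTED", "PROVED", "DISPUTED"}
    if not outcomes:
        raise ValueError("at least one null hypothesis is required")
    unknown = set(outcomes) - allowed
    if unknown:
        raise ValueError(f"unknown outcome labels: {sorted(unknown)}")

    # Assumed precedence: a clearly proved null hypothesis dismisses the finding
    # even if another hypothesis is still in dispute.
    if "PROVED" in outcomes:
        return "DISMISSED"
    if "DISPUTED" in outcomes:
        return "UNCERTAIN"
    return "CONFIRMED"  # every hypothesis was refuted


print(proof_mode_verdict(["REFUTED", "REFUTED", "REFUTED"]))
print(proof_mode_verdict(["REFUTED", "PROVED", "DISPUTED"]))
print(proof_mode_verdict(["REFUTED", "DISPUTED"]))
```

If the intended behavior is that any disputed hypothesis blocks a dismissal, the two checks would swap; the table as written leaves that case open.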
Summary
Adds a new `adversarial-verification` plugin that stress-tests claims, designs, and bug findings by dispatching two isolated sub-agents (advocate + skeptic) and synthesizing their arguments into a structured verdict. Counters sycophancy and single-agent agreement bias by forcing maximal disagreement before the caller commits.
What it does
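Two modes:
- Decision mode: free-form arguments organized by evaluation dimensions for approach/design choices
- Proof mode: N null hypotheses the skeptic proves and advocate refutes, for verifying bug findings and security claims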
Core principle: isolated sub-agent contexts are non-negotiable. An agent that sees the other side's arguments will soften to accommodate them. The adversarial value comes from each agent arguing without knowledge of the counter-argument, with synthesis happening separately.
Structure
- `SKILL.md` (main entry point with decision tree for mode selection)
- `references/decision-mode.md` (structure for approach selection)
- `references/proof-mode.md` (N-null-hypothesis structure for finding verification)
- `references/prompt-templates.md` (advocate/skeptic templates enforcing anti-balance)
- `references/synthesis.md` (verdict table format + recommendation structure)
- `references/anti-patterns.md` (10 common failure modes with diagnoses)
When to use
Test plan
- `python3 .github/scripts/validate_codex_skills.py` passes (verified locally)
- `.claude-plugin/marketplace.json`
- `.codex/skills/adversarial-verification`

🤖 Generated with Claude Code