11 changes: 11 additions & 0 deletions .claude-plugin/marketplace.json
@@ -392,6 +392,17 @@
"url": "https://github.com/trailofbits"
},
"source": "./plugins/dimensional-analysis"
},
{
"name": "adversarial-verification",
"version": "1.0.0",
"description": "Verify claims, designs, and bug findings by dispatching isolated advocate/skeptic sub-agents and synthesizing their arguments into a structured verdict. Counters sycophancy and single-agent agreement bias.",
"author": {
"name": "Julius Alexandre",
"email": "opensource@trailofbits.com",
"url": "https://github.com/trailofbits"
},
"source": "./plugins/adversarial-verification"
}
]
}
1 change: 1 addition & 0 deletions .codex/skills/adversarial-verification
1 change: 1 addition & 0 deletions CODEOWNERS
@@ -2,6 +2,7 @@
* @dguido

# Plugin-specific owners (alphabetical)
/plugins/adversarial-verification/ @wizardengineer @dguido
/plugins/agentic-actions-auditor/ @BuffaloWill @elopez @dguido
/plugins/ask-questions-if-underspecified/ @kevin-valerio @dguido
/plugins/audit-context-building/ @omarinuwa @dguido
1 change: 1 addition & 0 deletions README.md
@@ -80,6 +80,7 @@ cd /path/to/parent # e.g., if repo is at ~/projects/skills, be in ~/projects

| Plugin | Description |
|--------|-------------|
| [adversarial-verification](plugins/adversarial-verification/) | Stress-test claims, designs, and bug findings with isolated advocate/skeptic sub-agents |
| [constant-time-analysis](plugins/constant-time-analysis/) | Detect compiler-induced timing side-channels in cryptographic code |
| [mutation-testing](plugins/mutation-testing/) | Configure mewt/muton mutation testing campaigns — scope targets, tune timeouts, optimize long runs |
| [property-based-testing](plugins/property-based-testing/) | Property-based testing guidance for multiple languages and smart contracts |
10 changes: 10 additions & 0 deletions plugins/adversarial-verification/.claude-plugin/plugin.json
@@ -0,0 +1,10 @@
{
"name": "adversarial-verification",
"version": "1.0.0",
"description": "Verify claims, designs, and bug findings by dispatching isolated advocate/skeptic sub-agents and synthesizing their arguments into a structured verdict. Counters sycophancy and single-agent agreement bias.",
"author": {
"name": "Julius Alexandre",
"email": "opensource@trailofbits.com",
"url": "https://github.com/trailofbits"
}
}
29 changes: 29 additions & 0 deletions plugins/adversarial-verification/README.md
@@ -0,0 +1,29 @@
# adversarial-verification

Stress-test claims, designs, and bug findings by dispatching two isolated sub-agents — one advocate, one skeptic — and synthesizing their arguments into a structured verdict.

## When to Use

- Choosing between competing technical approaches
- Verifying a bug finding is real (not a false positive)
- Reviewing a design decision before commit
- Any claim you're inclined to agree with by default
- Stress-testing your own reasoning when you suspect it may be one-sided

## What It Does

Counters sycophancy and single-agent agreement bias by forcing maximal disagreement before committing. Each sub-agent runs in isolated context — the advocate never sees the skeptic's arguments and vice versa. After both return, the caller synthesizes a verdict table that picks winners per dimension and produces a concrete recommendation.

### Two modes

| Mode | Claim type | Structure |
|------|-----------|-----------|
| **Decision mode** | "X is the best approach" | Free-form arguments organized by evaluation dimensions |
| **Proof mode** | "X is a real bug/finding" | N null hypotheses — skeptic proves, advocate refutes |

## Installation

```
/plugin marketplace add trailofbits/skills
/plugin install adversarial-verification
```
@@ -0,0 +1,133 @@
---
name: adversarial-verification
description: Verify a claim, idea, approach, design, or finding by dispatching two isolated sub-agents — an advocate (argues the claim is correct/best) and a skeptic (argues it is wrong/inferior) — then synthesize their arguments into a structured verdict. Counters sycophancy and agreement bias by forcing maximal disagreement before the caller commits. Use when making technical decisions ("should we use X or Y?"), verifying bug findings ("is this a real bug?"), reviewing system designs ("is this architecture sound?"), evaluating strategic claims, or whenever the caller suspects their own reasoning may be one-sided. Triggers on phrases like "verify this claim", "adversarial verification", "is this right?", "prove this is a real bug", "which approach is best", "stress-test this idea", "get a second opinion on", "argue against this", "devil's advocate", or whenever pattern-breaking from agreement is needed.
---

# Adversarial Verification

## Overview

Dispatch two sub-agents with isolated context — one **advocate**, one **skeptic** — to argue opposite sides of a claim as strongly as possible. Then synthesize their arguments into a structured verdict table. This breaks the pattern of single-agent reasoning converging toward agreement and surfaces the strongest objections and the strongest supports in one pass.

**Core principle:** Independent isolated context is non-negotiable. An agent that has read the other side's arguments will soften to accommodate them. The adversarial value comes from each agent arguing without knowledge of the counter-argument.

**Announce at start:** "I'm using the adversarial-verification skill to stress-test this claim."

## When to Use

- Choosing between 2+ technical approaches
- Verifying a bug finding is real (not a false positive)
- Reviewing a design decision before commit
- User asks "is this correct?" about a non-trivial claim
- Any claim you're inclined to agree with by default — that's the tell

## When NOT to Use

- Simple factual lookup ("what version is X?")
- Obvious syntax error fix
- User has already made the decision and is executing

## The Process

### Step 1: State the claim precisely

Before dispatching agents, state the claim in a single sentence. Ambiguous claims produce worthless verifications.

**Bad:** "Should we use yarpgen?"
**Good:** "YARPGen program-level differential testing is the best strategy for finding semantic translation bugs in Rosetta 2, better than grammar-aware x86 mutation or a Cascade-style oracle."

The claim must be **falsifiable** — something the skeptic could in principle prove wrong.

### Step 2: Select the mode

Two modes, chosen by the claim type:

| Claim type | Mode | Details |
|-----------|------|---------|
| Bug finding / security claim | **Proof mode** | Structured N-proof hypotheses (e.g., P1-P5). See [references/proof-mode.md](references/proof-mode.md) |
| Approach / design decision | **Decision mode** | Free-form arguments with evidence. See [references/decision-mode.md](references/decision-mode.md) |

If unsure, default to decision mode.

### Step 3: Dispatch both agents in parallel

Use the Agent tool with TWO tool calls in a SINGLE message (parallel dispatch). Each agent is a fresh context with no knowledge of the other.

Load prompt templates from [references/prompt-templates.md](references/prompt-templates.md). The templates enforce:
- Each agent argues ONE side maximally, not balanced
- Each agent is told explicitly "do not be balanced" and "argue as hard as possible"
- Each agent cites specific evidence (files, line numbers, facts)
- Each agent anticipates and pre-refutes the obvious counter-arguments

Give each agent the **same claim**, the **same background context**, but **opposite instructions**. Never mention the other agent's existence or arguments in either prompt.
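The parallel, isolated dispatch can be sketched as follows. This is a minimal illustration, not part of the plugin: `run_agent` is a hypothetical stand-in for the real Agent tool call, which returns each sub-agent's argument from a fresh context.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the Agent tool. In practice each call spawns a
# fresh sub-agent; here it just builds the prompt to show the isolation rule.
def run_agent(role: str, claim: str, background: str) -> str:
    instruction = {
        "advocate": "Make the strongest case FOR the claim. Do not be balanced.",
        "skeptic": "Make the strongest case AGAINST the claim. Do not be balanced.",
    }[role]
    # Neither prompt mentions the other agent -- isolation is the point.
    return f"{instruction}\n\nClaim: {claim}\n\nBackground: {background}"

def dispatch_pair(claim: str, background: str) -> dict:
    # Both dispatches go out in a single parallel batch, mirroring the
    # "two tool calls in one message" rule above.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {role: pool.submit(run_agent, role, claim, background)
                   for role in ("advocate", "skeptic")}
    return {role: f.result() for role, f in futures.items()}
```

Note that each prompt carries identical claim and background text; only the one-sided instruction differs.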

### Step 4: Synthesize with a verdict table

After both agents return, produce a verdict table. For each significant point raised by either side:

| Point | Advocate position | Skeptic position | Verdict |
|-------|-------------------|------------------|---------|

Verdict values:
- **Survives** — one side's position held up; the other failed to counter it
- **Weakens** — partially rebutted; position should be qualified
- **Falls** — cleanly refuted by the other side

Then write a one-paragraph **recommendation**: which overall position won, which specific claims survived, and what the caller should actually do.

See [references/synthesis.md](references/synthesis.md) for the full synthesis template.
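As a sketch of the synthesis step (hypothetical helper, not plugin code), the table can be rendered mechanically while rejecting any verdict outside the three allowed values:

```python
# Allowed per-point verdicts from Step 4; anything else (e.g. "Tie") is
# rejected so the synthesis cannot quietly split the difference.
VALID_VERDICTS = {"Survives", "Weakens", "Falls"}

def verdict_table(rows):
    """rows: list of (point, advocate_position, skeptic_position, verdict)."""
    lines = [
        "| Point | Advocate position | Skeptic position | Verdict |",
        "|-------|-------------------|------------------|---------|",
    ]
    for point, adv, skep, verdict in rows:
        if verdict not in VALID_VERDICTS:
            raise ValueError(f"invalid verdict: {verdict}")
        lines.append(f"| {point} | {adv} | {skep} | **{verdict}** |")
    return "\n".join(lines)
```

Forcing a value from the closed set is what keeps the verdict decisional rather than a both-sides summary.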

### Step 5: Report to the caller

Present three things:
1. The claim (one sentence, as stated in Step 1)
2. The verdict table
3. The recommendation (what action follows from the verdict)

Do NOT dump the raw agent outputs unless the user asks. The verdict is the product.

## Reference Guide

- Mode selection quick reference: [references/decision-mode.md](references/decision-mode.md) and [references/proof-mode.md](references/proof-mode.md)
- Prompt templates for advocate and skeptic dispatch: [references/prompt-templates.md](references/prompt-templates.md)
- Verdict table and recommendation shape: [references/synthesis.md](references/synthesis.md)
- Failure modes and recovery patterns: [references/anti-patterns.md](references/anti-patterns.md)

## Anti-patterns

See [references/anti-patterns.md](references/anti-patterns.md) for full failure modes. The three most important:

1. **False symmetry** — treating both sides as equally valid when one is clearly stronger. The verdict must pick a winner, not split the difference.
2. **Hedged agents** — agents that soften their arguments. If an agent returns a balanced view, re-dispatch with a stronger prompt. Real adversarial value requires real adversarial arguments.
3. **Shared context leakage** — mentioning the other agent's arguments in either prompt. This collapses independence. Each prompt must be written as if that agent is the only one you've asked.

## Examples

### Example A — approach decision (decision mode)

Claim: *"Using YARPGen to generate C programs is the fastest path to finding semantic translation bugs in Rosetta 2."*

Dispatch:
- Advocate prompt: "Make the strongest case FOR this claim. Cite known bugs YARPGen would catch, expected exec/s, why compiler-emitted code is the right attack surface. Do not be balanced."
- Skeptic prompt: "Make the strongest case AGAINST. YARPGen has no FP support, known Rosetta bugs are in FP/SIMD/implicit registers, oracle problem without Intel hardware. Do not be balanced."

Result: verdict table shows skeptic's "no FP support" and "oracle problem" survive; advocate's "fastest to set up" survives. Recommendation: use YARPGen as a complement, not the primary strategy.

### Example B — bug verification (proof mode)

Claim: *"FINDING-001 (pcmpestrm register allocator abort) is a real translation bug, not a false positive."*

Dispatch with 5 proofs, each tests one null hypothesis:
- P1: "This is just normal input rejection (exit -302)."
- P2: "This is a harness artifact (doesn't reproduce in clean env)."
- P3: "This is a benign assertion (SIGABRT in validation code)."
- P4: "The input is unreachable in practice (no compiler emits it)."
- P5: "Already fixed in a newer macOS."

Skeptic tries to prove each null; advocate tries to refute each. If the skeptic fails to prove all five nulls, the finding is CONFIRMED.
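The proof-mode aggregation over the nulls above can be sketched as a small function (illustrative only; the outcome labels `REJECTED` and `UNCONFIRMED` are assumed names, not defined by the plugin):

```python
# Hypothetical aggregation of proof-mode results. Each null hypothesis ends
# as "REFUTED" (advocate won), "PROVED" (skeptic won), or "UNCERTAIN".
def proof_verdict(null_results: dict) -> str:
    outcomes = set(null_results.values())
    if "PROVED" in outcomes:
        return "REJECTED"      # at least one null explains the finding away
    if "UNCERTAIN" in outcomes:
        return "UNCONFIRMED"   # never round UNCERTAIN up to CONFIRMED
    return "CONFIRMED"         # every null was cleanly refuted
```

The middle branch encodes the rule from the anti-patterns reference: a single plausible null keeps the finding unconfirmed.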

## Integration

**Called by:**
- User directly via explicit request
- Any skill that needs to verify a claim before acting on it — e.g., when brainstorming an approach choice, when evaluating a code review suggestion that seems technically questionable, or when verifying a proposed root cause before applying a fix
@@ -0,0 +1,81 @@
# Anti-patterns

Common failure modes that destroy the adversarial value of this skill. Each has a diagnosis and a fix.

## 1. False symmetry

**Symptom:** The verdict says "both sides have merit" and splits the difference. No clear winner. Vague recommendation like "consider both approaches."

**Why it happens:** Reluctance to pick a side. Treating the adversarial structure as performative rather than decisional.

**Fix:** Pick a winner. The dimensions in the verdict table each have one winner, not a tie. If truly 3/3 split across dimensions, the winner is whichever dimensions matter most for the caller's actual decision — name those dimensions explicitly in the recommendation.

## 2. Hedged agents

**Symptom:** Agent returns things like "I see merits on both sides," "this is a nuanced question," "both approaches have tradeoffs." No strong adversarial position.

**Why it happens:** Weak prompt. The agent defaulted to balanced reasoning because nothing stopped it.

**Fix:** Re-dispatch with a stronger prompt. The key phrase: "Do not acknowledge merit in the opposing position. Do not hedge." Do not accept a hedged response as valid output. If you need the full templates, return to [SKILL.md](../SKILL.md) Step 3.

## 3. Shared context leakage

**Symptom:** Advocate mentions the skeptic's arguments (or vice versa), softening the tone to accommodate. Arguments collapse toward agreement.

**Why it happens:** Prompt mentioned the other agent's existence or previewed their arguments.

**Fix:** Each prompt must be written as if that agent is the ONLY agent you've asked. Do not mention the other side. Do not say "another agent will argue X." Do not give hints about counter-arguments. The synthesis happens separately after both return.

## 4. Unfalsifiable claim

**Symptom:** The skeptic can't argue against the claim because it's too vague or too broad.

**Why it happens:** Claim wasn't stated precisely in Step 1.

**Fix:** Return to Step 1. Rewrite the claim as a specific, falsifiable sentence. "YARPGen is good" is unfalsifiable. "YARPGen will catch more Rosetta 2 semantic bugs than grammar-aware mutation in 7 days of fuzzing" is falsifiable.

## 5. Missing evidence

**Symptom:** Agents make claims but don't cite specific files, line numbers, CVEs, benchmarks. Arguments are plausible but unverifiable.

**Why it happens:** Prompt didn't require citations.

**Fix:** Every prompt includes: "Cite specific files, line numbers, facts, CVEs, benchmarks where possible." Reject outputs that make unsupported claims on critical dimensions.

## 6. Wrong mode

**Symptom:** Trying to prove a bug is real with "what do you think is best" prompts, or trying to pick between approaches with proof-style null hypotheses.

**Fix:** Reread [SKILL.md](../SKILL.md) Step 2. Decision mode = approach/design choice. Proof mode = bug/finding/security claim verification. Pick the right one.

## 7. Synthesis dump

**Symptom:** Presenting both agents' full outputs as the result. No verdict table. No recommendation.

**Why it happens:** Skipping the synthesis step.

**Fix:** The verdict table is the product, not the raw arguments. Always produce the table + recommendation. Only dump raw agent outputs if the user explicitly asks for them.

## 8. Confirmation bias in prompt

**Symptom:** Advocate wins trivially because the prompt was stacked in its favor. The skeptic has nothing to work with.

**Why it happens:** Caller's preferred answer leaks into the prompt framing.

**Fix:** Both prompts should frame the claim neutrally. "Make the case FOR/AGAINST {CLAIM}" not "Make the case FOR the obviously correct {CLAIM}". Both agents get the SAME background. If the advocate gets more context than the skeptic, the test is rigged.

## 9. Too many dimensions

**Symptom:** Verdict table has 15 rows. Each row is shallow. Recommendation is unclear because so many points survived/fell.

**Fix:** Pick 3-5 dimensions. The dimensions should be the ones that actually determine the decision. Cut dimensions that don't move the verdict either way.

## 10. Ignoring UNCERTAIN

**Symptom:** Proof-mode verdict marks every null as REFUTED/PROVED when some were actually UNCERTAIN. Finding is reported as CONFIRMED when one null is still plausible.

**Fix:** UNCERTAIN is a valid outcome. If P4 ("input unreachable") has evidence on both sides, the finding is NOT confirmed. Either:
- Gather more evidence on that specific P before concluding, OR
- Report the finding WITH the caveat that P4 is uncertain

Do not round UNCERTAIN up to CONFIRMED.