
Harden review prompts for consistency and noise reduction #579

Draft
mariusvniekerk wants to merge 6 commits into main from review-skill-improver

Conversation

@mariusvniekerk
Collaborator

Summary

  • Define impact-based severity levels — Replace bare high/medium/low labels with concrete definitions tied to real-world impact (data loss, exploitability, blast radius). Gives all agents a shared calibration standard so severity is consistent across reviews.
  • Require concrete harm articulation — Every finding must now explain what specifically goes wrong if left unfixed. Eliminates vague "violates best practices" findings by forcing agents to justify each issue with concrete reasoning.
  • Add evidence thresholds — Explicit "do not report" instructions suppress the most common false positive categories: hypothetical issues in unseen code, style opinions, unfounded "missing tests" claims, and flagging codebase conventions as issues.
  • Add intent-implementation alignment check — Reverse the old "do not review the commit message" instruction. The commit message now serves as the primary lens for evaluating the diff, catching gaps between what the developer intended and what they actually wrote.
  • Add self-review quality gate — Before outputting, agents must verify every finding has a specific file/line reference, severity matches described impact, and no findings contradict each other. Drops findings that fail.
  • Add evidence thresholds to insights analysis — Tiered confidence thresholds (1-2 = data point, 3-5 = candidate, 6+ = strong recommendation) prevent guideline suggestions from single occurrences and give high confidence to well-evidenced patterns.

🤖 Generated with Claude Code

mariusvniekerk and others added 6 commits March 24, 2026 13:09
Bare "high/medium/low" labels give agents no shared calibration standard,
leading to inconsistent severity across reviews. Defining each level in
terms of real-world impact (data loss, exploitability, blast radius) aligns
all agents on the same scale and naturally prevents low-value findings from
being over-rated.

Inspired by the impact × breadth scoring pattern from research-oriented
analysis skills.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
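The shared calibration standard this commit describes could be sketched as a simple lookup table (illustrative only — the actual change is prompt text in `internal/prompt/prompt.go`, and these definitions paraphrase the commit, not the real wording):

```go
package main

import "fmt"

// severityGuidance maps each severity label to an impact-based definition,
// mirroring the calibration standard described above. Wording is a
// paraphrase of the commit message, not the prompt's actual text.
var severityGuidance = map[string]string{
	"high":   "data loss, an exploitable vulnerability, or a wide blast radius",
	"medium": "incorrect behavior on a realistic path with a limited blast radius",
	"low":    "a minor defect with narrow, recoverable impact",
}

func main() {
	for _, level := range []string{"high", "medium", "low"} {
		fmt.Printf("%s: %s\n", level, severityGuidance[level])
	}
}
```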
Replace "brief explanation of the problem" with "what specifically goes
wrong if this is not fixed." This is the articulation test pattern from
research-oriented analysis skills — every finding must justify itself with
concrete impact reasoning, not just pattern-matching against a checklist.

Findings like "this violates best practices" become impossible to write
when the prompt demands specific harm. This is the single most effective
noise reduction technique across the mop-mapping skill set.

Applied to all review types: standard, dirty, range, security, and design.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
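One way to picture the articulation test is as a predicate over a finding's justification (a hypothetical sketch — the real change is prompt wording, and `vaguePhrases` is an invented illustration of the kind of text the prompt makes impossible to write):

```go
package main

import (
	"fmt"
	"strings"
)

// vaguePhrases are justifications that fail the articulation test:
// they name a rule, not a harm. The list is illustrative.
var vaguePhrases = []string{
	"violates best practices",
	"is not idiomatic",
	"could be cleaner",
}

// passesArticulationTest reports whether a justification describes
// concrete harm rather than pattern-matching against a checklist.
func passesArticulationTest(justification string) bool {
	j := strings.ToLower(strings.TrimSpace(justification))
	if j == "" {
		return false
	}
	for _, p := range vaguePhrases {
		if strings.Contains(j, p) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(passesArticulationTest("this violates best practices"))
	fmt.Println(passesArticulationTest("the retry re-sends the charge, double-billing the customer"))
}
```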
The /rethink skill uses explicit evidence thresholds — "1 observation is a
data point, 3+ is a pattern worth investigating." The /verify skill grounds
every check in specific data. Applied here as negative prompt instructions
that suppress the most common false positive categories: hypothetical issues
in unseen code, style preferences, unfounded "missing tests" claims, and
flagging patterns that match existing codebase conventions.

Security reviews get a lighter version — they should still err toward
reporting, but not flag theoretical vulnerabilities in untouched code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
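The "hypothetical issues in unseen code" suppression can be sketched as a post-filter over findings (a minimal illustration; the actual mechanism is a negative instruction in the prompt, and the `Finding` type here is invented):

```go
package main

import "fmt"

// Finding is a minimal stand-in for a review finding.
type Finding struct {
	File    string
	Message string
}

// filterToDiff drops findings that reference files not present in the
// diff — one concrete reading of "do not report hypothetical issues in
// unseen code".
func filterToDiff(findings []Finding, changed map[string]bool) []Finding {
	var kept []Finding
	for _, f := range findings {
		if changed[f.File] {
			kept = append(kept, f)
		}
	}
	return kept
}

func main() {
	changed := map[string]bool{"internal/prompt/prompt.go": true}
	findings := []Finding{
		{File: "internal/prompt/prompt.go", Message: "nil dereference on empty diff"},
		{File: "cmd/server/main.go", Message: "hypothetical race in unseen code"},
	}
	fmt.Println(len(filterToDiff(findings, changed)))
}
```

Note this hard filter is exactly what the roborev review below flags as risky for security findings: unchanged code may be needed to confirm an exploit path, so a prompt-level instruction with an escape hatch is softer than a mechanical drop.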
The /verify skill's "recite" phase is its most powerful technique: read only
the title, predict what the content should be, then check alignment. Applied
here by reversing the old instruction "Do not review the commit message" —
the commit message now becomes the primary lens for evaluating the diff.

When a commit says "fix race condition" but the diff adds a mutex on the
wrong resource, that's a high-value finding that pure diff-scanning misses.
Intent-implementation gaps are now the first check category, above bugs and
security, because they catch the class of errors where the code is
internally consistent but doesn't do what the developer intended.

The dirty-changes prompt is unchanged since uncommitted changes have no
commit message to analyze.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The /verify and /synthesize skills both enforce quality gates — checks that
must pass before output is considered complete. Applied here as a final
self-verification instruction: every finding must reference a specific
diff location, severity must match the described impact, and no two findings
may contradict each other. Findings that fail these checks are dropped.

This catches the most embarrassing review failures (high-severity verdict
with no actual line references, "pass" with critical findings listed) at
near-zero cost since the model performs the check during the same generation.

Applied to all review types: standard, dirty, range, and security.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
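The per-finding checks in the gate could look roughly like this (illustrative only — the actual gate is a self-verification instruction the model follows during generation, not code; the contradiction check between findings is omitted here for brevity):

```go
package main

import (
	"fmt"
	"regexp"
)

// Finding is a minimal stand-in for a review finding.
type Finding struct {
	Location string // expected form: path/to/file.go:123
	Severity string
	Impact   string
}

// locRe requires a concrete file:line reference, e.g. "internal/prompt/prompt.go:39".
var locRe = regexp.MustCompile(`^\S+:\d+$`)

// passesQualityGate enforces two of the self-review checks: a specific
// diff location and a stated impact backing the severity.
func passesQualityGate(f Finding) bool {
	return locRe.MatchString(f.Location) && f.Severity != "" && f.Impact != ""
}

func main() {
	fmt.Println(passesQualityGate(Finding{"internal/prompt/prompt.go:39", "high", "drops real vulnerabilities"}))
	fmt.Println(passesQualityGate(Finding{"somewhere in the code", "high", ""}))
}
```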
The /rethink skill's evidence accumulation pattern — "1 observation is a
data point, 3+ is a pattern worth investigating" — directly applies to
the insights system. Without explicit thresholds, the insights agent may
recommend guideline changes from 1-2 occurrences (noise) or hesitate on
strong 6+ patterns.

Added tiered thresholds to the recurring patterns section and gated
guideline suggestions on minimum 3 occurrences. This helps close the
feedback loop between review noise and guideline refinement with
appropriate confidence levels.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
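The tiered thresholds reduce to a small classification over occurrence counts (a sketch of the tiers as described in this PR; the function name is invented):

```go
package main

import "fmt"

// confidenceTier maps an occurrence count to the tiers described above:
// 1-2 is a data point, 3-5 a candidate, 6+ a strong recommendation.
// Guideline suggestions are gated on the "candidate" tier or higher.
func confidenceTier(occurrences int) string {
	switch {
	case occurrences >= 6:
		return "strong recommendation"
	case occurrences >= 3:
		return "candidate"
	case occurrences >= 1:
		return "data point"
	default:
		return "no evidence"
	}
}

func main() {
	for _, n := range []int{1, 3, 6} {
		fmt.Printf("%d occurrence(s): %s\n", n, confidenceTier(n))
	}
}
```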
@roborev-ci

roborev-ci bot commented Mar 25, 2026

roborev: Combined Review (3934535)

Summary Verdict: The changes successfully tighten review prompts with severity definitions and evidence thresholds, but introduce regressions by restricting blind-spot-driven guideline generation and creating false-negative paths for
security vulnerabilities outside the immediate diff.

Medium Severity

  • Location: internal/prompt/prompt.go:39, 78, 108, 921

    • Problem: The new instructions tell reviewers not to report issues in code not shown in the diff
      and require a "plausible exploit path visible in the diff" for security reviews. A malicious contributor can exploit this by submitting a small change that connects external input to an existing dangerous sink outside the patch. The prompt directs the reviewer to drop the finding if the sink or full taint flow lives in unchanged code, creating a
      predictable false-negative path for real security bugs.
    • Fix: Keep the anti-speculation guardrail, but explicitly allow reviewers to inspect unchanged surrounding code when needed to validate whether the changed path introduces a vulnerability. A safer rule is: "Do not speculate without evidence, but you may use nearby unchanged
      code to confirm whether the diff creates or exposes an exploit path."
  • Location: internal/prompt/insights.go:53

    • Problem: Section 5 now allows guideline suggestions only from section 1 or section 3 evidence, but excludes section 2 recurring blind spots. This
      means the insights pass can identify a repeated missing-guideline pattern in section 2 and still be unable to recommend the corresponding guideline text, which is a direct regression in the output's usefulness.
    • Fix: Allow section 2 patterns with the same evidence threshold to feed section 5 guideline suggestions.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)
