26 changes: 26 additions & 0 deletions CHANGELOG.md
@@ -86,6 +86,32 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
Wardline-private bit would break fact reconciliation entirely. Closing it fully needs
a Clarion read-path contract change; the keying site carries an explicit comment. This
path is opt-in and not the scan gate, so impact is lower.
- **The `--fail-on` gate no longer honours repository-controlled suppressions by
default (closes a CI-gate bypass).** `.wardline/baseline.yaml`, `wardline.yaml`
waivers, and `.wardline/judged.yaml` are all committed repository content, so a
malicious pull request could add a suppression entry keyed to its own new defect's
fingerprint and clear the gate. The gate now evaluates the **unsuppressed**
population by default; baseline / waiver / judged still **annotate** the emitted
findings (`suppressed: baselined | waived | judged`) but cannot clear the gate. The
secure CI ratchet is the operator-supplied, unforgeable `--new-since <merge-base>`,
which scopes **both** the emitted findings and the gate. A new `--trust-suppressions`
flag (CLI) / `trust_suppressions` arg (MCP `scan`), default false, restores the old
post-suppression gate for **trusted local checkouts** (and is what the `judge`
workflow uses internally). `.wardline/judged.yaml` records now also **require**
`verdict: FALSE_POSITIVE` on load — a missing or non-FP verdict is rejected, so a
hand-edited judged entry cannot be smuggled in as a silent suppression
(`build_judged_document` always emits it, so machine round-trips stay valid). New
`ScanResult.gate_findings` field carries the unsuppressed gate population (None
sentinel = trust suppressions / fall back to `findings`).

> **BREAKING (acceptable at 0.x):** a CI job that relies on a committed baseline
> (or waiver / judged file) to keep `wardline scan --fail-on=…` green will now go
> **red** on upgrade, because the baselined defects re-enter the gate population. Add
> `--new-since <merge-base>` (recommended for CI) or `--trust-suppressions` (trusted
> checkouts only) to restore a passing gate. Note: legis's scan artifact and the
> "one judge / reproduces Wardline's gate population exactly" property are derived
> from the annotated `findings`, so they continue to reflect the suppressed view;
> only the local `--fail-on` exit code changed.
- **Dangerous-sink rules now see lambda bodies (closes a false-green).** `_own_calls`
treated `ast.Lambda` as a separate scope and only inspected lambda *default*
expressions, so a sink reached inside a lambda *body* — `cb = lambda: eval(src)`,
47 changes: 43 additions & 4 deletions docs/guides/suppression.md
@@ -21,11 +21,45 @@ $ wardline scan .
scanned 2 file(s); 4 finding(s) — 1 suppressed (1 baseline / 0 waiver / 0 judged), 0 new -> findings.jsonl
```

## Suppressions and the `--fail-on` gate (read this first)

All three layers — baseline, waiver, judged — live in **committed repository
content** (`.wardline/baseline.yaml`, `wardline.yaml`, `.wardline/judged.yaml`).
That makes them attacker-controllable in an untrusted pull request: a PR can add a
suppression entry keyed to its own new defect's fingerprint.

So, **by default the `--fail-on` gate evaluates the *unsuppressed* population.**
Baseline / waiver / judged still **annotate** every emitted finding (you see
`suppressed: baselined | waived | judged` in the output) — they just do **not**
clear the gate. A self-suppressing PR therefore still goes red.

Two ways to scope or relax the gate, depending on trust:

- **`--new-since <merge-base>` — the secure CI ratchet.** The git ref is supplied
by the operator (the pipeline), not by repository content, so it is unforgeable.
It scopes **both** the emitted findings and the gate to findings new since the
ref: a pre-existing defect outside the delta does not trip; a new one inside it
does, and no committed suppression can clear it. This is the recommended adopt-an-
existing-codebase pattern in CI.
- **`--trust-suppressions` — trusted local checkouts only.** Restores the old
behaviour: baseline / waiver / judged clear the gate. Use it when you are running
Wardline on a checkout you trust (your own working tree, the judge DX loop). **Do
not** enable it in CI on untrusted PR content.

The MCP `scan` tool mirrors this exactly: `new_since` and a `trust_suppressions`
boolean (default false).
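
As an illustration only, the gate semantics described above can be modelled in a few lines of Python. This is a toy sketch, not Wardline's implementation: `ToyFinding` and `toy_gate_trips` are hypothetical names, severity thresholds are ignored, and `--new-since` is reduced to a plain set of changed paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyFinding:
    """Hypothetical stand-in for a defect finding."""

    fingerprint: str
    path: str
    suppressed: bool  # annotation from a committed baseline/waiver/judged file


def toy_gate_trips(findings, *, trust_suppressions=False, changed_paths=None):
    """Return True if at least one finding remains in the gate population."""
    population = list(findings)
    if changed_paths is not None:
        # Models --new-since: the scope comes from the operator, not the repo.
        population = [f for f in population if f.path in changed_paths]
    if trust_suppressions:
        # Models --trust-suppressions: committed suppressions may clear the gate.
        population = [f for f in population if not f.suppressed]
    # Default: suppression annotations are ignored when deciding the gate.
    return bool(population)


legacy = ToyFinding("fp-old", "legacy.py", suppressed=True)
self_suppressed = ToyFinding("fp-new", "feature.py", suppressed=True)
```

With these two findings, the default gate trips even though both carry a suppression annotation; passing `changed_paths={"feature.py"}` keeps only the new defect in scope, and `trust_suppressions=True` is the only mode in which the committed annotation clears it.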

## Baseline

A baseline is a git-committable snapshot of findings you accept as-is. It is the
fast on-ramp for an existing project: capture everything once, then let the
`--fail-on` gate fire only on findings that appear *after* the snapshot.
fast on-ramp for an existing project: capture every existing finding once so that
each is annotated as `baselined` in scan output.

Note (changed): a baseline **annotates** but no longer clears the `--fail-on`
gate by default — see [Suppressions and the `--fail-on` gate](#suppressions-and-the-fail-on-gate-read-this-first)
above. To make the gate "fire only on findings that appear after the snapshot",
use the unforgeable `--new-since <merge-base>` ratchet in CI, or
`--trust-suppressions` on a trusted local checkout.
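
A CI job might wire up the ratchet roughly as follows. This is a sketch under assumptions: `merge_base` and `build_scan_command` are hypothetical helpers, the target ref is whatever your pipeline supplies, and the `--fail-on` value is left to the caller because the accepted values are not listed in this section. Only flags named in this guide are used.

```python
import subprocess


def merge_base(target_ref: str) -> str:
    """Ask git for the merge base between the target branch and HEAD."""
    completed = subprocess.run(
        ["git", "merge-base", target_ref, "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout.strip()


def build_scan_command(base: str, fail_on: str) -> list[str]:
    """Build the scan invocation; the ref comes from the pipeline, not the repo."""
    return ["wardline", "scan", ".", f"--fail-on={fail_on}", "--new-since", base]
```

The pipeline would then call something like `subprocess.run(build_scan_command(merge_base(target), threshold))` and fail the job on a non-zero exit code. The point of the design is that `target` is chosen by the pipeline configuration, so a pull request cannot change what counts as new.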

```
wardline baseline [OPTIONS] COMMAND [ARGS]...
@@ -136,8 +170,13 @@ findings:

Commit `.wardline/judged.yaml` like the baseline. A judged suppression is
advisory — the rationale is recorded precisely so a human can audit it and revert
by deleting the entry. See the [LLM triage judge](judge.md) guide for how
verdicts are produced and the `--write` confidence floor.
by deleting the entry. Like the other layers it **annotates** but does not clear
the `--fail-on` gate by default (see [the gate section](#suppressions-and-the-fail-on-gate-read-this-first));
the `judge` workflow itself always consults judged records. Each record must carry
`verdict: FALSE_POSITIVE` — a record without it, or with any other verdict, is
rejected on load so a hand-edited entry cannot become a silent suppression. See the
[LLM triage judge](judge.md) guide for how verdicts are produced and the `--write`
confidence floor.
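
To make the load-time rule concrete, here is a minimal sketch of the verdict check. It is modelled on the behaviour described above, not copied from Wardline: `check_verdict` is a hypothetical helper, and `ConfigError` is redefined locally so the example is self-contained.

```python
class ConfigError(ValueError):
    """Local stand-in for the error raised on an invalid judged file."""


def check_verdict(entry: dict, idx: int, filename: str = "judged.yaml") -> None:
    """Reject a judged record unless it carries verdict: FALSE_POSITIVE."""
    verdict = entry.get("verdict")
    if not isinstance(verdict, str):
        # A missing verdict is rejected rather than defaulted.
        raise ConfigError(f"{filename} findings[{idx}]: missing verdict")
    if verdict != "FALSE_POSITIVE":
        raise ConfigError(
            f"{filename} findings[{idx}].verdict must be FALSE_POSITIVE, got {verdict!r}"
        )
```

Under this sketch, an entry with `verdict: TRUE_POSITIVE` or with no `verdict` key raises, while a machine-written entry with `verdict: FALSE_POSITIVE` passes unchanged.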

## A note on line sensitivity

14 changes: 14 additions & 0 deletions src/wardline/cli/scan.py
@@ -102,6 +102,17 @@
default=False,
help="Allow wardline.yaml source_roots to resolve outside PATH.",
)
@click.option(
"--trust-suppressions",
is_flag=True,
default=False,
help=(
"Let repository-controlled baseline/waiver/judged files clear the --fail-on gate "
"(they always annotate findings regardless). Use ONLY for trusted local checkouts; "
"in CI prefer the unforgeable --new-since <merge-base> ratchet. Default off: by "
"default the gate evaluates the unsuppressed population so a PR cannot self-suppress."
),
)
def scan(
path: Path,
config_path: Path | None,
@@ -119,6 +130,7 @@ def scan(
yes: bool,
strict_defaults: bool,
allow_source_root_escape: bool,
trust_suppressions: bool,
) -> None:
"""Scan PATH for findings."""
if fmt == "sarif":
@@ -158,6 +170,7 @@ def scan(
trusted_packs=trusted_packs,
strict_defaults=strict_defaults,
confine_to_root=not allow_source_root_escape,
trust_suppressions=trust_suppressions,
)
findings = result.findings
if fix:
@@ -193,6 +206,7 @@ def confirm_cb(rel_path: str, orig: str, replacement: str, f: Finding) -> bool:
trusted_packs=trusted_packs,
strict_defaults=strict_defaults,
confine_to_root=not allow_source_root_escape,
trust_suppressions=trust_suppressions,
)
findings = result.findings
if fmt == "sarif":
4 changes: 4 additions & 0 deletions src/wardline/core/judge_run.py
@@ -179,6 +179,10 @@ def _default_caller(req: JudgeRequest) -> JudgeResponse:
trust_local_packs=trust_local_packs,
trusted_packs=trusted_packs,
strict_defaults=strict_defaults,
# The judge flow is the trusted local path: it consults judged records. The
# emitted ``findings`` are always judged-annotated regardless of this flag;
# passing True keeps the gate (if any consumer reads it) on the trusted set too.
trust_suppressions=True,
)
judged_set = load_judged(root / ".wardline" / "judged.yaml")

9 changes: 9 additions & 0 deletions src/wardline/core/judged.py
@@ -110,6 +110,15 @@ def load_judged(path: Path) -> JudgedSet:
if fp in seen:
raise ConfigError(f"{path.name} findings[{idx}]: duplicate fingerprint {fp!r}")
seen.add(fp)
# A judged record suppresses a finding ONLY as a FALSE_POSITIVE verdict. Require
# the field and reject any other value so a hand-edited TRUE_POSITIVE (or a
# missing verdict) cannot be smuggled in as a silent suppression. write_judged
# always emits verdict: FALSE_POSITIVE, so machine round-trips stay valid.
verdict = _require_str(e, "verdict", idx, path.name)
if verdict != "FALSE_POSITIVE":
raise ConfigError(
f"{path.name} findings[{idx}].verdict must be FALSE_POSITIVE, got {verdict!r}"
)
rationale = _require_str(e, "rationale", idx, path.name)
# Provenance is the audit primitive — never default it. A judged record with
# no attributable model / policy / confidence is an unauditable suppression.
79 changes: 63 additions & 16 deletions src/wardline/core/run.py
@@ -15,7 +15,7 @@
from typing import TYPE_CHECKING

from wardline.core import config as config_mod
from wardline.core.baseline import load_baseline
from wardline.core.baseline import Baseline, load_baseline
from wardline.core.delta import get_affected_entities, get_changed_files_since
from wardline.core.discovery import discover, missing_source_roots
from wardline.core.errors import ConfigError
@@ -45,7 +45,8 @@ def _fp(*parts: str) -> str:
@dataclass(frozen=True, slots=True)
class ScanSummary:
total: int # every finding (defects + facts/metrics)
active: int # non-suppressed DEFECTs — the gate population
active: int # non-suppressed DEFECTs in the emitted findings (NOT the gate population —
# the gate evaluates ScanResult.gate_findings unless --trust-suppressions)
baselined: int
waived: int
judged: int
@@ -66,6 +67,14 @@ class ScanResult:
# this exact run instead of re-deriving. Never serialised over MCP.
context: AnalysisContext | None
scanned_paths: tuple[str, ...] = ()
# The UNSUPPRESSED gate population (None SENTINEL — never a falsy-empty fallback).
# Repository-controlled baseline/waiver/judged still ANNOTATE ``findings`` (visible
# as ``suppressed=…``), but a malicious PR must not be able to clear the ``--fail-on``
# gate by committing a suppression keyed to its own new defect. ``gate_decision``
# evaluates this when it is not None, else falls back to ``findings`` (the trusted,
# local ``--trust-suppressions`` / directly-constructed-ScanResult behaviour). It is
# scoped by ``--new-since`` identically to ``findings``.
gate_findings: list[Finding] | None = None


@dataclass(frozen=True, slots=True)
@@ -85,6 +94,7 @@ def run_scan(
trust_local_packs: bool = False,
trusted_packs: tuple[str, ...] = (),
strict_defaults: bool = False,
trust_suppressions: bool = False,
) -> ScanResult:
"""Discover → analyze → apply suppressions. Pure function of (disk + config).

@@ -94,6 +104,16 @@ def run_scan(
``confine_to_root`` (default True) makes ``discover`` reject any
``source_root`` that resolves outside ``root``. Callers that intentionally
scan outside the project root must opt out explicitly.

``trust_suppressions`` (default False) is the SECURITY default. When False the
``--fail-on`` gate evaluates a separately-built UNSUPPRESSED population
(``ScanResult.gate_findings``): repository-controlled baseline/waiver/judged
files still annotate the emitted ``findings`` but cannot clear the gate, so a
malicious PR cannot self-suppress its own new defect. When True the gate falls
back to the suppressed ``findings`` (``gate_findings`` is set to None) — the
trusted local / judge-DX behaviour, an explicit operator trust decision suitable
only for a trusted checkout, never for enforcement on untrusted PR content. The
secure CI ratchet is the operator-supplied, unforgeable ``--new-since`` instead.
"""
from wardline.scanner.analyzer import build_analyzer
from wardline.scanner.grammar import TrustGrammar, default_grammar
@@ -185,7 +205,21 @@ def run_scan(
baseline = load_baseline(root / ".wardline" / "baseline.yaml")
waivers = WaiverSet(parse_waivers(cfg.waivers))
judged = load_judged(root / ".wardline" / "judged.yaml")
findings = apply_suppressions(raw, baseline, waivers, today=date.today(), judged=judged)
today = date.today()
# The emitted findings ALWAYS carry the full suppression annotations (baseline,
# waiver, judged) so ``suppressed=…`` is visible in output regardless of trust.
findings = apply_suppressions(raw, baseline, waivers, today=today, judged=judged)
# The gate population applies ZERO suppression but runs the SAME structural
# transforms apply_suppressions does (esp. the lineless-DEFECT→non-gating-FACT
# downgrade), so the only difference vs ``findings`` is the suppression sources —
# NOT ``list(raw)``, which would let a lineless DEFECT trip the gate. When the
# operator trusts repo suppressions, gate_findings is None and the gate falls back
# to the suppressed ``findings`` (None SENTINEL, never an accidental falsy-empty).
gate_findings: list[Finding] | None
if trust_suppressions:
gate_findings = None
else:
gate_findings = apply_suppressions(raw, Baseline(frozenset()), WaiverSet([]), today=today, judged=None)

if new_since is not None:
changed_files = get_changed_files_since(new_since, root)
@@ -195,18 +229,26 @@ def run_scan(
else:
affected = set()

new_findings = []
for f in findings:
if f.kind is Kind.DEFECT and f.suppressed is SuppressionState.ACTIVE:
is_new = (f.location.path in changed_files) or (f.qualname is not None and f.qualname in affected)
if not is_new:
f = replace(
f,
suppressed=SuppressionState.BASELINED,
suppression_reason=f"delta: unchanged since {new_since}",
)
new_findings.append(f)
findings = new_findings
def apply_delta_scope(candidates: list[Finding]) -> list[Finding]:
# Suppress any ACTIVE defect outside the delta so the gate only fires on
# findings new since ``new_since``. Applied to BOTH emitted and gate
# populations so the operator-supplied (unforgeable) ratchet scopes the gate.
scoped: list[Finding] = []
for f in candidates:
if f.kind is Kind.DEFECT and f.suppressed is SuppressionState.ACTIVE:
is_new = (f.location.path in changed_files) or (f.qualname is not None and f.qualname in affected)
if not is_new:
f = replace(
f,
suppressed=SuppressionState.BASELINED,
suppression_reason=f"delta: unchanged since {new_since}",
)
scoped.append(f)
return scoped

findings = apply_delta_scope(findings)
if gate_findings is not None:
gate_findings = apply_delta_scope(gate_findings)

defects = [f for f in findings if f.kind is Kind.DEFECT]
summary = ScanSummary(
@@ -227,12 +269,17 @@ def run_scan(
path.relative_to(resolved_root).as_posix() if path.is_relative_to(resolved_root) else path.as_posix()
for path in files
),
gate_findings=gate_findings,
)


def gate_decision(result: ScanResult, fail_on: Severity | None) -> GateDecision:
"""Translate a scan into a pass/fail verdict. A trip is data, not an error."""
if fail_on is None:
return GateDecision(tripped=False, fail_on=None, exit_class=0)
tripped = gate_trips(result.findings, fail_on)
# None SENTINEL: evaluate the unsuppressed gate population when present (secure
# default), else the suppressed ``findings`` (trusted ``--trust-suppressions`` /
# a directly-constructed ScanResult with no gate_findings).
gate_population = result.gate_findings if result.gate_findings is not None else result.findings
tripped = gate_trips(gate_population, fail_on)
return GateDecision(tripped=tripped, fail_on=fail_on.value, exit_class=1 if tripped else 0)
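
The "None SENTINEL" comments above refer to a general Python pitfall that a short sketch can make concrete. The helper names below are hypothetical and the lists are placeholder data; this illustrates why `is not None` is used instead of `or`, and is not a claim that the two populations diverge this way in a real Wardline scan.

```python
def pick_with_or(gate_findings, findings):
    # Pitfall: an empty list is falsy, so an empty gate population
    # silently falls back to the emitted findings.
    return gate_findings or findings


def pick_with_sentinel(gate_findings, findings):
    # Only None means "no separate gate population was built";
    # an empty list is a legitimate, empty gate population.
    return gate_findings if gate_findings is not None else findings


emitted = ["finding-a"]
```

Here `pick_with_or([], emitted)` returns the emitted list, whereas `pick_with_sentinel([], emitted)` returns the empty list and `pick_with_sentinel(None, emitted)` falls back to `emitted`, which is the distinction the sentinel exists to preserve.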
14 changes: 13 additions & 1 deletion src/wardline/mcp/server.py
@@ -189,6 +189,7 @@ def _scan(
new_since = args.get("new_since")
trusted_packs = _trusted_packs_arg(args)
cache_dir = _cache_dir_arg(args, root)
trust_suppressions = bool(args.get("trust_suppressions") or False)
result = run_scan(
path,
config_path=_cfg(args, root),
@@ -198,6 +199,7 @@ def _scan(
trust_local_packs=trust_local_packs,
trusted_packs=trusted_packs,
strict_defaults=strict_defaults,
trust_suppressions=trust_suppressions,
)
# Fail-soft Clarion write: only when a client was injected (server has a URL).
# An outage/403 yields a not-reachable WriteResult; never raises here.
@@ -722,7 +724,10 @@ def _register_tools(self) -> None:
Tool(
name="scan",
description="Whole-program taint scan of the project. Returns structured "
"findings, the suppression summary (active = the gate population), "
"findings, the suppression summary (active = unsuppressed defects; "

**P2:** Correct MCP `summary.active` semantics

For scans where a repository baseline/waiver/judged record suppresses the only defect, the MCP response still reports summary.active from the emitted, suppressed result.findings population, so it will be 0 while the new unsuppressed gate can still trip. This description tells agents that active is the unsuppressed gate population, which makes the structured tool contract misleading precisely in the new secure-default scenario and can cause callers to conclude there are no defects to address even though gate.tripped is true.


"by default the --fail-on gate evaluates the UNSUPPRESSED population so "
"repo-controlled baseline/waiver/judged annotate but do not clear it — "
"pass `trust_suppressions: true` for the trusted-local behaviour), "
"and the gate verdict. Pass `where` to filter the returned findings "
"(conjunctive; summary/gate stay whole-project) and `explain: true` to inline "
"each active defect's taint provenance — one call, no per-finding explain_taint. "
@@ -778,6 +783,13 @@ def _register_tools(self) -> None:
"type": "boolean",
"description": "Ignore repository-supplied custom configuration overrides (wardline.yaml)",
},
"trust_suppressions": {
"type": "boolean",
"description": "Let repository-controlled baseline/waiver/judged clear the gate "
"(they always annotate findings regardless). Default false — the gate "
"evaluates the unsuppressed population so a PR cannot self-suppress its "
"own defect. Use only on a trusted checkout; in CI prefer new_since.",
},
},
},
handler=lambda args, root: _scan(