foundryside-dev · tachyon-beep · Jun 5, 2026 · Jun 5, 2026 · chatgpt-codex-connector · Jun 5, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -86,6 +86,32 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   Wardline-private bit would break fact reconciliation entirely. Closing it fully needs
   a Clarion read-path contract change; the keying site carries an explicit comment. This
   path is opt-in and not the scan gate, so impact is lower.
+- **The `--fail-on` gate no longer honours repository-controlled suppressions by
+  default (closes a CI-gate bypass).** `.wardline/baseline.yaml`, `wardline.yaml`
+  waivers, and `.wardline/judged.yaml` are all committed repository content, so a
+  malicious pull request could add a suppression entry keyed to its own new defect's
+  fingerprint and clear the gate. The gate now evaluates the **unsuppressed**
+  population by default; baseline / waiver / judged still **annotate** the emitted
+  findings (`suppressed: baselined | waived | judged`) but cannot clear the gate. The
+  secure CI ratchet is the operator-supplied, unforgeable `--new-since <merge-base>`,
+  which scopes **both** the emitted findings and the gate. A new `--trust-suppressions`
+  flag (CLI) / `trust_suppressions` arg (MCP `scan`), default false, restores the old
+  post-suppression gate for **trusted local checkouts** (and is what the `judge`
+  workflow uses internally). `.wardline/judged.yaml` records now also **require**
+  `verdict: FALSE_POSITIVE` on load — a missing or non-FP verdict is rejected, so a
+  hand-edited judged entry cannot be smuggled in as a silent suppression
+  (`build_judged_document` always emits it, so machine round-trips stay valid). A new
+  `ScanResult.gate_findings` field carries the unsuppressed gate population; a `None`
+  sentinel means suppressions are trusted and the gate falls back to `findings`.
+
+  > **BREAKING (acceptable at 0.x):** a CI job that relies on a committed baseline
+  > (or waiver / judged file) to keep `wardline scan --fail-on=…` green will now go
+  > **red** on upgrade, because the baselined defects re-enter the gate population. Add
+  > `--new-since <merge-base>` (recommended for CI) or `--trust-suppressions` (trusted
+  > checkouts only) to restore a passing gate. Note: legis's scan artifact and the
+  > "one judge / reproduces Wardline's gate population exactly" property are derived
+  > from the annotated `findings`, so they continue to reflect the suppressed view;
+  > only the local `--fail-on` exit code changed.
 - **Dangerous-sink rules now see lambda bodies (closes a false-green).** `_own_calls`
   treated `ast.Lambda` as a separate scope and only inspected lambda *default*
   expressions, so a sink reached inside a lambda *body* — `cb = lambda: eval(src)`,

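A minimal sketch of the bypass this changelog entry closes. It is a toy model, not Wardline's code: `ToyFinding`, `toy_gate_trips`, the integer severities, and the fingerprint string are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyFinding:
    fingerprint: str
    severity: int  # higher is worse; a hypothetical stand-in for Severity


def toy_gate_trips(findings, committed_baseline, fail_on, trust_suppressions=False):
    """Return True when any gated finding meets the fail_on threshold."""
    for f in findings:
        # The committed baseline always annotates the finding, but the
        # annotation only clears the gate when suppressions are trusted.
        suppressed = f.fingerprint in committed_baseline
        if suppressed and trust_suppressions:
            continue
        if f.severity >= fail_on:
            return True
    return False


# A pull request adds a defect AND a baseline entry keyed to its fingerprint.
pr_findings = [ToyFinding("fp-new-defect", severity=3)]
pr_baseline = frozenset({"fp-new-defect"})

old_gate = gate = toy_gate_trips(pr_findings, pr_baseline, fail_on=2, trust_suppressions=True)
new_gate = toy_gate_trips(pr_findings, pr_baseline, fail_on=2)
print(old_gate, new_gate)
```

Under the old post-suppression behaviour the self-suppressed defect passes; with the default described above the same input trips the gate.
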
diff --git a/docs/guides/suppression.md b/docs/guides/suppression.md
@@ -21,11 +21,45 @@ $ wardline scan .
 scanned 2 file(s); 4 finding(s) — 1 suppressed (1 baseline / 0 waiver / 0 judged), 0 new -> findings.jsonl
 ```
 
+## Suppressions and the `--fail-on` gate (read this first)
+
+All three layers — baseline, waiver, judged — live in **committed repository
+content** (`.wardline/baseline.yaml`, `wardline.yaml`, `.wardline/judged.yaml`).
+That makes them attacker-controllable in an untrusted pull request: a PR can add a
+suppression entry keyed to its own new defect's fingerprint.
+
+So, **by default the `--fail-on` gate evaluates the *unsuppressed* population.**
+Baseline / waiver / judged still **annotate** every emitted finding (you see
+`suppressed: baselined | waived | judged` in the output) — they just do **not**
+clear the gate. A self-suppressing PR therefore still goes red.
+
+Two ways to scope or relax the gate, depending on trust:
+
+- **`--new-since <merge-base>` — the secure CI ratchet.** The git ref is supplied
+  by the operator (the pipeline), not by repository content, so it is unforgeable.
+  It scopes **both** the emitted findings and the gate to findings new since the
+  ref: a pre-existing defect outside the delta does not trip; a new one inside it
+  does, and no committed suppression can clear it. This is the recommended
+  adopt-an-existing-codebase pattern in CI.
+- **`--trust-suppressions` — trusted local checkouts only.** Restores the old
+  behaviour: baseline / waiver / judged clear the gate. Use it when you are running
+  Wardline on a checkout you trust (your own working tree, the judge DX loop). **Do
+  not** enable it in CI on untrusted PR content.
+
+The MCP `scan` tool mirrors this exactly: `new_since` and a `trust_suppressions`
+boolean (default false).
+
 ## Baseline
 
 A baseline is a git-committable snapshot of findings you accept as-is. It is the
-fast on-ramp for an existing project: capture everything once, then let the
-`--fail-on` gate fire only on findings that appear *after* the snapshot.
+fast on-ramp for an existing project: capture every current finding once so each is
+annotated as `baselined` in scan output.
+
+Note (changed): a baseline **annotates** but no longer clears the `--fail-on`
+gate by default — see [Suppressions and the `--fail-on` gate](#suppressions-and-the-fail-on-gate-read-this-first)
+above. To make the gate "fire only on findings that appear after the snapshot",
+use the unforgeable `--new-since <merge-base>` ratchet in CI, or
+`--trust-suppressions` on a trusted local checkout.
 
 ```
 wardline baseline [OPTIONS] COMMAND [ARGS]...
@@ -136,8 +170,13 @@ findings:
 
 Commit `.wardline/judged.yaml` like the baseline. A judged suppression is
 advisory — the rationale is recorded precisely so a human can audit it and revert
-by deleting the entry. See the [LLM triage judge](judge.md) guide for how
-verdicts are produced and the `--write` confidence floor.
+by deleting the entry. Like the other layers it **annotates** but does not clear
+the `--fail-on` gate by default (see [the gate section](#suppressions-and-the-fail-on-gate-read-this-first));
+the `judge` workflow itself always consults judged records. Each record must carry
+`verdict: FALSE_POSITIVE` — a record without it, or with any other verdict, is
+rejected on load so a hand-edited entry cannot become a silent suppression. See the
+[LLM triage judge](judge.md) guide for how verdicts are produced and the `--write`
+confidence floor.
 
 ## A note on line sensitivity
 

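One way a CI job might assemble the recommended invocation, sketched under assumptions: `resolve_merge_base` and `build_scan_argv` are hypothetical helper names, the target branch ref is expected to come from pipeline configuration rather than from anything in the checkout, and the `--fail-on` value is passed through unchanged from whatever the job already uses.

```python
import subprocess


def resolve_merge_base(base_ref: str, head: str = "HEAD") -> str:
    """Ask git for the merge base of the PR head and the target branch."""
    out = subprocess.run(
        ["git", "merge-base", base_ref, head],
        check=True,
        capture_output=True,
        text=True,
    )
    return out.stdout.strip()


def build_scan_argv(merge_base: str, fail_on: str) -> list[str]:
    """Build the scan command; the ref is supplied by the pipeline, not the repo."""
    if not merge_base:
        raise ValueError("merge_base must be a non-empty git ref")
    # --trust-suppressions is deliberately absent: this is the untrusted-PR path.
    return ["wardline", "scan", ".", f"--fail-on={fail_on}", "--new-since", merge_base]
```

A job would call `resolve_merge_base` with its configured target branch, then run the returned argv with `subprocess.run` and use the exit code as the gate result.
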
diff --git a/src/wardline/cli/scan.py b/src/wardline/cli/scan.py
@@ -102,6 +102,17 @@
     default=False,
     help="Allow wardline.yaml source_roots to resolve outside PATH.",
 )
+@click.option(
+    "--trust-suppressions",
+    is_flag=True,
+    default=False,
+    help=(
+        "Let repository-controlled baseline/waiver/judged files clear the --fail-on gate "
+        "(they always annotate findings regardless). Use ONLY for trusted local checkouts; "
+        "in CI prefer the unforgeable --new-since <merge-base> ratchet. Default off: by "
+        "default the gate evaluates the unsuppressed population so a PR cannot self-suppress."
+    ),
+)
 def scan(
     path: Path,
     config_path: Path | None,
@@ -119,6 +130,7 @@ def scan(
     yes: bool,
     strict_defaults: bool,
     allow_source_root_escape: bool,
+    trust_suppressions: bool,
 ) -> None:
     """Scan PATH for findings."""
     if fmt == "sarif":
@@ -158,6 +170,7 @@ def scan(
             trusted_packs=trusted_packs,
             strict_defaults=strict_defaults,
             confine_to_root=not allow_source_root_escape,
+            trust_suppressions=trust_suppressions,
         )
         findings = result.findings
         if fix:
@@ -193,6 +206,7 @@ def confirm_cb(rel_path: str, orig: str, replacement: str, f: Finding) -> bool:
                         trusted_packs=trusted_packs,
                         strict_defaults=strict_defaults,
                         confine_to_root=not allow_source_root_escape,
+                        trust_suppressions=trust_suppressions,
                     )
                     findings = result.findings
         if fmt == "sarif":

diff --git a/src/wardline/core/judge_run.py b/src/wardline/core/judge_run.py
@@ -179,6 +179,10 @@ def _default_caller(req: JudgeRequest) -> JudgeResponse:
         trust_local_packs=trust_local_packs,
         trusted_packs=trusted_packs,
         strict_defaults=strict_defaults,
+        # The judge flow is the trusted local path: it consults judged records. The
+        # emitted ``findings`` are always judged-annotated regardless of this flag;
+        # passing True keeps the gate (if any consumer reads it) on the trusted set too.
+        trust_suppressions=True,
     )
     judged_set = load_judged(root / ".wardline" / "judged.yaml")
 

diff --git a/src/wardline/core/judged.py b/src/wardline/core/judged.py
@@ -110,6 +110,15 @@ def load_judged(path: Path) -> JudgedSet:
         if fp in seen:
             raise ConfigError(f"{path.name} findings[{idx}]: duplicate fingerprint {fp!r}")
         seen.add(fp)
+        # A judged record suppresses a finding ONLY as a FALSE_POSITIVE verdict. Require
+        # the field and reject any other value so a hand-edited TRUE_POSITIVE (or a
+        # missing verdict) cannot be smuggled in as a silent suppression. write_judged
+        # always emits verdict: FALSE_POSITIVE, so machine round-trips stay valid.
+        verdict = _require_str(e, "verdict", idx, path.name)
+        if verdict != "FALSE_POSITIVE":
+            raise ConfigError(
+                f"{path.name} findings[{idx}].verdict must be FALSE_POSITIVE, got {verdict!r}"
+            )
         rationale = _require_str(e, "rationale", idx, path.name)
         # Provenance is the audit primitive — never default it. A judged record with
         # no attributable model / policy / confidence is an unauditable suppression.

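The verdict check in this hunk can be exercised in isolation with a small sketch. The `ConfigError` class and `require_str` below are stand-ins written for this example; the real `_require_str` may validate differently, and `check_verdict` and `rejected` are hypothetical names.

```python
class ConfigError(Exception):
    """Stand-in for wardline.core.errors.ConfigError."""


def require_str(entry: dict, key: str, idx: int, filename: str) -> str:
    """Assumed behaviour: the key must be present and a non-empty string."""
    value = entry.get(key)
    if not isinstance(value, str) or not value:
        raise ConfigError(f"{filename} findings[{idx}].{key} must be a non-empty string")
    return value


def check_verdict(entry: dict, idx: int, filename: str = "judged.yaml") -> str:
    """Accept a judged record only when it is an explicit FALSE_POSITIVE."""
    verdict = require_str(entry, "verdict", idx, filename)
    if verdict != "FALSE_POSITIVE":
        raise ConfigError(
            f"{filename} findings[{idx}].verdict must be FALSE_POSITIVE, got {verdict!r}"
        )
    return verdict


def rejected(entry: dict) -> bool:
    """Return True when loading the entry would raise ConfigError."""
    try:
        check_verdict(entry, 0)
    except ConfigError:
        return True
    return False
```

Both a missing verdict and a hand-edited `TRUE_POSITIVE` are rejected, while a machine-written `FALSE_POSITIVE` record loads.
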
diff --git a/src/wardline/core/run.py b/src/wardline/core/run.py
@@ -15,7 +15,7 @@
 from typing import TYPE_CHECKING
 
 from wardline.core import config as config_mod
-from wardline.core.baseline import load_baseline
+from wardline.core.baseline import Baseline, load_baseline
 from wardline.core.delta import get_affected_entities, get_changed_files_since
 from wardline.core.discovery import discover, missing_source_roots
 from wardline.core.errors import ConfigError
@@ -45,7 +45,8 @@ def _fp(*parts: str) -> str:
 @dataclass(frozen=True, slots=True)
 class ScanSummary:
     total: int  # every finding (defects + facts/metrics)
-    active: int  # non-suppressed DEFECTs — the gate population
+    active: int  # non-suppressed DEFECTs in the emitted findings (NOT the gate population —
+    # the gate evaluates ScanResult.gate_findings unless --trust-suppressions)
     baselined: int
     waived: int
     judged: int
@@ -66,6 +67,14 @@ class ScanResult:
     # this exact run instead of re-deriving. Never serialised over MCP.
     context: AnalysisContext | None
     scanned_paths: tuple[str, ...] = ()
+    # The UNSUPPRESSED gate population (None SENTINEL — never a falsy-empty fallback).
+    # Repository-controlled baseline/waiver/judged still ANNOTATE ``findings`` (visible
+    # as ``suppressed=…``), but a malicious PR must not be able to clear the ``--fail-on``
+    # gate by committing a suppression keyed to its own new defect. ``gate_decision``
+    # evaluates this when it is not None, else falls back to ``findings`` (the trusted,
+    # local ``--trust-suppressions`` / directly-constructed-ScanResult behaviour). It is
+    # scoped by ``--new-since`` identically to ``findings``.
+    gate_findings: list[Finding] | None = None
 
 
 @dataclass(frozen=True, slots=True)
@@ -85,6 +94,7 @@ def run_scan(
     trust_local_packs: bool = False,
     trusted_packs: tuple[str, ...] = (),
     strict_defaults: bool = False,
+    trust_suppressions: bool = False,
 ) -> ScanResult:
     """Discover → analyze → apply suppressions. Pure function of (disk + config).
 
@@ -94,6 +104,16 @@ def run_scan(
     ``confine_to_root`` (default True) makes ``discover`` reject any
     ``source_root`` that resolves outside ``root``. Callers that intentionally
     scan outside the project root must opt out explicitly.
+
+    ``trust_suppressions`` (default False) is the SECURITY default. When False the
+    ``--fail-on`` gate evaluates a separately-built UNSUPPRESSED population
+    (``ScanResult.gate_findings``): repository-controlled baseline/waiver/judged
+    files still annotate the emitted ``findings`` but cannot clear the gate, so a
+    malicious PR cannot self-suppress its own new defect. When True the gate falls
+    back to the suppressed ``findings`` (``gate_findings`` is set to None) — the
+    trusted local / judge-DX behaviour, an explicit operator trust decision suitable
+    only for a trusted checkout, never for enforcement on untrusted PR content. The
+    secure CI ratchet is the operator-supplied, unforgeable ``--new-since`` instead.
     """
     from wardline.scanner.analyzer import build_analyzer
     from wardline.scanner.grammar import TrustGrammar, default_grammar
@@ -185,7 +205,21 @@ def run_scan(
     baseline = load_baseline(root / ".wardline" / "baseline.yaml")
     waivers = WaiverSet(parse_waivers(cfg.waivers))
     judged = load_judged(root / ".wardline" / "judged.yaml")
-    findings = apply_suppressions(raw, baseline, waivers, today=date.today(), judged=judged)
+    today = date.today()
+    # The emitted findings ALWAYS carry the full suppression annotations (baseline,
+    # waiver, judged) so ``suppressed=…`` is visible in output regardless of trust.
+    findings = apply_suppressions(raw, baseline, waivers, today=today, judged=judged)
+    # The gate population applies ZERO suppression but runs the SAME structural
+    # transforms apply_suppressions does (esp. the lineless-DEFECT→non-gating-FACT
+    # downgrade), so the only difference vs ``findings`` is the suppression sources —
+    # NOT ``list(raw)``, which would let a lineless DEFECT trip the gate. When the
+    # operator trusts repo suppressions, gate_findings is None and the gate falls back
+    # to the suppressed ``findings`` (None SENTINEL, never an accidental falsy-empty).
+    gate_findings: list[Finding] | None
+    if trust_suppressions:
+        gate_findings = None
+    else:
+        gate_findings = apply_suppressions(raw, Baseline(frozenset()), WaiverSet([]), today=today, judged=None)
 
     if new_since is not None:
         changed_files = get_changed_files_since(new_since, root)
@@ -195,18 +229,26 @@ def run_scan(
         else:
             affected = set()
 
-        new_findings = []
-        for f in findings:
-            if f.kind is Kind.DEFECT and f.suppressed is SuppressionState.ACTIVE:
-                is_new = (f.location.path in changed_files) or (f.qualname is not None and f.qualname in affected)
-                if not is_new:
-                    f = replace(
-                        f,
-                        suppressed=SuppressionState.BASELINED,
-                        suppression_reason=f"delta: unchanged since {new_since}",
-                    )
-            new_findings.append(f)
-        findings = new_findings
+        def apply_delta_scope(candidates: list[Finding]) -> list[Finding]:
+            # Suppress any ACTIVE defect outside the delta so the gate only fires on
+            # findings new since ``new_since``. Applied to BOTH emitted and gate
+            # populations so the operator-supplied (unforgeable) ratchet scopes the gate.
+            scoped: list[Finding] = []
+            for f in candidates:
+                if f.kind is Kind.DEFECT and f.suppressed is SuppressionState.ACTIVE:
+                    is_new = (f.location.path in changed_files) or (f.qualname is not None and f.qualname in affected)
+                    if not is_new:
+                        f = replace(
+                            f,
+                            suppressed=SuppressionState.BASELINED,
+                            suppression_reason=f"delta: unchanged since {new_since}",
+                        )
+                scoped.append(f)
+            return scoped
+
+        findings = apply_delta_scope(findings)
+        if gate_findings is not None:
+            gate_findings = apply_delta_scope(gate_findings)
 
     defects = [f for f in findings if f.kind is Kind.DEFECT]
     summary = ScanSummary(
@@ -227,12 +269,17 @@ def run_scan(
             path.relative_to(resolved_root).as_posix() if path.is_relative_to(resolved_root) else path.as_posix()
             for path in files
         ),
+        gate_findings=gate_findings,
     )
 
 
 def gate_decision(result: ScanResult, fail_on: Severity | None) -> GateDecision:
     """Translate a scan into a pass/fail verdict. A trip is data, not an error."""
     if fail_on is None:
         return GateDecision(tripped=False, fail_on=None, exit_class=0)
-    tripped = gate_trips(result.findings, fail_on)
+    # None SENTINEL: evaluate the unsuppressed gate population when present (secure
+    # default), else the suppressed ``findings`` (trusted ``--trust-suppressions`` /
+    # a directly-constructed ScanResult with no gate_findings).
+    gate_population = result.gate_findings if result.gate_findings is not None else result.findings
+    tripped = gate_trips(gate_population, fail_on)
     return GateDecision(tripped=tripped, fail_on=fail_on.value, exit_class=1 if tripped else 0)
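A toy walk-through of how the two populations, the delta scope, and the `None` sentinel interact. It is a simplified sketch: `ToyFinding`, `scope_to_delta`, `gate_population`, and `trips` are hypothetical, suppression state is a plain string, and the delta test uses changed files only (the real code also checks affected qualnames).

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ToyFinding:
    path: str
    suppressed: str = "active"  # "active" or "baselined"


def scope_to_delta(candidates, changed_files):
    """Mark active findings outside the changed files as baselined."""
    return [
        replace(f, suppressed="baselined")
        if f.suppressed == "active" and f.path not in changed_files
        else f
        for f in candidates
    ]


def gate_population(findings, gate_findings: Optional[list]):
    """None means suppressions are trusted; an empty list is a real, empty gate."""
    return gate_findings if gate_findings is not None else findings


def trips(population):
    return any(f.suppressed == "active" for f in population)


# Emitted view: the PR's committed baseline has already annotated new.py.
emitted = [ToyFinding("old.py", "baselined"), ToyFinding("new.py", "baselined")]
# Gate view: the same findings with no repository suppressions applied.
unsuppressed = [ToyFinding("old.py"), ToyFinding("new.py")]

# The operator-supplied ratchet scopes BOTH views identically.
changed = {"new.py"}
emitted = scope_to_delta(emitted, changed)
unsuppressed = scope_to_delta(unsuppressed, changed)

secure = trips(gate_population(emitted, unsuppressed))  # new.py is still active
trusted = trips(gate_population(emitted, None))         # falls back to emitted
print(secure, trusted)
```

The `is not None` test keeps an explicitly empty gate population distinct from the trusted fallback, which a truthiness check would conflate.
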
diff --git a/src/wardline/mcp/server.py b/src/wardline/mcp/server.py
@@ -189,6 +189,7 @@ def _scan(
     new_since = args.get("new_since")
     trusted_packs = _trusted_packs_arg(args)
     cache_dir = _cache_dir_arg(args, root)
+    trust_suppressions = bool(args.get("trust_suppressions") or False)
     result = run_scan(
         path,
         config_path=_cfg(args, root),
@@ -198,6 +199,7 @@ def _scan(
         trust_local_packs=trust_local_packs,
         trusted_packs=trusted_packs,
         strict_defaults=strict_defaults,
+        trust_suppressions=trust_suppressions,
     )
     # Fail-soft Clarion write: only when a client was injected (server has a URL).
     # An outage/403 yields a not-reachable WriteResult; never raises here.
@@ -722,7 +724,10 @@ def _register_tools(self) -> None:
             Tool(
                 name="scan",
                 description="Whole-program taint scan of the project. Returns structured "
-                "findings, the suppression summary (active = the gate population), "
+                "findings, the suppression summary (active = unsuppressed defects; "
+                "by default the --fail-on gate evaluates the UNSUPPRESSED population so "
+                "repo-controlled baseline/waiver/judged annotate but do not clear it — "
+                "pass `trust_suppressions: true` for the trusted-local behaviour), "
                 "and the gate verdict. Pass `where` to filter the returned findings "
                 "(conjunctive; summary/gate stay whole-project) and `explain: true` to inline "
                 "each active defect's taint provenance — one call, no per-finding explain_taint. "
@@ -778,6 +783,13 @@ def _register_tools(self) -> None:
                             "type": "boolean",
                             "description": "Ignore repository-supplied custom configuration overrides (wardline.yaml)",
                         },
+                        "trust_suppressions": {
+                            "type": "boolean",
+                            "description": "Let repository-controlled baseline/waiver/judged clear the gate "
+                            "(they always annotate findings regardless). Default false — the gate "
+                            "evaluates the unsuppressed population so a PR cannot self-suppress its "
+                            "own defect. Use only on a trusted checkout; in CI prefer new_since.",
+                        },
                     },
                 },
                 handler=lambda args, root: _scan(