159 changes: 124 additions & 35 deletions CHANGELOG.md
@@ -7,43 +7,42 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Fixed
- **Loomweave HMAC signer resync (auth path was 401ing every signed request).**
Wardline's request signature drifted from Loomweave's verifier (ADR-042): the
canonical message is now `METHOD\nPATH\nSHA256HEX(body)\nTIMESTAMP\nNONCE` (the
body-hash and timestamp were transposed) and every signed request now carries a
fresh high-entropy `X-Weft-Nonce` (`secrets.token_hex(16)`) — Loomweave hard-requires
the nonce (300s freshness window + replay cache) and 401s without it. The HMAC unit
test is no longer self-referential: it pins the canonical message as a literal,
Loomweave's HMAC known-answer vector (`auth.rs`), a frozen signature, and the
three-header/fresh-nonce wire shape. Affects only the authenticated Loomweave path
(reads against an unauthenticated serve were already fine).
- **legis one-judge property (P1 `wardline-48a5a8d062`).** `build_legis_artifact` now
projects the **gate** population (`result.gate_findings`, the unsuppressed view the
`--fail-on` gate evaluates) instead of the suppressed `result.findings`, mirroring
`gate_decision`'s exact `is not None` fallback. A defect a committed
baseline/waiver/judged self-suppresses now reaches legis as `active` (legis enforces
it), so legis and Wardline's own gate judge the same population. `--trust-suppressions`
(`gate_findings is None`) still projects the suppressed view. `finding_count` stays
honest (both populations are the same length).

### Changed
- **Filigree clients no longer crash the scan loop when Filigree auth is enabled.**
`401`/`403` from `/api/weft/*` are now treated as **soft** (enrichment unavailable,
like a 5xx/outage) across the emit and promote/file clients — previously a loud
`FiligreeEmitError` while the dossier client degraded softly (now coherent). `400`
(a Wardline payload bug) stays loud. Wardline can also now **send** a bearer token:
a new `WARDLINE_FILIGREE_TOKEN` loader threads `Authorization: Bearer` through all
three Filigree clients (emit, issue/promote, dossier work-provider) at every call
boundary; absent a token, no header is sent (default-off loopback-trust posture,
unchanged). No HMAC on this seam — it is bearer-only by design (ADR-018).
- Filigree gained the same consume-time published-port self-heal as Loomweave
(ADR-044 twin): `resolve_filigree_url` now reads `<root>/.filigree/ephemeral.port`
(precedence `flag > env > published > wardline.yaml`, skipped under `strict_defaults`),
returning `http://localhost:<port>/api/weft/scan-results` to match `install/detect.py`'s
writer. A live dashboard on a new port self-heals over a stale install-stamped literal.
## [1.0.0rc2] - 2026-06-06

### Added
- **MCP `scan` payload controls — `where` now shrinks the payload, plus
`summary_only` / `max_findings` / `include_suppressed` and a default explain cap
(dogfood friction #4).** `where` previously filtered only the top-level `findings`
list; the `agent_summary` arrays still inlined every suppressed finding, so a filter
matching zero findings still returned dozens. `where` now filters the `agent_summary`
arrays too. New args: `summary_only: true` (counts + gate, no finding bodies — the
smallest "did the gate pass?" payload), `include_suppressed: false` (drop suppressed
bodies; counts stay in `summary`), and `max_findings: N` (cap the returned bodies).
`explain: true` no longer inlines provenance for *every* active defect — the one-shot
blowup that returned 56,820 chars on one line — it is capped at 10 by default
(raise/lower with `max_findings`). Every cut is reported in a new `truncation` block
(`findings_total` / `findings_returned` / `findings_truncated` /
`explanations_truncated`) so a bounded payload never reads as "covered everything."
`summary`/`gate` always describe the whole project; the CLI `--format agent-summary`
output is unchanged.
- **The `--fail-on` gate verdict now explains itself (dogfood friction #2/#3).** A scan
reporting `summary.active: 0` while `gate.tripped: true` no longer reads as a bug. The
gate block (CLI stderr, MCP `scan` result, and the agent-summary) carries a human
`reason` — e.g. `"34 suppressed ERROR+ defect(s) (baseline/waiver/judged) not cleared;
pass --trust-suppressions (trusted checkout) or --new-since <ref> (PR)"` for a
suppressed-only trip, `"N active ERROR+ defect(s) at or above ERROR"` for a genuine one
(no misdirection to the suppression flags) — and an `evaluated` string naming the judged population
(`unsuppressed …` by default vs `post-suppression … honored` under
`--trust-suppressions`). Counts come from the annotated findings, so they match
`summary`.
- **Loud migration signal for the secure gate-default rollout (dogfood friction #3).**
When a committed `.wardline/baseline.yaml` exists, the gate trips **solely** because
baselined defects re-enter the unsuppressed population, and neither
`--trust-suppressions` nor `--new-since` was passed, Wardline now prints a one-line
`migration:` hint (CLI stderr; MCP `scan` `gate.migration_hint`; and the agent-summary
`gate.migration_hint`) pointing at the escape hatches and the new **`UPGRADING.md`**.
This is the "my repo went red with no code change" case made self-explaining; the
secure default itself is unchanged.
- Live Loomweave port resolution (consumer half of Loomweave **ADR-044**): Wardline
now reads Loomweave's published read-API port from `<project>/.loomweave/ephemeral.port`
and inserts it into `resolve_loomweave_url` precedence as `flag > env > published
@@ -100,6 +99,96 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
provenance — killing the scan-then-N-explains round-trips. New read-only `wardline findings`
CLI verb shares the same filter core. (WS-B1, WS-B2)

### Fixed
- **`next_actions` is gate-aware — never reads as "passed" when the gate failed
(dogfood re-test, #2).** When the gate trips solely on baselined findings,
`summary.active` is 0, so the agent-summary's `next_actions` used to say
*"no active defects; rescan after edits"* — telling the agent it passed while the
gate FAILED. It now emits a scan action naming the gate failure and the escape
hatches (trust_suppressions / new_since / clear the baseline; see `gate.reason` /
`gate.migration_hint`). The active-defects and genuinely-clean paths are unchanged.
- **CLI/MCP distinguish a Filigree `401` (auth-rejected) from transport-unreachable
(dogfood friction #5).** A `401` (token absent) was reported as *"could not reach
Filigree"*, sending agents to chase a broken-bridge theory. `EmitResult` now carries
`status` + `auth_rejected`; the CLI prints *"Filigree returned 401 (auth rejected) …
set WARDLINE_FILIGREE_TOKEN"* (and a distinct `5xx` "server error" vs the genuine
"could not reach"), and the MCP `scan` `filigree_emit` block / agent-summary carry the
same discriminated `disabled_reason`. A `403` is reported as *"forbidden (token present
but lacks access)"* rather than telling the agent to set a token that won't help.
`401`/`403` stay **soft** (non-load-bearing, never exit-2); only the message changed.
- **`scan --format legis --allow-dirty` emits an unsigned dev artifact instead of
refusing (dogfood friction #1).** On a dirty working tree `scan --format legis`
failed `exit 2` naming an `allow_dirty` flag that was never exposed — presenting
identically to "legis is broken," the session's single biggest rabbit hole. The flag
is now exposed (`--allow-dirty` CLI / `allow_dirty` MCP `scan`). The honest fix: a
dirty tree under `--allow-dirty` does **not** sign — the only readable `tree_sha` is
the *committed* one, which does not describe dirty working content, so signing it
would be false provenance. It falls through to the **unsigned** dev artifact, clearly
marked `dirty: true` (legis records it `unverified`). Signing stays clean-tree-only;
the loud refusal without `--allow-dirty` is unchanged. Lets the dev/tour loop exercise
the Wardline→legis handshake without a commit.
- **PY-WL-110 (contradictory-trust) now fires for the `weft_markers` namespace
(soundness; `wardline-d62845bb18`).** The rule hardcoded
`wardline.decorators.*` as the only recognised marker prefix, so a contradictory
`@trusted` + `@external_boundary` stack imported from the renamed `weft_markers`
shim (the namespace authors are steered toward post-rebrand) was silently *not*
flagged. The prefix set is now derived from `BUILTIN_BOUNDARY_TYPES`
(`{wardline.decorators, weft_markers}`) so the rule cannot drift from the grammar
that seeds provenance. The other boundary rules read resolved provenance and never
had this gap.
- **Taint: lambda bindings are now branch-local (`wardline-36016d26f3`).** The
`_CURRENT_LAMBDA_BINDINGS` map was shared across `if`/`else`, `try`/`except`, and
`match` arms (unlike `var_taints`), so a lambda bound in one arm leaked into a
mutually-exclusive sibling and could over-fire (false positive) in adversarial
branch layouts. Each arm is now walked against an arm-local copy and re-converged by
layering each arm's *delta* onto the pre-branch state in source order — which both
removes the cross-arm leak and preserves a rebinding made in a no-`else` / no-catch-all
arm for a call after the branch (so no new false negative is introduced).
- **Loomweave HMAC signer resync (auth path was 401ing every signed request).**
Wardline's request signature drifted from Loomweave's verifier (ADR-042): the
canonical message is now `METHOD\nPATH\nSHA256HEX(body)\nTIMESTAMP\nNONCE` (the
body-hash and timestamp were transposed) and every signed request now carries a
fresh high-entropy `X-Weft-Nonce` (`secrets.token_hex(16)`) — Loomweave hard-requires
the nonce (300s freshness window + replay cache) and 401s without it. The HMAC unit
test is no longer self-referential: it pins the canonical message as a literal,
Loomweave's HMAC known-answer vector (`auth.rs`), a frozen signature, and the
three-header/fresh-nonce wire shape. Affects only the authenticated Loomweave path
(reads against an unauthenticated serve were already fine).
- **legis one-judge property (P1 `wardline-48a5a8d062`).** `build_legis_artifact` now
projects the **gate** population (`result.gate_findings`, the unsuppressed view the
`--fail-on` gate evaluates) instead of the suppressed `result.findings`, mirroring
`gate_decision`'s exact `is not None` fallback. A defect a committed
baseline/waiver/judged self-suppresses now reaches legis as `active` (legis enforces
it), so legis and Wardline's own gate judge the same population. `--trust-suppressions`
(`gate_findings is None`) still projects the suppressed view. `finding_count` stays
honest (both populations are the same length).
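
The canonical-message order in the HMAC signer entry above can be sketched as follows. This is a minimal illustration, not Wardline's signer: only the field order, `X-Weft-Nonce`, and `secrets.token_hex(16)` come from the entry. The use of HMAC-SHA256 with a hex digest, the Unix-seconds timestamp, and the two `X-Example-*` header names are assumptions made for the example.

```python
import hashlib
import hmac
import secrets
import time


def canonical_message(method: str, path: str, body: bytes, timestamp: str, nonce: str) -> str:
    """Join the fields as METHOD, PATH, SHA256HEX(body), TIMESTAMP, NONCE, newline-separated."""
    body_hash = hashlib.sha256(body).hexdigest()
    return "\n".join([method, path, body_hash, timestamp, nonce])


def sign_request_sketch(secret: bytes, method: str, path: str, body: bytes) -> dict[str, str]:
    """Return the headers for one signed request (illustrative names only)."""
    timestamp = str(int(time.time()))  # assumption: Unix seconds as a decimal string
    nonce = secrets.token_hex(16)      # fresh 128-bit nonce per request, as in the entry
    message = canonical_message(method, path, body, timestamp, nonce)
    # Assumption: HMAC-SHA256 over the UTF-8 message, hex-encoded.
    signature = hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()
    return {
        "X-Weft-Nonce": nonce,
        "X-Example-Timestamp": timestamp,  # hypothetical header name
        "X-Example-Signature": signature,  # hypothetical header name
    }
```

Pinning the canonical message as a literal in a test, as the entry describes, catches a transposed field that a test built from the same helper would not.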

### Changed
- **CLI scan summary now labels the non-suppressed count `active`, not `new`**
(`wardline-26e84dbd44`). The human summary line previously printed
`… N new`, but every other surface — the `SuppressionState.ACTIVE` enum, the
`ScanSummary.active` field, the MCP `summary.active` key, the agent-summary
`active_defects` key, and the `wardline:loop` prompt — already said `active`.
The CLI now matches, so an agent never reconciles a CLI "N new" against an MCP
"active". Text-only (the count value is unchanged); no JSON/SARIF/wire field
renamed. The new [Finding lifecycle & gate vocabulary](https://github.com/foundryside-dev/wardline/blob/main/docs/reference/finding-lifecycle-vocabulary.md)
reference page is the single source of truth for these state words (and the
three distinct meanings of "new" across the suite).
- **Filigree clients no longer crash the scan loop when Filigree auth is enabled.**
`401`/`403` from `/api/weft/*` are now treated as **soft** (enrichment unavailable,
like a 5xx/outage) across the emit and promote/file clients — previously a loud
`FiligreeEmitError` while the dossier client degraded softly (now coherent). `400`
(a Wardline payload bug) stays loud. Wardline can also now **send** a bearer token:
a new `WARDLINE_FILIGREE_TOKEN` loader threads `Authorization: Bearer` through all
three Filigree clients (emit, issue/promote, dossier work-provider) at every call
boundary; absent a token, no header is sent (default-off loopback-trust posture,
unchanged). No HMAC on this seam — it is bearer-only by design (ADR-018).
- Filigree gained the same consume-time published-port self-heal as Loomweave
(ADR-044 twin): `resolve_filigree_url` now reads `<root>/.filigree/ephemeral.port`
(precedence `flag > env > published > wardline.yaml`, skipped under `strict_defaults`),
returning `http://localhost:<port>/api/weft/scan-results` to match `install/detect.py`'s
writer. A live dashboard on a new port self-heals over a stale install-stamped literal.
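
The precedence described above (`flag > env > published > wardline.yaml`, with the published port skipped under `strict_defaults`) can be sketched as below. This is a hypothetical stand-in for `resolve_filigree_url`, not its real signature: the parameter names, the assumption that `ephemeral.port` holds a bare decimal port, and the fall-through on an unreadable or malformed file are all illustrative choices.

```python
from pathlib import Path
from typing import Optional


def resolve_filigree_url_sketch(
    root: str,
    flag: Optional[str] = None,
    env: Optional[str] = None,
    configured: Optional[str] = None,  # the value from wardline.yaml
    strict_defaults: bool = False,
) -> Optional[str]:
    """Pick the first available source in precedence order."""
    if flag:
        return flag
    if env:
        return env
    if not strict_defaults:
        port_file = Path(root) / ".filigree" / "ephemeral.port"
        try:
            # Assumption: the file contains just the port number.
            port = int(port_file.read_text().strip())
        except (OSError, ValueError):
            port = None  # missing or malformed file: fall through
        if port is not None and 0 < port < 65536:
            return f"http://localhost:{port}/api/weft/scan-results"
    return configured
```

Because the published port outranks the `wardline.yaml` value, a dashboard restarted on a new port is picked up without editing the install-stamped URL.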

### Security
- **Builtin trust-marker decorators are now trusted only when they resolve to the
real exports — closes a spoofable false-green.** The default decorator seeding
2 changes: 1 addition & 1 deletion README.md
@@ -22,7 +22,7 @@ def build_record(req):

```console
$ wardline scan . --fail-on ERROR
scanned 1 file(s); 3 finding(s) — 0 suppressed (0 baseline / 0 waiver / 0 judged), 1 new -> findings.jsonl
scanned 1 file(s); 3 finding(s) — 0 suppressed (0 baseline / 0 waiver / 0 judged), 1 active -> findings.jsonl
$ echo $?
1
```
46 changes: 46 additions & 0 deletions UPGRADING.md
@@ -0,0 +1,46 @@
# Upgrading Wardline

Migration notes for changes that can alter a previously-green run. Newest first.

## To v1.0 — the `--fail-on` gate no longer honors committed suppressions by default

**What changed.** `.wardline/baseline.yaml`, `wardline.yaml` waivers, and
`.wardline/judged.yaml` are all committed repository content, so a malicious pull
request could add a suppression entry keyed to its own new defect's fingerprint and
clear the gate. The `--fail-on` gate now evaluates the **unsuppressed** population by
default: baseline / waiver / judged still **annotate** the emitted findings
(`suppressed: baselined | waived | judged`) but no longer clear the gate.
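
The difference between the two gate populations can be sketched in a few lines. This is a simplified model under stated assumptions, not Wardline's `gate_decision`: the `Finding` shape, the severity ordering, and the function name are hypothetical; only the behaviour (suppressions annotate but do not clear the gate unless `--trust-suppressions` is passed) is taken from the text above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumption: an illustrative severity ladder; the real level names may differ.
SEVERITY_ORDER = ["INFO", "WARNING", "ERROR", "CRITICAL"]


@dataclass(frozen=True)
class Finding:
    rule_id: str
    severity: str
    suppressed: Optional[str] = None  # "baselined" | "waived" | "judged" | None


def gate_tripped_sketch(
    findings: Iterable[Finding], fail_on: str, trust_suppressions: bool = False
) -> bool:
    """Trip if any finding at or above the threshold is in the judged population."""
    threshold = SEVERITY_ORDER.index(fail_on)
    for finding in findings:
        if SEVERITY_ORDER.index(finding.severity) < threshold:
            continue
        # Only the trusted mode lets a suppression remove a finding from the gate.
        if trust_suppressions and finding.suppressed is not None:
            continue
        return True
    return False
```

With a single baselined `ERROR` finding, the default call trips and the `trust_suppressions=True` call does not, which is the "red with no code change" case described in the next paragraph.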

**Symptom on upgrade.** A repository whose committed baseline used to clear
`wardline scan --fail-on=ERROR` goes **red with no change to its own code**, because
the baselined defects re-enter the gate population. Wardline now says so out loud — a
clean run that trips solely on baselined findings (and was given neither
`--trust-suppressions` nor `--new-since`) prints:

```
migration: baseline present but not honored by default since v1.0 (secure gate default) —
N baselined ERROR+ defect(s) re-enter the gate. Pass --trust-suppressions for a trusted
local checkout or --new-since <merge-base> in CI. See UPGRADING.md.
```

The same signal rides the MCP `scan` result at `gate.migration_hint`, and the gate
block always carries a `reason` and the `evaluated` population so "0 active + gate
FAILED" never reads as a bug.

**How to restore a passing gate.** Pick the one that matches your trust posture:

- **CI (recommended): `--new-since <merge-base>`.** Scopes both the emitted findings
and the gate to what changed since the ref — an operator-supplied, unforgeable
ratchet a PR cannot tamper with. A baselined defect that is *not* in the diff stops
gating; a brand-new defect still trips.
- **Trusted local checkout: `--trust-suppressions`** (CLI) / `trust_suppressions: true`
(MCP `scan`). Restores the old post-suppression gate. Use **only** where the
suppression files are trusted — never to enforce on untrusted PR content. This is
what the `judge` workflow uses internally.

Keeping the baseline up to date (`wardline baseline update`) and clearing real debt is
the durable fix; the flags above are the migration bridge.

**Not affected.** legis's scan artifact and the "one judge / reproduces Wardline's gate
population exactly" property are derived from the gate population, so they already
reflect the secure view. Only the local `--fail-on` exit code changed.
2 changes: 1 addition & 1 deletion docs/getting-started.md
@@ -32,7 +32,7 @@ wardline scan . --format jsonl
```

```text
scanned 2 file(s); 4 finding(s) — 0 suppressed (0 baseline / 0 waiver / 0 judged), 1 new -> findings.jsonl
scanned 2 file(s); 4 finding(s) — 0 suppressed (0 baseline / 0 waiver / 0 judged), 1 active -> findings.jsonl
```

!!! note "Where the findings go"
24 changes: 21 additions & 3 deletions docs/guides/agents.md
@@ -105,7 +105,7 @@ By default a scan reports but never fails — the gate is opt-in:

```console
$ wardline scan .
scanned 1 file(s); 3 finding(s) — 0 suppressed (0 baseline / 0 waiver / 0 judged), 1 new -> findings.jsonl
scanned 1 file(s); 3 finding(s) — 0 suppressed (0 baseline / 0 waiver / 0 judged), 1 active -> findings.jsonl
```

```console
@@ -118,7 +118,7 @@ at or above the threshold drives a non-zero exit:

```console
$ wardline scan . --fail-on ERROR
scanned 1 file(s); 3 finding(s) — 0 suppressed (0 baseline / 0 waiver / 0 judged), 1 new -> findings.jsonl
scanned 1 file(s); 3 finding(s) — 0 suppressed (0 baseline / 0 waiver / 0 judged), 1 active -> findings.jsonl
```

```console
@@ -190,7 +190,7 @@ a sibling Weft tool — emit SARIF 2.1.0:

```console
$ wardline scan . --format sarif --output results.sarif --fail-on ERROR
scanned 1 file(s); 3 finding(s) — 0 suppressed (0 baseline / 0 waiver / 0 judged), 1 new -> results.sarif
scanned 1 file(s); 3 finding(s) — 0 suppressed (0 baseline / 0 waiver / 0 judged), 1 active -> results.sarif
```

The log is standard SARIF 2.1.0 with a `wardline` driver and one result per
@@ -224,6 +224,24 @@ Resources expose the trust vocabulary, rule catalog, config, and config schema.
The `wardline:loop` prompt documents the intended
scan → explain → fix-at-the-boundary → rescan cycle.

`scan` payload controls (the `summary`/`gate` blocks always describe the whole
project — these only bound the returned finding bodies):

- `where` — a conjunctive read-lens (keys: `rule_id`, `qualname`, `severity`,
`suppression`, `kind`, `path_glob`, `sink`, `tier`) that filters **both** the
`findings` list and the `agent_summary` arrays.
- `summary_only: true` — counts + gate only, no finding bodies. The smallest
"did the gate pass?" payload.
- `include_suppressed: false` — drop suppressed (baselined/waived/judged) bodies;
the suppression counts stay in `summary`.
- `max_findings: N` — cap the returned bodies (and inlined explanations).
- `explain: true` — inline each active defect's provenance; capped at 10 by
default (raise/lower with `max_findings`).

Every cut is reported in the response `truncation` block (`findings_total`,
`findings_returned`, `findings_truncated`, `explanations_truncated`) so a bounded
payload never reads as "covered everything."
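
A "conjunctive" `where` means every supplied key must match for a finding to be kept. The sketch below shows one way that could work over plain dictionaries. It is an assumption-laden illustration, not the server's filter core: the finding shape, the `path` field that `path_glob` is matched against, and the use of `fnmatch` for globbing are all hypothetical.

```python
from fnmatch import fnmatch


def matches_where(finding: dict, where: dict) -> bool:
    """Return True only if the finding satisfies every key in `where`."""
    for key, expected in where.items():
        if key == "path_glob":
            # Assumption: findings expose their file under a "path" key.
            if not fnmatch(finding.get("path", ""), expected):
                return False
        elif finding.get(key) != expected:
            return False
    return True


# Example `scan` arguments using only the documented control names.
scan_args = {
    "where": {"severity": "ERROR", "path_glob": "src/*"},
    "include_suppressed": False,
    "max_findings": 5,
}

findings = [
    {"rule_id": "PY-WL-110", "severity": "ERROR", "path": "src/app/views.py"},
    {"rule_id": "PY-WL-110", "severity": "ERROR", "path": "tests/test_views.py"},
    {"rule_id": "PY-WL-110", "severity": "WARNING", "path": "src/app/forms.py"},
]
selected = [f for f in findings if matches_where(f, scan_args["where"])]
```

An empty `where` matches everything, so omitting the lens leaves the payload bounded only by the other controls.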

With an opt-in Loomweave taint store configured (`wardline mcp --loomweave-url
<URL>`), `explain_taint` becomes a query when you pass the finding's `qualname`
as `sink_qualname`: a fresh fact is served from the store without re-scanning
13 changes: 13 additions & 0 deletions docs/guides/legis-handoff.md
@@ -61,6 +61,19 @@ as `unverified` — the trust-the-agent posture before a key is set).
`tree_sha` that does not match the scanned content is false provenance, so it is
refused rather than emitted.

!!! tip "Dev/tour loop on a dirty tree: `--allow-dirty`"
Signing is clean-tree-only, but you do not need a commit to exercise the
Wardline→legis handshake. Pass `--allow-dirty` (CLI) / `allow_dirty: true` (MCP
`scan`) to emit an **unsigned**, clearly-marked artifact on a dirty tree:

```bash
wardline scan . --format legis --allow-dirty --output /tmp/scan.legis.json
```

The artifact carries `"dirty": true` and **no** `artifact_signature`; legis records
it as `unverified`. The committed tree is never signed as if it described dirty
working content. Use it for the dev loop and the tour — never to gate CI.
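
A script that consumes the artifact may want to confirm it is looking at a dev artifact before doing anything with it. The sketch below inspects only the two fields named above, `dirty` and `artifact_signature`; the function name and the returned labels are hypothetical, and the rest of the artifact schema is not assumed.

```python
import json


def classify_artifact(artifact: dict) -> str:
    """Label an artifact by its dirty flag and signature presence."""
    has_signature = bool(artifact.get("artifact_signature"))
    if artifact.get("dirty") is True:
        # A dirty tree is never signed, so a signature here is unexpected.
        return "unexpected-signed-dirty" if has_signature else "unsigned-dirty"
    return "signed" if has_signature else "unsigned"


# Illustrative payload; a real artifact carries more fields than these.
example = json.loads('{"dirty": true}')
label = classify_artifact(example)
```

A CI job could refuse anything whose label is not `signed`, which keeps dirty dev artifacts out of the gate.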

### From the MCP server (agents)

The `scan` tool attaches the artifact automatically once the secret is provisioned —