
Dogfood 2026-06-06 + 1.0.0rc3: gate verdict clarity, legis dev-loop, payload controls, 401 distinction, review hardening#33

Closed
tachyon-beep wants to merge 17 commits into main from rc3


@tachyon-beep


Closes the actionable wardline items from the 2026-06-06 Loom dogfood friction report
(label `dogfood-2026-06-06`). All five concerns from the re-test are addressed and
live-verified against a freshly-spawned MCP server.

⚠️ Deployment note for the federation (read first)

The re-test reported #2/#3/#4 as "not addressed". Root cause: stale long-running
`wardline mcp` processes, not missing code. The install is editable
(`~/wardline/src`), so a fresh spawn already has every fix — but seven long-lived MCP
servers were frozen at their spawn-time source (one was internally inconsistent and
crashed on `GateDecision.reason`). The re-tester tested #1 via the CLI (fresh process →
worked) and #2/#3/#4 via a stale MCP server (→ looked unaddressed).

Action required after merge: partners must restart their `wardline mcp` server
(or session) to pick up the code. No restart ⇒ same "broken" output.

Fixes

| # | Concern | Fix | Issue |
|---|---------|-----|-------|
| 1 (P0) | `--allow-dirty` on `scan --format legis` | unsigned, `dirty:true`-marked dev artifact; signing stays clean-tree-only | wardline-30f3d38fa5 |
| 2 (P1) | gate contradicts its summary | `gate.reason` + `gate.evaluated`; `next_actions` now gate-aware (no "rescan after edits" on a tripped gate) | wardline-be75c6676d |
| 3 (P1) | silent gate-default breaking change | `gate.migration_hint` (CLI stderr + MCP) + `UPGRADING.md` | wardline-5f662e7a4f |
| 4 (P1) | `where` didn't shrink payload; `explain` blew budget | `where` filters the agent_summary; `summary_only`/`max_findings`/`include_suppressed`; default explain cap (10); `truncation` block | wardline-2957009961 |
| 5 (P2) | 401 reported as "could not reach" | `EmitResult.status`/`auth_rejected`; CLI/MCP print "401 (auth rejected) … set WARDLINE_FILIGREE_TOKEN"; stays soft | wardline-53a44a3bb1 |

Live verification (fresh server, the re-tester's exact scenarios)

  • `tools/list` exposes the new `summary_only`/`max_findings`/`include_suppressed` args (proves fresh server).
  • 34-baselined gate trip → `gate.reason`, `gate.evaluated`, `gate.migration_hint`, and gate-aware `next_actions` all present; no crash.
  • `where:{active,CRITICAL}` (0 match) + `explain:true` → 1,585 chars (was 57,639), `suppressed_findings` empty inline, `truncation` present.
  • `summary_only` → 0 finding bodies, counts intact.

Tests / quality

Full suite 2482 passing, ruff + mypy + mkdocs-strict clean. Every fix is TDD'd
(red→green); golden legis signature byte-unchanged; CLI↔MCP parity preserved; CLI
`--format agent-summary` output unchanged.

🤖 Generated with Claude Code


Supersedes #30 (head branch renamed from `fix/dogfood-2026-06-06-gate-legis-payload` to `rc/1.0.0rc2`; GitHub cannot move a PR's head branch, so this is its continuation). Adds the PR #30 multi-reviewer hardening pass and cuts release candidate 1.0.0rc2.


Continued from #32 (auto-closed by the rc2→rc3 branch rename). Version bumped to 1.0.0rc3.

John Morrissey and others added 17 commits June 6, 2026 12:44
…efusing (wardline-30f3d38fa5)

Dogfood friction #1: on a dirty tree `scan --format legis` failed exit 2 naming
an `allow_dirty` flag that was never exposed on the CLI — presenting identically
to "legis is broken." Expose `--allow-dirty` (CLI) / `allow_dirty` (MCP scan).

The honest fix: a dirty tree under allow_dirty does NOT sign. The only tree_sha
readable is the *committed* one, which does not describe dirty working content —
signing it would be false provenance (the `_git_tree_sha` guard). Instead it
falls through to the UNSIGNED dev artifact, clearly marked `dirty: true` (legis
records it `unverified`). Signing stays clean-tree-only; verification stays
clean-tree/CI. The loud refusal without --allow-dirty is unchanged.

CLI emits a stderr warning when the artifact is dirty/unsigned; MCP reports
`signed:false` + `dirty:true` in legis_artifact_status. legis ignores the unknown
`dirty` top-level key on the unverified path, so ingest is unaffected; the golden
clean-tree signature is byte-unchanged.
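
The three outcomes described above (refuse, sign, or fall through to an unsigned dev artifact) can be sketched as a small decision function. This is an illustrative sketch only: `LegisArtifact`, `DirtyTreeError`, `build_legis_artifact`, and the placeholder signature are hypothetical names, not wardline's actual API.

```python
from dataclasses import dataclass
from typing import Optional


class DirtyTreeError(RuntimeError):
    """Hypothetical error for the loud refusal on a dirty tree."""


@dataclass(frozen=True)
class LegisArtifact:
    payload: dict
    signed: bool
    dirty: bool
    signature: Optional[str] = None


def build_legis_artifact(
    payload: dict, *, tree_is_dirty: bool, allow_dirty: bool = False
) -> LegisArtifact:
    # Dirty tree without the flag: refuse loudly (unchanged behaviour).
    if tree_is_dirty and not allow_dirty:
        raise DirtyTreeError(
            "working tree is dirty; pass --allow-dirty for an unsigned dev artifact"
        )
    # Dirty tree with the flag: never sign, mark the artifact dirty instead.
    if tree_is_dirty:
        return LegisArtifact(payload={**payload, "dirty": True}, signed=False, dirty=True)
    # Clean tree: the only path that signs. The signature is a placeholder here.
    return LegisArtifact(
        payload=dict(payload), signed=True, dirty=False, signature="<placeholder>"
    )
```

The point of the shape is that no argument combination yields a signed artifact for dirty content.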

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rdline-be75c6676d)

Dogfood friction #2: a scan reporting summary.active:0 AND gate.tripped:true read
as a bug — the agent had to run scan twice (with/without trust_suppressions) and
read --help to learn the gate evaluates the unsuppressed (baselined-included)
population by default.

GateDecision now carries `reason` and `evaluated`. `reason` names the count and
class that decided the verdict — "1 suppressed ERROR+ defect(s) (baseline/waiver/
judged) not cleared; pass --trust-suppressions (trusted checkout) or --new-since
<ref> (PR)" when the trip is solely from suppressed-but-gated findings, "N active
ERROR+ defect(s)" on a genuine trip (no misdirection to the suppression flags),
and the mixed form when both. `evaluated` names the population: "unsuppressed
(repository baseline/waiver/judged ignored)" by default, "post-suppression …
honored" under --trust-suppressions. Counts come from `gate_breakdown` over the
ANNOTATED findings so they match what the agent reads in `summary`.

Surfaced in the MCP scan gate block, the agent_summary gate block, and on CLI
stderr when the gate trips (never a silent exit 1). Both None when no --fail-on.
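
A minimal sketch of how the three `reason` forms could be selected from the two counts. The function name `gate_reason_sketch` is hypothetical, and the wording of the mixed form is an assumption, since the message above only quotes the suppressed-only and active-only strings.

```python
from typing import Optional

# Escape-hatch text quoted from the suppressed-only reason above.
ESCAPE_HATCHES = (
    "pass --trust-suppressions (trusted checkout) or --new-since <ref> (PR)"
)


def gate_reason_sketch(active: int, suppressed: int) -> Optional[str]:
    """Return a reason string for a tripped gate, or None if nothing gates."""
    if active == 0 and suppressed == 0:
        return None
    if active == 0:
        # Trip caused solely by suppressed-but-gated findings.
        return (
            f"{suppressed} suppressed ERROR+ defect(s) (baseline/waiver/judged) "
            f"not cleared; {ESCAPE_HATCHES}"
        )
    if suppressed == 0:
        # Genuine trip: no pointer to the suppression flags.
        return f"{active} active ERROR+ defect(s)"
    # Mixed form (wording assumed for illustration).
    return (
        f"{active} active ERROR+ defect(s) and {suppressed} suppressed "
        f"ERROR+ defect(s) not cleared"
    )
```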

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…dline-5f662e7a4f)

Dogfood friction #3: the secure gate-default (gate on the unsuppressed population)
is correct, but the rollout was silent — a repo whose committed baseline used to
clear --fail-on goes red with no code change, and an agent can't tell whether IT
broke scan or HEAD was already red.

New `baseline_migration_hint`: fires ONLY in the exact 'my repo went red with no
code change' case — a committed .wardline/baseline.yaml exists, the gate trips
SOLELY because baselined defects re-enter the unsuppressed population (no
genuinely-active defect, no waiver/judged-only trip), and neither
--trust-suppressions nor --new-since was passed. It points at both escape hatches
and UPGRADING.md. Silent on a genuine active trip, a trusted/PR-scoped run, or no
baseline file.
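
The firing condition reads naturally as a single predicate. The sketch below is an assumption-laden illustration: `GateContext` and its field names are hypothetical, and "solely because of baselined defects" is modelled as "at least one baselined defect gates and no active defect gates".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateContext:
    gate_tripped: bool
    baseline_file_exists: bool      # committed .wardline/baseline.yaml present
    active_gating: int              # genuinely active defects at/above threshold
    baselined_gating: int           # baselined defects re-entering the gate
    trust_suppressions: bool
    new_since: Optional[str]


def should_emit_migration_hint(ctx: GateContext) -> bool:
    """True only in the 'went red with no code change' case."""
    return (
        ctx.gate_tripped
        and ctx.baseline_file_exists
        and ctx.baselined_gating > 0
        and ctx.active_gating == 0
        and not ctx.trust_suppressions
        and ctx.new_since is None
    )
```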

Surfaced loudly on CLI stderr and as MCP `scan` gate.migration_hint (None
otherwise). New UPGRADING.md documents the secure-default migration; CHANGELOG
[Unreleased] gains entries for dogfood #1/#2/#3. Secure default unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…y/max_findings/include_suppressed, default explain cap (wardline-2957009961)

Dogfood friction #4: the documented cost lever (`where`) did not control cost and
one-shot `explain:true` was unusable on a real repo.

- `where` now filters the agent_summary arrays too (it only filtered the top-level
  findings list before) — a filter matching 0 findings no longer returns dozens of
  suppressed findings inline. agent_summary build takes a display_findings view;
  its summary COUNTS stay whole-project.
- New `summary_only:true` (counts + gate, no bodies — smallest "did the gate pass?"
  payload), `include_suppressed:false` (drop suppressed bodies; counts stay),
  `max_findings:N` (cap returned bodies).
- DEFAULT explain ceiling: `explain:true` inlined provenance for EVERY active
  defect (56,820 chars on one line over a whole repo). Capped at 25 by default;
  max_findings tightens it. Findings past the cap are still returned, sans inline
  explanation.
- New `truncation` block (findings_total/findings_returned/findings_truncated/
  explanations_truncated/summary_only/include_suppressed/max_findings) so a bounded
  payload is never mistaken for "covered everything."
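
How the three controls and the `truncation` block fit together can be sketched roughly as below. `shape_payload` and the finding dict shape are hypothetical; `findings_truncated` is assumed to be a boolean, and `explanations_truncated` is left out because explanations are not modelled here.

```python
from typing import Optional


def shape_payload(
    findings: list,
    *,
    summary_only: bool = False,
    include_suppressed: bool = True,
    max_findings: Optional[int] = None,
) -> dict:
    total = len(findings)
    if summary_only:
        # Counts and gate only: return no finding bodies at all.
        returned = []
    else:
        returned = [
            f for f in findings
            if include_suppressed or not f.get("suppressed", False)
        ]
        if max_findings is not None:
            returned = returned[:max_findings]
    return {
        "findings": returned,
        # Always report what was dropped so a bounded payload is not
        # mistaken for full coverage.
        "truncation": {
            "findings_total": total,
            "findings_returned": len(returned),
            "findings_truncated": len(returned) < total,
            "summary_only": summary_only,
            "include_suppressed": include_suppressed,
            "max_findings": max_findings,
        },
    }
```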

CLI --format agent-summary is byte-unchanged (defaults preserve whole-project,
uncapped behaviour). Docs (agents.md, legis-handoff.md --allow-dirty) + CHANGELOG
updated. Full suite 2476 green; ruff/mypy/mkdocs-strict clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…wardline-be75c6676d follow-up)

The gate reason counted `gate_breakdown(result.findings)` — the annotated population —
so under `--new-since` a delta-scoped-out defect (converted to BASELINED by
apply_delta_scope) was wrongly counted as "suppressed >= threshold", inflating the
count and pointing at `--new-since` (already supplied).

_gate_reason now classifies the defects that ACTUALLY gate (the unsuppressed gate
population, where out-of-delta defects are BASELINED and so excluded) by their state in
the emitted findings. The count is exactly what tripped the gate; the `--new-since`
path no longer over-counts. The trust-suppressions branch is unchanged (gate == emitted
findings there). Locked by extending the new_since differential to assert 1, not 2.

Verified: legis `ScanResultsIn.scan` is typed `dict` (arbitrary mapping), so the new
unsigned `dirty:true` marker rides through intake untouched — confirmed the dev artifact
stays postable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ow-up)

The reported one-shot blowup was 56,820 chars over 34 findings and exceeded the tool
token limit; a default of 25 inlined provenances was still uncomfortably close. Lower
the default ceiling to 10 — comfortably under the limit, still plenty to triage in one
call — and let max_findings RAISE it when the agent explicitly accepts the larger
payload (summary_only covers the common "did the gate pass?" case). New test locks that
max_findings can lift the count above the default. Docs/CHANGELOG updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ped gate (wardline-be75c6676d follow-up)

Dogfood re-test, #2 "Worse" half: when the gate trips solely on baselined findings
summary.active is 0, so next_actions said "no active defects; rescan after edits" —
telling the agent it PASSED while the gate FAILED.

_next_actions_for now takes the GateDecision. With 0 active defects but a tripped
gate it emits a scan action whose reason names the gate failure + the escape hatches
(trust_suppressions / new_since / clear the baseline; see gate.reason /
gate.migration_hint) instead of the passive "rescan after edits". The active>0 and
genuinely-clean paths are unchanged.
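
A rough sketch of the three-way branch. The function name, the action dict shape, and the `fix` action for the active path are hypothetical; only the two quoted reasons come from the description above.

```python
def next_actions_sketch(active: int, gate_tripped: bool) -> list:
    if active > 0:
        # Unchanged path: there is real work to do (shape assumed).
        return [{"action": "fix", "reason": f"{active} active defect(s)"}]
    if gate_tripped:
        # 0 active but the gate failed: name the failure and the escape hatches.
        return [{
            "action": "scan",
            "reason": (
                "gate tripped on suppressed findings; use trust_suppressions / "
                "new_since or clear the baseline "
                "(see gate.reason / gate.migration_hint)"
            ),
        }]
    # Genuinely clean: the passive message is correct here.
    return [{"action": "scan", "reason": "no active defects; rescan after edits"}]
```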

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…le (wardline-53a44a3bb1)

Dogfood #5: a 401 (token absent from the CLI env) was reported as "could not reach
Filigree" — a wrong diagnosis that sent the agent chasing a broken-bridge / wrong-
endpoint theory. The prior seam work deliberately made 401/403 SOFT (auth failure must
not crash the scan loop); that is kept — only the MESSAGE changes.

EmitResult now carries `status` (the HTTP status when one reached us; None when the
transport itself failed) and `auth_rejected` (the 401/403 case). The CLI prints
"Filigree returned 401 (auth rejected) … set WARDLINE_FILIGREE_TOKEN" vs a 5xx
"server error" vs the genuine "could not reach"; the MCP scan filigree_emit block and
agent_summary carry the same discriminated disabled_reason. 401/403 stays
reachable=False (non-load-bearing), never exit-2.
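
The discrimination above amounts to branching on whether an HTTP status reached the client at all. A minimal sketch, assuming a hypothetical `describe_emit_failure` helper and approximate message text:

```python
from typing import Optional


def describe_emit_failure(status: Optional[int]) -> str:
    if status is None:
        # The transport itself failed: the only genuine "could not reach" case.
        return "could not reach Filigree"
    if status in (401, 403):
        return (
            f"Filigree returned {status} (auth rejected); "
            "set WARDLINE_FILIGREE_TOKEN"
        )
    if 500 <= status <= 599:
        return f"Filigree returned {status} (server error)"
    return f"Filigree returned {status}"
```

All of these remain soft failures in this sketch: the function only picks a message and never raises.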

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nreachable (#5)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ince the rebrand)

uv.lock still carried the pre-rebrand `clarion` optional-dependency extra; pyproject
already renamed it to `loomweave` (Clarion→Loomweave). Regenerated to match — no
dependency change (blake3 >=1.0, unchanged), just the extra name.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bda branch-locality, finding-lifecycle glossary

Resolves three Filigree ready-queue items, built TDD with adversarial review.

PY-WL-110 weft_markers soundness gap (wardline-d62845bb18, P2)
  contradictory_trust.py hardcoded `wardline.decorators.*` as the only marker
  prefix, silently missing contradictory stacks imported from the renamed
  `weft_markers` shim. Now derives _MARKER_NAMES + _MARKER_MODULE_PREFIXES from
  BUILTIN_BOUNDARY_TYPES so the rule can't drift from the grammar. +2 tests.

Lambda bindings are branch-local (wardline-36016d26f3, P3)
  _CURRENT_LAMBDA_BINDINGS was shared across if/else, try/except, match arms,
  leaking a lambda bound in one arm into siblings (over-fire). Each arm now walks
  an arm-local copy.

  NOTE: the first cut of the merge-out (clear()+full-union with the synthetic
  fall-through arm last) introduced a *false-negative regression* — verified
  empirically against HEAD: a lambda rebound in a no-else `if` / no-catch-all
  `match` and called after the branch resolved EXTERNAL_RAW on HEAD but INTEGRAL
  after the naive fix. Replaced with a delta merge (layer each arm's net
  add/changed bindings onto the pre-branch state in source order) that keeps the
  leak fix AND reproduces HEAD's after-branch bindings, so no new false negative.
  +3 over-fire guards, +3 no-false-negative guards.
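
  The difference between the naive full-union and the delta merge can be shown with plain dicts. This is a simplified sketch, not the analyzer's code: bindings are modelled as name-to-label mappings, deletions are ignored, and both function names are hypothetical.

```python
def naive_union(arms: list) -> dict:
    """Full union in order: a later arm's unchanged binding overwrites
    an earlier arm's rebinding."""
    merged = {}
    for arm in arms:
        merged.update(arm)
    return merged


def delta_merge(pre: dict, arms: list) -> dict:
    """Layer only what each arm added or changed onto the pre-branch state,
    in source order."""
    merged = dict(pre)
    for arm in arms:
        delta = {
            name: value
            for name, value in arm.items()
            if name not in pre or pre[name] != value
        }
        merged.update(delta)
    return merged


# A no-else `if` that rebinds `f`, followed by the synthetic fall-through arm.
pre = {"f": "INTEGRAL"}
if_arm = {"f": "EXTERNAL_RAW"}
fall_through = dict(pre)

after_naive = naive_union([if_arm, fall_through])       # rebinding is lost
after_delta = delta_merge(pre, [if_arm, fall_through])  # rebinding survives
```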

Finding-lifecycle vocabulary glossary (wardline-26e84dbd44, P3)
  Audited wardline's own usage: `active` is already the canonical word on every
  surface except the CLI summary, which printed `N new`. Relabelled to `N active`
  (text only; no JSON/SARIF/wire field renamed). Added the canonical glossary
  docs/reference/finding-lifecycle-vocabulary.md (single source of truth for
  new/active/suppressed/baselined/waived/judged + emitted-active vs gate
  population) with discipline tests + nav wiring. Cross-tool asks (Filigree
  first-seen "new", legis active) recorded as coordination context, not renamed.

Full suite 2471 passed, ruff + mypy clean, mkdocs --strict OK.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, MCP legis reason, strict arg validation

Applies the PR #30 multi-reviewer findings (code/tests/errors/comments/types):

- GateDecision.__post_init__ makes "tripped gate that reads as passed" (dogfood
  #2) unconstructible, not merely avoided by the factory.
- Filigree 403 is now distinguished from 401 across all three render sites
  (CLI stderr, CLI disabled_reason, MCP) — "forbidden (token lacks access)"
  rather than the misleading "set WARDLINE_FILIGREE_TOKEN".
- MCP dirty-unsigned legis artifact carries a loud `reason` (parity with the
  CLI "never gate CI on it" warning) — agent-first surfaces stay equally loud.
- migration_hint threaded into the agent-summary gate block so the
  "see gate.migration_hint" pointer in next_actions resolves on that surface too.
- Strict boolean validation for summary_only/include_suppressed/allow_dirty/
  explain (reject non-bool rather than silently coercing "false"→True) +
  max_findings JSON schema gains `minimum: 0`.
- CHANGELOG: payload-controls entry corrected to dogfood #4 (verified against
  the friction report: #4=payload, #5=auth); genuine-trip reason quoted verbatim.
- Glossary file:line anchors tightened to the WAIVED/JUDGED assignment lines.
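
The coercion hazard behind the strict-validation bullet is that any non-empty string is truthy in Python, so `bool("false")` is `True`. A minimal sketch of rejecting non-booleans instead; `require_bool` and the choice of `TypeError` are hypothetical, not the server's actual helper:

```python
def require_bool(name: str, value: object) -> bool:
    """Accept only real booleans; never coerce strings or numbers."""
    if not isinstance(value, bool):
        raise TypeError(
            f"{name} must be a boolean, got {type(value).__name__}: {value!r}"
        )
    return value


# The silent-coercion failure mode this guards against:
coerced = bool("false")  # True, because the string is non-empty
```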

Quality consolidation (behavior-preserving): shared severity_gates() and
filigree_disabled_reason() helpers, enum-identity (`is`) unified.

New tests pin 5xx rendering (CLI+MCP), the MCP legis dirty/signed projection,
the mixed active+suppressed gate-reason branch, the GateDecision invariant
guard, strict arg validation, and the agent-summary migration_hint.

Suite 2515 passed; ruff/mypy clean; mkdocs --strict builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Increment the release candidate (rc1 → rc2) to carry the PR #30 review
hardening (gate invariant, 403/5xx distinction, strict MCP arg validation).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ust version test

- CHANGELOG: stamp the accumulated [Unreleased] work as [1.0.0rc2] - 2026-06-06
  and open a fresh empty [Unreleased]; consolidate the two `### Added` blocks
  into one (no content change, removes a Keep-a-Changelog duplicate-section smell).
- README: the quick-start scan output said "1 new" — corrected to "1 active",
  matching the CLI relabel shipped in this same release (and getting-started.md).
- test_package: assert __version__ starts with "1.0.0" (release line) instead of
  the exact rc suffix, so cutting a new rc no longer breaks the test.

Suite 2515 passed; ruff clean; mkdocs --strict builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`ruff format --check src tests` (run in CI's Lint+Format job) was red. Reformats
6 test files: two touched in this rc2 work (test_run.py, test_server_query_explain.py),
test_variable_level.py (dogfood branch change), and three with pre-existing drift
already on main (test_legis_intake_contract.py, test_client.py, test_sei_client_wire.py)
— the gate checks the whole tree, so all six must be clean. Formatting only; no
behavior change. Suite 2515 passed; ruff check + format + mypy all clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, FP guard, doc-anchor rot)

Addresses the three Important findings from the PR #32 review panel, each
validated with an actual RED->GREEN cycle under debugging discipline.

I-1 EmitResult contradictory states (core/filigree_emit.py):
  - auth_rejected is now a derived @property (status in {401,403}), deleting the
    redundant axis so "auth-rejected (200)" is unrepresentable, not merely unbuilt.
  - __post_init__ guard mirrors GateDecision: a reachable/success result carries no
    error status; a soft-failure created/updated nothing. Rejects reachable+503.
  - Docstring corrected (status is the error status; None on transport-fail AND 2xx).
  - No wire change: server.py still serializes auth_rejected via the property.
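
  The pattern in I-1 (derive the flag, guard the constructor) might look roughly like this. `EmitResultSketch` is a simplified stand-in with assumed field names and exception type, not the class in core/filigree_emit.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmitResultSketch:
    reachable: bool
    status: Optional[int] = None  # error status; None on transport failure AND 2xx
    created: int = 0
    updated: int = 0

    def __post_init__(self) -> None:
        # A reachable/success result carries no error status.
        if self.reachable and self.status is not None:
            raise ValueError("reachable result must not carry an error status")
        # A soft failure created/updated nothing.
        if not self.reachable and (self.created or self.updated):
            raise ValueError("soft failure must not report created/updated items")

    @property
    def auth_rejected(self) -> bool:
        # Derived, so it cannot disagree with `status`.
        return self.status in {401, 403}
```

  Because `auth_rejected` has no storage of its own, a contradictory combination such as "auth-rejected with a success status" has no way to be constructed.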

I-2 false-positive guard for PY-WL-110 (test_contradictory_trust.py):
  - Empirically: a foreign-only marker stack is filtered at the anchoring gate
    (provenance "fallback"), never reaching the line-81 prefix check. Added both the
    system-level test and the isolating test (real trust_boundary anchor + a
    coincidental foreign `trusted`). Mutation-proven: breaking the prefix check makes
    the isolating test fire a false PY-WL-110.

I-3 stale file:line anchors (finding-lifecycle-vocabulary.md):
  - Re-derived every churned-file anchor from HEAD; corrected ~26 citations.
  - Added a two-way content-binding discipline test: each load-bearing anchor's token
    must be on the cited source line AND the doc must cite that line, so doc and code
    can never silently diverge again.
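
  The two-way binding in I-3 can be sketched as one check per anchor. This is an illustration under assumptions: `anchor_is_bound`, the `path:line` citation format, and the sample data are all hypothetical.

```python
def anchor_is_bound(
    doc_text: str, source_lines: list, path: str, line_no: int, token: str
) -> bool:
    """True only if the doc cites path:line AND the token sits on that line."""
    doc_cites_line = f"{path}:{line_no}" in doc_text
    in_range = 1 <= line_no <= len(source_lines)
    token_on_line = in_range and token in source_lines[line_no - 1]
    return doc_cites_line and token_on_line


# Hypothetical sample: a three-line source file and a doc citing line 2.
sample_source = ["import enum", "state = State.WAIVED", "return state"]
sample_doc = "The WAIVED assignment lives at pkg/mod.py:2."

bound = anchor_is_bound(sample_doc, sample_source, "pkg/mod.py", 2, "WAIVED")
# If the code churns and the token moves to another line, the check fails.
drifted = anchor_is_bound(sample_doc, [""] + sample_source, "pkg/mod.py", 2, "WAIVED")
```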

Full suite 2520 passed; ruff/format/mypy clean; mkdocs --strict builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>