Dogfood 2026-06-06 + 1.0.0rc2: gate verdict clarity, legis dev-loop, payload controls, 401 distinction, review hardening#32
tachyon-beep wants to merge 17 commits into
…efusing (wardline-30f3d38fa5) Dogfood friction #1: on a dirty tree `scan --format legis` failed exit 2 naming an `allow_dirty` flag that was never exposed on the CLI — presenting identically to "legis is broken." Expose `--allow-dirty` (CLI) / `allow_dirty` (MCP scan). The honest fix: a dirty tree under allow_dirty does NOT sign. The only tree_sha readable is the *committed* one, which does not describe dirty working content — signing it would be false provenance (the `_git_tree_sha` guard). Instead it falls through to the UNSIGNED dev artifact, clearly marked `dirty: true` (legis records it `unverified`). Signing stays clean-tree-only; verification stays clean-tree/CI. The loud refusal without --allow-dirty is unchanged. CLI emits a stderr warning when the artifact is dirty/unsigned; MCP reports `signed:false` + `dirty:true` in legis_artifact_status. legis ignores the unknown `dirty` top-level key on the unverified path, so ingest is unaffected; the golden clean-tree signature is byte-unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
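The signing decision described in this commit can be sketched as a three-way branch. This is a minimal illustration, not wardline's code: `LegisArtifact`, `DirtyTreeError` and `build_legis_artifact` are hypothetical names, and the real artifact carries far more than these three fields.

```python
from dataclasses import dataclass
from typing import Optional


class DirtyTreeError(RuntimeError):
    """Hypothetical: raised when the working tree is dirty and allow_dirty was not requested."""


@dataclass(frozen=True)
class LegisArtifact:
    signed: bool
    dirty: bool
    tree_sha: Optional[str]  # set only on a signed (clean-tree) artifact


def build_legis_artifact(tree_is_dirty: bool, committed_tree_sha: str, allow_dirty: bool) -> LegisArtifact:
    if not tree_is_dirty:
        # Clean tree: the committed tree sha describes exactly what was scanned, so signing is honest.
        return LegisArtifact(signed=True, dirty=False, tree_sha=committed_tree_sha)
    if not allow_dirty:
        # The loud refusal (exit 2 on the CLI) is unchanged.
        raise DirtyTreeError("working tree is dirty; pass --allow-dirty for an unsigned dev artifact")
    # Dirty + allow_dirty: the committed sha would be false provenance, so never sign.
    return LegisArtifact(signed=False, dirty=True, tree_sha=None)
```

The design point is that `allow_dirty` only unlocks the unsigned path; there is no input combination that signs a dirty tree.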
…rdline-be75c6676d) Dogfood friction #2: a scan reporting summary.active:0 AND gate.tripped:true read as a bug: the agent had to run scan twice (with/without trust_suppressions) and read --help to learn the gate evaluates the unsuppressed (baselined-included) population by default. GateDecision now carries `reason` and `evaluated`. `reason` names the count and class that decided the verdict: "1 suppressed ERROR+ defect(s) (baseline/waiver/judged) not cleared; pass --trust-suppressions (trusted checkout) or --new-since <ref> (PR)" when the trip is solely from suppressed-but-gated findings, "N active ERROR+ defect(s)" on a genuine trip (no misdirection to the suppression flags), and the mixed form when both. `evaluated` names the population: "unsuppressed (repository baseline/waiver/judged ignored)" by default, "post-suppression … honored" under --trust-suppressions. Counts come from `gate_breakdown` over the ANNOTATED findings so they match what the agent reads in `summary`. Surfaced in the MCP scan gate block, the agent_summary gate block, and on CLI stderr when the gate trips (never a silent exit 1). Both None when no --fail-on. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
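A minimal sketch of how the three `reason` forms above could be chosen from two counts. The single-class strings follow the wording quoted in the commit; the mixed-form wording and the function name `gate_reason` are assumptions.

```python
from typing import Optional

FLAG_HINT = "pass --trust-suppressions (trusted checkout) or --new-since <ref> (PR)"


def gate_reason(active_gating: int, suppressed_gating: int) -> Optional[str]:
    """Hypothetical: name the count and class of ERROR+ defects that tripped the gate."""
    if active_gating == 0 and suppressed_gating == 0:
        return None
    if suppressed_gating == 0:
        # Genuine trip: do not misdirect the agent to the suppression flags.
        return f"{active_gating} active ERROR+ defect(s)"
    suppressed = f"{suppressed_gating} suppressed ERROR+ defect(s) (baseline/waiver/judged) not cleared"
    if active_gating == 0:
        # The trip comes solely from suppressed-but-gated findings, so the flags are the way out.
        return f"{suppressed}; {FLAG_HINT}"
    # Mixed form (assumed wording): both classes contribute, so the flags alone would not clear it.
    return f"{active_gating} active ERROR+ defect(s) and {suppressed}"
```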
…dline-5f662e7a4f) Dogfood friction #3: the secure gate-default (gate on the unsuppressed population) is correct, but the rollout was silent — a repo whose committed baseline used to clear --fail-on goes red with no code change, and an agent can't tell whether IT broke scan or HEAD was already red. New `baseline_migration_hint`: fires ONLY in the exact 'my repo went red with no code change' case — a committed .wardline/baseline.yaml exists, the gate trips SOLELY because baselined defects re-enter the unsuppressed population (no genuinely-active defect, no waiver/judged-only trip), and neither --trust-suppressions nor --new-since was passed. It points at both escape hatches and UPGRADING.md. Silent on a genuine active trip, a trusted/PR-scoped run, or no baseline file. Surfaced loudly on CLI stderr and as MCP `scan` gate.migration_hint (None otherwise). New UPGRADING.md documents the secure-default migration; CHANGELOG [Unreleased] gains entries for dogfood #1/#2/#3. Secure default unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
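The firing conditions for `baseline_migration_hint` can be read as a single predicate. The sketch below assumes the gating defects have already been split into three counts; `GatingCounts` and the hint text are hypothetical, only the conditions come from the commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GatingCounts:
    active: int            # genuinely active ERROR+ defects
    baselined: int         # ERROR+ defects cleared only by .wardline/baseline.yaml
    waived_or_judged: int  # ERROR+ defects cleared only by a waiver or judgement


def baseline_migration_hint(
    counts: GatingCounts,
    tripped: bool,
    baseline_file_exists: bool,
    trust_suppressions: bool,
    new_since: Optional[str],
) -> Optional[str]:
    if not tripped or not baseline_file_exists:
        return None
    if trust_suppressions or new_since is not None:
        return None  # trusted or PR-scoped run: the secure default is not a surprise here
    if counts.active or counts.waived_or_judged or not counts.baselined:
        return None  # the trip is not solely the baseline re-entering the gate
    return ("gate tripped only because baselined defects are gated by the secure default; "
            "pass --trust-suppressions or --new-since <ref>, see UPGRADING.md")
```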
…y/max_findings/include_suppressed, default explain cap (wardline-2957009961)

Dogfood friction #5: the documented cost lever (`where`) did not control cost and one-shot `explain:true` was unusable on a real repo.

- `where` now filters the agent_summary arrays too (it only filtered the top-level findings list before); a filter matching 0 findings no longer returns dozens of suppressed findings inline. agent_summary build takes a display_findings view; its summary COUNTS stay whole-project.
- New `summary_only:true` (counts + gate, no bodies; the smallest "did the gate pass?" payload), `include_suppressed:false` (drop suppressed bodies; counts stay), `max_findings:N` (cap returned bodies).
- DEFAULT explain ceiling: `explain:true` inlined provenance for EVERY active defect (56,820 chars on one line over a whole repo). Capped at 25 by default; max_findings tightens it. Findings past the cap are still returned, sans inline explanation.
- New `truncation` block (findings_total/findings_returned/findings_truncated/explanations_truncated/summary_only/include_suppressed/max_findings) so a bounded payload is never mistaken for "covered everything."

CLI --format agent-summary is byte-unchanged (defaults preserve whole-project, uncapped behaviour). Docs (agents.md, legis-handoff.md --allow-dirty) + CHANGELOG updated. Full suite 2476 green; ruff/mypy/mkdocs-strict clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
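The payload controls above can be sketched as a pipeline: filter with `where`, drop bodies per `summary_only`/`include_suppressed`, cap with `max_findings`, then report what was left out. This is a hypothetical model: `Finding` and `shape_payload` are illustrative names, `findings_truncated` is assumed to be a count (it could equally be a flag), and the explanation cap is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Finding:
    rule: str
    suppressed: bool


def shape_payload(
    findings: List[Finding],
    where: Optional[Callable[[Finding], bool]] = None,
    summary_only: bool = False,
    include_suppressed: bool = True,
    max_findings: Optional[int] = None,
) -> dict:
    # Summary counts stay whole-project regardless of the filters below.
    counts = {"total": len(findings), "suppressed": sum(f.suppressed for f in findings)}
    matched = [f for f in findings if where is None or where(f)]
    bodies = [] if summary_only else [f for f in matched if include_suppressed or not f.suppressed]
    if max_findings is not None:
        bodies = bodies[:max_findings]
    truncation = {
        "findings_total": len(matched),
        "findings_returned": len(bodies),
        "findings_truncated": len(matched) - len(bodies),
        "summary_only": summary_only,
        "include_suppressed": include_suppressed,
        "max_findings": max_findings,
    }
    return {"counts": counts, "findings": bodies, "truncation": truncation}
```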
…wardline-be75c6676d follow-up) The gate reason counted `gate_breakdown(result.findings)` — the annotated population — so under `--new-since` a delta-scoped-out defect (converted to BASELINED by apply_delta_scope) was wrongly counted as "suppressed >= threshold", inflating the count and pointing at `--new-since` (already supplied). _gate_reason now classifies the defects that ACTUALLY gate (the unsuppressed gate population, where out-of-delta defects are BASELINED and so excluded) by their state in the emitted findings. The count is exactly what tripped the gate; the `--new-since` path no longer over-counts. The trust-suppressions branch is unchanged (gate == emitted findings there). Locked by extending the new_since differential to assert 1, not 2. Verified: legis `ScanResultsIn.scan` is typed `dict` (arbitrary mapping), so the new unsigned `dirty:true` marker rides through intake untouched — confirmed the dev artifact stays postable. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
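The over-count this commit fixes can be reproduced with two defects: one suppressed by the repository baseline inside the delta, and one that `--new-since` scoping converted to BASELINED. A hypothetical sketch (severity filtering omitted; `Defect` and both counters are illustrative names):

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Defect:
    emitted_state: str  # state in the emitted findings: "active", "baselined", "waived", "judged"
    out_of_delta: bool  # True if --new-since scoping converted it to BASELINED


def suppressed_count_annotated(defects: List[Defect]) -> int:
    # Old behaviour: counts every non-active defect, including out-of-delta ones.
    return sum(d.emitted_state != "active" for d in defects)


def suppressed_count_gate_population(defects: List[Defect]) -> int:
    # Fixed behaviour: only defects that actually gate; out-of-delta ones are excluded.
    gating = [d for d in defects if not d.out_of_delta]
    return sum(d.emitted_state != "active" for d in gating)
```

With one of each defect, the annotated count is 2 and the gate-population count is 1, which is the "assert 1, not 2" the commit locks in.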
…ow-up) The reported one-shot blowup was 56,820 chars over 34 findings and exceeded the tool token limit; a default of 25 inlined provenances was still uncomfortably close. Lower the default ceiling to 10 — comfortably under the limit, still plenty to triage in one call — and let max_findings RAISE it when the agent explicitly accepts the larger payload (summary_only covers the common "did the gate pass?" case). New test locks that max_findings can lift the count above the default. Docs/CHANGELOG updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
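A sketch of the lowered explain ceiling, assuming `max_findings` simply replaces the default budget when given (so it can raise or lower it). `attach_explanations` and `DEFAULT_EXPLAIN_CAP` are hypothetical names; findings past the budget are still returned, just without an inline explanation.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")
DEFAULT_EXPLAIN_CAP = 10


def attach_explanations(
    active: List[T],
    explain: Callable[[T], str],
    max_findings: Optional[int] = None,
) -> Tuple[List[Tuple[T, Optional[str]]], int]:
    """Inline provenance for at most `budget` findings; the rest are returned without it."""
    # An explicit max_findings replaces the default, so the agent can opt into a larger payload.
    budget = DEFAULT_EXPLAIN_CAP if max_findings is None else max_findings
    pairs = [(f, explain(f) if i < budget else None) for i, f in enumerate(active)]
    explanations_truncated = max(0, len(active) - budget)
    return pairs, explanations_truncated
```

For the reported 34-finding scan, the default returns all 34 findings with 10 explanations and reports 24 explanations truncated.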
…ped gate (wardline-be75c6676d follow-up) Dogfood re-test, #2 "Worse" half: when the gate trips solely on baselined findings summary.active is 0, so next_actions said "no active defects; rescan after edits" — telling the agent it PASSED while the gate FAILED. _next_actions_for now takes the GateDecision. With 0 active defects but a tripped gate it emits a scan action whose reason names the gate failure + the escape hatches (trust_suppressions / new_since / clear the baseline; see gate.reason / gate.migration_hint) instead of the passive "rescan after edits". The active>0 and genuinely-clean paths are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
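The three paths of `_next_actions_for` after this change can be sketched as below. The shape of each action and the `"fix"` action name for the active>0 path are assumptions; only the ordering of the checks and the "never report a pass while the gate failed" rule come from the commit.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class GateDecision:
    tripped: bool


def next_actions_for(active: int, gate: Optional[GateDecision]) -> List[Dict[str, str]]:
    if active > 0:
        return [{"action": "fix", "reason": f"{active} active defect(s) to triage"}]
    if gate is not None and gate.tripped:
        # 0 active but the gate FAILED: never tell the agent it passed.
        return [{
            "action": "scan",
            "reason": ("gate tripped with 0 active defects; pass trust_suppressions or new_since, "
                       "or clear the baseline; see gate.reason / gate.migration_hint"),
        }]
    return [{"action": "scan", "reason": "no active defects; rescan after edits"}]
```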
…le (wardline-53a44a3bb1) Dogfood #5: a 401 (token absent from the CLI env) was reported as "could not reach Filigree", a wrong diagnosis that sent the agent chasing a broken-bridge / wrong-endpoint theory. The prior seam work deliberately made 401/403 SOFT (auth failure must not crash the scan loop); that is kept, and only the MESSAGE changes. EmitResult now carries `status` (the HTTP status when one reached us; None when the transport itself failed) and `auth_rejected` (the 401/403 case). The CLI prints "Filigree returned 401 (auth rejected) … set WARDLINE_FILIGREE_TOKEN" vs a 5xx "server error" vs the genuine "could not reach"; the MCP scan filigree_emit block and agent_summary carry the same discriminated disabled_reason. 401/403 stays reachable=False (non-load-bearing), never exit-2. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
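A sketch of the discriminated message, keyed only on the HTTP status (None meaning the transport failed). The helper name `filigree_disabled_reason` appears in the review-hardening commit, but its real signature and exact strings are not shown there; this version is an assumption and already includes that commit's 403 split.

```python
from typing import Optional


def filigree_disabled_reason(status: Optional[int]) -> str:
    """Hypothetical signature: map an emit failure to a message naming its actual cause."""
    if status is None:
        return "could not reach Filigree (transport failure)"
    if status == 401:
        return "Filigree returned 401 (auth rejected); set WARDLINE_FILIGREE_TOKEN"
    if status == 403:
        # The token was sent but lacks access, so "set the token" would be misleading.
        return "Filigree returned 403 (forbidden: token lacks access)"
    if 500 <= status <= 599:
        return f"Filigree returned {status} (server error)"
    return f"Filigree returned {status}"
```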
…nreachable (#5) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ince the rebrand) uv.lock still carried the pre-rebrand `clarion` optional-dependency extra; pyproject already renamed it to `loomweave` (Clarion→Loomweave). Regenerated to match — no dependency change (blake3 >=1.0, unchanged), just the extra name. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bda branch-locality, finding-lifecycle glossary

Resolves three Filigree ready-queue items, built TDD with adversarial review.

PY-WL-110 weft_markers soundness gap (wardline-d62845bb18, P2)

contradictory_trust.py hardcoded `wardline.decorators.*` as the only marker prefix, silently missing contradictory stacks imported from the renamed `weft_markers` shim. Now derives _MARKER_NAMES + _MARKER_MODULE_PREFIXES from BUILTIN_BOUNDARY_TYPES so the rule can't drift from the grammar. +2 tests.

Lambda bindings are branch-local (wardline-36016d26f3, P3)

`_CURRENT_LAMBDA_BINDINGS` was shared across if/else, try/except, and match arms, leaking a lambda bound in one arm into siblings (over-fire). Each arm now walks an arm-local copy. NOTE: the first cut of the merge-out (clear()+full-union with the synthetic fall-through arm last) introduced a *false-negative regression*. Verified empirically against HEAD: a lambda rebound in a no-else `if` / no-catch-all `match` and called after the branch resolved EXTERNAL_RAW on HEAD but INTEGRAL after the naive fix. Replaced with a delta merge (layer each arm's net add/changed bindings onto the pre-branch state in source order) that keeps the leak fix AND reproduces HEAD's after-branch bindings, so no new false negative. +3 over-fire guards, +3 no-false-negative guards.

Finding-lifecycle vocabulary glossary (wardline-26e84dbd44, P3)

Audited wardline's own usage: `active` is already the canonical word on every surface except the CLI summary, which printed `N new`. Relabelled to `N active` (text only; no JSON/SARIF/wire field renamed). Added the canonical glossary docs/reference/finding-lifecycle-vocabulary.md (single source of truth for new/active/suppressed/baselined/waived/judged + emitted-active vs gate population) with discipline tests + nav wiring. Cross-tool asks (Filigree first-seen "new", legis active) recorded as coordination context, not renamed.

Full suite 2471 passed, ruff + mypy clean, mkdocs --strict OK.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
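The arm-local walk and delta merge from the lambda section above can be modelled with plain dicts. This is a simplified sketch, assuming each variable maps to at most one lambda node; `Bindings`, `walk_arms` and `merge_arms` are hypothetical names. The `pre.get(name) is not lam` delta test applies only what an arm actually changed.

```python
from typing import Dict, List

Bindings = Dict[str, object]  # variable name -> the lambda node currently bound to it


def walk_arms(pre: Bindings, arm_count: int) -> List[Bindings]:
    # Each arm starts from its own copy, so a lambda bound in one arm cannot leak into a sibling.
    return [dict(pre) for _ in range(arm_count)]


def merge_arms(pre: Bindings, arms: List[Bindings]) -> Bindings:
    """Layer each arm's net added/changed bindings onto the pre-branch state, in source order."""
    merged = dict(pre)
    for arm in arms:
        for name, lam in arm.items():
            if pre.get(name) is not lam:  # only the arm's delta is applied
                merged[name] = lam
    return merged
```

Unlike a clear()+union with the fall-through arm last, an unchanged fall-through arm contributes no delta here, so it cannot overwrite a lambda rebound in the `if` body.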
…, MCP legis reason, strict arg validation

Applies the PR #30 multi-reviewer findings (code/tests/errors/comments/types):

- `GateDecision.__post_init__` makes "tripped gate that reads as passed" (dogfood #2) unconstructible, not merely avoided by the factory.
- Filigree 403 is now distinguished from 401 across all three render sites (CLI stderr, CLI disabled_reason, MCP): "forbidden (token lacks access)" rather than the misleading "set WARDLINE_FILIGREE_TOKEN".
- MCP dirty-unsigned legis artifact carries a loud `reason` (parity with the CLI "never gate CI on it" warning); agent-first surfaces stay equally loud.
- migration_hint threaded into the agent-summary gate block so the "see gate.migration_hint" pointer in next_actions resolves on that surface too.
- Strict boolean validation for summary_only/include_suppressed/allow_dirty/explain (reject non-bool rather than silently coercing "false"→True) + max_findings JSON schema gains `minimum: 0`.
- CHANGELOG: payload-controls entry corrected to dogfood #4 (verified against the friction report: #4=payload, #5=auth); genuine-trip reason quoted verbatim.
- Glossary file:line anchors tightened to the WAIVED/JUDGED assignment lines.

Quality consolidation (behavior-preserving): shared severity_gates() and filigree_disabled_reason() helpers, enum-identity (`is`) unified. New tests pin 5xx rendering (CLI+MCP), the MCP legis dirty/signed projection, the mixed active+suppressed gate-reason branch, the GateDecision invariant guard, strict arg validation, and the agent-summary migration_hint. Suite 2515 passed; ruff/mypy clean; mkdocs --strict builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
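The strict argument validation above can be sketched as two small checks over the raw MCP argument mapping. The helper names are hypothetical; the point is that a truthiness check would turn the string `"false"` into True, and that Python's `bool` subclasses `int`, so `max_findings` must exclude it explicitly.

```python
from typing import Any, Mapping, Optional


def strict_bool(args: Mapping[str, Any], key: str, default: bool) -> bool:
    value = args.get(key, default)
    # bool("false") is True; requiring a real bool stops a JSON string from silently flipping the flag.
    if not isinstance(value, bool):
        raise ValueError(f"{key} must be a boolean, got {type(value).__name__}")
    return value


def strict_max_findings(args: Mapping[str, Any]) -> Optional[int]:
    value = args.get("max_findings")
    if value is None:
        return None
    # bool is a subclass of int, so reject it explicitly; mirror the schema's `minimum: 0`.
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        raise ValueError("max_findings must be an integer >= 0")
    return value
```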
Increment the release candidate (rc1 → rc2) to carry the PR #30 review hardening (gate invariant, 403/5xx distinction, strict MCP arg validation). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ust version test

- CHANGELOG: stamp the accumulated [Unreleased] work as [1.0.0rc2] - 2026-06-06 and open a fresh empty [Unreleased]; consolidate the two `### Added` blocks into one (no content change, removes a Keep-a-Changelog duplicate-section smell).
- README: the quick-start scan output said "1 new"; corrected to "1 active", matching the CLI relabel shipped in this same release (and getting-started.md).
- test_package: assert `__version__` starts with "1.0.0" (release line) instead of the exact rc suffix, so cutting a new rc no longer breaks the test.

Suite 2515 passed; ruff clean; mkdocs --strict builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`ruff format --check src tests` (run in CI's Lint+Format job) was red. Reformats 6 test files: two touched in this rc2 work (test_run.py, test_server_query_explain.py), test_variable_level.py (dogfood branch change), and three with pre-existing drift already on main (test_legis_intake_contract.py, test_client.py, test_sei_client_wire.py) — the gate checks the whole tree, so all six must be clean. Formatting only; no behavior change. Suite 2515 passed; ruff check + format + mypy all clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6f2c1d6e45
```python
if self.max_findings is not None:
    shown_active = shown_active[: self.max_findings]
    shown_suppressed = shown_suppressed[: self.max_findings]
    shown_facts = shown_facts[: self.max_findings]
```
Apply max_findings across the whole agent summary
When MCP scan is called with max_findings, this slices each agent_summary array independently, so a mixed result set can still return up to 3 * max_findings finding bodies inside agent_summary even though the top-level findings list and truncation.findings_returned imply the payload is capped. For example, a scan with active defects plus suppressed defects and engine facts and max_findings: 10 can still inline 30 bodies in agent_summary, which defeats the payload-control knob added here and makes the truncation metadata misleading.
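One way to honour a single cap, sketched under assumptions: the arrays are drawn from one shared budget in priority order (active, then suppressed, then engine facts, which is a guess at the right order), and the function reports how many bodies it actually returned. `cap_across` is a hypothetical helper, not the code under review.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def cap_across(groups: Sequence[List[T]], max_findings: Optional[int]) -> Tuple[List[List[T]], int]:
    """Apply one shared budget across several arrays, in priority order."""
    if max_findings is None:
        return [list(g) for g in groups], sum(len(g) for g in groups)
    remaining = max_findings
    capped: List[List[T]] = []
    for g in groups:
        take = g[:remaining]  # an exhausted budget yields an empty slice
        capped.append(list(take))
        remaining -= len(take)
    return capped, max_findings - remaining
```

With 8 active, 8 suppressed and 8 facts and `max_findings: 10`, this returns 8 + 2 + 0 bodies, so `findings_returned` can truthfully say 10.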
```python
for name, lam in arm.items():
    if pre.get(name) is not lam:
        parent[name] = lam
```
Drop stale lambda bindings when branches remove them
When a variable was bound to a lambda before a branch and every branch reassigns it to a non-lambda, those arm-local maps no longer contain the name, but this merge only applies entries that remain present and never removes the stale parent binding. A later cb(raw) after if/else can therefore still be resolved as a call to the pre-branch lambda even though no runtime path leaves cb as that lambda, producing spurious taint/sink findings for impossible flows.
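A possible extension of the delta merge that addresses this comment, sketched under one stated assumption: `arms` always includes the synthetic fall-through arm for a non-exhaustive branch (no-else `if`, no-catch-all `match`). Under that assumption, a pre-branch lambda that no arm still holds cannot survive on any path and can be dropped. `merge_arms_dropping_stale` is a hypothetical name.

```python
from typing import Dict, List

Bindings = Dict[str, object]  # variable name -> lambda node


def merge_arms_dropping_stale(pre: Bindings, arms: List[Bindings]) -> Bindings:
    if not arms:
        return dict(pre)
    merged = dict(pre)
    for arm in arms:
        for name, lam in arm.items():
            if pre.get(name) is not lam:  # layer each arm's delta in source order
                merged[name] = lam
    # A name every arm rebound to a non-lambda is absent from all arm maps: drop the stale binding.
    for name in pre:
        if all(name not in arm for arm in arms):
            merged.pop(name, None)
    return merged
```

If even one arm (including the fall-through arm) still holds the lambda, the binding is kept, which preserves the no-false-negative behaviour the lambda commit guards.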
```diff
-if last in {"external_boundary", "trust_boundary", "trusted"} and (
-    fqn.startswith("wardline.decorators.") or fqn.startswith("wardline.decorators.trust.")
-):
+if last in _MARKER_NAMES and any(fqn.startswith(prefix + ".") for prefix in _MARKER_MODULE_PREFIXES):
```
Match only the builtin decorator exports
This prefix test treats any nested symbol under a builtin marker namespace as a trust marker, even though the decorator provider only accepts the exact public exports (P.<name> and P.trust.<name>). If a function has one real marker such as @trusted and another project/imported decorator like @wardline.decorators.evil.external_boundary, the entity is anchored by the real marker and this rule now counts the nested non-marker as a second builtin marker, causing a false PY-WL-110 contradictory-trust finding.
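A sketch of exact-export matching, assuming the accepted forms are exactly `P.<name>` and `P.trust.<name>` for each builtin package `P`, as the comment states. The marker names and the `weft_markers` package path below are illustrative stand-ins; in wardline they would be derived from BUILTIN_BOUNDARY_TYPES.

```python
# Illustrative values only (assumed): the real sets come from BUILTIN_BOUNDARY_TYPES.
MARKER_NAMES = frozenset({"external_boundary", "trust_boundary", "trusted"})
MARKER_PACKAGES = ("wardline.decorators", "weft_markers")

# Only the public exports P.<name> and P.trust.<name> count as builtin markers.
EXACT_MARKER_FQNS = frozenset(
    f"{pkg}{sub}.{name}"
    for pkg in MARKER_PACKAGES
    for sub in ("", ".trust")
    for name in MARKER_NAMES
)


def is_builtin_marker(fqn: str) -> bool:
    # Set membership, not a prefix test, so nested foreign symbols never match.
    return fqn in EXACT_MARKER_FQNS
```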
…, FP guard, doc-anchor rot)

Addresses the three Important findings from the PR #32 review panel, each validated with an actual RED->GREEN cycle under debugging discipline.

I-1 EmitResult contradictory states (core/filigree_emit.py):

- auth_rejected is now a derived `@property` (status in {401,403}), deleting the redundant axis so "auth-rejected (200)" is unrepresentable, not merely unbuilt.
- `__post_init__` guard mirrors GateDecision: a reachable/success result carries no error status; a soft-failure created/updated nothing. Rejects reachable+503.
- Docstring corrected (status is the error status; None on transport-fail AND 2xx).
- No wire change: server.py still serializes auth_rejected via the property.

I-2 false-positive guard for PY-WL-110 (test_contradictory_trust.py):

- Empirically: a foreign-only marker stack is filtered at the anchoring gate (provenance "fallback"), never reaching the line-81 prefix check. Added both the system-level test and the isolating test (real trust_boundary anchor + a coincidental foreign `trusted`). Mutation-proven: breaking the prefix check makes the isolating test fire a false PY-WL-110.

I-3 stale file:line anchors (finding-lifecycle-vocabulary.md):

- Re-derived every churned-file anchor from HEAD; corrected ~26 citations.
- Added a two-way content-binding discipline test: each load-bearing anchor's token must be on the cited source line AND the doc must cite that line, so doc and code can never silently diverge again.

Full suite 2520 passed; ruff/format/mypy clean; mkdocs --strict builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
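The I-1 shape (a derived `auth_rejected` plus a constructor guard) can be sketched with a frozen dataclass. Field names other than `status`, `reachable` and `auth_rejected` are assumptions, and the real `EmitResult` in core/filigree_emit.py likely carries more.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmitResult:
    reachable: bool
    created: int = 0  # assumed field name
    updated: int = 0  # assumed field name
    status: Optional[int] = None  # error status; None on transport failure AND on 2xx success

    def __post_init__(self) -> None:
        if self.reachable and self.status is not None:
            raise ValueError("a reachable/successful result carries no error status")
        if not self.reachable and (self.created or self.updated):
            raise ValueError("a soft failure created/updated nothing")

    @property
    def auth_rejected(self) -> bool:
        # Derived, not stored, so "auth-rejected with status 200" cannot be constructed.
        return self.status in (401, 403)
```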
💡 Codex Review
wardline/src/wardline/mcp/server.py
Lines 398 to 399 in da9c535
When an MCP scan targets a subdirectory and passes config relative to the project root, the scan itself uses _cfg(args, root) but the legis attachment reloads it with _cfg(args, path), where path is the scan subdir. In that case config: "wardline.yaml" is looked up under the subdir and silently falls back to defaults if absent (or parent-relative configs are rejected), so the emitted/signed legis_artifact.rule_set_version can describe a different policy than the findings were produced with.
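A sketch of the fix direction, assuming the remedy is to resolve a relative `config` against the project root once and hand the same path to both the scan and the legis attachment. `resolve_config_path` is a hypothetical helper; wardline's `_cfg` is not shown in this page, so its real signature is unknown.

```python
from pathlib import Path
from typing import Any, Mapping, Optional


def resolve_config_path(args: Mapping[str, Any], project_root: Path) -> Optional[Path]:
    """Resolve a relative config against the project root, never against the scan subdirectory."""
    cfg = args.get("config")
    if cfg is None:
        return None
    p = Path(cfg)
    return p if p.is_absolute() else project_root / p
```

Computing this once per request means `legis_artifact.rule_set_version` is derived from the same policy file that produced the findings, whatever subdirectory was scanned.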
```python
handler_lambdas = _branch_copy(parent_lambdas)
_walk_branch_body(handler.body, function_taint, taint_map, handler_taints, call_site_taints, handler_lambdas)
```
Preserve try-body lambdas for exception handlers
When a lambda is assigned earlier in a try block and a later statement raises, the except handler can legally call that lambda, but this starts each handler from parent_lambdas instead of the try-arm bindings. The handler call therefore no longer resolves through _CURRENT_LAMBDA_BINDINGS, so sinks hidden in cb = lambda x: sink(x); may_raise() followed by except: cb(raw) are missed; before this branch-local change the shared binding map still exposed the lambda to the handler.
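A sketch of the handler-entry state this comment asks for, under the single-binding model assumed above: since a raise can happen after any statement in the `try` body, each handler starts from the pre-`try` bindings overlaid with whatever the try arm bound. `handler_entry_bindings` is a hypothetical helper.

```python
from typing import Dict

Bindings = Dict[str, object]  # variable name -> lambda node


def handler_entry_bindings(parent: Bindings, try_arm: Bindings) -> Bindings:
    """Bindings visible at an except handler's entry."""
    entry = dict(parent)
    # Lambdas bound in the try body may already be live when the exception fires.
    entry.update(try_arm)
    return entry
```

For `cb = lambda x: sink(x); may_raise()` followed by `except: cb(raw)`, the handler now resolves `cb` to the try-body lambda, restoring what the shared binding map exposed before.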
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Closes the actionable wardline items from the 2026-06-06 Loom dogfood friction report
(label `dogfood-2026-06-06`). All five concerns from the re-test are addressed and
live-verified against a freshly-spawned MCP server.
The re-test reported #2/#3/#4 as "not addressed". Root cause: stale long-running
`wardline mcp` processes, not missing code. The install is editable
(`~/wardline/src`), so a fresh spawn already has every fix — but seven long-lived MCP
servers were frozen at their spawn-time source (one was internally inconsistent and
crashed on `GateDecision.reason`). The re-tester tested #1 via the CLI (fresh process →
worked) and #2/#3/#4 via a stale MCP server (→ looked unaddressed).
Action required after merge: partners must restart their `wardline mcp` server
(or session) to pick up the code. No restart ⇒ same "broken" output.
Fixes
Live verification (fresh server, the re-tester's exact scenarios)
Tests / quality
Full suite 2482 passing, ruff + mypy + mkdocs-strict clean. Every fix is TDD'd
(red→green); golden legis signature byte-unchanged; CLI↔MCP parity preserved; CLI
`--format agent-summary` output unchanged.
🤖 Generated with Claude Code
Supersedes #30 (head branch renamed `fix/dogfood-2026-06-06-gate-legis-payload` → `rc/1.0.0rc2`; GitHub cannot move a PR's head branch, so this is its continuation). Adds the PR #30 multi-reviewer hardening pass and cuts release candidate `1.0.0rc2`.