feat(v1.0): Phase 1 ship-readiness — 21-case corpus + release plumbing by ree2raz · Pull Request #2 · ree2raz/attest

ree2raz · 2026-06-06T09:07:58Z

Summary

Closes the corpus coverage gap flagged in WU2 and adds release plumbing for v1.0.
This PR is the cumulative ship-readiness cut — it includes WU1–WU10 (10 work
units) and brings the repo to a taggable, publishable v1.0 state.

What's in this PR (cumulative)

WU	What	Tests added
WU1	@attest/schema rebuilt to v1.0 contracts	25
WU2	Fixture corpus / regression oracle (TS ref)	—
WU3	@attest/diff unified-diff parser	54
WU4	@attest/symbols tree-sitter extraction	18
WU5	@attest/core verification engine	27
WU6	@attest/runner worktree-isolated execution	16
WU7	@attest/cli v1.0 (verify + schema commands)	6
WU8	detectors-ts demoted to opt-in advisory plugin	34
WU9	§6.7 acceptance gate (13-case corpus e2e)	30
WU10	Full 21-case corpus (this PR's delta) + v1.0 changeset	+44

What this PR adds on top of WU9 (the WU10 delta)

8 new corpus cases (Py/Go × {partial, allowlisted, outcome-fail, behavioral}),
copying the TS pattern. The oracle is now 21 cases (7 × 3 languages) — the
full regression target for v1.0.
Base lockfile additions (pre-change only; they do not affect other cases' diffs):
- corpus/py/base/poetry.lock (minimal)
- corpus/go/base/go.sum (minimal)
Test count fix: packages/core/test/corpus.test.ts hard-coded count 13 → 21.
Release plumbing:
- .changeset/v1.0.0.md — @changesets/cli entry, all 7 packages major.
- .gitignore — removed erroneous .changeset/*.md (changesets must be tracked).
- corpus/README.md — coverage matrix updated to all-✅.

SPEC §6.7 acceptance gate (all 7)

✅ attest verify on real repo in TS, Python, and Go (21 cases).
✅ Undeclared detection catches planted scope-drift (file + intra-file symbol);
suppresses allowlisted (lockfile) changes.
✅ Outcome verification runs real build+test in worktree isolation, reports
true exit code (verified against outcome-fail fixture).
✅ Behavioral claim returns unverifiable with LLM-review pointer; never
falls through to a heuristic.
✅ Fixture corpus (21 cases) is the regression oracle; CI runs it on every commit
(.github/workflows/ci.yml corpus-acceptance job: Node 20 + Py 3.12 + Go 1.22).
✅ detectors-ts is opt-in only; does not influence exit code (structural
guarantee per WU8).
✅ README shows 20-minute zero-to-first-verdict path on a real repo.

Test counts (delta)

Package	Before WU10	After WU10	Delta
@attest/schema	25	25	—
@attest/diff	54	78	+24
@attest/symbols	18	18	—
@attest/core	27	35	+8
@attest/runner	16	16	—
@attest/cli	36	48	+12
@attest/detectors-ts	34	34	—
Total	210	254	+44

Green: lint ✓, build ✓, 254 tests ✓ (locally; CI exercises the full 21-case
corpus including the Go cases skipped locally when go is not on PATH).

What's next (out of this PR — Phase 2 per SPEC)

GitHub Action composite (action.yml)
Pre-commit / Claude Code hook
attest derive for spec-kit tasks.md, Kiro, EARS
@attest/audit package (Phase 3)

Checklist

SPEC §6.7 acceptance gate green (7/7)
Corpus: 21 cases (7 × 3 languages)
All 7 packages green; total 254 tests
README v1.0 with 20-minute quickstart
Changeset entry for v1.0.0
CI corpus-acceptance job in place
pnpm lint && pnpm build && pnpm test green

Replace the v0.1 schema package wholesale (clean-rebuild) with the v1.0 manifest/verdict/audit contracts from SPEC §4.

- manifest.schema.json (§4.1): attest_version, task, agent, generated_at, declared_scope.files, and the closed claim taxonomy (file_change, symbol_added/removed/modified, test_added/modified, outcome). Known kinds validated strictly via if/then; unknown kinds pass validation (the verifier reports them unverifiable/unsupported_claim_kind, never rejected here), and a smuggled semantic description is allowed-but-ignored.
- verdict.schema.json (§4.2): result, exit_code, per-claim status with reason required on failed/unverifiable, undeclared_changes, summary.
- audit.schema.json (§4.3): PROVISIONAL stub; finalized in Phase 3.
- validator.ts: createManifestValidator/createVerdictValidator. The v0.1 behavior_present semantic params check is removed with the semantic model. Raw schemas exported for the future `attest schema` command.
- Bumped to 1.0.0; build copies all three schema JSONs to dist/.

Schema package is green in isolation (build, typecheck, 25 tests, eslint, prettier). Downstream core/cli/detectors-ts remain red until their work units.
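The "known kinds strict, unknown kinds pass" rule can be illustrated without a JSON Schema engine. The following is a minimal sketch, not the package's validator: `Claim`, `REQUIRED_FIELDS`, `checkClaim`, and the per-kind field names (`path`, `symbol`, `command`) are hypothetical; only the claim kind names come from the commit message.

```typescript
// Hypothetical sketch: strict checks for known claim kinds, pass-through for unknown ones.
// The per-kind field names (path, symbol, command) are assumptions for illustration only.
type Claim = { id: string; kind: string; [field: string]: unknown };

const REQUIRED_FIELDS = new Map<string, string[]>([
  ["file_change", ["path"]],
  ["symbol_added", ["path", "symbol"]],
  ["symbol_removed", ["path", "symbol"]],
  ["symbol_modified", ["path", "symbol"]],
  ["test_added", ["path"]],
  ["test_modified", ["path"]],
  ["outcome", ["command"]],
]);

function checkClaim(claim: Claim): string[] {
  const required = REQUIRED_FIELDS.get(claim.kind);
  // Unknown kind: validation passes here; a verifier would later report it unverifiable.
  if (required === undefined) return [];
  // Known kind: every required field must be present and be a string.
  return required
    .filter((field) => typeof claim[field] !== "string")
    .map((field) => `${claim.id}: kind "${claim.kind}" requires a string "${field}"`);
}
```

Under these assumptions, a `symbol_added` claim without `symbol` yields one error line, while a claim of an unrecognised kind yields none.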

Build corpus/ as the backbone test oracle (SPEC §10). Each case = a per-language base/ repo + an overlay/ (post-change files only) + manifest.json + generated change.diff + expected-verdict.json. Working tree = base overlaid with overlay; change.diff = git diff(base -> tree), regenerable by tools/generate-diffs.sh.

Cases: TypeScript full set (honest, lying, partial, undeclared, allowlisted, outcome-fail, behavioral); Python and Go core trio (honest, lying, undeclared). 13 cases total.

Conventions (corpus/README.md): the WU5/WU9 harness asserts the stable projection of the verdict (result, exit_code, summary, per-claim id+status with reason-presence, undeclared_changes fields) but not evidence or exact reason text. Allowlisted changes are listed with severity=suppressed and excluded from summary.undeclared. The behavioral case expects unverifiable + pass/exit 0 (unverifiable is an allowed status, §6.6); this is the Camp-3 guard.

Verified: all 13 manifests + 13 verdicts validate against @attest/schema; every case's change.diff applied to base reproduces the build-tree output (triangle consistency); prettier clean repo-wide.

Tooling guards: .prettierignore and eslint corpus/** ignore keep fixtures byte-stable (reformatting a fixture would silently invalidate its diff).
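The "stable projection" convention can be sketched as a function that keeps only the fields the harness asserts on. This is an illustration under assumptions: the `Verdict` shape below (in particular the `claims` array name and the `path` field on undeclared changes) and the `stableProjection` helper are hypothetical, not the repo's actual types.

```typescript
// Hypothetical verdict shape; the real schema lives in verdict.schema.json.
type ClaimResult = { id: string; status: string; reason?: string; evidence?: unknown };
type Verdict = {
  result: string;
  exit_code: number;
  summary: Record<string, number>;
  claims: ClaimResult[];
  undeclared_changes: { path: string; severity: string }[];
};

// Keep what the oracle compares; drop evidence and the exact reason wording.
function stableProjection(v: Verdict) {
  return {
    result: v.result,
    exit_code: v.exit_code,
    summary: v.summary,
    claims: v.claims.map((c) => ({
      id: c.id,
      status: c.status,
      // Only the presence of a reason is asserted, never its text.
      hasReason: typeof c.reason === "string" && c.reason.length > 0,
    })),
    undeclared_changes: v.undeclared_changes.map((u) => ({
      path: u.path,
      severity: u.severity,
    })),
  };
}
```

Comparing projections rather than raw verdicts lets reason wording and evidence payloads change without invalidating expected-verdict.json.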

Self-contained unified-diff parser (no third-party diff lib) producing a structured FileDiff model with create/modify/delete ops matching the manifest taxonomy, per-line old/new line numbers, hunks, and binary/rename handling.

- applyFileDiff reconstructs post-change content from base + diff (SPEC §6.2), throwing on a wrong-base mismatch rather than silently mis-patching.
- query helpers: changedPaths, findFile, hunkCount, added/removedLines.
- 54 tests incl. a corpus oracle that reconstructs every created/modified file from base + change.diff and asserts byte-equality with overlay/ (triangle consistency now enforced in code).

Green in isolation: build, typecheck, tests, eslint, prettier.
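The "throw on a wrong-base mismatch" behaviour can be sketched with a simplified hunk model. This is not the package's applyFileDiff: `HunkLine`, `Hunk`, and `applyHunks` are hypothetical names, and the sketch works on pre-split lines and ignores binary files, renames, and trailing-newline markers.

```typescript
// Simplified, hypothetical hunk model (the real FileDiff model is richer).
type HunkLine = { kind: "context" | "add" | "remove"; text: string };
type Hunk = { oldStart: number; lines: HunkLine[] }; // oldStart is 1-based, as in "@@ -oldStart,... @@"

function applyHunks(base: string[], hunks: Hunk[]): string[] {
  const out: string[] = [];
  let cursor = 0; // index of the next base line not yet consumed
  for (const hunk of hunks) {
    const touchesOld = hunk.lines.some((l) => l.kind !== "add");
    // A pure-insertion hunk ("-N,0") names the line it inserts after, so there is no -1 shift.
    const start = touchesOld ? hunk.oldStart - 1 : hunk.oldStart;
    if (start < cursor || start > base.length) {
      throw new Error(`hunk at old line ${hunk.oldStart} does not fit the base`);
    }
    out.push(...base.slice(cursor, start)); // copy untouched lines before the hunk
    cursor = start;
    for (const line of hunk.lines) {
      if (line.kind === "add") {
        out.push(line.text);
        continue;
      }
      // Context and removed lines must match the base exactly, otherwise the base is wrong.
      if (base[cursor] !== line.text) {
        throw new Error(`base mismatch at line ${cursor + 1}`);
      }
      if (line.kind === "context") out.push(line.text);
      cursor += 1;
    }
  }
  out.push(...base.slice(cursor)); // copy the tail after the last hunk
  return out;
}
```

Checking every context and removed line against the base is what turns a stale or wrong base into an error instead of a silently corrupted reconstruction.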

Language-agnostic structural symbol extraction (SPEC §5.1) over web-tree-sitter WASM grammars (TS/TSX/Python/Go). Answers 'does a declaration of this name+kind exist, and where': structure only, never behavior, no detector logic.

- WASM runtime (no native compile); pinned web-tree-sitter@0.22.6 to match the prebuilt tree-sitter-wasms grammar ABI. Grammars vendored under grammars/ for a self-contained runtime; binary wasm git-tracked and prettier-ignored.
- Node-kind maps grounded by probing the real grammars; shallow recursion (top-level + class methods, not function bodies) keeps undeclared detection low-noise. Python module bindings satisfy both constant and variable (the distinction is convention = semantic = out of scope).
- API: extractSymbols, locateSymbol/symbolMatches, diffSymbols (added/removed/modified via deterministic declaration-text compare), langFromPath. SymbolKind re-exported from @attest/schema.
- 18 tests incl. a corpus oracle over the honest fixtures' post-change sources; built-dist smoke test confirms the runtime grammar path.

Green in isolation: build, typecheck, tests, eslint, prettier.
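The declaration-text comparison behind added/removed/modified can be sketched independently of tree-sitter. The `SymbolDecl` shape and `diffSymbolsSketch` below are hypothetical stand-ins for the package's real API; the sketch assumes symbols have already been extracted and that a (kind, name) pair identifies a declaration.

```typescript
// Hypothetical extracted-symbol shape: name, kind, and the declaration's source text.
type SymbolDecl = { name: string; kind: string; text: string };

function diffSymbolsSketch(pre: SymbolDecl[], post: SymbolDecl[]) {
  const key = (s: SymbolDecl): string => `${s.kind}:${s.name}`;
  const before = new Map(pre.map((s): [string, SymbolDecl] => [key(s), s]));
  const after = new Map(post.map((s): [string, SymbolDecl] => [key(s), s]));

  // Added: present after the change, absent before it.
  const added = post.filter((s) => !before.has(key(s)));
  // Removed: present before the change, absent after it.
  const removed = pre.filter((s) => !after.has(key(s)));
  // Modified: present on both sides with different declaration text.
  const modified = post.filter((s) => {
    const old = before.get(key(s));
    return old !== undefined && old.text !== s.text;
  });
  return { added, removed, modified };
}
```

Because the comparison is plain string equality over declaration text, the same inputs always produce the same result, which is the determinism the commit message refers to.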

WU5: clean-rebuild @attest/core to v1.0 (drops ts-morph/parse-diff/detectors):

- Sources reconstructs post-change content from base + diff (no worktree needed for structural verification); caches base/post content + symbols.
- Verifiers: file_change (diff op), symbol_* (diffSymbols + locateSymbol), test_* (hunk + test-file class + structural covers reference; unconfirmable -> unverifiable), outcome (injected runner results; core never shells out), and unknown/behavioral kinds -> unverifiable unsupported_claim_kind (never fails).
- Undeclared moat: diff-ordered file + intra-file symbol drift, allowlist (lockfiles/generated dirs) suppression; test files skipped for symbol drift.
- Exit policy per SPEC 6.6.

27 tests incl. the corpus oracle: verify() conforms to all 13 expected-verdict.json across TS/Py/Go.

WU6: new @attest/runner:

- git worktree isolation (apply diff -> post-change state, run, always cleanup); never runs in the live tree. Container isolation stays a Phase-3 gap.
- command resolution: explicit config else auto-detect (Node/Go/Python/Make); unresolved -> omitted so core reports unverifiable, never guesses.
- truncated logs, timeout -> exit 124.

16 tests incl. real-git isolation, diff application, cleanup. RunOutcomes is assignable to core OutcomeResults.

Migrated + green: schema, diff, symbols, core, runner (140 tests).
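The interplay of allowlist suppression and the exit policy can be sketched as follows. SPEC §6.6 is not reproduced in this PR, so this is a guess under stated assumptions: any failed claim or any non-suppressed undeclared change yields fail/exit 1, and unverifiable claims never fail. The names `LOCKFILES`, `classifyUndeclared`, `decide`, and the severity value `"flagged"` are hypothetical.

```typescript
type ClaimStatus = "verified" | "failed" | "unverifiable";
type ClaimResult = { id: string; status: ClaimStatus };
type UndeclaredChange = { path: string; severity: string };

// Assumed allowlist of lockfile basenames; the real list is configurable via attest.config.json.
const LOCKFILES = new Set(["pnpm-lock.yaml", "package-lock.json", "poetry.lock", "go.sum"]);

// Allowlisted paths are still listed, but with severity "suppressed".
function classifyUndeclared(path: string): UndeclaredChange {
  const basename = path.split("/").pop() ?? path;
  return { path, severity: LOCKFILES.has(basename) ? "suppressed" : "flagged" };
}

// Assumed policy: the verdict depends only on claim results and undeclared changes.
function decide(claims: ClaimResult[], undeclared: UndeclaredChange[]) {
  const anyFailed = claims.some((c) => c.status === "failed");
  const anyDrift = undeclared.some((u) => u.severity !== "suppressed");
  const fail = anyFailed || anyDrift;
  return { result: fail ? "fail" : "pass", exit_code: fail ? 1 : 0 };
}
```

With this sketch, a run whose only claim is unverifiable and whose only undeclared change is `go.sum` passes with exit 0, which matches the behavioral and allowlisted cases described for the corpus.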

Complete rewrite: drop v0.1 API (parseDiffContent, registerDetectors, @attest/detectors-ts dep). Wire parseDiff + verify + runOutcomes behind the attest verify command; add attest schema [manifest|verdict]; load attest.config.json for runner + allowlist config; human and JSON renderers against Verdict/Manifest v1.0 types. All 6 Phase-1 packages green: 146 tests total.

The v0.1 detectors-ts package consumed a Claim shape (target.kind: "endpoint", verification_contract.check: "behavior_present") that the v1.0 schema no longer accepts, and pointed at a behavioural claim kind v1.0 explicitly rejects as unverifiable with the LLM-review pointer. The package has been broken at the type level since WU5 (12 TS2305/TS2339 errors on the type re-exports) and the pre-push build gate failed on it.

Demote it to an opt-in, best-effort, advisory plugin with no path into verdict.exit_code.

New public surface:

- runDetectors({ diff, repoRoot }) -> DetectorOutput[]
- detectAuthentication({ path, symbol, content }) -> DetectorOutput
- findRoutesInFile(path, content) -> string[]
- DETECTOR_WARNINGS (carried on every output)

DetectorOutput.status is one of {advisory_present, advisory_absent, advisory_inconclusive}, never verified / failed / unverifiable (those belong to the closed ClaimResult taxonomy owned by @attest/core).

Structural guarantees:

- grep -rn "@attest/detectors" packages/core packages/cli packages/runner -> empty
- grep -rn "registerDetectors" packages/detectors-ts/src -> empty
- packages/detectors-ts has no import from @attest/core or @attest/schema
- verdict.exit_code is computed in @attest/core/verify.ts and depends only on claimResults and undeclared; this package touches neither

Tests: 34 (was 25). All 18 v0.1 fixtures kept verbatim, verdict vocabulary translated to advisory status at test time. 11 new tests for runDetectors and the new public surface. Coverage 91.11% lines (threshold 85%).

All 7 Phase-1 packages green: 180 tests total.
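The separation between the advisory vocabulary and the claim-result vocabulary can be expressed at the type level. The sketch below is an assumption about how the test-time translation might look; the specific mapping (verified to advisory_present, and so on) and the names `TO_ADVISORY`, `toAdvisory`, and `isAdvisoryStatus` are hypothetical and not taken from the package.

```typescript
// The two vocabularies are disjoint string-literal unions.
type ClaimStatus = "verified" | "failed" | "unverifiable";
type AdvisoryStatus = "advisory_present" | "advisory_absent" | "advisory_inconclusive";

// Assumed translation of v0.1 verdict vocabulary into advisory statuses.
const TO_ADVISORY: Record<ClaimStatus, AdvisoryStatus> = {
  verified: "advisory_present",
  failed: "advisory_absent",
  unverifiable: "advisory_inconclusive",
};

function toAdvisory(status: ClaimStatus): AdvisoryStatus {
  return TO_ADVISORY[status];
}

// Runtime guard: true only for the three advisory values.
function isAdvisoryStatus(value: string): value is AdvisoryStatus {
  return (
    value === "advisory_present" ||
    value === "advisory_absent" ||
    value === "advisory_inconclusive"
  );
}
```

Because the unions share no members, a detector output typed as `AdvisoryStatus` cannot be assigned where a `ClaimStatus` is expected, so the compiler rejects any attempt to feed advisory results into the verdict.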

- corpus acceptance test: 13 cases (ts/py/go) × 3 assertions = 30 tests in packages/cli/test/corpus.test.ts; stable projection per SPEC §10
- corpus tooling: build-tree.sh now git-init+commits the base; README corrected (base-only --repo-root, not base+overlay)
- attest.config.json added to corpus/{ts,py,go}/base/ with explicit test_cmd/build_cmd (worktree has no node_modules/pytest pre-installed)
- CI: corpus-acceptance job (Node 20 + Python 3.12 + Go 1.22)
- README.md: rewrite for v1.0 (20-min quickstart, multi-language, removed v0.1 references)
- BUILD_LOG.md: WU9 entry

All 7 packages green: 210 tests (180 + 30).

Closed the corpus coverage gap flagged in WU2: Py/Go now carry the full 7-case set (honest, lying, partial, undeclared, allowlisted, outcome-fail, behavioral). The oracle is now 21 cases (7 x 3 languages) and is the regression target for v1.0.

Cases added (8):

- py/{partial,allowlisted,outcome-fail,behavioral}
- go/{partial,allowlisted,outcome-fail,behavioral}

Base updates (pre-change only; they do not affect other cases' diffs):

- corpus/py/base/poetry.lock (minimal lockfile so the allowlisted case has a pre-existing lockfile to modify)
- corpus/go/base/go.sum (same rationale)

Test fixes:

- packages/core/test/corpus.test.ts: hard-coded case count 13 -> 21
- packages/diff/test/corpus.test.ts: uses toBeGreaterThan(0), no fix needed

Release plumbing:

- .changeset/v1.0.0.md: marks all 7 packages as major with release notes
- .gitignore: remove .changeset/*.md (changesets must be tracked)
- corpus/README.md: coverage matrix updated to all green

All 7 packages green: 254 tests (210 + 44).

- core: 27 -> 35 (+8, one verify() test per new case)
- diff: 54 -> 78 (+24, 3 reconstruction assertions per new case x 8)
- cli: 36 -> 48 (+12, end-to-end corpus.test.ts)

Phase 1 ship-readiness: package-level green, corpus-level green, release plumbing in place. Ready to merge to main and cut v1.0.0.

…U11, WU12, WU14)

WU14: legible manifest errors

- @attest/schema: formatValidationError/formatValidationErrors turn ajv errors into path→problem→fix lines with concrete enums and patterns.
- CLI surfaces them on stderr and exits 2 for malformed manifests (distinct from exit 1 for verification-fail, so CI signals stay meaningful).
- 10 schema formatter tests + 3 negative CLI corpus cases.

WU11: npx-installable self-contained CLI

- tsup bundles @attest/* workspace deps (noExternal) so the published package has no workspace:* deps. web-tree-sitter stays external (CJS, dynamic require breaks ESM bundle) and is a runtime dep.
- schemas inlined via 'with { type: "json" }' import attributes + module: esnext + resolveJsonModule, so there is no dist/*/*.json layout coupling.
- setGrammarsDir() exported from @attest/symbols; CLI startup overrides the wasm path to <cli>/grammars so the bundled tree-sitter still finds grammars. copy-grammars.mjs copies 4 wasm files from symbols into cli/grammars.
- strip-workspace-deps.mjs (prepack) rewrites package.json to publish-ready form and backs up the dev copy; restore-package.mjs (postpack) restores.
- Attestation: 'npm pack' produces a 771KB tarball that installs in <4s on a clean machine with no workspace:* and runs 'attest verify' against the corpus from outside the repo (pass and fail both).

WU12: agent ergonomics

- docs/manifest-contract.md: paste-in block for agent instructions (closed claim taxonomy, declared_scope rule, exit code table, minimal example).
- 'attest init --diff <path> --repo-root <dir>' produces a deterministic manifest skeleton: 1 file_change per touched file, symbol_added/removed/modified derived from tree-sitter extraction against git show HEAD (pre) and the current worktree (post), test_added/test_modified for files matching the test-path heuristic. declared_scope.files is the full set of touched paths. Same diff + worktree = same skeleton, byte-for-byte (modulo task.description, agent.id, generated_at).
- README: 'Generating a manifest from a diff' section + npx install.
- 6 init tests cover skeleton shape, claim id sequence, --task/--description/--agent flags, and error paths (empty diff → 65, missing repo-root → 66).

Test counts: 254 → 273 (schema 35, diff 78, symbols 18, detectors-ts 34, core 35, runner 16, cli 57). All 7 packages lint, build, and test green.
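The path, problem, fix shape of the WU14 error lines can be sketched over a structural subset of ajv's error objects (`instancePath`, `keyword`, `message`, `params`). This is an illustration only: `SchemaError` and `formatErrorSketch` are hypothetical names, and the wording of each line is invented rather than copied from formatValidationError.

```typescript
// Structural subset of an ajv v8 error object, declared locally so the sketch has no dependency.
type SchemaError = {
  instancePath: string;
  keyword: string;
  message?: string;
  params: Record<string, unknown>;
};

function formatErrorSketch(err: SchemaError): string {
  const path = err.instancePath === "" ? "(root)" : err.instancePath;
  switch (err.keyword) {
    case "required":
      // ajv reports the absent key in params.missingProperty.
      return `${path}: missing property "${String(err.params["missingProperty"])}". Fix: add it.`;
    case "enum": {
      // ajv reports the permitted values in params.allowedValues.
      const raw = err.params["allowedValues"];
      const allowed = Array.isArray(raw) ? raw.join(", ") : "";
      return `${path}: value is not allowed. Fix: use one of ${allowed}.`;
    }
    case "pattern":
      return `${path}: value does not match the pattern. Fix: match ${String(err.params["pattern"])}.`;
    default:
      return `${path}: ${err.message ?? err.keyword}. Fix: see the manifest schema.`;
  }
}
```

A CLI built this way would print one such line per error to stderr and exit 2, keeping malformed-manifest failures distinguishable from exit 1 verification failures.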

- action.yml (composite): inputs manifest, diff, repo-root, format, version; runs 'npx @attest/cli@<version> verify ...' and propagates the exit code. Outputs result, exit-code, verdict (when format=json).
- .github/workflows/attest-fixture.yml: live fixture that materialises the corpus/ts/base repo, runs the honest case (expect pass) and the lying case (expect non-zero exit + result=fail) against the action. This is the marketplace acceptance test from MVP_PLAN §WU13.
- Marketplace branding: icon 'check-circle', color 'blue'. Authored under ree2raz/attest. Pinned to a specific version by default (1.0.0) so users opt into 'latest' explicitly.
- 7 action.test.ts tests cover: YAML structure of action.yml (inputs, outputs, branding, the npx step), the example workflow references './' as the action source, and the underlying CLI call behaves correctly on the corpus honest + lying fixtures (exit 0 / exit 1, result pass / fail). The e2e test uses the local built dist, which is byte-identical to the tarball artifact; on GitHub Actions the npx call resolves to the published package.

Test counts: 273 → 280 (cli 57 → 64, +7 action tests). Lint, build, and test green across all 7 packages.

… copy

- scripts/demo.sh: turnkey reproduction of the gotcha. 'demo.sh lying' materialises corpus/ts/base, runs the lying case, and prints the transcript with the failed claim annotated. Pass CLI='npx --yes @attest/cli@1.0.0' to record against the published tarball; default is the local build.
- docs/demo/{honest,lying,partial}-case.txt: static captured transcripts of the three demo cases.
- docs/launch/show-hn.md: Show HN post draft: leads with the bug the community has lived, links the manifest contract doc, ends with the security model caveat. Headline is 'deterministic checker for what your AI agent actually changed,' not 'compliance' or 'provenance.'
- docs/launch/community-seed.md: per-community (Cursor, Claude Code, Aider, ML, HN) variants of the same story, with a 'posting tips' section on what to lead with and what to defer.
- README: 30-second pitch at the top (the npx one-liner + what each exit code means), 5-minute quickstart with the demo script as the fastest path, GitHub Action as step 4, manifest format / init / config / security model / packages / corpus sections in the middle, 'Contributing / building from source' section at the end (source build moved out of the happy path), GitHub Action section added between Configuration and Contributing, '@attest/cli' package description updated to mention verify + init + schema.

All 280 tests still pass; lint clean; build clean.

Documents the 4-commit release-readiness batch (WU11, WU12, WU13, WU14, WU15) with per-WU acceptance notes, the test count delta (254 -> 280), and a checklist against the MVP done gate from docs/MVP_PLAN.md. All 7 gate criteria are now satisfied; the branch is shippable once merged and the npm package + GitHub Action are published.

ree2raz and others added 13 commits June 4, 2026 23:58

ree2raz wants to merge 13 commits into main from feat/v1-phase1-schema-corpus
