feat(v1.0): Phase 1 ship-readiness — 21-case corpus + release plumbing #2

Open
ree2raz wants to merge 13 commits into main from feat/v1-phase1-schema-corpus

@ree2raz ree2raz commented Jun 6, 2026

Summary

Closes the corpus coverage gap flagged in WU2 and adds release plumbing for v1.0.
This PR is the cumulative ship-readiness cut — it includes WU1–WU10 (10 work
units) and brings the repo to a taggable, publishable v1.0 state.

What's in this PR (cumulative)

| WU | What | Tests added |
|---|---|---|
| WU1 | @attest/schema rebuilt to v1.0 contracts | 25 |
| WU2 | Fixture corpus / regression oracle (TS ref) | |
| WU3 | @attest/diff unified-diff parser | 54 |
| WU4 | @attest/symbols tree-sitter extraction | 18 |
| WU5 | @attest/core verification engine | 27 |
| WU6 | @attest/runner worktree-isolated execution | 16 |
| WU7 | @attest/cli v1.0 (verify + schema commands) | 6 |
| WU8 | detectors-ts demoted to opt-in advisory plugin | 34 |
| WU9 | §6.7 acceptance gate (13-case corpus e2e) | 30 |
| WU10 | Full 21-case corpus (this PR's delta) + v1.0 changeset | +44 |

What this PR adds on top of WU9 (the WU10 delta)

  • 8 new corpus cases (Py/Go × {partial, allowlisted, outcome-fail, behavioral}),
    copying the TS pattern. The oracle is now 21 cases (7 × 3 languages) — the
    full regression target for v1.0.
  • Base lockfile additions (pre-change only; they do not affect other cases' diffs):
    • corpus/py/base/poetry.lock (minimal)
    • corpus/go/base/go.sum (minimal)
  • Test count fix: packages/core/test/corpus.test.ts hard-coded count 13 → 21.
  • Release plumbing:
    • .changeset/v1.0.0.md: @changesets/cli entry marking all 7 packages as major.
    • .gitignore: removed the erroneous .changeset/*.md entry (changesets must be tracked).
    • corpus/README.md: coverage matrix updated to all-✅.

SPEC §6.7 acceptance gate (all 7)

  1. ✅ attest verify runs on a real repo in TS, Python, and Go (21 cases).
  2. ✅ Undeclared detection catches planted scope-drift (file + intra-file symbol);
    suppresses allowlisted (lockfile) changes.
  3. ✅ Outcome verification runs real build+test in worktree isolation, reports
    true exit code (verified against outcome-fail fixture).
  4. ✅ Behavioral claim returns unverifiable with LLM-review pointer; never
    falls through to a heuristic.
  5. ✅ Fixture corpus (21 cases) is the regression oracle; CI runs it on every commit
    (.github/workflows/ci.yml corpus-acceptance job: Node 20 + Py 3.12 + Go 1.22).
  6. ✅ detectors-ts is opt-in only; does not influence exit code (structural
    guarantee per WU8).
  7. ✅ README shows 20-minute zero-to-first-verdict path on a real repo.

Test counts (delta)

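| Package | Before WU10 | After WU10 | Delta |
|---|---|---|---|
| @attest/schema | 25 | 25 | 0 |
| @attest/diff | 54 | 78 | +24 |
| @attest/symbols | 18 | 18 | 0 |
| @attest/core | 27 | 35 | +8 |
| @attest/runner | 16 | 16 | 0 |
| @attest/cli | 36 | 48 | +12 |
| @attest/detectors-ts | 34 | 34 | 0 |
| Total | 210 | 254 | +44 |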

Green: lint ✓, build ✓, 254 tests ✓ (locally; CI exercises the full 21-case
corpus including the Go cases skipped locally when go is not on PATH).

What's next (out of this PR — Phase 2 per SPEC)

  • GitHub Action composite (action.yml)
  • Pre-commit / Claude Code hook
  • attest derive for spec-kit tasks.md, Kiro, EARS
  • @attest/audit package (Phase 3)

Checklist

  • SPEC §6.7 acceptance gate green (7/7)
  • Corpus: 21 cases (7 × 3 languages)
  • All 7 packages green; total 254 tests
  • README v1.0 with 20-minute quickstart
  • Changeset entry for v1.0.0
  • CI corpus-acceptance job in place
  • pnpm lint && pnpm build && pnpm test green

ree2raz and others added 13 commits June 4, 2026 23:58

Replace the v0.1 schema package wholesale (clean-rebuild) with the v1.0
manifest/verdict/audit contracts from SPEC §4.

- manifest.schema.json (§4.1): attest_version, task, agent, generated_at,
  declared_scope.files, and the closed claim taxonomy (file_change,
  symbol_added/removed/modified, test_added/modified, outcome). Known kinds
  validated strictly via if/then; unknown kinds pass validation (the verifier
  reports them unverifiable/unsupported_claim_kind, never rejected here), and a
  smuggled semantic description is allowed-but-ignored.
- verdict.schema.json (§4.2): result, exit_code, per-claim status with reason
  required on failed/unverifiable, undeclared_changes, summary.
- audit.schema.json (§4.3): PROVISIONAL stub; finalized in Phase 3.
- validator.ts: createManifestValidator/createVerdictValidator. The v0.1
  behavior_present semantic params check is removed with the semantic model.
  Raw schemas exported for the future `attest schema` command.
- Bumped to 1.0.0; build copies all three schema JSONs to dist/.
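The split between strictly validated known kinds and pass-through unknown kinds can be pictured with the sketch below. It is only an illustration under assumptions: `KNOWN_KINDS` mirrors the taxonomy listed above, while `Claim`, `Classification`, and `classifyClaim` are hypothetical names, not the @attest/schema API (which does this in JSON Schema via if/then).

```typescript
// Hypothetical sketch; not the @attest/schema implementation.
const KNOWN_KINDS = [
  "file_change",
  "symbol_added",
  "symbol_removed",
  "symbol_modified",
  "test_added",
  "test_modified",
  "outcome",
] as const;

type KnownKind = (typeof KNOWN_KINDS)[number];

interface Claim {
  id: string;
  kind: string;
  description?: string; // allowed but ignored by verification
}

type Classification =
  | { validate: "strict"; kind: KnownKind }
  | { validate: "pass_through"; reason: "unsupported_claim_kind" };

// Known kinds get strict validation; anything else passes through so the
// verifier can later report it as unverifiable instead of rejecting it here.
function classifyClaim(claim: Claim): Classification {
  const known = (KNOWN_KINDS as readonly string[]).indexOf(claim.kind) !== -1;
  return known
    ? { validate: "strict", kind: claim.kind as KnownKind }
    : { validate: "pass_through", reason: "unsupported_claim_kind" };
}
```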

Schema package is green in isolation (build, typecheck, 25 tests, eslint,
prettier). Downstream core/cli/detectors-ts remain red until their work units.

Build corpus/ as the backbone test oracle (SPEC §10). Each case = a per-language
base/ repo + an overlay/ (post-change files only) + manifest.json + generated
change.diff + expected-verdict.json. Working tree = base overlaid with overlay;
change.diff = git diff(base -> tree), regenerable by tools/generate-diffs.sh.

Cases: TypeScript full set (honest, lying, partial, undeclared, allowlisted,
outcome-fail, behavioral); Python and Go core trio (honest, lying, undeclared).
13 cases total.

Conventions (corpus/README.md): the WU5/WU9 harness asserts the stable projection
of the verdict (result, exit_code, summary, per-claim id+status with reason-presence,
undeclared_changes fields) but not evidence or exact reason text. Allowlisted changes
are listed with severity=suppressed and excluded from summary.undeclared. The
behavioral case expects unverifiable + pass/exit 0 (unverifiable is an allowed
status, §6.6) — the Camp-3 guard.
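A minimal sketch of what such a stable projection could look like follows. The `Verdict` and `ClaimResult` shapes and the `projectVerdict` helper are assumptions made for illustration (only the field names mentioned above come from the text); the real harness may project differently.

```typescript
// Hypothetical verdict shape; fields beyond those named in the text are assumptions.
interface ClaimResult {
  id: string;
  status: "verified" | "failed" | "unverifiable";
  reason?: string;
  evidence?: unknown;
}

interface Verdict {
  result: "pass" | "fail";
  exit_code: number;
  summary: Record<string, number>;
  claims: ClaimResult[];
  undeclared_changes: { path: string; severity: string }[];
}

// Keep only what the harness compares: reason text collapses to a boolean
// and evidence is dropped entirely.
function projectVerdict(v: Verdict) {
  return {
    result: v.result,
    exit_code: v.exit_code,
    summary: v.summary,
    claims: v.claims.map((c) => ({
      id: c.id,
      status: c.status,
      has_reason: c.reason !== undefined,
    })),
    undeclared_changes: v.undeclared_changes,
  };
}
```

Two verdicts that differ only in reason wording or evidence then compare equal, so fixtures stay stable when messages are reworded.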

Verified: all 13 manifests + 13 verdicts validate against @attest/schema; every
case's change.diff applied to base reproduces the build-tree output (triangle
consistency); prettier clean repo-wide.

Tooling guards: .prettierignore and eslint corpus/** ignore keep fixtures
byte-stable (reformatting a fixture would silently invalidate its diff).

Self-contained unified-diff parser (no third-party diff lib) producing a
structured FileDiff model with create/modify/delete ops matching the manifest
taxonomy, per-line old/new line numbers, hunks, and binary/rename handling.

- applyFileDiff reconstructs post-change content from base + diff (SPEC §6.2),
  throwing on a wrong-base mismatch rather than silently mis-patching.
- query helpers: changedPaths, findFile, hunkCount, added/removedLines.
- 54 tests incl. a corpus oracle that reconstructs every created/modified file
  from base + change.diff and asserts byte-equality with overlay/ (triangle
  consistency now enforced in code).
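The reconstruct-or-throw behavior described for applyFileDiff can be sketched as below. This is a simplified illustration, not the package's code: `Hunk` and `applyHunks` are hypothetical, and details such as "\ No newline at end of file" markers, renames, and binary files are deliberately left out.

```typescript
// Hypothetical, simplified hunk model; the real FileDiff model is richer.
interface Hunk {
  oldStart: number; // 1-based line number in the base file
  lines: string[]; // each prefixed with " " (context), "-" (removed) or "+" (added)
}

function applyHunks(base: string[], hunks: Hunk[]): string[] {
  const out: string[] = [];
  let cursor = 0; // index into base of the next unconsumed line
  for (const hunk of hunks) {
    const start = hunk.oldStart - 1;
    // Copy untouched lines that precede the hunk.
    while (cursor < start) out.push(base[cursor++]);
    for (const line of hunk.lines) {
      const tag = line[0];
      const text = line.slice(1);
      if (tag === "+") {
        out.push(text);
        continue;
      }
      // Context and removed lines must match the base exactly;
      // otherwise the diff was made against a different base.
      if (base[cursor] !== text) {
        throw new Error(`base mismatch at line ${cursor + 1}`);
      }
      if (tag === " ") out.push(text);
      cursor++;
    }
  }
  // Copy whatever follows the last hunk.
  while (cursor < base.length) out.push(base[cursor++]);
  return out;
}
```

Failing loudly on the first mismatching context or removed line is what keeps a wrong-base diff from producing a plausible but incorrect file.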

Green in isolation: build, typecheck, tests, eslint, prettier.

Language-agnostic structural symbol extraction (SPEC §5.1) over web-tree-sitter
WASM grammars (TS/TSX/Python/Go). Answers 'does a declaration of this name+kind
exist, and where' — structure only, never behavior, no detector logic.

- WASM runtime (no native compile); pinned web-tree-sitter@0.22.6 to match the
  prebuilt tree-sitter-wasms grammar ABI. Grammars vendored under grammars/ for a
  self-contained runtime; binary wasm git-tracked and prettier-ignored.
- Node-kind maps grounded by probing the real grammars; shallow recursion
  (top-level + class methods, not function bodies) keeps undeclared detection
  low-noise. Python module bindings satisfy both constant and variable (the
  distinction is convention = semantic = out of scope).
- API: extractSymbols, locateSymbol/symbolMatches, diffSymbols (added/removed/
  modified via deterministic declaration-text compare), langFromPath. SymbolKind
  re-exported from @attest/schema.
- 18 tests incl. a corpus oracle over the honest fixtures' post-change sources;
  built-dist smoke test confirms the runtime grammar path.
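As a rough picture of the declaration-text compare behind diffSymbols, consider the sketch below. It assumes symbols have already been extracted into a flat list; `SymbolDecl` and `diffSymbolsSketch` are hypothetical names, and the real return shape may differ.

```typescript
// Hypothetical simplified symbol record; real extraction output may differ.
interface SymbolDecl {
  name: string;
  kind: string; // e.g. "function", "class"
  text: string; // declaration source text
}

function diffSymbolsSketch(pre: SymbolDecl[], post: SymbolDecl[]) {
  const key = (s: SymbolDecl): string => `${s.kind}:${s.name}`;
  const before = new Map<string, SymbolDecl>();
  for (const s of pre) before.set(key(s), s);
  const after = new Map<string, SymbolDecl>();
  for (const s of post) after.set(key(s), s);

  const added: string[] = [];
  const removed: string[] = [];
  const modified: string[] = [];
  after.forEach((decl, k) => {
    const old = before.get(k);
    if (old === undefined) added.push(k);
    else if (old.text !== decl.text) modified.push(k); // plain text compare, no semantics
  });
  before.forEach((_decl, k) => {
    if (!after.has(k)) removed.push(k);
  });
  // Sort so the result does not depend on extraction order.
  return { added: added.sort(), removed: removed.sort(), modified: modified.sort() };
}
```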

Green in isolation: build, typecheck, tests, eslint, prettier.

WU5 — clean-rebuild @attest/core to v1.0 (drops ts-morph/parse-diff/detectors):
- Sources reconstructs post-change content from base + diff (no worktree needed
  for structural verification); caches base/post content + symbols.
- Verifiers: file_change (diff op), symbol_* (diffSymbols + locateSymbol),
  test_* (hunk + test-file class + structural covers reference; unconfirmable ->
  unverifiable), outcome (injected runner results; core never shells out), and
  unknown/behavioral kinds -> unverifiable unsupported_claim_kind (never fails).
- Undeclared moat: diff-ordered file + intra-file symbol drift, allowlist
  (lockfiles/generated dirs) suppression; test files skipped for symbol drift.
- Exit policy per SPEC 6.6. 27 tests incl. the corpus oracle: verify() conforms
  to all 13 expected-verdict.json across TS/Py/Go.
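The file-level half of the undeclared check, with allowlist suppression, might look something like this sketch. Everything named here is an assumption for illustration: `findUndeclared`, the basename-only allowlist matching, and the "flagged" severity label (only "suppressed" appears in the corpus conventions).

```typescript
// Hypothetical shapes; only severity "suppressed" is taken from the text.
interface UndeclaredChange {
  path: string;
  severity: "flagged" | "suppressed";
}

function findUndeclared(
  changedPaths: string[], // in diff order
  declaredFiles: string[],
  allowlistBasenames: string[],
): { changes: UndeclaredChange[]; undeclaredCount: number } {
  const declared = new Set(declaredFiles);
  const allowed = new Set(allowlistBasenames);
  // filter/map preserve input order, so the result stays diff-ordered.
  const changes = changedPaths
    .filter((p) => !declared.has(p))
    .map((p): UndeclaredChange => {
      const basename = p.slice(p.lastIndexOf("/") + 1);
      return { path: p, severity: allowed.has(basename) ? "suppressed" : "flagged" };
    });
  // Suppressed entries are still listed but do not count toward the summary.
  const undeclaredCount = changes.filter((c) => c.severity === "flagged").length;
  return { changes, undeclaredCount };
}
```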

WU6 — new @attest/runner:
- git worktree isolation (apply diff -> post-change state, run, always cleanup);
  never runs in the live tree. Container isolation stays a Phase-3 gap.
- command resolution: explicit config else auto-detect (Node/Go/Python/Make);
  unresolved -> omitted so core reports unverifiable, never guesses.
- truncated logs, timeout -> exit 124. 16 tests incl. real-git isolation,
  diff application, cleanup. RunOutcomes is assignable to core OutcomeResults.
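The "explicit config, else auto-detect, else omit" resolution order could be sketched as follows. The marker files and default commands are illustrative assumptions, not the runner's actual table; `test_cmd` is the config key mentioned elsewhere in this PR, and `resolveTestCommand` is a hypothetical name.

```typescript
interface RunnerConfig {
  test_cmd?: string;
  build_cmd?: string;
}

// Marker files and default commands below are assumptions for illustration.
const DETECTORS: { marker: string; test: string }[] = [
  { marker: "package.json", test: "npm test" },
  { marker: "go.mod", test: "go test ./..." },
  { marker: "pyproject.toml", test: "pytest" },
  { marker: "Makefile", test: "make test" },
];

// Returns undefined when nothing resolves, so the caller can omit the
// outcome and let core report it as unverifiable rather than guessing.
function resolveTestCommand(config: RunnerConfig, rootFiles: string[]): string | undefined {
  if (config.test_cmd) return config.test_cmd;
  const hit = DETECTORS.find((d) => rootFiles.indexOf(d.marker) !== -1);
  return hit ? hit.test : undefined;
}
```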

Migrated + green: schema, diff, symbols, core, runner (140 tests).

Complete rewrite: drop v0.1 API (parseDiffContent, registerDetectors,
@attest/detectors-ts dep). Wire parseDiff + verify + runOutcomes behind
the attest verify command; add attest schema [manifest|verdict]; load
attest.config.json for runner + allowlist config; human and JSON
renderers against Verdict/Manifest v1.0 types.

All 6 Phase-1 packages green: 146 tests total.

The v0.1 detectors-ts package consumed a Claim shape (target.kind:
"endpoint", verification_contract.check: "behavior_present") that the
v1.0 schema no longer accepts, and targeted a behavioral claim kind that
v1.0 explicitly reports as unverifiable with the LLM-review pointer.
The package has been broken at the type level since WU5 (12 TS2305/TS2339
errors on the type re-exports), and the pre-push build gate failed on it.

Demote it to an opt-in, best-effort, advisory plugin with no path into
verdict.exit_code. New public surface:

  - runDetectors({ diff, repoRoot }) -> DetectorOutput[]
  - detectAuthentication({ path, symbol, content }) -> DetectorOutput
  - findRoutesInFile(path, content) -> string[]
  - DETECTOR_WARNINGS (carried on every output)

DetectorOutput.status is one of {advisory_present, advisory_absent,
advisory_inconclusive} — never verified / failed / unverifiable (those
belong to the closed ClaimResult taxonomy owned by @attest/core).

Structural guarantees:
  - grep -rn "@attest/detectors" packages/core packages/cli packages/runner
    -> empty
  - grep -rn "registerDetectors" packages/detectors-ts/src -> empty
  - packages/detectors-ts has no import from @attest/core or @attest/schema
  - verdict.exit_code is computed in @attest/core/verify.ts and depends
    only on claimResults and undeclared; this package touches neither
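One way to read the last guarantee: if the exit-code function only accepts claim results and the undeclared count, advisory output has no route in. The sketch below illustrates that idea; the exact pass/fail policy (SPEC §6.6) is not spelled out here, so the rule inside `exitCodeFor` is an assumption, and all type and function names are hypothetical.

```typescript
type ClaimStatus = "verified" | "failed" | "unverifiable";
type AdvisoryStatus = "advisory_present" | "advisory_absent" | "advisory_inconclusive";

interface ClaimResult {
  id: string;
  status: ClaimStatus;
}

interface DetectorOutput {
  status: AdvisoryStatus;
  warnings: string[];
}

// Assumed policy, for illustration only: any failed claim or any
// unsuppressed undeclared change fails the run; unverifiable does not.
function exitCodeFor(claimResults: ClaimResult[], undeclaredCount: number): 0 | 1 {
  const anyFailed = claimResults.some((c) => c.status === "failed");
  return anyFailed || undeclaredCount > 0 ? 1 : 0;
}

// Advisory output has its own status vocabulary and is never an argument
// to exitCodeFor, so it cannot change the result.
const advisory: DetectorOutput = { status: "advisory_absent", warnings: [] };
const code = exitCodeFor([{ id: "c1", status: "unverifiable" }], 0);
```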

Tests: 34 (was 25). All 18 v0.1 fixtures kept verbatim, verdict vocabulary
translated to advisory status at test time. 11 new tests for runDetectors
and the new public surface. Coverage 91.11% lines (threshold 85%).

All 7 Phase-1 packages green: 180 tests total.

- corpus acceptance test: 13 cases (ts/py/go) × 3 assertions = 30 tests
  in packages/cli/test/corpus.test.ts; stable projection per SPEC §10
- corpus tooling: build-tree.sh now git-init+commits the base; README
  corrected (base-only --repo-root, not base+overlay)
- attest.config.json added to corpus/{ts,py,go}/base/ with explicit
  test_cmd/build_cmd (worktree has no node_modules/pytest pre-installed)
- CI: corpus-acceptance job (Node 20 + Python 3.12 + Go 1.22)
- README.md: rewrite for v1.0 (20-min quickstart, multi-language,
  removed v0.1 references)
- BUILD_LOG.md: WU9 entry

All 7 packages green: 210 tests (180 + 30).

Closed the corpus coverage gap flagged in WU2: Py/Go now carry the full
7-case set (honest, lying, partial, undeclared, allowlisted, outcome-fail,
behavioral). The oracle is now 21 cases (7 x 3 languages) and is the
regression target for v1.0.

Cases added (8):
- py/{partial,allowlisted,outcome-fail,behavioral}
- go/{partial,allowlisted,outcome-fail,behavioral}

Base updates (pre-change only; they do not affect other cases' diffs):
- corpus/py/base/poetry.lock (minimal lockfile so the allowlisted case
  has a pre-existing lockfile to modify)
- corpus/go/base/go.sum (same rationale)

Test fixes:
- packages/core/test/corpus.test.ts: hard-coded case count 13 -> 21
- packages/diff/test/corpus.test.ts: uses toBeGreaterThan(0), no fix needed

Release plumbing:
- .changeset/v1.0.0.md: marks all 7 packages as major with release notes
- .gitignore: remove .changeset/*.md (changesets must be tracked)
- corpus/README.md: coverage matrix updated to all green

All 7 packages green: 254 tests (210 + 44).
- core: 27 -> 35 (+8, one verify() test per new case)
- diff: 54 -> 78 (+24, 3 reconstruction assertions per new case x 8)
- cli:  36 -> 48 (+12, end-to-end corpus.test.ts)

Phase 1 ship-readiness: package-level green, corpus-level green, release
plumbing in place. Ready to merge to main and cut v1.0.0.

…U11, WU12, WU14)

WU14 — legible manifest errors
- @attest/schema: formatValidationError/formatValidationErrors turn ajv errors
  into path→problem→fix lines with concrete enums and patterns.
- CLI surfaces them on stderr and exits 2 for malformed manifests (distinct
  from exit 1 for verification-fail, so CI signals stay meaningful).
- 10 schema formatter tests + 3 negative CLI corpus cases.
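A minimal sketch of the exit-code separation and the path→problem→fix line format follows. The shapes and names (`ManifestProblem`, `formatProblemLine`, `cliExitCode`) are hypothetical and the arrow-separated rendering is an assumption; only the codes 0, 1, and 2 come from this PR.

```typescript
// Hypothetical shapes for illustration only.
interface ManifestProblem {
  path: string; // where in the manifest the problem is
  problem: string; // what is wrong
  fix: string; // what to write instead
}

function formatProblemLine(p: ManifestProblem): string {
  return `${p.path} → ${p.problem} → ${p.fix}`;
}

type CliOutcome =
  | { kind: "malformed_manifest"; problems: ManifestProblem[] }
  | { kind: "verdict"; result: "pass" | "fail" };

// 2 = manifest could not be validated, 1 = verification failed, 0 = pass.
function cliExitCode(outcome: CliOutcome): number {
  if (outcome.kind === "malformed_manifest") return 2;
  return outcome.result === "pass" ? 0 : 1;
}
```

Keeping 2 distinct from 1 lets a CI job tell "the agent's manifest is broken" apart from "the agent's claims did not hold".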

WU11 — npx-installable self-contained CLI
- tsup bundles @attest/* workspace deps (noExternal) so the published
  package has no workspace:* deps. web-tree-sitter stays external
  (CJS, dynamic require breaks ESM bundle) and is a runtime dep.
- schemas inlined via 'with { type: "json" }' import attributes
  + module: esnext + resolveJsonModule — no dist/*/*.json layout coupling.
- setGrammarsDir() exported from @attest/symbols; CLI startup overrides
  the wasm path to <cli>/grammars so the bundled tree-sitter still finds
  grammars. copy-grammars.mjs copies 4 wasm files from symbols into cli/grammars.
- strip-workspace-deps.mjs (prepack) rewrites package.json to publish-ready
  form and backs up the dev copy; restore-package.mjs (postpack) restores.
- Attestation: 'npm pack' produces a 771KB tarball that installs in <4s
  on a clean machine with no workspace:* and runs 'attest verify' against
  the corpus from outside the repo (pass and fail both).

WU12 — agent ergonomics
- docs/manifest-contract.md: paste-in block for agent instructions (closed
  claim taxonomy, declared_scope rule, exit code table, minimal example).
- 'attest init --diff <path> --repo-root <dir>' produces a deterministic
  manifest skeleton: 1 file_change per touched file, symbol_added/removed/
  modified derived from tree-sitter extraction against git show HEAD
  (pre) and the current worktree (post), test_added/test_modified for
  files matching the test-path heuristic. declared_scope.files is the
  full set of touched paths. Same diff + worktree = same skeleton, byte-
  for-byte (modulo task.description, agent.id, generated_at).
- README: 'Generating a manifest from a diff' section + npx install.
- 6 init tests cover skeleton shape, claim id sequence, --task/--description
  /--agent flags, and error paths (empty diff → 65, missing repo-root → 66).
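The file-level part of that skeleton (one file_change claim per touched file, declared_scope.files as the full touched set) could be sketched as below. The claim fields, the `c1, c2, …` id format, and `skeletonFromPaths` are assumptions for illustration; symbol and test claims are omitted.

```typescript
// Hypothetical claim and skeleton shapes; real manifests carry more fields.
interface FileChangeClaim {
  id: string;
  kind: "file_change";
  path: string;
}

interface ManifestSkeleton {
  declared_scope: { files: string[] };
  claims: FileChangeClaim[];
}

function skeletonFromPaths(touched: string[]): ManifestSkeleton {
  // De-duplicate while keeping first-seen (diff) order, so the same
  // input always yields the same output.
  const files = touched.filter((p, i) => touched.indexOf(p) === i);
  const claims = files.map(
    (path, i): FileChangeClaim => ({ id: `c${i + 1}`, kind: "file_change", path }),
  );
  return { declared_scope: { files }, claims };
}
```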

Test counts: 254 → 273 (schema 35, diff 78, symbols 18, detectors-ts 34,
core 35, runner 16, cli 57). All 7 packages lint, build, and test green.

- action.yml (composite): inputs manifest, diff, repo-root, format, version;
  runs 'npx @attest/cli@<version> verify ...' and propagates the exit code.
  Outputs result, exit-code, verdict (when format=json).
- .github/workflows/attest-fixture.yml: live fixture — materialises the
  corpus/ts/base repo, runs the honest case (expect pass) and the lying
  case (expect non-zero exit + result=fail) against the action. This is
  the marketplace acceptance test from MVP_PLAN §WU13.
- Marketplace branding: icon 'check-circle', color 'blue'. Authored under
  ree2raz/attest. Pinned to a specific version by default (1.0.0) so users
  opt into 'latest' explicitly.
- 7 action.test.ts tests cover: YAML structure of action.yml (inputs,
  outputs, branding, the npx step), the example workflow references './'
  as the action source, and the underlying CLI call behaves correctly on
  the corpus honest + lying fixtures (exit 0 / exit 1, result pass / fail).
  The e2e test uses the local built dist, which is byte-identical to the
  tarball artifact; on GitHub Actions the npx call resolves to the
  published package.

Test counts: 273 → 280 (cli 57 → 64, +7 action tests). Lint, build, and
test green across all 7 packages.

… copy

- scripts/demo.sh: turnkey reproduction of the gotcha. 'demo.sh lying'
  materialises corpus/ts/base, runs the lying case, and prints the
  transcript with the failed claim annotated. Pass CLI='npx --yes
  @attest/cli@1.0.0' to record against the published tarball; default
  is the local build.
- docs/demo/{honest,lying,partial}-case.txt: static captured
  transcripts of the three demo cases.
- docs/launch/show-hn.md: Show HN post draft — leads with the bug the
  community has lived, links the manifest contract doc, ends with the
  security model caveat. Headline is 'deterministic checker for what
  your AI agent actually changed,' not 'compliance' or 'provenance.'
- docs/launch/community-seed.md: per-community (Cursor, Claude Code,
  Aider, ML, HN) variants of the same story, with a 'posting tips'
  section on what to lead with and what to defer.
- README: 30-second pitch at the top (the npx one-liner + what each
  exit code means), 5-minute quickstart with the demo script as the
  fastest path, GitHub Action as step 4, manifest format / init /
  config / security model / packages / corpus sections in the
  middle, 'Contributing / building from source' section at the end
  (source build moved out of the happy path), GitHub Action section
  added between Configuration and Contributing, '@attest/cli' package
  description updated to mention verify + init + schema.

All 280 tests still pass; lint clean; build clean.

Documents the 4-commit release-readiness batch (WU11, WU12, WU13, WU14,
WU15) with per-WU acceptance notes, the test count delta (254 -> 280),
and a checklist against the MVP done gate from docs/MVP_PLAN.md. All 7
gate criteria are now satisfied; the branch is shippable once merged
and the npm package + GitHub Action are published.