feat(v1.0): Phase 1 ship-readiness — 21-case corpus + release plumbing#2
Open
ree2raz wants to merge 13 commits into
Replace the v0.1 schema package wholesale (clean-rebuild) with the v1.0
manifest/verdict/audit contracts from SPEC §4.
- manifest.schema.json (§4.1): attest_version, task, agent, generated_at,
declared_scope.files, and the closed claim taxonomy (file_change,
symbol_added/removed/modified, test_added/modified, outcome). Known kinds
validated strictly via if/then; unknown kinds pass validation (the
verifier reports them unverifiable/unsupported_claim_kind, never rejected
here), and a smuggled semantic description is allowed-but-ignored.
- verdict.schema.json (§4.2): result, exit_code, per-claim status with
reason required on failed/unverifiable, undeclared_changes, summary.
- audit.schema.json (§4.3): PROVISIONAL stub; finalized in Phase 3.
- validator.ts: createManifestValidator/createVerdictValidator. The v0.1
behavior_present semantic params check is removed with the semantic
model. Raw schemas exported for the future `attest schema` command.
- Bumped to 1.0.0; build copies all three schema JSONs to dist/.
Schema package is green in isolation (build, typecheck, 25 tests, eslint,
prettier). Downstream core/cli/detectors-ts remain red until their work
units.
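The "known kinds strict, unknown kinds pass" rule above can be illustrated without a schema library. This is only a sketch of the idea: the real package expresses it as JSON Schema if/then, and the required fields per kind below (`path`, `op`, `symbol`, `command`) are assumptions made for illustration, not taken from SPEC §4.1.

```typescript
// Hypothetical claim shape; the real field names live in manifest.schema.json.
type Claim = { id: string; kind: string; [field: string]: unknown };

// Assumed required fields per known kind (illustrative only).
const REQUIRED_BY_KIND = new Map<string, string[]>([
  ["file_change", ["path", "op"]],
  ["symbol_added", ["path", "symbol"]],
  ["symbol_removed", ["path", "symbol"]],
  ["symbol_modified", ["path", "symbol"]],
  ["test_added", ["path"]],
  ["test_modified", ["path"]],
  ["outcome", ["command"]],
]);

// Returns a list of problems; an empty list means the claim passes validation.
function validateClaim(claim: Claim): string[] {
  const required = REQUIRED_BY_KIND.get(claim.kind);
  // Unknown kinds are not rejected here; the verifier reports them later
  // as unverifiable/unsupported_claim_kind.
  if (required === undefined) return [];
  return required
    .filter((field) => !(field in claim))
    .map((field) => `claim ${claim.id}: kind "${claim.kind}" requires "${field}"`);
}
```

Under these assumptions a `file_change` claim without `op` yields one problem, while a claim of an unrecognised kind yields none.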
Build corpus/ as the backbone test oracle (SPEC §10). Each case = a
per-language base/ repo + an overlay/ (post-change files only) +
manifest.json + generated change.diff + expected-verdict.json. Working
tree = base overlaid with overlay; change.diff = git diff(base -> tree),
regenerable by tools/generate-diffs.sh.
Cases: TypeScript full set (honest, lying, partial, undeclared,
allowlisted, outcome-fail, behavioral); Python and Go core trio (honest,
lying, undeclared). 13 cases total.
Conventions (corpus/README.md):
- The WU5/WU9 harness asserts the stable projection of the verdict
(result, exit_code, summary, per-claim id+status with reason-presence,
undeclared_changes fields) but not evidence or exact reason text.
- Allowlisted changes are listed with severity=suppressed and excluded
from summary.undeclared.
- The behavioral case expects unverifiable + pass/exit 0 (unverifiable is
an allowed status, §6.6); this is the Camp-3 guard.
Verified: all 13 manifests + 13 verdicts validate against @attest/schema;
every case's change.diff applied to base reproduces the build-tree output
(triangle consistency); prettier clean repo-wide.
Tooling guards: .prettierignore and eslint corpus/** ignore keep fixtures
byte-stable (reformatting a fixture would silently invalidate its diff).
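The "stable projection" convention can be sketched as a function that keeps only the asserted fields and reduces each reason to a presence flag. The interfaces below are a hypothetical approximation of the verdict shape (in particular `claims`, `path` and `evidence` are assumed names); the harness's actual helper may look different.

```typescript
// Hypothetical verdict shape, loosely following the fields named in the commit message.
interface ClaimResult {
  id: string;
  status: "verified" | "failed" | "unverifiable";
  reason?: string;
  evidence?: unknown;
}
interface UndeclaredChange {
  path: string; // assumed field
  severity: string; // e.g. "suppressed" for allowlisted changes
}
interface Verdict {
  result: "pass" | "fail";
  exit_code: number;
  summary: Record<string, number>;
  claims: ClaimResult[];
  undeclared_changes: UndeclaredChange[];
}

// Drops evidence and exact reason text so fixtures stay stable across wording changes.
function projectVerdict(v: Verdict) {
  return {
    result: v.result,
    exit_code: v.exit_code,
    summary: v.summary,
    claims: v.claims.map((c) => ({
      id: c.id,
      status: c.status,
      hasReason: c.reason !== undefined,
    })),
    undeclared_changes: v.undeclared_changes.map((u) => ({
      path: u.path,
      severity: u.severity,
    })),
  };
}
```

Two verdicts that differ only in reason wording or evidence then project to the same value, which is what lets the expected-verdict.json fixtures survive message rewording.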
Self-contained unified-diff parser (no third-party diff lib) producing a
structured FileDiff model with create/modify/delete ops matching the
manifest taxonomy, per-line old/new line numbers, hunks, and binary/rename
handling.
- applyFileDiff reconstructs post-change content from base + diff (SPEC
§6.2), throwing on a wrong-base mismatch rather than silently
mis-patching.
- query helpers: changedPaths, findFile, hunkCount, added/removedLines.
- 54 tests incl. a corpus oracle that reconstructs every created/modified
file from base + change.diff and asserts byte-equality with overlay/
(triangle consistency now enforced in code).
Green in isolation: build, typecheck, tests, eslint, prettier.
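The wrong-base check described for applyFileDiff can be sketched on a simplified hunk model. This is not the package's implementation: `Hunk` and `applyHunks` are hypothetical names, and the sketch assumes hunks carry context lines (as in default git output), so zero-context insertion hunks are not handled.

```typescript
// Simplified hunk model (hypothetical): oldStart is the 1-based line from
// "@@ -oldStart,... @@"; each entry in lines starts with " ", "+", or "-".
interface Hunk {
  oldStart: number;
  lines: string[];
}

function applyHunks(base: string[], hunks: Hunk[]): string[] {
  const out: string[] = [];
  let cursor = 0; // 0-based index of the next unread base line
  for (const hunk of hunks) {
    const start = Math.max(hunk.oldStart - 1, 0);
    if (start < cursor || start > base.length) {
      throw new Error(`hunk at line ${hunk.oldStart} does not fit the base`);
    }
    // Copy the untouched lines that precede this hunk.
    out.push(...base.slice(cursor, start));
    cursor = start;
    for (const line of hunk.lines) {
      const tag = line.charAt(0);
      const text = line.slice(1);
      if (tag === "+") {
        out.push(text);
      } else if (tag === " " || tag === "-") {
        // Context and removed lines must match the base exactly;
        // otherwise the diff was made against a different base.
        if (cursor >= base.length || base[cursor] !== text) {
          throw new Error(`base mismatch at line ${cursor + 1}`);
        }
        if (tag === " ") out.push(text);
        cursor += 1;
      } else {
        throw new Error(`unexpected hunk line: ${line}`);
      }
    }
  }
  // Copy whatever follows the last hunk.
  out.push(...base.slice(cursor));
  return out;
}
```

Throwing on the first mismatching context or removed line is what prevents a diff from being silently applied to the wrong base.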
Language-agnostic structural symbol extraction (SPEC §5.1) over
web-tree-sitter WASM grammars (TS/TSX/Python/Go). Answers 'does a
declaration of this name+kind exist, and where': structure only, never
behavior, no detector logic.
- WASM runtime (no native compile); pinned web-tree-sitter@0.22.6 to match
the prebuilt tree-sitter-wasms grammar ABI. Grammars vendored under
grammars/ for a self-contained runtime; binary wasm git-tracked and
prettier-ignored.
- Node-kind maps grounded by probing the real grammars; shallow recursion
(top-level + class methods, not function bodies) keeps undeclared
detection low-noise. Python module bindings satisfy both constant and
variable (the distinction is convention = semantic = out of scope).
- API: extractSymbols, locateSymbol/symbolMatches, diffSymbols
(added/removed/modified via deterministic declaration-text compare),
langFromPath. SymbolKind re-exported from @attest/schema.
- 18 tests incl. a corpus oracle over the honest fixtures' post-change
sources; built-dist smoke test confirms the runtime grammar path.
Green in isolation: build, typecheck, tests, eslint, prettier.
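The added/removed/modified classification by declaration-text compare can be sketched as follows. `SymbolDecl` and `diffSymbolsSketch` are hypothetical stand-ins (the real diffSymbols works on tree-sitter extraction output), and keying on kind plus name is an assumption.

```typescript
// Hypothetical extracted-symbol record; text is the declaration's source text.
interface SymbolDecl {
  name: string;
  kind: string;
  text: string;
}

const symbolKey = (s: SymbolDecl): string => `${s.kind}:${s.name}`;

function diffSymbolsSketch(before: SymbolDecl[], after: SymbolDecl[]) {
  const oldByKey = new Map<string, SymbolDecl>(
    before.map((s): [string, SymbolDecl] => [symbolKey(s), s]),
  );
  const newKeys = new Set<string>(after.map(symbolKey));
  // Present only after the change.
  const added = after.filter((s) => !oldByKey.has(symbolKey(s)));
  // Present only before the change.
  const removed = before.filter((s) => !newKeys.has(symbolKey(s)));
  // Present on both sides, but the declaration text differs.
  const modified = after.filter((s) => {
    const old = oldByKey.get(symbolKey(s));
    return old !== undefined && old.text !== s.text;
  });
  return { added, removed, modified };
}
```

Because the comparison is plain string equality on declaration text, the result is deterministic for a given pair of inputs and says nothing about behavior.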
WU5 - clean-rebuild @attest/core to v1.0 (drops ts-morph/parse-diff/detectors):
- Sources reconstructs post-change content from base + diff (no worktree
needed for structural verification); caches base/post content + symbols.
- Verifiers: file_change (diff op), symbol_* (diffSymbols + locateSymbol),
test_* (hunk + test-file class + structural covers reference;
unconfirmable -> unverifiable), outcome (injected runner results; core
never shells out), and unknown/behavioral kinds -> unverifiable
unsupported_claim_kind (never fails).
- Undeclared moat: diff-ordered file + intra-file symbol drift, allowlist
(lockfiles/generated dirs) suppression; test files skipped for symbol
drift.
- Exit policy per SPEC 6.6. 27 tests incl. the corpus oracle: verify()
conforms to all 13 expected-verdict.json across TS/Py/Go.
WU6 - new @attest/runner:
- git worktree isolation (apply diff -> post-change state, run, always
cleanup); never runs in the live tree. Container isolation stays a
Phase-3 gap.
- command resolution: explicit config else auto-detect
(Node/Go/Python/Make); unresolved -> omitted so core reports
unverifiable, never guesses.
- truncated logs, timeout -> exit 124. 16 tests incl. real-git isolation,
diff application, cleanup. RunOutcomes is assignable to core
OutcomeResults.
Migrated + green: schema, diff, symbols, core, runner (140 tests).
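The runner's "explicit config else auto-detect, unresolved is omitted" rule might look roughly like the sketch below. The marker files and default commands are assumptions chosen for illustration (the commit message only names the ecosystems), and `resolveCommands` is a hypothetical name; only `test_cmd`/`build_cmd` come from the PR itself.

```typescript
type OutcomeKind = "test" | "build";

interface RunnerConfig {
  test_cmd?: string;
  build_cmd?: string;
}

// Assumed marker files and default commands; illustrative only.
const AUTO_DETECT: Array<{ marker: string; test?: string; build?: string }> = [
  { marker: "package.json", test: "npm test", build: "npm run build" },
  { marker: "go.mod", test: "go test ./...", build: "go build ./..." },
  { marker: "pyproject.toml", test: "pytest" },
  { marker: "Makefile", test: "make test", build: "make" },
];

// rootFiles is the list of file names found at the repo root.
function resolveCommands(
  config: RunnerConfig,
  rootFiles: string[],
): Partial<Record<OutcomeKind, string>> {
  const detected = AUTO_DETECT.find((d) => rootFiles.includes(d.marker));
  const resolved: Partial<Record<OutcomeKind, string>> = {};
  // Explicit config always wins over auto-detection.
  const test = config.test_cmd ?? detected?.test;
  const build = config.build_cmd ?? detected?.build;
  // An unresolved command is left out entirely instead of being guessed,
  // so the verifier can report the matching outcome claim as unverifiable.
  if (test !== undefined) resolved.test = test;
  if (build !== undefined) resolved.build = build;
  return resolved;
}
```

Leaving a key out, rather than filling in a fallback command, is the design choice the commit message describes as "never guesses".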
Complete rewrite: drop v0.1 API (parseDiffContent, registerDetectors,
@attest/detectors-ts dep). Wire parseDiff + verify + runOutcomes behind
the attest verify command; add attest schema [manifest|verdict]; load
attest.config.json for runner + allowlist config; human and JSON renderers
against Verdict/Manifest v1.0 types.
All 6 Phase-1 packages green: 146 tests total.
The v0.1 detectors-ts package consumed a Claim shape (target.kind:
"endpoint", verification_contract.check: "behavior_present") that the
v1.0 schema no longer accepts, and pointed at a behavioural claim kind
v1.0 explicitly rejects as unverifiable with the LLM-review pointer.
The package has been broken at the type level since WU5 (12 TS2305/TS2339
errors on the type re-exports) and the pre-push build gate failed on it.
Demote it to an opt-in, best-effort, advisory plugin with no path into
verdict.exit_code. New public surface:
- runDetectors({ diff, repoRoot }) -> DetectorOutput[]
- detectAuthentication({ path, symbol, content }) -> DetectorOutput
- findRoutesInFile(path, content) -> string[]
- DETECTOR_WARNINGS (carried on every output)
DetectorOutput.status is one of {advisory_present, advisory_absent,
advisory_inconclusive} — never verified / failed / unverifiable (those
belong to the closed ClaimResult taxonomy owned by @attest/core).
Structural guarantees:
- grep -rn "@attest/detectors" packages/core packages/cli packages/runner
-> empty
- grep -rn "registerDetectors" packages/detectors-ts/src -> empty
- packages/detectors-ts has no import from @attest/core or @attest/schema
- verdict.exit_code is computed in @attest/core/verify.ts and depends
only on claimResults and undeclared; this package touches neither
Tests: 34 (was 25). All 18 v0.1 fixtures kept verbatim, verdict vocabulary
translated to advisory status at test time. 11 new tests for runDetectors
and the new public surface. Coverage 91.11% lines (threshold 85%).
All 7 Phase-1 packages green: 180 tests total.
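The guarantee that verdict.exit_code depends only on claimResults and undeclared can be pictured as a pure function of those two inputs. The sketch below is an assumption-laden reading of the commit messages, not SPEC §6.6 itself: it treats any failed claim or any non-suppressed undeclared change as a fail (exit 1), and lets unverifiable claims pass. Type and function names are hypothetical.

```typescript
type ClaimStatus = "verified" | "failed" | "unverifiable";

interface ClaimResult {
  id: string;
  status: ClaimStatus;
}

interface UndeclaredChange {
  path: string; // assumed field
  severity: string; // "suppressed" marks an allowlisted change
}

// Pure function of the two inputs; advisory detector output never reaches it.
function computeExit(
  claimResults: ClaimResult[],
  undeclared: UndeclaredChange[],
): { result: "pass" | "fail"; exit_code: 0 | 1 } {
  const anyFailed = claimResults.some((c) => c.status === "failed");
  // Assumption: allowlisted (suppressed) changes do not count against the verdict.
  const anyUndeclared = undeclared.some((u) => u.severity !== "suppressed");
  return anyFailed || anyUndeclared
    ? { result: "fail", exit_code: 1 }
    : { result: "pass", exit_code: 0 };
}
```

Since DetectorOutput never appears in the signature, an advisory status has no way to influence the exit code in this model.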
- corpus acceptance test: 13 cases (ts/py/go) × 3 assertions = 30 tests
in packages/cli/test/corpus.test.ts; stable projection per SPEC §10
- corpus tooling: build-tree.sh now git-init+commits the base; README
corrected (base-only --repo-root, not base+overlay)
- attest.config.json added to corpus/{ts,py,go}/base/ with explicit
test_cmd/build_cmd (worktree has no node_modules/pytest pre-installed)
- CI: corpus-acceptance job (Node 20 + Python 3.12 + Go 1.22)
- README.md: rewrite for v1.0 (20-min quickstart, multi-language,
removed v0.1 references)
- BUILD_LOG.md: WU9 entry
All 7 packages green: 210 tests (180 + 30).
Closed the corpus coverage gap flagged in WU2: Py/Go now carry the full
7-case set (honest, lying, partial, undeclared, allowlisted, outcome-fail,
behavioral). The oracle is now 21 cases (7 x 3 languages) and is the
regression target for v1.0.
Cases added (8):
- py/{partial,allowlisted,outcome-fail,behavioral}
- go/{partial,allowlisted,outcome-fail,behavioral}
Base updates (pre-change only, does not affect other cases' diffs):
- corpus/py/base/poetry.lock (minimal lockfile so the allowlisted case
has a pre-existing lockfile to modify)
- corpus/go/base/go.sum (same rationale)
Test fixes:
- packages/core/test/corpus.test.ts: hard-coded case count 13 -> 21
- packages/diff/test/corpus.test.ts: uses toBeGreaterThan(0), no fix needed
Release plumbing:
- .changeset/v1.0.0.md: marks all 7 packages as major with release notes
- .gitignore: remove .changeset/*.md (changesets must be tracked)
- corpus/README.md: coverage matrix updated to all green
All 7 packages green: 254 tests (210 + 44).
- core: 27 -> 35 (+8, one verify() test per new case)
- diff: 54 -> 78 (+24, 3 reconstruction assertions per new case x 8)
- cli: 36 -> 48 (+12, end-to-end corpus.test.ts)
Phase 1 ship-readiness: package-level green, corpus-level green, release
plumbing in place. Ready to merge to main and cut v1.0.0.
…U11, WU12, WU14)
WU14 — legible manifest errors
- @attest/schema: formatValidationError/formatValidationErrors turn ajv errors
into path→problem→fix lines with concrete enums and patterns.
- CLI surfaces them on stderr and exits 2 for malformed manifests (distinct
from exit 1 for verification-fail, so CI signals stay meaningful).
- 10 schema formatter tests + 3 negative CLI corpus cases.
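A formatter of the path→problem→fix kind can be sketched against the error fields ajv reports (`instancePath`, `keyword`, `params`, `message`). The wording of the output lines and the name `formatErrorSketch` are assumptions; the real formatValidationError may phrase things differently.

```typescript
// Minimal mirror of the ajv error fields used here.
interface AjvLikeError {
  instancePath: string;
  keyword: string;
  params: Record<string, unknown>;
  message?: string;
}

function formatErrorSketch(e: AjvLikeError): string {
  const path = e.instancePath === "" ? "(root)" : e.instancePath;
  switch (e.keyword) {
    case "enum": {
      // ajv reports the permitted values under params.allowedValues.
      const raw = e.params["allowedValues"];
      const allowed = Array.isArray(raw) ? raw.map(String).join(", ") : "";
      return `${path} → value not allowed → use one of: ${allowed}`;
    }
    case "required":
      return `${path} → missing property → add "${String(e.params["missingProperty"])}"`;
    case "pattern":
      return `${path} → wrong format → must match ${String(e.params["pattern"])}`;
    default:
      return `${path} → ${e.message ?? e.keyword} → check the manifest schema`;
  }
}

// Formats every error, one line each.
function formatErrorsSketch(errors: AjvLikeError[]): string[] {
  return errors.map(formatErrorSketch);
}
```

Printing the concrete enum values or pattern in the "fix" column is what lets an agent repair its manifest from the message alone.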
WU11 — npx-installable self-contained CLI
- tsup bundles @attest/* workspace deps (noExternal) so the published
package has no workspace:* deps. web-tree-sitter stays external
(CJS, dynamic require breaks ESM bundle) and is a runtime dep.
- schemas inlined via 'with { type: "json" }' import attributes
+ module: esnext + resolveJsonModule — no dist/*/*.json layout coupling.
- setGrammarsDir() exported from @attest/symbols; CLI startup overrides
the wasm path to <cli>/grammars so the bundled tree-sitter still finds
grammars. copy-grammars.mjs copies 4 wasm files from symbols into cli/grammars.
- strip-workspace-deps.mjs (prepack) rewrites package.json to publish-ready
form and backs up the dev copy; restore-package.mjs (postpack) restores.
- Attestation: 'npm pack' produces a 771KB tarball that installs in <4s
on a clean machine with no workspace:* and runs 'attest verify' against
the corpus from outside the repo (pass and fail both).
WU12 — agent ergonomics
- docs/manifest-contract.md: paste-in block for agent instructions (closed
claim taxonomy, declared_scope rule, exit code table, minimal example).
- 'attest init --diff <path> --repo-root <dir>' produces a deterministic
manifest skeleton: 1 file_change per touched file, symbol_added/removed/
modified derived from tree-sitter extraction against git show HEAD
(pre) and the current worktree (post), test_added/test_modified for
files matching the test-path heuristic. declared_scope.files is the
full set of touched paths. Same diff + worktree = same skeleton, byte-
for-byte (modulo task.description, agent.id, generated_at).
- README: 'Generating a manifest from a diff' section + npx install.
- 6 init tests cover skeleton shape, claim id sequence, --task/--description
/--agent flags, and error paths (empty diff → 65, missing repo-root → 66).
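The deterministic part of the skeleton (one file_change per touched file, test claims for test paths, declared_scope.files as the full touched set) can be sketched as below. Symbol claims are left out because they need tree-sitter extraction. The claim field names, the `c1, c2, ...` id format and the test-path heuristic are assumptions made for illustration.

```typescript
interface TouchedFile {
  path: string;
  op: "create" | "modify" | "delete";
}

interface SkeletonClaim {
  id: string;
  kind: string;
  path: string;
  op?: string;
}

// Assumed test-path heuristic; the CLI's actual rule may differ.
const looksLikeTest = (p: string): boolean =>
  /(\.test\.|\.spec\.|_test\.)/.test(p) || /(^|\/)test_[^/]*$/.test(p);

function buildSkeleton(touched: TouchedFile[]) {
  // The CLI reports an empty diff as an error (exit 65); here it simply throws.
  if (touched.length === 0) throw new Error("empty diff");
  const claims: SkeletonClaim[] = [];
  let counter = 0;
  const nextId = (): string => `c${++counter}`; // assumed id sequence format
  for (const file of touched) {
    claims.push({ id: nextId(), kind: "file_change", path: file.path, op: file.op });
    if (looksLikeTest(file.path) && file.op !== "delete") {
      const kind = file.op === "create" ? "test_added" : "test_modified";
      claims.push({ id: nextId(), kind, path: file.path });
    }
  }
  return { declared_scope: { files: touched.map((f) => f.path) }, claims };
}
```

The function reads nothing but its input (no clock, no randomness), which is the property behind "same diff + worktree = same skeleton"; fields such as generated_at would be filled in separately.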
Test counts: 254 → 273 (schema 35, diff 78, symbols 18, detectors-ts 34,
core 35, runner 16, cli 57). All 7 packages lint, build, and test green.
- action.yml (composite): inputs manifest, diff, repo-root, format,
version; runs 'npx @attest/cli@<version> verify ...' and propagates the
exit code. Outputs result, exit-code, verdict (when format=json).
- .github/workflows/attest-fixture.yml: live fixture that materialises the
corpus/ts/base repo, runs the honest case (expect pass) and the lying
case (expect non-zero exit + result=fail) against the action. This is the
marketplace acceptance test from MVP_PLAN §WU13.
- Marketplace branding: icon 'check-circle', color 'blue'. Authored under
ree2raz/attest. Pinned to a specific version by default (1.0.0) so users
opt into 'latest' explicitly.
- 7 action.test.ts tests cover: YAML structure of action.yml (inputs,
outputs, branding, the npx step), the example workflow references './'
as the action source, and the underlying CLI call behaves correctly on
the corpus honest + lying fixtures (exit 0 / exit 1, result pass / fail).
The e2e test uses the local built dist, which is byte-identical to the
tarball artifact; on GitHub Actions the npx call resolves to the
published package.
Test counts: 273 → 280 (cli 57 → 64, +7 action tests). Lint, build, and
test green across all 7 packages.
… copy
- scripts/demo.sh: turnkey reproduction of the gotcha. 'demo.sh lying'
materialises corpus/ts/base, runs the lying case, and prints the
transcript with the failed claim annotated. Pass CLI='npx --yes
@attest/cli@1.0.0' to record against the published tarball; default
is the local build.
- docs/demo/{honest,lying,partial}-case.txt: static captured
transcripts of the three demo cases.
- docs/launch/show-hn.md: Show HN post draft — leads with the bug the
community has lived, links the manifest contract doc, ends with the
security model caveat. Headline is 'deterministic checker for what
your AI agent actually changed,' not 'compliance' or 'provenance.'
- docs/launch/community-seed.md: per-community (Cursor, Claude Code,
Aider, ML, HN) variants of the same story, with a 'posting tips'
section on what to lead with and what to defer.
- README: 30-second pitch at the top (the npx one-liner + what each
exit code means), 5-minute quickstart with the demo script as the
fastest path, GitHub Action as step 4, manifest format / init /
config / security model / packages / corpus sections in the
middle, 'Contributing / building from source' section at the end
(source build moved out of the happy path), GitHub Action section
added between Configuration and Contributing, '@attest/cli' package
description updated to mention verify + init + schema.
All 280 tests still pass; lint clean; build clean.
Documents the 4-commit release-readiness batch (WU11, WU12, WU13, WU14, WU15) with per-WU acceptance notes, the test count delta (254 -> 280), and a checklist against the MVP done gate from docs/MVP_PLAN.md. All 7 gate criteria are now satisfied; the branch is shippable once merged and the npm package + GitHub Action are published.
Summary
Closes the corpus coverage gap flagged in WU2 and adds release plumbing for v1.0.
This PR is the cumulative ship-readiness cut — it includes WU1–WU10 (10 work
units) and brings the repo to a taggable, publishable v1.0 state.
What's in this PR (cumulative)
What this PR adds on top of WU9 (the WU10 delta)
copying the TS pattern. The oracle is now 21 cases (7 × 3 languages) — the
full regression target for v1.0.
- corpus/py/base/poetry.lock (minimal)
- corpus/go/base/go.sum (minimal)
- packages/core/test/corpus.test.ts hard-coded count 13 → 21.
- .changeset/v1.0.0.md: @changesets/cli entry, all 7 packages major.
- .gitignore: removed erroneous .changeset/*.md (changesets must be tracked).
- corpus/README.md: coverage matrix updated to all-✅.
SPEC §6.7 acceptance gate (all 7)
attest verify on real repo in TS, Python, and Go (21 cases).
suppresses allowlisted (lockfile) changes.
true exit code (verified against outcome-fail fixture).
unverifiable with LLM-review pointer; never falls through to a heuristic.
(.github/workflows/ci.yml corpus-acceptance job: Node 20 + Py 3.12 + Go 1.22).
guarantee per WU8).
Test counts (delta)
Green: lint ✓, build ✓, 254 tests ✓ (locally; CI exercises the full 21-case
corpus including the Go cases skipped locally when go is not on PATH).
What's next (out of this PR, Phase 2 per SPEC)
action.yml)
attest derive for spec-kit tasks.md, Kiro, EARS
@attest/audit package (Phase 3)
Checklist
pnpm lint && pnpm build && pnpm test green