11 changes: 10 additions & 1 deletion Makefile
@@ -13,7 +13,7 @@ COMPARE_RENDERED_CROPS_LEFT ?= $(VERIFY_RENDERED_CROPS_OUT)/run1
COMPARE_RENDERED_CROPS_RIGHT ?= $(VERIFY_RENDERED_CROPS_OUT)/run2
LAYOUT_EVALUATOR_OUT ?= $(ROOT)/target/layout-evaluator-alpha

-.PHONY: verify-alpha verify-alpha-tree verify-rendered-crops compare-rendered-crops layout-evaluator-alpha python-surface-test release-hygiene release-advisory third-party-license-manifest release-notice-draft
+.PHONY: verify-alpha verify-alpha-tree verify-rendered-crops compare-rendered-crops layout-evaluator-alpha python-surface-test milestone-b-internal-checks release-hygiene release-advisory third-party-license-manifest release-notice-draft

$(ETHOS_BIN):
	cargo build --locked -p ethos-cli
@@ -48,6 +48,15 @@ layout-evaluator-alpha:
python-surface-test:
	PYTHONPATH=$(ROOT)/python $(PYTHON) -m unittest discover -s python/tests

milestone-b-internal-checks:
	$(PYTHON) fixtures/validate_fixtures.py
	$(MAKE) verify-alpha PYTHON=$(PYTHON)
	$(MAKE) layout-evaluator-alpha PYTHON=$(PYTHON)
	$(MAKE) python-surface-test PYTHON=$(PYTHON)
	$(PYTHON) .github/scripts/claims_gate.py
	$(PYTHON) .github/scripts/readiness_gate.py public
	git diff --check

release-hygiene:
	cargo metadata --locked --offline --format-version 1 --no-deps >/dev/null
	$(CARGO_DENY) --version
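
The new `milestone-b-internal-checks` target above is a plain sequence of commands, and Make stops at the first recipe line that exits non-zero. A minimal Python sketch of that fail-fast composition follows. The `run_steps` helper and the `MILESTONE_B_STEPS` list are hypothetical names introduced here for illustration; the commands are copied from the recipe, with `$(PYTHON)` and `$(MAKE)` replaced by assumed defaults (`sys.executable` and `make`).

```python
import subprocess
import sys
from typing import Sequence


def run_steps(steps: Sequence[Sequence[str]]) -> int:
    """Run each command in order; return the first non-zero exit code, or 0."""
    for argv in steps:
        print("+", " ".join(argv), flush=True)
        result = subprocess.run(list(argv), check=False)
        if result.returncode != 0:
            # Same fail-fast behavior as a Make recipe: later steps do not run.
            return result.returncode
    return 0


# Commands copied from the recipe; interpreter and make binary are assumptions.
MILESTONE_B_STEPS = [
    [sys.executable, "fixtures/validate_fixtures.py"],
    ["make", "verify-alpha", f"PYTHON={sys.executable}"],
    ["make", "layout-evaluator-alpha", f"PYTHON={sys.executable}"],
    ["make", "python-surface-test", f"PYTHON={sys.executable}"],
    [sys.executable, ".github/scripts/claims_gate.py"],
    [sys.executable, ".github/scripts/readiness_gate.py", "public"],
    ["git", "diff", "--check"],
]
```

From the repository root, `sys.exit(run_steps(MILESTONE_B_STEPS))` would approximate `make milestone-b-internal-checks`. The Make target remains the source of truth for the actual step list.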
33 changes: 33 additions & 0 deletions crates/ethos-cli/tests/fixtures/font-isolation/manifest.json
@@ -0,0 +1,33 @@
{
  "manifest_version": 1,
  "root": "crates/ethos-cli/tests/fixtures/font-isolation",
  "fixtures": [
    {
      "id": "cid-cjk-like",
      "file": "cid-cjk-like.pdf",
      "sha256": "9eb58cc130b6d4d03d3cb7c0d6c71fbd5bdbc6839c51e2fc14f6dea9d9514d0e",
      "subsets": ["fonts", "cid"],
      "expected_behavior": "deterministic_success_or_stable_error",
      "provenance": "Synthetic PDF generated by Ethos maintainers for internal CID/CJK-like font isolation checks.",
      "license": "Apache-2.0"
    },
    {
      "id": "missing-font",
      "file": "missing-font.pdf",
      "sha256": "bdc7633fb937493153fd063d561bf066a2a226aaf43913f98a514cc92610511c",
      "subsets": ["fonts", "missing_font"],
      "expected_behavior": "deterministic_substitution_id",
      "provenance": "Synthetic PDF generated by Ethos maintainers for internal missing-font substitution isolation checks.",
      "license": "Apache-2.0"
    },
    {
      "id": "standard14-fonts",
      "file": "standard14-fonts.pdf",
      "sha256": "af767edb5d64ded23fa4c23c35f93cf3765b5952905c46cde3e7179d9660d421",
      "subsets": ["fonts", "standard14"],
      "expected_behavior": "deterministic_substitution_id",
      "provenance": "Synthetic PDF generated by Ethos maintainers for internal Standard 14 font substitution isolation checks.",
      "license": "Apache-2.0"
    }
  ]
}
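
The manifest above pairs each fixture file with a `sha256` value, so a hash-binding check can recompute each digest and compare. The sketch below is not the repository's `fixtures/validate_fixtures.py`; `sha256_of` and `check_manifest` are hypothetical helpers. It assumes `root` is a path relative to the repository root and that only the `root`, `fixtures`, `id`, `file`, and `sha256` keys shown in the manifest are needed.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large fixtures are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manifest(manifest_path: Path, repo_root: Path) -> list[str]:
    """Return one message per missing or drifted fixture; empty list means bound."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    # Assumption: "root" is relative to the repository root.
    fixture_dir = repo_root / manifest["root"]
    problems = []
    for entry in manifest["fixtures"]:
        pdf_path = fixture_dir / entry["file"]
        if not pdf_path.is_file():
            problems.append(f"{entry['id']}: missing file {entry['file']}")
            continue
        actual = sha256_of(pdf_path)
        if actual != entry["sha256"]:
            problems.append(
                f"{entry['id']}: sha256 mismatch "
                f"(expected {entry['sha256']}, got {actual})"
            )
    return problems
```

A caller would fail closed on any non-empty result, for example by printing the messages and exiting non-zero.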
6 changes: 3 additions & 3 deletions docs/execution-status.md
@@ -17,10 +17,10 @@ The committed implementation now includes:
- `ethos doc parse` / `ethos fingerprint` PDF execution through a worker process with `max_parse_ms` timeout enforcement, stable error-envelope relay, diagnostics-gated worker stderr, and page-range validation/filtering.
- Quantized page/span extraction at the backend boundary, plus a basic deterministic layout pass that assembles paragraph `text_block` elements, fixture-backed alpha heading and flat list-item elements, and simple column reading order for the current born-digital fixtures. Current alpha layout confidence is explicit for heading signals, and below-threshold layout confidence emits deterministic `low_confidence_reading_order` diagnostics instead of staying silent. Fixture validation binds selected `fixture.json` expectations to committed extraction/layout goldens and binds current alpha text/Markdown exports to committed layout output so current read-order, element-type, heading-export, list-item, and export cases fail closed on drift.
- An internal layout evaluator scaffold exists at `fixtures/evaluate_layout_alpha.py` and `make layout-evaluator-alpha`. It reads committed `fixture.json`, `extraction.json`, `layout.json`, `text.txt`, and `markdown.md` files, summarizes alpha element-type and subset coverage, and fails closed on missing layout expectations, dangling/invalid warning references, confidence-policy drift, export-golden drift, invalid span expectation metadata, expected page/span-text/font-id drift, expected rotation drift, or drift in fixture-backed reading order / heading / list-item / hyphenation / ligature cases. PR CI runs the evaluator and has a static workflow guard for that wiring.
-- Schema/example/profile validation is green through `schemas/validate_examples.py` using `jsonschema` draft 2020-12 validation, including the crop descriptor artifact contract plus referential-integrity and bbox sanity checks outside JSON Schema.
+- Schema/example/profile validation is green through `schemas/validate_examples.py` using `jsonschema` draft 2020-12 validation, including the crop descriptor artifact contract plus referential-integrity and bbox sanity checks outside JSON Schema. Fixture validation also binds internal font-isolation PDFs to committed manifest hashes.
- `ethos verify` now produces non-empty quote, value, presence, and table-cell verification checks over native Ethos document JSON and synthetic OpenDataLoader-style JSON through `--grounding opendataloader-json`; it also verifies quote/value/presence citations over pinned real OpenDataLoader 2.4.7 JSON, including grounded and ungrounded cases, maps explicit real OpenDataLoader-style row/cell structures to table-cell grounding, and normalizes conservative real-style text/child-container aliases when page/bbox/text data remains explicit. Citation/config inputs are rejected when they drift outside the closed schemas. The public demo harness covers grounded, ungrounded, split-quote, not-found, stale-fingerprint, unsupported non-v1 claim, capability-limited, malformed-citation, malformed OpenDataLoader-style input, and summary-format reject paths.
- Verification semantics are now trust-honest at alpha scope: quote containment is explicitly labeled, value/table-cell checks require normalized equality, fingerprint-pinned citations fail closed when source fingerprints are unavailable, and structured capability limits explain why a run is downgraded.
-- `make verify-alpha` is the current alpha trust-loop command: it checks native examples, split-quote evidence matching, unsupported non-v1 claim reporting, synthetic OpenDataLoader-style examples, pinned real OpenDataLoader grounded/ungrounded examples, schema validation, verify-alpha case inventory coverage, usage diagnostics for malformed citations and malformed OpenDataLoader-style structures, byte-identical repeated verification reports, byte-identical native crop descriptors, summary diagnostics for an ungrounded native case, and foreign fixture manifest hash binding.
+- `make verify-alpha` is the current alpha trust-loop command: it checks native examples, split-quote evidence matching, unsupported non-v1 claim reporting, synthetic OpenDataLoader-style examples, pinned real OpenDataLoader grounded/ungrounded examples, schema validation, verify-alpha case inventory coverage, usage diagnostics for malformed citations and malformed OpenDataLoader-style structures, byte-identical repeated verification reports, byte-identical native crop descriptors, summary diagnostics for an ungrounded native case, and foreign fixture manifest hash binding. `make milestone-b-internal-checks` composes the current internal Milestone B validation path across fixture validation, verify alpha, layout evaluator, Python surface tests, and policy gates.
- An internal Python surface scaffold exists under `python/ethos_pdf`. It shells out to a caller-provided local `ethos` CLI binary for `ethos doc parse` JSON, Markdown, and text output, and has stdlib unit tests that use a fake local command. This is pre-alpha scaffolding for Milestone B API shape work, not a public installation or publication path.
- Native Ethos verification can emit deterministic, schema-backed crop descriptor JSON artifacts through `--crop-dir`; these bind `document_fingerprint`, page, bbox, and check ids. Native `crop_ref` filenames are logical evidence references derived from document fingerprint, check id, and page, while descriptors still record the exact observed bbox. When `--crop-source-pdf` is supplied, the CLI validates source-PDF fingerprint binding and emits PNG crop artifacts whose filenames, byte hashes, dimensions, and source fingerprint are bound from the descriptor. `make verify-rendered-crops` checks same-host repeated-run stability for the rendered artifact path, and `make compare-rendered-crops` classifies two rendered-crop runs by separating logical evidence identity from rendered artifact byte equality. Cross-platform rendered image determinism is not claimed; the 2026-06-14 macOS arm64 vs Linux x64 validation record in `docs/validation/rendered-crops-2026-06-14.md` preserved document fingerprint and `payload_sha256` but failed rendered artifact byte equality because the evidence bbox differed slightly across platforms.
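
Several bullets above rely on "byte-identical" repeated outputs (verification reports, native crop descriptors). The idea can be sketched as: produce the artifact more than once and require a single distinct digest. This is an illustration only, not how `make verify-alpha` is implemented; `repeated_output_digests` and `is_byte_identical` are hypothetical helpers, and `produce` stands in for whatever emits the artifact bytes.

```python
import hashlib
from typing import Callable


def repeated_output_digests(produce: Callable[[], bytes], runs: int = 2) -> list[str]:
    """Call the producer `runs` times and return the SHA-256 digest of each output."""
    return [hashlib.sha256(produce()).hexdigest() for _ in range(runs)]


def is_byte_identical(produce: Callable[[], bytes], runs: int = 2) -> bool:
    """True only if every run produced exactly the same bytes."""
    return len(set(repeated_output_digests(produce, runs))) == 1
```

Comparing digests rather than parsed JSON is deliberate in this sketch: key order, whitespace, or float formatting drift would pass a semantic comparison but still break byte equality.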

@@ -55,7 +55,7 @@ Milestone A has an accepted internal Gate Zero decision for roadmap control, so
| Layout groundwork | Landed: basic paragraph text blocks, fixture-backed alpha heading and flat list-item elements, simple column reading order over quantized spans, explicit alpha heading-confidence values, deterministic below-threshold confidence diagnostics, fixture metadata checks against committed extraction/layout goldens for current read-order and element-type expectations, and alpha text/Markdown export goldens derived from committed layout output | Tables, nested/richer list and heading semantics, broader rotation/quirk handling, and broader confidence dimensions remain future work |
| Layout evaluator scaffold | Landed: deterministic internal evaluator over committed extraction/layout fixture expectations, with heading/list/reading-order/rotation/hyphenation/ligature/font-identity/span-expectation coverage checks, expected page/span-text/font-id checks, expected-spans metadata validation, warning-reference checks, confidence-policy checks, text/Markdown export-golden checks, expectation drift diagnostics, report JSON, Make target, unit coverage, PR CI wiring, and static CI workflow guard coverage | Broader evaluator dimensions remain future work |
| Python surface scaffold | Landed: internal stdlib wrapper over a caller-provided local `ethos doc parse` command, with explicit JSON/Markdown/text methods, page selection passthrough, diagnostics passthrough, timeout handling, command failure reporting, and mocked-command unit coverage | Native binding work, broader API design, and public setup path remain future work |
-| Font policy groundwork | Partially landed: substitution table and profile policy are present; substitution-table bytes are pinned by the deterministic profile and checked by schema/example validation; absent bundled fallback assets must remain represented by a null fallback-bundle hash; fixture output uses deterministic substitution IDs, committed embedded-font fixture metadata now binds expected extraction font identity, and document schema/font extraction keep emitted font ids inside the deterministic ASCII `embedded:` / `subst:` contract | Bundled fallback asset introduction/hash pinning and broader font/CID validation remain open |
+| Font policy groundwork | Partially landed: substitution table and profile policy are present; substitution-table bytes are pinned by the deterministic profile and checked by schema/example validation; absent bundled fallback assets must remain represented by a null fallback-bundle hash; fixture output uses deterministic substitution IDs, committed embedded-font fixture metadata now binds expected extraction font identity, document schema/font extraction keep emitted font ids inside the deterministic ASCII `embedded:` / `subst:` contract, and CLI font-isolation PDFs are manifest/hash-bound | Bundled fallback asset introduction/hash pinning and broader font/CID validation remain open |
| Schema/example validation | Landed: schemas, examples, deterministic profile, referential integrity, and bbox sanity pass the `jsonschema` validation gate | Contract changes still require explicit versioning and compatibility review |
| Trust-layer implementation | Landed: `ethos verify` quote/value/presence/table-cell checks, explicit quote-containment labeling, normalized equality for value/table-cell checks, stale and unverifiable fingerprint handling, unsupported claim reporting, structured capability limits, native Ethos JSON path, ODL-style adapter path with synthetic table/cell mapping, explicit real ODL-style row/cell table grounding, conservative real-style text/child-container alias normalization, pinned real OpenDataLoader 2.4.7 grounded/ungrounded fixtures, foreign fixture manifest hash validation, crop-ref evidence plumbing, stable logical native crop refs, native crop descriptor artifacts, raw BGRA crop rendering in `ethos-pdf`, CLI PNG crop artifact production for bound native source PDFs, same-host rendered crop repeatability check, rendered-crop run comparison helper, strict citation/config input validation, citation input schema, split-quote fixture coverage, explicit unsupported non-v1 claim reporting, OpenDataLoader-style structure diagnostics for malformed bbox and unknown-page references, verify-alpha case inventory checks, and demo fixtures | Still needed: additional adapter hardening against broader real output shapes, future claim-kind expansion outside the current v1 alpha policy, and a decision on whether cross-platform rendered crop artifact equality is worth pursuing after the current macOS/Linux bbox drift finding |
| WS-HARNESS readiness | Partially landed: readiness path is green for frozen corpus/hardware and pinned competitors, Gate Zero evidence preflight validates the current `ethos-bench` handoff, and gates fail closed if those records regress | Public-safe comparison report flow, release/package approval, claim-wording approval, and future evidence-refresh workflow still need hardening |