Current process. Certification now runs inside this repository:
scripts/certify_data_release.pyderives the vendored country manifest directly from a country's data release manifest (see the data certification engineering skill). The intermediatepolicyengine-bundlesrepository flow is retired; its published bundles remain the historical record of earlier certifications. The architecture below — country*-datarepos publish immutable manifests,policyengine.pycertifies — is unchanged.
This document defines the intended reproducibility boundary for policyengine.py.
The key design decision is:
- country
*-datarepos build and stage immutable data artifacts policyengine.pyis the only component that certifies supported runtime bundlespolicyengine.pydoes not rebuild country data itself
This keeps country-specific data construction in the country data repos while still giving users a single top-level version to cite and pin.
For countries like the UK, the data package is not model-independent. Dataset construction, imputations, and calibration steps call the country model directly. That means a published dataset artifact depends on:
- the country model version used during data construction
- the calibration targets used during data construction
- the raw input data used during data construction
If policyengine.py only pins a country model version and a data package version without checking that relationship, the provenance boundary is incomplete.
Examples: policyengine-uk, policyengine-us
The country model package owns:
- policy logic
- variables and parameters
- reforms
- a
data_build_fingerprintfor the subset of model logic that affects data construction
It does not own final runtime bundle certification.
Examples: policyengine-uk-data, policyengine-us-data
The country data package owns:
- data build pipelines
- raw input acquisition
- calibration target snapshots
- expensive dataset construction
- staging immutable build artifacts with provenance
It does not define the final supported runtime bundle exposed to users.
policyengine.py owns:
- runtime bundle certification
- user-facing reproducibility boundaries
- the supported mapping from
policyengine.pyversion to country model version and certified data artifact
It does not rebuild microdata artifacts.
Certification runs in this repository: the vendored country release
manifest under src/policyengine/data/release_manifests/ is derived
directly from the country's published data release manifest. The
entrypoint is:
python scripts/certify_data_release.py --country us \
--manifest-uri "hf://dataset/policyengine/populace-us@<tag>/releases/<tag>/release_manifest.json"Earlier releases (policyengine 4.15.x–4.16.x) were certified through the
PolicyEngine/policyengine-bundles archive flow; those bundles remain the
historical record of their certifications.
The architecture has two manifest layers with different responsibilities.
Published by the country *-data repo.
This answers:
- what bytes were produced
- how they were produced
- which exact model and targets produced them
Suggested schema:
{
"schema_version": 1,
"country_id": "uk",
"data_package": {
"name": "policyengine-uk-data",
"version": "1.41.0"
},
"build": {
"build_id": "uk-data-2026-04-12T12-30-00Z",
"git_sha": "abc123",
"built_at": "2026-04-12T12:30:00Z",
"built_with_model_package": {
"name": "policyengine-uk",
"version": "2.81.0",
"git_sha": "def456",
"data_build_fingerprint": "sha256:..."
},
"calibration_targets": {
"snapshot_id": "2026-04-10",
"sha256": "sha256:..."
},
"raw_inputs": [
{
"name": "frs_2023_24",
"sha256": "sha256:..."
}
],
"build_environment": {
"python_version": "3.13.3",
"lockfile_sha256": "sha256:..."
}
},
"default_datasets": {
"national": "enhanced_frs_2023_24",
"baseline": "frs_2023_24"
},
"artifacts": {
"enhanced_frs_2023_24": {
"kind": "microdata",
"repo_id": "policyengine/policyengine-uk-data-private",
"path": "builds/uk-data-2026-04-12T12-30-00Z/enhanced_frs_2023_24.h5",
"revision": "uk-data-2026-04-12T12-30-00Z",
"sha256": "sha256:...",
"size_bytes": 123456789
}
}
}Notes:
build_idmust be immutable.- build artifacts should be staged under a build-specific path or revision, not a floating release tag.
- the build manifest is the authoritative provenance record for the artifact bytes.
Published by policyengine.py.
This answers:
- which model and data artifact are supported together at runtime
- which exact dataset should be used by default
- which artifact checksum and provenance should be surfaced to users
Suggested schema:
{
"schema_version": 1,
"policyengine_version": "3.5.0",
"bundle_id": "uk-3.5.0",
"published_at": "2026-04-12T13:00:00Z",
"country_id": "uk",
"model_package": {
"name": "policyengine-uk",
"version": "2.81.1"
},
"certified_data_artifact": {
"data_package": {
"name": "policyengine-uk-data",
"version": "1.41.0"
},
"build_id": "uk-data-2026-04-12T12-30-00Z",
"dataset": "enhanced_frs_2023_24",
"uri": "hf://policyengine/policyengine-uk-data-private/builds/uk-data-2026-04-12T12-30-00Z/enhanced_frs_2023_24.h5@uk-data-2026-04-12T12-30-00Z",
"sha256": "sha256:..."
},
"certification": {
"compatibility_basis": "matching_data_build_fingerprint",
"built_with_model_version": "2.81.0",
"certified_for_model_version": "2.81.1",
"data_build_fingerprint": "sha256:...",
"certified_by": "policyengine.py release workflow"
},
"default_dataset": "enhanced_frs_2023_24",
"region_artifacts": {
"national": {
"dataset": "enhanced_frs_2023_24"
}
}
}Notes:
- this is the user-facing reproducibility boundary
- apps and APIs should surface this bundle, not only country package versions
- a bundle may reuse a previously staged data artifact if compatibility is explicitly certified
The internal build manifest and certified runtime bundle remain the operational source of truth.
TRACE sits on top of those manifests as a standards-based export layer.
policyengine.py emits a certified-bundle TRO for each supported country. The
composition pins four artifacts by sha256:
- the bundled country release manifest shipped in
policyengine.py - the country data release manifest resolved for the certified data package version
- the certified dataset artifact
- the country model wheel published to PyPI (hash read from the bundled manifest when present, otherwise fetched from the PyPI JSON API at emit time)
TROs use the public TROv vocabulary at
https://w3id.org/trace/trov/0.1#. Every artifact location in the TRO
is a dereferenceable HTTPS URI or a local path relative to the shipped
wheel. Certification metadata is carried as structured pe:* fields on
the trov:TransparentResearchPerformance node so downstream tooling can
read pe:certifiedForModelVersion, pe:compatibilityBasis,
pe:builtWithModelVersion, pe:dataBuildFingerprint, and pe:dataBuildId
without parsing prose. Every TRO also carries pe:emittedIn set to
"github-actions" or "local"; CI-emitted TROs additionally carry
pe:ciRunUrl and pe:ciGitSha.
Country *-data repos should also emit a matching trace.tro.jsonld per
data release covering the release manifest and every staged artifact hash.
That is a country-data concern and lives in those repos.
From Python:
from policyengine.provenance.manifest import get_data_release_manifest, get_release_manifest
from policyengine.provenance.trace import build_trace_tro_from_release_bundle, serialize_trace_tro
country = get_release_manifest("us")
tro = build_trace_tro_from_release_bundle(country, get_data_release_manifest("us"))
Path("us.trace.tro.jsonld").write_bytes(serialize_trace_tro(tro))From the CLI:
policyengine trace-tro us --out us.trace.tro.jsonld
At release time, scripts/generate_trace_tros.py regenerates the bundled
data/release_manifests/{country}.trace.tro.jsonld files, and the
Versioning CI job commits them alongside the changelog so every published
wheel ships with the matching TRO.
from policyengine.results import write_results_with_trace_tro
write_results_with_trace_tro(
results, # ResultsJson instance
"results.json", # where to write results
bundle_tro=bundle_tro, # loaded from the shipped bundle
reform_payload={"salt_cap": 0},
bundle_tro_url=(
"https://raw.githubusercontent.com/PolicyEngine/policyengine.py/"
"v3.4.5/src/policyengine/data/release_manifests/us.trace.tro.jsonld"
),
)The bundle_tro_url is recorded on the performance node as
pe:bundleTroUrl. A verifier can fetch that URL, recompute its sha256,
and confirm it matches the bundle_tro artifact hash in the simulation
TRO's composition. Without this anchor, the bundle reference is only as
trustworthy as whoever produced the JSON.
Structural validation:
policyengine trace-tro-validate path/to/tro.jsonld
The shipped schema at policyengine/data/schemas/trace_tro.schema.json
checks structural fields, canonical hex-encoded sha256s, the required
pe:emittedIn, and that trov:hasLocation uses HTTPS (or the
well-known local paths results.json, reform.json,
bundle.trace.tro.jsonld). The same schema is exercised in the test
suite against generated TROs.
Content validation (the verifier workflow a replication reviewer should run):
import hashlib, json, requests
from policyengine.provenance.trace import canonical_json_bytes
sim_tro = json.load(open("results.trace.tro.jsonld"))
perf = sim_tro["@graph"][0]["trov:hasPerformance"]
# 1. Fetch the bundle TRO from its pinned URL and recompute its hash.
bundle_bytes = requests.get(perf["pe:bundleTroUrl"]).content
bundle_hash = hashlib.sha256(canonical_json_bytes(json.loads(bundle_bytes))).hexdigest()
# 2. Compare against the hash recorded in the simulation TRO's composition.
recorded = next(
a["trov:sha256"]
for a in sim_tro["@graph"][0]["trov:hasComposition"]["trov:hasArtifact"]
if a["@id"].endswith("bundle_tro")
)
assert bundle_hash == recorded, "bundle_tro_url content does not match sim TRO"
# 3. Confirm the fingerprint recorded on the performance matches the
# fingerprint inside the fetched bundle.
bundle = json.loads(bundle_bytes)
bundle_fp = bundle["@graph"][0]["trov:hasComposition"]["trov:hasFingerprint"]["trov:sha256"]
assert perf["pe:bundleFingerprint"] == bundle_fpA sim TRO with a swapped bundle_tro dict but a truthful
pe:bundleTroUrl will fail step 2; a sim TRO with both swapped will
fail step 3.
- TROs are emitted unsigned. A signed attestation (sigstore or in-toto) is a future addition that will bind TROs to a trusted-system key.
- The bundle composition does not yet pin a transitive lockfile
(
uv.lock/poetry.lock), a Python interpreter version, or an OS. AEA reviewers may demand these; the schema is extensible. - The model wheel is hashed by PyPI's published sha256. If a wheel is yanked and re-uploaded under the same version, the hash will change and the TRO becomes invalid — which is the correct behaviour.
- Country data packages whose data release manifest is private require
HUGGING_FACE_TOKENat emit time. The regeneration script skips countries whose data release manifest is unreachable so a partial run does not block other countries.
TRACE is not the source of truth for compatibility policy.
In particular, TRACE does not decide:
- whether a new model version can safely reuse an existing data artifact
- how
data_build_fingerprintis computed - which staged artifact becomes a supported runtime default
Those decisions still belong to the country data build manifest and the
policyengine.py certified runtime bundle.
TRACE adds three things our internal manifests do not provide by themselves:
- a standard declaration format for provenance exchange
- a composition fingerprint over the exact artifacts in scope
- a better external surface for papers, audits, and reproducibility reviews
That is why the recommended design is:
- internal manifests for build/certification control
- generated TRACE TROs for standards-based export
The architecture should avoid forcing a new data build for every harmless country model release.
To do that safely, compatibility must be explicit.
Each country model package should expose a data_build_fingerprint that covers the subset of logic that affects dataset construction or calibration.
Examples of inputs to the fingerprint:
- variables used in imputations
- variables used in calibration loss matrices
- parameters referenced during data construction
- uprating or target-computation logic used during the build
Things that should usually not affect the fingerprint:
- runtime-only outputs that are not used in data construction
- UI-oriented metadata
- code paths unrelated to data construction
policyengine.py may certify a staged data artifact for a model version only if one of the following is true:
- the model version exactly matches the
built_with_model_package.version(compatibility_basis: built_with_model_package) - the data publisher's
compatible_model_packagescovers the model version (compatibility_basis: compatible_model_packages— the publisher's claim, recorded with a warning and made good by the test suite passing on the pinned pair)
If neither is true, certification fails hard and a new data build or a
published compatibility claim is required. A data_build_fingerprint
basis (same fingerprint as the build-time model) is a planned third
basis; it requires fingerprint metadata from the installed model package.
Artifacts should move through explicit states:
staged: built by the country data repo and available for inspection or later certificationcertified: referenced by a releasedpolicyengine.pyruntime bundledeprecated: no longer recommended for new use, but still reproducible
The key point is that staged and certified are different states. A staged artifact is not automatically part of a supported runtime release.
- Cut UK model release candidate
M. - Compute
data_build_fingerprint(M). - Compare it to the fingerprint recorded in the previously certified data build manifest.
- If the fingerprint matches, skip the expensive UK data rebuild.
- Release
policyengine.pywith a new certified runtime bundle that points to the existing staged UK artifact.
- Cut UK model release candidate
M. - Compute
data_build_fingerprint(M). - If the fingerprint changed, build a new UK data artifact in
policyengine-uk-dataagainst:- exact
policyengine-uk==M - exact target snapshot
- exact raw input hashes
- exact
- Stage the new artifact under a build-specific immutable path or revision.
- Publish the UK data build manifest.
- Release
policyengine.pywith a certified runtime bundle that points to the new staged artifact.
The current release_manifest.json mechanism in country data repos is a good starting point, but it is not yet enough on its own. The target implementation should add:
built_with_model_package.versionbuilt_with_model_package.git_shabuilt_with_model_package.data_build_fingerprint- calibration target snapshot metadata
- immutable staged artifact paths or revisions
The target implementation in policyengine.py should add:
- hard validation of bundle certification rules
- explicit runtime bundle metadata on simulations, APIs, and app responses
- checksum-backed dataset resolution from the certified bundle manifest
Because that would centralise the wrong concerns:
- country-specific private data handling would move into the generic orchestration layer
- country-specific build logic would move into the generic orchestration layer
- expensive build failures would block the top-level runtime package more often
- provenance would still originate in the country data pipeline, so
policyengine.pywould not actually eliminate the need for the country build manifest
policyengine.py should be the certification boundary, not the country data build system.