Build provenance and artifact handoff layer
(ONNX model -> TensorRT/RKNN artifacts · metadata · manifest · Runtime/Lab handoff)
Language: English | 한국어
GitHub description: Build provenance and handoff layer for converting ONNX models into edge deployment artifacts.
- Build/provenance layer for the InferEdge validation pipeline
- Converts ONNX models into TensorRT/RKNN-oriented edge artifacts
- Records source model hash, artifact hash, preset, target, precision, and shape metadata
- Produces Runtime/Lab handoff records for downstream validation
- Supplies evidence for deployment review; InferEdgeLab owns the final decision
InferEdgeForge is not just a model conversion script.
It is a reproducible artifact provenance layer that:
- preserves build intent as structured metadata
- keeps model artifacts tied to their source fingerprints
- makes benchmark and compare handoff traceable
- helps reviewers understand whether an edge artifact is rebuildable and valid
InferEdgeForge is the build/provenance layer of the larger InferEdge validation pipeline:
ONNX model
-> InferEdgeForge build
-> metadata / manifest / worker runtime summary
-> InferEdgeRuntime validation / result export
-> InferEdgeLab compare / API / job workflow / deployment_decision
-> optional InferEdgeAIGuard provenance diagnosis
-> deploy / review / blocked decision
In that pipeline, Forge is responsible for converting an ONNX model into edge deployment artifacts such as TensorRT engines or RKNN artifacts, while preserving the provenance needed by Runtime, Lab, and AIGuard.
Implemented today:
- metadata.json and manifest.json record build identity, source model hash, artifact hash, preset snapshot, backend, target, precision, and shape context (a rough sketch follows this list)
- worker/runtime summary projects build provenance into fields that Lab worker requests and Runtime invocation boundaries can consume
- compare and runtime handoff metadata keeps artifact lineage connected to downstream validation
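For illustration only, the kind of record those files capture can be pictured as the sketch below. The field names are assumptions chosen for readability, not the exact schema Forge writes; the authoritative shape is whatever the repository's metadata.json and manifest.json actually contain.
import json
# Illustrative provenance record; field names are assumptions, not Forge's schema.
metadata_sketch = {
    "source_model": "models/test.onnx",
    "source_sha256": "<sha256 of the ONNX input>",
    "artifact_path": "model.engine",
    "artifact_sha256": "<sha256 of the built engine>",
    "preset": "tensorrt/jetson_fp16",
    "backend": "tensorrt",
    "target": "jetson",
    "precision": "fp16",
    "input_shape": [1, 3, 640, 640],  # shape context recorded with the build
}
print(json.dumps(metadata_sketch, indent=2))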
Planned later:
- automatic SaaS worker execution from Lab jobs
- production worker/queue integration
- broader production deployment controls around artifact promotion
Forge does not decide whether a model should be deployed. InferEdgeLab owns the final deployment_decision; Forge supplies the reproducible build evidence that makes that decision reviewable.
Most edge inference workflows can generate artifacts, but cannot explain or reproduce them reliably.
Moving from an ONNX model to an actual edge deployment artifact is usually messy.
- Build intent often lives in shell history, ad hoc notes, or one-off scripts.
- Artifacts can be regenerated, but it is unclear whether they came from the same recipe.
- Benchmark outputs are hard to trust when they are disconnected from the build context that produced them.
- Teams may keep the artifact, but lose the preset, source fingerprint, output lineage, and handoff state needed to review it later.
In practice, that means ONNX to edge deployment is often treated like a conversion step instead of an experiment system. Once multiple variants exist, traceability breaks down quickly.
A typical flow looks simple:
- Export an ONNX model.
- Run a backend-specific build command.
- Save the artifact somewhere.
- Benchmark it later, often in a separate tool.
The problem is not that these steps are impossible. The problem is that they are weakly connected.
- Reproducibility is fragile because the exact preset/build intent may not be preserved.
- Traceability is fragile because artifact lineage, source hashes, and output context can be lost.
- Benchmark interpretation is fragile because results may not be tied back to a specific build state.
- Comparison readiness is fragile because multiple variants may exist, but there is no structured way to know which ones are ready for compare.
InferEdgeForge builds a reproducible inference experiment system for edge deployment.
That system has a narrow but important responsibility:
- generate deployment artifacts from ONNX using named presets
- preserve build intent and artifact lineage as structured records
- prepare downstream Runtime/Lab handoff context for validation
- keep build, benchmark trace, and experiment state connected enough to review later
The output is not only an artifact. It is an artifact plus the context required to reason about that artifact as part of a deployment experiment.
InferEdgeForge currently provides these implemented capabilities:
- preset-based build abstraction across backends and targets
- structured metadata.json and manifest.json outputs
- source model and artifact SHA-256 fingerprints (see the fingerprint sketch after this list)
- preset snapshots persisted with build records
- InferEdgeLab handoff generation through profile input and command views
- run-benchmark handoff helper for invoking downstream validation commands from stored metadata
- persisted run_summary.json for downstream traceability
- experiment-level build listing across preset variants
- compare-ready candidate discovery
- compare command preview from persisted structured result paths
- manifest sanity validation before Runtime/Lab handoff
- downstream accuracy evidence attachment support via InferEdgeLab enrich flows
- rebuild-from-manifest support, with Jetson rebuild validation recorded as functionally reproducible
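The fingerprint capability reduces to hashing file bytes. Below is a minimal sketch of that idea, assuming plain SHA-256 over the ONNX input and the produced artifact; it shows the concept, not Forge's internal implementation.
import hashlib
from pathlib import Path
def fingerprint(path: str, chunk_size: int = 1 << 20) -> str:
    # Stream the file so large ONNX models and engines need not fit in memory.
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
# Equal fingerprints mean byte-identical files; TensorRT rebuilds usually differ
# at the byte level, which is why reproducibility is reported as functional
# rather than bitwise.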
This project is not only a documented design idea: it has a recorded Jetson validation pass in docs/jetson_validation.md.
Verified on Jetson Orin-class hardware for the baseline COCO YOLOv8n TensorRT flow:
- TensorRT FP16 build succeeded for YOLOv8n
- TensorRT FP32 build succeeded for YOLOv8n
- both variants were benchmarked successfully
- compare handoff was executed and recorded
- the compare result was documented as latency-only
- an accuracy-aware workflow smoke test was also recorded
- rebuild-from-manifest regenerated a runnable and benchmarkable TensorRT engine in a separate output root
What is important here is the level of validation:
- the artifact build path was exercised
- the benchmark handoff path was exercised
- the compare-ready workflow was exercised
- the rebuild path was exercised
Downstream Runtime/Lab validation has also recorded Jetson Evidence Track results for the Forge-generated TensorRT FP16 artifact:
- TensorRT FP16 25W: mean 10.066401 ms, p95 15.476641 ms, p99 15.548438 ms, FPS 99.340373
- TensorRT FP16 15W: mean 10.799106 ms, p95 15.438690 ms, p99 15.529218 ms, FPS 92.600262
Forge remains the build/provenance layer. Runtime owns these execution measurements, and Lab owns the comparison/deployment decision interpretation.
The repository also records an official accuracy-aware TensorRT validation for a Haeundae custom YOLOv8n model:
- source model: models/onnx/yolov8n_haeundae.onnx
- source SHA-256: c99a5563c0c00859d39e2d2c4afc5de7646b96a320ba7e9493d8cc367427d9a5
- validation dataset: 1,657 detection samples, 1 class, RGB 640x640
- TensorRT FP16 and FP32 engines were built, fingerprinted, benchmarked, and handed off for downstream InferEdgeLab validation
- matching detection accuracy payloads were attached to the FP16 and FP32 latency results
- enriched compare output judged FP32 as tradeoff_slower, with latency regressions and neutral accuracy
In that Haeundae validation context, FP16 is the selected TensorRT precision. FP32 increased mean latency from 8.8819 ms to 10.2869 ms and P99 latency from 13.7437 ms to 18.1921 ms, while accuracy deltas were effectively neutral (map50 +0.04pp, map50_95 +0.01pp, f1_score +0.02pp).
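Expressed as relative regressions, those recorded figures work out roughly as follows (simple arithmetic over the numbers above):
# Relative latency regressions implied by the recorded Haeundae FP16 vs FP32 numbers.
fp16_mean, fp32_mean = 8.8819, 10.2869  # ms
fp16_p99, fp32_p99 = 13.7437, 18.1921   # ms
print(f"mean latency: +{(fp32_mean / fp16_mean - 1) * 100:.1f}%")  # about +15.8%
print(f"p99 latency:  +{(fp32_p99 / fp16_p99 - 1) * 100:.1f}%")    # about +32.4%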
What is equally important is what is not being claimed:
- this is not a statement that all TensorRT environments are production-ready
- this is not a statement that the baseline COCO YOLOv8n Jetson compare is accuracy-aware
- this is not a statement of bitwise TensorRT artifact reproducibility
- this is not a claim that the Haeundae FP16 selection generalizes beyond the recorded Jetson Orin environment, Haeundae validation dataset, and custom YOLOv8n model
The baseline COCO YOLOv8n Jetson compare is intentionally scoped as a latency-oriented FP16 vs FP32 result. Accuracy-aware compare wiring has been smoke-tested with existing InferEdgeLab payload examples, but task-matched TensorRT accuracy evidence has not been attached to that baseline COCO FP16/FP32 result.
The Haeundae custom model result is recorded separately as the accuracy-aware validation record. The earlier COCO latency-only result and the RKNN payload smoke test are not mixed into that decision.
InferEdgeForge and InferEdgeLab are paired, but they do different jobs.
InferEdgeForge
- builds deployment artifacts
- records build identity and preset intent
- preserves source and artifact fingerprints
- prepares runtime, benchmark, and compare handoff state
InferEdgeLab
- analyzes Runtime result JSON
- compares structured results
- attaches downstream accuracy evidence when available
- interprets trade-offs during validation
This separation matters. Build generation and deployment evaluation are related, but they should not be collapsed into one opaque tool.
Forge intentionally does not ship the inferedgelab CLI. Commands such as evaluate-detection, enrich-pair, and compare belong to the InferEdgeLab repository/package; Forge only previews or invokes those downstream handoff commands when Lab is available in the environment.
The current workflow is intentionally build-centered and traceable.
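# Preview the build plan with --dry-run before running the real build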
python -m inferedgeforge.cli build \
--model models/test.onnx \
--preset tensorrt/jetson_fp16 \
--output builds \
--dry-run
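# Run the actual build; artifact, metadata.json, and manifest.json land under builds/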
python -m inferedgeforge.cli build \
--model models/test.onnx \
--preset tensorrt/jetson_fp16 \
--output builds
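# Inspect the recorded metadata summary for a finished build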
python -m inferedgeforge.cli inspect-build --summary \
builds/test__jetson__tensorrt__jetson_fp16/metadata.json
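# Hand off to downstream benchmarking using the stored metadata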
python -m inferedgeforge.cli run-benchmark \
builds/test__jetson__tensorrt__jetson_fp16/metadata.json
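# List builds across preset variants in the output directory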
python -m inferedgeforge.cli list-builds --dir builds
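# Discover compare-ready candidates for a given source model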
python -m inferedgeforge.cli show-compare-candidates \
--dir builds \
--model models/test.onnx
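# Preview the downstream compare command for two preset variants (FP16 vs FP32)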
python -m inferedgeforge.cli show-compare-command \
--dir builds \
--model models/test.onnx \
--left tensorrt/jetson_fp16 \
--right tensorrt/jetson_fp32
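# Sanity-check the manifest before Runtime/Lab handoff (no rebuild required)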
python -m inferedgeforge.cli validate-manifest \
--build-dir builds/test__jetson__tensorrt__jetson_fp16
Representative outputs produced by the system:
- metadata.json
- manifest.json
- deployment artifact such as model.engine or model.rknn
- run_summary.json
- compare-ready discovery views across builds
Each build can leave a reviewable record instead of only a binary artifact.
- metadata.json: build identity, source model context, preset snapshot, handoff mapping
- manifest.json: reproducibility-oriented snapshot of the build recipe and artifact context
- worker/runtime summary: compact metadata/manifest projection for Lab worker requests and Runtime invocation planning
- artifact SHA-256: fingerprint of the produced deployment artifact
- source SHA-256: fingerprint of the ONNX input
- run_summary.json: persisted downstream execution trace after run-benchmark
- validate-manifest: build-free sanity check for required Runtime/Lab handoff fields (sketched below)
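The validate-manifest step can be pictured as a required-field presence check before handoff. The sketch below uses assumed field names for illustration; the actual contract is defined in docs/runtime_handoff_contract.md and docs/handoff.md, not here.
import json
from pathlib import Path
# Assumed field names, chosen for illustration; the real handoff contract may differ.
REQUIRED_FIELDS = ("source_sha256", "artifact_sha256", "preset", "backend", "target", "precision")
def missing_handoff_fields(build_dir: str) -> list[str]:
    manifest = json.loads((Path(build_dir) / "manifest.json").read_text())
    return [name for name in REQUIRED_FIELDS if name not in manifest]
missing = missing_handoff_fields("builds/test__jetson__tensorrt__jetson_fp16")
print("manifest looks handoff-ready" if not missing else f"missing fields: {missing}")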
That is the core difference between this project and a plain conversion wrapper. The system preserves enough state to support later inspection, rebuild, and comparison.
This project is meant to demonstrate more than CLI implementation.
This project is not about running models faster. It is about making inference experiments inspectable and reproducible, so that deployment decisions can be justified rather than guessed.
It demonstrates the design of a system that supports:
- experiment traceability: multiple preset variants can be grouped and reviewed as one experiment surface
- reproducibility: build intent, manifests, source fingerprints, and rebuild flows are preserved as explicit records
- deployment decision support: compare-ready handoff and benchmark traces help downstream analysis happen with context intact
That is why this project should be read as an experiment workflow system rather than just a model build utility.
The current system is intentionally honest about its boundaries.
- TensorRT engine hashes are not guaranteed to be bitwise stable across rebuilds
- Jetson rebuild validation currently supports functional reproducibility, not bitwise identity
- baseline COCO YOLOv8n Jetson FP16 vs FP32 compare is still documented as latency-only
- Haeundae custom YOLOv8n accuracy-aware validation is scoped to its recorded model, dataset, thresholds, and Jetson environment
- accuracy evidence depends on external evaluation results or downstream InferEdgeLab enrich flows
- some environment details in the Jetson validation record remain TBD
- backend toolchains remain environment-dependent
- broader device coverage is still open work
Key documentation:
- docs/quickstart.md: practical end-to-end quickstart
- docs/handoff.md: Forge to Lab handoff contract
- docs/runtime_handoff_contract.md: Forge to Runtime artifact and manifest handoff contract
- docs/jetson_validation.md: recorded Jetson validation evidence
- examples/README.md: examples index
- Roadmap.md: implementation status and next steps
InferEdgeForge already has:
- preset-based build orchestration
- structured metadata and manifest output
- SHA-based traceability for source models and artifacts
- Jetson TensorRT real engine generation via trtexec
- benchmark handoff and persisted execution summaries
- compare-ready experiment views
- documented Jetson FP16/FP32 validation evidence
- documented Haeundae custom YOLOv8n TensorRT accuracy-aware validation evidence
- documented manifest-based rebuild validation
It should still be read as a focused and validated foundation, not as a claim that every backend path or deployment workflow is complete.