
InferEdge-Runtime

C++ runtime execution and result export layer
(ONNX Runtime · TensorRT Jetson · latency statistics · Lab-compatible JSON)

Language: English | 한국어


Summary

  • C++ execution layer for the InferEdge validation pipeline
  • Runs ONNX Runtime CPU and Jetson TensorRT benchmark paths
  • Measures latency statistics and FPS from real Runtime executions
  • Exports Lab-compatible result JSON for compare/report/deployment decision flows
  • Preserves Forge manifest source model identity when running built artifacts

What Makes InferEdge-Runtime Different?

InferEdge-Runtime is not a benchmark wrapper.

It is an execution evidence layer that:

  • validates or runs model/artifact inputs at the Runtime boundary
  • records latency, FPS, system, and provenance context
  • exports structured evidence that Lab can compare and review
  • keeps runtime execution separate from Lab's deployment decision policy

Project Overview

InferEdgeRuntime is a C++ Edge AI runtime for on-device inference and benchmarking.

It is the Runtime stage of the InferEdge portfolio pipeline. InferEdgeForge prepares model artifacts, InferEdgeRuntime runs and benchmarks those artifacts on target devices, and InferEdgeLab analyzes the exported result JSON files.

Release Status

InferEdgeRuntime v0.1.0 is a validated MVP release.

  • ONNX Runtime CPU backend: fully functional
  • Benchmark + JSON export: stable
  • Forge/Lab pipeline: integrated through manifest/result JSON and worker boundary contracts
  • TensorRT backend: benchmark execution on Jetson linked builds

InferEdge Pipeline Position

InferEdgeRuntime is the C++ execution/result export layer of the larger InferEdge validation pipeline:

ONNX model
-> InferEdgeForge build
-> metadata / manifest / worker runtime summary
-> InferEdgeRuntime validation / result export
-> InferEdgeLab compare / API / job workflow / deployment_decision
-> optional InferEdgeAIGuard provenance diagnosis
-> deploy / review / blocked decision

In that pipeline, Runtime is responsible for the execution boundary: it validates or runs model/artifact inputs, measures latency, exports Lab-compatible result JSON, and can emit dry-run worker response payloads for Lab integration smoke tests.

Implemented today:

  • ONNX Runtime C++ MVP path and benchmark/result JSON export
  • Jetson TensorRT linked-build benchmark/result JSON export
  • Lab-compatible result fields for compare/report/deployment decision flows
  • Forge metadata/manifest handoff validation
  • manifest source model identity preservation for compare-ready TensorRT engine results
  • Lab worker_request dry-run validation
  • Lab worker completed/failed response dry-run export

Planned later:

  • full worker daemon integration
  • real Lab-triggered Forge/Runtime execution
  • production queue or job runner infrastructure
  • production hardening beyond the current manual/dev linked-build validation path

Runtime does not own comparison policy or final deployment judgement. InferEdgeLab owns deployment_decision, while Runtime supplies trustworthy execution and profiling evidence.

Current Capabilities

  • C++17 + CMake build
  • CLI option validation
  • ONNX Runtime external link configuration
  • ONNX model metadata loading
  • float32 dummy input generation
  • optional OpenCV-based real image input preprocessing
  • ONNX Runtime CPU inference benchmark
  • latency mean/min/max/std/p50/p90/p95/p99
  • FPS calculation
  • JSON result export
  • Lab-compatible top-level fields
  • automatic result naming and results/latest.json handoff
  • limited manifest default application for Forge handoff preparation
  • Lab worker adapter contract fixture/test coverage
  • Lab worker response dry-run export for contract smoke testing
  • TensorRT backend stub for default/non-linked builds
  • TensorRT engine deserialization and metadata extraction on Jetson linked builds
  • TensorRT one-shot dummy inference on Jetson linked builds
  • TensorRT benchmark runner on Jetson linked builds
  • Jetson Evidence Track fields for power mode, jetson_clocks, tegrastats summary, and Lab-compatible result import
  • documented benchmark measurement policy

Current Limitations

  • the default macOS build uses stub backend paths unless the optional ONNX Runtime/TensorRT backends are explicitly linked
  • float32 input only
  • real image preprocessing requires INFEREDGE_ENABLE_OPENCV=ON
  • no TensorRT output post-processing yet
  • float32 TensorRT buffers only at current stage
  • no multi-input advanced dynamic shape support yet
  • OpenCV and CUDA are not linked in the default build
  • manifest parsing is limited to the sample Forge handoff schema
  • no full general-purpose JSON parser yet
  • contract tests and smoke tests cover the current handoff/result schemas; broader backend integration tests remain future work
  • GitHub Actions currently runs default smoke test only
  • ORT linked smoke test remains local/manual because it requires external ONNX Runtime and model files

TensorRT backend execution is implemented for Jetson-oriented linked builds. The current Mac/default build keeps TensorRT as a stub and does not link TensorRT or CUDA. See docs/tensorrt_backend_plan.md for the Jetson Orin Nano implementation plan.

Jetson Evidence Track

The current Jetson Evidence Track has been validated on Jetson Orin Nano through a TensorRT FP16 linked build and Lab-compatible Runtime JSON export. These records are deployment validation evidence, not a production inference server or a trtexec GPU-only benchmark.

Evidence                      Backend           Precision  Power Mode  Mean ms    P95 ms     P99 ms     FPS
TensorRT short smoke          tensorrt__jetson  FP16       25W         10.066401  15.476641  15.548438  99.340373
TensorRT power-mode evidence  tensorrt__jetson  FP16       15W         10.799106  15.438690  15.529218  92.600262

The 15W and 25W outputs include tegrastats-derived context and should be interpreted as different run configurations. InferEdgeLab owns comparison and deployment decision interpretation.

Requirements

  • CMake 3.16+
  • C++17 compiler
  • Optional: ONNX Runtime C/C++ package
  • Optional: OpenCV for --input <image_path> real image preprocessing
  • Apple Silicon users should use the osx-arm64 ONNX Runtime package
  • Optional for Jetson TensorRT link validation:
    • Jetson Orin Nano
    • TensorRT 10.x
    • CUDA runtime
    • NvInfer.h
    • libnvinfer.so
    • libcudart.so

Smoke Test Scripts

Use the smoke scripts before opening a PR or after changing runtime behavior.

CI runs the default smoke test on every push to main and every pull request. The workflow validates build success, CLI execution, and JSON export without external ONNX Runtime dependencies.

Default smoke test:

scripts/smoke_default.sh

This builds the dependency-free target, runs help/version checks, writes results/smoke_default.json, validates the JSON, and confirms the benchmark status is skipped.
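The same check can be reproduced in a few lines of Python for downstream scripting. This is a minimal sketch, assuming only the top-level status field documented in the JSON Result Schema section below; the authoritative check lives in the shell script itself:

import json

# Minimal sketch: confirm the default (non-linked) build exported a result
# and marked the benchmark as skipped. Assumes the top-level "status" field
# documented in the JSON Result Schema section.
with open("results/smoke_default.json") as f:
    result = json.load(f)
assert result["status"] == "skipped", f"unexpected status: {result['status']}"
print("smoke_default.json parsed and status is skipped")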

ONNX Runtime linked smoke test:

scripts/smoke_ort.sh "$HOME/onnxruntime/onnxruntime-osx-arm64-1.25.0" /path/to/model.onnx

This requires a local ONNX Runtime package and a local ONNX model file outside the repository. If macOS blocks the downloaded ONNX Runtime .dylib, use the xattr command in the macOS quarantine note below.

Quickstart: Default Build

The default build does not require ONNX Runtime. It still writes a JSON result, but the benchmark is marked as skipped.

cmake -S . -B build
cmake --build build
./build/inferedge-runtime --model models/sample.onnx --engine onnxruntime --device cpu --batch 1 --height 224 --width 224 --warmup 1 --runs 1 --output results/default_skipped.json

Expected behavior:

  • build succeeds without external runtime dependencies
  • backend availability is false
  • benchmark status is skipped
  • results/default_skipped.json is created

Quickstart: ONNX Runtime Linked Build On Apple Silicon

Keep the ONNX Runtime package outside this repository. Do not vendor ONNX Runtime headers, libraries, or model files into this repo.

Example package location:

~/onnxruntime/onnxruntime-osx-arm64-1.25.0

Build with ONNX Runtime enabled:

cmake -S . -B build-ort -DINFEREDGE_ENABLE_ORT=ON -DINFEREDGE_ORT_ROOT=$HOME/onnxruntime/onnxruntime-osx-arm64-1.25.0
cmake --build build-ort

Build with ONNX Runtime and OpenCV real image input enabled:

cmake -S . -B build-ort-opencv -DINFEREDGE_ENABLE_ORT=ON -DINFEREDGE_ORT_ROOT=$HOME/onnxruntime/onnxruntime-osx-arm64-1.25.0 -DINFEREDGE_ENABLE_OPENCV=ON
cmake --build build-ort-opencv

Run a benchmark with a local ONNX model:

./build-ort/inferedge-runtime --model /path/to/model.onnx --engine onnxruntime --device cpu --batch 1 --height 224 --width 224 --warmup 3 --runs 10 --output results/ort_cpu.json

Record a Forge/build manifest path:

./build-ort/inferedge-runtime --manifest /path/to/manifest.json --model /path/to/model.onnx --engine onnxruntime --device cpu --batch 1 --height 224 --width 224 --warmup 3 --runs 10 --output auto

Auto-named output:

./build-ort/inferedge-runtime --model /path/to/model.onnx --engine onnxruntime --device cpu --batch 1 --height 224 --width 224 --warmup 3 --runs 10 --output auto

Expected behavior:

  • backend availability is true
  • model input/output metadata is printed
  • warmup iterations run before timed runs
  • latency and FPS are printed
  • results/ort_cpu.json is created
  • --output auto writes a structured filename under results/
  • every run also writes results/latest.json

macOS Quarantine Note

Downloaded ONNX Runtime .dylib files can be blocked by macOS quarantine policy. If the linked binary fails to load the ONNX Runtime library, remove the quarantine attribute from the external ONNX Runtime package:

xattr -dr com.apple.quarantine ~/onnxruntime/onnxruntime-osx-arm64-1.25.0

Usage

./build/inferedge-runtime --help
./build/inferedge-runtime --version
./build/inferedge-runtime --model models/sample.onnx --engine onnxruntime --device cpu --batch 1 --height 224 --width 224 --warmup 5 --runs 50 --output results/sample.json
./build-ort/inferedge-runtime --manifest /path/to/manifest.json --model /path/to/model.onnx --engine onnxruntime --device cpu --batch 1 --height 224 --width 224 --warmup 3 --runs 10 --output auto

CLI notes:

  • --manifest loads limited defaults from the Forge/build manifest schema.
  • Lab worker adapter planning is documented in docs/lab_worker_adapter_contract.md.
  • --lab-worker-request <path> --validate-lab-worker-request validates Lab worker request JSON and exits without inference.
  • Forge summary-origin worker request compatibility is covered by tests/fixtures/forge_summary_worker_request.json.
  • --lab-worker-request <path> --export-worker-response <path> --worker-response-status completed|failed writes a Lab worker response contract payload without inference.
  • CLI-provided values always take priority over manifest defaults.
  • --batch, --height, and --width resolve dynamic dummy input dimensions.
  • --input uses a real image input instead of dummy zeros when OpenCV support is enabled.
  • Static model dimensions take precedence over CLI shape overrides.
  • --warmup controls untimed warmup iterations.
  • --runs controls timed iterations used for latency and FPS statistics.
  • --run-once runs one inference without benchmark timing.
  • --output writes the benchmark result JSON and creates missing output directories.

Real Inference Mode

--input <image_path> enables real image input mode. In this mode, Runtime loads the image with OpenCV, converts BGR to RGB, resizes it to the resolved model input size, normalizes values to 0.0..1.0, and writes a float32 NCHW tensor with shape [batch, 3, height, width].
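The preprocessing itself is implemented in C++ inside Runtime. For reference, here is a Python/OpenCV sketch of the same steps; the function name and NumPy usage are illustrative only:

import cv2
import numpy as np

def preprocess(image_path: str, height: int, width: int, batch: int = 1) -> np.ndarray:
    # Sketch of the documented steps: load with OpenCV (BGR), convert to RGB,
    # resize to the resolved model input size, scale to 0.0..1.0, and emit a
    # float32 NCHW tensor with shape [batch, 3, height, width].
    bgr = cv2.imread(image_path)
    if bgr is None:
        raise ValueError(f"could not load image: {image_path}")
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    resized = cv2.resize(rgb, (width, height))  # cv2.resize takes (width, height)
    chw = resized.transpose(2, 0, 1).astype(np.float32) / 255.0
    return np.repeat(chw[np.newaxis, ...], batch, axis=0)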

The default build remains dependency-free. Real image input requires an OpenCV-enabled build:

cmake -S . -B build-ort-opencv -DINFEREDGE_ENABLE_ORT=ON -DINFEREDGE_ORT_ROOT=$HOME/onnxruntime/onnxruntime-osx-arm64-1.25.0 -DINFEREDGE_ENABLE_OPENCV=ON
cmake --build build-ort-opencv

ONNX Runtime example:

./build-ort-opencv/inferedge-runtime --model yolov8n.onnx --input test.jpg --engine onnxruntime --device cpu --batch 1 --height 640 --width 640 --run-once --output results/real_input_onnx.json

TensorRT linked builds use the same --input path to fill TensorRT input buffers when OpenCV support is enabled. If --input is provided without OpenCV support, Runtime fails with a clear configuration error instead of silently falling back to dummy input.

Runtime records input mode metadata under JSON extra:

  • input_mode: dummy or image
  • input_path: the provided image path, or an empty string for dummy mode
  • input_preprocess: opencv_bgr_to_rgb_resize_float32_nchw or dummy_zero_float32

TensorRT stub example:

./build/inferedge-runtime --model models/sample.engine --engine tensorrt --device jetson --batch 1 --height 640 --width 640 --warmup 1 --runs 1 --output results/tensorrt_stub.json

This command does not execute TensorRT. In the default build, the TensorRT stub reports available=false and creates a skipped benchmark JSON result.

Jetson TensorRT one-shot check build:

cmake -S . -B build-trt -DINFEREDGE_ENABLE_TENSORRT=ON
cmake --build build-trt
./build-trt/inferedge-runtime --model /home/risenano01/InferEdgeForge/builds/yolov8n__jetson__tensorrt__jetson_fp16/model.engine --engine tensorrt --device jetson --batch 1 --height 640 --width 640 --run-once --output results/tensorrt_run_once.json

When TensorRT and CUDA headers/libraries are found, the TensorRT backend reports available=true, deserializes the .engine file, records input/output metadata, allocates float32 dummy host/device buffers, and executes one inference through TensorRT. For the current Forge YOLOv8n TensorRT engine, the expected metadata includes an input named images and an output named output0.

Jetson TensorRT benchmark:

./build-trt/inferedge-runtime \
  --model /home/risenano01/InferEdgeForge/builds/yolov8n__jetson__tensorrt__jetson_fp16/model.engine \
  --engine tensorrt \
  --device jetson \
  --power-mode 15W \
  --jetson-clocks on \
  --tegrastats-log results/tegrastats_yolov8n_trt_fp16_15w.log \
  --batch 1 \
  --height 640 \
  --width 640 \
  --warmup 10 \
  --runs 50 \
  --output results/tensorrt_benchmark.json

Expected benchmark behavior:

  • engine.available=true
  • status=success
  • mean_ms > 0
  • p99_ms > 0
  • fps_value > 0
  • model_metadata.inputs contains images
  • model_metadata.outputs contains output0

Jetson evidence fields are optional CLI inputs used to preserve validation context before importing results into InferEdgeLab:

  • --power-mode: records the Jetson power mode label, such as 15W, 25W, or MAXN
  • --jetson-clocks: records the observed jetson_clocks state, such as on, off, or unknown
  • --tegrastats-log: records and parses a tegrastats log into jetson_evidence.tegrastats_summary

These fields prepare the Runtime result for Jetson Evidence Track validation. They do not imply that every TensorRT/GPU benchmark or INT8 calibration path is complete.
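As an illustration of what a tegrastats summary can capture, the sketch below extracts GPU load samples from a log in Python. The real parser is C++ inside Runtime, and tegrastats line formats vary across JetPack releases, so the GR3D_FREQ pattern here is an assumption:

import re

def summarize_tegrastats(log_path: str) -> dict:
    # Hypothetical parser sketch: collect GPU load samples from lines that
    # contain a field like "GR3D_FREQ 45%". Status values mirror the
    # tegrastats_status values documented in the JSON Result Schema section.
    gpu_loads = []
    with open(log_path) as f:
        for line in f:
            match = re.search(r"GR3D_FREQ (\d+)%", line)
            if match:
                gpu_loads.append(int(match.group(1)))
    if not gpu_loads:
        return {"tegrastats_status": "no_samples"}
    return {
        "tegrastats_status": "parsed",
        "gr3d_freq_mean_pct": sum(gpu_loads) / len(gpu_loads),
        "gr3d_freq_max_pct": max(gpu_loads),
    }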

Jetson evidence can also be exported as Markdown for portfolio/review handoff:

./build/inferedge-runtime \
  --report-jetson-evidence \
  --result-json tests/fixtures/jetson_tensorrt_25w_result.json \
  --tegrastats-log tests/fixtures/tegrastats_sample.log \
  --report-output reports/jetson_evidence_summary.md

./build/inferedge-runtime \
  --compare-power-modes \
  --base-result tests/fixtures/jetson_tensorrt_25w_result.json \
  --candidate-result tests/fixtures/jetson_tensorrt_15w_result.json \
  --report-output reports/jetson_power_mode_comparison.md

The Markdown reports summarize Runtime JSON and tegrastats evidence only. InferEdgeLab remains responsible for comparison policy and deployment decision interpretation.


Benchmark Interpretation

InferEdgeRuntime measures end-to-end inference latency. The reported latency_ms values include memory transfer and synchronization overhead in addition to backend execution.

Do not directly compare InferEdgeRuntime TensorRT latency with trtexec GPU latency. trtexec reports lower-level metrics such as GPU latency, Host latency, enqueue time, and H2D/D2H latency separately. InferEdgeRuntime currently reports a deployment-oriented wall-clock latency, so it is normal for Runtime latency to be larger than trtexec GPU latency.

This makes InferEdgeRuntime results more representative of the simple runtime path used for deployment and downstream InferEdgeLab comparison. See docs/benchmark_policy.md for the full measurement policy.
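For orientation, the sketch below restates that policy in Python: untimed warmup, wall-clock timing around each run, percentiles over the sorted samples, and FPS derived from mean latency. The percentile indexing and FPS definition are illustrative; docs/benchmark_policy.md and the C++ implementation are authoritative:

import statistics
import time

def benchmark(run_inference, warmup: int, runs: int) -> dict:
    # Untimed warmup iterations, then wall-clock timing around each run.
    # The measured window intentionally includes transfer/sync overhead.
    for _ in range(warmup):
        run_inference()
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    samples_ms.sort()
    # Nearest-rank style percentile; Runtime's exact rule may differ.
    pct = lambda p: samples_ms[min(len(samples_ms) - 1, int(p / 100 * len(samples_ms)))]
    mean = statistics.fmean(samples_ms)
    return {
        "mean_ms": mean, "min_ms": samples_ms[0], "max_ms": samples_ms[-1],
        "std_ms": statistics.pstdev(samples_ms),
        "p50_ms": pct(50), "p90_ms": pct(90), "p95_ms": pct(95), "p99_ms": pct(99),
        "fps_value": 1000.0 / mean if mean > 0 else 0.0,  # one common FPS definition
    }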

Output modes:

  • --output results/foo.json: writes to an explicit path.
  • --output auto: writes to an auto-generated filename under results/.

Auto filename rule:

{model}__{engine}__{device}__{precision}__b{batch}__h{height}w{width}__{timestamp}.json

Example:

toy224__onnxruntime__cpu__fp32__b1__h224w224__20260426T115825Z.json
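A Python sketch of the same rule, for scripts that need to predict or reproduce the filename (the C++ result writer is authoritative):

from datetime import datetime, timezone

def auto_result_name(model: str, engine: str, device: str, precision: str,
                     batch: int, height: int, width: int) -> str:
    # Mirrors the documented auto filename rule, including the UTC timestamp.
    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return (f"{model}__{engine}__{device}__{precision}"
            f"__b{batch}__h{height}w{width}__{ts}.json")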

Every run also writes the same JSON content to results/latest.json. This stable file is useful for quick handoff to InferEdgeLab or small scripts that only need the most recent result.
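A small consumer only needs that stable path. A minimal sketch, using top-level fields from the schema below:

import json

# Read the most recent Runtime result through the stable handoff file.
latest = json.load(open("results/latest.json"))
print(latest["compare_key"], latest["backend_key"], latest["mean_ms"], latest["fps_value"])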

JSON Result Schema

Runtime JSON results include nested structured fields for detailed reporting and top-level compatibility fields for quick comparison. This output is the Runtime side of the Forge -> Runtime -> Lab contract documented by InferEdgeLab, and is intended to be consumed by Lab compare, report, and deployment decision flows.

Main nested fields:

  • schema_version
  • manifest_path
  • model
  • engine
  • device
  • run_config
  • latency_ms
  • fps
  • benchmark
  • timestamp
  • system
  • jetson_evidence
  • model_metadata
  • extra

The extra object includes:

  • runtime
  • json_export
  • output_mode: auto or explicit
  • latest_path: currently results/latest.json
  • manifest_recorded: true when --manifest was provided, otherwise false
  • manifest_precision: recorded from artifact.precision
  • manifest_format: recorded from artifact.format
  • power_mode: optional Jetson power mode label
  • jetson_clocks: optional jetson_clocks state
  • tegrastats_log_path: optional source log path for thermal/power evidence
  • tegrastats_status: not_provided, parsed, unavailable, or no_samples
  • compare_ready: currently true
  • compare_key
  • backend_key
  • compare_model_source: manifest_source_model or model_path
  • compare_model_name: normalized model component used by compare_key

Top-level compatibility fields:

  • compare_key
  • backend_key
  • runtime_role
  • model_name
  • manifest_path
  • model_path
  • engine_name
  • engine_backend
  • device_name
  • batch
  • height
  • width
  • mean_ms
  • p50_ms
  • p95_ms
  • p99_ms
  • fps_value
  • success
  • status

The schema regression fixture lives at tests/fixtures/lab_compatible_result.json, and tests/test_lab_result_schema.py validates both the fixture and smoke-generated Runtime JSON.
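A compressed sketch of the kind of guard that test expresses, assuming only the top-level field list above (the real assertions live in tests/test_lab_result_schema.py):

import json

REQUIRED_TOP_LEVEL = [
    "compare_key", "backend_key", "runtime_role", "model_name",
    "engine_name", "device_name", "batch", "height", "width",
    "mean_ms", "p50_ms", "p95_ms", "p99_ms", "fps_value", "success", "status",
]

def check_lab_compat(path: str) -> None:
    # Every Lab compatibility field must exist at the top level so Lab can
    # compare results without deep nested parsing.
    result = json.load(open(path))
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in result]
    assert not missing, f"missing Lab compatibility fields: {missing}"

check_lab_compat("tests/fixtures/lab_compatible_result.json")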

See examples/README.md for command examples and compact JSON field notes.

Forge Handoff Input Preparation

Runtime can now record a manifest path produced by Forge or another build stage and apply a limited set of manifest values as default runtime config. It can also validate Forge metadata.json / manifest.json handoff fixtures without executing an artifact. CLI-provided values always take priority over handoff defaults.
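The precedence rule can be sketched in a few lines. This illustrates the documented behavior, not the C++ implementation; treating None as "not provided on the CLI" is an assumption of the sketch:

def resolve_config(cli_values: dict, manifest_defaults: dict) -> dict:
    # Start from manifest defaults, then let any explicitly provided CLI
    # value win. A manifest default is applied only when the same key was
    # not provided directly on the CLI.
    resolved = dict(manifest_defaults)
    resolved.update({k: v for k, v in cli_values.items() if v is not None})
    return resolved

# resolve_config({"device": "cpu", "batch": None}, {"device": "jetson", "batch": 1})
# -> {"device": "cpu", "batch": 1}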

Sample manifest:

  • examples/manifest.sample.json
  • tests/fixtures/forge_handoff_manifest.json
  • tests/fixtures/forge_handoff_metadata.json

Current behavior:

  • Runtime records the --manifest path in the result JSON.
  • Runtime reads limited defaults from examples/manifest.sample.json and Forge manifest.json style handoffs.
  • Runtime can read Forge metadata.json with --forge-metadata.
  • Runtime applies handoff defaults only when the same value was not provided directly by CLI.
  • --validate-forge-handoff parses and validates the handoff input, then exits before execution.

Applied manifest fields:

  • artifact.model_path
  • artifact.path
  • lab_compat.runtime.runtime_artifact_path
  • runtime.engine
  • runtime.device
  • runtime.precision
  • runtime.batch
  • runtime.height
  • runtime.width

Recorded-only manifest fields:

  • artifact.precision
  • artifact.format
  • artifact.sha256
  • source_model.sha256
  • build.preset_name
  • build.build_id

Compare-key manifest fields:

  • source_model.path
  • artifact.model_name as a fallback when source_model.path is absent

Not applied yet:

  • warmup
  • runs
  • output
  • arbitrary metadata

Default build example:

./build/inferedge-runtime --manifest examples/manifest.sample.json --output auto

Forge handoff validation examples:

./build/inferedge-runtime --forge-manifest tests/fixtures/forge_handoff_manifest.json --validate-forge-handoff
./build/inferedge-runtime --forge-metadata tests/fixtures/forge_handoff_metadata.json --validate-forge-handoff

The sample manifest uses /path/to/model.onnx as a placeholder. For a real run, either edit a local manifest outside the repository to point at a real model or override the model path from the CLI.

CLI override example:

./build-ort/inferedge-runtime --manifest examples/manifest.sample.json --model /Users/GwonHyeokJun/Desktop/edgebench/models/toy224.onnx --batch 1 --height 224 --width 224 --output auto

Draft manifest schema direction:

schema_version: inferedge-forge-manifest-v1
artifact:
  model_path: /path/to/model.onnx
  model_name: toy224.onnx
  precision: fp32
  format: onnx
runtime:
  engine: onnxruntime
  device: cpu
  batch: 1
  height: 224
  width: 224
metadata:
  source: InferEdgeForge
  created_at: 2026-04-26T12:00:00Z
  notes: optional build notes

InferEdgeLab Compatibility

Runtime JSON results include both nested structured fields and top-level compatibility fields.

The nested fields are intended for detailed reports and future schema expansion. The top-level compatibility fields are intended for quick comparison in InferEdgeLab and EdgeBench-style loaders without deep nested parsing.

Runtime does not perform comparison calculations. It only writes compare-ready metadata that Lab can consume:

  • compare_key: groups results from the same model and input condition, such as toy224__b1__h224w224__fp32
  • backend_key: identifies the backend/device pair, such as onnxruntime__cpu or tensorrt__jetson
  • runtime_role: fixed to runtime-result
  • top-level latency aliases: mean_ms, p50_ms, p95_ms, and p99_ms

The model component of compare_key prefers manifest source_model.path when available, then falls back to the CLI --model path stem. This lets TensorRT artifacts with generic filenames such as model.engine still produce a source-model-specific key like yolov8n__b1__h640w640__fp32 when Forge supplies source model identity.
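A sketch of the documented key rule (the C++ result writer is authoritative; normalization details such as stem extraction are assumptions of the sketch):

from pathlib import Path

def compare_keys(model_path, engine, device, precision, batch, height, width,
                 manifest_source_model=None):
    # Prefer the Forge manifest source_model.path stem, then fall back to
    # the CLI --model path stem.
    model = Path(manifest_source_model or model_path).stem
    compare_key = f"{model}__b{batch}__h{height}w{width}__{precision}"
    backend_key = f"{engine}__{device}"
    return compare_key, backend_key

# compare_keys("builds/yolov8n__jetson__tensorrt__jetson_fp16/model.engine",
#              "tensorrt", "jetson", "fp32", 1, 640, 640,
#              manifest_source_model="models/yolov8n.onnx")
# -> ("yolov8n__b1__h640w640__fp32", "tensorrt__jetson")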

InferEdgeLab can compare results that share the same compare_key and use backend_key to distinguish backend/device variants.

Forge -> Runtime -> Lab flow:

  1. Forge builds or exports model artifacts and provenance.
  2. Runtime runs ONNX Runtime or Jetson TensorRT linked-build benchmarks and writes Lab-compatible JSON results.
  3. Lab reads JSON results and owns comparison, reporting, API/job workflow, and deployment decision output.

Repository Layout

.
├── CMakeLists.txt
├── CHANGELOG.md
├── include/
│   └── inferedge_runtime/
│       ├── cli.hpp
│       ├── engine.hpp
│       ├── manifest.hpp
│       ├── result_writer.hpp
│       ├── version.hpp
│       └── engines/
│           ├── onnxruntime_engine.hpp
│           └── tensorrt_engine.hpp
├── src/
│   ├── cli.cpp
│   ├── engine.cpp
│   ├── main.cpp
│   ├── manifest.cpp
│   ├── result_writer.cpp
│   └── engines/
│       ├── onnxruntime_engine.cpp
│       └── tensorrt_engine.cpp
├── scripts/
│   ├── smoke_default.sh
│   └── smoke_ort.sh
├── docs/
│   ├── benchmark_policy.md
│   ├── mvp_validation.md
│   └── tensorrt_backend_plan.md
├── examples/
│   └── README.md
└── tests/
    └── README.md

Roadmap

  • CLI skeleton
  • Backend interface and ONNX Runtime stub backend
  • ONNX Runtime C++ link configuration
  • ONNX model metadata loading
  • ONNX Runtime dummy inference
  • Benchmark runner
  • JSON result export
  • Lab-compatible JSON fields
  • Scripted smoke tests
  • GitHub Actions CI smoke tests
  • Auto result naming and latest.json handoff
  • Manifest path recording for Forge handoff preparation
  • Example Forge manifest
  • Forge manifest parsing and config default application
  • Robust manifest parser or external JSON dependency decision
  • TensorRT backend stub
  • TensorRT backend implementation plan
  • TensorRT CMake link validation
  • TensorRT engine deserialization on Jetson
  • TensorRT metadata extraction
  • TensorRT one-shot inference
  • TensorRT benchmark runner on Jetson
  • Optional real image input inference mode
  • TensorRT output post-processing
  • TensorRT/ONNX Runtime comparison through InferEdgeLab demo evidence
  • InferEdgeLab direct import workflow through Local Studio / Lab result ingest

Version

Current version: v0.1.0 (MVP)

See CHANGELOG.md for details.
