End-to-end Edge AI inference validation pipeline
(C++ runtime · Jetson execution · validation · deployment decision)
Language: English | Korean (한국어)
GitHub description: Analysis/API layer for end-to-end Edge AI inference validation, reports, jobs, and deployment decisions.
- End-to-end validation pipeline: Forge -> Runtime -> Lab -> optional AIGuard
- Real device execution: Jetson TensorRT + ONNX Runtime CPU
- Structured comparison: latency, accuracy, and validation evidence
- Deployment decision: deployable / review / blocked
- Local Studio: interactive workflow UI for inference validation
InferEdge is not a benchmark tool.
It is a validation pipeline that:
- runs real inference on edge devices
- evaluates accuracy and output validity
- detects anomalies and contract violations
- produces deployment-ready decisions
InferEdge is organized as one product-style Edge AI inference validation pipeline:
ONNX model
-> InferEdgeForge build
-> metadata / manifest / worker runtime summary
-> InferEdgeRuntime validation / result export
-> InferEdgeLab compare / API / job workflow / deployment_decision
-> optional InferEdgeAIGuard provenance diagnosis
-> deploy / review / blocked decision
Repository roles are deliberately split:
- InferEdgeForge: build artifact and provenance generation.
- InferEdgeRuntime: C++ execution, profiling, result export, and worker response boundary.
- InferEdgeLab: compare/report/API/job workflow and final deployment decision ownership.
- InferEdgeAIGuard: optional rule + evidence based failure and provenance diagnosis.
Implemented today:
- Lab API response contract
- `/api/compare` and `/api/analyze` in-memory jobs
- worker request/response mappings
- Runtime dry-run validation/export
- Forge worker/runtime summary
- AIGuard provenance mismatch diagnosis
- Lab decision/report evidence smoke coverage
- dev-only Lab -> Runtime ONNX Runtime smoke using yolov8n.onnx
- manual Jetson TensorRT Runtime smoke using a Forge manifest plus TensorRT engine artifact
- Runtime source-model identity preservation for compare-ready TensorRT engine results
Runtime identity polish: when a Forge manifest is applied, Runtime now preserves the manifest `source_model.path` identity for comparison naming. A TensorRT artifact such as `model.engine` can therefore keep `compare_model_name=yolov8n` and `compare_key=yolov8n__b1__h640w640__fp32` instead of degrading to `model__...`. This is provenance/compare-readiness polish, not production SaaS infrastructure.
Not implemented yet: real worker daemon, full automated Forge/Runtime execution from production Lab workers, DB/Redis/queue, file upload, production frontend beyond Local Studio, and production auth/billing/deployment controls.
Portfolio entry points: portfolio submission · resume/interview summary · 1-page architecture summary · pipeline status
Interview one-liner: InferEdge is an end-to-end inference validation pipeline that converts, runs, compares, diagnoses, and decides whether an edge AI model candidate is ready to deploy.
YOLOv8n is validated through the current Local Studio evidence fixtures and Jetson Evidence Track result JSONs.
InferEdgeRuntime generates compare-ready JSON results, and InferEdgeLab groups and compares them by compare_key, backend_key, precision, and run context.
| Evidence | Backend | Precision | Power Mode | Mean ms | P95 ms | P99 ms | FPS |
|---|---|---|---|---|---|---|---|
| Local Studio baseline | ONNX Runtime CPU | FP32 | n/a | 45.4299 | n/a | 49.2128 | 22.0119 |
| Local Studio candidate | TensorRT Jetson | FP16 | 25W | 10.066401 | 15.476641 | 15.548438 | 99.340373 |
| Jetson power-mode evidence | TensorRT Jetson | FP16 | 15W | 10.799106 | 15.438690 | 15.529218 | 92.600262 |
The current Local Studio demo shows TensorRT Jetson FP16 25W as about 4.51x faster than the ONNX Runtime CPU FP32 baseline.
The Jetson 15W/25W comparison is tracked as system evidence because power mode changes the run configuration.
These measurements use InferEdgeRuntime end-to-end Runtime latency, not trtexec GPU-only latency.
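The reported speedup follows directly from the mean latencies in the table above; a minimal check (values copied from the evidence rows, the helper name is illustrative):

```python
def speedup(baseline_mean_ms: float, candidate_mean_ms: float) -> float:
    """Ratio of baseline mean latency to candidate mean latency."""
    return baseline_mean_ms / candidate_mean_ms

# Mean latencies from the evidence table above.
onnx_cpu_fp32 = 45.4299      # ONNX Runtime CPU FP32 mean ms
trt_jetson_fp16 = 10.066401  # TensorRT Jetson FP16 25W mean ms

print(round(speedup(onnx_cpu_fp32, trt_jetson_fp16), 2))  # → 4.51
```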
The full pipeline portfolio summary is available at docs/portfolio/inferedge_pipeline_portfolio.md, and the detailed Runtime comparison report is available at docs/portfolio/runtime_compare_yolov8n.md.
The final local-first validation completion pass is summarized in docs/portfolio/final_validation_completion.md.
The YOLOv8 COCO subset accuracy demo is documented in docs/portfolio/yolov8_coco_subset_evaluation.md.
Validation problem cases are documented in docs/portfolio/validation_problem_cases.md.
InferEdge Local Studio is a local-first browser interface for inspecting the existing CLI workflow, API/job contracts, Runtime evidence, Compare View, Jetson command helper, and Lab-owned deployment decision structure. It runs on the user's machine through the FastAPI server and is intended as a local workflow UI foundation, not a production SaaS or cloud dashboard.
InferEdge Local Studio can replay the bundled portfolio evidence without requiring a live Jetson device during an interview walkthrough.
The Load Demo Evidence flow imports the ONNX Runtime CPU and TensorRT Jetson Runtime JSON fixtures from examples/studio_demo, refreshes Compare View, and keeps the demo pair selectable in Recent jobs while the local server process is running.
Recommended demo flow:
- Run `poetry run inferedgelab serve --host 127.0.0.1 --port 8000`
- Open `http://localhost:8000/studio`
- Click `Load Demo Evidence`
- Review the TensorRT vs ONNX Runtime comparison and deployment decision context
The same evidence can be exported from the CLI without opening the browser:
```shell
poetry run inferedgelab demo-evidence-summary
poetry run inferedgelab demo-evidence-summary --format json
poetry run inferedgelab portfolio-demo-check
poetry run inferedgelab export-demo-evidence --output reports/studio_demo_evidence.md
```

`portfolio-demo-check` is the pre-submission guardrail for this portfolio demo.
It validates the committed Studio fixtures, expected README/PPT metrics, portfolio docs, and local Studio assets without starting workers, queues, databases, or a production SaaS service.
Verified demo fixture values:
| Backend | Device | Precision | Power Mode | Mean ms | P95 ms | P99 ms | FPS | Compare Key |
|---|---|---|---|---|---|---|---|---|
| ONNX Runtime | CPU | FP32 | n/a | 45.4299 | n/a | 49.2128 | 22.0119 | yolov8n__b1__h640w640__fp32 |
| TensorRT | Jetson | FP16 | 25W | 10.066401 | 15.476641 | 15.548438 | 99.340373 | yolov8n__b1__h640w640__fp16 |
Studio reports this as about a 4.51x TensorRT speedup for the bundled demo pair.
AIGuard remains optional in this local Studio path; if Guard evidence is not loaded, the deployment decision explains that the Lab comparison is available but diagnosis evidence is not provided.
The same demo flow also surfaces a small yolov8_coco evaluation report summary: 10 images, 89 ground-truth boxes, mAP@50 0.1410, precision 0.2941, recall 0.1685, structural validation passed.
It also includes problem-case summaries for annotation-missing review, invalid detection structure blocking, contract shape mismatch blocking, and latency regression review.
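The precision and recall in the subset report relate through the usual detection counts; the counts below are a reconstruction consistent with the reported numbers (15 true positives over 51 predictions against the 89 ground-truth boxes), not values stated in the report itself:

```python
# Reconstructed counts consistent with the reported metrics
# (precision 0.2941 ≈ 15/51, recall 0.1685 ≈ 15/89); illustrative only.
tp, predictions, gt_boxes = 15, 51, 89

precision = tp / predictions
recall = tp / gt_boxes
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 4), round(recall, 4), round(f1, 4))  # → 0.2941 0.1685 0.2143
```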
What works today:
- Run creates an in-memory analyze job through the existing `/api/analyze` contract.
- Import accepts a Runtime result JSON path or pasted JSON payload and adds it to the in-memory compare-ready evidence set.
- Load Demo Evidence imports the bundled ONNX Runtime CPU and TensorRT Jetson fixtures for a stable browser demo.
- Compare View shows TensorRT vs ONNX Runtime mean latency, p99, FPS, latency diff, and speedup when compatible evidence is loaded.
- Jetson Helper shows the local command shape for running the Runtime on a Jetson device.
- Deployment Decision stays Lab-owned; AIGuard is optional deterministic diagnosis evidence.
Current non-goals remain unchanged: no DB, queue, upload service, production auth, billing, or production SaaS worker orchestration. Jobs and imported Studio evidence are in-memory and reset when the local server process restarts.
For a quick review, follow this order:
- Read the pipeline summary: docs/portfolio/inferedge_pipeline_portfolio.md
- Check the real benchmark result: docs/portfolio/runtime_compare_yolov8n.md
- Review the current submission draft: docs/portfolio/inferedge_portfolio_submission.md
- Run Lab comparison with `compare-runtime-dir` if local InferEdgeRuntime JSON artifacts are available.
Raw Runtime JSON and generated benchmark reports are intentionally not committed because they are environment-dependent. Instead, this README and the portfolio documents preserve validated benchmark numbers as stable review evidence.
```mermaid
graph LR
  A["InferEdgeForge<br/>Build / Convert / Manifest"] --> B["InferEdgeRuntime<br/>Run Inference / Benchmark / JSON Export"]
  B --> C["InferEdgeLab<br/>Group / Compare / Report"]
  C --> D["Portfolio Report<br/>Markdown / PDF Draft"]
```
Runtime measures. Lab compares. Portfolio documents explain the evidence.
This is a compact example of the structured result shape that InferEdgeRuntime exports and InferEdgeLab groups by compare_key and backend_key.
```json
{
  "compare_key": "yolov8n__b1__h640w640__fp16",
  "backend_key": "tensorrt__jetson",
  "mean_ms": 10.066401,
  "p95_ms": 15.476641,
  "p99_ms": 15.548438,
  "fps_value": 99.340373,
  "success": true,
  "status": "success",
  "run_config": {
    "power_mode": "25W",
    "jetson_clocks": "on"
  },
  "extra": {
    "input_mode": "dummy",
    "precision": "fp16",
    "power_mode": "25W"
  }
}
```

Most benchmark comparisons silently differ in batch size, input shape, or precision, leading to false improvements and missed regressions.
InferEdgeLab stores run_config and input shape as structured metadata and enforces same-condition comparison, explicitly separating same-precision and cross-precision semantics.
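The `compare_key` naming scheme visible in the Runtime results encodes exactly these conditions; a sketch that mirrors the observed format (the real builder lives in the Runtime/Lab code, this reconstruction only reproduces the visible pattern):

```python
def build_compare_key(model: str, batch: int, height: int,
                      width: int, precision: str) -> str:
    """Mirror of the compare_key format seen in Runtime results,
    e.g. yolov8n__b1__h640w640__fp32. Illustrative reconstruction."""
    return f"{model}__b{batch}__h{height}w{width}__{precision}"

# Two results compare only if every condition in the key matches.
print(build_compare_key("yolov8n", 1, 640, 640, "fp32"))  # → yolov8n__b1__h640w640__fp32
```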
Switching FP32 → INT8 changes both latency and accuracy, but most tools only show raw numbers.
InferEdgeLab computes latency delta + accuracy delta together and classifies the result:
- `acceptable_tradeoff`
- `caution_tradeoff`
- `risky_tradeoff`
- `severe_tradeoff`
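A minimal sketch of such a classifier is shown below; the band thresholds are assumptions made for illustration, not InferEdgeLab's actual rules, which live in its service layer:

```python
def classify_tradeoff(latency_improvement_pct: float,
                      accuracy_drop_pp: float) -> str:
    """Map a latency/accuracy delta pair to one of the four labels above.
    Thresholds are illustrative assumptions, not the project's real bands."""
    if latency_improvement_pct <= 0:
        # No speed benefit: any accuracy loss is a severe trade-off.
        return "severe_tradeoff" if accuracy_drop_pp > 0 else "acceptable_tradeoff"
    if accuracy_drop_pp <= 0.5:
        return "acceptable_tradeoff"
    if accuracy_drop_pp <= 2.0:
        return "caution_tradeoff"
    if accuracy_drop_pp <= 5.0:
        return "risky_tradeoff"
    return "severe_tradeoff"

print(classify_tradeoff(50.9, 0.3))  # → acceptable_tradeoff
```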
Typical benchmarking is one-time execution with no structured storage.
InferEdgeLab saves all results as structured JSON, enabling `compare`, `compare-latest`, and `history-report`, reused across CLI, FastAPI, and CI pipelines.
CLI / API → Service Layer → Structured Result → Compare / Report
CLI Layer: profile, compare, compare-latest, summarize, list-results, history-report, enrich, serve
Service Layer: reusable validation logic
API Adapter Layer: FastAPI read-only endpoints
Engine Layer: ONNX Runtime CPU · TensorRT (Jetson) · RKNN (Odroid)
InferEdgeLab treats model evaluation as a contract/preset-based validation workflow, not as a claim that any arbitrary model can be automatically scored without context.
evaluate-detection now supports the yolov8_coco preset, optional model_contract.json, COCO annotations, YOLO txt labels, structural detection-output validation, and JSON/Markdown/HTML evaluation reports.
Metric evaluation defaults to the lightweight `--metric-backend simplified` path and can explicitly request `--metric-backend pycocotools` when the optional pycocotools package is installed.
When annotations are not provided, accuracy is explicitly marked as skipped and the report records structural validation only.
Planned presets such as resnet_imagenet and custom_contract keep future evaluation work scoped to explicit model contracts and dataset assumptions.
Small normal/problem contract fixtures live under examples/validation_demo/.
InferEdgeLab was validated on real edge hardware using YOLOv8 models.
InferEdgeLab can now consume externally produced Jetson TensorRT latency results and engine artifacts, generate Haeundae YOLOv8n detection accuracy payloads with evaluate-detection, attach them through enrich-pair, and report an accuracy-aware FP16 vs FP32 comparison.
In the recorded downstream comparison, FP16 was 8.8819ms mean / 13.7437ms p99 with 0.8037 mAP@50, while FP32 was 10.2869ms mean / 18.1921ms p99 with 0.8041 mAP@50; the Lab judgement was tradeoff_slower / not_beneficial.
| Model | Precision | Mean Latency (ms) | P99 (ms) | Observation |
|---|---|---|---|---|
| YOLOv8n | FP16 | 72.4430 | 79.1559 | enriched runtime baseline |
| YOLOv8n | INT8 | 35.5771 | 45.3868 | -50.89% latency, acceptable_tradeoff |
| YOLOv8s | FP16 | 85.8169 | 109.4198 | enriched runtime baseline |
| YOLOv8s | INT8 | 49.9623 | 58.6213 | -41.78% latency, acceptable_tradeoff |
| YOLOv8m | FP16 | 171.9906 | 192.6720 | enriched runtime baseline |
| YOLOv8m | INT8 | 87.8136 | 111.5943 | -48.94% latency, acceptable_tradeoff |
- INT8 quantization provided ~42–51% latency improvement on RK3588 NPU across YOLOv8n/s/m
- Initial cross-precision runtime comparison is classified as `tradeoff_faster`
- Before accuracy attachment, the same runtime pair is classified as `unknown_risk`
- After attaching detection accuracy payloads through `enrich-pair`, the runtime pairs for `yolov8n`, `yolov8s`, and `yolov8m` are all reinterpreted as `acceptable_tradeoff`
- Primary metric (`map50`) improved across all three enriched pairs:
  - `yolov8n`: 0.7791 → 0.7977 (+1.86pp)
  - `yolov8s`: 0.7840 → 0.8090 (+2.50pp)
  - `yolov8m`: 0.7856 → 0.7975 (+1.19pp)
- Some secondary metrics such as `map50_95`, `f1_score`, and `precision` may still decline, which shows why deployment decisions should be based on an explicitly chosen primary metric rather than a single raw speed number
This workflow demonstrates how a latency-only benchmark can be transformed into an accuracy-aware deployment decision without re-running the full profiling process.
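The per-model latency improvements in the RK3588 table follow directly from the mean latencies; a quick check using the values above:

```python
pairs = {
    # model: (fp16_mean_ms, int8_mean_ms) from the RK3588 table above
    "yolov8n": (72.4430, 35.5771),
    "yolov8s": (85.8169, 49.9623),
    "yolov8m": (171.9906, 87.8136),
}

for model, (fp16_ms, int8_ms) in pairs.items():
    improvement = (fp16_ms - int8_ms) / fp16_ms * 100
    # Prints ~50.89%, ~41.78%, ~48.94%, matching the table's observations.
    print(f"{model}: {improvement:.2f}% latency improvement")
```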
Validated on real edge hardware:
| Scope | Status |
|---|---|
| ONNX Runtime CPU profiling + structured result | ✅ |
| Jetson TensorRT repeated validation + report reuse | ✅ |
| Jetson TensorRT Haeundae YOLOv8n downstream accuracy enrichment and compare | ✅ |
| Odroid RKNN curated validation + cross-precision comparison | ✅ |
| Odroid RKNN enriched validation with accuracy-aware trade-off interpretation (yolov8n/s/m) | ✅ |
| FastAPI read-only adapter (service reuse) | ✅ |
| CI benchmark + validation gate | ✅ |
- InferEdge Portfolio Submission
- InferEdge Pipeline Status
- YOLOv8n Runtime Comparison Report
- Final Validation Completion
- API usage guide
Additional reference docs include the pipeline contract, benchmark reference table, Jetson TensorRT validation runbook, async job workflow contract, Forge/Runtime worker integration contract, and project roadmap. Legacy/reference portfolio notes are preserved in pipeline portfolio summary, older PDF draft, and EdgeBench-era design notes.
scripts/demo_pipeline_full.sh is the guided portfolio demo entrypoint for the full InferEdge flow: Forge -> Runtime -> Lab -> optional AIGuard.
By default it prints a safe demo summary and does not start a production worker daemon, queue, database, or SaaS worker.
It separates macOS Lab -> Runtime ONNX Runtime smoke from Jetson TensorRT manifest smoke and preserves the current SaaS-ready validation foundation scope.
```shell
bash scripts/demo_pipeline_full.sh
bash scripts/demo_pipeline_full.sh --help
bash scripts/demo_pipeline_full.sh --run-jetson-command-print
```

```shell
git clone https://github.com/gwonxhj/InferEdgeLab.git
cd InferEdgeLab
pip install poetry
poetry install
```

```shell
poetry run python scripts/make_toy_model.py \
  --height 224 \
  --width 224 \
  --out models/toy224.onnx
```

```shell
poetry run inferedgelab profile models/toy224.onnx \
  --warmup 10 \
  --runs 50 \
  --batch 1 \
  --height 224 \
  --width 224
```

```shell
poetry run inferedgelab compare-latest \
  --model toy224.onnx \
  --engine onnxruntime \
  --device cpu
```

Optional Guard reasoning is available with `compare --with-guard` and `compare-latest --with-guard`.
InferEdgeAIGuard is an optional dependency; when it is installed, Lab appends Guard Analysis based on the compare result and judgement, and when it is not installed, compare still runs normally.
`compare` and `compare-latest` also include a Deployment Decision that combines Lab judgement with Guard status into a `deployable`, `review`, `blocked`, or `unknown` release signal.
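A sketch of how such a combination could work is shown below; the precedence rules and status strings other than the four release signals are assumptions for illustration, not InferEdgeLab's actual decision logic:

```python
from typing import Optional

def deployment_decision(lab_judgement: str,
                        guard_status: Optional[str]) -> str:
    """Illustrative combination of Lab judgement with optional Guard status.
    Guard status values "passed"/"failed" are assumed for this sketch."""
    if guard_status == "failed":
        return "blocked"
    if guard_status is None:
        # Guard is optional: without diagnosis evidence the signal stays unknown.
        return "unknown"
    if lab_judgement == "acceptable_tradeoff" and guard_status == "passed":
        return "deployable"
    return "review"

print(deployment_decision("acceptable_tradeoff", "passed"))  # → deployable
```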
Core workflow:
profile → structured result → compare → report / CI
InferEdgeLab can consume compare-ready JSON files produced by InferEdgeRuntime and compare them automatically at the directory level.
Runtime results are grouped by compare_key, then backend measurements are compared by backend_key using mean_ms.
```shell
poetry run inferedgelab compare-runtime-dir results/
```

To save the same grouped comparison as Markdown:

```shell
poetry run inferedgelab compare-runtime-dir results/ --report reports/runtime_compare.md
```

Example compare-ready Runtime fields:

```json
{
  "runtime_role": "runtime-result",
  "compare_key": "toy224__b1__h224w224__fp32",
  "backend_key": "onnxruntime__cpu",
  "mean_ms": 1.4
}
```

If the same `compare_key` also has a `tensorrt__jetson` result, `compare-runtime-dir` prints the grouped backend latencies and the fastest backend ratio.
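The directory-level grouping can be sketched as follows; field names follow the compare-ready JSON shape shown above, while the function name and report structure are illustrative, not the actual `compare-runtime-dir` implementation:

```python
import json
from collections import defaultdict
from pathlib import Path

def compare_runtime_dir(results_dir: str) -> dict:
    """Sketch of directory-level comparison: group Runtime JSON results
    by compare_key, then compare backends within each group by mean_ms."""
    groups = defaultdict(dict)
    for path in Path(results_dir).glob("*.json"):
        result = json.loads(path.read_text())
        groups[result["compare_key"]][result["backend_key"]] = result["mean_ms"]

    report = {}
    for compare_key, backends in groups.items():
        fastest = min(backends, key=backends.get)
        report[compare_key] = {
            "backends": backends,
            "fastest_backend": fastest,
            # Ratio of the slowest backend's latency to the fastest backend's.
            "ratio": max(backends.values()) / backends[fastest],
        }
    return report
```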
See YOLOv8n Runtime backend comparison for a real example where InferEdgeRuntime produced ONNX Runtime CPU and TensorRT Jetson JSON results, and InferEdgeLab grouped them by compare_key and backend_key into a Markdown comparison report.
The YOLOv8n Runtime comparison report demonstrates a real OpenCV image-input benchmark, compare_key / backend_key automatic grouping, and the role split where Runtime generates JSON while Lab performs comparison and reporting.
```shell
poetry run inferedgelab serve --host 127.0.0.1 --port 8000
curl "http://127.0.0.1:8000/health"
```

Available read-only endpoints: `/health`, `/api/list-results`, `/api/summarize`, `/api/history-report`, `/api/compare`, `/api/compare-latest`.
More details: FastAPI API usage guide
InferEdgeLab integrates benchmarking into CI:
- structured result reuse
- compare-based regression detection
- `compare-latest` automation
- CI validation gate
- benchmark evidence tracking
No auto-generated report summaries are available yet.
See: Benchmark reference table · Project roadmap
MIT License
