7 changes: 5 additions & 2 deletions README.ko.md
```diff
@@ -64,9 +64,12 @@ Studio evidence and jobs are in-memory, and when the local server process is restarted
 
 - macOS ONNX Runtime CPU smoke: validates the Lab -> C++ Runtime CLI -> ONNX Runtime CPU execution -> Lab job result ingestion path.
 - Jetson Orin Nano TensorRT smoke: captures evidence of the C++ Runtime CLI executing the Forge manifest + TensorRT engine artifact.
-- YOLOv8n real image benchmark:
-  - TensorRT Jetson: mean `9.9375 ms`, p99 `15.5231 ms`, FPS `100.6293`
+- Local Studio demo evidence:
+  - TensorRT Jetson FP16 25W: mean `10.066401 ms`, p95 `15.476641 ms`, p99 `15.548438 ms`, FPS `99.340373`
   - ONNX Runtime CPU: mean `45.4299 ms`, p99 `49.2128 ms`, FPS `22.0119`
+- Jetson Evidence Track:
+  - TensorRT Jetson FP16 15W: mean `10.799106 ms`, p95 `15.438690 ms`, p99 `15.529218 ms`, FPS `92.600262`
+  - Because power mode is part of the run configuration, the 15W/25W results are interpreted as system evidence, not a same-condition regression.
 - Runtime source model identity polish: even for TensorRT `model.engine` execution, the Forge manifest's `source_model.path` is preferred, so `compare_key=yolov8n__b1__h640w640__fp32` can be kept.
 
 ## Installation and Quick Start
```
47 changes: 28 additions & 19 deletions README.md
```diff
@@ -65,17 +65,20 @@ Interview one-liner: **InferEdge is an end-to-end inference validation pipeline
 
 ---
 
-## Real Inference Benchmark Result
+## Current Validation Evidence
 
-YOLOv8n was validated with a real OpenCV image-input benchmark: InferEdgeRuntime generated compare-ready JSON results, and InferEdgeLab automatically grouped and compared them by `compare_key` and `backend_key`.
+YOLOv8n is validated through the current Local Studio evidence fixtures and Jetson Evidence Track result JSONs.
+InferEdgeRuntime generates compare-ready JSON results, and InferEdgeLab groups and compares them by `compare_key`, `backend_key`, precision, and run context.
 
-| Backend | Input Mode | Mean ms | P99 ms | FPS |
-|---|---|---:|---:|---:|
-| TensorRT Jetson | image | 9.9375 | 15.5231 | 100.6293 |
-| ONNX Runtime CPU | image | 45.4299 | 49.2128 | 22.0119 |
+| Evidence | Backend | Precision | Power Mode | Mean ms | P95 ms | P99 ms | FPS |
+|---|---|---|---|---:|---:|---:|---:|
+| Local Studio baseline | ONNX Runtime CPU | FP32 | n/a | 45.4299 | n/a | 49.2128 | 22.0119 |
+| Local Studio candidate | TensorRT Jetson | FP16 | 25W | 10.066401 | 15.476641 | 15.548438 | 99.340373 |
+| Jetson power-mode evidence | TensorRT Jetson | FP16 | 15W | 10.799106 | 15.438690 | 15.529218 | 92.600262 |
 
-TensorRT Jetson was 4.6x faster than ONNX Runtime CPU in this real image input benchmark.
-The benchmark uses end-to-end Runtime latency, not trtexec GPU-only latency.
+The current Local Studio demo shows TensorRT Jetson FP16 25W as about 4.51x faster than the ONNX Runtime CPU FP32 baseline.
+The Jetson 15W/25W comparison is tracked as system evidence because power mode changes the run configuration.
+These measurements use InferEdgeRuntime end-to-end Runtime latency, not `trtexec` GPU-only latency.
 The full pipeline portfolio summary is available at [docs/portfolio/inferedge_pipeline_portfolio.md](docs/portfolio/inferedge_pipeline_portfolio.md), and the detailed Runtime comparison report is available at [docs/portfolio/runtime_compare_yolov8n.md](docs/portfolio/runtime_compare_yolov8n.md).
 The final local-first validation completion pass is summarized in [docs/portfolio/final_validation_completion.md](docs/portfolio/final_validation_completion.md).
 The YOLOv8 COCO subset accuracy demo is documented in [docs/portfolio/yolov8_coco_subset_evaluation.md](docs/portfolio/yolov8_coco_subset_evaluation.md).
```
```diff
@@ -100,12 +103,12 @@ Recommended demo flow:
 
 Verified demo fixture values:
 
-| Backend | Device | Mean ms | P99 ms | FPS | Compare Key |
-|---|---|---:|---:|---:|---|
-| ONNX Runtime | CPU | 45.4299 | 49.2128 | 22.0119 | `yolov8n__b1__h640w640__fp32` |
-| TensorRT | Jetson | 9.9375 | 15.5231 | 100.6293 | `yolov8n__b1__h640w640__fp32` |
+| Backend | Device | Precision | Power Mode | Mean ms | P95 ms | P99 ms | FPS | Compare Key |
+|---|---|---|---|---:|---:|---:|---:|---|
+| ONNX Runtime | CPU | FP32 | n/a | 45.4299 | n/a | 49.2128 | 22.0119 | `yolov8n__b1__h640w640__fp32` |
+| TensorRT | Jetson | FP16 | 25W | 10.066401 | 15.476641 | 15.548438 | 99.340373 | `yolov8n__b1__h640w640__fp16` |
 
-Studio reports this as a `4.57x` TensorRT speedup for the bundled demo pair.
+Studio reports this as about a `4.51x` TensorRT speedup for the bundled demo pair.
 AIGuard remains optional in this local Studio path; if Guard evidence is not loaded, the deployment decision explains that the Lab comparison is available but diagnosis evidence is not provided.
 The same demo flow also surfaces a small `yolov8_coco` evaluation report summary: 10 images, 89 ground-truth boxes, mAP@50 `0.1410`, precision `0.2941`, recall `0.1685`, structural validation `passed`.
 It also includes problem-case summaries for annotation-missing review, invalid detection structure blocking, contract shape mismatch blocking, and latency regression review.
```
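As a hedged sanity check on the `yolov8_coco` summary above: the raw match counts are not quoted in this diff, so the integers below are inferred from the reported ratios rather than taken from the repo. Precision `0.2941` and recall `0.1685` against 89 ground-truth boxes are consistent with roughly 15 matched detections out of 51 predictions.

```python
# Inferred counts only: 15 matches and 51 predictions are NOT stated in this
# diff; they are the smallest integers consistent with the reported metrics.
matched_detections = 15
total_predictions = 51
ground_truth_boxes = 89

precision = matched_detections / total_predictions  # 15 / 51 = 0.2941...
recall = matched_detections / ground_truth_boxes    # 15 / 89 = 0.1685...
print(f"precision={precision:.4f}, recall={recall:.4f}")
```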
````diff
@@ -153,16 +156,22 @@ This is a compact example of the structured result shape that InferEdgeRuntime e
 
 ```json
 {
-  "compare_key": "yolov8n__b1__h640w640__fp32",
+  "compare_key": "yolov8n__b1__h640w640__fp16",
   "backend_key": "tensorrt__jetson",
-  "mean_ms": 9.9375,
-  "p99_ms": 15.5231,
-  "fps_value": 100.6293,
+  "mean_ms": 10.066401,
+  "p95_ms": 15.476641,
+  "p99_ms": 15.548438,
+  "fps_value": 99.340373,
   "success": true,
   "status": "success",
+  "run_config": {
+    "power_mode": "25W",
+    "jetson_clocks": "on"
+  },
   "extra": {
-    "input_mode": "image",
-    "input_preprocess": "opencv_bgr_to_rgb_resize_float32_nchw"
+    "input_mode": "dummy",
+    "precision": "fp16",
+    "power_mode": "25W"
   }
 }
 ```
````
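To make the comparison mechanics concrete, here is a minimal sketch of how a Lab-style consumer could derive the demo speedup from two result files in the shape above. The file names and the `load_result` helper are hypothetical; only the JSON fields (`backend_key`, `mean_ms`) come from the documented shape.

```python
import json

def load_result(path: str) -> dict:
    # Hypothetical helper: read one compare-ready Runtime result JSON
    # shaped like the example above.
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

# Illustrative file names, not actual repo fixture paths.
baseline = load_result("onnxruntime_cpu_fp32.json")       # mean_ms 45.4299
candidate = load_result("tensorrt_jetson_fp16_25w.json")  # mean_ms 10.066401

# 45.4299 / 10.066401 ~= 4.51, matching the Studio speedup display.
speedup = baseline["mean_ms"] / candidate["mean_ms"]
print(f"{candidate['backend_key']} is {speedup:.2f}x faster "
      f"than {baseline['backend_key']}")
```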
5 changes: 3 additions & 2 deletions docs/portfolio/final_validation_completion.md
```diff
@@ -39,8 +39,9 @@ InferEdge is complete for the current portfolio milestone when it can replay a l
 Runtime demo pair:
 
 - ONNX Runtime CPU: 45.4299 ms mean / 49.2128 ms p99 / 22.0119 FPS
-- TensorRT Jetson: 9.9375 ms mean / 15.5231 ms p99 / 100.6293 FPS
-- Studio speedup display: about 4.57x faster
+- TensorRT Jetson FP16 25W: 10.066401 ms mean / 15.548438 ms p99 / 99.340373 FPS
+- Jetson FP16 15W power-mode evidence: 10.799106 ms mean / 15.529218 ms p99 / 92.600262 FPS
+- Studio speedup display: about 4.51x faster for the ONNX Runtime CPU FP32 vs TensorRT Jetson FP16 25W demo pair
 
 YOLOv8 COCO subset evaluation:
 
```
6 changes: 3 additions & 3 deletions docs/portfolio/inferedge_1page_architecture.md
```diff
@@ -39,12 +39,12 @@ ONNX model
 - `/api/analyze` in-memory job workflow
 - Lab `worker_request` / `worker_response` boundary
 - Lab -> Runtime dev-only minimal execution smoke using `yolov8n.onnx` (ONNX Runtime CPU, success, mean about 47.97 ms, p95 about 51.80 ms, about 20.85 FPS)
-- Jetson Orin Nano TensorRT Runtime smoke using Forge manifest + TensorRT engine artifact (success, manifest applied, mean about 14.00 ms, p99 about 15.50 ms, about 71.44 FPS)
-- Local Studio demo evidence replay at `/studio` using bundled ONNX Runtime CPU and TensorRT Jetson result fixtures: 45.4299 ms vs 9.9375 ms mean latency, 49.2128 ms vs 15.5231 ms p99, 22.0119 vs 100.6293 FPS, and a 4.57x TensorRT speedup for the demo pair
+- Jetson Orin Nano TensorRT Runtime smoke using Forge manifest + TensorRT engine artifact, now recorded as Jetson Evidence Track fixtures for FP16 25W and 15W power modes
+- Local Studio demo evidence replay at `/studio` using bundled ONNX Runtime CPU FP32 and TensorRT Jetson FP16 25W result fixtures: 45.4299 ms vs 10.066401 ms mean latency, 49.2128 ms vs 15.548438 ms p99, 22.0119 vs 99.340373 FPS, and about a 4.51x TensorRT speedup for the demo pair
 - Runtime source-model identity polish for manifest-backed TensorRT engine results (`model.engine` can still keep `compare_model_name=yolov8n` and `compare_key=yolov8n__b1__h640w640__fp32`)
 - Runtime `worker_request` validation and `worker_response` dry-run export
 - Forge worker/runtime summary
-- AIGuard provenance mismatch diagnosis
+- AIGuard evidence diagnosis cases for provenance mismatch, bbox collapse, score saturation, temporal instability, and normal/pass paths
 - Lab decision/report guard evidence smoke
 - all repo README pipeline summaries synced
 
```
23 changes: 12 additions & 11 deletions docs/portfolio/inferedge_pipeline_portfolio.md
```diff
@@ -80,30 +80,31 @@ The benchmark workflow is:
 `compare_key` identifies the comparison group for the same model, input shape, and precision.
 `backend_key` identifies the actual backend and device combination, such as `onnxruntime__cpu` or `tensorrt__jetson`.
 
-## 5. Real Image Input Validation Result
+## 5. Current Local Studio Demo Evidence
 
-This validation used YOLOv8n with real image input:
+The current Local Studio demo evidence uses bundled Runtime result fixtures so the comparison can be replayed in a browser without a live Jetson session:
 
 - Model: YOLOv8n
-- Input Mode: image
 - Input Shape: `1x3x640x640`
-- `compare_key`: `yolov8n__b1__h640w640__fp32`
-- `input_preprocess`: `opencv_bgr_to_rgb_resize_float32_nchw`
+- ONNX baseline `compare_key`: `yolov8n__b1__h640w640__fp32`
+- TensorRT candidate `compare_key`: `yolov8n__b1__h640w640__fp16`
+- TensorRT power mode: `25W`
 
-| Backend | Input Mode | Mean ms | P99 ms | FPS | Status |
-|---|---|---:|---:|---:|---|
-| TensorRT Jetson | image | 9.9375 | 15.5231 | 100.6293 | success |
-| ONNX Runtime CPU | image | 45.4299 | 49.2128 | 22.0119 | success |
+| Backend | Precision | Power Mode | Mean ms | P95 ms | P99 ms | FPS | Status |
+|---|---|---|---:|---:|---:|---:|---|
+| TensorRT Jetson | FP16 | 25W | 10.066401 | 15.476641 | 15.548438 | 99.340373 | success |
+| ONNX Runtime CPU | FP32 | n/a | 45.4299 | n/a | 49.2128 | 22.0119 | success |
 
 - Total compare groups: 1
 - Comparable groups count: 1
 - Skipped groups count: 0
 - Fastest backend: `tensorrt__jetson`
 - Slowest backend: `onnxruntime__cpu`
-- Speedup ratio: `4.6x`
-- ONNX Runtime is 4.6x slower than TensorRT.
+- Speedup ratio: about `4.51x`
+- ONNX Runtime CPU is about 4.51x slower than TensorRT Jetson FP16 25W for this demo pair.
 
 The Runtime latency is end-to-end wall-clock latency and should not be directly compared with trtexec GPU-only latency.
+The historical OpenCV real-image input benchmark remains documented in `runtime_compare_yolov8n.md`, while Local Studio now uses the explicit FP16/25W evidence fixture above.
 
 ## 6. Technical Contribution
 
```
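The `compare_key` / `backend_key` scheme described in this hunk can be illustrated with a small sketch. The format strings are inferred from the documented example keys such as `yolov8n__b1__h640w640__fp16` and `tensorrt__jetson`; the actual Runtime code may assemble them differently.

```python
def build_compare_key(model: str, batch: int, height: int,
                      width: int, precision: str) -> str:
    # Key format inferred from examples like "yolov8n__b1__h640w640__fp32".
    return f"{model}__b{batch}__h{height}w{width}__{precision}"

def build_backend_key(backend: str, device: str) -> str:
    # Key format inferred from examples like "onnxruntime__cpu".
    return f"{backend}__{device}"

assert build_compare_key("yolov8n", 1, 640, 640, "fp16") == "yolov8n__b1__h640w640__fp16"
assert build_backend_key("tensorrt", "jetson") == "tensorrt__jetson"
```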
21 changes: 11 additions & 10 deletions docs/portfolio/inferedge_pipeline_portfolio_pdf.md
```diff
@@ -70,24 +70,25 @@ InferEdgeLab
 
 ---
 
-## Page 3. Real Benchmark Result & Contribution
+## Page 3. Current Demo Evidence & Contribution
 
-### Real Image Input Benchmark
+### Local Studio Demo Evidence
 
 - Model: YOLOv8n
-- Input Mode: image
 - Input Shape: `1x3x640x640`
-- `compare_key`: `yolov8n__b1__h640w640__fp32`
-- `input_preprocess`: `opencv_bgr_to_rgb_resize_float32_nchw`
+- ONNX baseline `compare_key`: `yolov8n__b1__h640w640__fp32`
+- TensorRT candidate `compare_key`: `yolov8n__b1__h640w640__fp16`
+- TensorRT power mode: `25W`
 
-| Backend | Input Mode | Mean ms | P99 ms | FPS | Status |
-|---|---|---:|---:|---:|---|
-| TensorRT Jetson | image | 9.9375 | 15.5231 | 100.6293 | success |
-| ONNX Runtime CPU | image | 45.4299 | 49.2128 | 22.0119 | success |
+| Backend | Precision | Power Mode | Mean ms | P99 ms | FPS | Status |
+|---|---|---|---:|---:|---:|---|
+| TensorRT Jetson | FP16 | 25W | 10.066401 | 15.548438 | 99.340373 | success |
+| ONNX Runtime CPU | FP32 | n/a | 45.4299 | 49.2128 | 22.0119 | success |
 
-TensorRT Jetson was 4.6x faster than ONNX Runtime CPU in this real image input benchmark.
+TensorRT Jetson FP16 25W was about 4.51x faster than ONNX Runtime CPU FP32 in the current Local Studio demo evidence.
 
 Runtime latency is measured as end-to-end wall-clock latency and should not be directly compared with trtexec GPU-only latency.
+The historical real-image input benchmark remains documented separately in `runtime_compare_yolov8n.md`.
 
 ### Technical Contribution
 
```
9 changes: 7 additions & 2 deletions docs/portfolio/inferedge_pipeline_status.md
```diff
@@ -97,6 +97,7 @@ The current cross-repository loop is covered by documentation, fixtures, and smo
 - AIGuard worker provenance mismatch diagnosis
 - Lab deployment decision/report evidence smoke for AIGuard worker provenance diagnosis
 - Local Studio local-first workflow UI for viewing Forge -> Runtime -> Lab -> optional AIGuard state, creating in-memory analyze jobs, importing Runtime result JSON, replaying bundled demo evidence, comparing backends, and inspecting Lab-owned deployment decision context
+- Local Studio portfolio demo evidence for ONNX Runtime CPU, TensorRT Jetson FP16 25W, Jetson FP16 15W power-mode evidence, and AIGuard diagnosis cases
 - YOLOv8 COCO subset evaluation report generated from 10 local images and 89 converted COCO-style person annotations, with metric backend `simplified`, mAP@50 0.1410, precision 0.2941, recall 0.1685, and structural validation passed
 - Validation problem case fixtures for annotation-missing review, invalid detection structure blocking, and contract shape mismatch blocking
 
```
```diff
@@ -105,7 +106,10 @@ This means the current product boundary is testable without running the producti
 InferEdge now has two runtime execution evidence paths:
 
 1. macOS ONNX Runtime CPU smoke through Lab's dev-only Runtime execution path using `yolov8n.onnx`. The smoke created Lab job `job_9e2321179256`, called the C++ Runtime CLI through Lab's subprocess path, executed ONNX Runtime on CPU with FP32, and ingested the resulting JSON back into the Lab job result. Runtime reported input shape `[1, 3, 640, 640]`, output shape `[1, 84, 8400]`, `warmup=1`, `runs=5`, benchmark status success, mean latency about 47.97 ms, p50 about 46.95 ms, p95/p99 about 51.80 ms, and about 20.85 FPS. The resulting `deployment_decision` was `unknown`, which is expected for direct Runtime execution before Lab compare/report.
-2. Jetson Orin Nano TensorRT smoke using a Forge-generated manifest and TensorRT engine artifact executed by the C++ Runtime CLI. The manual Jetson smoke ran on Linux `5.15.148-tegra` / `aarch64` from `~/InferEdge-Runtime`, using Forge manifest `/home/risenano01/InferEdgeForge/builds/yolov8n__jetson__tensorrt__jetson_fp16/manifest.json` and artifact `/home/risenano01/InferEdgeForge/builds/yolov8n__jetson__tensorrt__jetson_fp16/model.engine`. The result JSON was `results/jetson/yolov8n_jetson_tensorrt_manifest_smoke.json` and reported `success: true`, `status: success`, `engine_backend: tensorrt`, `device_name: jetson`, `manifest_applied: true`, input shape `[1, 3, 640, 640]`, output shape `[1, 84, 8400]`, mean latency about 14.00 ms, p99 about 15.50 ms, and about 71.44 FPS.
+2. Jetson Orin Nano TensorRT smoke using a Forge-generated manifest and TensorRT engine artifact executed by the C++ Runtime CLI. The current Jetson Evidence Track records TensorRT FP16 short-smoke results with tegrastats summaries for both 25W and 15W power modes:
+   - 25W result: `results/jetson_evidence/yolov8n_trt_fp16_25w_20260504T170039Z.json`, mean `10.066401 ms`, p95 `15.476641 ms`, p99 `15.548438 ms`, FPS `99.340373`.
+   - 15W result: `results/jetson_evidence/yolov8n_trt_fp16_15w_20260504T171959Z.json`, mean `10.799106 ms`, p95 `15.438690 ms`, p99 `15.529218 ms`, FPS `92.600262`.
+   - The 15W vs 25W comparison is treated as system evidence because power mode changes the run configuration; it is not interpreted as same-condition model regression.
 
 Compare-key polish status: this limitation has been resolved in InferEdgeRuntime #37. When a Forge manifest is applied, Runtime now prefers `manifest.source_model.path` for compare naming, so a TensorRT artifact path such as `model.engine` can still produce `compare_model_name=yolov8n` and `compare_key=yolov8n__b1__h640w640__fp32`. This improves provenance and compare-readiness; it does not add production SaaS worker infrastructure.
```
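A minimal sketch of the power-mode policy above, assuming result dictionaries shaped like the README example; the function name and labels are hypothetical, not Lab's actual API.

```python
def classify_result_pair(a: dict, b: dict) -> str:
    # Hypothetical helper: only results with a matching compare_key and a
    # matching run configuration (e.g. the same power mode) are treated as
    # a same-condition regression pair; otherwise they stay system evidence.
    same_key = a.get("compare_key") == b.get("compare_key")
    same_power = (a.get("run_config", {}).get("power_mode")
                  == b.get("run_config", {}).get("power_mode"))
    if same_key and same_power:
        return "same_condition_regression_pair"
    return "system_evidence"
```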

```diff
@@ -131,10 +135,11 @@ This does not mean production SaaS is complete.
 - Runtime compare-key identity polish for manifest-backed engine artifacts
 - Guided end-to-end demo entrypoint for portfolio and interview walkthroughs
 - Local Studio at `/studio` for a local-first browser view of Run / Import / Demo Evidence / Compare / Decision / Jetson Helper workflows
+- Jetson Evidence Track short-smoke fixtures with TensorRT FP16 25W and 15W power-mode context, tegrastats summaries, and Lab-compatible Runtime JSON import
 - Contract/preset validation demo with `yolov8_coco`, COCO annotation loading, `--metric-backend simplified` by default, optional `pycocotools` backend contract, structural validation, and JSON/Markdown/HTML report fixtures
 - Problem-case validation reports that make skipped accuracy, invalid output structure, contract mismatch, and latency regression visible in Local Studio
 - Cross-repo fixture compatibility across Forge, Runtime, Lab, and AIGuard
-- Rule/evidence based provenance mismatch diagnosis
+- Rule/evidence based AIGuard diagnosis, including normal/pass, bbox collapse/blocked, score saturation/blocked, temporal instability/review_required, and provenance mismatch cases
 
 ### Planned Later
 
```