From 48bebe283d2bd2f025ac280b94985aa1f9ba3600 Mon Sep 17 00:00:00 2001
From: hyeokjun32
Date: Tue, 5 May 2026 02:49:36 +0900
Subject: [PATCH] docs: sync current evidence status

---
 README.ko.md                                          |  7 ++-
 README.md                                             | 47 +++++++++++--------
 docs/portfolio/final_validation_completion.md         |  5 +-
 .../portfolio/inferedge_1page_architecture.md         |  6 +--
 .../portfolio/inferedge_pipeline_portfolio.md         | 23 ++++-----
 .../inferedge_pipeline_portfolio_pdf.md               | 21 +++++----
 docs/portfolio/inferedge_pipeline_status.md           |  9 +++-
 .../inferedge_portfolio_submission.md                 |  6 +--
 .../inferedge_resume_interview_summary.md             | 16 +++----
 docs/portfolio/runtime_compare_yolov8n.md             | 19 +++++++-
 10 files changed, 98 insertions(+), 61 deletions(-)

diff --git a/README.ko.md b/README.ko.md
index de37d99..fa27998 100644
--- a/README.ko.md
+++ b/README.ko.md
@@ -64,9 +64,12 @@ Studio evidence and jobs are in-memory and do not survive a local server process
 
 - macOS ONNX Runtime CPU smoke: validates the Lab -> C++ Runtime CLI -> ONNX Runtime CPU execution -> Lab job result ingestion path.
 - Jetson Orin Nano TensorRT smoke: evidence secured that the C++ Runtime CLI executed the Forge manifest + TensorRT engine artifact.
-- YOLOv8n real image benchmark:
-  - TensorRT Jetson: mean `9.9375 ms`, p99 `15.5231 ms`, FPS `100.6293`
+- Local Studio demo evidence:
+  - TensorRT Jetson FP16 25W: mean `10.066401 ms`, p95 `15.476641 ms`, p99 `15.548438 ms`, FPS `99.340373`
   - ONNX Runtime CPU: mean `45.4299 ms`, p99 `49.2128 ms`, FPS `22.0119`
+- Jetson Evidence Track:
+  - TensorRT Jetson FP16 15W: mean `10.799106 ms`, p95 `15.438690 ms`, p99 `15.529218 ms`, FPS `92.600262`
+  - Power mode is part of the run configuration, so the 15W/25W results are interpreted as system evidence, not as a same-condition regression.
 - Runtime source model identity polish: even for TensorRT `model.engine` execution, Runtime can prefer `source_model.path` from the Forge manifest and keep `compare_key=yolov8n__b1__h640w640__fp32`.
 
 ## Installation and Quick Start
diff --git a/README.md b/README.md
index 520df32..a53ea17 100644
--- a/README.md
+++ b/README.md
@@ -65,17 +65,20 @@ Interview one-liner: **InferEdge is an end-to-end inference validation pipeline
 
 ---
 
-## Real Inference Benchmark Result
+## Current Validation Evidence
 
-YOLOv8n was validated with a real OpenCV image-input benchmark: InferEdgeRuntime generated compare-ready JSON results, and InferEdgeLab automatically grouped and compared them by `compare_key` and `backend_key`.
+YOLOv8n is validated through the current Local Studio evidence fixtures and Jetson Evidence Track result JSONs.
+InferEdgeRuntime generates compare-ready JSON results, and InferEdgeLab groups and compares them by `compare_key`, `backend_key`, precision, and run context.
 
-| Backend | Input Mode | Mean ms | P99 ms | FPS |
-|---|---|---:|---:|---:|
-| TensorRT Jetson | image | 9.9375 | 15.5231 | 100.6293 |
-| ONNX Runtime CPU | image | 45.4299 | 49.2128 | 22.0119 |
+| Evidence | Backend | Precision | Power Mode | Mean ms | P95 ms | P99 ms | FPS |
+|---|---|---|---|---:|---:|---:|---:|
+| Local Studio baseline | ONNX Runtime CPU | FP32 | n/a | 45.4299 | n/a | 49.2128 | 22.0119 |
+| Local Studio candidate | TensorRT Jetson | FP16 | 25W | 10.066401 | 15.476641 | 15.548438 | 99.340373 |
+| Jetson power-mode evidence | TensorRT Jetson | FP16 | 15W | 10.799106 | 15.438690 | 15.529218 | 92.600262 |
 
-TensorRT Jetson was 4.6x faster than ONNX Runtime CPU in this real image input benchmark.
-The benchmark uses end-to-end Runtime latency, not trtexec GPU-only latency.
+The current Local Studio demo shows TensorRT Jetson FP16 25W as about 4.51x faster than the ONNX Runtime CPU FP32 baseline.
+The Jetson 15W/25W comparison is tracked as system evidence because power mode changes the run configuration.
+These measurements use InferEdgeRuntime end-to-end Runtime latency, not `trtexec` GPU-only latency.
 
 The full pipeline portfolio summary is available at [docs/portfolio/inferedge_pipeline_portfolio.md](docs/portfolio/inferedge_pipeline_portfolio.md), and the detailed Runtime comparison report is available at [docs/portfolio/runtime_compare_yolov8n.md](docs/portfolio/runtime_compare_yolov8n.md).
 The final local-first validation completion pass is summarized in [docs/portfolio/final_validation_completion.md](docs/portfolio/final_validation_completion.md).
 The YOLOv8 COCO subset accuracy demo is documented in [docs/portfolio/yolov8_coco_subset_evaluation.md](docs/portfolio/yolov8_coco_subset_evaluation.md).
@@ -100,12 +103,12 @@ Recommended demo flow:
 
 Verified demo fixture values:
 
-| Backend | Device | Mean ms | P99 ms | FPS | Compare Key |
-|---|---|---:|---:|---:|---|
-| ONNX Runtime | CPU | 45.4299 | 49.2128 | 22.0119 | `yolov8n__b1__h640w640__fp32` |
-| TensorRT | Jetson | 9.9375 | 15.5231 | 100.6293 | `yolov8n__b1__h640w640__fp32` |
+| Backend | Device | Precision | Power Mode | Mean ms | P95 ms | P99 ms | FPS | Compare Key |
+|---|---|---|---|---:|---:|---:|---:|---|
+| ONNX Runtime | CPU | FP32 | n/a | 45.4299 | n/a | 49.2128 | 22.0119 | `yolov8n__b1__h640w640__fp32` |
+| TensorRT | Jetson | FP16 | 25W | 10.066401 | 15.476641 | 15.548438 | 99.340373 | `yolov8n__b1__h640w640__fp16` |
 
-Studio reports this as a `4.57x` TensorRT speedup for the bundled demo pair.
+Studio reports this as about a `4.51x` TensorRT speedup for the bundled demo pair.
 
 AIGuard remains optional in this local Studio path; if Guard evidence is not loaded, the deployment decision explains that the Lab comparison is available but diagnosis evidence is not provided.
 The same demo flow also surfaces a small `yolov8_coco` evaluation report summary: 10 images, 89 ground-truth boxes, mAP@50 `0.1410`, precision `0.2941`, recall `0.1685`, structural validation `passed`.
 It also includes problem-case summaries for annotation-missing review, invalid detection structure blocking, contract shape mismatch blocking, and latency regression review.
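Editor's note: the grouping and speedup arithmetic referenced by the README changes above is simple enough to sketch. The snippet below is an illustrative reconstruction based only on the result JSON fields shown in this patch; it is not InferEdgeLab's actual code, and the function names are assumptions.

```python
# Illustrative sketch only: InferEdgeLab's real grouping code is not part
# of this patch, so these names are assumptions based on the result JSON.
from collections import defaultdict

def group_results(results):
    """Index Runtime result dicts by compare_key, then by backend_key."""
    groups = defaultdict(dict)
    for result in results:
        groups[result["compare_key"]][result["backend_key"]] = result
    return groups

baseline = {"compare_key": "yolov8n__b1__h640w640__fp32",
            "backend_key": "onnxruntime__cpu", "mean_ms": 45.4299}
candidate = {"compare_key": "yolov8n__b1__h640w640__fp16",
             "backend_key": "tensorrt__jetson", "mean_ms": 10.066401}
groups = group_results([baseline, candidate])

# The demo pair is cross-precision, so the two results land in different
# compare_key groups: the 4.51x figure is deployment evidence, not a
# same-condition regression inside one group.
assert len(groups) == 2
print(f"{baseline['mean_ms'] / candidate['mean_ms']:.2f}x")  # 4.51x
```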
@@ -153,16 +156,22 @@ This is a compact example of the structured result shape that InferEdgeRuntime e
 
 ```json
 {
-  "compare_key": "yolov8n__b1__h640w640__fp32",
+  "compare_key": "yolov8n__b1__h640w640__fp16",
   "backend_key": "tensorrt__jetson",
-  "mean_ms": 9.9375,
-  "p99_ms": 15.5231,
-  "fps_value": 100.6293,
+  "mean_ms": 10.066401,
+  "p95_ms": 15.476641,
+  "p99_ms": 15.548438,
+  "fps_value": 99.340373,
   "success": true,
   "status": "success",
+  "run_config": {
+    "power_mode": "25W",
+    "jetson_clocks": "on"
+  },
   "extra": {
-    "input_mode": "image",
-    "input_preprocess": "opencv_bgr_to_rgb_resize_float32_nchw"
+    "input_mode": "dummy",
+    "precision": "fp16",
+    "power_mode": "25W"
   }
 }
 ```
diff --git a/docs/portfolio/final_validation_completion.md b/docs/portfolio/final_validation_completion.md
index 4a71b06..2af4be3 100644
--- a/docs/portfolio/final_validation_completion.md
+++ b/docs/portfolio/final_validation_completion.md
@@ -39,8 +39,9 @@ InferEdge is complete for the current portfolio milestone when it can replay a l
 
 Runtime demo pair:
 
 - ONNX Runtime CPU: 45.4299 ms mean / 49.2128 ms p99 / 22.0119 FPS
-- TensorRT Jetson: 9.9375 ms mean / 15.5231 ms p99 / 100.6293 FPS
-- Studio speedup display: about 4.57x faster
+- TensorRT Jetson FP16 25W: 10.066401 ms mean / 15.548438 ms p99 / 99.340373 FPS
+- Jetson FP16 15W power-mode evidence: 10.799106 ms mean / 15.529218 ms p99 / 92.600262 FPS
+- Studio speedup display: about 4.51x faster for the ONNX Runtime CPU FP32 vs TensorRT Jetson FP16 25W demo pair
 
 YOLOv8 COCO subset evaluation:
 
diff --git a/docs/portfolio/inferedge_1page_architecture.md b/docs/portfolio/inferedge_1page_architecture.md
index ad1e17e..8252493 100644
--- a/docs/portfolio/inferedge_1page_architecture.md
+++ b/docs/portfolio/inferedge_1page_architecture.md
@@ -39,12 +39,12 @@ ONNX model
 
 - `/api/analyze` in-memory job workflow
 - Lab `worker_request` / `worker_response` boundary
 - Lab -> Runtime dev-only minimal execution smoke using `yolov8n.onnx` (ONNX Runtime CPU, success, mean about 47.97 ms, p95 about 51.80 ms, about 20.85 FPS)
-- Jetson Orin Nano TensorRT Runtime smoke using Forge manifest + TensorRT engine artifact (success, manifest applied, mean about 14.00 ms, p99 about 15.50 ms, about 71.44 FPS)
-- Local Studio demo evidence replay at `/studio` using bundled ONNX Runtime CPU and TensorRT Jetson result fixtures: 45.4299 ms vs 9.9375 ms mean latency, 49.2128 ms vs 15.5231 ms p99, 22.0119 vs 100.6293 FPS, and a 4.57x TensorRT speedup for the demo pair
+- Jetson Orin Nano TensorRT Runtime smoke using Forge manifest + TensorRT engine artifact, now recorded as Jetson Evidence Track fixtures for FP16 25W and 15W power modes
+- Local Studio demo evidence replay at `/studio` using bundled ONNX Runtime CPU FP32 and TensorRT Jetson FP16 25W result fixtures: 45.4299 ms vs 10.066401 ms mean latency, 49.2128 ms vs 15.548438 ms p99, 22.0119 vs 99.340373 FPS, and about a 4.51x TensorRT speedup for the demo pair
 - Runtime source-model identity polish for manifest-backed TensorRT engine results (`model.engine` can still keep `compare_model_name=yolov8n` and `compare_key=yolov8n__b1__h640w640__fp32`)
 - Runtime `worker_request` validation and `worker_response` dry-run export
 - Forge worker/runtime summary
-- AIGuard provenance mismatch diagnosis
+- AIGuard evidence diagnosis cases for provenance mismatch, bbox collapse, score saturation, temporal instability, and normal/pass paths
 - Lab decision/report guard evidence smoke
 - all repo README pipeline summaries synced
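Editor's note: since several documents in this patch lean on the structured result shape shown in the README example above, a minimal shape check is easy to hedge in. The required-field list below is inferred from that compact example, not from a published InferEdgeRuntime schema.

```python
# Minimal shape check for a Runtime result JSON. The required-field list
# is inferred from the compact example above, not from a formal schema.
import json

REQUIRED_FIELDS = ("compare_key", "backend_key", "mean_ms",
                   "p99_ms", "fps_value", "success", "status")

def check_result_shape(path):
    """Return the parsed result if all required fields are present."""
    with open(path, "r", encoding="utf-8") as handle:
        result = json.load(handle)
    missing = [field for field in REQUIRED_FIELDS if field not in result]
    if missing:
        raise ValueError(f"{path}: missing fields {missing}")
    return result

# Example call against a Jetson Evidence Track result path from this patch:
# check_result_shape(
#     "results/jetson_evidence/yolov8n_trt_fp16_25w_20260504T170039Z.json")
```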
diff --git a/docs/portfolio/inferedge_pipeline_portfolio.md b/docs/portfolio/inferedge_pipeline_portfolio.md
index a49dc63..43b1039 100644
--- a/docs/portfolio/inferedge_pipeline_portfolio.md
+++ b/docs/portfolio/inferedge_pipeline_portfolio.md
@@ -80,30 +80,31 @@ The benchmark workflow is:
 
 `compare_key` identifies the comparison group for the same model, input shape, and precision.
 `backend_key` identifies the actual backend and device combination, such as `onnxruntime__cpu` or `tensorrt__jetson`.
 
-## 5. Real Image Input Validation Result
+## 5. Current Local Studio Demo Evidence
 
-This validation used YOLOv8n with real image input:
+The current Local Studio demo evidence uses bundled Runtime result fixtures so the comparison can be replayed in a browser without a live Jetson session:
 
 - Model: YOLOv8n
-- Input Mode: image
 - Input Shape: `1x3x640x640`
-- `compare_key`: `yolov8n__b1__h640w640__fp32`
-- `input_preprocess`: `opencv_bgr_to_rgb_resize_float32_nchw`
+- ONNX baseline `compare_key`: `yolov8n__b1__h640w640__fp32`
+- TensorRT candidate `compare_key`: `yolov8n__b1__h640w640__fp16`
+- TensorRT power mode: `25W`
 
-| Backend | Input Mode | Mean ms | P99 ms | FPS | Status |
-|---|---|---:|---:|---:|---|
-| TensorRT Jetson | image | 9.9375 | 15.5231 | 100.6293 | success |
-| ONNX Runtime CPU | image | 45.4299 | 49.2128 | 22.0119 | success |
+| Backend | Precision | Power Mode | Mean ms | P95 ms | P99 ms | FPS | Status |
+|---|---|---|---:|---:|---:|---:|---|
+| TensorRT Jetson | FP16 | 25W | 10.066401 | 15.476641 | 15.548438 | 99.340373 | success |
+| ONNX Runtime CPU | FP32 | n/a | 45.4299 | n/a | 49.2128 | 22.0119 | success |
 
 - Total compare groups: 1
 - Comparable groups count: 1
 - Skipped groups count: 0
 - Fastest backend: `tensorrt__jetson`
 - Slowest backend: `onnxruntime__cpu`
-- Speedup ratio: `4.6x`
-- ONNX Runtime is 4.6x slower than TensorRT.
+- Speedup ratio: about `4.51x`
+- ONNX Runtime CPU is about 4.51x slower than TensorRT Jetson FP16 25W for this demo pair.
 
 The Runtime latency is end-to-end wall-clock latency and should not be directly compared with trtexec GPU-only latency.
+The historical OpenCV real-image input benchmark remains documented in `runtime_compare_yolov8n.md`, while Local Studio now uses the explicit FP16/25W evidence fixture above.
 
 ## 6. Technical Contribution
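Editor's note: the `compare_key` / `backend_key` naming pattern described in the portfolio doc above is mechanical enough to sketch. The helpers below illustrate the observed pattern (e.g. `yolov8n__b1__h640w640__fp32`, `tensorrt__jetson`); they are not the Runtime's actual key builder.

```python
# Illustration of the observed key pattern from the docs in this patch,
# not the actual InferEdgeRuntime implementation.
def compare_key(model, batch, height, width, precision):
    """Comparison group: same model, input shape, and precision."""
    return f"{model}__b{batch}__h{height}w{width}__{precision}"

def backend_key(engine_backend, device_name):
    """Actual backend and device combination."""
    return f"{engine_backend}__{device_name}"

assert compare_key("yolov8n", 1, 640, 640, "fp32") == "yolov8n__b1__h640w640__fp32"
assert backend_key("tensorrt", "jetson") == "tensorrt__jetson"
```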
diff --git a/docs/portfolio/inferedge_pipeline_portfolio_pdf.md b/docs/portfolio/inferedge_pipeline_portfolio_pdf.md
index 84c358b..f24d788 100644
--- a/docs/portfolio/inferedge_pipeline_portfolio_pdf.md
+++ b/docs/portfolio/inferedge_pipeline_portfolio_pdf.md
@@ -70,24 +70,25 @@ InferEdgeLab
 
 ---
 
-## Page 3. Real Benchmark Result & Contribution
+## Page 3. Current Demo Evidence & Contribution
 
-### Real Image Input Benchmark
+### Local Studio Demo Evidence
 
 - Model: YOLOv8n
-- Input Mode: image
 - Input Shape: `1x3x640x640`
-- `compare_key`: `yolov8n__b1__h640w640__fp32`
-- `input_preprocess`: `opencv_bgr_to_rgb_resize_float32_nchw`
+- ONNX baseline `compare_key`: `yolov8n__b1__h640w640__fp32`
+- TensorRT candidate `compare_key`: `yolov8n__b1__h640w640__fp16`
+- TensorRT power mode: `25W`
 
-| Backend | Input Mode | Mean ms | P99 ms | FPS | Status |
-|---|---|---:|---:|---:|---|
-| TensorRT Jetson | image | 9.9375 | 15.5231 | 100.6293 | success |
-| ONNX Runtime CPU | image | 45.4299 | 49.2128 | 22.0119 | success |
+| Backend | Precision | Power Mode | Mean ms | P99 ms | FPS | Status |
+|---|---|---|---:|---:|---:|---|
+| TensorRT Jetson | FP16 | 25W | 10.066401 | 15.548438 | 99.340373 | success |
+| ONNX Runtime CPU | FP32 | n/a | 45.4299 | 49.2128 | 22.0119 | success |
 
-TensorRT Jetson was 4.6x faster than ONNX Runtime CPU in this real image input benchmark.
+TensorRT Jetson FP16 25W was about 4.51x faster than ONNX Runtime CPU FP32 in the current Local Studio demo evidence.
 Runtime latency is measured as end-to-end wall-clock latency and should not be directly compared with trtexec GPU-only latency.
+The historical real-image input benchmark remains documented separately in `runtime_compare_yolov8n.md`.
 
 ### Technical Contribution
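Editor's note: both portfolio pages stress that these numbers are end-to-end wall-clock latency rather than trtexec GPU-only latency. The loop below is a hedged illustration of that measurement style; it stands in for the C++ Runtime's actual benchmark loop, and `run_inference` is a placeholder for a full preprocess -> execute -> postprocess pass.

```python
# Illustrative end-to-end wall-clock benchmark loop in the style the docs
# describe. Not the C++ Runtime's real implementation.
import statistics
import time

def benchmark(run_inference, warmup=1, runs=5):
    for _ in range(warmup):
        run_inference()
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference()  # timed across the whole call, not GPU kernels only
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    latencies_ms.sort()
    mean_ms = statistics.fmean(latencies_ms)
    last = len(latencies_ms) - 1
    return {
        "mean_ms": mean_ms,
        # Nearest-rank percentiles; small run counts make p95/p99 coarse,
        # which is why short smokes report nearly identical p95 and p99.
        "p95_ms": latencies_ms[min(last, int(0.95 * len(latencies_ms)))],
        "p99_ms": latencies_ms[min(last, int(0.99 * len(latencies_ms)))],
        "fps_value": 1000.0 / mean_ms,
    }
```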
diff --git a/docs/portfolio/inferedge_pipeline_status.md b/docs/portfolio/inferedge_pipeline_status.md
index 6a5745b..192a594 100644
--- a/docs/portfolio/inferedge_pipeline_status.md
+++ b/docs/portfolio/inferedge_pipeline_status.md
@@ -97,6 +97,7 @@ The current cross-repository loop is covered by documentation, fixtures, and smo
 - AIGuard worker provenance mismatch diagnosis
 - Lab deployment decision/report evidence smoke for AIGuard worker provenance diagnosis
 - Local Studio local-first workflow UI for viewing Forge -> Runtime -> Lab -> optional AIGuard state, creating in-memory analyze jobs, importing Runtime result JSON, replaying bundled demo evidence, comparing backends, and inspecting Lab-owned deployment decision context
+- Local Studio portfolio demo evidence covering ONNX Runtime CPU, TensorRT Jetson FP16 25W, Jetson FP16 15W power-mode runs, and AIGuard diagnosis cases
 - YOLOv8 COCO subset evaluation report generated from 10 local images and 89 converted COCO-style person annotations, with metric backend `simplified`, mAP@50 0.1410, precision 0.2941, recall 0.1685, and structural validation passed
 - Validation problem case fixtures for annotation-missing review, invalid detection structure blocking, and contract shape mismatch blocking
 
@@ -105,7 +106,10 @@ This means the current product boundary is testable without running the producti
 
 InferEdge now has two runtime execution evidence paths:
 
 1. macOS ONNX Runtime CPU smoke through Lab's dev-only Runtime execution path using `yolov8n.onnx`. The smoke created Lab job `job_9e2321179256`, called the C++ Runtime CLI through Lab's subprocess path, executed ONNX Runtime on CPU with FP32, and ingested the resulting JSON back into the Lab job result. Runtime reported input shape `[1, 3, 640, 640]`, output shape `[1, 84, 8400]`, `warmup=1`, `runs=5`, benchmark status success, mean latency about 47.97 ms, p50 about 46.95 ms, p95/p99 about 51.80 ms, and about 20.85 FPS. The resulting `deployment_decision` was `unknown`, which is expected for direct Runtime execution before Lab compare/report.
-2. Jetson Orin Nano TensorRT smoke using a Forge-generated manifest and TensorRT engine artifact executed by the C++ Runtime CLI. The manual Jetson smoke ran on Linux `5.15.148-tegra` / `aarch64` from `~/InferEdge-Runtime`, using Forge manifest `/home/risenano01/InferEdgeForge/builds/yolov8n__jetson__tensorrt__jetson_fp16/manifest.json` and artifact `/home/risenano01/InferEdgeForge/builds/yolov8n__jetson__tensorrt__jetson_fp16/model.engine`. The result JSON was `results/jetson/yolov8n_jetson_tensorrt_manifest_smoke.json` and reported `success: true`, `status: success`, `engine_backend: tensorrt`, `device_name: jetson`, `manifest_applied: true`, input shape `[1, 3, 640, 640]`, output shape `[1, 84, 8400]`, mean latency about 14.00 ms, p99 about 15.50 ms, and about 71.44 FPS.
+2. Jetson Orin Nano TensorRT smoke using a Forge-generated manifest and TensorRT engine artifact executed by the C++ Runtime CLI. The current Jetson Evidence Track records TensorRT FP16 short-smoke results with tegrastats summaries for both 25W and 15W power modes:
+   - 25W result: `results/jetson_evidence/yolov8n_trt_fp16_25w_20260504T170039Z.json`, mean `10.066401 ms`, p95 `15.476641 ms`, p99 `15.548438 ms`, FPS `99.340373`.
+   - 15W result: `results/jetson_evidence/yolov8n_trt_fp16_15w_20260504T171959Z.json`, mean `10.799106 ms`, p95 `15.438690 ms`, p99 `15.529218 ms`, FPS `92.600262`.
+   - The 15W vs 25W comparison is treated as system evidence because power mode changes the run configuration; it is not interpreted as same-condition model regression.
 
 Compare-key polish status: this limitation has been resolved in InferEdgeRuntime #37. When a Forge manifest is applied, Runtime now prefers `manifest.source_model.path` for compare naming, so a TensorRT artifact path such as `model.engine` can still produce `compare_model_name=yolov8n` and `compare_key=yolov8n__b1__h640w640__fp32`. This improves provenance and compare-readiness; it does not add production SaaS worker infrastructure.
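Editor's note: the "system evidence, not regression" rule stated above is easy to encode. The sketch below is an illustration of that policy, not InferEdgeLab code; the field names follow the `run_config` block in the README example earlier in this patch.

```python
# Policy sketch: refuse to treat results with differing run_config as a
# same-condition regression pair. Field names follow the run_config block
# shown in the README example above; this is not actual InferEdgeLab code.
def classify_pair(result_a, result_b):
    same_key = result_a["compare_key"] == result_b["compare_key"]
    same_run_config = result_a.get("run_config") == result_b.get("run_config")
    if same_key and same_run_config:
        return "regression_comparable"
    # e.g. 15W vs 25W power modes: still useful deployment evidence.
    return "system_evidence"

fp16_25w = {"compare_key": "yolov8n__b1__h640w640__fp16",
            "run_config": {"power_mode": "25W", "jetson_clocks": "on"}}
fp16_15w = {"compare_key": "yolov8n__b1__h640w640__fp16",
            "run_config": {"power_mode": "15W", "jetson_clocks": "on"}}
print(classify_pair(fp16_25w, fp16_15w))  # system_evidence
```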
@@ -131,10 +135,11 @@ This does not mean production SaaS is complete.
 
 - Runtime compare-key identity polish for manifest-backed engine artifacts
 - Guided end-to-end demo entrypoint for portfolio and interview walkthroughs
 - Local Studio at `/studio` for a local-first browser view of Run / Import / Demo Evidence / Compare / Decision / Jetson Helper workflows
+- Jetson Evidence Track short-smoke fixtures with TensorRT FP16 25W and 15W power-mode context, tegrastats summaries, and Lab-compatible Runtime JSON import
 - Contract/preset validation demo with `yolov8_coco`, COCO annotation loading, `--metric-backend simplified` by default, optional `pycocotools` backend contract, structural validation, and JSON/Markdown/HTML report fixtures
 - Problem-case validation reports that make skipped accuracy, invalid output structure, contract mismatch, and latency regression visible in Local Studio
 - Cross-repo fixture compatibility across Forge, Runtime, Lab, and AIGuard
-- Rule/evidence based provenance mismatch diagnosis
+- Rule/evidence based AIGuard diagnosis, including normal/pass, bbox collapse/blocked, score saturation/blocked, temporal instability/review_required, and provenance mismatch cases
 
 ### Planned Later
 
diff --git a/docs/portfolio/inferedge_portfolio_submission.md b/docs/portfolio/inferedge_portfolio_submission.md
index e6aae4d..1029373 100644
--- a/docs/portfolio/inferedge_portfolio_submission.md
+++ b/docs/portfolio/inferedge_portfolio_submission.md
@@ -108,10 +108,10 @@ Recent validation evidence:
 
 - GitHub Actions: Lab Benchmarks success, Runtime CI success
 - 1-page architecture summary documented as of Lab PR #171
 - Lab -> Runtime manual smoke using `yolov8n.onnx`: `/api/analyze` created job `job_9e2321179256`, Lab invoked the C++ Runtime CLI through the dev-only subprocess path, ONNX Runtime executed the model successfully, and the latency/provenance JSON was ingested back into the Lab job result. The smoke reported ONNX Runtime backend available, benchmark status success, mean latency about 47.97 ms, p50 about 46.95 ms, p95/p99 about 51.80 ms, and about 20.85 FPS.
-- Jetson TensorRT Runtime smoke: on Jetson Orin Nano (`Linux 5.15.148-tegra`, `aarch64`), the C++ Runtime CLI in `~/InferEdge-Runtime` executed Forge manifest `/home/risenano01/InferEdgeForge/builds/yolov8n__jetson__tensorrt__jetson_fp16/manifest.json` and TensorRT engine artifact `/home/risenano01/InferEdgeForge/builds/yolov8n__jetson__tensorrt__jetson_fp16/model.engine`. The output `results/jetson/yolov8n_jetson_tensorrt_manifest_smoke.json` reported `success: true`, `engine_backend: tensorrt`, `device_name: jetson`, `manifest_applied: true`, input shape `[1, 3, 640, 640]`, output shape `[1, 84, 8400]`, mean latency about 14.00 ms, p99 about 15.50 ms, and about 71.44 FPS.
+- Jetson TensorRT Runtime smoke: on Jetson Orin Nano (`Linux 5.15.148-tegra`, `aarch64`), the C++ Runtime CLI in `~/InferEdge-Runtime` executed Forge manifest `/home/risenano01/InferEdgeForge/builds/yolov8n__jetson__tensorrt__jetson_fp16/manifest.json` and TensorRT engine artifact `/home/risenano01/InferEdgeForge/builds/yolov8n__jetson__tensorrt__jetson_fp16/model.engine`. The current Jetson Evidence Track records TensorRT FP16 25W at mean `10.066401 ms`, p95 `15.476641 ms`, p99 `15.548438 ms`, FPS `99.340373`, and TensorRT FP16 15W at mean `10.799106 ms`, p95 `15.438690 ms`, p99 `15.529218 ms`, FPS `92.600262`.
 - Runtime compare-key identity polish: InferEdgeRuntime now preserves Forge manifest source model identity for compare naming. If `manifest.source_model.path` is `models/onnx/yolov8n.onnx` and the explicit TensorRT artifact path is `model.engine`, Runtime can keep `compare_model_name=yolov8n` and `compare_key=yolov8n__b1__h640w640__fp32`.
 - Guided demo entrypoint: `scripts/demo_pipeline_full.sh` summarizes the full Forge -> Runtime -> Lab -> optional AIGuard flow and can print the Jetson TensorRT Runtime command without claiming production worker or SaaS readiness.
-- Local Studio demo evidence: `/studio` can load bundled ONNX Runtime CPU and TensorRT Jetson Runtime result fixtures from `examples/studio_demo`, keep the demo pair selectable in Recent jobs while the local server process is alive, and show TensorRT Jetson vs ONNX Runtime CPU comparison in the browser. The fixture-backed evidence records ONNX Runtime CPU at mean 45.4299 ms / p99 49.2128 ms / 22.0119 FPS and TensorRT Jetson at mean 9.9375 ms / p99 15.5231 ms / 100.6293 FPS, a 4.57x TensorRT speedup for this demo pair.
+- Local Studio demo evidence: `/studio` can load bundled ONNX Runtime CPU and TensorRT Jetson Runtime result fixtures from `examples/studio_demo`, keep the demo pair selectable in Recent jobs while the local server process is alive, and show TensorRT Jetson vs ONNX Runtime CPU comparison in the browser. The fixture-backed evidence records ONNX Runtime CPU FP32 at mean `45.4299 ms` / p99 `49.2128 ms` / `22.0119 FPS` and TensorRT Jetson FP16 25W at mean `10.066401 ms` / p99 `15.548438 ms` / `99.340373 FPS`, about a `4.51x` TensorRT speedup for this demo pair.
 - YOLOv8 COCO subset evaluation: a 10-image local person-detection subset with 89 ground-truth boxes is converted into a COCO-style annotation fixture and evaluated through the `yolov8_coco` preset. The generated report records metric backend `simplified`, mAP@50 0.1410, precision 0.2941, recall 0.1685, and structural validation passed. This is documented as subset workflow evidence, not a full COCO benchmark claim. `pycocotools` remains an optional explicit backend.
 - Validation problem cases: the demo bundle includes annotation-missing, invalid detection structure, contract shape mismatch, and latency regression reports. These show that InferEdge records review/block evidence explicitly instead of presenting every validation path as successful.
 
@@ -166,7 +166,7 @@ Next practical step:
 
 - Final resume/interview wording is available in [InferEdge Resume and Interview Summary](inferedge_resume_interview_summary.md), including role-specific versions for AI Inference Engineer, Embedded/Edge Engineer, and Backend/AI Platform roles.
 - "What I built is not a simple benchmark script but a validation pipeline that links the provenance of edge deployment artifacts to their execution results and judges whether they can be deployed."
 - "I split Forge, Runtime, Lab, and AIGuard into build/provenance, C++ execution/result export, analysis/API/decision, and rule/evidence diagnosis layers."
-- "I secured both the macOS ONNX Runtime CPU smoke and the Jetson Orin Nano TensorRT smoke; on Jetson, executing the Forge manifest + TensorRT `model.engine` through the C++ Runtime CLI produced evidence of mean about 14.00 ms, p99 about 15.50 ms, and FPS about 71.44."
+- "I secured both the macOS ONNX Runtime CPU smoke and the Jetson Orin Nano TensorRT smoke; on Jetson, executing the Forge manifest + TensorRT `model.engine` through the C++ Runtime CLI produced FP16 25W evidence of mean 10.066401 ms, p99 15.548438 ms, and FPS 99.340373, plus 15W power-mode evidence."
 - "After the Runtime source identity polish, even a manifest-backed TensorRT engine artifact can keep `compare_model_name=yolov8n` and `compare_key=yolov8n__b1__h640w640__fp32`."
- "AIGuard는 LLM 추측이 아니라 artifact hash, source hash, precision, shape 같은 evidence를 비교하는 deterministic detector 구조입니다." - "아직 production worker, DB/Redis/queue, production frontend, auth/billing은 계획 단계로 명확히 구분했고, 먼저 contract와 smoke coverage를 안정화했습니다." diff --git a/docs/portfolio/inferedge_resume_interview_summary.md b/docs/portfolio/inferedge_resume_interview_summary.md index 1e6a954..fe85a07 100644 --- a/docs/portfolio/inferedge_resume_interview_summary.md +++ b/docs/portfolio/inferedge_resume_interview_summary.md @@ -4,19 +4,19 @@ - Built InferEdge, an end-to-end Edge AI inference validation pipeline that connects Forge build provenance, C++ Runtime execution, Lab comparison/report/API/job workflows, optional AIGuard diagnosis evidence, and Lab-owned deployment decisions. - Validated real execution paths on both macOS and edge hardware: `yolov8n.onnx` through Lab -> C++ Runtime -> ONNX Runtime CPU -> Lab job result ingestion, and Jetson Orin Nano TensorRT execution through Forge manifest + `model.engine` + C++ Runtime CLI. -- Documented Jetson TensorRT smoke evidence with mean latency about 14.00 ms, p99 about 15.50 ms, and about 71.44 FPS on a Forge-generated TensorRT engine artifact. -- Added Local Studio as a local-first browser workflow UI that can replay bundled ONNX Runtime CPU and TensorRT Jetson demo evidence, showing 45.4299 ms vs 9.9375 ms mean latency and a 4.57x TensorRT speedup without claiming production SaaS readiness. +- Documented Jetson TensorRT FP16 evidence with 25W mean `10.066401 ms`, p99 `15.548438 ms`, FPS `99.340373`, plus 15W power-mode comparison evidence. +- Added Local Studio as a local-first browser workflow UI that can replay bundled ONNX Runtime CPU and TensorRT Jetson demo evidence, showing 45.4299 ms vs 10.066401 ms mean latency and about a 4.51x TensorRT speedup without claiming production SaaS readiness. - Polished Runtime provenance readiness so manifest-backed TensorRT artifacts preserve source identity: `model.engine` can keep `compare_model_name=yolov8n` and `compare_key=yolov8n__b1__h640w640__fp32`. ## Role-Specific Resume Versions ### AI Inference Engineer -Built an end-to-end Edge AI inference validation pipeline across Forge, Runtime, Lab, and AIGuard. The system validates not only latency, but also artifact provenance, runtime result compatibility, comparison readiness, and deployment decision evidence. I verified `yolov8n.onnx` through ONNX Runtime CPU on macOS and a Forge-generated TensorRT `model.engine` on Jetson Orin Nano, with Jetson smoke evidence around 14.00 ms mean latency, 15.50 ms p99, and 71.44 FPS. Runtime now preserves Forge manifest source identity for compare keys, reducing ambiguity when TensorRT artifacts are executed as engine files. +Built an end-to-end Edge AI inference validation pipeline across Forge, Runtime, Lab, and AIGuard. The system validates not only latency, but also artifact provenance, runtime result compatibility, comparison readiness, and deployment decision evidence. I verified `yolov8n.onnx` through ONNX Runtime CPU on macOS and a Forge-generated TensorRT `model.engine` on Jetson Orin Nano, with current Jetson FP16 25W evidence at 10.066401 ms mean latency, 15.548438 ms p99, and 99.340373 FPS. Runtime now preserves Forge manifest source identity for compare keys, reducing ambiguity when TensorRT artifacts are executed as engine files. ### Embedded / Edge Engineer -Built a multi-repository edge inference validation workflow that connects model build artifacts to real device execution evidence. 
 ### Embedded / Edge Engineer
 
-Built a multi-repository edge inference validation workflow that connects model build artifacts to real device execution evidence. InferEdgeRuntime provides a C++ execution/result export boundary, and I validated a Jetson Orin Nano TensorRT smoke using a Forge manifest plus generated `model.engine` artifact. The run completed successfully through the C++ Runtime CLI with TensorRT backend, Jetson device target, manifest applied, mean latency about 14.00 ms, p99 about 15.50 ms, and about 71.44 FPS. This is manual/dev smoke evidence, while production worker orchestration remains future work.
+Built a multi-repository edge inference validation workflow that connects model build artifacts to real device execution evidence. InferEdgeRuntime provides a C++ execution/result export boundary, and I validated a Jetson Orin Nano TensorRT smoke using a Forge manifest plus generated `model.engine` artifact. The current evidence records TensorRT FP16 25W at 10.066401 ms mean latency, 15.548438 ms p99, and 99.340373 FPS, with an additional 15W power-mode run for deployment context. This is manual/dev smoke evidence, while production worker orchestration remains future work.
 
 ### Backend / AI Platform
 
@@ -24,11 +24,11 @@ Built the Lab-side orchestration and contract foundation for an edge AI validati
 
 ## Interview: First 30 Seconds
 
-InferEdge is not a simple benchmark tool but an end-to-end validation pipeline that connects an edge AI model's build provenance, real Runtime execution, comparison/report, optional diagnosis evidence, and deployment decision. On macOS I validated `yolov8n.onnx` through Lab -> C++ Runtime -> ONNX Runtime CPU -> Lab job result, and on Jetson Orin Nano I executed the Forge manifest and TensorRT `model.engine` through the C++ Runtime CLI to secure smoke evidence of mean about 14.00 ms, p99 about 15.50 ms, and FPS about 71.44. Recently I polished Runtime to preserve the manifest source model identity, so an engine artifact can also keep the `compare_key=yolov8n__b1__h640w640__fp32` form.
+InferEdge is not a simple benchmark tool but an end-to-end validation pipeline that connects an edge AI model's build provenance, real Runtime execution, comparison/report, optional diagnosis evidence, and deployment decision. On macOS I validated `yolov8n.onnx` through Lab -> C++ Runtime -> ONNX Runtime CPU -> Lab job result, and on Jetson Orin Nano I executed the Forge manifest and TensorRT `model.engine` through the C++ Runtime CLI to secure evidence of FP16 25W mean 10.066401 ms, p99 15.548438 ms, and FPS 99.340373. Recently I polished Runtime to preserve the manifest source model identity, so an engine artifact can also keep a source-model-based `compare_key`.
 
 ## Interview: What Actually Works?
 
-The scope that actually works today can be explained in three stages. First, Lab can organize Runtime results into compare/report/API/job/deployment_decision form. Second, on the dev-only path, a smoke succeeded in which Lab called the C++ Runtime CLI as a subprocess and ingested the `yolov8n.onnx` ONNX Runtime CPU execution result into a job result. Third, a manual smoke succeeded on Jetson Orin Nano that executed the Forge manifest and TensorRT `model.engine` artifact through the C++ Runtime CLI, securing TensorRT backend, Jetson target, manifest applied, mean about 14.00 ms, p99 about 15.50 ms, and FPS about 71.44 evidence. However, a production worker daemon and queue-based automatic execution are not yet in scope.
+The scope that actually works today can be explained in three stages. First, Lab can organize Runtime results into compare/report/API/job/deployment_decision form. Second, on the dev-only path, a smoke succeeded in which Lab called the C++ Runtime CLI as a subprocess and ingested the `yolov8n.onnx` ONNX Runtime CPU execution result into a job result. Third, a manual smoke succeeded on Jetson Orin Nano that executed the Forge manifest and TensorRT `model.engine` artifact through the C++ Runtime CLI, securing TensorRT FP16 25W/15W power-mode evidence and tegrastats summaries. However, a production worker daemon and queue-based automatic execution are not yet in scope.
 
 ## Interview: Is The SaaS Complete?
 
@@ -50,7 +50,7 @@ InferEdgeForge owns build artifact provenance. It records metadata and manifests
 
 The Lab side includes `/api/compare`, `/api/analyze`, in-memory job stubs, worker request/response mapping, API response contracts, deployment decision bundles, and report evidence preservation. A recent manual smoke validated a real dev-only Runtime execution path using `yolov8n.onnx`: Lab created an analyze job, invoked the C++ Runtime CLI through subprocess, ONNX Runtime CPU executed the model, and the result JSON was ingested back into the Lab job result. The smoke completed successfully with mean latency about 47.97 ms, p95/p99 about 51.80 ms, and about 20.85 FPS.
 
-I also validated a Jetson Orin Nano TensorRT Runtime smoke. On Linux `5.15.148-tegra` / `aarch64`, the C++ Runtime CLI in `~/InferEdge-Runtime` executed a Forge-generated manifest and TensorRT engine artifact from `yolov8n__jetson__tensorrt__jetson_fp16`. The result reported `success: true`, `engine_backend: tensorrt`, `device_name: jetson`, `manifest_applied: true`, input shape `[1, 3, 640, 640]`, output shape `[1, 84, 8400]`, mean latency about 14.00 ms, p99 about 15.50 ms, and about 71.44 FPS. Runtime also preserves the Forge manifest source model identity for compare naming, so a `model.engine` artifact can keep `compare_model_name=yolov8n` and `compare_key=yolov8n__b1__h640w640__fp32`.
+I also validated a Jetson Orin Nano TensorRT Runtime smoke. On Linux `5.15.148-tegra` / `aarch64`, the C++ Runtime CLI in `~/InferEdge-Runtime` executed a Forge-generated manifest and TensorRT engine artifact from `yolov8n__jetson__tensorrt__jetson_fp16`. The current Jetson Evidence Track records FP16 25W at mean `10.066401 ms`, p99 `15.548438 ms`, FPS `99.340373`, and FP16 15W at mean `10.799106 ms`, p99 `15.529218 ms`, FPS `92.600262`. Runtime also preserves the Forge manifest source model identity for compare naming, so a `model.engine` artifact can keep `compare_model_name=yolov8n` and a source-model-based `compare_key`.
 
 The project intentionally separates implemented portfolio-grade pipeline foundation from future production SaaS infrastructure. The current implementation demonstrates contracts, smoke coverage, and a dev-only execution path, while production worker daemons, persistent queues/databases, file upload, frontend, auth, and billing are explicitly planned future work.
 
@@ -66,7 +66,7 @@ I split the system into four repositories with clear responsibilities. InferEdge
 
 The important recent validation is that this is no longer only contract-level documentation. I ran a manual dev-only smoke using `yolov8n.onnx`: `/api/analyze` created a Lab job, `/api/jobs/{job_id}/run-runtime-dev` invoked the C++ Runtime CLI through subprocess, ONNX Runtime CPU executed the model, and the Runtime JSON was ingested back into the Lab job result. The result completed successfully, with mean latency about 47.97 ms, p95/p99 about 51.80 ms, and about 20.85 FPS. The deployment decision is `unknown` at that direct execution stage because the result has not yet gone through Lab compare/report, which is expected behavior.
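Editor's note: the dev-only execution path described above (analyze job -> subprocess CLI call -> JSON ingestion) can be hedged into a few lines. This is an illustrative outline only; the CLI flags and the `run-runtime-dev` internals are not specified in this patch, so the command shape below is an assumption.

```python
# Illustrative outline of the dev-only path: invoke a runtime CLI as a
# subprocess and ingest its JSON result. The CLI name and flags here are
# assumptions for illustration, not the documented InferEdgeRuntime CLI.
import json
import subprocess

def run_runtime_dev(cli_path, model_path, output_path):
    subprocess.run(
        [cli_path, "--model", model_path, "--output", output_path],
        check=True,
    )
    with open(output_path, "r", encoding="utf-8") as handle:
        result = json.load(handle)
    # A direct execution has not gone through Lab compare/report yet,
    # so a deployment decision of "unknown" is the expected state here.
    result.setdefault("deployment_decision", "unknown")
    return result
```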
-Separately, I validated Jetson TensorRT execution on Jetson Orin Nano. Runtime consumed a Forge manifest and the generated `model.engine`, applied the manifest, executed with `engine_backend: tensorrt` and `device_name: jetson`, and exported a successful result with mean latency about 14.00 ms, p99 about 15.50 ms, and about 71.44 FPS. The earlier compare naming limitation from explicit `model.engine` paths has been polished: Runtime now prefers Forge manifest `source_model.path`, so source identity such as `yolov8n` can survive into `compare_key`.
+Separately, I validated Jetson TensorRT execution on Jetson Orin Nano. Runtime consumed a Forge manifest and the generated `model.engine`, applied the manifest, executed with `engine_backend: tensorrt` and `device_name: jetson`, and exported successful 25W and 15W FP16 evidence. The 25W result records 10.066401 ms mean, 15.548438 ms p99, and 99.340373 FPS. The earlier compare naming limitation from explicit `model.engine` paths has been polished: Runtime now prefers Forge manifest `source_model.path`, so source identity such as `yolov8n` can survive into `compare_key`.
 
 I am careful not to claim this as a production SaaS platform yet. The production worker daemon, persistent queue/database, file upload flow, frontend, auth, and billing remain future work. What is implemented is the pipeline foundation: schemas, contracts, CLI/API/job boundaries, evidence preservation, and a minimal real Runtime execution path.
 
diff --git a/docs/portfolio/runtime_compare_yolov8n.md b/docs/portfolio/runtime_compare_yolov8n.md
index 60a8e2f..f5bafdd 100644
--- a/docs/portfolio/runtime_compare_yolov8n.md
+++ b/docs/portfolio/runtime_compare_yolov8n.md
@@ -34,9 +34,23 @@ The comparison group used here is:
 
 - Speedup ratio: `3.4x`
 - ONNX Runtime is 3.4x slower than TensorRT.
 
+## Current Local Studio Demo Evidence
+
+The current Local Studio `Load Demo Evidence` flow uses bundled Runtime result fixtures so reviewers can reproduce the browser comparison without requiring a live Jetson session.
+This fixture pair intentionally records ONNX Runtime CPU FP32 as the baseline and TensorRT Jetson FP16 25W as the candidate.
+
+| Backend | Precision | Power Mode | Mean ms | P95 ms | P99 ms | FPS | Compare Key |
+|---|---|---|---:|---:|---:|---:|---|
+| onnxruntime__cpu | FP32 | n/a | 45.4299 | n/a | 49.2128 | 22.0119 | `yolov8n__b1__h640w640__fp32` |
+| tensorrt__jetson | FP16 | 25W | 10.066401 | 15.476641 | 15.548438 | 99.340373 | `yolov8n__b1__h640w640__fp16` |
+
+- TensorRT Jetson FP16 25W is about `4.51x` faster than the ONNX Runtime CPU FP32 baseline in the current Studio demo evidence.
+- The pair is cross-precision and cross-device evidence, so it is useful for deployment review but should not be described as same-condition regression testing.
+- A second TensorRT Jetson FP16 15W fixture records mean `10.799106 ms`, p95 `15.438690 ms`, p99 `15.529218 ms`, and FPS `92.600262` for power-mode evidence.
+
 ## Real Image Input Validation
 
-The following validation is based on a real JPEG image input, not dummy input.
+The following historical validation is based on a real JPEG image input, not dummy input.
 InferEdgeRuntime loaded the image with OpenCV, preprocessed it, and then benchmarked the Runtime backend path end to end.
 
 The image preprocessing path was:
 
@@ -65,6 +79,9 @@ Both backend results used the same `compare_key`, so InferEdgeLab grouped them t
 
 - Speedup ratio: `4.6x`
 - ONNX Runtime is 4.6x slower than TensorRT.
 
+This historical real-image result is kept as evidence that the Runtime path can benchmark image input.
+The current Local Studio demo evidence above is the active browser demo fixture and includes explicit FP16/power-mode context.
+
 ### Result Reproducibility Note
 
 Raw Runtime JSON artifacts are not committed to keep the repository clean.
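Editor's note: because raw Runtime JSON artifacts are intentionally not committed, a reviewer re-running the demo may want to sanity-check a freshly exported result against the documented fixture numbers. The snippet below is a hedged convenience for that; it assumes only the result fields shown earlier in this patch, and the tolerance value is an arbitrary editorial choice.

```python
# Hedged reviewer aid: compare a freshly exported Runtime result JSON
# against the documented fixture numbers. Only fields shown in this patch
# are assumed; the relative tolerance is an arbitrary editorial choice.
import json
import math

DOCUMENTED = {"mean_ms": 10.066401, "p99_ms": 15.548438, "fps_value": 99.340373}

def check_against_docs(path, documented=DOCUMENTED, rel_tol=0.25):
    with open(path, "r", encoding="utf-8") as handle:
        result = json.load(handle)
    drifted = {
        field: (result[field], expected)
        for field, expected in documented.items()
        if not math.isclose(result[field], expected, rel_tol=rel_tol)
    }
    return drifted  # empty dict means the rerun is within the documented range
```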