2 changes: 2 additions & 0 deletions README.md
@@ -83,6 +83,7 @@ YOLOv8n was validated with a real OpenCV image-input benchmark: InferEdgeRuntime
TensorRT Jetson was 4.6x faster than ONNX Runtime CPU in this real image-input benchmark.
The benchmark uses end-to-end Runtime latency, not trtexec GPU-only latency.
The full pipeline portfolio summary is available at [docs/portfolio/inferedge_pipeline_portfolio.md](docs/portfolio/inferedge_pipeline_portfolio.md), and the detailed Runtime comparison report is available at [docs/portfolio/runtime_compare_yolov8n.md](docs/portfolio/runtime_compare_yolov8n.md).
The YOLOv8 COCO subset accuracy demo is documented in [docs/portfolio/yolov8_coco_subset_evaluation.md](docs/portfolio/yolov8_coco_subset_evaluation.md).

## Local Studio Demo Evidence

@@ -100,6 +101,7 @@ Verified demo fixture values:

Studio reports this as a `4.57x` TensorRT speedup for the bundled demo pair.
AIGuard remains optional in this local Studio path; if Guard evidence is not loaded, the deployment decision explains that the Lab comparison is available but diagnosis evidence is not provided.
The same demo flow also surfaces a small `yolov8_coco` evaluation report summary: 10 images, 89 ground-truth boxes, mAP@50 `0.1410`, precision `0.2941`, recall `0.1685`, structural validation `passed`.
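
As a quick sanity check on that figure, a minimal sketch, assuming only the fixture means quoted in this repository (the variable names are illustrative, not Studio code):

```python
# Hedged sketch: reproduce the demo-pair speedup from the bundled fixture means.
# The latency values below are the ones recorded for examples/studio_demo.
onnxruntime_cpu_mean_ms = 45.4299
tensorrt_jetson_mean_ms = 9.9375

speedup = onnxruntime_cpu_mean_ms / tensorrt_jetson_mean_ms
print(f"TensorRT speedup: {speedup:.2f}x")  # -> TensorRT speedup: 4.57x
```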

---

2 changes: 2 additions & 0 deletions docs/portfolio/inferedge_pipeline_status.md
@@ -96,6 +96,7 @@ The current cross-repository loop is covered by documentation, fixtures, and smo
- AIGuard worker provenance mismatch diagnosis
- Lab deployment decision/report evidence smoke for AIGuard worker provenance diagnosis
- Local Studio local-first workflow UI for viewing Forge -> Runtime -> Lab -> optional AIGuard state, creating in-memory analyze jobs, importing Runtime result JSON, replaying bundled demo evidence, comparing backends, and inspecting Lab-owned deployment decision context
- YOLOv8 COCO subset evaluation report generated from 10 local images and 89 converted COCO-style person annotations, with mAP@50 0.1410, precision 0.2941, recall 0.1685, and structural validation passed

This means the current product boundary is testable without running the production worker infrastructure.

@@ -125,6 +126,7 @@ Demo readiness: `scripts/demo_pipeline_full.sh` now provides a guided end-to-end
- Runtime compare-key identity polish for manifest-backed engine artifacts
- Guided end-to-end demo entrypoint for portfolio and interview walkthroughs
- Local Studio at `/studio` for a local-first browser view of Run / Import / Demo Evidence / Compare / Decision / Jetson Helper workflows
- Contract/preset validation demo with `yolov8_coco`, COCO annotation loading, simplified accuracy metrics, structural validation, and JSON/Markdown/HTML report fixtures
- Cross-repo fixture compatibility across Forge, Runtime, Lab, and AIGuard
- Rule/evidence based provenance mismatch diagnosis

1 change: 1 addition & 0 deletions docs/portfolio/inferedge_portfolio_submission.md
@@ -112,6 +112,7 @@ Recent validation evidence:
- Runtime compare-key identity polish: InferEdgeRuntime now preserves Forge manifest source model identity for compare naming. If `manifest.source_model.path` is `models/onnx/yolov8n.onnx` and the explicit TensorRT artifact path is `model.engine`, Runtime can keep `compare_model_name=yolov8n` and `compare_key=yolov8n__b1__h640w640__fp32`. A sketch of this naming rule appears after this list.
- Guided demo entrypoint: `scripts/demo_pipeline_full.sh` summarizes the full Forge -> Runtime -> Lab -> optional AIGuard flow and can print the Jetson TensorRT Runtime command without claiming production worker or SaaS readiness.
- Local Studio demo evidence: `/studio` can load bundled ONNX Runtime CPU and TensorRT Jetson Runtime result fixtures from `examples/studio_demo`, keep the demo pair selectable in Recent jobs while the local server process is alive, and show TensorRT Jetson vs ONNX Runtime CPU comparison in the browser. The fixture-backed evidence records ONNX Runtime CPU at mean 45.4299 ms / p99 49.2128 ms / 22.0119 FPS and TensorRT Jetson at mean 9.9375 ms / p99 15.5231 ms / 100.6293 FPS, a 4.57x TensorRT speedup for this demo pair.
- YOLOv8 COCO subset evaluation: a 10-image local person-detection subset with 89 ground-truth boxes is converted into a COCO-style annotation fixture and evaluated through the `yolov8_coco` preset. The generated report records mAP@50 0.1410, precision 0.2941, recall 0.1685, and structural validation passed. This is documented as subset workflow evidence, not a full COCO benchmark claim.
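
A minimal sketch of the compare-key naming described in the first item above, assuming illustrative function and field names rather than InferEdgeRuntime's actual API:

```python
# Hedged sketch of the manifest-backed compare naming. Names are illustrative.
from pathlib import Path

def derive_compare_key(manifest_source_model_path: str,
                       batch: int, height: int, width: int,
                       precision: str) -> tuple[str, str]:
    # Keep the Forge manifest source model identity even when the explicit
    # TensorRT artifact path is something generic like "model.engine".
    compare_model_name = Path(manifest_source_model_path).stem
    compare_key = f"{compare_model_name}__b{batch}__h{height}w{width}__{precision}"
    return compare_model_name, compare_key

name, key = derive_compare_key("models/onnx/yolov8n.onnx", 1, 640, 640, "fp32")
print(name)  # yolov8n
print(key)   # yolov8n__b1__h640w640__fp32
```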

The direct Runtime execution result includes `deployment_decision`. Its `unknown` value is expected before Lab compare/report because the worker response has not yet been compared by Lab.

43 changes: 43 additions & 0 deletions docs/portfolio/yolov8_coco_subset_evaluation.md
@@ -0,0 +1,43 @@
# YOLOv8 COCO Subset Evaluation Demo

This document records a small local-first accuracy evaluation demo for InferEdgeLab.
It is not a full COCO benchmark and should not be presented as production model validation.

## Scope

- Preset: `yolov8_coco`
- Model: YOLOv8n ONNX Runtime CPU
- Demo input: 10 local person-detection images
- Annotation source: local YOLO txt labels converted into a compact COCO-style annotation fixture (a conversion sketch follows this list)
- Raw images: intentionally not committed
- Annotation fixture: `examples/validation_demo/subset/yolov8_coco_subset_annotations.json`
- Evaluation report: `examples/validation_demo/subset/yolov8_coco_subset_evaluation.json`
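
The annotation-source item above mentions a conversion step; the following is a minimal sketch of that kind of YOLO-txt-to-COCO conversion, assuming a hypothetical file layout and a single shared image size, not the exact fixture schema used here:

```python
# Hedged sketch: convert YOLO txt labels into a compact COCO-style fixture.
# Paths, field names, and the fixed image size are illustrative assumptions.
import json
from pathlib import Path

def yolo_txt_to_coco(label_dir: str, image_size: tuple[int, int], out_path: str) -> None:
    width, height = image_size
    images, annotations = [], []
    ann_id = 0
    for img_id, label_file in enumerate(sorted(Path(label_dir).glob("*.txt"))):
        images.append({"id": img_id, "file_name": f"{label_file.stem}.jpg",
                       "width": width, "height": height})
        for line in label_file.read_text().splitlines():
            # YOLO format: class cx cy w h, all normalized to [0, 1]
            cls, cx, cy, w, h = map(float, line.split())
            box_w, box_h = w * width, h * height
            x = cx * width - box_w / 2
            y = cy * height - box_h / 2
            annotations.append({"id": ann_id, "image_id": img_id,
                                "category_id": int(cls) + 1,  # person-only subset
                                "bbox": [x, y, box_w, box_h],
                                "area": box_w * box_h, "iscrowd": 0})
            ann_id += 1
    coco = {"images": images, "annotations": annotations,
            "categories": [{"id": 1, "name": "person"}]}
    Path(out_path).write_text(json.dumps(coco, indent=2))
```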

## Result

| Metric | Value |
|---|---:|
| Samples | 10 |
| Ground-truth boxes | 89 |
| Post-NMS detections checked | 51 |
| mAP@50 | 0.1410 |
| mAP@50-95 | 0.0873 |
| Precision | 0.2941 |
| Recall | 0.1685 |
| F1 score | 0.2143 |
| Structural validation | passed |
| Contract input shape | passed |
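
As a consistency check, the F1 score in the table follows directly from the reported precision and recall:

```python
# Hedged sketch: recompute F1 from the reported precision and recall.
precision = 0.2941
recall = 0.1685

f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.4f}")  # ~0.2143, matching the table
```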

## Interpretation

This demo proves that InferEdgeLab can load COCO-style annotations, run the YOLOv8 detection evaluator, compute simplified accuracy metrics, validate detection output structure, and emit JSON/Markdown/HTML reports.
The numbers are intentionally documented as a small subset result only.
They are useful as portfolio workflow evidence, not as a claim of full COCO accuracy.
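
For illustration, a minimal sketch of the kind of simplified IoU@0.5 matching such an evaluator can use; this is an assumption about the general technique, not InferEdgeLab's actual implementation:

```python
# Hedged sketch of simplified IoU@0.5 matching for precision/recall.
# Boxes are [x, y, w, h] in pixels, COCO-style. Not InferEdgeLab's evaluator.
def iou(a, b):
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def precision_recall(detections, ground_truth, thr=0.5):
    matched, tp = set(), 0
    for det in detections:  # assumed pre-sorted by confidence, post-NMS
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(ground_truth):
            overlap = iou(det, gt)
            if j not in matched and overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_iou >= thr:
            matched.add(best_j)
            tp += 1
    precision = tp / len(detections) if detections else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```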

The relatively low recall is expected for this tiny local subset because the images are night beach/crowd scenes with many small person boxes.
That is useful for the portfolio: it shows that the validation pipeline records uncomfortable evidence instead of hiding it.

## Local Studio Link

Local Studio's `Load Demo Evidence` flow now returns this evaluation report summary together with the existing ONNX Runtime CPU vs TensorRT Jetson latency pair.
The Studio path remains local-first and does not upload raw images or add database, queue, auth, or production SaaS features.
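
A minimal sketch of what that summary could look like; the field names are assumptions, and only the values are taken from this document:

```python
# Hedged sketch: hypothetical shape of the yolov8_coco summary surfaced by
# the Load Demo Evidence flow. Field names are illustrative assumptions.
yolov8_coco_summary = {
    "preset": "yolov8_coco",
    "samples": 10,
    "ground_truth_boxes": 89,
    "map50": 0.1410,
    "precision": 0.2941,
    "recall": 0.1685,
    "structural_validation": "passed",
}
```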