diff --git a/README.md b/README.md index de7ca9d..091da17 100644 --- a/README.md +++ b/README.md @@ -83,6 +83,7 @@ YOLOv8n was validated with a real OpenCV image-input benchmark: InferEdgeRuntime TensorRT Jetson was 4.6x faster than ONNX Runtime CPU in this real image input benchmark. The benchmark uses end-to-end Runtime latency, not trtexec GPU-only latency. The full pipeline portfolio summary is available at [docs/portfolio/inferedge_pipeline_portfolio.md](docs/portfolio/inferedge_pipeline_portfolio.md), and the detailed Runtime comparison report is available at [docs/portfolio/runtime_compare_yolov8n.md](docs/portfolio/runtime_compare_yolov8n.md). +The YOLOv8 COCO subset accuracy demo is documented in [docs/portfolio/yolov8_coco_subset_evaluation.md](docs/portfolio/yolov8_coco_subset_evaluation.md). ## Local Studio Demo Evidence @@ -100,6 +101,7 @@ Verified demo fixture values: Studio reports this as a `4.57x` TensorRT speedup for the bundled demo pair. AIGuard remains optional in this local Studio path; if Guard evidence is not loaded, the deployment decision explains that the Lab comparison is available but diagnosis evidence is not provided. +The same demo flow also surfaces a small `yolov8_coco` evaluation report summary: 10 images, 89 ground-truth boxes, mAP@50 `0.1410`, precision `0.2941`, recall `0.1685`, structural validation `passed`. 
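Reviewer note: the fixture numbers quoted in this hunk are internally consistent. A quick sanity check — the true-positive count of 15 is inferred from the reported precision (15/51) and recall (15/89), not stated anywhere in the diff:

```python
# Counts implied by the fixture: 51 post-NMS detections, 89 GT boxes,
# and (inferred) 15 true positives.
tp, detections, gt_boxes = 15, 51, 89

precision = tp / detections              # 15/51 = 0.294117... matches 0.2941
recall = tp / gt_boxes                   # 15/89 = 0.168539... matches 0.1685
f1 = 2 * tp / (detections + gt_boxes)    # 30/140 = 0.214285... matches 0.2143

# Demo-pair latency values quoted above (mean, ms).
onnx_mean_ms, trt_mean_ms = 45.4299, 9.9375
speedup = onnx_mean_ms / trt_mean_ms     # ~4.57x, the reported Studio speedup
```

This also explains why the README rounds the same comparison to 4.6x in one sentence and 4.57x in another.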
--- diff --git a/docs/portfolio/inferedge_pipeline_status.md b/docs/portfolio/inferedge_pipeline_status.md index 077eb6a..05fa078 100644 --- a/docs/portfolio/inferedge_pipeline_status.md +++ b/docs/portfolio/inferedge_pipeline_status.md @@ -96,6 +96,7 @@ The current cross-repository loop is covered by documentation, fixtures, and smo - AIGuard worker provenance mismatch diagnosis - Lab deployment decision/report evidence smoke for AIGuard worker provenance diagnosis - Local Studio local-first workflow UI for viewing Forge -> Runtime -> Lab -> optional AIGuard state, creating in-memory analyze jobs, importing Runtime result JSON, replaying bundled demo evidence, comparing backends, and inspecting Lab-owned deployment decision context +- YOLOv8 COCO subset evaluation report generated from 10 local images and 89 converted COCO-style person annotations, with mAP@50 0.1410, precision 0.2941, recall 0.1685, and structural validation passed This means the current product boundary is testable without running the production worker infrastructure. 
@@ -125,6 +126,7 @@ Demo readiness: `scripts/demo_pipeline_full.sh` now provides a guided end-to-end - Runtime compare-key identity polish for manifest-backed engine artifacts - Guided end-to-end demo entrypoint for portfolio and interview walkthroughs - Local Studio at `/studio` for a local-first browser view of Run / Import / Demo Evidence / Compare / Decision / Jetson Helper workflows +- Contract/preset validation demo with `yolov8_coco`, COCO annotation loading, simplified accuracy metrics, structural validation, and JSON/Markdown/HTML report fixtures - Cross-repo fixture compatibility across Forge, Runtime, Lab, and AIGuard - Rule/evidence based provenance mismatch diagnosis diff --git a/docs/portfolio/inferedge_portfolio_submission.md b/docs/portfolio/inferedge_portfolio_submission.md index 3d29176..08c7563 100644 --- a/docs/portfolio/inferedge_portfolio_submission.md +++ b/docs/portfolio/inferedge_portfolio_submission.md @@ -112,6 +112,7 @@ Recent validation evidence: - Runtime compare-key identity polish: InferEdgeRuntime now preserves Forge manifest source model identity for compare naming. If `manifest.source_model.path` is `models/onnx/yolov8n.onnx` and the explicit TensorRT artifact path is `model.engine`, Runtime can keep `compare_model_name=yolov8n` and `compare_key=yolov8n__b1__h640w640__fp32`. - Guided demo entrypoint: `scripts/demo_pipeline_full.sh` summarizes the full Forge -> Runtime -> Lab -> optional AIGuard flow and can print the Jetson TensorRT Runtime command without claiming production worker or SaaS readiness. - Local Studio demo evidence: `/studio` can load bundled ONNX Runtime CPU and TensorRT Jetson Runtime result fixtures from `examples/studio_demo`, keep the demo pair selectable in Recent jobs while the local server process is alive, and show TensorRT Jetson vs ONNX Runtime CPU comparison in the browser. 
The fixture-backed evidence records ONNX Runtime CPU at mean 45.4299 ms / p99 49.2128 ms / 22.0119 FPS and TensorRT Jetson at mean 9.9375 ms / p99 15.5231 ms / 100.6293 FPS, a 4.57x TensorRT speedup for this demo pair. +- YOLOv8 COCO subset evaluation: a 10-image local person-detection subset with 89 ground-truth boxes is converted into a COCO-style annotation fixture and evaluated through the `yolov8_coco` preset. The generated report records mAP@50 0.1410, precision 0.2941, recall 0.1685, and structural validation passed. This is documented as subset workflow evidence, not a full COCO benchmark claim. The direct Runtime execution result includes `deployment_decision`. Its `unknown` value is expected before Lab compare/report because the worker response has not yet been compared by Lab. diff --git a/docs/portfolio/yolov8_coco_subset_evaluation.md b/docs/portfolio/yolov8_coco_subset_evaluation.md new file mode 100644 index 0000000..833a052 --- /dev/null +++ b/docs/portfolio/yolov8_coco_subset_evaluation.md @@ -0,0 +1,43 @@ +# YOLOv8 COCO Subset Evaluation Demo + +This document records a small local-first accuracy evaluation demo for InferEdgeLab. +It is not a full COCO benchmark and should not be presented as production model validation. 
+ +## Scope + +- Preset: `yolov8_coco` +- Model: YOLOv8n ONNX Runtime CPU +- Demo input: 10 local person-detection images +- Annotation source: local YOLO txt labels converted into a compact COCO-style annotation fixture +- Raw images: intentionally not committed +- Annotation fixture: `examples/validation_demo/subset/yolov8_coco_subset_annotations.json` +- Evaluation report: `examples/validation_demo/subset/yolov8_coco_subset_evaluation.json` + +## Result + +| Metric | Value | +|---|---:| +| Samples | 10 | +| Ground-truth boxes | 89 | +| Post-NMS detections checked | 51 | +| mAP@50 | 0.1410 | +| mAP@50-95 | 0.0873 | +| Precision | 0.2941 | +| Recall | 0.1685 | +| F1 score | 0.2143 | +| Structural validation | passed | +| Contract input shape | passed | + +## Interpretation + +This demo proves that InferEdgeLab can load COCO-style annotations, run the YOLOv8 detection evaluator, compute simplified accuracy metrics, validate detection output structure, and emit JSON/Markdown/HTML reports. +The numbers are intentionally documented as a small subset result only. +They are useful as portfolio workflow evidence, not as a claim of full COCO accuracy. + +The relatively low recall is expected for this tiny local subset because the images are night beach/crowd scenes with many small person boxes. +That is useful for the portfolio: it shows that the validation pipeline records uncomfortable evidence instead of hiding it. + +## Local Studio Link + +Local Studio's `Load Demo Evidence` flow now returns this evaluation report summary together with the existing ONNX Runtime CPU vs TensorRT Jetson latency pair. +The Studio path remains local-first and does not upload raw images or add database, queue, auth, or production SaaS features. 
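Reviewer note: the doc above says the evaluator computes "simplified accuracy metrics" against COCO-style `[x, y, w, h]` boxes. A minimal sketch of that kind of greedy IoU matching — this is an illustration of the general technique, not InferEdgeLab's actual implementation:

```python
def iou(a, b):
    # a, b are COCO-style [x, y, w, h] boxes.
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def precision_recall(detections, ground_truth, iou_thr=0.5):
    # Greedy one-to-one matching: each GT box is consumed at most once.
    matched, tp = set(), 0
    for det in detections:
        best, best_iou = None, iou_thr
        for i, gt in enumerate(ground_truth):
            if i in matched:
                continue
            v = iou(det, gt)
            if v >= best_iou:
                best, best_iou = i, v
        if best is not None:
            matched.add(best)
            tp += 1
    precision = tp / len(detections) if detections else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

With many small, crowded person boxes (as in this subset), unmatched GT boxes directly depress recall, which is the behavior the doc calls out.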
diff --git a/examples/validation_demo/subset/yolov8_coco_subset_annotations.json b/examples/validation_demo/subset/yolov8_coco_subset_annotations.json new file mode 100644 index 0000000..a7a6b1d --- /dev/null +++ b/examples/validation_demo/subset/yolov8_coco_subset_annotations.json @@ -0,0 +1,1234 @@ +{ + "info": { + "description": "InferEdge YOLOv8 COCO subset demo annotations converted from local YOLO txt labels.", + "image_count": 10, + "annotation_count": 89 + }, + "images": [ + { + "id": 1, + "file_name": "human_0.jpg", + "width": 1920, + "height": 1080 + }, + { + "id": 2, + "file_name": "human_1.jpg", + "width": 1920, + "height": 1080 + }, + { + "id": 3, + "file_name": "human_100.jpg", + "width": 1920, + "height": 1080 + }, + { + "id": 4, + "file_name": "human_1000.jpg", + "width": 1920, + "height": 1080 + }, + { + "id": 5, + "file_name": "human_10001.jpg", + "width": 1920, + "height": 1080 + }, + { + "id": 6, + "file_name": "human_10002.jpg", + "width": 1920, + "height": 1080 + }, + { + "id": 7, + "file_name": "human_1001.jpg", + "width": 1920, + "height": 1080 + }, + { + "id": 8, + "file_name": "human_10013.jpg", + "width": 1920, + "height": 1080 + }, + { + "id": 9, + "file_name": "human_10014.jpg", + "width": 1920, + "height": 1080 + }, + { + "id": 10, + "file_name": "human_10015.jpg", + "width": 1920, + "height": 1080 + } + ], + "categories": [ + { + "id": 1, + "name": "person" + } + ], + "annotations": [ + { + "id": 1, + "image_id": 1, + "category_id": 1, + "bbox": [ + 110.421, + 948.511, + 49.789, + 123.83 + ], + "area": 6165.404, + "iscrowd": 0 + }, + { + "id": 2, + "image_id": 1, + "category_id": 1, + "bbox": [ + 54.94, + 947.001, + 53.37, + 127.149 + ], + "area": 6785.998, + "iscrowd": 0 + }, + { + "id": 3, + "image_id": 1, + "category_id": 1, + "bbox": [ + 0.0, + 761.9, + 19.43, + 38.52 + ], + "area": 748.466, + "iscrowd": 0 + }, + { + "id": 4, + "image_id": 1, + "category_id": 1, + "bbox": [ + 271.28, + 846.38, + 56.17, + 100.85 + ], + "area": 
5664.727, + "iscrowd": 0 + }, + { + "id": 5, + "image_id": 2, + "category_id": 1, + "bbox": [ + 0.0, + 762.821, + 19.219, + 37.509 + ], + "area": 720.902, + "iscrowd": 0 + }, + { + "id": 6, + "image_id": 2, + "category_id": 1, + "bbox": [ + 70.37, + 1023.11, + 43.0, + 54.55 + ], + "area": 2345.655, + "iscrowd": 0 + }, + { + "id": 7, + "image_id": 2, + "category_id": 1, + "bbox": [ + 1.049, + 876.711, + 18.131, + 95.82 + ], + "area": 1737.266, + "iscrowd": 0 + }, + { + "id": 8, + "image_id": 2, + "category_id": 1, + "bbox": [ + 289.56, + 858.42, + 27.37, + 74.39 + ], + "area": 2036.035, + "iscrowd": 0 + }, + { + "id": 9, + "image_id": 2, + "category_id": 1, + "bbox": [ + 49.301, + 873.91, + 31.49, + 87.29 + ], + "area": 2748.753, + "iscrowd": 0 + }, + { + "id": 10, + "image_id": 2, + "category_id": 1, + "bbox": [ + 172.17, + 1013.9, + 49.21, + 66.1 + ], + "area": 3252.77, + "iscrowd": 0 + }, + { + "id": 11, + "image_id": 3, + "category_id": 1, + "bbox": [ + 0.001, + 799.5, + 19.74, + 74.33 + ], + "area": 1467.237, + "iscrowd": 0 + }, + { + "id": 12, + "image_id": 3, + "category_id": 1, + "bbox": [ + 180.929, + 735.1, + 13.77, + 42.58 + ], + "area": 586.338, + "iscrowd": 0 + }, + { + "id": 13, + "image_id": 3, + "category_id": 1, + "bbox": [ + 54.1, + 811.37, + 30.109, + 79.96 + ], + "area": 2407.55, + "iscrowd": 0 + }, + { + "id": 14, + "image_id": 4, + "category_id": 1, + "bbox": [ + 1853.179, + 814.399, + 27.75, + 65.06 + ], + "area": 1805.407, + "iscrowd": 0 + }, + { + "id": 15, + "image_id": 4, + "category_id": 1, + "bbox": [ + 1254.38, + 635.33, + 14.911, + 37.62 + ], + "area": 560.936, + "iscrowd": 0 + }, + { + "id": 16, + "image_id": 4, + "category_id": 1, + "bbox": [ + 1218.54, + 674.76, + 16.47, + 44.2 + ], + "area": 727.965, + "iscrowd": 0 + }, + { + "id": 17, + "image_id": 4, + "category_id": 1, + "bbox": [ + 1738.599, + 845.11, + 21.69, + 50.4 + ], + "area": 1093.196, + "iscrowd": 0 + }, + { + "id": 18, + "image_id": 4, + "category_id": 1, + "bbox": [ + 
522.83, + 681.56, + 27.85, + 41.28 + ], + "area": 1149.625, + "iscrowd": 0 + }, + { + "id": 19, + "image_id": 4, + "category_id": 1, + "bbox": [ + 1051.96, + 790.08, + 25.079, + 62.6 + ], + "area": 1569.949, + "iscrowd": 0 + }, + { + "id": 20, + "image_id": 4, + "category_id": 1, + "bbox": [ + 1587.26, + 591.999, + 12.599, + 24.75 + ], + "area": 311.831, + "iscrowd": 0 + }, + { + "id": 21, + "image_id": 4, + "category_id": 1, + "bbox": [ + 183.791, + 889.16, + 28.86, + 69.4 + ], + "area": 2002.843, + "iscrowd": 0 + }, + { + "id": 22, + "image_id": 4, + "category_id": 1, + "bbox": [ + 1627.711, + 824.831, + 37.559, + 81.59 + ], + "area": 3064.43, + "iscrowd": 0 + }, + { + "id": 23, + "image_id": 4, + "category_id": 1, + "bbox": [ + 1271.601, + 853.32, + 29.779, + 56.98 + ], + "area": 1696.81, + "iscrowd": 0 + }, + { + "id": 24, + "image_id": 4, + "category_id": 1, + "bbox": [ + 686.27, + 967.28, + 23.311, + 77.7 + ], + "area": 1811.232, + "iscrowd": 0 + }, + { + "id": 25, + "image_id": 4, + "category_id": 1, + "bbox": [ + 224.659, + 610.91, + 12.91, + 19.13 + ], + "area": 246.97, + "iscrowd": 0 + }, + { + "id": 26, + "image_id": 4, + "category_id": 1, + "bbox": [ + 1029.001, + 806.48, + 21.75, + 49.31 + ], + "area": 1072.471, + "iscrowd": 0 + }, + { + "id": 27, + "image_id": 4, + "category_id": 1, + "bbox": [ + 1018.75, + 790.48, + 22.38, + 61.95 + ], + "area": 1386.409, + "iscrowd": 0 + }, + { + "id": 28, + "image_id": 4, + "category_id": 1, + "bbox": [ + 918.021, + 851.061, + 28.339, + 69.33 + ], + "area": 1964.743, + "iscrowd": 0 + }, + { + "id": 29, + "image_id": 4, + "category_id": 1, + "bbox": [ + 569.62, + 814.74, + 33.7, + 73.06 + ], + "area": 2462.105, + "iscrowd": 0 + }, + { + "id": 30, + "image_id": 4, + "category_id": 1, + "bbox": [ + 1146.679, + 846.85, + 26.621, + 41.16 + ], + "area": 1095.709, + "iscrowd": 0 + }, + { + "id": 31, + "image_id": 4, + "category_id": 1, + "bbox": [ + 553.661, + 614.941, + 13.82, + 22.77 + ], + "area": 314.68, + "iscrowd": 
0 + }, + { + "id": 32, + "image_id": 4, + "category_id": 1, + "bbox": [ + 1829.37, + 815.89, + 28.12, + 61.57 + ], + "area": 1731.36, + "iscrowd": 0 + }, + { + "id": 33, + "image_id": 4, + "category_id": 1, + "bbox": [ + 1080.499, + 857.11, + 31.3, + 77.99 + ], + "area": 2441.076, + "iscrowd": 0 + }, + { + "id": 34, + "image_id": 4, + "category_id": 1, + "bbox": [ + 1665.6, + 829.93, + 31.02, + 80.47 + ], + "area": 2496.132, + "iscrowd": 0 + }, + { + "id": 35, + "image_id": 4, + "category_id": 1, + "bbox": [ + 784.781, + 837.66, + 42.159, + 75.79 + ], + "area": 3195.261, + "iscrowd": 0 + }, + { + "id": 36, + "image_id": 4, + "category_id": 1, + "bbox": [ + 1108.86, + 732.56, + 18.38, + 43.59 + ], + "area": 801.189, + "iscrowd": 0 + }, + { + "id": 37, + "image_id": 4, + "category_id": 1, + "bbox": [ + 1200.151, + 674.1, + 17.969, + 44.529 + ], + "area": 800.163, + "iscrowd": 0 + }, + { + "id": 38, + "image_id": 4, + "category_id": 1, + "bbox": [ + 774.131, + 741.51, + 22.61, + 53.57 + ], + "area": 1211.217, + "iscrowd": 0 + }, + { + "id": 39, + "image_id": 4, + "category_id": 1, + "bbox": [ + 1179.24, + 673.14, + 18.63, + 45.37 + ], + "area": 845.227, + "iscrowd": 0 + }, + { + "id": 40, + "image_id": 4, + "category_id": 1, + "bbox": [ + 744.631, + 741.26, + 21.229, + 55.22 + ], + "area": 1172.298, + "iscrowd": 0 + }, + { + "id": 41, + "image_id": 4, + "category_id": 1, + "bbox": [ + 689.451, + 863.25, + 34.51, + 74.769 + ], + "area": 2580.301, + "iscrowd": 0 + }, + { + "id": 42, + "image_id": 4, + "category_id": 1, + "bbox": [ + 1766.72, + 827.06, + 24.42, + 45.381 + ], + "area": 1108.214, + "iscrowd": 0 + }, + { + "id": 43, + "image_id": 5, + "category_id": 1, + "bbox": [ + 909.47, + 753.22, + 14.759, + 44.58 + ], + "area": 657.962, + "iscrowd": 0 + }, + { + "id": 44, + "image_id": 5, + "category_id": 1, + "bbox": [ + 1266.891, + 736.81, + 13.559, + 43.97 + ], + "area": 596.192, + "iscrowd": 0 + }, + { + "id": 45, + "image_id": 5, + "category_id": 1, + "bbox": [ + 
1256.349, + 738.92, + 10.85, + 40.35 + ], + "area": 437.793, + "iscrowd": 0 + }, + { + "id": 46, + "image_id": 6, + "category_id": 1, + "bbox": [ + 907.71, + 721.24, + 23.311, + 49.21 + ], + "area": 1147.125, + "iscrowd": 0 + }, + { + "id": 47, + "image_id": 6, + "category_id": 1, + "bbox": [ + 953.04, + 721.24, + 22.011, + 44.031 + ], + "area": 969.15, + "iscrowd": 0 + }, + { + "id": 48, + "image_id": 6, + "category_id": 1, + "bbox": [ + 871.451, + 722.53, + 23.311, + 41.44 + ], + "area": 965.987, + "iscrowd": 0 + }, + { + "id": 49, + "image_id": 7, + "category_id": 1, + "bbox": [ + 878.569, + 741.95, + 29.56, + 60.11 + ], + "area": 1776.858, + "iscrowd": 0 + }, + { + "id": 50, + "image_id": 7, + "category_id": 1, + "bbox": [ + 1904.559, + 788.01, + 15.441, + 43.78 + ], + "area": 675.991, + "iscrowd": 0 + }, + { + "id": 51, + "image_id": 7, + "category_id": 1, + "bbox": [ + 790.65, + 734.419, + 21.581, + 65.83 + ], + "area": 1420.671, + "iscrowd": 0 + }, + { + "id": 52, + "image_id": 7, + "category_id": 1, + "bbox": [ + 686.659, + 860.04, + 26.001, + 56.78 + ], + "area": 1476.314, + "iscrowd": 0 + }, + { + "id": 53, + "image_id": 7, + "category_id": 1, + "bbox": [ + 1271.551, + 634.889, + 16.85, + 37.281 + ], + "area": 628.174, + "iscrowd": 0 + }, + { + "id": 54, + "image_id": 7, + "category_id": 1, + "bbox": [ + 996.131, + 825.28, + 21.27, + 56.89 + ], + "area": 1210.038, + "iscrowd": 0 + }, + { + "id": 55, + "image_id": 7, + "category_id": 1, + "bbox": [ + 1776.749, + 819.85, + 22.161, + 39.39 + ], + "area": 872.902, + "iscrowd": 0 + }, + { + "id": 56, + "image_id": 7, + "category_id": 1, + "bbox": [ + 713.74, + 851.51, + 25.58, + 55.67 + ], + "area": 1424.039, + "iscrowd": 0 + }, + { + "id": 57, + "image_id": 7, + "category_id": 1, + "bbox": [ + 1215.48, + 675.71, + 18.561, + 43.13 + ], + "area": 800.517, + "iscrowd": 0 + }, + { + "id": 58, + "image_id": 7, + "category_id": 1, + "bbox": [ + 903.66, + 764.79, + 15.949, + 37.31 + ], + "area": 595.069, + 
"iscrowd": 0 + }, + { + "id": 59, + "image_id": 7, + "category_id": 1, + "bbox": [ + 524.7, + 620.51, + 17.121, + 31.24 + ], + "area": 534.85, + "iscrowd": 0 + }, + { + "id": 60, + "image_id": 7, + "category_id": 1, + "bbox": [ + 505.78, + 721.53, + 19.58, + 46.59 + ], + "area": 912.242, + "iscrowd": 0 + }, + { + "id": 61, + "image_id": 7, + "category_id": 1, + "bbox": [ + 1170.839, + 674.24, + 18.62, + 45.5 + ], + "area": 847.225, + "iscrowd": 0 + }, + { + "id": 62, + "image_id": 7, + "category_id": 1, + "bbox": [ + 1199.739, + 674.99, + 16.07, + 43.55 + ], + "area": 699.865, + "iscrowd": 0 + }, + { + "id": 63, + "image_id": 7, + "category_id": 1, + "bbox": [ + 1751.79, + 826.91, + 20.469, + 46.55 + ], + "area": 952.841, + "iscrowd": 0 + }, + { + "id": 64, + "image_id": 7, + "category_id": 1, + "bbox": [ + 977.72, + 817.08, + 20.84, + 62.96 + ], + "area": 1312.06, + "iscrowd": 0 + }, + { + "id": 65, + "image_id": 7, + "category_id": 1, + "bbox": [ + 934.231, + 849.45, + 23.23, + 52.54 + ], + "area": 1220.505, + "iscrowd": 0 + }, + { + "id": 66, + "image_id": 7, + "category_id": 1, + "bbox": [ + 595.421, + 640.32, + 18.87, + 36.23 + ], + "area": 683.645, + "iscrowd": 0 + }, + { + "id": 67, + "image_id": 7, + "category_id": 1, + "bbox": [ + 1021.23, + 791.28, + 23.38, + 59.52 + ], + "area": 1391.565, + "iscrowd": 0 + }, + { + "id": 68, + "image_id": 7, + "category_id": 1, + "bbox": [ + 1684.429, + 627.031, + 18.689, + 35.85 + ], + "area": 670.002, + "iscrowd": 0 + }, + { + "id": 69, + "image_id": 7, + "category_id": 1, + "bbox": [ + 1226.199, + 596.38, + 13.611, + 18.68 + ], + "area": 254.247, + "iscrowd": 0 + }, + { + "id": 70, + "image_id": 7, + "category_id": 1, + "bbox": [ + 1210.76, + 612.821, + 17.61, + 29.09 + ], + "area": 512.278, + "iscrowd": 0 + }, + { + "id": 71, + "image_id": 7, + "category_id": 1, + "bbox": [ + 1052.01, + 792.0, + 23.57, + 61.41 + ], + "area": 1447.426, + "iscrowd": 0 + }, + { + "id": 72, + "image_id": 7, + "category_id": 1, + "bbox": [ 
+ 1122.499, + 730.771, + 22.66, + 45.339 + ], + "area": 1027.385, + "iscrowd": 0 + }, + { + "id": 73, + "image_id": 7, + "category_id": 1, + "bbox": [ + 755.039, + 736.579, + 23.741, + 62.59 + ], + "area": 1485.944, + "iscrowd": 0 + }, + { + "id": 74, + "image_id": 7, + "category_id": 1, + "bbox": [ + 512.23, + 838.021, + 28.061, + 79.86 + ], + "area": 2240.922, + "iscrowd": 0 + }, + { + "id": 75, + "image_id": 7, + "category_id": 1, + "bbox": [ + 1639.389, + 832.52, + 28.061, + 75.54 + ], + "area": 2119.699, + "iscrowd": 0 + }, + { + "id": 76, + "image_id": 8, + "category_id": 1, + "bbox": [ + 412.98, + 748.079, + 29.361, + 60.0 + ], + "area": 1761.652, + "iscrowd": 0 + }, + { + "id": 77, + "image_id": 8, + "category_id": 1, + "bbox": [ + 596.809, + 734.04, + 29.361, + 63.83 + ], + "area": 1874.094, + "iscrowd": 0 + }, + { + "id": 78, + "image_id": 8, + "category_id": 1, + "bbox": [ + 1195.53, + 749.36, + 29.361, + 63.83 + ], + "area": 1874.094, + "iscrowd": 0 + }, + { + "id": 79, + "image_id": 8, + "category_id": 1, + "bbox": [ + 417.91, + 844.401, + 22.869, + 50.469 + ], + "area": 1154.193, + "iscrowd": 0 + }, + { + "id": 80, + "image_id": 8, + "category_id": 1, + "bbox": [ + 388.369, + 838.869, + 26.06, + 57.0 + ], + "area": 1485.435, + "iscrowd": 0 + }, + { + "id": 81, + "image_id": 8, + "category_id": 1, + "bbox": [ + 1654.36, + 816.76, + 29.86, + 51.56 + ], + "area": 1539.582, + "iscrowd": 0 + }, + { + "id": 82, + "image_id": 8, + "category_id": 1, + "bbox": [ + 383.62, + 739.15, + 26.799, + 61.269 + ], + "area": 1641.983, + "iscrowd": 0 + }, + { + "id": 83, + "image_id": 9, + "category_id": 1, + "bbox": [ + 1743.19, + 785.11, + 31.92, + 62.55 + ], + "area": 1996.607, + "iscrowd": 0 + }, + { + "id": 84, + "image_id": 9, + "category_id": 1, + "bbox": [ + 562.34, + 742.98, + 28.08, + 47.229 + ], + "area": 1326.204, + "iscrowd": 0 + }, + { + "id": 85, + "image_id": 9, + "category_id": 1, + "bbox": [ + 412.98, + 742.981, + 22.98, + 67.66 + ], + "area": 1554.856, 
+ "iscrowd": 0 + }, + { + "id": 86, + "image_id": 9, + "category_id": 1, + "bbox": [ + 386.17, + 744.251, + 29.361, + 57.45 + ], + "area": 1686.755, + "iscrowd": 0 + }, + { + "id": 87, + "image_id": 10, + "category_id": 1, + "bbox": [ + 564.55, + 745.84, + 19.421, + 44.031 + ], + "area": 855.108, + "iscrowd": 0 + }, + { + "id": 88, + "image_id": 10, + "category_id": 1, + "bbox": [ + 387.14, + 743.25, + 24.601, + 58.28 + ], + "area": 1433.745, + "iscrowd": 0 + }, + { + "id": 89, + "image_id": 10, + "category_id": 1, + "bbox": [ + 415.63, + 747.14, + 16.831, + 51.8 + ], + "area": 871.832, + "iscrowd": 0 + } + ] +} diff --git a/examples/validation_demo/subset/yolov8_coco_subset_evaluation.html b/examples/validation_demo/subset/yolov8_coco_subset_evaluation.html new file mode 100644 index 0000000..f43b2d4 --- /dev/null +++ b/examples/validation_demo/subset/yolov8_coco_subset_evaluation.html @@ -0,0 +1,25 @@ + +InferEdge Evaluation Report
+# InferEdge Evaluation Report
+
+- preset: `yolov8_coco`
+- engine: `onnxruntime`
+- device: `cpu`
+- samples: `10`
+- accuracy status: `evaluated`
+- contract input shape: `passed`
+- structural validation: `passed`
+- deployment signal: `review`
+
+## Metrics
+- map50: `0.14097840361885305`
+- map50_95: `0.08728567780534073`
+- f1_score: `0.21428571428571427`
+- precision: `0.29411764705882354`
+- recall: `0.16853932584269662`
+
+## Notes
+- Detection evaluation uses image directory traversal.
+- YOLOv8 postprocessing supports single-output and split boxes/scores output layouts.
+- Accuracy uses YOLO txt labels or COCO annotations when provided.
+- When annotations are missing, InferEdge records accuracy_skipped and structural validation only.
+
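Reviewer note: the last note above describes a skip-versus-evaluate branch for accuracy. A hypothetical sketch of that branching (field names mirror the JSON fixture in this diff; the function itself is illustrative, not InferEdgeLab code):

```python
def accuracy_block(annotations, metrics):
    """Sketch of the accuracy-status branching described in the notes above."""
    if not annotations:
        # Structural validation still runs elsewhere; only accuracy is skipped.
        return {"status": "accuracy_skipped", "metrics": None,
                "reason": "no YOLO txt or COCO annotations provided"}
    return {"status": "evaluated", "metrics": metrics, "reason": None}
```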
diff --git a/examples/validation_demo/subset/yolov8_coco_subset_evaluation.json b/examples/validation_demo/subset/yolov8_coco_subset_evaluation.json new file mode 100644 index 0000000..0ca59b0 --- /dev/null +++ b/examples/validation_demo/subset/yolov8_coco_subset_evaluation.json @@ -0,0 +1,312 @@ +{ + "report_role": "inferedge-evaluation-report", + "generated_at": "2026-05-01T08:08:02Z", + "preset": { + "name": "yolov8_coco", + "task": "object_detection", + "description": "YOLOv8 object detection on COCO-style labels.", + "input_shape": [ + 1, + 3, + 640, + 640 + ], + "input_format": "NCHW_RGB_FLOAT32_0_1", + "output_type": "yolov8_detection", + "output_shape": [ + 1, + 84, + 8400 + ], + "labels": [ + "person", + "bicycle", + "car", + "motorcycle", + "airplane", + "bus", + "train", + "truck", + "boat", + "traffic light", + "fire hydrant", + "stop sign", + "parking meter", + "bench", + "bird", + "cat", + "dog", + "horse", + "sheep", + "cow", + "elephant", + "bear", + "zebra", + "giraffe", + "backpack", + "umbrella", + "handbag", + "tie", + "suitcase", + "frisbee", + "skis", + "snowboard", + "sports ball", + "kite", + "baseball bat", + "baseball glove", + "skateboard", + "surfboard", + "tennis racket", + "bottle", + "wine glass", + "cup", + "fork", + "knife", + "spoon", + "bowl", + "banana", + "apple", + "sandwich", + "orange", + "broccoli", + "carrot", + "hot dog", + "pizza", + "donut", + "cake", + "chair", + "couch", + "potted plant", + "bed", + "dining table", + "toilet", + "tv", + "laptop", + "mouse", + "remote", + "keyboard", + "cell phone", + "microwave", + "oven", + "toaster", + "sink", + "refrigerator", + "book", + "clock", + "vase", + "scissors", + "teddy bear", + "hair drier", + "toothbrush" + ], + "thresholds": { + "score": 0.25, + "iou": 0.5 + }, + "accuracy": { + "primary_metric": "map50", + "secondary_metrics": [ + "precision", + "recall", + "f1_score", + "map50_95" + ], + "annotation_formats": [ + "coco", + "yolo_txt" + ] + } + }, + "model_contract": { 
+ "contract_version": "1", + "task": "object_detection", + "preset": "yolov8_coco", + "labels": [ + "person", + "bicycle", + "car", + "motorcycle", + "airplane", + "bus", + "train", + "truck", + "boat", + "traffic light", + "fire hydrant", + "stop sign", + "parking meter", + "bench", + "bird", + "cat", + "dog", + "horse", + "sheep", + "cow", + "elephant", + "bear", + "zebra", + "giraffe", + "backpack", + "umbrella", + "handbag", + "tie", + "suitcase", + "frisbee", + "skis", + "snowboard", + "sports ball", + "kite", + "baseball bat", + "baseball glove", + "skateboard", + "surfboard", + "tennis racket", + "bottle", + "wine glass", + "cup", + "fork", + "knife", + "spoon", + "bowl", + "banana", + "apple", + "sandwich", + "orange", + "broccoli", + "carrot", + "hot dog", + "pizza", + "donut", + "cake", + "chair", + "couch", + "potted plant", + "bed", + "dining table", + "toilet", + "tv", + "laptop", + "mouse", + "remote", + "keyboard", + "cell phone", + "microwave", + "oven", + "toaster", + "sink", + "refrigerator", + "book", + "clock", + "vase", + "scissors", + "teddy bear", + "hair drier", + "toothbrush" + ], + "input": { + "shape": [ + 1, + 3, + 640, + 640 + ], + "format": "NCHW_RGB_FLOAT32_0_1", + "name": "images", + "dtype": "float32", + "type": null + }, + "output": { + "shape": [ + 1, + 84, + 8400 + ], + "format": "tensor", + "name": "output0", + "dtype": "float32", + "type": "yolov8_detection" + }, + "thresholds": { + "score": 0.25, + "iou": 0.5 + }, + "metadata": { + "demo_case": "normal", + "note": "Small contract fixture for contract/preset validation demos." 
+ } + }, + "runtime_result": { + "engine": "onnxruntime", + "device": "cpu", + "sample_count": 10, + "model_input": { + "name": "images", + "dtype": "", + "shape": [ + 1, + 3, + 640, + 640 + ] + }, + "actual_input_shape": [ + 1, + 3, + 640, + 640 + ] + }, + "accuracy": { + "status": "evaluated", + "metrics": { + "map50": 0.14097840361885305, + "map50_95": 0.08728567780534073, + "f1_score": 0.21428571428571427, + "precision": 0.29411764705882354, + "recall": 0.16853932584269662 + }, + "reason": null + }, + "contract_validation": { + "input_shape": { + "status": "passed", + "actual_shape": [ + 1, + 3, + 640, + 640 + ], + "expected_shape": [ + 1, + 3, + 640, + 640 + ] + }, + "preset": "yolov8_coco", + "task": "object_detection" + }, + "structural_validation": { + "status": "passed", + "checked": { + "image_count": 10, + "detection_count": 51, + "num_classes": 80 + }, + "issues": [] + }, + "latency_summary": { + "status": "not_provided" + }, + "deployment_signal": { + "decision": "review", + "reason": "Accuracy evidence is available; compare and deployment policy still decide release." + }, + "notes": [ + "Detection evaluation uses image directory traversal.", + "YOLOv8 postprocessing supports single-output and split boxes/scores output layouts.", + "Accuracy uses YOLO txt labels or COCO annotations when provided.", + "When annotations are missing, InferEdge records accuracy_skipped and structural validation only." 
+ ] +} diff --git a/examples/validation_demo/subset/yolov8_coco_subset_evaluation.md b/examples/validation_demo/subset/yolov8_coco_subset_evaluation.md new file mode 100644 index 0000000..0e5bdf6 --- /dev/null +++ b/examples/validation_demo/subset/yolov8_coco_subset_evaluation.md @@ -0,0 +1,23 @@ +# InferEdge Evaluation Report + +- preset: `yolov8_coco` +- engine: `onnxruntime` +- device: `cpu` +- samples: `10` +- accuracy status: `evaluated` +- contract input shape: `passed` +- structural validation: `passed` +- deployment signal: `review` + +## Metrics +- map50: `0.14097840361885305` +- map50_95: `0.08728567780534073` +- f1_score: `0.21428571428571427` +- precision: `0.29411764705882354` +- recall: `0.16853932584269662` + +## Notes +- Detection evaluation uses image directory traversal. +- YOLOv8 postprocessing supports single-output and split boxes/scores output layouts. +- Accuracy uses YOLO txt labels or COCO annotations when provided. +- When annotations are missing, InferEdge records accuracy_skipped and structural validation only. 
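Reviewer note: the Markdown report above is a flat projection of the JSON fixture added earlier in this diff. A minimal sketch of that projection, assuming a report dict shaped like the JSON (`render_markdown` is a hypothetical helper, not InferEdgeLab's actual renderer):

```python
def render_markdown(report):
    # Flatten the evaluation report dict into the Markdown shape shown above.
    lines = [
        "# InferEdge Evaluation Report",
        "",
        f"- preset: `{report['preset']['name']}`",
        f"- engine: `{report['runtime_result']['engine']}`",
        f"- samples: `{report['runtime_result']['sample_count']}`",
        f"- accuracy status: `{report['accuracy']['status']}`",
        "",
        "## Metrics",
    ]
    for name, value in report["accuracy"]["metrics"].items():
        lines.append(f"- {name}: `{value}`")
    return "\n".join(lines)
```

Note the fixtures deliberately emit full-precision floats in Markdown (`0.14097840361885305`) while the portfolio docs round to four places; any renderer change should keep those two presentations in sync.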
diff --git a/inferedgelab/core/detection_evaluator.py b/inferedgelab/core/detection_evaluator.py
index 9ade7b5..44688dc 100644
--- a/inferedgelab/core/detection_evaluator.py
+++ b/inferedgelab/core/detection_evaluator.py
@@ -982,7 +982,11 @@ def evaluate_detection_engine(
         },
         extra={
             "engine_path": engine_path,
-            "runtime_artifact_path": getattr(engine.runtime_paths, "runtime_artifact_path", None),
+            "runtime_artifact_path": getattr(
+                getattr(engine, "runtime_paths", None),
+                "runtime_artifact_path",
+                None,
+            ),
             "image_files": image_files,
             "accuracy_status": accuracy_status,
             "accuracy_skip_reason": accuracy_skip_reason,
diff --git a/inferedgelab/studio/routes.py b/inferedgelab/studio/routes.py
index 5f9acb6..59196e9 100644
--- a/inferedgelab/studio/routes.py
+++ b/inferedgelab/studio/routes.py
@@ -21,10 +21,12 @@
 STATIC_DIR = Path(__file__).resolve().parent / "static"
 DEMO_EVIDENCE_DIR = Path(__file__).resolve().parents[2] / "examples" / "studio_demo"
+VALIDATION_DEMO_DIR = Path(__file__).resolve().parents[2] / "examples" / "validation_demo" / "subset"
 DEMO_EVIDENCE_FILES = (
     "onnxruntime_cpu_result.json",
     "tensorrt_jetson_result.json",
 )
+DEMO_EVALUATION_REPORT = "yolov8_coco_subset_evaluation.json"
 DEMO_JOB_ID = "demo_yolov8n_trt_vs_onnx"
 STATIC_ASSETS = {
     "app.js": "application/javascript",
@@ -156,10 +158,11 @@ def studio_import(request: Request, payload: dict[str, Any] = Body(...)) -> dict
 @router.get("/studio/api/demo-evidence", include_in_schema=False)
 def studio_demo_evidence(request: Request) -> dict[str, Any]:
     results = [_load_demo_result(file_name) for file_name in DEMO_EVIDENCE_FILES]
+    evaluation_report = _load_demo_evaluation_report()
     imported_results = _get_imported_results(request)
     imported_results.extend(results)
     compare = _build_imported_compare_response(results[0], results[1])
-    demo_job = _build_demo_job(results, compare)
+    demo_job = _build_demo_job(results, compare, evaluation_report)
     _get_demo_jobs(request)[DEMO_JOB_ID] = demo_job
     return {
         "status": "loaded",
@@ -170,6 +173,7 @@ def studio_demo_evidence(request: Request) -> dict[str, Any]:
         "results": results,
         "compare_ready": True,
         "compare": compare,
+        "evaluation_report": evaluation_report,
         "deployment_decision": compare["deployment_decision"],
     }

@@ -296,7 +300,38 @@ def _load_demo_result(file_name: str) -> dict[str, Any]:
     return _with_compare_keys(result)


-def _build_demo_job(results: list[dict[str, Any]], compare: dict[str, Any]) -> dict[str, Any]:
+def _load_demo_evaluation_report() -> dict[str, Any]:
+    path = VALIDATION_DEMO_DIR / DEMO_EVALUATION_REPORT
+    try:
+        report = json.loads(path.read_text(encoding="utf-8"))
+    except OSError as exc:
+        raise HTTPException(status_code=500, detail=f"demo evaluation report not found: {DEMO_EVALUATION_REPORT}") from exc
+    except json.JSONDecodeError as exc:
+        raise HTTPException(status_code=500, detail=f"demo evaluation report is invalid JSON: {DEMO_EVALUATION_REPORT}") from exc
+
+    accuracy = report.get("accuracy") if isinstance(report, dict) else None
+    structural = report.get("structural_validation") if isinstance(report, dict) else None
+    contract = report.get("contract_validation") if isinstance(report, dict) else None
+    if not isinstance(accuracy, dict) or not isinstance(structural, dict) or not isinstance(contract, dict):
+        raise HTTPException(status_code=500, detail=f"demo evaluation report schema error: {DEMO_EVALUATION_REPORT}")
+
+    return {
+        "report_role": report.get("report_role"),
+        "source": f"examples/validation_demo/subset/{DEMO_EVALUATION_REPORT}",
+        "preset": (report.get("preset") or {}).get("name"),
+        "runtime_result": report.get("runtime_result") or {},
+        "accuracy": accuracy,
+        "structural_validation": structural,
+        "contract_validation": contract,
+        "deployment_signal": report.get("deployment_signal") or {},
+    }
+
+
+def _build_demo_job(
+    results: list[dict[str, Any]],
+    compare: dict[str, Any],
+    evaluation_report: dict[str, Any],
+) -> dict[str, Any]:
     now = _utc_now_iso()
     runtime_result = results[-1] if results else {}
     return {
@@ -314,6 +349,7 @@ def _build_demo_job(results: list[dict[str, Any]], compare: dict[str, Any]) -> d
             "runtime_result": runtime_result,
             "comparison": compare,
             "deployment_decision": compare["deployment_decision"],
+            "evaluation_report": evaluation_report,
             "summary": compare["judgement"]["summary"],
         },
         "error": None,
diff --git a/inferedgelab/studio/static/app.js b/inferedgelab/studio/static/app.js
index a9f4fbe..c6cd079 100644
--- a/inferedgelab/studio/static/app.js
+++ b/inferedgelab/studio/static/app.js
@@ -28,6 +28,7 @@ let selectedJobId = null;
 let compareData = null;
 let activeDecision = null;
 let importedResult = null;
+let demoEvaluationReport = null;
 const importedResultsByJobId = {};

 function createElement(tagName, className, textContent) {
@@ -365,6 +366,7 @@ async function loadDemoEvidence() {
    const payload = await fetchJson("/studio/api/demo-evidence");
    const results = Array.isArray(payload.results) ? payload.results : [];
    importedResult = results[results.length - 1] || null;
+   demoEvaluationReport = payload.evaluation_report || null;
    compareData = payload.compare || null;
    selectedJobId = payload.job_id || payload.job?.job_id || selectedJobId;
    selectedJob = payload.job || selectedJob;
@@ -373,6 +375,7 @@ async function loadDemoEvidence() {
    setStatus("#demo-status", "Success: demo evidence loaded.", "success");
    setStatus("#import-status", "Success: demo ONNX Runtime + TensorRT evidence imported.", "success");
    renderImportEvidence({ result: importedResult });
+   renderDemoEvaluation(demoEvaluationReport);
    renderImportedResult();
    await loadJobs(selectedJobId);
    await loadCompare();
@@ -385,6 +388,32 @@ async function loadDemoEvidence() {
  }
}

+function renderDemoEvaluation(report) {
+  const target = document.querySelector("#demo-report-summary");
+  if (!target) {
+    return;
+  }
+  target.replaceChildren();
+
+  if (!report) {
+    return;
+  }
+
+  const metrics = report.accuracy?.metrics || {};
+  const structural = report.structural_validation || {};
+  const contract = report.contract_validation?.input_shape || {};
+  target.append(
+    evidenceItem("preset", report.preset || "yolov8_coco"),
+    evidenceItem("samples", report.runtime_result?.sample_count ?? "-"),
+    evidenceItem("mAP50", metrics.map50 === undefined ? "-" : formatNumber(metrics.map50)),
+    evidenceItem("precision", metrics.precision === undefined ? "-" : formatNumber(metrics.precision)),
+    evidenceItem("recall", metrics.recall === undefined ? "-" : formatNumber(metrics.recall)),
+    evidenceItem("structure", structural.status || "-"),
+    evidenceItem("contract", contract.status || "-"),
+    evidenceItem("report", report.source || "-"),
+  );
+}
+
 function renderPipeline() {
  const target = document.querySelector("#pipeline-flow");
  target.replaceChildren();
@@ -446,6 +475,7 @@ function renderRunPanel() {
  setState("#import-state", "idle");
  setState("#jetson-state", "idle");
  setState("#demo-state", "idle");
+  renderDemoEvaluation(null);
}

function resetTransientInputs() {
diff --git a/inferedgelab/studio/static/index.html b/inferedgelab/studio/static/index.html
index 495feba..3594610 100644
--- a/inferedgelab/studio/static/index.html
+++ b/inferedgelab/studio/static/index.html
@@ -137,8 +137,8 @@
      }
    }
-    <link rel="stylesheet" href="/studio/static/style.css?v=15" />
-    <link rel="stylesheet" href="style.css?v=15" />
+    <link rel="stylesheet" href="/studio/static/style.css?v=16" />
+    <link rel="stylesheet" href="style.css?v=16" />
@@ -266,6 +266,7 @@
        Replay validation evidence
+       <div id="demo-report-summary" class="evidence-summary demo-report-summary"></div>
@@ -331,7 +332,7 @@
        Future Work
-    <script src="/studio/static/app.js?v=15"></script>
-    <script src="app.js?v=15"></script>
+    <script src="/studio/static/app.js?v=16"></script>
+    <script src="app.js?v=16"></script>
diff --git a/inferedgelab/studio/static/style.css b/inferedgelab/studio/static/style.css
index deda0f5..5ebdad2 100644
--- a/inferedgelab/studio/static/style.css
+++ b/inferedgelab/studio/static/style.css
@@ -561,6 +561,14 @@ body.file-mode .file-protocol-warning {
  line-height: 1.45;
}

+.demo-report-summary {
+  grid-template-columns: repeat(4, minmax(0, 1fr));
+}
+
+.demo-report-summary .evidence-item:last-child {
+  grid-column: 1 / -1;
+}
+
.metric-name,
.metric-value {
  display: block;
diff --git a/tests/test_studio_routes.py b/tests/test_studio_routes.py
index a21249b..a9d2436 100644
--- a/tests/test_studio_routes.py
+++ b/tests/test_studio_routes.py
@@ -60,10 +60,10 @@ def test_studio_route_returns_local_studio_html():
    assert "Import" in html
    assert "Jetson Helper" in html
    assert 'data-critical="studio-dark"' in html
-    assert 'href="/studio/static/style.css?v=15"' in html
-    assert 'href="style.css?v=15"' in html
-    assert 'src="/studio/static/app.js?v=15"' in html
-    assert 'src="app.js?v=15"' in html
+    assert 'href="/studio/static/style.css?v=16"' in html
+    assert 'href="style.css?v=16"' in html
+    assert 'src="/studio/static/app.js?v=16"' in html
+    assert 'src="app.js?v=16"' in html
    assert "file-protocol-warning" in html
    assert 'placeholder="results/latest.json"' in html
    assert 'value="results/latest.json"' not in html
@@ -76,6 +76,7 @@ def test_studio_route_returns_local_studio_html():
    assert "Lab's local gate" in html
    assert "Load Demo Evidence" in html
    assert 'id="demo-state"' in html
+    assert 'id="demo-report-summary"' in html


def test_studio_static_assets_are_served():
@@ -126,6 +127,7 @@ def test_studio_static_assets_include_redesigned_ui_contracts():
    assert "decisionNotes" in app_text
    assert "request record only" in app_text
    assert "loadDemoEvidence" in app_text
+    assert "renderDemoEvaluation" in app_text
    assert "/studio/api/demo-evidence" in app_text
    assert "jobDisplayName" in app_text
    assert "jobCaption" in app_text
@@ -142,6 +144,7 @@ def test_studio_static_assets_include_redesigned_ui_contracts():
    assert ".evidence-summary" in style_text
    assert ".compare-card.improvement" in style_text
    assert ".demo-card" in style_text
+    assert ".demo-report-summary" in style_text
    assert ".compare-stat-list" in style_text
    assert ".job-row .state-pill" in style_text
    assert "flex-wrap: wrap" in style_text
@@ -346,6 +349,10 @@ def test_studio_demo_evidence_loads_compare_ready_pair():
    assert response["compare"]["status"] == "ok"
    assert response["compare"]["judgement"]["overall"] == "improvement"
    assert response["deployment_decision"]["decision"] == "unknown"
+    assert response["evaluation_report"]["preset"] == "yolov8_coco"
+    assert response["evaluation_report"]["accuracy"]["status"] == "evaluated"
+    assert response["evaluation_report"]["accuracy"]["metrics"]["map50"] > 0
+    assert response["evaluation_report"]["structural_validation"]["status"] == "passed"
    assert compare["status"] == "ok"
    assert compare["base"]["backend_key"] == "onnxruntime__cpu"
    assert compare["new"]["backend_key"] == "tensorrt__jetson"
@@ -370,6 +377,7 @@ def test_studio_demo_evidence_is_listed_and_selectable_as_job():
    assert detail["result"]["runtime_result"]["backend_key"] == "tensorrt__jetson"
    assert detail["result"]["comparison"]["base"]["backend_key"] == "onnxruntime__cpu"
    assert detail["result"]["comparison"]["new"]["backend_key"] == "tensorrt__jetson"
+    assert detail["result"]["evaluation_report"]["accuracy"]["metrics"]["precision"] > 0


def test_studio_importing_two_compatible_results_returns_compare_data():
diff --git a/tests/test_validation_demo_report.py b/tests/test_validation_demo_report.py
new file mode 100644
index 0000000..19cad6e
--- /dev/null
+++ b/tests/test_validation_demo_report.py
@@ -0,0 +1,24 @@
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+
+def test_yolov8_coco_subset_demo_report_contains_evaluated_accuracy():
+    repo_root = Path(__file__).resolve().parents[1]
+    report_path = repo_root / "examples" / "validation_demo" / "subset" / "yolov8_coco_subset_evaluation.json"
+    annotation_path = repo_root / "examples" / "validation_demo" / "subset" / "yolov8_coco_subset_annotations.json"
+
+    report = json.loads(report_path.read_text(encoding="utf-8"))
+    annotations = json.loads(annotation_path.read_text(encoding="utf-8"))
+
+    assert annotations["info"]["image_count"] == 10
+    assert annotations["info"]["annotation_count"] == 89
+    assert report["preset"]["name"] == "yolov8_coco"
+    assert report["runtime_result"]["sample_count"] == 10
+    assert report["accuracy"]["status"] == "evaluated"
+    assert round(report["accuracy"]["metrics"]["map50"], 4) == 0.141
+    assert round(report["accuracy"]["metrics"]["precision"], 4) == 0.2941
+    assert round(report["accuracy"]["metrics"]["recall"], 4) == 0.1685
+    assert report["structural_validation"]["status"] == "passed"
+    assert report["contract_validation"]["input_shape"]["status"] == "passed"