The InferEdgeLab API layer is a thin FastAPI adapter over the existing service layer.
- It exposes InferEdgeLab read-only workflows over HTTP
- It keeps business logic in reusable services, not in the API layer
- It provides a practical bridge toward future Web UI and SaaS expansion
This means the current API is intended to reuse the same validation flow already available from the CLI, not to replace it with a separate implementation.
InferEdgeLab currently follows this boundary:
- CLI: argument parsing, file saving, console rendering, command entrypoints
- Service layer: domain orchestration for compare, history-report, summarize, list-results
- API adapter: HTTP parameter binding and service response exposure
In short:
CLI / HTTP API -> Service Layer -> Existing domain logic / loaders / renderers
The FastAPI layer is intentionally thin. It reuses the same service-layer logic used by the CLI, so compare/history/summarize/list-results behavior stays aligned across interfaces.
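As a sketch of this boundary (using hypothetical function names, not InferEdgeLab's actual internals), the adapter does little more than bind parameters and delegate to the shared service function:

```python
# Illustrative sketch of the CLI/API -> service-layer split described above.
# `summarize_results` and `api_summarize` are hypothetical names, not
# InferEdgeLab's real API.

def summarize_results(pattern: str, mode: str = "latest") -> dict:
    """Service layer: owns domain logic, shared by CLI and HTTP callers."""
    # A real implementation would load result files matching `pattern`.
    return {"meta": {"pattern": pattern, "mode": mode}, "data": {"rows": []}}

def api_summarize(pattern: str = "reports/*.json", mode: str = "latest") -> dict:
    """API adapter: binds query parameters and exposes the service bundle.
    In the real app this body would sit inside a FastAPI route handler."""
    return summarize_results(pattern=pattern, mode=mode)
```

Because the handler adds nothing beyond parameter binding, the CLI and HTTP paths cannot drift apart in behavior.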
Basic launch:

```shell
poetry run inferedgelab serve
```

Custom host and port:

```shell
poetry run inferedgelab serve --host 0.0.0.0 --port 8000
```

Development mode with auto-reload:

```shell
poetry run inferedgelab serve --host 127.0.0.1 --port 8000 --reload
```

By default, the API runs on 127.0.0.1:8000.
Current endpoints:

- GET /health
- GET /api/list-results
- GET /api/summarize
- GET /api/history-report
- GET /api/compare
- POST /api/analyze
- GET /api/jobs/{job_id}
- POST /api/jobs/{job_id}/complete-dev
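For the GET endpoints, a caller can assemble query URLs with a small helper. The helper below is illustrative, not part of InferEdgeLab, and assumes the default 127.0.0.1:8000 address:

```python
from urllib.parse import urlencode

BASE = "http://127.0.0.1:8000"  # default `serve` address

def endpoint_url(path: str, **params) -> str:
    """Build a query URL for one of the GET endpoints listed above.
    None-valued parameters are dropped rather than serialized."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE}{path}?{query}" if query else f"{BASE}{path}"
```

For example, `endpoint_url("/api/list-results", limit=5)` reproduces the query string used in the curl examples below.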
GET /health

Request:

```shell
curl "http://127.0.0.1:8000/health"
```

Response:

```json
{"status":"ok","service":"inferedgelab-api","version":"0.1.0"}
```

GET /api/list-results

Purpose:
- Returns recent structured result items
- Reuses the `list-results` service bundle contract
Example:

```shell
curl "http://127.0.0.1:8000/api/list-results?limit=5"
```

Example with filters:

```shell
curl "http://127.0.0.1:8000/api/list-results?model=toy224.onnx&engine=onnxruntime&device=cpu&precision=fp32"
```

Response structure:
- `meta` - request metadata such as `pattern`, `limit`, `filters`, `count`
- `data` - `items`: structured result item list
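Assuming the `meta`/`data` shape above, a client might unpack a list-results payload like this (the helper itself is hypothetical, not part of InferEdgeLab):

```python
def unpack_list_results(payload: dict) -> tuple:
    """Split a list-results response into its meta and item parts.
    Follows the meta/data.items fields described above; missing keys
    fall back to empty containers."""
    meta = payload.get("meta", {})
    items = payload.get("data", {}).get("items", [])
    return meta, items
```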
GET /api/summarize

Purpose:
- Builds summary bundle data and rendered Markdown
- Reuses the same summarize service used by CLI output generation
Example:

```shell
curl "http://127.0.0.1:8000/api/summarize?pattern=reports/*.json&mode=latest&sort=p99"
```

Example with recent/top:

```shell
curl "http://127.0.0.1:8000/api/summarize?pattern=reports/*.json&mode=both&sort=time&recent=5&top=3"
```

Response structure:
- `meta` - request metadata such as `pattern`, `format`, `mode`, `sort`, `recent`, `top`
- `data` - `rows`, `latest_rows`, `history_rows`
- `rendered` - `markdown`
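Given the `rendered.markdown` field above, a client can pull out the rendered report text with a defensive accessor (an illustrative sketch, not an InferEdgeLab helper):

```python
def rendered_markdown(payload: dict) -> str:
    """Return the rendered Markdown from a summarize response, or an
    empty string when the rendered block or markdown text is absent."""
    return (payload.get("rendered") or {}).get("markdown") or ""
```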
GET /api/history-report

Purpose:
- Selects history results with filters
- Produces HTML and optional Markdown report content
Example:

```shell
curl "http://127.0.0.1:8000/api/history-report?model=toy224.onnx&include_markdown=true"
```

Example with shape filters:

```shell
curl "http://127.0.0.1:8000/api/history-report?engine=onnxruntime&device=cpu&batch=1&height=224&width=224"
```

Response structure:
- `history` - matched structured result history
- `filters` - applied history filters
- `html` - rendered HTML report text
- `markdown` - rendered Markdown report text or `null`
/api/compare

Purpose:
- Compares two structured result files
- Returns the SaaS API response contract with compare result data, judgement, rendered Markdown/HTML, deployment decision, provenance, and optional AIGuard evidence
Path-based example:

```shell
curl "http://127.0.0.1:8000/api/compare?base_path=results/base.json&new_path=results/new.json"
```

JSON body example:

```shell
curl -X POST "http://127.0.0.1:8000/api/compare" \
  -H "Content-Type: application/json" \
  -d '{
    "base_result": {"model": "resnet18", "engine": "onnxruntime", "device": "cpu", "precision": "fp32", "batch": 1, "height": 224, "width": 224, "mean_ms": 10.0, "p99_ms": 12.0, "timestamp": "2026-04-13T09:00:00Z"},
    "new_result": {"model": "resnet18", "engine": "onnxruntime", "device": "cpu", "precision": "fp32", "batch": 1, "height": 224, "width": 224, "mean_ms": 9.0, "p99_ms": 11.0, "timestamp": "2026-04-13T10:00:00Z"},
    "guard_analysis": {"status": "ok", "anomalies": [], "suspected_causes": [], "recommendations": [], "confidence": 0.5}
  }'
```

Response structure:
- `summary` - compact response type, comparison mode, overall judgement, deployment decision, and guard status
- `comparison` - compare metrics and context
- `deployment_decision` - Lab-owned deployment decision; always included
- `guard_analysis` - optional AIGuard evidence; omitted when not provided or not executed
- `provenance`, `metadata`, `timestamps`, `execution_info` - frontend/SaaS integration context
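Given this shape, a client can read the Lab-owned decision while treating `guard_analysis` as strictly optional. A minimal hypothetical helper:

```python
def deployment_summary(payload: dict) -> tuple:
    """Read the Lab-owned decision and optional guard status from a
    compare response. `deployment_decision` is always included per the
    contract above; `guard_analysis` may be omitted entirely."""
    decision = payload["deployment_decision"]["decision"]
    guard = payload.get("guard_analysis")
    guard_status = guard.get("status") if guard is not None else None
    return decision, guard_status
```

Keying the decision off `deployment_decision` (rather than guard evidence) mirrors the rule that InferEdgeLab owns the final deployment call.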
SaaS-facing compare responses should be wrapped into a stable external JSON shape. The wrapper preserves existing service-layer output and does not change compare, report, or deployment decision logic.
Required top-level fields:
- `summary` - compact response type, comparison mode, overall judgement, deployment decision, and guard status
- `comparison` - compare result, judgement, and rendered Markdown/HTML report content
- `deployment_decision` - Lab-owned deploy/review/block/unknown decision
- `guard_analysis` - optional AIGuard evidence; omitted when AIGuard is not installed or not executed
- `provenance` - runtime, shape, and run configuration provenance copied from the compare bundle
- `metadata` - request and bundle metadata such as paths and legacy warning state
- `timestamps` - base/new result timestamps when available
- `execution_info` - path, selection mode, and execution-context fields needed by frontend/SaaS clients
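A client-side presence check can be written directly from this field list, with `guard_analysis` deliberately excluded because the contract makes it optional. The helper is illustrative, not InferEdgeLab code:

```python
# Required top-level fields of the wrapped compare response, per the
# contract above. guard_analysis is optional and intentionally absent.
REQUIRED_TOP_LEVEL = (
    "summary", "comparison", "deployment_decision",
    "provenance", "metadata", "timestamps", "execution_info",
)

def missing_required_fields(payload: dict) -> list:
    """List required top-level fields absent from a wrapped compare response."""
    return [key for key in REQUIRED_TOP_LEVEL if key not in payload]
```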
The fixture at tests/fixtures/api_response_bundle.json locks the external contract for deployable, review_required, blocked, and AIGuard-absent responses. AIGuard remains optional, and InferEdgeLab remains the final deployment decision owner.
Long-running SaaS workflows such as future /api/analyze calls should use the async job response contract documented in saas_job_workflow.md.
The contract defines queued, running, completed, failed, and cancelled job states. Completed jobs carry the existing API response contract bundle in result, including Lab-owned deployment_decision; failed jobs keep result as null and include structured error details.
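Assuming the job payload reports these states in a `status` field (a naming assumption; the contract above names the states but this sketch picks the field name), a client can decide when to stop polling and where to look for output:

```python
TERMINAL_STATES = {"completed", "failed", "cancelled"}

def job_outcome(job: dict) -> tuple:
    """Interpret one job payload from GET /api/jobs/{job_id}.
    Returns (done, result, error). Client-side sketch only; the
    `status`/`result`/`error` field names are assumptions."""
    status = job.get("status")
    if status not in TERMINAL_STATES:
        return False, None, None              # queued/running: keep polling
    if status == "completed":
        return True, job.get("result"), None  # result carries the API response bundle
    return True, None, job.get("error")       # failed/cancelled: result stays null
```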
Future Forge/Runtime worker handoff payloads are documented in worker_integration_contract.md. That contract defines the minimum worker request/completed/failed response shapes without adding queue, database, Forge execution, or Runtime execution infrastructure.
Current stub example:

```shell
curl -X POST "http://127.0.0.1:8000/api/analyze" \
  -H "Content-Type: application/json" \
  -d '{
    "model_path": "models/resnet18.onnx",
    "metadata_path": "artifacts/metadata.json",
    "manifest_path": "artifacts/manifest.json",
    "notes": "smoke job"
  }'
```

The current implementation returns a queued in-memory job and does not run Forge, Runtime, uploads, queues, or workers.
Poll the job:

```shell
curl "http://127.0.0.1:8000/api/jobs/job_..."
```

Development-only completion stub:
```shell
curl -X POST "http://127.0.0.1:8000/api/jobs/job_.../complete-dev" \
  -H "Content-Type: application/json" \
  -d '{
    "result": {
      "summary": {"response_type": "compare", "overall": "improvement", "comparison_mode": "same_precision", "precision_pair": ["fp32", "fp32"], "deployment_decision": "deployable", "guard_status": null},
      "comparison": {"result": {}, "judgement": {}, "rendered": {"markdown": "# Compare", "html": "<html></html>"}},
      "deployment_decision": {"decision": "deployable", "reason": "Mock dev completion result.", "lab_overall": "improvement", "guard_status": null, "recommended_action": "Review generated report before deployment."},
      "provenance": {"source_bundle": "compare"},
      "metadata": {"legacy_warning": false},
      "timestamps": {"base": "2026-04-13T09:00:00Z", "new": "2026-04-13T10:00:00Z"},
      "execution_info": {"base_path": "dev/base.json", "new_path": "dev/new.json", "selection_mode": null, "legacy_warning": false}
    }
  }'
```

/api/jobs/{job_id}/complete-dev is only a development/mock path. It stores a caller-provided API response contract bundle on an in-memory job so SaaS clients can smoke-test the queued-to-completed flow before real Forge/Runtime worker integration exists.
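The queued-to-completed smoke test can be driven by a small polling loop. In this sketch, `fetch` is an injected callable (for example, a wrapper around GET /api/jobs/{job_id}) so the loop can be exercised without a live server; all names are illustrative:

```python
import time

def wait_for_job(fetch, job_id, poll_interval=0.0, max_polls=10):
    """Poll a job until it reaches a terminal state (completed, failed,
    or cancelled) and return its final payload. `fetch(job_id)` must
    return the current job dict; the `status` field name is an assumption."""
    for _ in range(max_polls):
        job = fetch(job_id)
        if job.get("status") in {"completed", "failed", "cancelled"}:
            return job
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")
```

In real usage `fetch` would issue the HTTP request and `poll_interval` would be a sensible number of seconds.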
- The compare/history/summarize/list-results API layer remains a thin service adapter.
- The analyze job endpoints are in-memory SaaS workflow stubs.
- The API layer reuses service-layer logic rather than duplicating benchmark logic inside HTTP handlers.
- This keeps CLI and API behavior aligned across compare, history-report, summarize, and list-results.
- The API is intentionally a bridge layer for future Web UI or SaaS-oriented expansion, not a separate product surface yet.