The InferEdgeLab API layer is a thin FastAPI adapter over the existing service layer.
- It exposes InferEdgeLab read-only workflows over HTTP
- It keeps business logic in reusable services, not in the API layer
- It provides a practical bridge toward future Web UI and SaaS expansion
This means the current API is intended to reuse the same validation flow already available from the CLI, not to replace it with a separate implementation.
InferEdgeLab currently follows this boundary:
- CLI: argument parsing, file saving, console rendering, command entrypoints
- Service layer: domain orchestration for compare, history-report, summarize, list-results
- API adapter: HTTP parameter binding and service response exposure
In short:
CLI / HTTP API -> Service Layer -> Existing domain logic / loaders / renderers
The FastAPI layer is intentionally thin. It reuses the same service-layer logic used by the CLI, so compare/history/summarize/list-results behavior stays aligned across interfaces.
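As a sketch of this boundary (using hypothetical function names, not InferEdgeLab's actual internals), the adapter does little more than bind parameters and delegate to the shared service function:

```python
# Illustrative sketch of the CLI/API -> service-layer split described above.
# `summarize_results` and `api_summarize` are hypothetical names, not
# InferEdgeLab's real API.

def summarize_results(pattern: str, mode: str = "latest") -> dict:
    """Service layer: owns domain logic, shared by CLI and HTTP callers."""
    # A real implementation would load result files matching `pattern`.
    return {"meta": {"pattern": pattern, "mode": mode}, "data": {"rows": []}}

def api_summarize(pattern: str = "reports/*.json", mode: str = "latest") -> dict:
    """API adapter: binds query parameters and exposes the service bundle.
    In the real app this body would sit inside a FastAPI route handler."""
    return summarize_results(pattern=pattern, mode=mode)
```

Because the handler adds nothing beyond parameter binding, the CLI and HTTP paths cannot drift apart in behavior.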
Basic launch:

```shell
poetry run inferedgelab serve
```

Custom host and port:

```shell
poetry run inferedgelab serve --host 0.0.0.0 --port 8000
```

Development mode with auto-reload:

```shell
poetry run inferedgelab serve --host 127.0.0.1 --port 8000 --reload
```

By default, the API runs on 127.0.0.1:8000.
Current endpoints:

- GET /health
- GET /api/list-results
- GET /api/summarize
- GET /api/history-report
- GET /api/compare
- POST /api/analyze
- GET /api/jobs/{job_id}
- POST /api/jobs/{job_id}/complete-dev
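For the GET endpoints, a caller can assemble query URLs with a small helper. The helper below is illustrative, not part of InferEdgeLab, and assumes the default 127.0.0.1:8000 address:

```python
from urllib.parse import urlencode

BASE = "http://127.0.0.1:8000"  # default `serve` address

def endpoint_url(path: str, **params) -> str:
    """Build a query URL for one of the GET endpoints listed above.
    None-valued parameters are dropped rather than serialized."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE}{path}?{query}" if query else f"{BASE}{path}"
```

For example, `endpoint_url("/api/list-results", limit=5)` reproduces the query string used in the curl examples below.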
GET /health

Request:

```shell
curl "http://127.0.0.1:8000/health"
```

Response:

```json
{"status":"ok","service":"inferedgelab-api","version":"0.1.0"}
```

GET /api/list-results

Purpose:
- Returns recent structured result items
- Reuses the `list-results` service bundle contract
Example:

```shell
curl "http://127.0.0.1:8000/api/list-results?limit=5"
```

Example with filters:

```shell
curl "http://127.0.0.1:8000/api/list-results?model=toy224.onnx&engine=onnxruntime&device=cpu&precision=fp32"
```

Response structure:
- `meta` - request metadata such as `pattern`, `limit`, `filters`, `count`
- `data` - `items`: structured result item list
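Assuming the `meta`/`data` shape above, a client might unpack a list-results payload like this (the helper itself is hypothetical, not part of InferEdgeLab):

```python
def unpack_list_results(payload: dict) -> tuple:
    """Split a list-results response into its meta and item parts.
    Follows the meta/data.items fields described above; missing keys
    fall back to empty containers."""
    meta = payload.get("meta", {})
    items = payload.get("data", {}).get("items", [])
    return meta, items
```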
GET /api/summarize

Purpose:
- Builds summary bundle data and rendered Markdown
- Reuses the same summarize service used by CLI output generation
Example:

```shell
curl "http://127.0.0.1:8000/api/summarize?pattern=reports/*.json&mode=latest&sort=p99"
```

Example with recent/top:

```shell
curl "http://127.0.0.1:8000/api/summarize?pattern=reports/*.json&mode=both&sort=time&recent=5&top=3"
```

Response structure:
- `meta` - request metadata such as `pattern`, `format`, `mode`, `sort`, `recent`, `top`
- `data` - `rows`, `latest_rows`, `history_rows`
- `rendered` - `markdown`
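Given the `rendered.markdown` field above, a client can pull out the rendered report text with a defensive accessor (an illustrative sketch, not an InferEdgeLab helper):

```python
def rendered_markdown(payload: dict) -> str:
    """Return the rendered Markdown from a summarize response, or an
    empty string when the rendered block or markdown text is absent."""
    return (payload.get("rendered") or {}).get("markdown") or ""
```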
GET /api/history-report

Purpose:
- Selects history results with filters
- Produces HTML and optional Markdown report content
Example:

```shell
curl "http://127.0.0.1:8000/api/history-report?model=toy224.onnx&include_markdown=true"
```

Example with shape filters:

```shell
curl "http://127.0.0.1:8000/api/history-report?engine=onnxruntime&device=cpu&batch=1&height=224&width=224"
```

Response structure:
- `history` - matched structured result history
- `filters` - applied history filters
- `html` - rendered HTML report text
- `markdown` - rendered Markdown report text or `null`
/api/compare

Purpose:
- Compares two structured result files
- Returns the SaaS API response contract with compare result data, judgement, rendered Markdown/HTML, deployment decision, provenance, and optional AIGuard evidence
Path-based example:

```shell
curl "http://127.0.0.1:8000/api/compare?base_path=results/base.json&new_path=results/new.json"
```

JSON body example:

```shell
curl -X POST "http://127.0.0.1:8000/api/compare" \
  -H "Content-Type: application/json" \
  -d '{
    "base_result": {"model": "resnet18", "engine": "onnxruntime", "device": "cpu", "precision": "fp32", "batch": 1, "height": 224, "width": 224, "mean_ms": 10.0, "p99_ms": 12.0, "timestamp": "2026-04-13T09:00:00Z"},
    "new_result": {"model": "resnet18", "engine": "onnxruntime", "device": "cpu", "precision": "fp32", "batch": 1, "height": 224, "width": 224, "mean_ms": 9.0, "p99_ms": 11.0, "timestamp": "2026-04-13T10:00:00Z"},
    "guard_analysis": {"status": "ok", "anomalies": [], "suspected_causes": [], "recommendations": [], "confidence": 0.5}
  }'
```

Response structure:
- `summary` - compact response type, comparison mode, overall judgement, deployment decision, and guard status
- `comparison` - compare metrics and context
- `deployment_decision` - Lab-owned deployment decision; always included
- `guard_analysis` - optional AIGuard evidence; omitted when not provided or not executed
- `provenance`, `metadata`, `timestamps`, `execution_info` - frontend/SaaS integration context
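Given this shape, a client can read the Lab-owned decision while treating `guard_analysis` as strictly optional. A minimal hypothetical helper:

```python
def deployment_summary(payload: dict) -> tuple:
    """Read the Lab-owned decision and optional guard status from a
    compare response. `deployment_decision` is always included per the
    contract above; `guard_analysis` may be omitted entirely."""
    decision = payload["deployment_decision"]["decision"]
    guard = payload.get("guard_analysis")
    guard_status = guard.get("status") if guard is not None else None
    return decision, guard_status
```

Keying the decision off `deployment_decision` (rather than guard evidence) mirrors the rule that InferEdgeLab owns the final deployment call.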
SaaS-facing compare responses should be wrapped into a stable external JSON shape. The wrapper preserves existing service-layer output and does not change compare, report, or deployment decision logic.
Required top-level fields:
- `summary` - compact response type, comparison mode, overall judgement, deployment decision, and guard status
- `comparison` - compare result, judgement, and rendered Markdown/HTML report content
- `deployment_decision` - Lab-owned deploy/review/block/unknown decision
- `guard_analysis` - optional AIGuard evidence; omitted when AIGuard is not installed or not executed
- `provenance` - runtime, shape, and run configuration provenance copied from the compare bundle
- `metadata` - request and bundle metadata such as paths and legacy warning state
- `timestamps` - base/new result timestamps when available
- `execution_info` - path, selection mode, and execution-context fields needed by frontend/SaaS clients
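A client-side presence check can be written directly from this field list, with `guard_analysis` deliberately excluded because the contract makes it optional. The helper is illustrative, not InferEdgeLab code:

```python
# Required top-level fields of the wrapped compare response, per the
# contract above. guard_analysis is optional and intentionally absent.
REQUIRED_TOP_LEVEL = (
    "summary", "comparison", "deployment_decision",
    "provenance", "metadata", "timestamps", "execution_info",
)

def missing_required_fields(payload: dict) -> list:
    """List required top-level fields absent from a wrapped compare response."""
    return [key for key in REQUIRED_TOP_LEVEL if key not in payload]
```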
The fixture at tests/fixtures/api_response_bundle.json locks the external contract for deployable, review_required, blocked, and AIGuard-absent responses. AIGuard remains optional, and InferEdgeLab remains the final deployment decision owner.
Long-running SaaS workflows such as future /api/analyze calls should use the async job response contract documented in saas_job_workflow.md.
The contract defines queued, running, completed, failed, and cancelled job states. Completed jobs carry the existing API response contract bundle in result, including Lab-owned deployment_decision; failed jobs keep result as null and include structured error details.
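Assuming the job payload reports these states in a `status` field (a naming assumption; the contract above names the states but this sketch picks the field name), a client can decide when to stop polling and where to look for output:

```python
TERMINAL_STATES = {"completed", "failed", "cancelled"}

def job_outcome(job: dict) -> tuple:
    """Interpret one job payload from GET /api/jobs/{job_id}.
    Returns (done, result, error). Client-side sketch only; the
    `status`/`result`/`error` field names are assumptions."""
    status = job.get("status")
    if status not in TERMINAL_STATES:
        return False, None, None              # queued/running: keep polling
    if status == "completed":
        return True, job.get("result"), None  # result carries the API response bundle
    return True, None, job.get("error")       # failed/cancelled: result stays null
```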
Future Forge/Runtime worker handoff payloads are documented in worker_integration_contract.md. That contract defines the minimum worker request/completed/failed response shapes without adding queue, database, Forge execution, or Runtime execution infrastructure.
Current stub example:

```shell
curl -X POST "http://127.0.0.1:8000/api/analyze" \
  -H "Content-Type: application/json" \
  -d '{
    "model_path": "models/resnet18.onnx",
    "metadata_path": "artifacts/metadata.json",
    "manifest_path": "artifacts/manifest.json",
    "notes": "smoke job"
  }'
```

The current implementation returns a queued in-memory job and does not run Forge, Runtime, uploads, queues, or workers.
Poll the job:

```shell
curl "http://127.0.0.1:8000/api/jobs/job_..."
```

Development-only completion stub:
```shell
curl -X POST "http://127.0.0.1:8000/api/jobs/job_.../complete-dev" \
  -H "Content-Type: application/json" \
  -d '{
    "result": {
      "summary": {"response_type": "compare", "overall": "improvement", "comparison_mode": "same_precision", "precision_pair": ["fp32", "fp32"], "deployment_decision": "deployable", "guard_status": null},
      "comparison": {"result": {}, "judgement": {}, "rendered": {"markdown": "# Compare", "html": "<html></html>"}},
      "deployment_decision": {"decision": "deployable", "reason": "Mock dev completion result.", "lab_overall": "improvement", "guard_status": null, "recommended_action": "Review generated report before deployment."},
      "provenance": {"source_bundle": "compare"},
      "metadata": {"legacy_warning": false},
      "timestamps": {"base": "2026-04-13T09:00:00Z", "new": "2026-04-13T10:00:00Z"},
      "execution_info": {"base_path": "dev/base.json", "new_path": "dev/new.json", "selection_mode": null, "legacy_warning": false}
    }
  }'
```

/api/jobs/{job_id}/complete-dev is only a development/mock path. It stores a caller-provided API response contract bundle on an in-memory job so SaaS clients can smoke-test the queued-to-completed flow before real Forge/Runtime worker integration exists.
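The queued-to-completed smoke test can be driven by a small polling loop. In this sketch, `fetch` is an injected callable (for example, a wrapper around GET /api/jobs/{job_id}) so the loop can be exercised without a live server; all names are illustrative:

```python
import time

def wait_for_job(fetch, job_id, poll_interval=0.0, max_polls=10):
    """Poll a job until it reaches a terminal state (completed, failed,
    or cancelled) and return its final payload. `fetch(job_id)` must
    return the current job dict; the `status` field name is an assumption."""
    for _ in range(max_polls):
        job = fetch(job_id)
        if job.get("status") in {"completed", "failed", "cancelled"}:
            return job
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")
```

In real usage `fetch` would issue the HTTP request and `poll_interval` would be a sensible number of seconds.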
- The compare/history/summarize/list-results API layer remains a thin service adapter.
- The analyze job endpoints are in-memory SaaS workflow stubs.
- The API layer reuses service-layer logic rather than duplicating benchmark logic inside HTTP handlers.
- This keeps CLI and API behavior aligned across compare, history-report, summarize, and list-results.
- The API is intentionally a bridge layer for future Web UI or SaaS-oriented expansion, not a separate product surface yet.