All API responses are JSON. Errors follow OpenAI's error format where applicable.
Authorization: Bearer se-<uuid>
Token is looked up by SHA-256 hash in tokens table. Resolves to user + permissions.
Cookie: se_session=<hex-token>
Set after OIDC login. 24-hour TTL. Looked up by SHA-256 hash in sessions table.
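Both credential types are resolved by hashing the presented secret and looking up the digest, so the plaintext never needs to be stored. A minimal Python sketch of that lookup follows. The hex digest encoding, the in-memory `TOKENS` dictionary standing in for the tokens table, and the function names are assumptions made for illustration, not the server's actual code.

```python
import hashlib

def hash_secret(secret: str) -> str:
    """Return the SHA-256 digest of a token or session value as hex.

    Hex encoding is an assumption; the document only states that SHA-256 is used.
    """
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()

# Hypothetical stand-in for the tokens table: digest -> user record.
TOKENS = {
    hash_secret("se-00000000-0000-0000-0000-000000000000"): {"user_id": "u1"},
}

def resolve_bearer(header_value: str):
    """Resolve an Authorization header value to a user record, or None."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Bearer" or not token.startswith("se-"):
        return None
    return TOKENS.get(hash_secret(token))
```

A session cookie would presumably follow the same pattern against the sessions table, with the 24-hour TTL checked after the lookup.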
Active when BREAK_GLASS=true. Validates credentials against BOOTSTRAP_USER and BOOTSTRAP_PASSWORD env vars, then mints a session cookie. HTTP Basic Auth is not accepted anywhere.
Request body:
{ "username": "string", "password": "string" }
Responses:
- 204 No Content: credentials valid; Set-Cookie: se_session=<token> is included in the response. Use this cookie for all subsequent authenticated requests.
- 401 Unauthorized: credentials invalid.
- 404 Not Found: bootstrap is not active (BREAK_GLASS is not true).
- 429 Too Many Requests: rate limit exceeded (5 requests/min/IP).
Example:
# Login and capture the session cookie
curl -c cookies.txt -s -o /dev/null -w "%{http_code}" \
-X POST https://api.example.com/auth/bootstrap-login \
-H 'Content-Type: application/json' \
-d '{"username":"admin","password":"changeme"}'
# Use the cookie for subsequent calls
curl -b cookies.txt https://api.example.com/api/user/tokens | jq .
List enabled OIDC providers for the login page. Also signals whether the break-glass login form should be shown.
Response 200:
{
"providers": [
{ "id": "string", "name": "string" }
],
"bootstrap_active": true
}
bootstrap_active is true when BREAK_GLASS=true is set server-side. The portal UI renders the break-glass form below the OIDC provider buttons when this field is true.
Redirects to the OIDC provider's authorization endpoint.
Response 302: Redirect to IdP.
Handles OIDC callback. Exchanges code for tokens, creates/updates user, sets session cookie.
Response 302: Redirect to / with Set-Cookie: se_session=<token>.
Clears session.
Request: (empty body, session cookie required)
Response 200:
{ "status": "logged_out" }
Returns current session user info. Used by the UI to check auth state.
Response 200:
{
"user_id": "string",
"email": "string | null",
"display_name": "string | null",
"is_admin": true
}
Response 401: Not authenticated.
List the authenticated user's API tokens. Tokens are stored hashed, so the plaintext is never returned.
Response 200:
{
"tokens": [
{
"id": "string",
"name": "string",
"category_id": "string | null",
"category_name": "string | null",
"specific_model_id": "string | null",
"expires_at": "string | null",
"revoked": false,
"created_at": "string"
}
]
}
Mint a new API token. Returns the plaintext token once.
Request:
{
"name": "string",
"category_id": "string | null",
"specific_model_id": "string | null",
"expires_in_days": 90
}
expires_in_days is an integer (default 90). The token expires that many days from creation.
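As a small illustration of the expiry rule, the sketch below derives an expiry timestamp from a creation time and `expires_in_days`. The function name and the use of UTC are assumptions; the document does not say how the server stores or formats `expires_at`.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_EXPIRES_IN_DAYS = 90  # default documented for expires_in_days

def compute_expires_at(created_at: datetime,
                       expires_in_days: int = DEFAULT_EXPIRES_IN_DAYS) -> datetime:
    """Return the moment a token created at `created_at` would expire."""
    return created_at + timedelta(days=expires_in_days)

# Example creation time (UTC is an assumption).
created = datetime(2026, 2, 17, 10, 0, tzinfo=timezone.utc)
expires = compute_expires_at(created)
```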
Response 201:
{
"token": "se-<uuid>",
"name": "string",
"warning": "Save this token — it cannot be shown again."
}
Revoke a token.
Response 200:
{ "status": "revoked" }
Response 404: Token not found or not owned by user.
Usage statistics for the authenticated user.
Query params:
- period: hour, day, week, or month (default: day)
Response 200:
{
"summary": {
"total_requests": 0,
"total_input_tokens": 0,
"total_output_tokens": 0,
"period": "day"
},
"by_model": [
{
"model_id": "string",
"category_name": "string",
"requests": 0,
"input_tokens": 0,
"output_tokens": 0
}
],
"by_token": [
{
"token_name": "string",
"requests": 0,
"input_tokens": 0,
"output_tokens": 0
}
]
}
Time-series usage data broken down by model and token.
Query params:
- period: hour, day, week, or month (default: day)
Response 200:
{
"timeline": [
{
"timestamp": "string",
"model": "string",
"requests": 0,
"input_tokens": 0,
"output_tokens": 0
}
],
"timeline_by_token": [
{
"timestamp": "string",
"token_name": "string",
"requests": 0,
"input_tokens": 0,
"output_tokens": 0
}
]
}
List available model categories (read-only for non-admins).
Response 200:
{
"categories": [
{
"id": "string",
"name": "string",
"description": "string",
"preferred_model_id": "string | null",
"created_at": "string"
}
]
}
List all registered models (read-only for non-admins).
Response 200:
{
"models": [
{
"id": "string",
"hf_repo": "string",
"filename": "string | null",
"size_bytes": 0,
"category_id": "string | null",
"loaded": false,
"backend_type": "llamacpp",
"context_length": 4096,
"created_at": "string"
}
]
}
Disk usage for the model storage path.
Response 200:
{
"total_bytes": 0,
"used_bytes": 0,
"free_bytes": 0
}
Unified Server-Sent Events stream merging metrics and reservation signals.
Event types:
- metrics (every ~2s): system metrics snapshot
  - Admin payload: full MetricsSnapshot (GPU memory, CPU, disk, queues, containers, active reservation)
  - Non-admin payload: { gpu_memory, active_reservation, timestamp }
- reservations_changed: emitted on any reservation state change (no data payload)
Example:
event: metrics
data: {"gpu_memory":{"total_mb":32768,"used_mb":8192},"timestamp":"2026-02-17T10:00:00Z",...}
event: reservations_changed
Clients should reconnect on disconnection. The stream uses SSE keep-alive.
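To make the stream format concrete, here is a minimal Python sketch that splits raw SSE text into (event, data) pairs. It assumes that data payloads are JSON and that lines starting with a colon are keep-alive comments. Unlike a strict browser EventSource, it also yields events that carry no data line, because the example above shows reservations_changed without one. The function name is hypothetical.

```python
import json

def parse_sse(text: str):
    """Split raw SSE text into a list of (event_name, data) pairs.

    data is the decoded JSON payload, or None when the event had no data line.
    """
    events = []
    event_name, data_lines = None, []
    # A trailing blank line is appended so the last event is always flushed.
    for line in text.splitlines() + [""]:
        if line.startswith(":"):
            continue  # comment line, commonly used for keep-alive
        if line == "":
            # A blank line terminates the current event.
            if event_name is not None or data_lines:
                data = "\n".join(data_lines)
                events.append((event_name or "message",
                               json.loads(data) if data else None))
            event_name, data_lines = None, []
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event_name = value
        elif field == "data":
            data_lines.append(value)
    return events

sample = (
    "event: metrics\n"
    'data: {"gpu_memory": {"total_mb": 32768, "used_mb": 8192}}\n'
    "\n"
    ": keep-alive\n"
    "event: reservations_changed\n"
    "\n"
)
parsed = parse_sse(sample)
```

A client would typically refetch the reservation endpoints when it sees reservations_changed, since that event carries no payload of its own.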
Create a new reservation request. Times must be on 30-minute boundaries and in the future.
Request:
{
"start_time": "2026-02-20T14:00:00",
"end_time": "2026-02-20T18:00:00",
"reason": "Batch inference job"
}
Response 201:
{ "id": "uuid", "status": "pending" }
Response 400: Invalid times, not on 30-min boundary, end before start, or in the past.
Response 409: Overlaps with an existing approved/active reservation.
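The 400 conditions can be checked on the client before submitting a request. The sketch below is one possible reading of those rules in Python. It uses naive datetimes because the example timestamps carry no timezone, and it treats an end time equal to the start time as invalid, which is an assumption. The function name and error strings are hypothetical.

```python
from datetime import datetime

def validate_reservation_times(start: datetime, end: datetime, now: datetime):
    """Return an error string for an invalid request, or None when it is valid."""
    for t in (start, end):
        # Times must fall exactly on :00 or :30.
        if t.minute % 30 != 0 or t.second != 0 or t.microsecond != 0:
            return "not on 30-minute boundary"
    if end <= start:
        return "end before start"  # equal times rejected too (assumption)
    if start <= now:
        return "in the past"
    return None

now = datetime(2026, 2, 17, 10, 0)
ok = validate_reservation_times(
    datetime.fromisoformat("2026-02-20T14:00:00"),
    datetime.fromisoformat("2026-02-20T18:00:00"),
    now,
)
```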
List the current user's reservations (all statuses).
Response 200:
{
"reservations": [
{
"id": "uuid",
"user_id": "uuid",
"status": "pending | approved | active | completed | rejected | cancelled",
"start_time": "string",
"end_time": "string",
"reason": "string",
"admin_note": "string",
"approved_by": "uuid | null",
"created_at": "string",
"updated_at": "string"
}
]
}
Cancel own pending or approved reservation.
Response 200:
{ "status": "cancelled" }
Response 404: Not found, not owned by user, or not in a cancellable state.
Get the currently active reservation (if any). Visible to all authenticated users.
Response 200:
{
"active": true,
"reservation_id": "uuid",
"user_id": "uuid",
"user_display_name": "string | null",
"end_time": "string"
}
Or { "active": false } when no reservation is active.
All approved, active, and pending reservations for calendar display (all users).
Response 200:
{
"reservations": [
{
"id": "uuid",
"user_id": "uuid",
"status": "string",
"start_time": "string",
"end_time": "string",
"reason": "string",
"user_email": "string | null",
"user_display_name": "string | null"
}
]
}
Start a container during the active reservation (reservation holder only).
Request:
{
"model_id": "uuid",
"backend_type": "llamacpp",
"gpu_type": "rocm | cuda | none",
"gpu_layers": 99,
"context_size": 4096,
"parallel": 1
}
Only model_id is required; other fields have defaults.
Response 200:
{
"container": "sovereign-llamacpp-<model_id>",
"url": "http://sovereign-llamacpp-<model_id>:8080"
}
Response 403: Caller does not hold the active reservation.
Stop a container during the active reservation (reservation holder only).
Request:
{ "model_id": "uuid" }
Response 200:
{ "status": "stopped" }
List all reservations with user display info.
Response 200: Same shape as user listing but includes all users' reservations.
Approve a pending reservation. Checks for overlap before approving.
Request:
{ "note": "Optional admin note" }
Response 200:
{ "status": "approved" }
Response 409: Approving would create an overlap.
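The overlap check behind the 409 response can be sketched as a standard interval test. The version below treats reservations as half-open intervals, so a reservation ending at 14:00 does not conflict with one starting at 14:00; that choice, the set of blocking statuses (taken from the create endpoint's 409 description), and the function names are assumptions rather than confirmed server behaviour.

```python
from datetime import datetime

BLOCKING_STATUSES = {"approved", "active"}  # assumed from the create endpoint's 409 rule

def overlaps(a_start, a_end, b_start, b_end) -> bool:
    """True when half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end

def find_conflict(candidate: dict, existing: list):
    """Return the id of the first blocking reservation that overlaps, or None."""
    c_start = datetime.fromisoformat(candidate["start_time"])
    c_end = datetime.fromisoformat(candidate["end_time"])
    for r in existing:
        if r["id"] == candidate["id"] or r["status"] not in BLOCKING_STATUSES:
            continue
        if overlaps(c_start, c_end,
                    datetime.fromisoformat(r["start_time"]),
                    datetime.fromisoformat(r["end_time"])):
            return r["id"]
    return None

existing = [
    {"id": "a", "status": "approved",
     "start_time": "2026-02-20T14:00:00", "end_time": "2026-02-20T18:00:00"},
    {"id": "b", "status": "rejected",
     "start_time": "2026-02-20T18:00:00", "end_time": "2026-02-20T20:00:00"},
]
pending = {"id": "c", "status": "pending",
           "start_time": "2026-02-20T17:30:00", "end_time": "2026-02-20T19:00:00"}
```

Here `find_conflict(pending, existing)` reports reservation "a", since the rejected reservation "b" is ignored.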
Reject a pending reservation.
Request:
{ "note": "Optional rejection reason" }
Response 200:
{ "status": "rejected" }
Force-activate an approved reservation immediately.
Response 200:
{ "status": "active" }
Response 409: Another reservation is already active.
Force-end an active reservation early.
Response 200:
{ "status": "completed" }
Delete a reservation record. Cannot delete active reservations (deactivate first).
Response 200:
{ "status": "deleted" }
Return current fairness/queue settings.
Response 200:
{
"fairness_base_priority": 100.0,
"fairness_wait_weight": 1.0,
"fairness_usage_weight": 10.0,
"fairness_usage_scale": 1000.0,
"fairness_window_minutes": 60,
"queue_timeout_secs": 30
}
Partial update: only the provided keys are changed.
Request:
{
"fairness_base_priority": 200.0,
"queue_timeout_secs": 60
}
Response 200: Returns the full updated settings object (same shape as GET).
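The partial-update semantics amount to merging the request body over the current settings and returning the full result. A minimal Python sketch, using the values from the GET example as the current state; rejecting unknown keys is an assumption, and the function name is hypothetical.

```python
CURRENT_SETTINGS = {
    "fairness_base_priority": 100.0,
    "fairness_wait_weight": 1.0,
    "fairness_usage_weight": 10.0,
    "fairness_usage_scale": 1000.0,
    "fairness_window_minutes": 60,
    "queue_timeout_secs": 30,
}

def apply_partial_update(current: dict, patch: dict) -> dict:
    """Return a new settings dict with only the keys present in `patch` changed."""
    unknown = set(patch) - set(current)
    if unknown:
        # Assumption: the server rejects keys it does not recognise.
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    return {**current, **patch}

updated = apply_partial_update(
    CURRENT_SETTINGS,
    {"fairness_base_priority": 200.0, "queue_timeout_secs": 60},
)
```

Keys absent from the request, such as fairness_wait_weight, keep their previous values in `updated`.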
List all configured IdPs.
Response 200:
{
"idps": [
{
"id": "string",
"name": "string",
"issuer": "string",
"client_id": "string",
"scopes": "string",
"enabled": true,
"created_at": "string"
}
]
}
Add a new OIDC provider.
Request:
{
"name": "string",
"issuer": "string",
"client_id": "string",
"client_secret": "string",
"scopes": "openid email profile"
}
Response 201:
{
"id": "string",
"name": "string"
}
Update an IdP configuration.
Request: Same fields as POST (all optional).
Response 200:
{ "status": "updated" }
Disable an IdP (soft delete: sets enabled = 0).
Response 200:
{ "status": "disabled" }
Response 200:
{
"categories": [
{
"id": "string",
"name": "string",
"description": "string",
"preferred_model_id": "string | null",
"created_at": "string"
}
]
}
Request:
{
"name": "string",
"description": "string",
"preferred_model_id": "string | null"
}
Response 201:
{ "id": "string", "name": "string" }
Request: Same fields as POST (all optional).
Response 200:
{ "status": "updated" }
Response 200:
{ "status": "deleted" }
Response 200:
{
"models": [
{
"id": "string",
"hf_repo": "string",
"filename": "string | null",
"size_bytes": 0,
"category_id": "string | null",
"loaded": false,
"backend_type": "llamacpp",
"last_used_at": "string | null",
"created_at": "string"
}
]
}
Register a model (does not download or start it).
Request:
{
"hf_repo": "string",
"category_id": "string | null"
}
Response 201:
{ "id": "string", "hf_repo": "string" }
Update model metadata (e.g. assign to category).
Request:
{
"category_id": "string | null"
}
Response 200:
{ "status": "updated" }
Unregister a model (must be unloaded first).
Response 200:
{ "status": "deleted" }
Response 409: Model is currently loaded.
List all managed backend containers.
Response 200:
{
"containers": [
{
"id": "string",
"names": ["string"],
"state": "running | exited | ...",
"status": "string",
"labels": {}
}
]
}
Container model IDs can be found in the labels field under the key sovereign-engine.model-id. Containers are named sovereign-llamacpp-{model_id}.
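The naming and labelling convention can be applied to a container listing as in the following sketch. The helper names are hypothetical, and filtering on the exact state string "running" is an assumption based on the example state values.

```python
MODEL_ID_LABEL = "sovereign-engine.model-id"

def container_name(model_id: str) -> str:
    """Build the container name that corresponds to a model ID."""
    return f"sovereign-llamacpp-{model_id}"

def model_id_of(container: dict):
    """Read the model ID from a container entry's labels, or None if absent."""
    return container.get("labels", {}).get(MODEL_ID_LABEL)

def running_model_ids(listing: dict) -> list:
    """Collect model IDs of all running containers in a list response."""
    return [
        model_id_of(c)
        for c in listing["containers"]
        if c["state"] == "running" and model_id_of(c) is not None
    ]

listing = {
    "containers": [
        {"id": "1", "names": [container_name("m1")], "state": "running",
         "status": "Up 2 hours", "labels": {MODEL_ID_LABEL: "m1"}},
        {"id": "2", "names": [container_name("m2")], "state": "exited",
         "status": "Exited (0)", "labels": {MODEL_ID_LABEL: "m2"}},
    ]
}
```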
Start a backend container for a model.
Request:
{
"model_id": "string",
"gpu_type": "rocm | cuda | none",
"gpu_layers": 99,
"context_size": 4096,
"parallel": 1
}
Backend containers are attached to the internal Docker network (sovereign-internal) and are not exposed on any host port. The proxy reaches them by container name.
Response 200:
{
"container": "sovereign-llamacpp-<model_id>",
"url": "http://sovereign-llamacpp-<model_id>:8080"
}
Stop and remove a backend container.
Request:
{
"model_id": "string"
}
Response 200:
{ "status": "stopped" }
Response 200:
{
"users": [
{
"id": "string",
"idp_id": "string",
"email": "string | null",
"display_name": "string | null",
"is_admin": false,
"created_at": "string",
"usage_summary": {
"total_requests": 0,
"total_tokens": 0
}
}
]
}
Update user (toggle admin, etc).
Request:
{
"is_admin": true
}
Response 200:
{ "status": "updated" }
System overview: disk, queue depth, container health.
Response 200:
{
"disk": {
"model_path": "/models",
"total_bytes": 0,
"used_bytes": 0,
"free_bytes": 0
},
"queues": {
"category_name": { "depth": 0, "avg_wait_ms": 0 }
},
"containers": [
{
"model_id": "string",
"healthy": true,
"uptime_seconds": 0
}
]
}
Response 200:
{
"mappings": [
{
"id": "string",
"idp_id": "string",
"group_claim": "string",
"group_value": "string",
"category_id": "string"
}
]
}
Request:
{
"idp_id": "string",
"group_claim": "string",
"group_value": "string",
"category_id": "string"
}
Response 201:
{ "id": "string" }
Response 200:
{ "status": "deleted" }
These follow the OpenAI API specification.
List loaded models.
Response 200:
{
"object": "list",
"data": [
{
"id": "string",
"object": "model",
"owned_by": "sovereign-engine"
}
]
}
Chat completion. Body is passed through to the llama.cpp backend.
Request: Standard OpenAI ChatCompletion request. The model field can be:
- A model category name (e.g. "thinking"): resolved to the category's preferred model
- A specific model ID: used directly
Response 200: Standard OpenAI ChatCompletion response (or SSE stream if stream: true).
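The routing of the model field can be sketched as a two-step lookup. The order of the checks (category names before model IDs) and the behaviour when nothing matches are assumptions, since the document does not specify them; the function name is hypothetical.

```python
def resolve_model(requested: str, categories: list, model_ids: set):
    """Map the request's model field to a concrete model ID, or None.

    categories: entries shaped like the category list response.
    model_ids: IDs of registered models.
    """
    # Assumption: category names take precedence over model IDs.
    for category in categories:
        if category["name"] == requested:
            return category["preferred_model_id"]  # may be None if unset
    if requested in model_ids:
        return requested
    return None

categories = [{"id": "c1", "name": "thinking", "preferred_model_id": "model-a"}]
model_ids = {"model-a", "model-b"}
```

With this data, "thinking" resolves to "model-a" and "model-b" resolves to itself.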
Text completion. Same routing logic as chat completions.
Search HuggingFace models.
Response 200:
{
"models": [
{
"id": "org/model-name",
"downloads": 0,
"likes": 0,
"pipeline_tag": "text-generation",
"tags": ["string"]
}
]
}
Start downloading a model from HuggingFace.
Request:
{
"hf_repo": "string",
"category_id": "string | null"
}
Response 202:
{
"download_id": "string",
"status": "started"
}
List active/recent downloads.
Response 200:
{
"downloads": [
{
"id": "string",
"hf_repo": "string",
"progress_bytes": 0,
"total_bytes": 0,
"status": "downloading | complete | failed",
"error": "string | null"
}
]
}
Cancel an active download.
Response 200:
{ "status": "cancelled" }
All errors follow this structure:
{
"error": {
"message": "Human-readable description",
"type": "invalid_request_error | server_error | auth_error",
"code": "machine_readable_code"
}
}
For non-OpenAI routes, a simplified form is also acceptable:
{
"error": "Human-readable description"
}
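Because two error shapes are possible, client code needs to handle both. The following Python sketch normalizes either form into a (message, type, code) tuple; the function name and the choice of returning None for bodies without an error field are assumptions for illustration.

```python
def parse_error(body: dict):
    """Normalize an error body to (message, type, code), or None if it is not an error."""
    err = body.get("error")
    if isinstance(err, dict):
        # Structured OpenAI-style form.
        return err.get("message", ""), err.get("type"), err.get("code")
    if isinstance(err, str):
        # Simplified form used by non-OpenAI routes.
        return err, None, None
    return None

structured = parse_error({
    "error": {
        "message": "Human-readable description",
        "type": "invalid_request_error",
        "code": "machine_readable_code",
    }
})
simple = parse_error({"error": "Human-readable description"})
```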