| type | reference |
|---|---|
| title | API Documentation |
| created | 2026-02-02 |
| tags | |
| related | |

Base URL: http://localhost:8080
Currently, the API does not require authentication. Agent endpoints use session-specific tokens passed in the request body.
For configuration details, see [[CONFIGURATION]]. For usage examples, see [[WORKFLOWS]].
Health check endpoint. Returns 503 while the startup sweep is in progress.
Response (200 OK)
{
"status": "ok",
"timestamp": "2026-01-29T12:00:00Z",
"services": {
"lifecycle": "running",
"inventory": "ok",
"ready": "true"
}
}
Response (503 Service Unavailable - during startup)
{
"status": "unavailable",
"timestamp": "2026-01-29T12:00:00Z",
"services": {
"lifecycle": "stopped",
"inventory": "ok",
"ready": "false"
}
}
Simple readiness check endpoint.
Response (200 OK)
{
"ready": true,
"timestamp": "2026-01-29T12:00:00Z"
}
Response (503 Service Unavailable)
{
"ready": false,
"timestamp": "2026-01-29T12:00:00Z"
}
Prometheus metrics endpoint. Returns metrics in Prometheus text format.
Key metrics:
- gpu_sessions_active{provider,status} - Active session count
- gpu_orphans_detected_total - Orphaned instances detected
- gpu_destroy_failures_total - Failed destruction attempts
- gpu_ssh_verify_duration_seconds - SSH verification duration
- gpu_ssh_verify_failures_total - SSH verification failures
- gpu_provider_api_errors_total{provider,operation} - Provider API errors
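As a rough illustration of consuming these metrics, the sketch below sums each metric across its label sets from a block of Prometheus text. The sample text and its values are made up for the example, and the parser is deliberately minimal: it assumes label values never contain a closing brace and ignores any trailing timestamp.

```python
import re
from collections import defaultdict

# Matches "name{labels} value" or "name value"; anything after the value is ignored.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def sum_metrics(text):
    """Sum every sample of each metric across all of its label sets."""
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match:
            totals[match.group(1)] += float(match.group(3))
    return dict(totals)


# Illustrative scrape output; the numbers are invented for this example.
sample = """\
# TYPE gpu_sessions_active gauge
gpu_sessions_active{provider="vastai",status="running"} 3
gpu_sessions_active{provider="tensordock",status="running"} 1
gpu_orphans_detected_total 2
"""
totals = sum_metrics(sample)
```

For anything beyond a quick check, a full Prometheus client or scraper is likely a better fit than a hand-rolled parser.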
List available GPU offers from all providers.
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| provider | string | Filter by provider ("vastai", "tensordock") |
| gpu_type | string | Filter by GPU type (e.g., "RTX 4090", "A100") |
| location | string | Filter by location |
| min_vram | int | Minimum VRAM in GB |
| max_price | float | Maximum price per hour in USD |
| min_gpu_count | int | Minimum number of GPUs |
| gpu_count | int | Alias for min_gpu_count |
| min_reliability | float | Minimum reliability score (0-1) |
| min_availability_confidence | float | Minimum availability confidence (0-1) |
| min_cuda | float | Minimum CUDA version (e.g., 12.9). Vast.ai only. |
| template_hash_id | string | Filter to offers compatible with this Vast.ai template. Auto-applies the template's extra_filters (CUDA version, VRAM, etc). |
| limit | int | Maximum number of results (must be positive) |
| offset | int | Number of results to skip (for pagination) |
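The filters above can be combined freely. The following sketch encodes them as a query string and computes page offsets for pagination. The endpoint path is not shown in this section, so only the query string is built. The helper names are hypothetical, and treating the response's total field as the number of matching offers is an assumption.

```python
from urllib.parse import urlencode


def build_offer_query(**filters):
    """Encode offer filters as a query string, dropping unset (None) values."""
    params = {key: value for key, value in filters.items() if value is not None}
    limit = params.get("limit")
    if limit is not None and limit <= 0:
        # The parameter table states that limit must be positive.
        raise ValueError("limit must be positive")
    return urlencode(params)


def page_offsets(total, limit):
    """Offsets needed to walk `total` results in pages of `limit` (assumed semantics)."""
    return list(range(0, total, limit))


query = build_offer_query(
    gpu_type="RTX 4090", max_price=0.5, min_vram=24, limit=50, offset=0, provider=None
)
offsets = page_offsets(150, 50)
```

Append the resulting string after a `?` on the offers endpoint URL.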
Response
{
"offers": [
{
"id": "vastai-12345",
"provider": "vastai",
"provider_id": "12345",
"gpu_type": "RTX 4090",
"gpu_count": 1,
"vram_gb": 24,
"price_per_hour": 0.45,
"location": "US-West",
"reliability": 0.98,
"available": true,
"max_duration_hours": 0,
"fetched_at": "2026-01-29T12:00:00Z",
"cuda_version": 13.0
}
],
"count": 1,
"total": 150
}
Get a specific offer by ID.
Response
{
"id": "vastai-12345",
"provider": "vastai",
"provider_id": "12345",
"gpu_type": "RTX 4090",
"gpu_count": 1,
"vram_gb": 24,
"price_per_hour": 0.45,
"location": "US-West",
"reliability": 0.98,
"available": true,
"max_duration_hours": 0,
"fetched_at": "2026-01-29T12:00:00Z"
}
Errors
- 404 Not Found - Offer not found
Get templates compatible with a specific offer (Vast.ai only).
Response
{
"offer_id": "vastai-12345",
"compatible_templates": [
{
"hash_id": "a8a44c7363cbca20056020397e3bf072",
"name": "Ollama",
"image": "vastai/ollama",
"recommended_disk_space": 32
}
],
"count": 1
}
Templates are pre-configured Docker images with optimized settings for specific workloads. They simplify provisioning by bundling image, environment variables, and startup commands.
List available templates.
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| name | string | Filter by template name (case-insensitive partial match) |
Response
{
"templates": [
{
"id": 338630,
"hash_id": "a8a44c7363cbca20056020397e3bf072",
"name": "Ollama",
"image": "vastai/ollama",
"tag": "0.15.4",
"env": "-p 1111:1111 -p 21434:21434 -e OLLAMA_MODEL=qwen3:14b ...",
"onstart": "entrypoint.sh",
"runtype": "jupyter",
"use_ssh": true,
"ssh_direct": true,
"recommended": true,
"recommended_disk_space": 32,
"extra_filters": "{\"cpu_arch\": {\"in\": [\"arm64\", \"amd64\"]}}",
"creator_id": 62897,
"created_at": "2026-02-02T13:27:28-05:00",
"count_created": 302
}
],
"count": 1
}
Get a specific template by hash ID.
Response
{
"id": 338630,
"hash_id": "a8a44c7363cbca20056020397e3bf072",
"name": "Ollama",
"image": "vastai/ollama",
"tag": "0.15.4",
"runtype": "jupyter",
"use_ssh": true,
"recommended": true,
"recommended_disk_space": 32
}
Errors
- 404 Not Found - Template not found
Create a new GPU session.
Request Body
{
"consumer_id": "my-application",
"offer_id": "vastai-12345",
"workload_type": "llm",
"reservation_hours": 2,
"idle_threshold_minutes": 30,
"storage_policy": "destroy",
"disk_gb": 100,
"template_hash_id": "a8a44c7363cbca20056020397e3bf072"
}

| Field | Type | Required | Description |
|---|---|---|---|
| consumer_id | string | Yes | Identifier for the consumer/application |
| offer_id | string | Yes | ID of the GPU offer to provision |
| workload_type | string | Yes | "llm", "training", or "batch" |
| reservation_hours | int | Yes | Duration in hours (1-12) |
| idle_threshold_minutes | int | No | Auto-shutdown after idle time (0 = disabled) |
| storage_policy | string | No | "preserve" or "destroy" (default: "destroy") |
| launch_mode | string | No | "ssh" or "entrypoint" (default: "ssh") |
| docker_image | string | No | Custom Docker image (for entrypoint mode) |
| model_id | string | No | HuggingFace model ID (for vLLM/TGI workloads) |
| exposed_ports | array | No | Ports to expose (e.g., [8000]) |
| quantization | string | No | Quantization method (e.g., "awq", "gptq") |
| disk_gb | int | No | Disk space in GB (default: 50). Cannot be changed after instance creation. |
| template_hash_id | string | No | Vast.ai template hash ID. When provided, uses the template's image, env vars, and startup commands. SSH access is always enabled. |
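A client can catch most 400 responses before sending the request by checking the body against the field table. The sketch below is one way to do that. The function name and the error wording are hypothetical, and the server's actual validation may be stricter or differ in detail.

```python
WORKLOAD_TYPES = {"llm", "training", "batch"}
STORAGE_POLICIES = {"preserve", "destroy"}
LAUNCH_MODES = {"ssh", "entrypoint"}
REQUIRED_FIELDS = ("consumer_id", "offer_id", "workload_type", "reservation_hours")


def validate_session_request(body):
    """Return a list of problems; an empty list means the body looks valid."""
    problems = [
        f"missing required field: {name}" for name in REQUIRED_FIELDS if name not in body
    ]
    if "workload_type" in body and body["workload_type"] not in WORKLOAD_TYPES:
        problems.append("workload_type must be llm, training, or batch")
    hours = body.get("reservation_hours")
    if hours is not None and not (isinstance(hours, int) and 1 <= hours <= 12):
        problems.append("reservation_hours must be an integer from 1 to 12")
    # Optional fields fall back to the documented defaults when absent.
    if body.get("storage_policy", "destroy") not in STORAGE_POLICIES:
        problems.append("storage_policy must be preserve or destroy")
    if body.get("launch_mode", "ssh") not in LAUNCH_MODES:
        problems.append("launch_mode must be ssh or entrypoint")
    return problems


request_body = {
    "consumer_id": "my-application",
    "offer_id": "vastai-12345",
    "workload_type": "llm",
    "reservation_hours": 2,
}
problems = validate_session_request(request_body)
```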
Response (201 Created)
{
"session": {
"id": "sess-abc123",
"consumer_id": "my-application",
"provider": "vastai",
"gpu_type": "RTX 4090",
"gpu_count": 1,
"status": "provisioning",
"ssh_host": "192.168.1.100",
"ssh_port": 22,
"ssh_user": "root",
"workload_type": "llm",
"reservation_hours": 2,
"price_per_hour": 0.45,
"disk_gb": 100,
"created_at": "2026-01-29T12:00:00Z",
"expires_at": "2026-01-29T14:00:00Z"
},
"ssh_private_key": "-----BEGIN RSA PRIVATE KEY-----\n..."
}
Note: ssh_private_key is only returned once at creation. Poll the session status until it transitions to "running" (SSH verification complete) before connecting.
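The polling step in the note above might look like the following sketch. The HTTP call is left to the caller: fetch_status is a hypothetical callable that returns the session's current status string. The timeout and interval defaults are arbitrary, and treating "failed", "stopping", and "stopped" as dead ends is an assumption based on the status descriptions.

```python
import time

# Statuses assumed never to lead to "running".
DEAD_STATUSES = {"failed", "stopping", "stopped"}


def wait_until_running(fetch_status, timeout_s=600.0, interval_s=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Block until fetch_status() returns "running", or raise on failure or timeout."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status == "running":
            return
        if status in DEAD_STATUSES:
            raise RuntimeError(f"session ended in status {status!r}")
        if clock() >= deadline:
            raise TimeoutError("session did not reach 'running' in time")
        sleep(interval_s)


# Usage with a stand-in for the real status lookup.
statuses = iter(["pending", "provisioning", "running"])
naps = []
wait_until_running(lambda: next(statuses), sleep=naps.append)
```

Injecting sleep and clock keeps the loop easy to test without real delays.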
Disk Allocation Notes:
- Default disk size is 50GB if disk_gb is not specified
- Disk size cannot be changed after instance creation (Vast.ai limitation)
- For large models, allocate sufficient disk space (e.g., DeepSeek-V2.5 236B requires ~132GB)
- Vast.ai templates include a recommended_disk_space field that can guide allocation
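Because disk size is fixed at creation, it can help to compute disk_gb up front. The sketch below takes the largest of the 50GB default, the template's recommended_disk_space, and the model size plus headroom. The function name and the 1.2 headroom factor are hypothetical, and it assumes recommended_disk_space is expressed in GB.

```python
import math

DEFAULT_DISK_GB = 50  # documented default when disk_gb is omitted


def choose_disk_gb(template=None, model_size_gb=0.0, headroom=1.2):
    """Pick a disk_gb value that covers the template hint and the model files."""
    recommended = (template or {}).get("recommended_disk_space", 0)
    needed = math.ceil(model_size_gb * headroom)  # headroom factor is an assumption
    return max(DEFAULT_DISK_GB, recommended, needed)


ollama_template = {"name": "Ollama", "recommended_disk_space": 32}
small = choose_disk_gb(ollama_template)
large = choose_disk_gb(ollama_template, model_size_gb=132)
```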
List sessions.
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| consumer_id | string | Filter by consumer |
| status | string | Filter by status |
| provider | string | Filter by provider ("vastai", "tensordock") |
| limit | int | Maximum results |
Response
{
"sessions": [...],
"count": 5
}
Get session details.
Response
{
"id": "sess-abc123",
"consumer_id": "my-application",
"provider": "vastai",
"gpu_type": "RTX 4090",
"gpu_count": 1,
"status": "running",
"ssh_host": "192.168.1.100",
"ssh_port": 22,
"ssh_user": "root",
"workload_type": "llm",
"reservation_hours": 2,
"price_per_hour": 0.45,
"created_at": "2026-01-29T12:00:00Z",
"expires_at": "2026-01-29T14:00:00Z"
}
Session Status Values
| Status | Description |
|---|---|
| pending | Session created, not yet provisioned |
| provisioning | Provider instance being created, awaiting SSH verification |
| running | Instance running and SSH verified |
| stopping | Destruction in progress |
| stopped | Successfully terminated |
| failed | Failed to provision or crashed |
Signal that work is complete and the session can be terminated.
Response
{
"message": "session shutdown initiated",
"session_id": "sess-abc123"
}
Extend a session's reservation time.
Request Body
{
"additional_hours": 2
}
Response
{
"message": "session extended",
"session_id": "sess-abc123",
"new_expires_at": "2026-01-29T16:00:00Z"
}
Force destroy a session immediately.
Response
{
"message": "session destroyed",
"session_id": "sess-abc123"
}
Errors
- 404 Not Found - Session not found
Get diagnostic information for a running session.
Response (200 OK)
{
"session_id": "sess-abc123",
"status": "running",
"provider": "vastai",
"gpu_type": "RTX 4090",
"gpu_count": 1,
"ssh_host": "192.168.1.100",
"ssh_port": 22,
"ssh_user": "root",
"launch_mode": "ssh",
"api_endpoint": "",
"created_at": "2026-01-29T12:00:00Z",
"expires_at": "2026-01-29T14:00:00Z",
"uptime": "1h 30m",
"time_to_expiry": "30m",
"ssh_available": true,
"note": "Full SSH diagnostics require client-side SSH access."
}
Errors
- 400 Bad Request - Session is not running
- 404 Not Found - Session not found
Get cost information.
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| consumer_id | string | Filter by consumer |
| session_id | string | Get cost for specific session |
| start_date | string | Start date (YYYY-MM-DD) |
| end_date | string | End date (YYYY-MM-DD) |
| period | string | "daily" or "monthly" |
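A small helper can keep the date format and the period values consistent with the table above. In this sketch, dates are passed as datetime.date objects so that the YYYY-MM-DD form comes from isoformat(). The function name is hypothetical, the start/end ordering check is an added assumption, and the endpoint path is not shown in this section, so only the query string is built.

```python
from datetime import date
from urllib.parse import urlencode


def build_cost_query(consumer_id=None, session_id=None,
                     start_date=None, end_date=None, period=None):
    """Encode cost filters as a query string; dates are datetime.date objects."""
    if period is not None and period not in ("daily", "monthly"):
        raise ValueError("period must be 'daily' or 'monthly'")
    if start_date and end_date and start_date > end_date:
        raise ValueError("start_date must not be after end_date")  # assumed rule
    params = {
        "consumer_id": consumer_id,
        "session_id": session_id,
        "start_date": start_date.isoformat() if start_date else None,
        "end_date": end_date.isoformat() if end_date else None,
        "period": period,
    }
    return urlencode({key: value for key, value in params.items() if value is not None})


cost_query = build_cost_query(
    consumer_id="my-application",
    start_date=date(2026, 1, 1),
    end_date=date(2026, 1, 31),
    period="daily",
)
```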
Response (session_id provided)
{
"session_id": "sess-abc123",
"total_cost": 1.35,
"currency": "USD"
}
Response (summary)
{
"consumer_id": "my-application",
"total_cost": 45.67,
"session_count": 12,
"hours_used": 98.5,
"by_provider": {
"vastai": 30.00,
"tensordock": 15.67
},
"by_gpu_type": {
"RTX 4090": 25.00,
"A100": 20.67
}
}
Get monthly cost summary.
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| consumer_id | string | Filter by consumer (optional) |
Response
{
"consumer_id": "",
"total_cost": 450.00,
"session_count": 89,
"hours_used": 1024.5,
"by_provider": {
"vastai": 300.00,
"tensordock": 150.00
},
"by_gpu_type": {
"RTX 4090": 200.00,
"A100": 250.00
}
}
All errors follow this format:
{
"error": "error message description",
"request_id": "uuid-of-request"
}
Common HTTP status codes:
- 400 Bad Request - Invalid request body or parameters
- 401 Unauthorized - Invalid authentication
- 404 Not Found - Resource not found
- 409 Conflict - Operation conflicts with current state (e.g., extending a stopped session)
- 500 Internal Server Error - Server error
- 503 Service Unavailable - Server not ready or offer no longer available (stale inventory)
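Since every error shares the same envelope, a client can turn any non-2xx response into a single exception type. The sketch below is one possible shape. ApiError and parse_error are hypothetical names, and the fallback message for a non-JSON body is an assumption about how a client might want to degrade.

```python
import json


class ApiError(Exception):
    """Client-side wrapper for the API's error envelope (hypothetical type)."""

    def __init__(self, status, message, request_id=None, error_type=None):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message
        self.request_id = request_id
        self.error_type = error_type


def parse_error(status, body):
    """Build an ApiError from an HTTP status code and the raw response body."""
    try:
        data = json.loads(body)
    except ValueError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return ApiError(
        status,
        data.get("error", "unknown error"),
        request_id=data.get("request_id"),
        error_type=data.get("error_type"),  # only present on structured errors
    )


err = parse_error(404, '{"error": "session not found", "request_id": "uuid-of-request"}')
```

Logging request_id alongside the message makes it easier to correlate a client failure with server logs.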
When provisioning fails due to stale inventory data (offer no longer available), the API returns a structured error:
Response (503 Service Unavailable)
{
"error": "offer unavailable due to stale inventory",
"error_type": "stale_inventory",
"offer_id": "vastai-12345",
"provider": "vastai",
"retry_suggested": true,
"message": "The selected offer is no longer available. Please refresh inventory and select a different offer.",
"request_id": "uuid-of-request"
}
When you receive this error:
- Refresh the inventory list
- Select a different offer
- Retry the provision request
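The three steps above can be sketched as a retry loop. The HTTP layer is left to the caller: list_offers and create_session are hypothetical callables, and StaleInventory is a hypothetical exception the caller raises when a response carries error_type "stale_inventory". Picking the cheapest untried offer and capping at three attempts are assumptions made for the example.

```python
class StaleInventory(Exception):
    """Raised by the caller's HTTP layer on error_type "stale_inventory"."""


def provision_with_refresh(list_offers, create_session, max_attempts=3):
    """Refresh inventory, pick an untried offer, and retry on stale inventory."""
    tried = set()
    for _ in range(max_attempts):
        # Step 1: refresh the inventory list on every attempt.
        offers = [offer for offer in list_offers() if offer["id"] not in tried]
        if not offers:
            break
        # Step 2: select a different offer (cheapest untried one, by assumption).
        offer = min(offers, key=lambda o: o["price_per_hour"])
        tried.add(offer["id"])
        try:
            # Step 3: retry the provision request.
            return create_session(offer["id"])
        except StaleInventory:
            continue
    raise RuntimeError("no available offer could be provisioned")


# Usage with stand-ins for the real API calls.
inventory = [
    {"id": "vastai-12345", "price_per_hour": 0.45},
    {"id": "vastai-67890", "price_per_hour": 0.50},
]
attempts = []


def fake_create(offer_id):
    attempts.append(offer_id)
    if offer_id == "vastai-12345":
        raise StaleInventory(offer_id)
    return {"id": "sess-abc123", "offer_id": offer_id}


session = provision_with_refresh(lambda: inventory, fake_create)
```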
- [[CONFIGURATION]] - Environment variables and configuration options
- [[WORKFLOWS]] - Common usage patterns and examples
- [[PROVIDERS]] - Provider-specific information
- [[TROUBLESHOOTING]] - Common issues and solutions