API Endpoints
All /v1/* and /proxy/* endpoints require Authorization: Bearer <client_token> or Bearer <api_key_token>.
API keys support RBAC — roles (admin/user/read-only), agent scoping, and endpoint restrictions. See Secrets for full RBAC configuration.
| Endpoint | Auth | Description |
|---|---|---|
| POST /v1/chat/completions | Bearer | OpenAI-compatible chat with SSE streaming, agent fallback, MCP multi-turn tool loop |
| POST /v1/messages | Bearer | Anthropic Messages API with bidirectional OpenAI↔Anthropic format conversion. Non-streaming responses correctly emit choices[].message (not choices[].delta) for Anthropic-to-OpenAI conversion. |
| GET /v1/models | Bearer | Live model catalog from discovered providers + static registry (includes context window, max tokens, capabilities, pricing) |
| POST /v1/embeddings | Bearer | Passthrough proxy with token counting and rate limiting |
| POST /v1/responses | Bearer | Passthrough proxy with URL resolution, path traversal hardening, and retry logic |
| POST /v1/images/generations | Bearer | Image generation (OpenAI-compatible) |
| POST /v1/audio/transcriptions | Bearer | Audio transcription (multipart form-data support) |
| POST /v1/audio/speech | Bearer | Text-to-speech synthesis |
| POST /v1/moderations | Bearer | Content moderation |
| POST /v1/rerank | Bearer | Re-ranking (Cohere/Jina-compatible) |
| POST /v1/a2a | Bearer | Agent-to-Agent protocol (Google A2A) |
| GET /v1/files | Bearer | File listing |
| POST /v1/files | Bearer | File upload |
| DELETE /v1/files | Bearer | File deletion |
| POST/GET /v1/batches | Bearer | Batch API operations |
| POST /proxy/{provider}/* | Bearer | Arbitrary provider endpoint passthrough (all HTTP methods, SSE streaming auto-detect) |
| GET /healthz | None | Engine health probe (returns OK if gateway is running) |
| GET /statsz | None | Token usage per model, circuit breaker states, MCP server status |
| GET /metrics | None | Prometheus-compatible metrics |
| GET /debug/pprof | Bearer | Go pprof profiling (CPU, memory, goroutines). Requires debug.pprof_enabled: true in config |
POST /v1/images/generations
Generate images from text prompts. Proxied to the upstream provider's /v1/images/generations endpoint.
Default provider: openai
Example:
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"prompt":"A beautiful sunset","n":1,"size":"1024x1024"}' \
http://localhost:8080/v1/images/generations
POST /v1/audio/transcriptions
Transcribe audio files to text. Sends multipart/form-data to the upstream provider's /v1/audio/transcriptions endpoint. The original Content-Type (including boundary) is preserved.
Default provider: openai
Example:
curl -X POST -H "Authorization: Bearer $TOKEN" \
-F "file=@audio.mp3" \
-F "model=whisper-1" \
http://localhost:8080/v1/audio/transcriptions
POST /v1/audio/speech
Generate audio from text. Proxied to the upstream provider's /v1/audio/speech endpoint. Returns the audio stream (e.g., audio/mpeg) directly.
Default provider: openai
Example:
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"model":"tts-1","input":"Hello world","voice":"alloy"}' \
http://localhost:8080/v1/audio/speech -o speech.mp3
POST /v1/moderations
Classify text for potentially harmful content. Proxied to the upstream provider's /v1/moderations endpoint.
Default provider: openai
Example:
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"input":"I want to harm someone"}' \
http://localhost:8080/v1/moderations
POST /v1/rerank
Re-rank documents by relevance to a query. Proxied to the upstream provider's /v1/rerank endpoint.
Default provider: cohere (falls back to any available provider)
Example:
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model":"rerank-english-v2.0",
"query":"What is the capital of France?",
"documents":["Paris is the capital.","The Eiffel Tower is in Paris."]
}' \
http://localhost:8080/v1/rerank
POST /v1/messages
Anthropic Messages API endpoint with bidirectional format conversion between OpenAI and Anthropic wire formats. Supports Anthropic-native clients directly. Proxied to the upstream provider's /v1/messages endpoint.
Default provider: anthropic (falls back to any available provider)
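The Anthropic-to-OpenAI direction of the conversion can be sketched as below. This is a minimal illustration under stated assumptions, not the gateway's actual code: the function name `anthropic_to_openai` and the `STOP_REASONS` table are hypothetical, only text content blocks are handled, and unknown stop reasons are mapped to `stop`.

```python
# Hypothetical sketch: map a non-streaming Anthropic Messages response onto
# the OpenAI chat.completion shape. Not the gateway's actual implementation.

# Assumed mapping from Anthropic stop_reason to OpenAI finish_reason.
STOP_REASONS = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}


def anthropic_to_openai(resp: dict) -> dict:
    # Concatenate text blocks; other block types are ignored in this sketch.
    text = "".join(
        block.get("text", "")
        for block in resp.get("content", [])
        if block.get("type") == "text"
    )
    usage = resp.get("usage", {})
    prompt_tokens = usage.get("input_tokens", 0)
    completion_tokens = usage.get("output_tokens", 0)
    return {
        "id": resp.get("id"),
        "object": "chat.completion",
        "model": resp.get("model"),
        "choices": [
            {
                "index": 0,
                # Non-streaming responses carry "message", not "delta".
                "message": {"role": "assistant", "content": text},
                "finish_reason": STOP_REASONS.get(resp.get("stop_reason"), "stop"),
            }
        ],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }
```

The key point is the `message` key in each choice: a `delta` key belongs only in streamed chunks.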
Example:
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model":"claude-3-5-sonnet-20241022",
"max_tokens":1024,
"messages":[{"role":"user","content":"Hello"}]
}' \
http://localhost:8080/v1/messages
POST /v1/files
GET /v1/files
GET /v1/files/{file_id}
DELETE /v1/files/{file_id}
File management operations for uploading, listing, retrieving, and deleting files. Proxied to the upstream provider's files endpoints.
Default provider: openai (falls back to any available provider)
Example (upload):
curl -X POST -H "Authorization: Bearer $TOKEN" \
-F "file=@document.pdf" \
-F "purpose=assistants" \
http://localhost:8080/v1/files
POST /v1/batches
Batch API operations for processing multiple requests asynchronously. Proxied to the upstream provider's /v1/batches endpoint.
Default provider: openai (falls back to any available provider)
Example:
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"input_file_id":"file-abc123",
"endpoint":"/v1/chat/completions",
"completion_window":"24h"
}' \
http://localhost:8080/v1/batches
POST /v1/responses
Responses API passthrough endpoint. Proxied to the upstream provider's /v1/responses endpoint.
Default provider: openai (falls back to any available provider)
Example:
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"input":"What is the capital of France?",
"model":"gpt-4o"
}' \
http://localhost:8080/v1/responses
POST /v1/a2a
Agent-to-Agent communication protocol (Google A2A). Proxied to the upstream provider's /v1/a2a endpoint.
Default provider: gemini (falls back to any available provider)
Example:
curl -X POST -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"agent_id":"agent-123","message":"Hello from agent A"}' \
http://localhost:8080/v1/a2a
POST /v1/chat/completions supports both stream: false (default, per OpenAI spec) and stream: true.
When stream: false, the gateway buffers the upstream response into a complete JSON object before returning. Response routing uses the upstream Content-Type header (text/event-stream → streaming path, otherwise → non-streaming) rather than the request's stream flag. All pipeline features (redaction, routing, circuit breaker, MCP loop) work the same way.
Note: If the client omits the stream field, the gateway defaults to stream: false per the OpenAI API specification.
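The Content-Type based routing decision described above might look roughly like this. It is a sketch only: the helper name `is_streaming_response` is hypothetical, and ignoring media-type parameters such as `charset` is an assumption rather than documented gateway behavior.

```python
from typing import Optional


def is_streaming_response(content_type: Optional[str]) -> bool:
    """Return True when the upstream Content-Type is text/event-stream.

    Hypothetical helper: the decision uses the upstream response header,
    not the "stream" flag from the client's request body.
    """
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/event-stream"
```

Keying on the response header means a provider that ignores `stream: true` and returns plain JSON is still handled on the non-streaming path.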
The /proxy/{provider}/* endpoint routes to any provider endpoint:
POST /proxy/anthropic/v1/messages
GET /proxy/gemini/v1/models
POST /proxy/openai/v1/files
See Passthrough Proxy for details.
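To illustrate how a /proxy/{provider}/* path breaks down into a provider name and an upstream path, here is a small sketch. The function `split_proxy_path` is hypothetical and does not reflect the gateway's internals; it only shows the mapping implied by the examples above.

```python
from typing import Tuple

PROXY_PREFIX = "/proxy/"


def split_proxy_path(path: str) -> Tuple[str, str]:
    """Split '/proxy/{provider}/rest' into (provider, '/rest').

    Hypothetical helper for illustration only.
    """
    if not path.startswith(PROXY_PREFIX):
        raise ValueError(f"not a proxy path: {path!r}")
    provider, sep, rest = path[len(PROXY_PREFIX):].partition("/")
    if not provider or not sep:
        raise ValueError(f"missing provider or upstream path: {path!r}")
    return provider, "/" + rest
```

For example, `split_proxy_path("/proxy/anthropic/v1/messages")` yields the provider `anthropic` and the upstream path `/v1/messages`.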
For each endpoint, the gateway selects a provider in this order:
- Preferred provider (named above) — if configured and has an API key
- Any configured provider with an API key (first found)
To configure a specific provider for an extension endpoint, add it to your config's providers section with the desired name.
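The two-step selection order can be sketched as follows. This is an assumption-laden illustration: the `api_key` field name and the `select_provider` function are hypothetical, and "first found" is interpreted here as configuration order.

```python
from typing import Dict, Optional


def select_provider(preferred: str, providers: Dict[str, dict]) -> Optional[str]:
    """Pick a provider name for an endpoint, or None if none is usable.

    Hypothetical sketch; "api_key" is an assumed config field name.
    """
    # 1. Preferred provider, if it is configured and has an API key.
    if providers.get(preferred, {}).get("api_key"):
        return preferred
    # 2. Otherwise the first configured provider with an API key
    #    (dicts preserve insertion order, so "first" means config order here).
    for name, cfg in providers.items():
        if cfg.get("api_key"):
            return name
    return None
```

With `{"gemini": {"api_key": "g"}, "openai": {"api_key": "o"}}` configured, a request to /v1/rerank (preferred provider cohere) would fall back to `gemini` in this sketch, since cohere is not configured.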
Providers can override the upstream URL for any extension endpoint using the format_urls map in the provider configuration:
{
"providers": {
"my-openai": {
"url": "https://api.my-openai.com/v1/chat/completions",
"format_urls": {
"images/generations": "https://images.my-openai.com/v1/images/generations",
"moderations": "https://moderation.my-openai.com/v1/moderations"
}
}
}
}
Simple health check for load balancers and orchestration:
curl http://localhost:8080/healthz
# OK
Usage statistics and system state:
curl http://localhost:8080/statsz
Returns: per-model request/error/token counters, circuit breaker states, MCP server status, latency data.
Prometheus-compatible metrics for monitoring:
curl http://localhost:8080/metrics
Includes: request counts, token usage, latency histograms, circuit breaker states, rate limiter status, overflow guard triggers, MCP active goroutines.
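For quick ad-hoc inspection without a Prometheus server, the text exposition format can be parsed in a few lines. The sketch below is illustrative only: the metric names in `SAMPLE` are hypothetical (the actual names exposed by the gateway are not listed here), and the parser handles only the simple `name{labels} value` form.

```python
from typing import Dict

# Hypothetical sample; real metric names may differ.
SAMPLE = """\
# HELP gateway_requests_total Total requests.
# TYPE gateway_requests_total counter
gateway_requests_total{model="gpt-4o"} 12
gateway_up 1
"""


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse Prometheus text format into {series: value}."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Keep the label set attached to the metric name.
            cut = line.rindex("}") + 1
            series, rest = line[:cut], line[cut:].split()
        else:
            series, *rest = line.split()
        # The first field after the series is the value; a trailing
        # timestamp, if present, is ignored.
        samples[series] = float(rest[0])
    return samples
```

In practice the input would be the body returned by the curl command above rather than the inline sample.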
Go pprof profiling endpoints for performance analysis:
curl http://localhost:8080/debug/pprof/
Available profiles:
- heap - Memory heap sampling
- goroutine - Goroutine stack traces
- profile - CPU profiling (30s by default)
- block - Blocking operations
- mutex - Contention analysis
Enable in config:
{
"debug": {
"pprof_enabled": true
}
}
Then use go tool pprof:
go tool pprof http://localhost:8080/debug/pprof/profile
Security: Requires an Authorization: Bearer <token> header, like all /v1/* endpoints. Enable in production only with proper access controls in place.
All endpoints enforce role-based access control when using API keys:
- Roles: admin (full access), user (configured agents + non-admin endpoints), read-only (GET only)
- Agent scoping: Restrict access to specific agents via the allowed_agents list
- Endpoint allowlists: Fine-grained access control via the allowed_endpoints list
- Expiration: Keys can have expiration dates
- Enable/disable: Keys can be enabled or disabled
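One way to picture how these checks combine is the sketch below. It is not the gateway's implementation: the `APIKey` class, the `is_allowed` function, and the `enabled` and `expires_at` field names are hypothetical. Treating an empty `allowed_agents` or `allowed_endpoints` list as "no restriction" is also an assumption, and the distinction between admin and non-admin endpoints is not modeled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class APIKey:
    role: str  # "admin", "user", or "read-only"
    enabled: bool = True
    expires_at: Optional[datetime] = None
    # Assumption: an empty list means "no restriction".
    allowed_agents: List[str] = field(default_factory=list)
    allowed_endpoints: List[str] = field(default_factory=list)


def is_allowed(key: APIKey, method: str, path: str,
               agent: Optional[str] = None,
               now: Optional[datetime] = None) -> bool:
    """Hypothetical access check combining the rules listed above."""
    now = now or datetime.now(timezone.utc)
    # Disabled or expired keys are rejected regardless of role.
    if not key.enabled:
        return False
    if key.expires_at is not None and now >= key.expires_at:
        return False
    if key.role not in ("admin", "user", "read-only"):
        return False
    if key.role == "admin":
        return True  # full access
    if key.role == "read-only" and method.upper() != "GET":
        return False
    if key.allowed_endpoints and path not in key.allowed_endpoints:
        return False
    if agent is not None and key.allowed_agents and agent not in key.allowed_agents:
        return False
    return True
```

Checking the enabled flag and expiration first keeps a revoked admin key from bypassing the later checks.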
- Quick Start — First run
- Passthrough Proxy — Passthrough endpoint reference
- Configuration — Config sections