API Endpoints

All /v1/* and /proxy/* endpoints require an Authorization header carrying either Bearer <client_token> or Bearer <api_key_token>. API keys support RBAC: roles (admin/user/read-only), agent scoping, and endpoint restrictions. See Secrets for full RBAC configuration.

Endpoint	Auth	Description
`POST /v1/chat/completions`	Bearer	OpenAI-compatible chat with SSE streaming, agent fallback, MCP multi-turn tool loop
`POST /v1/messages`	Bearer	Anthropic Messages API with bidirectional OpenAI↔Anthropic format conversion. Non-streaming responses correctly emit `choices[].message` (not `choices[].delta`) for Anthropic-to-OpenAI conversion.
`GET /v1/models`	Bearer	Live model catalog from discovered providers + static registry (includes context window, max tokens, capabilities, pricing)
`POST /v1/embeddings`	Bearer	Passthrough proxy with token counting and rate limiting
`POST /v1/responses`	Bearer	Passthrough proxy with URL resolution, path traversal hardening, and retry logic
`POST /v1/images/generations`	Bearer	Image generation (OpenAI-compatible)
`POST /v1/audio/transcriptions`	Bearer	Audio transcription (multipart form-data support)
`POST /v1/audio/speech`	Bearer	Text-to-speech synthesis
`POST /v1/moderations`	Bearer	Content moderation
`POST /v1/rerank`	Bearer	Re-ranking (Cohere/Jina-compatible)
`POST /v1/a2a`	Bearer	Agent-to-Agent protocol (Google A2A)
`GET /v1/files`	Bearer	File listing
`GET /v1/files/{file_id}`	Bearer	File retrieval
`POST /v1/files`	Bearer	File upload
`DELETE /v1/files/{file_id}`	Bearer	File deletion
`POST/GET /v1/batches`	Bearer	Batch API operations
`ANY /proxy/{provider}/*`	Bearer	Arbitrary provider endpoint passthrough (all HTTP methods, SSE streaming auto-detect)
`GET /healthz`	None	Engine health probe (returns OK if gateway is running)
`GET /statsz`	None	Token usage per model, circuit breaker states, MCP server status
`GET /metrics`	None	Prometheus-compatible metrics
`GET /debug/pprof`	Bearer	Go pprof profiling (CPU, memory, goroutines). Requires `debug.pprof_enabled: true` in config

Extension Endpoints

Image Generation

POST /v1/images/generations

Generate images from text prompts. Proxied to the upstream provider's /v1/images/generations endpoint.

Default provider: openai

Example:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"A beautiful sunset","n":1,"size":"1024x1024"}' \
  http://localhost:8080/v1/images/generations

Audio Transcription

POST /v1/audio/transcriptions

Transcribe audio files to text. Sends multipart/form-data to the upstream provider's /v1/audio/transcriptions endpoint. The original Content-Type (including boundary) is preserved.

Default provider: openai

Example:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -F "file=@audio.mp3" \
  -F "model=whisper-1" \
  http://localhost:8080/v1/audio/transcriptions

Text-to-Speech

POST /v1/audio/speech

Generate audio from text. Proxied to the upstream provider's /v1/audio/speech endpoint. Returns the audio stream (e.g., audio/mpeg) directly.

Default provider: openai

Example:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model":"tts-1","input":"Hello world","voice":"alloy"}' \
  http://localhost:8080/v1/audio/speech -o speech.mp3

Content Moderation

POST /v1/moderations

Classify text for potentially harmful content. Proxied to the upstream provider's /v1/moderations endpoint.

Default provider: openai

Example:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"input":"I want to harm someone"}' \
  http://localhost:8080/v1/moderations

Re-ranking

POST /v1/rerank

Re-rank documents by relevance to a query. Proxied to the upstream provider's /v1/rerank endpoint.

Default provider: cohere (falls back to any available provider)

Example:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model":"rerank-english-v2.0",
    "query":"What is the capital of France?",
    "documents":["Paris is the capital.","The Eiffel Tower is in Paris."]
  }' \
  http://localhost:8080/v1/rerank

Anthropic Messages API

POST /v1/messages

Anthropic Messages API endpoint with bidirectional format conversion between OpenAI and Anthropic wire formats. Supports Anthropic-native clients directly. Proxied to the upstream provider's /v1/messages endpoint.

Default provider: anthropic (falls back to any available provider)

Example:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model":"claude-3-5-sonnet-20241022",
    "max_tokens":1024,
    "messages":[{"role":"user","content":"Hello"}]
  }' \
  http://localhost:8080/v1/messages
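
The Anthropic-to-OpenAI direction of the format conversion can be illustrated with a minimal Python sketch. It is not the gateway's actual implementation: the function name `anthropic_to_openai` and the stop-reason mapping are assumptions based on the two public wire formats, and only text content blocks are handled. It shows why a non-streaming result carries `choices[].message` rather than `choices[].delta`.

```python
# Hypothetical mapping from Anthropic stop_reason to OpenAI finish_reason.
STOP_REASON_TO_FINISH_REASON = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}


def anthropic_to_openai(resp: dict) -> dict:
    """Convert a non-streaming Anthropic Messages response to OpenAI chat format."""
    # Concatenate text blocks; other block types (e.g. tool_use) are ignored here.
    text = "".join(
        block.get("text", "")
        for block in resp.get("content", [])
        if block.get("type") == "text"
    )
    usage = resp.get("usage", {})
    prompt_tokens = usage.get("input_tokens", 0)
    completion_tokens = usage.get("output_tokens", 0)
    return {
        "id": resp.get("id"),
        "object": "chat.completion",
        "model": resp.get("model"),
        "choices": [
            {
                "index": 0,
                # Non-streaming responses use "message"; "delta" is for SSE chunks.
                "message": {"role": "assistant", "content": text},
                "finish_reason": STOP_REASON_TO_FINISH_REASON.get(
                    resp.get("stop_reason"), "stop"
                ),
            }
        ],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }
```

A real conversion would also need to carry tool calls, system prompts, and streaming chunks, none of which are covered by this sketch.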

Files

POST   /v1/files
GET    /v1/files
GET    /v1/files/{file_id}
DELETE /v1/files/{file_id}

File management operations for uploading, listing, retrieving, and deleting files. Proxied to the upstream provider's files endpoints.

Default provider: openai (falls back to any available provider)

Example (upload):

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -F "file=@document.pdf" \
  -F "purpose=assistants" \
  http://localhost:8080/v1/files

Batch Operations

POST /v1/batches

Batch API operations for processing multiple requests asynchronously. Proxied to the upstream provider's /v1/batches endpoint.

Default provider: openai (falls back to any available provider)

Example:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input_file_id":"file-abc123",
    "endpoint":"/v1/chat/completions",
    "completion_window":"24h"
  }' \
  http://localhost:8080/v1/batches

Responses API

POST /v1/responses

Responses API passthrough endpoint. Proxied to the upstream provider's /v1/responses endpoint.

Default provider: openai (falls back to any available provider)

Example:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input":"What is the capital of France?",
    "model":"gpt-4o"
  }' \
  http://localhost:8080/v1/responses

Agent-to-Agent (A2A)

POST /v1/a2a

Agent-to-Agent communication protocol (Google A2A). Proxied to the upstream provider's /v1/a2a endpoint.

Default provider: gemini (falls back to any available provider)

Example:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"agent-123","message":"Hello from agent A"}' \
  http://localhost:8080/v1/a2a

Non-Streaming Chat Completions

POST /v1/chat/completions supports both stream: false and stream: true. If the client omits the stream field, the gateway defaults to stream: false, per the OpenAI API specification.

When stream: false, the gateway buffers the upstream response into a complete JSON object before returning it. The response path is chosen from the upstream Content-Type header rather than from the request's stream flag: text/event-stream takes the streaming path, and anything else takes the non-streaming path. All pipeline features (redaction, routing, circuit breaker, MCP loop) behave the same on both paths.
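
The Content-Type check described in this section can be sketched in Python as follows. The function name `is_streaming_response` is hypothetical, and whether the gateway ignores media-type parameters or letter case is an assumption. The sketch only illustrates deciding the path from the upstream header instead of the request's stream flag.

```python
from typing import Optional


def is_streaming_response(content_type: Optional[str]) -> bool:
    """Return True when the upstream Content-Type indicates an SSE stream."""
    if not content_type:
        # A missing header falls through to the non-streaming path.
        return False
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively
    # (assumed behavior, not confirmed by the source).
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/event-stream"
```

With this rule, a request sent with stream: true still takes the non-streaming path if the upstream answers with application/json.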

Passthrough Proxy

The /proxy/{provider}/* endpoint routes to any provider endpoint:

POST /proxy/anthropic/v1/messages
GET /proxy/gemini/v1/models
POST /proxy/openai/v1/files

See Passthrough Proxy for details.
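
As an illustration of how a /proxy/{provider}/* path breaks down into a provider name and an upstream path, here is a small Python sketch. The helper `split_proxy_path` is hypothetical and says nothing about how the gateway parses or validates paths internally.

```python
from typing import Optional, Tuple


def split_proxy_path(path: str) -> Optional[Tuple[str, str]]:
    """Split '/proxy/{provider}/rest' into (provider, '/rest'), or return None."""
    prefix = "/proxy/"
    if not path.startswith(prefix):
        return None
    # Everything up to the next slash is the provider name.
    provider, _, rest = path[len(prefix):].partition("/")
    if not provider:
        return None
    return provider, "/" + rest
```

For example, `split_proxy_path("/proxy/anthropic/v1/messages")` yields the provider `anthropic` and the upstream path `/v1/messages`.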

Provider Selection

For each endpoint, the gateway selects a provider in this order:

1. Preferred provider (named above), if it is configured and has an API key
2. Any other configured provider with an API key (first found)

To configure a specific provider for an extension endpoint, add it to your config's providers section with the desired name.
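
The selection order above can be sketched in Python. This is an approximation under stated assumptions: the `api_key` field name and the use of configuration order for "first found" are not confirmed by the source, and `select_provider` is a hypothetical name.

```python
from typing import Dict, Optional


def select_provider(providers: Dict[str, dict], preferred: str) -> Optional[str]:
    """Pick the preferred provider if usable, else the first usable one."""

    def usable(name: str) -> bool:
        cfg = providers.get(name)
        # "api_key" is an assumed field name for the provider credential.
        return bool(cfg and cfg.get("api_key"))

    if usable(preferred):
        return preferred
    # Fall back to the first configured provider that has a key
    # (dict insertion order stands in for "first found").
    for name in providers:
        if usable(name):
            return name
    return None
```

If no configured provider has a key, the sketch returns None; how the gateway reports that case is not described here.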

Custom Endpoint URLs via FormatURLs

Providers can override the upstream URL for any extension endpoint using the format_urls map in the provider configuration:

{
  "providers": {
    "my-openai": {
      "url": "https://api.my-openai.com/v1/chat/completions",
      "format_urls": {
        "images/generations": "https://images.my-openai.com/v1/images/generations",
        "moderations": "https://moderation.my-openai.com/v1/moderations"
      }
    }
  }
}
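
A lookup against the format_urls map might look like the following Python sketch. It assumes the map is keyed by the endpoint path with the leading /v1/ removed, as in the example above. The function name `resolve_format_url` is hypothetical, and what the gateway does when no override exists is left to the caller.

```python
from typing import Optional


def resolve_format_url(provider_cfg: dict, endpoint_path: str) -> Optional[str]:
    """Return the format_urls override for an endpoint, or None if there is none."""
    prefix = "/v1/"
    # "/v1/images/generations" -> "images/generations" (assumed key convention).
    key = endpoint_path[len(prefix):] if endpoint_path.startswith(prefix) else endpoint_path
    return provider_cfg.get("format_urls", {}).get(key.strip("/"))
```

With the configuration shown above, a request to /v1/moderations on my-openai would resolve to https://moderation.my-openai.com/v1/moderations, while /v1/rerank has no override.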

/healthz

Simple health check for load balancers and orchestration:

curl http://localhost:8080/healthz
# OK

/statsz

Usage statistics and system state:

curl http://localhost:8080/statsz

Returns: per-model request/error/token counters, circuit breaker states, MCP server status, latency data.

/metrics

Prometheus-compatible metrics for monitoring:

curl http://localhost:8080/metrics

Includes: request counts, token usage, latency histograms, circuit breaker states, rate limiter status, overflow guard triggers, MCP active goroutines.

/debug/pprof

Go pprof profiling endpoints for performance analysis:

curl -H "Authorization: Bearer $TOKEN" http://localhost:8080/debug/pprof/

Available profiles:

heap - Memory heap sampling
goroutine - Goroutine stack traces
profile - CPU profiling (30s by default)
block - Blocking operations
mutex - Contention analysis

Enable in config:

{
  "debug": {
    "pprof_enabled": true
  }
}

Then use go tool pprof:

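# Fetch the profile with the required Bearer token, then analyze the saved file
curl -H "Authorization: Bearer $TOKEN" -o cpu.pprof http://localhost:8080/debug/pprof/profile
go tool pprof cpu.pprof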

Security: Requires an Authorization: Bearer <token> header, like all /v1/* endpoints. In production, enable it only behind proper access controls.

RBAC Enforcement

All endpoints enforce role-based access control when using API keys:

Roles: admin (full access), user (configured agents + non-admin endpoints), read-only (GET only)
Agent scoping: Restrict access to specific agents via allowed_agents list
Endpoint allowlists: Fine-grained access control via allowed_endpoints list
Expiration: Keys can have expiration dates
Enable/disable: Keys can be enabled or disabled
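
How these checks might combine for a single request is sketched below in Python. Only `allowed_agents` and `allowed_endpoints` come from the list above. The `ApiKey` class, the `role`, `enabled`, and `expires_at` fields, the order of the checks, and the rule that an empty list means "no restriction" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ApiKey:
    role: str                                  # "admin", "user", or "read-only"
    enabled: bool = True                       # assumed field name
    expires_at: Optional[datetime] = None      # assumed field name
    allowed_agents: List[str] = field(default_factory=list)
    allowed_endpoints: List[str] = field(default_factory=list)


def is_allowed(key: ApiKey, method: str, path: str,
               agent: Optional[str] = None,
               now: Optional[datetime] = None) -> bool:
    """Return True if the key may perform this request (illustrative rules only)."""
    now = now or datetime.now(timezone.utc)
    if not key.enabled:
        return False
    if key.expires_at is not None and now >= key.expires_at:
        return False
    if key.role == "admin":
        return True  # full access
    if key.role not in ("user", "read-only"):
        return False  # unknown roles are denied in this sketch
    if key.role == "read-only" and method.upper() != "GET":
        return False
    # Empty lists are treated as "no restriction" (assumption).
    if key.allowed_endpoints and path not in key.allowed_endpoints:
        return False
    if agent is not None and key.allowed_agents and agent not in key.allowed_agents:
        return False
    return True
```

The sketch omits the distinction between admin and non-admin endpoints for the user role, since the source does not list which endpoints count as admin-only.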
