API Endpoints

Rafael Gumieri edited this page Jun 15, 2026 · 7 revisions

All /v1/* and /proxy/* endpoints require Authorization: Bearer <client_token> or Bearer <api_key_token>. API keys support RBAC — roles (admin/user/read-only), agent scoping, and endpoint restrictions. See Secrets for full RBAC configuration.

| Endpoint | Auth | Description |
|---|---|---|
| POST /v1/chat/completions | Bearer | OpenAI-compatible chat with SSE streaming, agent fallback, MCP multi-turn tool loop |
| POST /v1/messages | Bearer | Anthropic Messages API with bidirectional OpenAI↔Anthropic format conversion. For Anthropic-to-OpenAI conversion, non-streaming responses emit choices[].message (not choices[].delta). |
| GET /v1/models | Bearer | Live model catalog from discovered providers + static registry (includes context window, max tokens, capabilities, pricing) |
| POST /v1/embeddings | Bearer | Passthrough proxy with token counting and rate limiting |
| POST /v1/responses | Bearer | Passthrough proxy with URL resolution, path traversal hardening, and retry logic |
| POST /v1/images/generations | Bearer | Image generation (OpenAI-compatible) |
| POST /v1/audio/transcriptions | Bearer | Audio transcription (multipart form-data support) |
| POST /v1/audio/speech | Bearer | Text-to-speech synthesis |
| POST /v1/moderations | Bearer | Content moderation |
| POST /v1/rerank | Bearer | Re-ranking (Cohere/Jina-compatible) |
| POST /v1/a2a | Bearer | Agent-to-Agent protocol (Google A2A) |
| GET /v1/files | Bearer | File listing |
| POST /v1/files | Bearer | File upload |
| GET /v1/files/{file_id} | Bearer | File retrieval |
| DELETE /v1/files/{file_id} | Bearer | File deletion |
| POST/GET /v1/batches | Bearer | Batch API operations |
| ANY /proxy/{provider}/* | Bearer | Arbitrary provider endpoint passthrough (all HTTP methods, SSE streaming auto-detect) |
| GET /healthz | None | Engine health probe (returns OK if gateway is running) |
| GET /statsz | None | Token usage per model, circuit breaker states, MCP server status |
| GET /metrics | None | Prometheus-compatible metrics |
| GET /debug/pprof | Bearer | Go pprof profiling (CPU, memory, goroutines). Requires debug.pprof_enabled: true in config |

Extension Endpoints

Image Generation

POST /v1/images/generations

Generate images from text prompts. Proxied to the upstream provider's /v1/images/generations endpoint.

Default provider: openai

Example:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"A beautiful sunset","n":1,"size":"1024x1024"}' \
  http://localhost:8080/v1/images/generations

Audio Transcription

POST /v1/audio/transcriptions

Transcribe audio files to text. Sends multipart/form-data to the upstream provider's /v1/audio/transcriptions endpoint. The original Content-Type (including boundary) is preserved.

Default provider: openai

Example:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -F "file=@audio.mp3" \
  -F "model=whisper-1" \
  http://localhost:8080/v1/audio/transcriptions

Text-to-Speech

POST /v1/audio/speech

Generate audio from text. Proxied to the upstream provider's /v1/audio/speech endpoint. Returns the audio stream (e.g., audio/mpeg) directly.

Default provider: openai

Example:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model":"tts-1","input":"Hello world","voice":"alloy"}' \
  http://localhost:8080/v1/audio/speech -o speech.mp3

Content Moderation

POST /v1/moderations

Classify text for potentially harmful content. Proxied to the upstream provider's /v1/moderations endpoint.

Default provider: openai

Example:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"input":"I want to harm someone"}' \
  http://localhost:8080/v1/moderations

Re-ranking

POST /v1/rerank

Re-rank documents by relevance to a query. Proxied to the upstream provider's /v1/rerank endpoint.

Default provider: cohere (falls back to any available provider)

Example:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model":"rerank-english-v2.0",
    "query":"What is the capital of France?",
    "documents":["Paris is the capital.","The Eiffel Tower is in Paris."]
  }' \
  http://localhost:8080/v1/rerank

Anthropic Messages API

POST /v1/messages

Anthropic Messages API endpoint with bidirectional format conversion between OpenAI and Anthropic wire formats. Supports Anthropic-native clients directly. Proxied to the upstream provider's /v1/messages endpoint.

Default provider: anthropic (falls back to any available provider)

Example:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model":"claude-3-5-sonnet-20241022",
    "max_tokens":1024,
    "messages":[{"role":"user","content":"Hello"}]
  }' \
  http://localhost:8080/v1/messages
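
The Anthropic-to-OpenAI direction of the format conversion can be sketched as follows. This is an illustrative sketch, not the gateway's actual code: the function name anthropic_to_openai and the stop_reason mapping are assumptions, and only text content blocks are handled. It shows why a non-streaming result carries choices[].message rather than choices[].delta.

```python
# Illustrative sketch only: not the gateway's actual conversion code.
# Maps a non-streaming Anthropic Messages response to an OpenAI
# chat.completion object with choices[].message (not choices[].delta).

# Assumed mapping from Anthropic stop_reason to OpenAI finish_reason.
STOP_REASON_MAP = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}


def anthropic_to_openai(resp: dict) -> dict:
    """Convert a non-streaming Anthropic response dict to the OpenAI shape."""
    # Anthropic returns a list of content blocks; join the text blocks.
    text = "".join(
        block.get("text", "")
        for block in resp.get("content", [])
        if block.get("type") == "text"
    )
    usage = resp.get("usage", {})
    prompt_tokens = usage.get("input_tokens", 0)
    completion_tokens = usage.get("output_tokens", 0)
    return {
        "id": resp.get("id"),
        "object": "chat.completion",
        "model": resp.get("model"),
        "choices": [
            {
                "index": 0,
                # A complete message object, because nothing is streamed.
                "message": {"role": "assistant", "content": text},
                "finish_reason": STOP_REASON_MAP.get(
                    resp.get("stop_reason"), "stop"
                ),
            }
        ],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }
```

A streaming conversion would instead emit incremental choices[].delta chunks, which this sketch does not cover.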

Files

POST   /v1/files
GET    /v1/files
GET    /v1/files/{file_id}
DELETE /v1/files/{file_id}

File management operations for uploading, listing, retrieving, and deleting files. Proxied to the upstream provider's files endpoints.

Default provider: openai (falls back to any available provider)

Example (upload):

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -F "file=@document.pdf" \
  -F "purpose=assistants" \
  http://localhost:8080/v1/files

Batch Operations

POST /v1/batches

Batch API operations for processing multiple requests asynchronously. Proxied to the upstream provider's /v1/batches endpoint.

Default provider: openai (falls back to any available provider)

Example:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input_file_id":"file-abc123",
    "endpoint":"/v1/chat/completions",
    "completion_window":"24h"
  }' \
  http://localhost:8080/v1/batches
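
The input_file_id in the request refers to a file uploaded beforehand through /v1/files. Assuming the upstream follows the OpenAI Batch input format (one JSON object per line with custom_id, method, url, and body), a minimal sketch for building that file could look like this. The helper name build_batch_jsonl and the request-N identifiers are hypothetical.

```python
import json
from typing import Iterable, List


def build_batch_jsonl(prompts: Iterable[str], model: str) -> str:
    """Build JSONL batch input, one chat completion request per line."""
    lines: List[str] = []
    for i, prompt in enumerate(prompts):
        lines.append(json.dumps({
            # custom_id lets the caller match results back to requests.
            "custom_id": f"request-{i}",
            "method": "POST",
            # Must match the "endpoint" field of the batch request.
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }))
    return "\n".join(lines) + "\n"
```

Writing the returned string to a .jsonl file and uploading it through POST /v1/files would yield the file ID used as input_file_id.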

Responses API

POST /v1/responses

Responses API passthrough endpoint. Proxied to the upstream provider's /v1/responses endpoint.

Default provider: openai (falls back to any available provider)

Example:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input":"What is the capital of France?",
    "model":"gpt-4o"
  }' \
  http://localhost:8080/v1/responses

Agent-to-Agent (A2A)

POST /v1/a2a

Agent-to-Agent communication protocol (Google A2A). Proxied to the upstream provider's /v1/a2a endpoint.

Default provider: gemini (falls back to any available provider)

Example:

curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"agent-123","message":"Hello from agent A"}' \
  http://localhost:8080/v1/a2a

Non-Streaming Chat Completions

POST /v1/chat/completions supports both stream: false and stream: true. If the client omits the stream field, the gateway defaults to stream: false, per the OpenAI API specification.

When stream: false, the gateway buffers the upstream response into a complete JSON object before returning it. The response path is chosen from the upstream Content-Type header (text/event-stream → streaming path, anything else → non-streaming path), not from the request's stream flag. All pipeline features (redaction, routing, circuit breaker, MCP loop) behave identically in both modes.
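
The Content-Type based routing decision can be sketched as a small predicate. This is a hedged illustration rather than the gateway's code: the function name is hypothetical, and ignoring header parameters and letter case is an assumption about how the comparison is done.

```python
from typing import Optional


def is_streaming_response(content_type: Optional[str]) -> bool:
    """Return True when the upstream Content-Type is text/event-stream."""
    if not content_type:
        # A missing header is treated as non-streaming.
        return False
    # Drop parameters such as "; charset=utf-8" and compare
    # case-insensitively (assumed normalization).
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/event-stream"
```

Keying on the upstream header means a provider that answers a stream: true request with plain JSON is still handled on the non-streaming path.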

Passthrough Proxy

The /proxy/{provider}/* endpoint routes to any provider endpoint:

POST /proxy/anthropic/v1/messages
GET /proxy/gemini/v1/models
POST /proxy/openai/v1/files

See Passthrough Proxy for details.
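
As a rough illustration of the route shape, the path can be split into the provider name and the remaining upstream path. The helper below is a hypothetical sketch. It assumes the remainder of the path is forwarded unchanged, and it does not implement the gateway's path traversal hardening.

```python
from typing import Tuple

PROXY_PREFIX = "/proxy/"


def split_proxy_path(path: str) -> Tuple[str, str]:
    """Split /proxy/{provider}/rest into (provider, "/rest")."""
    if not path.startswith(PROXY_PREFIX):
        raise ValueError("not a /proxy/ path: " + path)
    # The first segment after the prefix is the provider name.
    provider, _, rest = path[len(PROXY_PREFIX):].partition("/")
    if not provider:
        raise ValueError("missing provider in path: " + path)
    return provider, "/" + rest
```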

Provider Selection

For each endpoint, the gateway selects a provider in this order:

  1. Preferred provider (named above) — if configured and has an API key
  2. Any configured provider with an API key (first found)

To configure a specific provider for an extension endpoint, add it to your config's providers section with the desired name.
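
The two-step selection order can be sketched as below. This is a minimal illustration under assumptions: the api_key field name and the select_provider helper are hypothetical, and "first found" is modeled as dictionary insertion order, which may differ from the gateway's actual iteration order.

```python
from typing import Dict, Optional


def select_provider(preferred: str, providers: Dict[str, dict]) -> Optional[str]:
    """Pick the preferred provider, else the first one with an API key."""
    # Step 1: the preferred provider, if configured and it has an API key.
    cfg = providers.get(preferred)
    if cfg and cfg.get("api_key"):
        return preferred
    # Step 2: any configured provider with an API key (first found).
    for name, candidate in providers.items():
        if candidate.get("api_key"):
            return name
    # No usable provider is configured.
    return None
```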

Custom Endpoint URLs via FormatURLs

Providers can override the upstream URL for any extension endpoint using the format_urls map in the provider configuration:

{
  "providers": {
    "my-openai": {
      "url": "https://api.my-openai.com/v1/chat/completions",
      "format_urls": {
        "images/generations": "https://images.my-openai.com/v1/images/generations",
        "moderations": "https://moderation.my-openai.com/v1/moderations"
      }
    }
  }
}
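
Looking up an override from this map can be sketched as follows. The helper is hypothetical, and how the gateway derives the default URL when no override exists is not specified here, so the default is passed in by the caller.

```python
def resolve_endpoint_url(provider_cfg: dict, endpoint: str, default_url: str) -> str:
    """Return the format_urls override for an endpoint, else the default."""
    # Keys are endpoint paths relative to /v1/, e.g. "images/generations".
    overrides = provider_cfg.get("format_urls") or {}
    return overrides.get(endpoint, default_url)
```

With the configuration above, "moderations" would resolve to https://moderation.my-openai.com/v1/moderations, while an endpoint without an entry keeps whatever default the caller supplies.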

/healthz

Simple health check for load balancers and orchestration:

curl http://localhost:8080/healthz
# OK

/statsz

Usage statistics and system state:

curl http://localhost:8080/statsz

Returns: per-model request/error/token counters, circuit breaker states, MCP server status, latency data.

/metrics

Prometheus-compatible metrics for monitoring:

curl http://localhost:8080/metrics

Includes: request counts, token usage, latency histograms, circuit breaker states, rate limiter status, overflow guard triggers, MCP active goroutines.

/debug/pprof

Go pprof profiling endpoints for performance analysis. These endpoints require a Bearer token:

curl -H "Authorization: Bearer $TOKEN" http://localhost:8080/debug/pprof/

Available profiles:

  • heap - Memory heap sampling
  • goroutine - Goroutine stack traces
  • profile - CPU profiling (30s by default)
  • block - Blocking operations
  • mutex - Contention analysis

Enable in config:

{
  "debug": {
    "pprof_enabled": true
  }
}

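Then download a profile with the Bearer token and open it with go tool pprof:

curl -H "Authorization: Bearer $TOKEN" -o cpu.pprof http://localhost:8080/debug/pprof/profile
go tool pprof cpu.pprof

Security: Requires the Authorization: Bearer <token> header, like all /v1/* endpoints. In production, enable pprof only with proper access controls in place.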

RBAC Enforcement

All Bearer-authenticated endpoints enforce role-based access control when the request uses an API key:

  • Roles: admin (full access), user (configured agents + non-admin endpoints), read-only (GET only)
  • Agent scoping: Restrict access to specific agents via allowed_agents list
  • Endpoint allowlists: Fine-grained access control via allowed_endpoints list
  • Expiration: Keys can have expiration dates
  • Enable/disable: Keys can be enabled or disabled
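
How these checks might combine for a single request is sketched below. Only allowed_agents, allowed_endpoints, and the three role names come from this page. The role, enabled, and expires_at field names, the ADMIN_PREFIXES set, exact-match endpoint comparison, and the order of checks are all assumptions made for illustration. See Secrets for the real configuration.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical set of admin-only path prefixes (assumption).
ADMIN_PREFIXES = ("/debug/pprof",)


def is_request_allowed(key: dict, method: str, path: str,
                       agent: Optional[str], now: datetime) -> bool:
    """Decide whether an API key may perform the given request."""
    # Disabled or expired keys are rejected regardless of role.
    if not key.get("enabled", True):
        return False
    expires_at = key.get("expires_at")
    if expires_at is not None and now >= expires_at:
        return False
    role = key.get("role")
    if role == "admin":
        return True  # full access
    # Endpoint allowlist: an empty or missing list means no restriction.
    allowed_endpoints = key.get("allowed_endpoints")
    if allowed_endpoints and path not in allowed_endpoints:
        return False
    # Agent scoping applies only when the request names an agent.
    allowed_agents = key.get("allowed_agents")
    if allowed_agents and agent is not None and agent not in allowed_agents:
        return False
    if role == "read-only":
        return method.upper() == "GET"
    if role == "user":
        return not path.startswith(ADMIN_PREFIXES)
    return False  # unknown roles are denied
```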
