diff --git a/README.md b/README.md
index 82f47d13..4da6d3cc 100644
--- a/README.md
+++ b/README.md
@@ -23,7 +23,7 @@
- A fast and lightweight AI gateway written in Go, providing a unified OpenAI-compatible API for OpenAI, Anthropic, Gemini, DeepSeek, xAI, Groq, OpenRouter, Z.ai, Azure OpenAI, Oracle, Ollama, and more.
+ A fast and lightweight AI gateway written in Go, providing unified OpenAI-compatible and Anthropic-compatible APIs for OpenAI, Anthropic, Gemini, DeepSeek, xAI, Groq, OpenRouter, Z.ai, Azure OpenAI, Oracle, Ollama, and more.
@@ -44,33 +44,9 @@ docker run --rm -p 8080:8080 \
enterpilot/gomodel
```
-Pass only the provider credentials or base URL you need (at least one required):
+See [`.env.template`](./.env.template) for the full list of environment variables, including every available provider.
-```bash
-docker run --rm -p 8080:8080 \
- -e OPENAI_API_KEY="your-openai-key" \
- -e ANTHROPIC_API_KEY="your-anthropic-key" \
- -e GEMINI_API_KEY="your-gemini-key" \
- -e VERTEX_PROJECT="your-gcp-project" \
- -e VERTEX_LOCATION="us-central1" \
- -e VERTEX_AUTH_TYPE="gcp_adc" \
- -e DEEPSEEK_API_KEY="your-deepseek-key" \
- -e GROQ_API_KEY="your-groq-key" \
- -e OPENROUTER_API_KEY="your-openrouter-key" \
- -e ZAI_API_KEY="your-zai-key" \
- -e XAI_API_KEY="your-xai-key" \
- -e AZURE_API_KEY="your-azure-key" \
- -e AZURE_BASE_URL="https://your-resource.openai.azure.com/openai/deployments/your-deployment" \
- -e AZURE_API_VERSION="2024-10-21" \
- -e ORACLE_API_KEY="your-oracle-key" \
- -e ORACLE_BASE_URL="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/v1" \
- -e ORACLE_MODELS="openai.gpt-oss-120b,xai.grok-3" \
- -e OLLAMA_BASE_URL="http://host.docker.internal:11434/v1" \
- -e VLLM_BASE_URL="http://host.docker.internal:8000/v1" \
- enterpilot/gomodel
-```
-
-⚠️ Avoid passing secrets via `-e` on the command line - they can leak via shell history and process lists. For production, use `docker run --env-file .env` to load API keys from a file instead.
+⚠️ Avoid passing secrets with `-e` on the command line in production — they can leak through shell history and process lists. Use `docker run --env-file .env` to load API keys from a file instead.
**Step 2:** Make your first API call
@@ -87,63 +63,14 @@ curl http://localhost:8080/v1/chat/completions \
### Supported LLM Providers
-Example model identifiers are illustrative and subject to change; consult provider catalogs for current models. Feature columns reflect gateway API support, not every individual model capability exposed by an upstream provider.
-
-| Provider | Credential | Example Model | Chat | `/responses` | Embed | Files | Batches | Passthru |
-| ------------------------------------ | ----------------------------------------------------------------- | ------------------------------------------ | :--: | :----------: | :---: | :---: | :-----: | :------: |
-| OpenAI | `OPENAI_API_KEY` | `gpt-5.5` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| Anthropic | `ANTHROPIC_API_KEY` | `claude-sonnet-4-20250514` | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
-| Google Gemini | `GEMINI_API_KEY` | `gemini-2.5-flash` | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
-| Vertex AI | `VERTEX_PROJECT` + `VERTEX_LOCATION` | `google/gemini-2.5-flash` | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
-| DeepSeek | `DEEPSEEK_API_KEY` | `deepseek-v4-pro` | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
-| Groq | `GROQ_API_KEY` | `llama-3.3-70b-versatile` | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
-| OpenRouter | `OPENROUTER_API_KEY` | `google/gemini-2.5-flash` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| Z.ai | `ZAI_API_KEY` (`ZAI_BASE_URL` optional) | `glm-5.1` | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
-| xAI (Grok) | `XAI_API_KEY` | `grok-4` | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
-| Alibaba Cloud Model Studio (Bailian) | `BAILIAN_API_KEY` (`BAILIAN_BASE_URL` optional) | `qwen3-max` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| MiniMax | `MINIMAX_API_KEY` (`MINIMAX_BASE_URL` optional) | `MiniMax-M3` | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
-| Xiaomi MiMo | `XIAOMI_API_KEY` (`XIAOMI_BASE_URL` optional) | `mimo-v2.5-pro` | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
-| OpenCode Go | `OPENCODE_GO_API_KEY` (`OPENCODE_GO_BASE_URL` optional) | `glm-5.1` | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
-| Azure OpenAI | `AZURE_API_KEY` + `AZURE_BASE_URL` (`AZURE_API_VERSION` optional) | `gpt-5` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| Oracle | `ORACLE_API_KEY` + `ORACLE_BASE_URL` | `openai.gpt-oss-120b` | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
-| Ollama | `OLLAMA_BASE_URL` | `llama3.2` | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
-| vLLM | `VLLM_BASE_URL` (`VLLM_API_KEY` optional) | `meta-llama/Llama-3.1-8B-Instruct` | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
-| Amazon Bedrock | `BEDROCK_BASE_URL` (region or endpoint) + AWS credentials | `anthropic.claude-3-5-haiku-20241022-v1:0` | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
-
-✅ Supported ❌ Unsupported
-
-For Z.ai's GLM Coding Plan, set `ZAI_BASE_URL=https://api.z.ai/api/coding/paas/v4`.
-Xiaomi MiMo TTS (`mimo-v2.5-tts*`) and ASR (`mimo-v2.5-asr`) are served through
-`/v1/audio/speech` and `/v1/audio/transcriptions` (translated to MiMo's
-chat-completions audio dialect) as well as directly via chat completions; for
-1M context append `[1m]` to the model ID and list it in `XIAOMI_MODELS`.
-OpenCode Go (OpenCode Zen) routes per model — most models use OpenAI-style
-`/chat/completions`, while `/messages`-only models (default `qwen3.7-max`,
-override with `OPENCODE_GO_MESSAGES_MODELS`) are sent to the Anthropic-native
-endpoint. Set `OPENCODE_GO_API_KEY`; the base URL defaults to
-`https://opencode.ai/zen/go/v1`.
-Configured model lists are available for every provider with
-`_MODELS`, for example
-`OPENROUTER_MODELS=openai/gpt-oss-120b,anthropic/claude-sonnet-4` or
-`ORACLE_MODELS=openai.gpt-oss-120b,xai.grok-3`. DeepSeek defaults to
-`https://api.deepseek.com`; set `DEEPSEEK_BASE_URL` only when using a compatible
-proxy or alternate DeepSeek endpoint. By default,
-`CONFIGURED_PROVIDER_MODELS_MODE=fallback` uses those lists only when upstream
-`/models` is unavailable or empty. Set `CONFIGURED_PROVIDER_MODELS_MODE=allowlist`
-to expose only configured models for providers that define a list, skipping
-their upstream `/models` calls.
-For vLLM, set `VLLM_API_KEY` only if the upstream server was started with
-`--api-key`.
-To register multiple instances of the same provider type without `config.yaml`,
-use suffixed env vars such as `OPENAI_EAST_API_KEY` and
-`OPENAI_EAST_BASE_URL`; add `OPENAI_EAST_MODELS` to configure that instance's
-model list. This registers provider `openai-east` with type `openai`.
-Vertex AI follows the same suffix pattern — `VERTEX_US_PROJECT` registers
-provider `vertex-us`. Vertex project and location env vars must match the
-instance prefix: for a suffixed instance such as `VERTEX_US_PROJECT`, also set
-`VERTEX_US_LOCATION` and any other suffixed settings for that instance, rather
-than the generic `VERTEX_PROJECT` / `VERTEX_LOCATION`. `VERTEX_AUTH_TYPE`
-defaults to Application Default Credentials (`gcp_adc`).
+GoModel supports OpenAI, Anthropic, Google Gemini, Vertex AI, DeepSeek, Groq,
+OpenRouter, Z.ai, xAI (Grok), Alibaba Cloud Model Studio (Bailian), MiniMax,
+Xiaomi MiMo, OpenCode Go, Azure OpenAI, Oracle, Ollama, vLLM, Amazon Bedrock,
+and all OpenAI-compatible providers.
+
+See the [Providers Overview](./docs/providers/overview.mdx) for the full
+per-provider feature matrix (chat, `/responses`, embeddings, files, batches,
+passthrough), credentials, and configuration notes.
---
@@ -202,163 +129,29 @@ docker run --rm -p 8080:8080 --env-file .env gomodel
## API Endpoints
-### OpenAI-Compatible API
-
-| Endpoint | Method | Description |
-| -------------------------- | ------ | ------------------------------------------------------------------------------------------------------------ |
-| `/v1/chat/completions` | POST | Chat completions (streaming supported) |
-| `/v1/responses` | POST | OpenAI Responses API |
-| `/v1/conversations` | POST | Create a conversation (gateway-managed) |
-| `/v1/conversations/{id}` | GET | Retrieve a conversation |
-| `/v1/conversations/{id}` | POST | Replace conversation metadata in full |
-| `/v1/conversations/{id}` | DELETE | Delete a conversation |
-| `/v1/embeddings` | POST | Text embeddings |
-| `/v1/models` | GET | List available models |
-| `/v1/files` | POST | Upload a file (OpenAI-compatible multipart) |
-| `/v1/files` | GET | List files |
-| `/v1/files/{id}` | GET | Retrieve file metadata |
-| `/v1/files/{id}` | DELETE | Delete a file |
-| `/v1/files/{id}/content` | GET | Retrieve raw file content |
-| `/v1/batches` | POST | Create a native provider batch (OpenAI-compatible schema; inline `requests` supported where provider-native) |
-| `/v1/batches` | GET | List stored batches |
-| `/v1/batches/{id}` | GET | Retrieve one stored batch |
-| `/v1/batches/{id}/cancel` | POST | Cancel a pending batch |
-| `/v1/batches/{id}/results` | GET | Retrieve native batch results when available |
-
-### Anthropic-Compatible API
-
-| Endpoint | Method | Description |
-| --------------------------- | ------ | ----------------------------------------------------------------------------- |
-| `/v1/messages` | POST | Anthropic Messages API through translated model routing (streaming supported) |
-| `/v1/messages/count_tokens` | POST | Heuristic Anthropic Messages input token estimate |
-
-### Provider Passthrough
-
-| Endpoint | Method | Description |
-| ------------------- | -------------------------------------------- | ---------------------------------------------------------- |
-| `/p/{provider}/...` | GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS | Provider-native passthrough with opaque upstream responses |
-
-### Admin Endpoints
-
-| Endpoint | Method | Description |
-| --------------------------- | ------ | ------------------------------------------ |
-| `/admin/dashboard` | GET | Admin dashboard UI |
-| `/admin/runtime/config` | GET | Admin runtime configuration |
-| `/admin/cache/overview` | GET | Cache statistics overview |
-| `/admin/usage/summary` | GET | Aggregate token usage statistics |
-| `/admin/usage/daily` | GET | Per-period token usage breakdown |
-| `/admin/usage/models` | GET | Usage breakdown by model |
-| `/admin/usage/user-paths` | GET | Usage breakdown by user path |
-| `/admin/usage/log` | GET | Paginated usage log entries |
-| `/admin/audit/detail` | GET | Detailed audit entry information |
-| `/admin/audit/log` | GET | Paginated audit log entries |
-| `/admin/audit/conversation` | GET | Conversation thread around one audit entry |
-| `/admin/providers/status` | GET | Provider availability status |
-| `/admin/runtime/refresh` | POST | Refresh runtime configuration |
-| `/admin/models` | GET | List models with provider type |
-| `/admin/models/categories` | GET | List model categories |
-| `/admin/model-overrides` | GET | List model overrides |
-| `/admin/model-overrides` | PUT | Create/update model override |
-| `/admin/model-overrides` | DELETE | Remove model override |
-| `/admin/auth-keys` | GET | List authentication keys |
-
-> **Legacy alias:** Until **2026-08-09**, all admin endpoints are also
-> reachable under `/admin/api/v1/*`. Legacy responses include
-> `Deprecation: true` and `Sunset: Sun, 09 Aug 2026 00:00:00 GMT` headers.
-> The endpoint formerly at `/admin/api/v1/dashboard/config` moved to
-> `/admin/runtime/config` on the new prefix.
-
-### Operations Endpoints
-
-| Endpoint | Method | Description |
-| --------------------- | ------ | ----------------------------------------------- |
-| `/health` | GET | Liveness check (always 200 while the process serves) |
-| `/health/ready` | GET | Readiness check: pings storage (503 if down) and Redis cache (degraded, still 200) |
-| `/metrics` | GET | Prometheus metrics (experimental, when enabled) |
-| `/swagger/index.html` | GET | Swagger UI (when enabled) |
+GoModel exposes OpenAI-compatible and Anthropic-compatible APIs, provider-native
+passthrough, and operations routes. See the
+[API Endpoints reference](./docs/advanced/api-endpoints.mdx) for the full
+endpoint tables, and [Admin Endpoints](./docs/advanced/admin-endpoints.mdx) for
+the admin REST API and dashboard.
---
## Gateway Configuration
-GoModel is configured through environment variables and an optional `config.yaml`. Environment variables override YAML values. See [`.env.template`](.env.template) and [`config/config.example.yaml`](config/config.example.yaml) for the available options.
-
-Key settings:
-
-| Variable | Default | Description |
-| --------------------------------------- | ----------------------------------------------- | ----------------------------------------------------------------------------------- |
-| `PORT` | `8080` | Server port |
-| `BASE_PATH` | `/` | Mount the gateway under a path prefix such as `/g` |
-| `GOMODEL_MASTER_KEY` | (none) | API key for authentication |
-| `USER_PATH_HEADER` | `X-GoModel-User-Path` | Header used to read/write request `user_path` values |
-| `ENABLE_PASSTHROUGH_ROUTES` | `true` | Enable provider-native passthrough routes under `/p/{provider}/...` |
-| `ALLOW_PASSTHROUGH_V1_ALIAS` | `true` | Allow `/p/{provider}/v1/...` aliases while keeping `/p/{provider}/...` canonical |
-| `ENABLED_PASSTHROUGH_PROVIDERS` | `openai,anthropic,openrouter,zai,vllm,deepseek` | Comma-separated list of enabled passthrough providers |
-| `GEMINI_API_MODE` | `native` | Gemini AI Studio upstream mode: `native` or `openai_compatible` |
-| `VERTEX_API_MODE` | `native` | Vertex AI Gemini upstream mode: `native` or `openai_compatible` |
-| `USE_GOOGLE_GEMINI_NATIVE_API` | `true` | Legacy global Gemini mode toggle used when per-provider `*_API_MODE` is unset |
-| `STORAGE_TYPE` | `sqlite` | Storage backend (`sqlite`, `postgresql`, `mongodb`) |
-| `METRICS_ENABLED` | `false` | Enable Prometheus metrics (experimental) |
-| `LOGGING_ENABLED` | `false` | Enable audit logging |
-| `DASHBOARD_LIVE_LOGS_ENABLED` | `true` | Stream realtime dashboard log previews with bounded replay |
-| `DASHBOARD_LIVE_LOGS_BUFFER_SIZE` | `10000` | Max in-memory live events retained; increase above ~1000 msgs/sec or bursty traffic |
-| `DASHBOARD_LIVE_LOGS_REPLAY_LIMIT` | `1000` | Max events replayed after reconnect; increase for longer reconnect windows |
-| `DASHBOARD_LIVE_LOGS_HEARTBEAT_SECONDS` | `15` | SSE heartbeat interval; lower when proxies need faster liveness checks |
-| `GUARDRAILS_ENABLED` | `false` | Enable the configured guardrails pipeline |
+GoModel is configured through environment variables and an optional `config.yaml`. Environment variables override YAML values. See the [Configuration reference](./docs/advanced/configuration.mdx) for the full list of settings organized by category, along with [`.env.template`](./.env.template) and [`config/config.example.yaml`](./config/config.example.yaml).
**Quick Start - Authentication:** By default `GOMODEL_MASTER_KEY` is unset. Without this key, API endpoints are unprotected and anyone can call them. This is insecure for production. **Strongly recommend** setting a strong secret before exposing the service. Add `GOMODEL_MASTER_KEY` to your `.env` or environment for production deployments.
---
-## Response Caching
-
-GoModel has a two-layer response cache that reduces LLM API costs and latency for repeated or semantically similar requests.
-
-### Layer 1 - Exact-match cache
-
-Hashes the full request body (path + `Workflow` + body) and returns a stored response on byte-identical requests. Sub-millisecond lookup. Activate by environment variables: `RESPONSE_CACHE_SIMPLE_ENABLED` and `REDIS_URL`.
-
-Responses served from this layer carry `X-Cache: HIT (exact)`.
-
-### Layer 2 - Semantic cache
-
-Embeds the last user message via your configured provider’s OpenAI-compatible `/v1/embeddings` API (`cache.response.semantic.embedder.provider` must name a key in the top-level `providers` map) and performs a KNN vector search. Semantically equivalent queries - e.g. _"What's the capital of France?"_ vs _"Which city is France's capital?"_ - can return the same cached response without an upstream LLM call.
-
-Expected hit rates: ~60–70% in high-repetition workloads vs. ~18% for exact-match alone.
-
-Responses served from this layer carry `X-Cache: HIT (semantic)`.
-
-Supported vector backends: `qdrant`, `pgvector`, `pinecone`, `weaviate` (set `cache.response.semantic.vector_store.type` and the matching nested block).
-
-Both cache layers run **after** guardrail/workflow patching so they always see the final prompt. Use `Cache-Control: no-cache` or `Cache-Control: no-store` to bypass caching per-request.
-
----
-
See [DEVELOPMENT.md](docs/DEVELOPMENT.md) for testing, linting, and pre-commit setup.
---
# Roadmap
-## Commercial features
-
-- [ ] Intelligent routing
-- [ ] Context window compression
-- [ ] Cluster mode
-
-## Roadmap to 0.2.0
-
-- [ ] UI visibility for prompt caching and local cache usage
-- [ ] Broader provider support, including Cohere, Command A, and Operational
-- [ ] Full support for the OpenAI `/conversations` lifecycle
-- [ ] Guardrails hardening: better UI, simpler architecture, easier custom guardrails, and response-side guardrails before output reaches the client
-- [ ] Provider-native passthrough for all providers, beyond the current beta coverage
-- [x] Budget management with limits per `user_path` and/or API key
-- [x] Editable model pricing for accurate cost tracking and budgeting
-- [x] Full support for the OpenAI `/responses` lifecycle
-- [x] Anthropic-compatible `/messages` ingress and `/messages/count_tokens`
-- [x] Prompt cache visibility showing how much of each prompt was cached by the provider
-- [x] Fix failover charts in the dashboard
+See the [Roadmap](./docs/about/roadmap.mdx) for commercial features and the public 0.2.0 milestone.
## Community
diff --git a/docs/advanced/admin-endpoints.mdx b/docs/advanced/admin-endpoints.mdx
index 16d99dd1..11d401bc 100644
--- a/docs/advanced/admin-endpoints.mdx
+++ b/docs/advanced/admin-endpoints.mdx
@@ -17,26 +17,10 @@ Both are on by default because observability shouldn't be opt-in. If you don't n
## Configuration
-| Variable | Description | Default |
-| ------------------------------------- | ------------------------------------------------ | ------- |
-| `ADMIN_ENDPOINTS_ENABLED` | Enable the admin REST API | `true` |
-| `ADMIN_UI_ENABLED` | Enable the admin dashboard UI | `true` |
-| `DASHBOARD_LIVE_LOGS_ENABLED` | Stream realtime dashboard audit/usage previews | `true` |
-| `DASHBOARD_LIVE_LOGS_BUFFER_SIZE` | In-memory replay window for live dashboard events | `10000` |
-| `DASHBOARD_LIVE_LOGS_REPLAY_LIMIT` | Max events replayed to one reconnecting client | `1000` |
-| `DASHBOARD_LIVE_LOGS_HEARTBEAT_SECONDS` | Idle stream heartbeat interval in seconds | `15` |
-
-Or in YAML:
-
-```yaml
-admin:
- endpoints_enabled: true
- ui_enabled: true
- live_logs_enabled: true
- live_logs_buffer_size: 10000
- live_logs_replay_limit: 1000
- live_logs_heartbeat_seconds: 15
-```
+Admin and dashboard behavior is controlled by environment variables or the
+equivalent `admin:` block in `config.yaml`. See
+[Admin configuration](/advanced/configuration#admin) for the full table of
+variables and defaults, along with a YAML example.
The dashboard UI requires the REST API to be enabled. If you set
diff --git a/docs/advanced/api-endpoints.mdx b/docs/advanced/api-endpoints.mdx
new file mode 100644
index 00000000..0c709279
--- /dev/null
+++ b/docs/advanced/api-endpoints.mdx
@@ -0,0 +1,73 @@
+---
+title: "API Endpoints"
+description: "Reference for GoModel's OpenAI-compatible and Anthropic-compatible endpoints, provider passthrough, and operations routes."
+icon: "list-tree"
+---
+
+GoModel exposes OpenAI-compatible and Anthropic-compatible APIs, provider-native
+passthrough, and a set of operations endpoints. Admin and dashboard routes are
+documented separately in [Admin Endpoints](/advanced/admin-endpoints).
+
+For request and response details, see the dedicated guides:
+[Responses API](/advanced/responses-api), [Conversations API](/advanced/conversations-api),
+[Anthropic Messages API](/advanced/anthropic-messages-api), and
+[Audio API](/advanced/audio-api).
+
+## OpenAI-Compatible API
+
+| Endpoint | Method | Description |
+| --------------------------------- | ------ | ------------------------------------------------------------------------------------------------------------ |
+| `/v1/chat/completions` | POST | Chat completions (streaming supported) |
+| `/v1/responses` | POST | Create an OpenAI Responses API response |
+| `/v1/responses/{id}` | GET | Retrieve a stored response |
+| `/v1/responses/{id}` | DELETE | Delete a stored response (forwards native deletion where supported) |
+| `/v1/responses/{id}/cancel` | POST | Cancel an in-progress response (provider-native where supported) |
+| `/v1/responses/{id}/input_items` | GET | List the input items of a stored response |
+| `/v1/responses/input_tokens` | POST | Count input tokens for a Responses request |
+| `/v1/responses/compact` | POST | Compact a Responses conversation (provider-native where supported) |
+| `/v1/conversations` | POST | Create a conversation (gateway-managed) |
+| `/v1/conversations/{id}` | GET | Retrieve a conversation |
+| `/v1/conversations/{id}` | POST | Replace conversation metadata in full |
+| `/v1/conversations/{id}` | DELETE | Delete a conversation |
+| `/v1/embeddings` | POST | Text embeddings |
+| `/v1/models` | GET | List available models |
+| `/v1/audio/speech` | POST | Text-to-speech, returning binary audio |
+| `/v1/audio/transcriptions` | POST | Speech-to-text from a multipart upload |
+| `/v1/realtime` | GET | Realtime speech-to-speech websocket upgrade (when `REALTIME_ENABLED`) |
+| `/v1/files` | POST | Upload a file (OpenAI-compatible multipart) |
+| `/v1/files` | GET | List files |
+| `/v1/files/{id}` | GET | Retrieve file metadata |
+| `/v1/files/{id}` | DELETE | Delete a file |
+| `/v1/files/{id}/content` | GET | Retrieve raw file content |
+| `/v1/batches` | POST | Create a native provider batch (OpenAI-compatible schema; inline `requests` supported where provider-native) |
+| `/v1/batches` | GET | List stored batches |
+| `/v1/batches/{id}` | GET | Retrieve one stored batch |
+| `/v1/batches/{id}/cancel` | POST | Cancel a pending batch |
+| `/v1/batches/{id}/results` | GET | Retrieve native batch results when available |
+
+## Anthropic-Compatible API
+
+| Endpoint | Method | Description |
+| --------------------------- | ------ | ----------------------------------------------------------------------------- |
+| `/v1/messages` | POST | Anthropic Messages API through translated model routing (streaming supported) |
+| `/v1/messages/count_tokens` | POST | Heuristic Anthropic Messages input token estimate |
+
+## Provider Passthrough
+
+| Endpoint | Method | Description |
+| ------------------- | -------------------------------------------- | ---------------------------------------------------------- |
+| `/p/{provider}/...` | GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS | Provider-native passthrough with opaque upstream responses |
+
+## Admin Endpoints
+
+Admin REST and dashboard routes (`/admin/*`) are covered in
+[Admin Endpoints](/advanced/admin-endpoints).
+
+## Operations Endpoints
+
+| Endpoint | Method | Description |
+| --------------------- | ------ | ---------------------------------------------------------------------------------- |
+| `/health` | GET | Liveness check (always 200 while the process serves) |
+| `/health/ready` | GET | Readiness check: pings storage (503 if down) and Redis cache (degraded, still 200) |
+| `/metrics` | GET | Prometheus metrics (experimental, when enabled) |
+| `/swagger/index.html` | GET | Swagger UI (when enabled) |
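The two health routes in the table above behave differently, which matters when wiring up probes. A minimal sketch of a readiness check follows, assuming the gateway listens on `http://localhost:8080` as in the quick-start command. `readiness_verdict` and `probe_ready` are hypothetical helper names used for illustration, not part of GoModel.

```bash
#!/bin/sh
# Hypothetical helper: map the HTTP status returned by /health/ready to a verdict.
# Per the table above, 503 means storage is down; 200 means the gateway is
# ready, although the Redis cache may still be degraded.
readiness_verdict() {
  case $1 in
    200) echo "ready" ;;
    503) echo "not-ready" ;;
    *)   echo "unknown" ;;
  esac
}

# Hypothetical helper: probe a running gateway.
# The default base URL is an assumption taken from the quick-start port mapping.
probe_ready() {
  base_url=${1:-http://localhost:8080}
  # -s: silent, -o /dev/null: discard the body, -w: print only the status code.
  status=$(curl -s -o /dev/null -w '%{http_code}' "$base_url/health/ready") || status=000
  readiness_verdict "$status"
}
```

Calling `probe_ready` prints `ready`, `not-ready`, or `unknown`. Pass a different base URL as the first argument if the gateway runs on another host or port.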
diff --git a/docs/docs.json b/docs/docs.json
index 275652c0..4ac18d86 100644
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -60,6 +60,7 @@
"advanced/configuration",
"advanced/config-yaml",
"advanced/cli",
+ "advanced/api-endpoints",
"advanced/resilience",
"advanced/responses-api",
"advanced/responses-compatibility",
diff --git a/docs/providers/overview.mdx b/docs/providers/overview.mdx
index 9058a452..281e6e27 100644
--- a/docs/providers/overview.mdx
+++ b/docs/providers/overview.mdx
@@ -13,30 +13,69 @@ quirks.
## Supported providers
-| Provider | Credential | Guide |
-| -------- | ---------- | ----- |
-| OpenAI | `OPENAI_API_KEY` | — |
-| Anthropic | `ANTHROPIC_API_KEY` | [Anthropic](/providers/anthropic) |
-| Google Gemini | `GEMINI_API_KEY` | [Google Gemini](/providers/gemini) |
-| Google Vertex AI | `VERTEX_PROJECT` + `VERTEX_LOCATION` + GCP credentials | [Google Vertex AI](/providers/vertex) |
-| DeepSeek | `DEEPSEEK_API_KEY` | [DeepSeek](/providers/deepseek) |
-| Groq | `GROQ_API_KEY` | — |
-| OpenRouter | `OPENROUTER_API_KEY` | — |
-| Z.ai | `ZAI_API_KEY` (`ZAI_BASE_URL` optional) | — |
-| xAI (Grok) | `XAI_API_KEY` | — |
-| MiniMax | `MINIMAX_API_KEY` (`MINIMAX_BASE_URL` optional) | — |
-| Alibaba Cloud Model Studio (Bailian) | `BAILIAN_API_KEY` (`BAILIAN_BASE_URL` optional) | [Alibaba Cloud Model Studio](/providers/bailian) |
-| Xiaomi MiMo | `XIAOMI_API_KEY` (`XIAOMI_BASE_URL` optional) | [Xiaomi MiMo](/providers/xiaomi) |
-| OpenCode Go | `OPENCODE_GO_API_KEY` (`OPENCODE_GO_BASE_URL` optional) | [OpenCode Go](/providers/opencode-go) |
-| Azure OpenAI | `AZURE_API_KEY` + `AZURE_BASE_URL` (`AZURE_API_VERSION` optional) | [Azure OpenAI](/providers/azure) |
-| Amazon Bedrock | `BEDROCK_BASE_URL` (region or endpoint) + AWS credentials | [Amazon Bedrock](/providers/bedrock) |
-| Oracle GenAI | `ORACLE_API_KEY` + `ORACLE_BASE_URL` | [Oracle GenAI](/providers/oracle) |
-| Ollama | `OLLAMA_BASE_URL` | [Ollama](/providers/multiple-ollama) |
-| vLLM | `VLLM_BASE_URL` (`VLLM_API_KEY` optional) | [vLLM](/providers/vllm) |
+Example model identifiers are illustrative and subject to change; consult
+provider catalogs for current models. Feature columns reflect gateway API
+support, not every individual model capability exposed by an upstream provider.
-See the [README provider table](https://github.com/ENTERPILOT/GoModel#supported-llm-providers)
-for per-provider feature support (chat, Responses, embeddings, files, batches,
-passthrough).
+| Provider | Credential | Example Model | Chat | `/responses` | Embed | Files | Batches | Passthru | Guide |
+| -------- | ---------- | ------------- | :--: | :----------: | :---: | :---: | :-----: | :------: | ----- |
+| OpenAI | `OPENAI_API_KEY` | `gpt-5.5` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — |
+| Anthropic | `ANTHROPIC_API_KEY` | `claude-sonnet-4-20250514` | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ | [Anthropic](/providers/anthropic) |
+| Google Gemini | `GEMINI_API_KEY` | `gemini-2.5-flash` | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | [Google Gemini](/providers/gemini) |
+| Google Vertex AI | `VERTEX_PROJECT` + `VERTEX_LOCATION` + GCP credentials | `google/gemini-2.5-flash` | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | [Google Vertex AI](/providers/vertex) |
+| DeepSeek | `DEEPSEEK_API_KEY` | `deepseek-v4-pro` | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | [DeepSeek](/providers/deepseek) |
+| Groq | `GROQ_API_KEY` | `llama-3.3-70b-versatile` | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | — |
+| OpenRouter | `OPENROUTER_API_KEY` | `google/gemini-2.5-flash` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | — |
+| Z.ai | `ZAI_API_KEY` (`ZAI_BASE_URL` optional) | `glm-5.1` | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | — |
+| xAI (Grok) | `XAI_API_KEY` | `grok-4` | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | — |
+| Alibaba Cloud Model Studio (Bailian) | `BAILIAN_API_KEY` (`BAILIAN_BASE_URL` optional) | `qwen3-max` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | [Alibaba Cloud Model Studio](/providers/bailian) |
+| MiniMax | `MINIMAX_API_KEY` (`MINIMAX_BASE_URL` optional) | `MiniMax-M3` | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | — |
+| Xiaomi MiMo | `XIAOMI_API_KEY` (`XIAOMI_BASE_URL` optional) | `mimo-v2.5-pro` | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | [Xiaomi MiMo](/providers/xiaomi) |
+| OpenCode Go | `OPENCODE_GO_API_KEY` (`OPENCODE_GO_BASE_URL` optional) | `glm-5.1` | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | [OpenCode Go](/providers/opencode-go) |
+| Azure OpenAI | `AZURE_API_KEY` + `AZURE_BASE_URL` (`AZURE_API_VERSION` optional) | `gpt-5` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | [Azure OpenAI](/providers/azure) |
+| Oracle GenAI | `ORACLE_API_KEY` + `ORACLE_BASE_URL` | `openai.gpt-oss-120b` | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | [Oracle GenAI](/providers/oracle) |
+| Ollama | `OLLAMA_BASE_URL` | `llama3.2` | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | [Ollama](/providers/multiple-ollama) |
+| vLLM | `VLLM_BASE_URL` (`VLLM_API_KEY` optional) | `meta-llama/Llama-3.1-8B-Instruct` | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ | [vLLM](/providers/vllm) |
+| Amazon Bedrock | `BEDROCK_BASE_URL` (region or endpoint) + AWS credentials | `anthropic.claude-3-5-haiku-20241022-v1:0` | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ | [Amazon Bedrock](/providers/bedrock) |
+
+✅ Supported ❌ Unsupported
+
+## Provider notes
+
+- **Z.ai GLM Coding Plan** — set
+ `ZAI_BASE_URL=https://api.z.ai/api/coding/paas/v4`.
+- **Xiaomi MiMo** — TTS (`mimo-v2.5-tts*`) and ASR (`mimo-v2.5-asr`) are served
+ through `/v1/audio/speech` and `/v1/audio/transcriptions` (translated to
+ MiMo's chat-completions audio dialect) as well as directly via chat
+ completions; for 1M context append `[1m]` to the model ID and list it in
+ `XIAOMI_MODELS`.
+- **OpenCode Go (OpenCode Zen)** — routes per model: most models use
+ OpenAI-style `/chat/completions`, while `/messages`-only models (default
+ `qwen3.7-max`, override with `OPENCODE_GO_MESSAGES_MODELS`) are sent to the
+ Anthropic-native endpoint. Set `OPENCODE_GO_API_KEY`; the base URL defaults to
+ `https://opencode.ai/zen/go/v1`.
+- **Configured model lists** — available for every provider with
+ a `<PROVIDER>_MODELS` variable, for example
+ `OPENROUTER_MODELS=openai/gpt-oss-120b,anthropic/claude-sonnet-4` or
+ `ORACLE_MODELS=openai.gpt-oss-120b,xai.grok-3`. DeepSeek defaults to
+ `https://api.deepseek.com`; set `DEEPSEEK_BASE_URL` only when using a
+ compatible proxy or alternate DeepSeek endpoint. By default,
+ `CONFIGURED_PROVIDER_MODELS_MODE=fallback` uses those lists only when upstream
+ `/models` is unavailable or empty. Set
+ `CONFIGURED_PROVIDER_MODELS_MODE=allowlist` to expose only configured models
+ for providers that define a list, skipping their upstream `/models` calls.
+- **vLLM** — set `VLLM_API_KEY` only if the upstream server was started with
+ `--api-key`.
+- **Multiple instances of one provider type** — without `config.yaml`, use
+ suffixed env vars such as `OPENAI_EAST_API_KEY` and `OPENAI_EAST_BASE_URL`;
+ add `OPENAI_EAST_MODELS` to configure that instance's model list. This
+ registers provider `openai-east` with type `openai`. Vertex AI follows the
+ same suffix pattern — `VERTEX_US_PROJECT` registers provider `vertex-us`.
+ Vertex project and location env vars must match the instance prefix: for a
+ suffixed instance such as `VERTEX_US_PROJECT`, also set `VERTEX_US_LOCATION`
+ and any other suffixed settings for that instance, rather than the generic
+ `VERTEX_PROJECT` / `VERTEX_LOCATION`. `VERTEX_AUTH_TYPE` defaults to
+ Application Default Credentials (`gcp_adc`).
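The suffix convention described in the last note can be sketched as a small shell function. This only illustrates the naming rule stated above (`OPENAI_EAST_API_KEY` registers `openai-east`, `VERTEX_US_PROJECT` registers `vertex-us`). The helper `provider_instance_name` is hypothetical, and the list of recognized setting suffixes is an assumption, not GoModel's actual parsing logic.

```bash
#!/bin/sh
# Hypothetical helper: derive the provider instance name implied by a suffixed
# env var. $1 is the provider type prefix (e.g. OPENAI), $2 is the variable name.
# The recognized setting suffixes below are an assumption for this sketch.
provider_instance_name() {
  type_prefix=$1
  var=$2
  for setting in API_KEY BASE_URL MODELS PROJECT LOCATION; do
    case $var in
      "${type_prefix}_"?*"_${setting}")
        # Strip the type prefix and the setting suffix to isolate the instance part.
        middle=${var#"${type_prefix}_"}
        middle=${middle%"_${setting}"}
        # Lower-case the result and turn underscores into hyphens.
        printf '%s-%s\n' "$type_prefix" "$middle" | tr 'A-Z_' 'a-z-'
        return 0
        ;;
    esac
  done
  # No instance suffix present (e.g. the generic OPENAI_API_KEY).
  return 1
}
```

For example, `provider_instance_name OPENAI OPENAI_EAST_API_KEY` prints `openai-east`, while the generic `OPENAI_API_KEY` yields no instance name and a non-zero exit status.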
## Why some providers have dedicated pages
diff --git a/docs/providers/xiaomi.mdx b/docs/providers/xiaomi.mdx
index 404a869e..d8fcb551 100644
--- a/docs/providers/xiaomi.mdx
+++ b/docs/providers/xiaomi.mdx
@@ -1,7 +1,7 @@
---
title: "Xiaomi MiMo"
description: "Configure Xiaomi MiMo in GoModel: thinking mode, the [1m] context suffix, and how TTS/ASR map onto the standard audio endpoints."
-icon: "microphone"
+icon: "mic"
---
Xiaomi MiMo speaks an OpenAI-compatible chat API with a few dialect quirks: