From a84b6745a074abac1124182a84c57c9bc80cfe7c Mon Sep 17 00:00:00 2001
From: "Jakub A. W"
- A fast and lightweight AI gateway written in Go, providing a unified OpenAI-compatible API for OpenAI, Anthropic, Gemini, DeepSeek, xAI, Groq, OpenRouter, Z.ai, Azure OpenAI, Oracle, Ollama, and more.
+ A fast and lightweight AI gateway written in Go, providing unified OpenAI- and Anthropic-compatible APIs for OpenAI, Anthropic, Gemini, DeepSeek, xAI, Groq, OpenRouter, Z.ai, Azure OpenAI, Oracle, Ollama, and more.
@@ -44,33 +44,9 @@ docker run --rm -p 8080:8080 \
   enterpilot/gomodel
 ```
 
-Pass only the provider credentials or base URL you need (at least one required):
+Full list of environment variables (including all available providers): [.env.template](./.env.template)
 
-```bash
-docker run --rm -p 8080:8080 \
-  -e OPENAI_API_KEY="your-openai-key" \
-  -e ANTHROPIC_API_KEY="your-anthropic-key" \
-  -e GEMINI_API_KEY="your-gemini-key" \
-  -e VERTEX_PROJECT="your-gcp-project" \
-  -e VERTEX_LOCATION="us-central1" \
-  -e VERTEX_AUTH_TYPE="gcp_adc" \
-  -e DEEPSEEK_API_KEY="your-deepseek-key" \
-  -e GROQ_API_KEY="your-groq-key" \
-  -e OPENROUTER_API_KEY="your-openrouter-key" \
-  -e ZAI_API_KEY="your-zai-key" \
-  -e XAI_API_KEY="your-xai-key" \
-  -e AZURE_API_KEY="your-azure-key" \
-  -e AZURE_BASE_URL="https://your-resource.openai.azure.com/openai/deployments/your-deployment" \
-  -e AZURE_API_VERSION="2024-10-21" \
-  -e ORACLE_API_KEY="your-oracle-key" \
-  -e ORACLE_BASE_URL="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/v1" \
-  -e ORACLE_MODELS="openai.gpt-oss-120b,xai.grok-3" \
-  -e OLLAMA_BASE_URL="http://host.docker.internal:11434/v1" \
-  -e VLLM_BASE_URL="http://host.docker.internal:8000/v1" \
-  enterpilot/gomodel
-```
-
-⚠️ Avoid passing secrets via `-e` on the command line - they can leak via shell history and process lists. For production, use `docker run --env-file .env` to load API keys from a file instead.
+⚠️ Avoid passing secrets with `-e` on the command line in production: they can leak through shell history and process lists. Use `docker run --env-file .env` to load API keys from a file instead.
 
 **Step 2:** Make your first API call
 
@@ -87,63 +63,14 @@ curl http://localhost:8080/v1/chat/completions \
 
 ### Supported LLM Providers
 
-Example model identifiers are illustrative and subject to change; consult provider catalogs for current models.
-Feature columns reflect gateway API support, not every individual model capability exposed by an upstream provider.
-
-| Provider | Credential | Example Model | Chat | `/responses` | Embed | Files | Batches | Passthru |
-| ------------------------------------ | ----------------------------------------------------------------- | ------------------------------------------ | :--: | :----------: | :---: | :---: | :-----: | :------: |
-| OpenAI | `OPENAI_API_KEY` | `gpt-5.5` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| Anthropic | `ANTHROPIC_API_KEY` | `claude-sonnet-4-20250514` | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
-| Google Gemini | `GEMINI_API_KEY` | `gemini-2.5-flash` | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
-| Vertex AI | `VERTEX_PROJECT` + `VERTEX_LOCATION` | `google/gemini-2.5-flash` | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
-| DeepSeek | `DEEPSEEK_API_KEY` | `deepseek-v4-pro` | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
-| Groq | `GROQ_API_KEY` | `llama-3.3-70b-versatile` | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
-| OpenRouter | `OPENROUTER_API_KEY` | `google/gemini-2.5-flash` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| Z.ai | `ZAI_API_KEY` (`ZAI_BASE_URL` optional) | `glm-5.1` | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
-| xAI (Grok) | `XAI_API_KEY` | `grok-4` | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
-| Alibaba Cloud Model Studio (Bailian) | `BAILIAN_API_KEY` (`BAILIAN_BASE_URL` optional) | `qwen3-max` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| MiniMax | `MINIMAX_API_KEY` (`MINIMAX_BASE_URL` optional) | `MiniMax-M3` | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
-| Xiaomi MiMo | `XIAOMI_API_KEY` (`XIAOMI_BASE_URL` optional) | `mimo-v2.5-pro` | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
-| OpenCode Go | `OPENCODE_GO_API_KEY` (`OPENCODE_GO_BASE_URL` optional) | `glm-5.1` | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
-| Azure OpenAI | `AZURE_API_KEY` + `AZURE_BASE_URL` (`AZURE_API_VERSION` optional) | `gpt-5` | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| Oracle | `ORACLE_API_KEY` + `ORACLE_BASE_URL` | `openai.gpt-oss-120b` | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
-| Ollama | `OLLAMA_BASE_URL` | `llama3.2` | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
-| vLLM | `VLLM_BASE_URL` (`VLLM_API_KEY` optional) | `meta-llama/Llama-3.1-8B-Instruct` | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
-| Amazon Bedrock | `BEDROCK_BASE_URL` (region or endpoint) + AWS credentials | `anthropic.claude-3-5-haiku-20241022-v1:0` | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
-
-✅ Supported ❌ Unsupported
-
-For Z.ai's GLM Coding Plan, set `ZAI_BASE_URL=https://api.z.ai/api/coding/paas/v4`.
-Xiaomi MiMo TTS (`mimo-v2.5-tts*`) and ASR (`mimo-v2.5-asr`) are served through
-`/v1/audio/speech` and `/v1/audio/transcriptions` (translated to MiMo's
-chat-completions audio dialect) as well as directly via chat completions; for
-1M context append `[1m]` to the model ID and list it in `XIAOMI_MODELS`.
-OpenCode Go (OpenCode Zen) routes per model — most models use OpenAI-style
-`/chat/completions`, while `/messages`-only models (default `qwen3.7-max`,
-override with `OPENCODE_GO_MESSAGES_MODELS`) are sent to the Anthropic-native
-endpoint. Set `OPENCODE_GO_API_KEY`; the base URL defaults to
-`https://opencode.ai/zen/go/v1`.
-Configured model lists are available for every provider with `
- A fast and lightweight AI gateway written in Go, providing unified OpenAI- and Anthropic-compatible APIs for OpenAI, Anthropic, Gemini, DeepSeek, xAI, Groq, OpenRouter, Z.ai, Azure OpenAI, Oracle, Ollama, and more.
+ A fast and lightweight AI gateway written in Go, providing unified OpenAI-compatible and Anthropic-compatible APIs for OpenAI, Anthropic, Gemini, DeepSeek, xAI, Groq, OpenRouter, Z.ai, Azure OpenAI, Oracle, Ollama, and more.
@@ -129,7 +129,7 @@ docker run --rm -p 8080:8080 --env-file .env gomodel
 
 ## API Endpoints
 
-GoModel exposes OpenAI- and Anthropic-compatible APIs, provider-native
+GoModel exposes OpenAI-compatible and Anthropic-compatible APIs, provider-native
 passthrough, and operations routes. See the
 [API Endpoints reference](./docs/advanced/api-endpoints.mdx) for the full
 endpoint tables, and [Admin Endpoints](./docs/advanced/admin-endpoints.mdx) for
@@ -145,14 +145,6 @@ GoModel is configured through environment variables and an optional `config.yaml
 
 ---
 
-## Response Caching
-
-GoModel has a two-layer response cache that reduces LLM API cost and latency for repeated or semantically similar requests: an exact-match layer (`X-Cache: HIT (exact)`) for byte-identical requests, and a semantic layer (`X-Cache: HIT (semantic)`) that matches paraphrased prompts via embeddings and a KNN vector search. Expected hit rates reach ~60–70% in high-repetition workloads vs. ~18% for exact-match alone.
-
-See [Response Caching](./docs/features/cache.mdx) for setup, supported vector backends, and `user_path` behavior.
-
----
-
 See [DEVELOPMENT.md](docs/DEVELOPMENT.md) for testing, linting, and pre-commit setup.
 
 ---
diff --git a/docs/advanced/api-endpoints.mdx b/docs/advanced/api-endpoints.mdx
index 50ffb870..12277664 100644
--- a/docs/advanced/api-endpoints.mdx
+++ b/docs/advanced/api-endpoints.mdx
@@ -1,10 +1,10 @@
 ---
 title: "API Endpoints"
-description: "Reference for GoModel's OpenAI- and Anthropic-compatible endpoints, provider passthrough, and operations routes."
+description: "Reference for GoModel's OpenAI-compatible and Anthropic-compatible endpoints, provider passthrough, and operations routes."
 icon: "list-tree"
 ---
 
-GoModel exposes OpenAI- and Anthropic-compatible APIs, provider-native
+GoModel exposes OpenAI-compatible and Anthropic-compatible APIs, provider-native
 passthrough, and a set of operations endpoints.
 Admin and dashboard routes are documented separately in
 [Admin Endpoints](/advanced/admin-endpoints).
From 7f465de9b27af6f010e87fca3b47fad4a1f8288c Mon Sep 17 00:00:00 2001
From: "Jakub A. W"