Client Setup
Nenya works as a transparent OpenAI-compatible proxy. Point your AI coding client's local endpoint at Nenya to get secret redaction, multi-provider routing, and fallback chains.
Set the local endpoint environment variable:

```bash
export LOCAL_ENDPOINT=http://localhost:8080/v1
export OPENAI_API_KEY=your-nenya-client-token
```

Or in your OpenCode config:

```json
{
  "provider": "openai",
  "model": "gpt-4o",
  "base_url": "http://localhost:8080/v1"
}
```

Use an agent name as the model for agent-based routing:
```json
{
  "provider": {
    "api_key": "your-nenya-client-token",
    "model": "build",
    "base_url": "http://localhost:8080/v1"
  }
}
```

On the Nenya side, define the providers and an agent whose model list forms the fallback chain:

```json
{
  "providers": {
    "openai": {
      "url": "https://api.openai.com/v1/chat/completions",
      "auth_style": "bearer"
    },
    "deepseek": {
      "url": "https://api.deepseek.com/v1/chat/completions",
      "auth_style": "bearer"
    },
    "ollama": {
      "url": "http://localhost:11434/v1/chat/completions",
      "auth_style": "none"
    }
  },
  "agents": {
    "coder": {
      "models": [
        { "provider": "openai", "model": "gpt-4o" },
        { "provider": "deepseek", "model": "deepseek-coder" },
        { "provider": "ollama", "model": "qwen2.5-coder:7b" }
      ],
      "max_retries": 2,
      "cooldown_seconds": 60
    }
  }
}
```

Use `coder` as the model name in OpenCode to get automatic fallback across providers.
```bash
curl -s http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer your-client-token" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hello"}],"stream":true}'
```

| Endpoint | Status | Notes |
|---|---|---|
| `POST /v1/chat/completions` | Full support | Content pipeline, streaming, tool calls |
| `GET /v1/models` | Supported | OpenCode doesn't call this (it uses hardcoded models), but it is available |
| `POST /v1/responses` | Passthrough | Available if needed |
| `POST /v1/embeddings` | Passthrough | Available if needed |
OpenCode uses the official OpenAI Go SDK, which sets its own User-Agent. In Copilot mode, OpenCode sends identifying headers:

| Header | Value |
|---|---|
| `User-Agent` | `OpenCode/1.0` (Copilot mode) or the OpenAI SDK UA (standard mode) |
| `Editor-Version` | `OpenCode/1.0` (Copilot mode) |
| `Editor-Plugin-Version` | `OpenCode/1.0` (Copilot mode) |
Nenya detects OpenCode via any of these headers containing "opencode". When detected, IDE-aware pipeline behavior activates automatically.
Note: When OpenCode uses the standard OpenAI SDK (non-Copilot mode), the User-Agent is set by the SDK and does not contain "opencode". In this case, Nenya treats it as a standard client with full pipeline processing. To force IDE detection, set Editor-Version: OpenCode/1.0 in your provider config or middleware.
When OpenCode is detected as an IDE client:
| Stage | Behavior |
|---|---|
| Secret redaction | Regex redaction skips code inside markdown fences. Prose outside code blocks is still redacted. |
| Text compaction | Skipped. Preserves whitespace and line-number references in code payloads. |
| Truncation | Code-boundary aware: cuts at blank-line boundaries. When `tfidf_query_source` is set, uses TF-IDF relevance scoring instead. |
| Engine summarization | Uses a code-preserving prompt: only redacts secrets in prose, never restructures code. |
| Tool calls | `tool_calls`, `tool_call_id`, and `function_call` pass through unmodified. |
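To make "TF-IDF relevance scoring" concrete, the Go sketch below splits a payload into blank-line-separated blocks, scores each block against a query, and keeps the top `k` in document order. The block splitting, the smoothed IDF formula, the whitespace tokenizer, and the hypothetical `keepRelevant` helper are all assumptions; Nenya's scorer, and how it builds the query from `tfidf_query_source`, may differ.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
)

// tokenize lowercases text and splits on whitespace (a deliberately crude tokenizer).
func tokenize(s string) []string { return strings.Fields(strings.ToLower(s)) }

// keepRelevant scores blank-line-separated blocks against a query with TF-IDF and
// returns the k highest-scoring blocks in their original order.
func keepRelevant(payload, query string, k int) []string {
	blocks := strings.Split(payload, "\n\n")
	df := map[string]int{}                    // number of blocks containing each term
	tf := make([]map[string]int, len(blocks)) // term counts per block
	for i, b := range blocks {
		tf[i] = map[string]int{}
		for _, w := range tokenize(b) {
			tf[i][w]++
		}
		for w := range tf[i] {
			df[w]++
		}
	}
	n := float64(len(blocks))
	terms := tokenize(query)
	scores := make([]float64, len(blocks))
	for i := range blocks {
		for _, q := range terms {
			idf := math.Log((1+n)/(1+float64(df[q]))) + 1 // smoothed IDF, always > 0
			scores[i] += float64(tf[i][q]) * idf
		}
	}
	idx := make([]int, len(blocks))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool { return scores[idx[a]] > scores[idx[b]] })
	if k < len(idx) {
		idx = idx[:k]
	}
	sort.Ints(idx) // restore document order
	kept := make([]string, len(idx))
	for i, j := range idx {
		kept[i] = blocks[j]
	}
	return kept
}

func main() {
	payload := "alpha beta\n\ngamma delta\n\nbeta beta epsilon"
	fmt.Printf("%q\n", keepRelevant(payload, "beta", 2)) // ["alpha beta" "beta beta epsilon"]
}
```

Restoring document order after selection keeps the surviving code blocks readable in their original sequence.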
Nenya supports both stream: false (default, per OpenAI spec) and stream: true for streaming responses. When stream: false, the gateway buffers the upstream response into a complete JSON object before returning it to the client. Response routing is determined by the upstream Content-Type header — text/event-stream responses go through the streaming pipeline, while standard JSON responses go through the non-streaming path.
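A minimal Go sketch of routing on the upstream `Content-Type`, using the standard library's `mime.ParseMediaType` so parameters like `; charset=utf-8` do not break the match. Treating a missing or malformed header as non-streaming is an assumption, and `isEventStream` is a hypothetical name.

```go
package main

import (
	"fmt"
	"mime"
)

// isEventStream reports whether an upstream Content-Type selects the streaming
// (SSE) path. Parameters such as "; charset=utf-8" are ignored, and a missing or
// malformed header falls back to the buffered JSON path (an assumption).
func isEventStream(contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	return mediaType == "text/event-stream" // ParseMediaType lowercases the type
}

func main() {
	for _, ct := range []string{"text/event-stream; charset=utf-8", "application/json"} {
		fmt.Println(ct, "->", isEventStream(ct))
	}
}
```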
OpenCode sends tools in standard OpenAI format. Multi-turn tool conversations work through Nenya:
```json
{
  "tools": [{ "type": "function", "function": { "name": "bash", "parameters": { ... } } }],
  "messages": [
    { "role": "user", "content": "list files" },
    { "role": "assistant", "tool_calls": [{ "id": "call_1", "type": "function", "function": { "name": "bash", "arguments": "{\"cmd\":\"ls\"}" } }] },
    { "role": "tool", "tool_call_id": "call_1", "content": "file1.go\nfile2.go" },
    { "role": "assistant", "content": "Here are your files..." }
  ]
}
```

OpenCode sends user messages as content arrays (not plain strings):
```json
{
  "content": [{ "type": "text", "text": "explain this code" }]
}
```

This is compatible with Nenya's content array handling. For providers without content array support, the text is extracted and flattened.
OpenCode sends reasoning_effort and max_completion_tokens for models with CanReason: true (o1, o3, DeepSeek v4). These pass through Nenya untouched:
```json
{
  "model": "o3-mini",
  "reasoning_effort": "medium",
  "max_completion_tokens": 16384,
  "messages": [...]
}
```

OpenCode requests `stream_options: { include_usage: true }` to get token usage in the final SSE chunk. The adapter strips this for providers that don't support it (e.g., OpenAI, Groq, Nvidia) and preserves it for providers that do (e.g., DeepSeek, OpenRouter).
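A minimal Go sketch of that per-provider adaptation on a decoded request body. The capability table `keepsStreamOptions` is hypothetical, built only from the "preserved" examples on this page, and providers it does not list are treated as unsupported; Nenya's real adapter may decide this differently.

```go
package main

import "fmt"

// keepsStreamOptions is a hypothetical capability table built from the examples on
// this page (DeepSeek, OpenRouter). Providers not listed are treated as unsupported.
var keepsStreamOptions = map[string]bool{"deepseek": true, "openrouter": true}

// adaptStreamOptions drops "stream_options" from a decoded request body unless
// the target provider is known to accept it.
func adaptStreamOptions(body map[string]any, provider string) {
	if !keepsStreamOptions[provider] {
		delete(body, "stream_options")
	}
}

func main() {
	body := map[string]any{
		"model":          "gpt-4o",
		"stream":         true,
		"stream_options": map[string]any{"include_usage": true},
	}
	adaptStreamOptions(body, "groq")
	_, kept := body["stream_options"]
	fmt.Println("stream_options kept:", kept) // stream_options kept: false
}
```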
Define a resilient agent that tries multiple providers:
```json
{
  "agents": {
    "smart-coder": {
      "models": [
        { "provider": "openai", "model": "gpt-4o", "max_context": 128000 },
        { "provider": "deepseek", "model": "deepseek-coder", "max_context": 64000 },
        { "provider": "ollama", "model": "qwen2.5-coder:7b", "max_context": 32000 }
      ],
      "max_retries": 2,
      "cooldown_seconds": 120,
      "strategy": "round-robin"
    }
  }
}
```

Use `"model": "smart-coder"` in OpenCode. If OpenAI rate-limits (429), Nenya automatically falls back to DeepSeek, then to local Ollama. Circuit breakers prevent hammering a tripped provider.
Configure Cursor to point at Nenya:
- Open Cursor Settings → Models → OpenAI-compatible
- Set Base URL: `http://localhost:8080/v1`
- Set API key: your Nenya `client_token`
- Set Model: an agent name (e.g., `build`) or a specific model ID (e.g., `gemini-3-flash`)
```bash
curl -s http://localhost:8080/v1/models \
  -H "Authorization: Bearer your-client-token" | jq .
```

Cursor calls `/v1/models` to discover available models. The response includes all agent names and the models discovered from configured providers.
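A Go sketch of assembling such a catalog in the OpenAI "list models" shape (`object: "list"` with `data` entries of `object: "model"`). Using `nenya` as the owner for agents, listing agents first, and sorting providers are assumptions; `buildCatalog` is hypothetical and does not deduplicate IDs.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// modelEntry follows the OpenAI "list models" object shape.
type modelEntry struct {
	ID      string `json:"id"`
	Object  string `json:"object"`
	OwnedBy string `json:"owned_by"`
}

type modelList struct {
	Object string       `json:"object"`
	Data   []modelEntry `json:"data"`
}

// buildCatalog merges agent names with discovered provider models.
// The "nenya" owner for agents and the ordering are assumptions.
func buildCatalog(agents []string, discovered map[string][]string) modelList {
	list := modelList{Object: "list"}
	for _, a := range agents {
		list.Data = append(list.Data, modelEntry{ID: a, Object: "model", OwnedBy: "nenya"})
	}
	providers := make([]string, 0, len(discovered))
	for p := range discovered {
		providers = append(providers, p)
	}
	sort.Strings(providers) // map iteration order is random; sort for stable output
	for _, p := range providers {
		for _, m := range discovered[p] {
			list.Data = append(list.Data, modelEntry{ID: m, Object: "model", OwnedBy: p})
		}
	}
	return list
}

func main() {
	cat := buildCatalog([]string{"build", "coder"}, map[string][]string{"ollama": {"qwen2.5-coder:7b"}})
	out, err := json.MarshalIndent(cat, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```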
| Endpoint | Status | Notes |
|---|---|---|
| `POST /v1/chat/completions` | Full support | Content pipeline, streaming, tool calls |
| `GET /v1/models` | Full support | Model catalog from agents + providers |
| `POST /v1/responses` | Passthrough | Transparent proxy, no content pipeline |
| `POST /v1/embeddings` | Passthrough | If needed by extensions |
Cursor is detected as an IDE client via User-Agent. The following pipeline adaptations apply automatically:
| Stage | Behavior |
|---|---|
| Secret redaction | Regex redaction skips code inside markdown fences (```). Prose and documentation outside code blocks are still redacted. |
| Text compaction | Skipped. Cursor carefully formats payloads with line-number references; collapsing whitespace would break them. |
| Truncation | Code-boundary aware: cuts at blank-line boundaries between functions/blocks. When `tfidf_query_source` is set, uses TF-IDF relevance scoring to keep the most relevant blocks instead. |
| Engine summarization | Uses a code-preserving prompt: only redacts secrets in prose, never restructures or summarizes code. |
| Tool calls | `tool_calls`, `tool_call_id`, and `function_call` pass through unmodified. |
Cursor sends reasoning_effort and max_completion_tokens for reasoning models (o1, o3, o4-mini). These pass through Nenya untouched to the upstream provider.
Cursor's agent mode sends tool definitions in standard OpenAI format:
```json
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "read_file",
        "parameters": { "type": "object", "properties": { "path": { "type": "string" } } }
      }
    }
  ],
  "messages": [
    { "role": "assistant", "tool_calls": [{ "id": "call_123", "type": "function", "function": { "name": "read_file", "arguments": "{...}" } }] },
    { "role": "tool", "tool_call_id": "call_123", "content": "file contents..." }
  ]
}
```

All tool call fields pass through Nenya's sanitization and adapter layers unmodified. Tool call arguments in SSE streaming responses are not filtered by the stream security filter.
You can set the model override in `.cursorrules`:

```
You are an AI coding assistant. Your model is served through Nenya.
```
Cursor can send mixed content arrays (text + images):
```json
{
  "content": [
    { "type": "text", "text": "What's in this image?" },
    { "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } }
  ]
}
```

Nenya handles `image_url` content types: they are counted as `[image]` for token estimation and preserved in the payload. For providers without content array support, the text is extracted and non-text types are dropped (with a warning logged).
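A Go sketch of building the text used for token estimation, where each `image_url` part contributes the placeholder `[image]`. The `part` type is a simplified stand-in, and the roughly four-characters-per-token heuristic in `roughTokens` is an assumption, not necessarily the estimator Nenya uses.

```go
package main

import (
	"fmt"
	"strings"
)

// part is a simplified content-array element (hypothetical type for this sketch).
type part struct {
	Type string // "text" or "image_url"
	Text string
}

// estimationText builds the string used for token estimation: text parts are kept
// verbatim and each image_url part is counted as the placeholder "[image]".
func estimationText(parts []part) string {
	out := make([]string, 0, len(parts))
	for _, p := range parts {
		switch p.Type {
		case "text":
			out = append(out, p.Text)
		case "image_url":
			out = append(out, "[image]")
		}
	}
	return strings.Join(out, "\n")
}

// roughTokens applies a ~4 characters per token heuristic (an assumption).
func roughTokens(s string) int { return (len(s) + 3) / 4 }

func main() {
	msg := []part{{Type: "text", Text: "What's in this image?"}, {Type: "image_url"}}
	s := estimationText(msg)
	fmt.Printf("%q ~%d tokens\n", s, roughTokens(s)) // "What's in this image?\n[image]" ~8 tokens
}
```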
Cursor sends full file contents, git diffs, and multi-file context. This can easily exceed Nenya's soft limit (derived from the target model's max_context). The 3-tier pipeline handles this:
- Below soft limit: pass through unchanged.
- Between soft and hard limits: send to Ollama for privacy-preserving summarization (code structure preserved for IDE clients).
- Above hard limit: truncate (TF-IDF relevance-scored when `tfidf_query_source` is set, otherwise code-boundary middle-out), then summarize. If TF-IDF reduces the payload below `soft_limit`, the engine call is skipped entirely.
If Ollama is unavailable and fail_open is true (default), the original payload is forwarded unchanged.
Any client that supports OpenAI-compatible chat completions can use Nenya:
| Client | Base URL | Notes |
|---|---|---|
| OpenCode | `http://localhost:8080/v1` | Full support |
| Cursor | `http://localhost:8080/v1` | Full support |
| Aider | `--openai-api-base http://localhost:8080/v1` | Use `--model` for agent name |
| Continue | `http://localhost:8080/v1` | OpenAI-compatible provider |
| Any OpenAI client | `http://localhost:8080/v1` | Bearer token auth |
All `/v1/*` and `/proxy/*` endpoints require `Authorization: Bearer <client_token>`, where `client_token` is defined in your secrets configuration. Set this as the API key in your client.
When your client sends a request with model: "agent-name", Nenya resolves the agent and routes through its model list (with fallback chain and circuit breaker). This means you can define complex routing strategies on the gateway side and keep your client configuration simple.
See Configuration for agent definitions and Routing for fallback strategies.