Client Setup

Nenya works as a transparent OpenAI-compatible proxy. Point your AI coding client's local endpoint at Nenya to get secret redaction, multi-provider routing, and fallback chains.

OpenCode

Setup

Set the local endpoint and client token environment variables:

export LOCAL_ENDPOINT=http://localhost:8080/v1
export OPENAI_API_KEY=your-nenya-client-token

Or in your OpenCode config:

{
  "provider": "openai",
  "model": "gpt-4o",
  "base_url": "http://localhost:8080/v1"
}

Use an agent name as the model for agent-based routing:

{
  "provider": {
    "api_key": "your-nenya-client-token",
    "model": "build",
    "base_url": "http://localhost:8080/v1"
  }
}

Nenya Config

{
  "providers": {
    "openai": {
      "url": "https://api.openai.com/v1/chat/completions",
      "auth_style": "bearer"
    },
    "deepseek": {
      "url": "https://api.deepseek.com/v1/chat/completions",
      "auth_style": "bearer"
    },
    "ollama": {
      "url": "http://localhost:11434/v1/chat/completions",
      "auth_style": "none"
    }
  },
  "agents": {
    "coder": {
      "models": [
        { "provider": "openai", "model": "gpt-4o" },
        { "provider": "deepseek", "model": "deepseek-coder" },
        { "provider": "ollama", "model": "qwen2.5-coder:7b" }
      ],
      "max_retries": 2,
      "cooldown_seconds": 60
    }
  }
}

Use coder as the model name in OpenCode to get automatic fallback across providers.

Verify

curl -s http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer your-client-token" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hello"}],"stream":true}'

Supported Endpoints

Endpoint	Status	Notes
`POST /v1/chat/completions`	Full support	Content pipeline, streaming, tool calls
`GET /v1/models`	Supported	OpenCode does not call this endpoint (its model list is hardcoded), but it is available
`POST /v1/responses`	Passthrough	Available if needed
`POST /v1/embeddings`	Passthrough	Available if needed

Client Detection

OpenCode uses the official OpenAI Go SDK, which sets its own User-Agent. In Copilot mode, OpenCode sends these identifying headers instead:

Header	Value
`User-Agent`	`OpenCode/1.0` (Copilot mode) or OpenAI SDK UA (standard mode)
`Editor-Version`	`OpenCode/1.0` (Copilot mode)
`Editor-Plugin-Version`	`OpenCode/1.0` (Copilot mode)

Nenya detects OpenCode when any of these headers contains "opencode". When it is detected, IDE-aware pipeline behavior activates automatically.

Note: When OpenCode uses the standard OpenAI SDK (non-Copilot mode), the User-Agent is set by the SDK and does not contain "opencode". In this case, Nenya treats it as a standard client with full pipeline processing. To force IDE detection, set Editor-Version: OpenCode/1.0 in your provider config or middleware.
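
The detection rule above can be sketched in a few lines. This is an illustration only, not Nenya's source: the function name `is_opencode`, the case-insensitive comparison, and the placeholder SDK User-Agent value are all assumptions.

```python
# Hypothetical sketch of header-based client detection (not Nenya's actual code).
IDE_HEADERS = ("User-Agent", "Editor-Version", "Editor-Plugin-Version")


def is_opencode(headers: dict) -> bool:
    """Return True if any identifying header contains "opencode".

    Header names and values are compared case-insensitively (an assumption).
    """
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    return any("opencode" in lowered.get(name.lower(), "") for name in IDE_HEADERS)


copilot_mode = {"User-Agent": "OpenCode/1.0", "Editor-Version": "OpenCode/1.0"}
# Placeholder value standing in for whatever User-Agent the SDK really sends.
standard_mode = {"User-Agent": "example-openai-sdk/0.0"}
# Standard mode with the Editor-Version header added by hand, as the note suggests.
forced = {"User-Agent": "example-openai-sdk/0.0", "Editor-Version": "OpenCode/1.0"}
```

Under these assumptions, `copilot_mode` and `forced` are detected as OpenCode, while `standard_mode` is treated as a standard client.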

Pipeline Behavior

When OpenCode is detected as an IDE client:

Stage	Behavior
Secret redaction	Regex redaction skips code inside markdown fences. Prose outside code blocks is still redacted.
Text compaction	Skipped. Preserves whitespace and line-number references in code payloads.
Truncation	Code-boundary aware — cuts at blank-line boundaries. When `tfidf_query_source` is set, uses TF-IDF relevance scoring instead.
Engine summarization	Uses code-preserving prompt — only redacts secrets in prose, never restructures code.
Tool calls	`tool_calls`, `tool_call_id`, `function_call` pass through unmodified.
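
To make the secret redaction row concrete, here is a minimal sketch of redaction that skips fenced code. It is not Nenya's implementation: the `sk-` pattern, the `[REDACTED]` marker, and the function name `redact_prose` are assumptions chosen for illustration.

```python
import re

FENCE = "`" * 3  # markdown code fence delimiter
# One capturing group, so re.split keeps the fenced blocks in its output.
FENCE_RE = re.compile(f"({FENCE}.*?{FENCE})", re.DOTALL)
# Assumed secret shape for illustration: "sk-" followed by 20+ alphanumerics.
SECRET_RE = re.compile(r"sk-[A-Za-z0-9]{20,}")


def redact_prose(text: str) -> str:
    """Redact secrets in prose, leaving fenced code blocks untouched."""
    parts = FENCE_RE.split(text)
    # After the split, fenced blocks sit at odd indices and prose at even ones.
    return "".join(
        part if index % 2 == 1 else SECRET_RE.sub("[REDACTED]", part)
        for index, part in enumerate(parts)
    )


key = "sk-" + "a" * 24
message = f"My key is {key}.\n{FENCE}python\nAPI_KEY = '{key}'\n{FENCE}\nDone."
redacted = redact_prose(message)
```

In this sketch an unterminated fence is treated as prose, so its contents would be redacted.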

Non-Streaming Support

Nenya supports both stream: false (default, per OpenAI spec) and stream: true for streaming responses. When stream: false, the gateway buffers the upstream response into a complete JSON object before returning it to the client. Response routing is determined by the upstream Content-Type header — text/event-stream responses go through the streaming pipeline, while standard JSON responses go through the non-streaming path.
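
The Content-Type routing decision can be pictured as follows. This is a sketch under assumptions: the function name and the returned labels are hypothetical, and parameter stripping (for example `; charset=utf-8`) is a guess at sensible behavior rather than something documented here.

```python
def pick_response_path(content_type: str) -> str:
    """Choose a response path from the upstream Content-Type header."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return "streaming" if media_type == "text/event-stream" else "non-streaming"
```

For example, `pick_response_path("text/event-stream; charset=utf-8")` selects the streaming pipeline, and `pick_response_path("application/json")` selects the non-streaming path.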

Tool Use

OpenCode sends tools in standard OpenAI format. Multi-turn tool conversations work through Nenya:

{
  "tools": [{ "type": "function", "function": { "name": "bash", "parameters": { ... } } }],
  "messages": [
    { "role": "user", "content": "list files" },
    { "role": "assistant", "tool_calls": [{ "id": "call_1", "type": "function", "function": { "name": "bash", "arguments": "{\"cmd\":\"ls\"}" } }] },
    { "role": "tool", "tool_call_id": "call_1", "content": "file1.go\nfile2.go" },
    { "role": "assistant", "content": "Here are your files..." }
  ]
}

Content Arrays

OpenCode sends user messages as content arrays (not plain strings):

{
  "content": [{ "type": "text", "text": "explain this code" }]
}

This is compatible with Nenya's content array handling. For providers without content array support, text is extracted and flattened.
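
Flattening a content array for a provider that only accepts plain strings might look like the sketch below. The function name `flatten_content` and the choice to join text parts with newlines are assumptions, not documented Nenya behavior.

```python
def flatten_content(content):
    """Return plain text for a message whose content is a string or an array."""
    if isinstance(content, str):
        return content  # already flat
    # Keep only the text parts; joining with newlines is an assumption.
    texts = [part.get("text", "") for part in content if part.get("type") == "text"]
    return "\n".join(texts)


message = {"role": "user", "content": [{"type": "text", "text": "explain this code"}]}
flat = flatten_content(message["content"])
```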

Reasoning Models

OpenCode sends reasoning_effort and max_completion_tokens for models with CanReason: true (o1, o3, DeepSeek v4). These pass through Nenya untouched:

{
  "model": "o3-mini",
  "reasoning_effort": "medium",
  "max_completion_tokens": 16384,
  "messages": [...]
}

Stream Options

OpenCode requests stream_options: { include_usage: true } to get token usage in the final SSE chunk. This is stripped by the adapter for providers that don't support it (e.g., OpenAI, Groq, Nvidia) and preserved for providers that do (e.g., DeepSeek, OpenRouter).
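
A sketch of the adapter step described above, assuming a simple per-provider lookup. The set below only mirrors the examples named in the text; the function name `adapt_request` and the lowercase provider keys are hypothetical.

```python
# Providers the text gives as examples of stripping stream_options.
PROVIDERS_WITHOUT_STREAM_OPTIONS = {"openai", "groq", "nvidia"}


def adapt_request(provider: str, body: dict) -> dict:
    """Return a copy of the request body adjusted for the target provider."""
    adapted = dict(body)  # shallow copy so the caller's dict is left unchanged
    if provider in PROVIDERS_WITHOUT_STREAM_OPTIONS:
        adapted.pop("stream_options", None)
    return adapted


request_body = {
    "model": "gpt-4o",
    "stream": True,
    "stream_options": {"include_usage": True},
}
for_groq = adapt_request("groq", request_body)
for_deepseek = adapt_request("deepseek", request_body)
```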

Agent Fallback Example

Define a resilient agent that tries multiple providers:

{
  "agents": {
    "smart-coder": {
      "models": [
        { "provider": "openai", "model": "gpt-4o", "max_context": 128000 },
        { "provider": "deepseek", "model": "deepseek-coder", "max_context": 64000 },
        { "provider": "ollama", "model": "qwen2.5-coder:7b", "max_context": 32000 }
      ],
      "max_retries": 2,
      "cooldown_seconds": 120,
      "strategy": "round-robin"
    }
  }
}

Use "model": "smart-coder" in OpenCode. If OpenAI rate-limits (429), Nenya automatically falls back to DeepSeek, then to local Ollama. Circuit breakers prevent hammering a tripped provider.
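
The fallback behavior described above can be approximated with an ordered loop and a per-target cooldown. This is a simplified sketch, not Nenya's circuit breaker: `ProviderError`, `call_with_fallback`, and `fake_send` are hypothetical names, and `max_retries` and `strategy` are not modeled because their exact semantics are not described here.

```python
import time


class ProviderError(Exception):
    """Raised by the (hypothetical) send function when an upstream call fails."""

    def __init__(self, status: int):
        super().__init__(f"upstream returned {status}")
        self.status = status


def call_with_fallback(models, send, cooldown_until, cooldown_seconds=120,
                       now=time.monotonic):
    """Try each target in order, skipping targets that are still cooling down."""
    for target in models:
        key = (target["provider"], target["model"])
        if cooldown_until.get(key, 0.0) > now():
            continue  # circuit open: do not hammer a tripped provider
        try:
            return send(target)
        except ProviderError:
            # Trip the breaker for this target and move on to the next one.
            cooldown_until[key] = now() + cooldown_seconds
    raise RuntimeError("all targets failed or are cooling down")


models = [
    {"provider": "openai", "model": "gpt-4o"},
    {"provider": "deepseek", "model": "deepseek-coder"},
    {"provider": "ollama", "model": "qwen2.5-coder:7b"},
]
calls = []


def fake_send(target):
    """Stand-in for an upstream call: OpenAI always rate-limits in this demo."""
    calls.append(target["provider"])
    if target["provider"] == "openai":
        raise ProviderError(429)
    return f"answered by {target['provider']}"


cooldowns = {}
first = call_with_fallback(models, fake_send, cooldowns)
second = call_with_fallback(models, fake_send, cooldowns)
```

In this demo the first call tries OpenAI, receives a 429, and falls back to DeepSeek. The second call skips OpenAI entirely because its cooldown has not expired.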

Cursor

Setup

Configure Cursor to point at Nenya:

1. Open Cursor Settings → Models → OpenAI-compatible
2. Set Base URL: http://localhost:8080/v1
3. Set API key: your Nenya client_token
4. Set Model: an agent name (e.g., build) or a specific model ID (e.g., gemini-3-flash)

Verify

curl -s http://localhost:8080/v1/models \
  -H "Authorization: Bearer your-client-token" | jq .

Cursor calls /v1/models to discover available models. The response includes all agent names and discovered models from configured providers.

Supported Endpoints

Endpoint	Status	Notes
`POST /v1/chat/completions`	Full support	Content pipeline, streaming, tool calls
`GET /v1/models`	Full support	Model catalog from agents + providers
`POST /v1/responses`	Passthrough	Transparent proxy, no content pipeline
`POST /v1/embeddings`	Passthrough	If needed by extensions

Pipeline Behavior

Cursor is detected as an IDE client via User-Agent. The following pipeline adaptations apply automatically:

Stage	Behavior
Secret redaction	Regex redaction skips code inside markdown fences (```). Prose and documentation outside code blocks are still redacted.
Text compaction	Skipped. Cursor carefully formats payloads with line-number references; collapsing whitespace would break them.
Truncation	Code-boundary aware — cuts at blank-line boundaries between functions/blocks. When `tfidf_query_source` is set, uses TF-IDF relevance scoring to keep the most relevant blocks instead.
Engine summarization	Uses code-preserving prompt — only redacts secrets in prose, never restructures or summarizes code.
Tool calls	`tool_calls`, `tool_call_id`, `function_call` pass through unmodified.
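
As a rough picture of code-boundary-aware truncation, the sketch below cuts only at blank-line boundaries and never inside a block. It is a simplification under assumptions: it keeps leading blocks only, measures size in characters rather than tokens, and the name `truncate_at_blank_lines` is hypothetical. How Nenya decides which blocks to drop is not modeled.

```python
def truncate_at_blank_lines(text: str, max_chars: int) -> str:
    """Keep whole blank-line-separated blocks until the budget is used up."""
    blocks = text.split("\n\n")
    kept, used = [], 0
    for block in blocks:
        # Every block after the first also costs the two-character separator.
        cost = len(block) + (2 if kept else 0)
        if used + cost > max_chars:
            break  # stop at the boundary instead of cutting mid-block
        kept.append(block)
        used += cost
    return "\n\n".join(kept)


source = (
    "def a():\n    return 1\n\n"
    "def b():\n    return 2\n\n"
    "def c():\n    return 3"
)
shortened = truncate_at_blank_lines(source, 45)
```

With a 45-character budget, the first two functions are kept whole and the third is dropped.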

Reasoning Models

Cursor sends reasoning_effort and max_completion_tokens for reasoning models (o1, o3, o4-mini). These pass through Nenya untouched to the upstream provider.

Tool Use / Agent Mode

Cursor's agent mode sends tool definitions in standard OpenAI format:

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "read_file",
        "parameters": { "type": "object", "properties": { "path": { "type": "string" } } }
      }
    }
  ],
  "messages": [
    { "role": "assistant", "tool_calls": [{ "id": "call_123", "type": "function", "function": { "name": "read_file", "arguments": "{...}" } }] },
    { "role": "tool", "tool_call_id": "call_123", "content": "file contents..." }
  ]
}

All tool call fields pass through Nenya's sanitization and adapter layers unmodified. Tool call arguments in SSE streaming responses are not filtered by the stream security filter.

Override Model

You can set the model override in .cursorrules:

You are an AI coding assistant. Your model is served through Nenya.

Content Types

Cursor can send mixed content arrays (text + images):

{
  "content": [
    { "type": "text", "text": "What's in this image?" },
    { "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } }
  ]
}

Nenya handles image_url content types — they are counted as [image] for token estimation and preserved in the payload. Providers without content array support have text extracted and non-text types dropped (with a warning logged).
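
The `[image]` placeholder matters because a base64 data URL would otherwise dominate any size estimate. The sketch below illustrates the idea; the function names and the four-characters-per-token heuristic are assumptions, not Nenya's estimator.

```python
def estimation_text(content) -> str:
    """Build the text used for size estimation, replacing images with [image]."""
    if isinstance(content, str):
        return content
    pieces = []
    for part in content:
        if part.get("type") == "text":
            pieces.append(part.get("text", ""))
        elif part.get("type") == "image_url":
            pieces.append("[image]")  # the image payload itself is not counted
    return " ".join(pieces)


def estimate_tokens(content) -> int:
    """Very rough estimate: about four characters per token (an assumption)."""
    return max(1, len(estimation_text(content)) // 4)


def mixed_content(image_bytes_b64: str) -> list:
    """Helper that builds a text + image content array for the demo."""
    return [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url",
         "image_url": {"url": "data:image/png;base64," + image_bytes_b64}},
    ]


small = mixed_content("AAAA")
large = mixed_content("A" * 100_000)
```

Both `small` and `large` produce the same estimate, since only the placeholder is counted.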

Large Payloads

Cursor sends full file contents, git diffs, and multi-file context. This can easily exceed Nenya's soft limit (derived from the target model's max_context). The 3-tier pipeline handles this:

Below the soft limit: pass through unchanged
Between the soft and hard limits: send to Ollama for privacy-preserving summarization (code structure is preserved for IDE clients)
Above the hard limit: truncate (TF-IDF relevance-scored when tfidf_query_source is set, otherwise code-boundary middle-out), then summarize. If TF-IDF truncation brings the payload below soft_limit, the engine call is skipped entirely.

If Ollama is unavailable and fail_open is true (default), the original payload is forwarded unchanged.
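
The tier selection and the fail-open rule can be sketched as below. The function names, the tier labels, the use of `ConnectionError` to represent an unreachable engine, and the handling of values exactly at a limit are all assumptions made for illustration.

```python
def choose_tier(tokens: int, soft_limit: int, hard_limit: int) -> str:
    """Pick the pipeline tier for a payload of the given estimated size."""
    if tokens < soft_limit:
        return "pass-through"
    if tokens <= hard_limit:
        return "summarize"
    return "truncate-then-summarize"


def summarize_or_forward(payload: str, summarize, fail_open: bool = True) -> str:
    """Run the summarizer, forwarding the original payload if it is unreachable."""
    try:
        return summarize(payload)
    except ConnectionError:
        if fail_open:
            return payload  # fail open: forward unchanged
        raise


def engine_down(payload: str) -> str:
    """Stand-in for a summarization engine that cannot be reached."""
    raise ConnectionError("summarization engine unreachable")


forwarded = summarize_or_forward("full file contents", engine_down)
```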

Other Clients

Any client that supports OpenAI-compatible chat completions can use Nenya:

Client	Base URL	Notes
OpenCode	`http://localhost:8080/v1`	Full support
Cursor	`http://localhost:8080/v1`	Full support
Aider	`--openai-api-base http://localhost:8080/v1`	Use `--model` for agent name
Continue	`http://localhost:8080/v1`	OpenAI-compatible provider
Any OpenAI client	`http://localhost:8080/v1`	Bearer token auth

Authentication

All /v1/* and /proxy/* endpoints require Authorization: Bearer <client_token> where client_token is defined in your secrets configuration. Set this as the API key in your client.
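
For clients you script yourself, the same header can be set with the Python standard library. The sketch below only builds the request. The helper name `build_chat_request` is hypothetical, and the token and model values are placeholders.

```python
import json
import urllib.request


def build_chat_request(base_url: str, client_token: str, model: str, prompt: str):
    """Build (but do not send) an authenticated chat completion request."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {client_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "http://localhost:8080/v1", "your-nenya-client-token", "coder", "hello"
)
# urllib.request.urlopen(request) would send it; omitted so the sketch
# runs without a gateway listening on localhost.
```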

Agent-Based Routing

When your client sends a request with model: "agent-name", Nenya resolves the agent and routes through its model list (with fallback chain and circuit breaker). This means you can define complex routing strategies on the gateway side and keep your client configuration simple.
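
Conceptually, agent resolution is a lookup from the requested model name to an ordered list of targets. The sketch below assumes the agent config shape shown earlier on this page. The function name `resolve_targets` is hypothetical. How Nenya picks a provider for a plain model ID is not covered here, so the sketch returns `None` as the provider in that case.

```python
def resolve_targets(model: str, agents: dict) -> list:
    """Map a requested model name to an ordered list of (provider, model) pairs."""
    agent = agents.get(model)
    if agent is None:
        # Not an agent name: treat it as a specific model ID, provider unknown.
        return [(None, model)]
    return [(entry["provider"], entry["model"]) for entry in agent["models"]]


agents = {
    "coder": {
        "models": [
            {"provider": "openai", "model": "gpt-4o"},
            {"provider": "deepseek", "model": "deepseek-coder"},
            {"provider": "ollama", "model": "qwen2.5-coder:7b"},
        ]
    }
}
chain = resolve_targets("coder", agents)
direct = resolve_targets("gpt-4o", agents)
```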

See Configuration for agent definitions and Routing for fallback strategies.

Nenya on GitHub | Report an Issue | Apache 2.0 License
