Client Setup

Rafael Gumieri edited this page Jun 15, 2026 · 4 revisions

Nenya works as a transparent OpenAI-compatible proxy. Point your AI coding client's local endpoint at Nenya to get secret redaction, multi-provider routing, and fallback chains.

OpenCode

Setup

Set the local endpoint and API key environment variables:

export LOCAL_ENDPOINT=http://localhost:8080/v1
export OPENAI_API_KEY=your-nenya-client-token

Or in your OpenCode config:

{
  "provider": "openai",
  "model": "gpt-4o",
  "base_url": "http://localhost:8080/v1"
}

Use an agent name as the model for agent-based routing:

{
  "provider": {
    "api_key": "your-nenya-client-token",
    "model": "build",
    "base_url": "http://localhost:8080/v1"
  }
}

Nenya Config

{
  "providers": {
    "openai": {
      "url": "https://api.openai.com/v1/chat/completions",
      "auth_style": "bearer"
    },
    "deepseek": {
      "url": "https://api.deepseek.com/v1/chat/completions",
      "auth_style": "bearer"
    },
    "ollama": {
      "url": "http://localhost:11434/v1/chat/completions",
      "auth_style": "none"
    }
  },
  "agents": {
    "coder": {
      "models": [
        { "provider": "openai", "model": "gpt-4o" },
        { "provider": "deepseek", "model": "deepseek-coder" },
        { "provider": "ollama", "model": "qwen2.5-coder:7b" }
      ],
      "max_retries": 2,
      "cooldown_seconds": 60
    }
  }
}

Use coder as the model name in OpenCode to get automatic fallback across providers.

Verify

curl -s http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer your-client-token" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hello"}],"stream":true}'
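The same check can be scripted. The sketch below builds an equivalent non-streaming request with Python's standard library; the helper name `build_chat_request` is hypothetical, and the URL, token placeholder, and model are the example values used on this page.

```python
import json
import urllib.request


def build_chat_request(base_url: str, token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("http://localhost:8080/v1", "your-client-token", "gpt-4o", "hello")
# To send it against a running gateway: urllib.request.urlopen(req)
```

Passing an agent name instead of "gpt-4o" should exercise agent-based routing in the same way.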

Supported Endpoints

| Endpoint | Status | Notes |
| --- | --- | --- |
| POST /v1/chat/completions | Full support | Content pipeline, streaming, tool calls |
| GET /v1/models | Supported | OpenCode doesn't call this (hardcoded models), but available |
| POST /v1/responses | Passthrough | Available if needed |
| POST /v1/embeddings | Passthrough | Available if needed |

Client Detection

OpenCode uses the official OpenAI Go SDK, which sets its own User-Agent. In Copilot mode, OpenCode sends identifying headers:

| Header | Value |
| --- | --- |
| User-Agent | OpenCode/1.0 (Copilot mode) or OpenAI SDK UA (standard mode) |
| Editor-Version | OpenCode/1.0 (Copilot mode) |
| Editor-Plugin-Version | OpenCode/1.0 (Copilot mode) |

Nenya detects OpenCode via any of these headers containing "opencode". When detected, IDE-aware pipeline behavior activates automatically.

Note: When OpenCode uses the standard OpenAI SDK (non-Copilot mode), the User-Agent is set by the SDK and does not contain "opencode". In this case, Nenya treats it as a standard client with full pipeline processing. To force IDE detection, set Editor-Version: OpenCode/1.0 in your provider config or middleware.
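The header check described above can be sketched roughly as follows. This is an illustration only, not Nenya's actual code: the function name `is_opencode_client` and the header list are assumptions derived from the headers named in this section.

```python
# Hypothetical sketch of IDE-client detection; not Nenya's real implementation.
DETECTION_HEADERS = ("user-agent", "editor-version", "editor-plugin-version")


def is_opencode_client(headers: dict[str, str]) -> bool:
    """Return True if any detection header contains "opencode" (case-insensitive)."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    return any(
        "opencode" in normalized.get(name, "").lower()
        for name in DETECTION_HEADERS
    )
```

Under this sketch, a request that carries only the SDK's default User-Agent is not detected, while adding Editor-Version: OpenCode/1.0 is enough to trigger detection.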

Pipeline Behavior

When OpenCode is detected as an IDE client:

| Stage | Behavior |
| --- | --- |
| Secret redaction | Regex redaction skips code inside markdown fences. Prose outside code blocks is still redacted. |
| Text compaction | Skipped. Preserves whitespace and line-number references in code payloads. |
| Truncation | Code-boundary aware: cuts at blank-line boundaries. When tfidf_query_source is set, uses TF-IDF relevance scoring instead. |
| Engine summarization | Uses code-preserving prompt: only redacts secrets in prose, never restructures code. |
| Tool calls | tool_calls, tool_call_id, function_call pass through unmodified. |
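To make the fence-aware redaction stage concrete, here is a minimal sketch. The pattern `SECRET_PATTERN` is an assumed example (an OpenAI-style key); Nenya's real redaction rules are not shown on this page, and the function name is hypothetical.

```python
import re

# Assumed example pattern; Nenya's actual redaction patterns may differ.
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{8,}")


def redact_outside_fences(text: str) -> str:
    """Redact secrets in prose while leaving fenced code blocks untouched."""
    out = []
    in_fence = False
    for line in text.split("\n"):
        if line.lstrip().startswith("```"):
            # A fence marker toggles code mode; the marker line itself is kept as-is.
            in_fence = not in_fence
            out.append(line)
        elif in_fence:
            out.append(line)  # code inside a fence is never rewritten
        else:
            out.append(SECRET_PATTERN.sub("[REDACTED]", line))
    return "\n".join(out)
```

The trade-off is that a secret pasted inside a code fence is left intact, which matches the behavior described in the table.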

Non-Streaming Support

Nenya supports both stream: false (default, per OpenAI spec) and stream: true for streaming responses. When stream: false, the gateway buffers the upstream response into a complete JSON object before returning it to the client. Response routing is determined by the upstream Content-Type header — text/event-stream responses go through the streaming pipeline, while standard JSON responses go through the non-streaming path.
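The Content-Type based routing can be pictured with a small sketch. The function name and the returned labels are hypothetical; only the rule itself (text/event-stream goes to the streaming pipeline, everything else to the non-streaming path) comes from the paragraph above.

```python
def select_response_path(content_type: str) -> str:
    """Pick a pipeline from the upstream Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return "streaming" if media_type == "text/event-stream" else "non-streaming"
```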

Tool Use

OpenCode sends tools in standard OpenAI format. Multi-turn tool conversations work through Nenya:

{
  "tools": [{ "type": "function", "function": { "name": "bash", "parameters": { ... } } }],
  "messages": [
    { "role": "user", "content": "list files" },
    { "role": "assistant", "tool_calls": [{ "id": "call_1", "type": "function", "function": { "name": "bash", "arguments": "{\"cmd\":\"ls\"}" } }] },
    { "role": "tool", "tool_call_id": "call_1", "content": "file1.go\nfile2.go" },
    { "role": "assistant", "content": "Here are your files..." }
  ]
}

Content Arrays

OpenCode sends user messages as content arrays (not plain strings):

{
  "content": [{ "type": "text", "text": "explain this code" }]
}

This is compatible with Nenya's content array handling. For providers without content array support, text is extracted and flattened.
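Flattening a content array might look like the sketch below. It assumes text parts are joined with newlines, which this page does not specify, and `flatten_content` is a hypothetical name.

```python
def flatten_content(content) -> str:
    """Collapse an OpenAI-style content array into a plain string."""
    if isinstance(content, str):
        return content  # already plain text, nothing to flatten
    texts = [
        part.get("text", "")
        for part in content
        if isinstance(part, dict) and part.get("type") == "text"
    ]
    # Assumption: text parts are joined with a newline.
    return "\n".join(texts)
```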

Reasoning Models

OpenCode sends reasoning_effort and max_completion_tokens for models with CanReason: true (o1, o3, DeepSeek v4). These pass through Nenya untouched:

{
  "model": "o3-mini",
  "reasoning_effort": "medium",
  "max_completion_tokens": 16384,
  "messages": [...]
}

Stream Options

OpenCode requests stream_options: { include_usage: true } to get token usage in the final SSE chunk. This is stripped by the adapter for providers that don't support it (e.g., OpenAI, Groq, Nvidia) and preserved for providers that do (e.g., DeepSeek, OpenRouter).
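A per-provider adapter step of this kind could be sketched as follows. The capability set is an assumption built only from the examples named above, and the function name is hypothetical.

```python
# Assumed capability set, taken only from the examples in the text above.
SUPPORTS_STREAM_OPTIONS = {"deepseek", "openrouter"}


def adapt_stream_options(provider: str, payload: dict) -> dict:
    """Return a copy of payload with stream_options removed when unsupported."""
    adapted = dict(payload)  # shallow copy; the caller's dict is not mutated
    if provider not in SUPPORTS_STREAM_OPTIONS:
        adapted.pop("stream_options", None)
    return adapted
```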

Agent Fallback Example

Define a resilient agent that tries multiple providers:

{
  "agents": {
    "smart-coder": {
      "models": [
        { "provider": "openai", "model": "gpt-4o", "max_context": 128000 },
        { "provider": "deepseek", "model": "deepseek-coder", "max_context": 64000 },
        { "provider": "ollama", "model": "qwen2.5-coder:7b", "max_context": 32000 }
      ],
      "max_retries": 2,
      "cooldown_seconds": 120,
      "strategy": "round-robin"
    }
  }
}

Use "model": "smart-coder" in OpenCode. If OpenAI rate-limits (429), Nenya automatically falls back to DeepSeek, then to local Ollama. Circuit breakers prevent hammering a tripped provider.
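The fallback and cooldown behavior can be illustrated with a simplified sketch. It is not Nenya's implementation: the class name `FallbackAgent` is hypothetical, and it models only ordered fallback with a per-model cooldown, not the max_retries or strategy settings.

```python
import time


class FallbackAgent:
    """Try models in order, skipping any that are cooling down after a failure."""

    def __init__(self, models, cooldown_seconds, clock=time.monotonic):
        self.models = models
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.tripped_until = {}  # model name -> time at which it may be retried

    def call(self, send):
        """send(model) returns a response or raises on an upstream error such as a 429."""
        for model in self.models:
            if self.clock() < self.tripped_until.get(model, 0.0):
                continue  # breaker is open: do not hammer this provider
            try:
                return model, send(model)
            except Exception:
                # Trip the breaker for this model and move on to the next one.
                self.tripped_until[model] = self.clock() + self.cooldown_seconds
        raise RuntimeError("all models failed or are cooling down")
```

With a 120-second cooldown, a model that just failed is skipped on subsequent calls until the cooldown has elapsed, after which it is tried again.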

Cursor

Setup

Configure Cursor to point at Nenya:

  1. Open Cursor Settings → Models → OpenAI-compatible
  2. Set Base URL: http://localhost:8080/v1
  3. Set API key: Your Nenya client_token
  4. Set Model: An agent name (e.g., build) or a specific model ID (e.g., gemini-3-flash)

Verify

curl -s http://localhost:8080/v1/models \
  -H "Authorization: Bearer your-client-token" | jq .

Cursor calls /v1/models to discover available models. The response includes all agent names and discovered models from configured providers.

Supported Endpoints

| Endpoint | Status | Notes |
| --- | --- | --- |
| POST /v1/chat/completions | Full support | Content pipeline, streaming, tool calls |
| GET /v1/models | Full support | Model catalog from agents + providers |
| POST /v1/responses | Passthrough | Transparent proxy, no content pipeline |
| POST /v1/embeddings | Passthrough | If needed by extensions |

Pipeline Behavior

Cursor is detected as an IDE client via User-Agent. The following pipeline adaptations apply automatically:

| Stage | Behavior |
| --- | --- |
| Secret redaction | Regex redaction skips code inside markdown fences (```). Prose and documentation outside code blocks are still redacted. |
| Text compaction | Skipped. Cursor carefully formats payloads with line-number references; collapsing whitespace would break them. |
| Truncation | Code-boundary aware: cuts at blank-line boundaries between functions/blocks. When tfidf_query_source is set, uses TF-IDF relevance scoring to keep the most relevant blocks instead. |
| Engine summarization | Uses code-preserving prompt: only redacts secrets in prose, never restructures or summarizes code. |
| Tool calls | tool_calls, tool_call_id, function_call pass through unmodified. |
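As a rough illustration of cutting at blank-line boundaries, the sketch below keeps whole blocks from the start of the text until a character budget is reached. It is deliberately simplified: it keeps only the head of the payload, uses a character budget rather than tokens, and the function name is hypothetical.

```python
def truncate_at_blank_lines(text: str, max_chars: int) -> str:
    """Keep leading blocks (separated by blank lines) that fit within max_chars."""
    blocks = text.split("\n\n")
    kept = []
    used = 0
    for block in blocks:
        # Account for the "\n\n" separator that rejoins blocks.
        cost = len(block) + (2 if kept else 0)
        if used + cost > max_chars:
            break
        kept.append(block)
        used += cost
    return "\n\n".join(kept)
```

Because the cut always falls between blocks, no function body is split in half.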

Reasoning Models

Cursor sends reasoning_effort and max_completion_tokens for reasoning models (o1, o3, o4-mini). These pass through Nenya untouched to the upstream provider.

Tool Use / Agent Mode

Cursor's agent mode sends tool definitions in standard OpenAI format:

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "read_file",
        "parameters": { "type": "object", "properties": { "path": { "type": "string" } } }
      }
    }
  ],
  "messages": [
    { "role": "assistant", "tool_calls": [{ "id": "call_123", "type": "function", "function": { "name": "read_file", "arguments": "{...}" } }] },
    { "role": "tool", "tool_call_id": "call_123", "content": "file contents..." }
  ]
}

All tool call fields pass through Nenya's sanitization and adapter layers unmodified. Tool call arguments in SSE streaming responses are not filtered by the stream security filter.

Override Model

You can set the model override in .cursorrules:

You are an AI coding assistant. Your model is served through Nenya.

Content Types

Cursor can send mixed content arrays (text + images):

{
  "content": [
    { "type": "text", "text": "What's in this image?" },
    { "type": "image_url", "image_url": { "url": "data:image/png;base64,..." } }
  ]
}

Nenya handles image_url content parts: each one is counted as [image] for token estimation and preserved in the payload. For providers without content array support, text is extracted and non-text parts are dropped, with a warning logged.

Large Payloads

Cursor sends full file contents, git diffs, and multi-file context. This can easily exceed Nenya's soft limit (derived from the target model's max_context). The 3-tier pipeline handles this:

  1. Below the soft limit: pass through unchanged
  2. Between the soft and hard limits: send to Ollama for privacy-preserving summarization (code structure preserved for IDE clients)
  3. Above the hard limit: truncate (TF-IDF relevance-scored when tfidf_query_source is set, otherwise code-boundary middle-out), then summarize. If TF-IDF reduces the payload below soft_limit, the engine call is skipped entirely.

If Ollama is unavailable and fail_open is true (default), the original payload is forwarded unchanged.
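The tier selection above reduces to a simple comparison, sketched here. The function name and tier labels are hypothetical, and how Nenya treats payloads exactly at a limit is an assumption.

```python
def choose_tier(payload_tokens: int, soft_limit: int, hard_limit: int) -> str:
    """Map a payload size to one of the three tiers described above."""
    if payload_tokens < soft_limit:
        return "passthrough"
    # Assumption: a payload exactly at the hard limit is still summarized, not truncated.
    if payload_tokens <= hard_limit:
        return "summarize"
    return "truncate-then-summarize"
```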

Other Clients

Any client that supports OpenAI-compatible chat completions can use Nenya:

| Client | Base URL | Notes |
| --- | --- | --- |
| OpenCode | http://localhost:8080/v1 | Full support |
| Cursor | http://localhost:8080/v1 | Full support |
| Aider | --openai-api-base http://localhost:8080/v1 | Use --model for agent name |
| Continue | http://localhost:8080/v1 | OpenAI-compatible provider |
| Any OpenAI client | http://localhost:8080/v1 | Bearer token auth |

Authentication

All /v1/* and /proxy/* endpoints require Authorization: Bearer <client_token> where client_token is defined in your secrets configuration. Set this as the API key in your client.

Agent-Based Routing

When your client sends a request with model: "agent-name", Nenya resolves the agent and routes through its model list (with fallback chain and circuit breaker). This means you can define complex routing strategies on the gateway side and keep your client configuration simple.
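Agent resolution can be thought of as a lookup on the model field, as in the sketch below. The function name `resolve_targets` is hypothetical, and the handling of non-agent names (passed through as a direct model ID, with provider lookup not shown) is an assumption.

```python
def resolve_targets(model: str, agents: dict) -> list[dict]:
    """Return the ordered upstream targets for a requested model name."""
    agent = agents.get(model)
    if agent is None:
        # Not an agent name: treat it as a direct model ID (provider lookup not shown).
        return [{"model": model}]
    return list(agent["models"])
```

The client only ever sends a single model string; the ordered list of providers stays in the gateway configuration.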

See Configuration for agent definitions and Routing for fallback strategies.
