Home

Nenya AI Gateway

A lightweight, zero-dependency AI API Gateway written in Go. Nenya sits between your AI coding clients and upstream LLM providers, adding secret redaction, context management, agent routing, and MCP tool integration — all with transparent SSE streaming. Security-hardened: non-root execution, mlock for secrets, seccomp + no-new-privileges.

Compatible with any provider that implements the OpenAI Chat Completions API or the Anthropic Messages API. Built-in adapters with specialized handling ship for 24 providers.


Features

Routing & Agents

Config-driven provider registry — add providers via JSON, zero code changes
24 built-in providers with specialized adapters for wire format differences
Dynamic model discovery — fetches live model catalogs from providers at startup and on reload
Model registry — reference models by string shorthand with automatic provider/context resolution
Multi-provider model resolution — when a model exists in multiple providers, all are added to the agent's fallback chain
Three-tier model resolution — config overrides > discovered models > static registry
Per-model wire format — models from multi-format gateways (OpenCode Zen) auto-convert between OpenAI, Anthropic, and Gemini wire formats
Agent fallback chains — round-robin or sequential with circuit breaker and automatic failover
Latency-aware routing — auto-reorder targets by historical median response time with ±5% jitter to prevent thundering herd
Multi-account provider selection — LRU-based account pool with per-account credential, error classification, and backoff
Billing-aware routing — per-account spend tracking, quota polling (API/headers), exhaustion filtering, free model scoring bonuses, and per-agent budget limits
Per-agent system prompts — inline or file-based

Security & Privacy

Tier-0 regex secret filter — always-on redaction of AWS keys, GitHub tokens, passwords, etc.
Pluggable Interceptor Chain — priority-ordered pipeline (regex redaction, entropy filtering, TF-IDF relevance scoring, engine summarization) with per-step metrics
3-Tier content pipeline — pass-through, engine summarization, or TF-IDF relevance-scored truncation
Context window compaction — sliding window summarization with configurable engine
Stale tool call pruning — compact old assistant+tool response pairs to save tokens
Thought pruning — strip reasoning blocks from assistant message history
Input validation — strict body limits, JSON sanitization, header filtering
Graceful degradation — never blocks requests due to engine or pipeline failures
Role-Based Access Control (RBAC) — per-API key roles (admin, user, read-only) with agent and endpoint restrictions
Secure memory — mlock-protected token storage, read-only sealing, core dump prevention
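
As an illustration of the Tier-0 regex filter, the sketch below redacts a few well-known secret shapes from a request body. The pattern list and the `[REDACTED]` placeholder are assumptions chosen for the example; the patterns Nenya actually ships are not listed on this page.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule pairs a pattern with its replacement. All rules here are
// illustrative and not Nenya's real pattern set.
type rule struct {
	re   *regexp.Regexp
	repl string
}

var rules = []rule{
	// AWS access key ID shape: "AKIA" followed by 16 upper-case letters or digits.
	{regexp.MustCompile(`AKIA[0-9A-Z]{16}`), "[REDACTED]"},
	// Classic GitHub personal access token shape: "ghp_" plus 36 alphanumerics.
	{regexp.MustCompile(`ghp_[A-Za-z0-9]{36}`), "[REDACTED]"},
	// "password: value" or "password=value"; the label is kept, the value is dropped.
	{regexp.MustCompile(`(?i)(password\s*[:=]\s*)\S+`), "${1}[REDACTED]"},
}

// redact applies every rule in order and returns the scrubbed text.
func redact(s string) string {
	for _, r := range rules {
		s = r.re.ReplaceAllString(s, r.repl)
	}
	return s
}

func main() {
	fmt.Println(redact("export AWS_KEY=AKIAIOSFODNN7EXAMPLE"))
	fmt.Println(redact("password: hunter2"))
}
```

Compiling the patterns once at package level keeps the per-request cost to the matching itself, which matters for a filter that runs on every request.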

Hardening (Deployment Security)

Secure memory (default): All tokens stored in mlock-protected RAM, sealed read-only after init, core dumps disabled
Non-root execution — runs as UID 65532 with dropped capabilities
Memory protection — LimitMEMLOCK=infinity and LimitCORE=0 in systemd
Read-only filesystem — immutable root + private /tmp
Seccomp + no-new-privileges — restricted syscalls, prevents privilege escalation
Zero-trust secrets — loaded via systemd credentials or container mounts, never to disk
Socket activation — seamless restarts with zero dropped connections

Reliability

Zero external dependencies — Go standard library only
Hot reload — systemctl reload nenya for zero-downtime config changes
Circuit breaker — per agent+provider+model with automatic failover, exponential backoff, and semantic error classification
Rate limiting — per upstream host (RPM/TPM) with per-provider overrides
Response cache — in-memory LRU with SHA-256 fingerprinting
Context-Limit Auto-Retry — automatic summarization via engine and retry on upstream context-length errors
Local Engine Lifecycle — preload and unload local Ollama models with LRU eviction
Graceful shutdown — 30s grace period for in-flight requests, MCP client cleanup

MCP Tool Integration

Tool discovery — connect to MCP servers for automatic tool injection
Multi-turn execution — intercept tool calls, execute against MCP servers, forward results
Auto-search — pre-fetch relevant context from MCP servers before forwarding
Auto-save — persist assistant responses to MCP memory servers

Request Flow

+----------------------------------------------+
| Client (Cursor / OpenCode / Aider / etc.)    |
| OpenAI-compatible request                    |
| POST /v1/chat/completions + Bearer token     |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Nenya Gateway                                |
| - auth check                                 |
| - parse JSON + extract model                 |
| - resolve agent/provider                     |
| - optional cache (HIT => replay cached       |
|   response in the correct format)            |
| - optional MCP context/tool injection        |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Privacy / Context Pipeline (best-effort)     |
| - Tier-0 regex + entropy secret redaction    |
| - compaction / pruning / window mgmt         |
| - engine summarize (usually local Ollama)    |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Routing                                      |
|  A) Standard forwarding                      |
|     - fallback chain + circuit breaker + RL  |
|  B) MCP multi-turn tool loop (if enabled)    |
|     - buffer SSE, execute MCP tools, re-send |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Upstream LLM Providers                       |
| Anthropic | Gemini | DeepSeek | Mistral | ...|
+----------------------------------------------+
                        |
                        |  SSE stream
                        v
+----------------------------------------------+
| Nenya SSE Pipeline                           |
| - adapter response transforms                |
| - usage accounting + stream filter           |
| - flush + (optional) cache capture           |
| - (optional) MCP auto-save                   |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Client receives transparent SSE or JSON      |
+----------------------------------------------+
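
The "best-effort" privacy/context stage in the diagram can be sketched as a priority-ordered chain in which a failing step is skipped rather than failing the request. The `interceptor` type, the priority numbers, and the step functions below are hypothetical and only illustrate the idea.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// interceptor is a hypothetical pipeline step; lower Priority runs first.
type interceptor struct {
	Name     string
	Priority int
	Apply    func(body string) (string, error)
}

// run applies the steps in priority order. A failing step is skipped and
// its name is recorded, so the pipeline never blocks the request.
func run(steps []interceptor, body string) (out string, skipped []string) {
	ordered := append([]interceptor(nil), steps...) // do not reorder the caller's slice
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Priority < ordered[j].Priority
	})
	out = body
	for _, s := range ordered {
		next, err := s.Apply(out)
		if err != nil {
			skipped = append(skipped, s.Name) // degrade gracefully: keep the previous body
			continue
		}
		out = next
	}
	return out, skipped
}

var errEngineDown = errors.New("engine unavailable")

// summarize stands in for an engine call that is currently failing.
func summarize(string) (string, error) { return "", errEngineDown }

// redact stands in for the regex redaction step.
func redact(b string) (string, error) {
	return strings.ReplaceAll(b, "hunter2", "[REDACTED]"), nil
}

func main() {
	steps := []interceptor{
		{"summarize", 30, summarize},
		{"redact", 10, redact},
	}
	out, skipped := run(steps, "password is hunter2")
	fmt.Println(out, skipped)
}
```

Returning the list of skipped steps gives the caller something to log or count, which is one way the per-step metrics mentioned earlier could be fed.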

Quick Start

1. Install

curl -fsSL https://raw.githubusercontent.com/gumieri/nenya/main/install.sh | sudo sh

2. Run with Podman

Create a minimal config and secrets:

mkdir -p config secrets
cat > config/config.json << 'EOF'
{
  "server": { "listen_addr": ":8080" },
  "agents": {
    "default": {
      "strategy": "fallback",
      "models": ["gemini-3-flash"]
    }
  }
}
EOF

cat > secrets/provider_keys.json << 'EOF'
{
  "provider_keys": {
    "gemini": "AIza..."
  }
}
EOF

# The delimiter is left unquoted here so that $(openssl ...) is expanded by the shell.
cat > secrets/client.json << EOF
{
  "client_token": "nk-$(openssl rand -hex 32)"
}
EOF

Run the container:

podman run -d \
  --name nenya \
  -p 8080:8080 \
  -v ./config:/etc/nenya:ro \
  -v ./secrets:/run/secrets/nenya:ro \
  -e NENYA_SECRETS_DIR=/run/secrets/nenya \
  --cap-drop=ALL \
  --cap-add=IPC_LOCK \
  --security-opt=no-new-privileges:true \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64M \
  ghcr.io/gumieri/nenya:latest

Test it:

curl -H "Authorization: Bearer $(jq -r '.client_token' secrets/client.json)" \
  http://localhost:8080/healthz
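
Once the health check passes, a client talks to the gateway with an OpenAI-compatible request, as shown in the Request Flow diagram. The following sketch builds such a request in Go. The helper name `newChatRequest`, the token value, and the use of `gemini-3-flash` as the model string are assumptions for illustration; use whatever model or agent name your configuration exposes.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// message and chatRequest mirror the minimal OpenAI-style request body.
type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
	Stream   bool      `json:"stream"`
}

// newChatRequest builds a POST /v1/chat/completions request that carries
// the client token as a Bearer credential.
func newChatRequest(baseURL, token, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []message{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// "nk-example" is a placeholder; read the real token from secrets/client.json.
	req, err := newChatRequest("http://localhost:8080", "nk-example", "gemini-3-flash", "Say hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// To send it against a running gateway: resp, err := http.DefaultClient.Do(req)
}
```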

Runtime Configuration

| Variable | Default | Description |
| --- | --- | --- |
| `PORT` | `8080` | Listening port (overrides `server.listen_addr`) |
| `HOST` | (none) | Optional bind address (e.g. `127.0.0.1`). Only used when combined with `PORT` |
| `NENYA_CONFIG_DIR` | `/etc/nenya/` | Configuration directory path |
| `NENYA_CONFIG_FILE` | (none) | Single config file path (takes precedence over `NENYA_CONFIG_DIR`) |
| `NENYA_SECRETS_DIR` | (none) | Secrets directory (overrides `CREDENTIALS_DIRECTORY`) |
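
The precedence rules for `PORT`, `HOST`, and `server.listen_addr` can be read as the small resolution function below. This is one interpretation of the descriptions above, not code from Nenya; the function name `listenAddr` and the fallback to `:8080` when nothing is configured are assumptions.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// listenAddr resolves the address to bind to. configured is the value of
// server.listen_addr from the config file; getenv is injected so the
// function can be exercised without touching the real environment.
func listenAddr(configured string, getenv func(string) string) string {
	port := getenv("PORT")
	if port == "" {
		// HOST on its own is ignored; fall back to the config file.
		if configured == "" {
			return ":8080" // assumed default, matching the documented PORT default
		}
		return configured
	}
	// PORT overrides server.listen_addr; HOST is optional and may be empty.
	return net.JoinHostPort(getenv("HOST"), port)
}

func main() {
	fmt.Println(listenAddr(":8080", os.Getenv))
}
```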

Navigation

Getting Started

Quick Start — Detailed installation and first run
Client Setup — Configure OpenCode, Cursor, and other clients
Deployment — Bare metal (systemd), container, and Kubernetes guides

Core Concepts

Configuration — Full config reference with examples
Providers — All 24 providers, capabilities, and special behaviors
Routing — Latency-aware routing and fallback chains
Architecture — Package overview, request lifecycle, circuit breaker
MCP Integration — Model Context Protocol server integration

Reference

API Endpoints — Endpoint reference with auth requirements (includes /v1/messages for Anthropic clients)
Passthrough Proxy — Raw provider endpoint proxying
Secrets — Systemd credentials, env var fallback, container/K8s deployment
Model Discovery — Dynamic model catalog fetching
Adapters — Provider adapter system
Billing — Billing-aware routing and quota tracking
Caching — Exact-match and semantic caching
Provider Capabilities — Service kinds matrix
Unknown MaxContext — Unknown context window behavior

Operations

Demo — Step-by-step testing of all pipeline tiers
Troubleshooting — Common issues and solutions
FAQ — Frequently asked questions
Security — Security policy and vulnerability reporting

Project

Roadmap — Planned features and improvements
Disclaimer — Legal disclaimer and usage terms

Nenya on GitHub | Apache 2.0 License

