Rafael Gumieri edited this page Jun 15, 2026 · 5 revisions

Nenya AI Gateway

A lightweight, zero-dependency AI API Gateway written in Go. Nenya sits between your AI coding clients and upstream LLM providers, adding secret redaction, context management, agent routing, and MCP tool integration — all with transparent SSE streaming. Security-hardened: non-root execution, mlock for secrets, seccomp + no-new-privileges.

Compatible with any provider that implements the OpenAI Chat Completions API or the Anthropic Messages API. For 24 providers we ship built-in adapters with specialized handling.

Features

Routing & Agents

  • Config-driven provider registry — add providers via JSON, zero code changes
  • 24 built-in providers with specialized adapters for wire format differences
  • Dynamic model discovery — fetches live model catalogs from providers at startup and on reload
  • Model registry — reference models by string shorthand with automatic provider/context resolution
  • Multi-provider model resolution — when a model exists in multiple providers, all are added to the agent's fallback chain
  • Three-tier model resolution — config overrides > discovered models > static registry
  • Per-model wire format — requests to models on multi-format gateways (such as OpenCode Zen) are converted automatically between OpenAI, Anthropic, and Gemini wire formats
  • Agent fallback chains — round-robin or sequential with circuit breaker and automatic failover
  • Latency-aware routing — auto-reorder targets by historical median response time with ±5% jitter to prevent thundering herd
  • Multi-account provider selection — LRU-based account pool with per-account credentials, error classification, and backoff
  • Billing-aware routing — per-account spend tracking, quota polling (API/headers), exhaustion filtering, free model scoring bonuses, and per-agent budget limits
  • Per-agent system prompts — inline or file-based

Security & Privacy

  • Tier-0 regex secret filter — always-on redaction of AWS keys, GitHub tokens, passwords, etc.
  • Pluggable Interceptor Chain — priority-ordered pipeline (regex redaction, entropy filtering, TF-IDF relevance scoring, engine summarization) with per-step metrics
  • 3-Tier content pipeline — pass-through, engine summarization, or TF-IDF relevance-scored truncation
  • Context window compaction — sliding window summarization with configurable engine
  • Stale tool call pruning — compact old assistant+tool response pairs to save tokens
  • Thought pruning — strip reasoning blocks from assistant message history
  • Input validation — strict body limits, JSON sanitization, header filtering
  • Graceful degradation — never blocks requests due to engine or pipeline failures
  • Role-Based Access Control (RBAC) — per-API key roles (admin, user, read-only) with agent and endpoint restrictions
  • Secure memory — mlock-protected token storage, read-only sealing, core dump prevention
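A Tier-0 regex filter can be sketched in a few lines of Go. The three patterns below (an AWS access key ID, a GitHub personal access token, and `password=` / `password:` assignments) and the `[REDACTED]` placeholder are assumptions for illustration; Nenya's actual rule set is not reproduced here.

```go
package main

import (
	"fmt"
	"regexp"
)

// secretPatterns is a small illustrative rule set, not Nenya's built-in list.
// Rules that capture a prefix in group 1 keep that prefix and redact the rest.
var secretPatterns = []*regexp.Regexp{
	regexp.MustCompile(`AKIA[0-9A-Z]{16}`),           // AWS access key ID
	regexp.MustCompile(`ghp_[A-Za-z0-9]{36}`),         // GitHub personal access token
	regexp.MustCompile(`(?i)(password\s*[:=]\s*)\S+`), // password=... / password: ...
}

// redact replaces every match with a placeholder. "${1}" expands to the
// captured prefix when a rule has one and to "" when it does not.
func redact(s string) string {
	for _, re := range secretPatterns {
		s = re.ReplaceAllString(s, "${1}[REDACTED]")
	}
	return s
}

func main() {
	fmt.Println(redact("export AWS_ACCESS_KEY_ID=AKIAABCDEFGHIJKLMNOP password: hunter2"))
	// prints: export AWS_ACCESS_KEY_ID=[REDACTED] password: [REDACTED]
}
```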

Hardening (Deployment Security)

  • Secure memory (default): All tokens stored in mlock-protected RAM, sealed read-only after init, core dumps disabled
  • Non-root execution — runs as UID 65532 with dropped capabilities
  • Memory protection: LimitMEMLOCK=infinity and LimitCORE=0 in systemd
  • Read-only filesystem — immutable root + private /tmp
  • Seccomp + no-new-privileges — restricted syscalls, prevents privilege escalation
  • Zero-trust secrets — loaded via systemd credentials or container mounts, never written to disk
  • Socket activation — seamless restarts with zero dropped connections

Reliability

  • Zero external dependencies — Go standard library only
  • Hot reload: systemctl reload nenya for zero-downtime config changes
  • Circuit breaker — per agent+provider+model with automatic failover, exponential backoff, and semantic error classification
  • Rate limiting — per upstream host (RPM/TPM) with per-provider overrides
  • Response cache — in-memory LRU with SHA-256 fingerprinting
  • Context-Limit Auto-Retry — automatic summarization via engine and retry on upstream context-length errors
  • Local Engine Lifecycle — preload and unload local Ollama models with LRU eviction
  • Graceful shutdown — 30s grace period for in-flight requests, MCP client cleanup

MCP Tool Integration

  • Tool discovery — connect to MCP servers for automatic tool injection
  • Multi-turn execution — intercept tool calls, execute against MCP servers, forward results
  • Auto-search — pre-fetch relevant context from MCP servers before forwarding
  • Auto-save — persist assistant responses to MCP memory servers

Request Flow

+----------------------------------------------+
| Client (Cursor / OpenCode / Aider / etc.)    |
| OpenAI-compatible request                    |
| POST /v1/chat/completions + Bearer token     |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Nenya Gateway                                |
| - auth check                                 |
| - parse JSON + extract model                 |
| - resolve agent/provider                     |
| - optional cache (HIT => replay cached       |
|   response in correct format)                |
| - optional MCP context/tool injection        |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Privacy / Context Pipeline (best-effort)     |
| - Tier-0 regex + entropy secret redaction    |
| - compaction / pruning / window mgmt         |
| - engine summarize (usually local Ollama)    |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Routing                                      |
|  A) Standard forwarding                      |
|     - fallback chain + circuit breaker + RL  |
|  B) MCP multi-turn tool loop (if enabled)    |
|     - buffer SSE, execute MCP tools, re-send |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Upstream LLM Providers                       |
| Anthropic | Gemini | DeepSeek | Mistral | ...|
+----------------------------------------------+
                        |
                        |  SSE stream
                        v
+----------------------------------------------+
| Nenya SSE Pipeline                           |
| - adapter response transforms                |
| - usage accounting + stream filter           |
| - flush + (optional) cache capture           |
| - (optional) MCP auto-save                   |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Client gets transparent SSE or JSON output   |
+----------------------------------------------+

Quick Start

1. Install

curl -fsSL https://raw.githubusercontent.com/gumieri/nenya/main/install.sh | sudo sh

2. Run with Podman

Create minimal config and secrets:

mkdir -p config secrets
cat > config/config.json << 'EOF'
{
  "server": { "listen_addr": ":8080" },
  "agents": {
    "default": {
      "strategy": "fallback",
      "models": ["gemini-3-flash"]
    }
  }
}
EOF

cat > secrets/provider_keys.json << 'EOF'
{
  "provider_keys": {
    "gemini": "AIza..."
  }
}
EOF

# EOF is left unquoted here so that $(openssl rand -hex 32) is actually expanded
cat > secrets/client.json << EOF
{
  "client_token": "nk-$(openssl rand -hex 32)"
}
EOF

Run the container:

podman run -d \
  --name nenya \
  -p 8080:8080 \
  -v ./config:/etc/nenya:ro \
  -v ./secrets:/run/secrets/nenya:ro \
  -e NENYA_SECRETS_DIR=/run/secrets/nenya \
  --cap-drop=ALL \
  --cap-add=IPC_LOCK \
  --security-opt=no-new-privileges:true \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64M \
  ghcr.io/gumieri/nenya:latest

Test it:

curl -H "Authorization: Bearer $(jq -r '.client_token' secrets/client.json)" \
  http://localhost:8080/healthz

Runtime Configuration

Variable            Default       Description
PORT                8080          Listening port (overrides server.listen_addr)
HOST                (none)        Optional bind address (e.g. 127.0.0.1); only used together with PORT
NENYA_CONFIG_DIR    /etc/nenya/   Configuration directory path
NENYA_CONFIG_FILE   (none)        Single config file path (takes precedence over NENYA_CONFIG_DIR)
NENYA_SECRETS_DIR   (none)        Secrets directory (overrides CREDENTIALS_DIRECTORY)
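The precedence rules in this table can be expressed as a few pure functions. This is a sketch of the documented behavior, not Nenya's source: the function and type names are hypothetical, `getenv` is injected so the rules can be tested without touching the real environment, and binding all interfaces (`:PORT`) when `PORT` is set without `HOST` is an assumption. `CREDENTIALS_DIRECTORY` is the variable systemd sets for units that load credentials.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// resolveListenAddr applies the PORT/HOST overrides to server.listen_addr.
// HOST is ignored unless PORT is also set.
func resolveListenAddr(configAddr string, getenv func(string) string) string {
	port := getenv("PORT")
	if port == "" {
		return configAddr
	}
	// An empty HOST yields ":PORT", i.e. all interfaces (assumption).
	return net.JoinHostPort(getenv("HOST"), port)
}

// configSource says where configuration is read from: a single file when
// NENYA_CONFIG_FILE is set, otherwise a directory.
type configSource struct {
	File string
	Dir  string
}

func resolveConfigSource(getenv func(string) string) configSource {
	if f := getenv("NENYA_CONFIG_FILE"); f != "" {
		return configSource{File: f}
	}
	if d := getenv("NENYA_CONFIG_DIR"); d != "" {
		return configSource{Dir: d}
	}
	return configSource{Dir: "/etc/nenya/"}
}

// resolveSecretsDir prefers NENYA_SECRETS_DIR and falls back to systemd's
// CREDENTIALS_DIRECTORY; it returns "" when neither is set.
func resolveSecretsDir(getenv func(string) string) string {
	if d := getenv("NENYA_SECRETS_DIR"); d != "" {
		return d
	}
	return getenv("CREDENTIALS_DIRECTORY")
}

func main() {
	fmt.Println("listen: ", resolveListenAddr(":8080", os.Getenv))
	fmt.Printf("config:  %+v\n", resolveConfigSource(os.Getenv))
	fmt.Println("secrets:", resolveSecretsDir(os.Getenv))
}
```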

Navigation

Getting Started

  • Quick Start — Detailed installation and first run
  • Client Setup — Configure OpenCode, Cursor, and other clients
  • Deployment — Bare metal (systemd), container, and Kubernetes guides

Core Concepts

  • Configuration — Full config reference with examples
  • Providers — All 24 providers, capabilities, and special behaviors
  • Routing — Latency-aware routing and fallback chains
  • Architecture — Package overview, request lifecycle, circuit breaker
  • MCP Integration — Model Context Protocol server integration

Reference

Operations

  • Demo — Step-by-step testing of all pipeline tiers
  • Troubleshooting — Common issues and solutions
  • FAQ — Frequently asked questions
  • Security — Security policy and vulnerability reporting

Project

  • Roadmap — Planned features and improvements
  • Disclaimer — Legal disclaimer and usage terms

Nenya on GitHub | Apache 2.0 License
