Home
Rafael Gumieri edited this page Jun 15, 2026 · 5 revisions
A lightweight, zero-dependency AI API Gateway written in Go. Nenya sits between your AI coding clients and upstream LLM providers, adding secret redaction, context management, agent routing, and MCP tool integration — all with transparent SSE streaming. Security-hardened: non-root execution, mlock for secrets, seccomp + no-new-privileges.
Compatible with any provider that implements the OpenAI Chat Completions API or the Anthropic Messages API. Built-in adapters with specialized handling ship for 24 providers.
- Config-driven provider registry — add providers via JSON, zero code changes
- 24 built-in providers with specialized adapters for wire format differences
- Dynamic model discovery — fetches live model catalogs from providers at startup and on reload
- Model registry — reference models by string shorthand with automatic provider/context resolution
- Multi-provider model resolution — when a model exists in multiple providers, all are added to the agent's fallback chain
- Three-tier model resolution — config overrides > discovered models > static registry
- Per-model wire format — models from multi-format gateways (OpenCode Zen) auto-convert between OpenAI, Anthropic, and Gemini wire formats
- Agent fallback chains — round-robin or sequential with circuit breaker and automatic failover
- Latency-aware routing — auto-reorder targets by historical median response time with ±5% jitter to prevent thundering herd
- Multi-account provider selection — LRU-based account pool with per-account credential, error classification, and backoff
- Billing-aware routing — per-account spend tracking, quota polling (API/headers), exhaustion filtering, free model scoring bonuses, and per-agent budget limits
- Per-agent system prompts — inline or file-based
- Tier-0 regex secret filter — always-on redaction of AWS keys, GitHub tokens, passwords, etc.
- Pluggable Interceptor Chain — priority-ordered pipeline (regex redaction, entropy filtering, TF-IDF relevance scoring, engine summarization) with per-step metrics
- 3-Tier content pipeline — pass-through, engine summarization, or TF-IDF relevance-scored truncation
- Context window compaction — sliding window summarization with configurable engine
- Stale tool call pruning — compact old assistant+tool response pairs to save tokens
- Thought pruning — strip reasoning blocks from assistant message history
- Input validation — strict body limits, JSON sanitization, header filtering
- Graceful degradation — never blocks requests due to engine or pipeline failures
- Role-Based Access Control (RBAC) — per-API key roles (admin, user, read-only) with agent and endpoint restrictions
- Secure memory — mlock-protected token storage, read-only sealing, core dump prevention
- Secure memory (default): All tokens stored in mlock-protected RAM, sealed read-only after init, core dumps disabled
- Non-root execution — runs as UID 65532 with dropped capabilities
- Memory protection — `LimitMEMLOCK=infinity` and `LimitCORE=0` in systemd
- Read-only filesystem — immutable root + private `/tmp`
- Seccomp + no-new-privileges — restricted syscalls, prevents privilege escalation
- Zero-trust secrets — loaded via systemd credentials or container mounts, never to disk
- Socket activation — seamless restarts with zero dropped connections
- Zero external dependencies — Go standard library only
- Hot reload — `systemctl reload nenya` for zero-downtime config changes
- Circuit breaker — per agent+provider+model with automatic failover, exponential backoff, and semantic error classification
- Rate limiting — per upstream host (RPM/TPM) with per-provider overrides
- Response cache — in-memory LRU with SHA-256 fingerprinting
- Context-Limit Auto-Retry — automatic summarization via engine and retry on upstream context-length errors
- Local Engine Lifecycle — preload and unload local Ollama models with LRU eviction
- Graceful shutdown — 30s grace period for in-flight requests, MCP client cleanup
- Tool discovery — connect to MCP servers for automatic tool injection
- Multi-turn execution — intercept tool calls, execute against MCP servers, forward results
- Auto-search — pre-fetch relevant context from MCP servers before forwarding
- Auto-save — persist assistant responses to MCP memory servers
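To make the latency-aware routing item above more concrete, here is a minimal Go sketch of reordering targets by jittered median latency. It is an illustration under stated assumptions, not Nenya's implementation: the `target` type, `median`, `orderByLatency`, and the millisecond sample slices are hypothetical names, and sorting targets without any history first is an arbitrary choice made for the sketch.

```go
package main

import "sort"

// target is a hypothetical stand-in for one entry in an agent's fallback chain.
type target struct {
	Name        string
	LatenciesMs []float64 // recent response times in milliseconds
}

// median returns the median of samples, or 0 when there are none.
func median(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...) // copy so the caller's slice is untouched
	sort.Float64s(sorted)
	mid := len(sorted) / 2
	if len(sorted)%2 == 1 {
		return sorted[mid]
	}
	return (sorted[mid-1] + sorted[mid]) / 2
}

// orderByLatency returns a copy of targets sorted by median latency, where each
// median is scaled by a random factor in [1-jitter, 1+jitter). random must
// return values in [0, 1), for example rand.Float64 from math/rand.
func orderByLatency(targets []target, jitter float64, random func() float64) []target {
	type scored struct {
		t     target
		score float64
	}
	ranked := make([]scored, len(targets))
	for i, t := range targets {
		factor := 1 + jitter*(2*random()-1)
		ranked[i] = scored{t: t, score: median(t.LatenciesMs) * factor}
	}
	sort.SliceStable(ranked, func(i, j int) bool { return ranked[i].score < ranked[j].score })

	out := make([]target, len(ranked))
	for i, s := range ranked {
		out[i] = s.t
	}
	return out
}
```

Calling `orderByLatency(targets, 0.05, rand.Float64)` corresponds to the ±5% jitter mentioned in the list. Targets whose medians differ by more than the jitter range keep a stable order, while near-equal targets trade places between requests, which spreads load instead of sending every client to the same upstream.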
+----------------------------------------------+
| Client (Cursor / OpenCode / Aider / etc.) |
| OpenAI-compatible request |
| POST /v1/chat/completions + Bearer token |
+----------------------------------------------+
|
v
+----------------------------------------------+
| Nenya Gateway |
| - auth check |
| - parse JSON + extract model |
| - resolve agent/provider |
| - optional cache (HIT => replay cached response in correct format) |
| - optional MCP context/tool injection |
+----------------------------------------------+
|
v
+----------------------------------------------+
| Privacy / Context Pipeline (best-effort) |
| - Tier-0 regex + entropy secret redaction |
| - compaction / pruning / window mgmt |
| - engine summarize (usually local Ollama) |
+----------------------------------------------+
|
v
+----------------------------------------------+
| Routing |
| A) Standard forwarding |
| - fallback chain + circuit breaker + RL |
| B) MCP multi-turn tool loop (if enabled) |
| - buffer SSE, execute MCP tools, re-send |
+----------------------------------------------+
|
v
+----------------------------------------------+
| Upstream LLM Providers |
| Anthropic | Gemini | DeepSeek | Mistral | ...|
+----------------------------------------------+
|
| SSE stream
v
+----------------------------------------------+
| Nenya SSE Pipeline |
| - adapter response transforms |
| - usage accounting + stream filter |
| - flush + (optional) cache capture |
| - (optional) MCP auto-save |
+----------------------------------------------+
|
v
+----------------------------------------------+
| Client receives transparent SSE or JSON output |
+----------------------------------------------+
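The regex redaction step in the pipeline above can be pictured with a small Go sketch. The patterns are illustrative assumptions based on the commonly seen shapes of AWS access key IDs and classic GitHub personal access tokens; Nenya's actual rule set and placeholder text are not shown on this page, so `rule`, `rules`, `redact`, and the `[REDACTED]` marker are hypothetical.

```go
package main

import "regexp"

// rule pairs a pattern with the text that replaces each match.
type rule struct {
	re   *regexp.Regexp
	repl string
}

// rules are illustrative only; they are not Nenya's actual patterns.
var rules = []rule{
	// AWS access key ID shape: "AKIA" followed by 16 uppercase letters or digits.
	{regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`), "[REDACTED]"},
	// Classic GitHub personal access token shape: "ghp_" followed by 36 alphanumerics.
	{regexp.MustCompile(`\bghp_[A-Za-z0-9]{36}\b`), "[REDACTED]"},
	// Password assignments: keep the "password=" prefix (group 1), hide the value.
	{regexp.MustCompile(`(?i)\b(password\s*[:=]\s*)\S+`), "${1}[REDACTED]"},
}

// redact applies every rule in order and returns the scrubbed text.
func redact(text string) string {
	for _, r := range rules {
		text = r.re.ReplaceAllString(text, r.repl)
	}
	return text
}
```

Under these assumptions, `redact("password: hunter2")` yields `password: [REDACTED]`, and text without a match passes through unchanged.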
curl -fsSL https://raw.githubusercontent.com/gumieri/nenya/main/install.sh | sudo sh
Create minimal config and secrets:
mkdir -p config secrets
cat > config/config.json << 'EOF'
{
  "server": { "listen_addr": ":8080" },
  "agents": {
    "default": {
      "strategy": "fallback",
      "models": ["gemini-3-flash"]
    }
  }
}
EOF
cat > secrets/provider_keys.json << 'EOF'
{
  "provider_keys": {
    "gemini": "AIza..."
  }
}
EOF
cat > secrets/client.json << EOF
{
  "client_token": "nk-$(openssl rand -hex 32)"
}
EOF
Run the container:
podman run -d \
--name nenya \
-p 8080:8080 \
-v ./config:/etc/nenya:ro \
-v ./secrets:/run/secrets/nenya:ro \
-e NENYA_SECRETS_DIR=/run/secrets/nenya \
--cap-drop=ALL \
--cap-add=IPC_LOCK \
--security-opt=no-new-privileges:true \
--read-only \
--tmpfs /tmp:rw,noexec,nosuid,size=64M \
ghcr.io/gumieri/nenya:latest
Test it:
curl -H "Authorization: Bearer $(jq -r '.client_token' secrets/client.json)" \
  http://localhost:8080/healthz

| Variable | Default | Description |
|---|---|---|
| `PORT` | `8080` | Listening port (overrides `server.listen_addr`) |
| `HOST` | — | Optional bind address (e.g. `127.0.0.1`). Only used when combined with `PORT` |
| `NENYA_CONFIG_DIR` | `/etc/nenya/` | Configuration directory path |
| `NENYA_CONFIG_FILE` | — | Single config file path (takes precedence over `NENYA_CONFIG_DIR`) |
| `NENYA_SECRETS_DIR` | — | Secrets directory (overrides `CREDENTIALS_DIRECTORY`) |
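
The precedence rules in the table can be read as the following Go sketch. It is one interpretation rather than Nenya's code: `resolveListenAddr` and `resolveConfigPath` are hypothetical helpers, and falling back to the config file's `listen_addr` (then `:8080`) when `PORT` is unset is an assumption about how the default applies.

```go
package main

import "net"

// resolveListenAddr picks the listen address. PORT overrides the configured
// address, and HOST is only consulted when PORT is set.
// Assumption: with PORT unset, the config value wins, then ":8080".
func resolveListenAddr(configAddr string, getenv func(string) string) string {
	port := getenv("PORT")
	if port == "" {
		if configAddr != "" {
			return configAddr
		}
		return ":8080"
	}
	// An empty HOST yields ":PORT", which binds all interfaces.
	return net.JoinHostPort(getenv("HOST"), port)
}

// resolveConfigPath returns the config location and whether it is a single file.
// NENYA_CONFIG_FILE takes precedence over NENYA_CONFIG_DIR.
func resolveConfigPath(getenv func(string) string) (path string, isFile bool) {
	if file := getenv("NENYA_CONFIG_FILE"); file != "" {
		return file, true
	}
	if dir := getenv("NENYA_CONFIG_DIR"); dir != "" {
		return dir, false
	}
	return "/etc/nenya/", false
}
```

Passing `os.Getenv` as `getenv` reads the real environment. Taking the lookup as a parameter keeps the precedence logic easy to exercise with a plain map.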
- Quick Start — Detailed installation and first run
- Client Setup — Configure OpenCode, Cursor, and other clients
- Deployment — Bare metal (systemd), container, and Kubernetes guides
- Configuration — Full config reference with examples
- Providers — All 24 providers, capabilities, and special behaviors
- Routing — Latency-aware routing and fallback chains
- Architecture — Package overview, request lifecycle, circuit breaker
- MCP Integration — Model Context Protocol server integration
- API Endpoints — Endpoint reference with auth requirements (includes `/v1/messages` for Anthropic clients)
- Passthrough Proxy — Raw provider endpoint proxying
- Secrets — Systemd credentials, env var fallback, container/K8s deployment
- Model Discovery — Dynamic model catalog fetching
- Adapters — Provider adapter system
- Billing — Billing-aware routing and quota tracking
- Caching — Exact-match and semantic caching
- Provider Capabilities — Service kinds matrix
- Unknown MaxContext — Unknown context window behavior
- Demo — Step-by-step testing of all pipeline tiers
- Troubleshooting — Common issues and solutions
- FAQ — Frequently asked questions
- Security — Security policy and vulnerability reporting
- Roadmap — Planned features and improvements
- Disclaimer — Legal disclaimer and usage terms
Nenya on GitHub | Apache 2.0 License