**Changes from all commits** — 37 commits
- `9473d09` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `c608efc` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `046ac3b` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `7b4cbbe` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `a0551ff` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `a36312e` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `d70e2c4` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `489a28a` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `53a7035` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `3b078d0` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `4b4d249` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `70ed186` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `e03dfcc` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `b538176` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `f021fd6` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `3b54d3f` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `e1dd56d` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `800c4f9` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `f13e641` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `fbcf12d` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `a93b377` docs(readme): add quickstart pointer, update project layout, fix benc… — levleontiev, Mar 17, 2026
- `a4ad21c` fix(quickstart): fix review issues — reverse_proxy mode, correct conf… — levleontiev, Mar 18, 2026
- `411c6c7` fix(quickstart): fix review issues — reverse_proxy mode, correct conf… — levleontiev, Mar 18, 2026
- `288c9c7` fix(quickstart): fix review issues — reverse_proxy mode, correct conf… — levleontiev, Mar 18, 2026
- `80365c9` fix(quickstart): fix review issues — reverse_proxy mode, correct conf… — levleontiev, Mar 18, 2026
- `e778599` fix(quickstart): fix review issues — reverse_proxy mode, correct conf… — levleontiev, Mar 18, 2026
- `399cd93` fix(quickstart): fix review issues — reverse_proxy mode, correct conf… — levleontiev, Mar 18, 2026
- `e327a0c` fix(quickstart): fix review issues — reverse_proxy mode, correct conf… — levleontiev, Mar 18, 2026
- `ee2ab17` fix(quickstart): fix review issues — reverse_proxy mode, correct conf… — levleontiev, Mar 18, 2026
- `348789e` refactor: replace provider-failover recipe with circuit-breaker — Mar 18, 2026
- `58fa21e` Merge branch 'main' into feature/issue-32-quickstart — levleontiev, Mar 19, 2026
- `7a13c7f` docs: add wrapper mode to README + integration links in comparison se… — levleontiev, Mar 19, 2026
- `480a409` docs: rewrite LLM token budget section to showcase wrapper mode — levleontiev, Mar 19, 2026
- `c297873` docs: wrapper mode selector pathPrefix "/" covers all providers — levleontiev, Mar 19, 2026
- `989cc04` docs: replace ASCII architecture diagrams with Mermaid sequence diagrams — levleontiev, Mar 19, 2026
- `a2bb9da` docs: fix JWT wording — Fairvisor parses claims, does not validate si… — levleontiev, Mar 19, 2026
- `4759b72` trim README: remove benchmark methodology, Contributing section; fix … — levleontiev, Mar 19, 2026
179 changes: 112 additions & 67 deletions README.md
@@ -46,7 +46,6 @@
- [CLI](#cli)
- [SaaS control plane (optional)](#saas-control-plane-optional)
- [Project layout](#project-layout)
- [License](#license)

---
@@ -84,16 +83,22 @@ If you have an existing gateway, the question is whether Fairvisor adds anything

**If nginx `limit_req` is enough for you**, use it. It has zero overhead and is the right tool for simple per-IP global throttling. Fairvisor becomes relevant when you need per-tenant awareness, JWT-claim-based bucketing, or cost/token tracking that `limit_req` has no model for.

**If you are already running Kong**, the built-in rate limiting plugin stores counters in Redis or Postgres — every decision is a network call. Fairvisor can run alongside Kong as an `auth_request` decision service with no external state. See [Kong / Traefik integration →](https://docs.fairvisor.com/docs/gateway/)

**If you are running Envoy**, the [global rate limit service](https://github.com/envoyproxy/ratelimit) requires deploying a separate Redis-backed service with its own config language. Fairvisor is one container, one JSON file, and integrates via `ext_authz` in the same position. See [Envoy ext_authz integration →](https://docs.fairvisor.com/docs/gateway/envoy/)

**If you are on Cloudflare or Akamai**, per-JWT-claim limits, LLM token budgets, and cost caps are not in the platform's model. If your limits are tenant-aware or cost-aware, you need something that runs in your own stack.

Fairvisor integrates *alongside* Kong, nginx, and Envoy — it is not a replacement. See [nginx auth_request →](https://docs.fairvisor.com/docs/gateway/nginx/) · [Envoy ext_authz →](https://docs.fairvisor.com/docs/gateway/envoy/) · [Kong / Traefik →](https://docs.fairvisor.com/docs/gateway/) for integration patterns.

## Quick start

> **Runnable quickstart:** `examples/quickstart/` — `docker compose up -d` and run your first enforce/reject test in under a minute. See [`examples/quickstart/README.md`](examples/quickstart/README.md).
>
> **Recipes:** `examples/recipes/` — deployable team budgets, runaway agent guard, and circuit-breaker examples.
>
> **Sample artifacts:** `fixtures/` — canonical request/response fixtures for enforce, reject (TPM, TPD, prompt-too-large), and provider-native error bodies (OpenAI, Anthropic, Gemini).

### 1. Create a policy

@@ -156,11 +161,15 @@ curl -s -w "\nHTTP %{http_code}\n" \

## LLM token budget in 30 seconds

The fastest path is **wrapper mode**: Fairvisor sits in front of the LLM API, enforces budgets, and strips the upstream key from the client. No gateway changes needed — just point your client at Fairvisor instead of OpenAI.

**1. Policy** — one rule, per-org TPM + daily cap:

```json
{
"id": "llm-budget",
"spec": {
"selector": { "pathPrefix": "/" },
"mode": "enforce",
"rules": [
{
        "key": "jwt:org_id",
        "tokens_per_minute": 60000,
        "tokens_per_day": 1200000
      }
    ]
  }
}
```

**2. Call the API** — token format `Bearer <client-jwt>:<upstream-key>`:

```bash
curl https://your-fairvisor-host/openai/v1/chat/completions \
  -H "Authorization: Bearer eyJhbGc...:sk-proj-..." \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'
```

Fairvisor parses the JWT claims (no signature validation — the JWT is trusted as-is), extracts `org_id`, charges tokens against the budget, strips the `Authorization` header, and forwards with the upstream key. The upstream never sees the client JWT.
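The token handling described above can be sketched in a few lines of plain shell — an illustrative sketch only, not Fairvisor source: split the composite token on the first `:`, then base64url-decode the JWT payload segment (no signature check). The claim values and fake JWT are made up for the example.

```shell
# Illustrative sketch (not Fairvisor source): split the composite Bearer token
# and read the org_id claim from the JWT payload -- no signature validation.

# Build a fake JWT so the example is self-contained (payload is base64url).
payload=$(printf '{"org_id":"acme","user_id":"u1"}' | base64 | tr -d '=\n' | tr '+/' '-_')
composite="hdr.$payload.sig:sk-proj-abc123"   # <client-jwt>:<upstream-key>

client_jwt="${composite%%:*}"                 # part before the first ':'
upstream_key="${composite#*:}"                # part after the first ':'

# Extract the JWT payload segment and re-pad base64url before decoding.
claims_b64=$(printf '%s' "$client_jwt" | cut -d. -f2 | tr '_-' '/+')
case $(( ${#claims_b64} % 4 )) in
  2) claims_b64="$claims_b64==" ;;
  3) claims_b64="$claims_b64=" ;;
esac
claims=$(printf '%s' "$claims_b64" | base64 -d)

echo "claims: $claims"                        # claims: {"org_id":"acme","user_id":"u1"}
echo "upstream auth: Bearer $upstream_key"    # upstream auth: Bearer sk-proj-abc123
```

The upstream key itself never needs decoding — it is forwarded as-is once the client JWT has been stripped.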

When the budget is exhausted:

```http
HTTP/1.1 429 Too Many Requests
X-Fairvisor-Reason: tpm_exceeded
Retry-After: 12
RateLimit-Limit: 60000
RateLimit-Remaining: 0
```

Each organization gets its own independent 60k TPM / 1.2M TPD budget. Works with OpenAI, Anthropic, Azure OpenAI, Mistral, and any OpenAI-compatible endpoint.

The selector matches the incoming wrapper path. Use `pathPrefix: "/"` to cover all providers, or `pathPrefix: "/openai"` to limit to one provider only.

> **Decision service / reverse proxy mode:** if you already have a gateway, use `selector: { "pathPrefix": "/v1/chat" }` and call `POST /v1/decision` from your existing `auth_request` or `ext_authz` hook instead.

## How a request flows

**Decision service mode** — Fairvisor runs as a sidecar. Your existing gateway calls `/v1/decision` via `auth_request` (nginx) or `ext_authz` (Envoy) and handles forwarding itself.
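A minimal nginx hookup for this mode might look like the sketch below. It is illustrative only: the upstream names (`fairvisor:8080`, `upstream_backend`) are assumptions, while the `X-Original-Method` / `X-Original-URI` headers follow the decision endpoint contract described in this README.

```nginx
location / {
    auth_request /_fairvisor_decision;   # consult Fairvisor before proxying
    proxy_pass http://upstream_backend;
}

location = /_fairvisor_decision {
    internal;
    proxy_pass http://fairvisor:8080/v1/decision;
    proxy_method POST;                   # decision endpoint expects POST
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
    proxy_set_header X-Original-Method $request_method;
    proxy_set_header X-Original-URI $request_uri;
}
```

Note that stock `auth_request` only understands 2xx/401/403 from the subrequest; surfacing Fairvisor's 429 and rejection headers to the client typically needs an extra `error_page` hook — see the nginx integration docs linked above.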

**Reverse proxy mode** — Fairvisor sits inline. Traffic arrives at Fairvisor directly, gets evaluated, and is proxied to the upstream if allowed. No separate gateway needed.

**Wrapper mode** — Fairvisor acts as a transparent LLM proxy. Clients send requests to Fairvisor's OpenAI-compatible endpoint (`/openai/v1/chat/completions`, `/anthropic/v1/messages`, `/gemini/v1/generateContent`). Fairvisor enforces token budgets and cost limits, strips the client auth header, injects the upstream API key, and forwards the request. No changes needed in the client — swap the base URL and you're done.

All three modes use the same policy bundle and return the same rejection headers.

When a request is rejected:

@@ -206,40 +237,72 @@ Headers follow [RFC 9333 RateLimit Fields](https://www.rfc-editor.org/rfc/rfc9333)

### Architecture

**Decision service mode** — sidecar: your gateway calls `/v1/decision`, handles forwarding itself.

```mermaid
sequenceDiagram
participant C as Client
participant G as Your Gateway<br/>(nginx / Envoy / Kong)
participant F as Fairvisor Edge<br/>decision_service
participant U as Upstream service

C->>G: Request
G->>F: POST /v1/decision<br/>(auth_request / ext_authz)
alt allow
F-->>G: 204 No Content
G->>U: Forward request
U-->>G: Response
G-->>C: Response
else reject
F-->>G: 429 + RateLimit headers
G-->>C: 429 Too Many Requests
end
```

No Redis, no external state — all counters live in `ngx.shared.dict`.

**Reverse proxy mode** — inline: Fairvisor handles both enforcement and proxying.

```mermaid
sequenceDiagram
participant C as Client
participant F as Fairvisor Edge<br/>reverse_proxy
participant U as Upstream service

C->>F: Request
alt allow
F->>U: Forward request
U-->>F: Response
F-->>C: Response
else reject
F-->>C: 429 + RFC 9333 headers
end
```

**Wrapper mode** — transparent LLM proxy: swap the base URL, no other client changes needed.

```mermaid
sequenceDiagram
participant C as Client
participant F as Fairvisor Edge<br/>wrapper
participant U as Upstream LLM<br/>(OpenAI / Anthropic / Gemini)

C->>F: POST /openai/v1/chat/completions<br/>Authorization: Bearer CLIENT_JWT:UPSTREAM_KEY
F->>F: 1. Parse JWT claims (org_id, user_id)
F->>F: 2. Enforce TPM / TPD / cost budget
alt budget ok
F->>U: POST /v1/chat/completions<br/>Authorization: Bearer UPSTREAM_KEY
U-->>F: 200 OK + token usage
F->>F: 3. Count tokens · refund unused reservation
F-->>C: 200 OK (Authorization stripped from reply)
else budget exceeded
F-->>C: 429 X-Fairvisor-Reason: tpm_exceeded
end
```

Supported upstream paths: `/openai/*`, `/anthropic/*`, `/gemini/*`, `/grok/*`.

All three modes use the same policy bundle and produce the same rejection headers.

## Enforcement capabilities

@@ -274,17 +337,6 @@ Policies are versioned JSON — commit them to Git, review changes in PRs, roll

## Performance


### Latest measured latency @ 10,000 RPS

| Percentile | Decision service | Reverse proxy | Raw nginx (baseline) |
|---|---|---|---|

@@ -305,7 +357,8 @@

**No external datastore.** All enforcement state lives in in-process shared memory (`ngx.shared.dict`). No Redis, no Postgres, no network round-trips in the decision path.

Reproduce: see [fairvisor/benchmark](https://github.com/fairvisor/benchmark) — the canonical benchmark source of truth for Fairvisor Edge performance numbers.


## Deployment

@@ -349,25 +402,16 @@ If the SaaS is unreachable, the edge keeps enforcing with the last-known policy
## Project layout

```
src/fairvisor/ runtime modules (OpenResty/LuaJIT)
cli/ command-line tooling
spec/ unit and integration tests (busted)
tests/e2e/ Docker-based E2E tests (pytest)
examples/quickstart/ runnable quickstart (docker compose up -d)
examples/recipes/ deployable policy recipes (team budgets, agent guard, failover)
fixtures/ canonical request/response sample artifacts
helm/ Helm chart
docker/ Docker artifacts
docs/ reference documentation
```

## License
@@ -377,3 +421,4 @@
---

**Docs:** [docs.fairvisor.com](https://docs.fairvisor.com/docs/) · **Website:** [fairvisor.com](https://fairvisor.com) · **Quickstart:** [5 minutes to enforcement](https://docs.fairvisor.com/docs/quickstart/)

108 changes: 108 additions & 0 deletions examples/quickstart/README.md
@@ -0,0 +1,108 @@
# Fairvisor Edge — Quickstart

Go from `git clone` to working policy enforcement in one step.

## Prerequisites

- Docker with Compose V2 (`docker compose version`)
- Port 8080 free on localhost

## Start

```bash
docker compose up -d
```

Wait for the edge service to report healthy:

```bash
docker compose ps
# edge should show "healthy"
```

## Verify enforcement

This quickstart runs in `FAIRVISOR_MODE=reverse_proxy`. Requests to `/v1/*`
are enforced by the TPM policy and forwarded to a local mock LLM backend.
No real API keys are required.

**Allowed request** — should return `200`:

```bash
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d @../../fixtures/normal_request.json
```

Expected response body shape matches `../../fixtures/allow_response.json`.

**Over-limit request** — should return `429`:

```bash
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d @../../fixtures/over_limit_request.json
```

Expected response body shape: `../../fixtures/reject_tpm_exceeded.json`.
The response will also include:
- `X-Fairvisor-Reason: tpm_exceeded`
- `Retry-After: 60`
- `RateLimit-Limit: 100` (matches the quickstart policy `tokens_per_minute`)
- `RateLimit-Remaining: 0`

## How the policy works

The quickstart policy (`policy.json`) enforces a TPM limit keyed on `ip:address`:

- `tokens_per_minute: 100` — allows roughly 2 small requests per minute
- `tokens_per_day: 1000` — daily cap
- `default_max_completion: 50` — pessimistic reservation per request when `max_tokens` is not set

Sending `over_limit_request.json` (which sets `max_tokens: 200000`) immediately
exceeds the 100-token per-minute budget and triggers a `429`.
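The budget arithmetic can be checked with a few lines of shell. This is a sketch under assumed semantics inferred from the policy description above (each request reserves `max_tokens` when the body sets it, otherwise `default_max_completion`) — it is not Fairvisor source.

```shell
# Assumed reservation semantics for the quickstart policy (sketch, not source):
# a request reserves max_tokens if the body sets it, else default_max_completion.
tpm_limit=100
default_max_completion=50

reserve() { echo "${1:-$default_max_completion}"; }   # $1 = request's max_tokens, "" if unset

used=0
for req in "" ""; do                    # two small requests without max_tokens
  used=$(( used + $(reserve "$req") ))
done
echo "after two small requests: $used/$tpm_limit tokens reserved"   # 100/100

big=$(reserve 200000)                   # over_limit_request.json sets max_tokens: 200000
if [ $(( used + big )) -gt "$tpm_limit" ]; then
  echo "429 X-Fairvisor-Reason: tpm_exceeded"
fi
```

This is why the second "allowed" request already exhausts the minute budget, and why the over-limit fixture is rejected immediately rather than after streaming any tokens.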

## Wrapper mode (real provider routing)

Wrapper mode routes requests to real upstream providers using provider-prefixed paths
and a composite Bearer token. It requires real provider API keys and cannot be
demonstrated with this mock stack.

**Path and auth format:**

```
POST /openai/v1/chat/completions
Authorization: Bearer CLIENT_JWT:UPSTREAM_KEY
```

Where:
- `CLIENT_JWT` — signed JWT identifying the calling client/tenant (used for policy enforcement)
- `UPSTREAM_KEY` — real upstream API key forwarded to the provider (e.g. `sk-...` for OpenAI)

Fairvisor strips the composite header, injects the correct provider auth before forwarding,
and **never returns upstream auth headers to the caller**
(see `../../fixtures/allow_response.json`).

**Provider-prefixed paths:**

| Path prefix | Upstream | Auth header injected |
|---|---|---|
| `/openai/v1/...` | `https://api.openai.com/v1/...` | `Authorization: Bearer UPSTREAM_KEY` |
| `/anthropic/v1/...` | `https://api.anthropic.com/v1/...` | `x-api-key: UPSTREAM_KEY` |
| `/gemini/v1beta/...` | `https://generativelanguage.googleapis.com/v1beta/...` | `x-goog-api-key: UPSTREAM_KEY` |

To run in wrapper mode, change the compose env to `FAIRVISOR_MODE: wrapper` and
supply real credentials in the `Authorization` header.
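For example, a compose override along these lines — the `edge` service name is taken from this quickstart's `docker compose ps` output and is an assumption if your compose file names the service differently:

```yaml
# docker-compose.override.yml -- switch the quickstart edge into wrapper mode
services:
  edge:
    environment:
      FAIRVISOR_MODE: wrapper   # was: reverse_proxy
```

Then run `docker compose up -d` again and point clients at the provider-prefixed paths above.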

## Teardown

```bash
docker compose down
```

## Next steps

- See `../recipes/` for team budgets, runaway agent guard, and circuit-breaker scenarios
- See `../../fixtures/` for all sample request/response artifacts
- See [fairvisor/benchmark](https://github.com/fairvisor/benchmark) for performance benchmarks
- See [docs/install/](../../docs/install/) for Kubernetes, VM, and SaaS deployment options