**Changes from all commits** — 37 commits
- `9473d09` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `c608efc` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `046ac3b` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `7b4cbbe` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `a0551ff` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `a36312e` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `d70e2c4` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `489a28a` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `53a7035` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `3b078d0` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `4b4d249` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `70ed186` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `e03dfcc` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `b538176` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `f021fd6` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `3b54d3f` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `e1dd56d` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `800c4f9` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `f13e641` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `fbcf12d` feat(quickstart): add runnable quickstart, recipes, and fixtures (iss… — levleontiev, Mar 17, 2026
- `a93b377` docs(readme): add quickstart pointer, update project layout, fix benc… — levleontiev, Mar 17, 2026
- `a4ad21c` fix(quickstart): fix review issues — reverse_proxy mode, correct conf… — levleontiev, Mar 18, 2026
- `411c6c7` fix(quickstart): fix review issues — reverse_proxy mode, correct conf… — levleontiev, Mar 18, 2026
- `288c9c7` fix(quickstart): fix review issues — reverse_proxy mode, correct conf… — levleontiev, Mar 18, 2026
- `80365c9` fix(quickstart): fix review issues — reverse_proxy mode, correct conf… — levleontiev, Mar 18, 2026
- `e778599` fix(quickstart): fix review issues — reverse_proxy mode, correct conf… — levleontiev, Mar 18, 2026
- `399cd93` fix(quickstart): fix review issues — reverse_proxy mode, correct conf… — levleontiev, Mar 18, 2026
- `e327a0c` fix(quickstart): fix review issues — reverse_proxy mode, correct conf… — levleontiev, Mar 18, 2026
- `ee2ab17` fix(quickstart): fix review issues — reverse_proxy mode, correct conf… — levleontiev, Mar 18, 2026
- `348789e` refactor: replace provider-failover recipe with circuit-breaker — Mar 18, 2026
- `58fa21e` Merge branch 'main' into feature/issue-32-quickstart — levleontiev, Mar 19, 2026
- `7a13c7f` docs: add wrapper mode to README + integration links in comparison se… — levleontiev, Mar 19, 2026
- `480a409` docs: rewrite LLM token budget section to showcase wrapper mode — levleontiev, Mar 19, 2026
- `c297873` docs: wrapper mode selector pathPrefix "/" covers all providers — levleontiev, Mar 19, 2026
- `989cc04` docs: replace ASCII architecture diagrams with Mermaid sequence diagrams — levleontiev, Mar 19, 2026
- `a2bb9da` docs: fix JWT wording — Fairvisor parses claims, does not validate si… — levleontiev, Mar 19, 2026
- `4759b72` trim README: remove benchmark methodology, Contributing section; fix … — levleontiev, Mar 19, 2026
179 changes: 112 additions & 67 deletions README.md
@@ -46,7 +46,6 @@
- [CLI](#cli)
- [SaaS control plane (optional)](#saas-control-plane-optional)
- [Project layout](#project-layout)
- [License](#license)

---
@@ -84,16 +83,22 @@ If you have an existing gateway, the question is whether Fairvisor adds anything

**If nginx `limit_req` is enough for you**, use it. It has zero overhead and is the right tool for simple per-IP global throttling. Fairvisor becomes relevant when you need per-tenant awareness, JWT-claim-based bucketing, or cost/token tracking that `limit_req` has no model for.

**If you are already running Kong**, the built-in rate limiting plugin stores counters in Redis or Postgres — every decision is a network call. Fairvisor can run alongside Kong as an `auth_request` decision service with no external state. See [Kong / Traefik integration →](https://docs.fairvisor.com/docs/gateway/)

**If you are running Envoy**, the [global rate limit service](https://github.com/envoyproxy/ratelimit) requires deploying a separate Redis-backed service with its own config language. Fairvisor is one container, one JSON file, and integrates via `ext_authz` in the same position. See [Envoy ext_authz integration →](https://docs.fairvisor.com/docs/gateway/envoy/)

**If you are on Cloudflare or Akamai**, per-JWT-claim limits, LLM token budgets, and cost caps are not in the platform's model. If your limits are tenant-aware or cost-aware, you need something that runs in your own stack.

Fairvisor integrates *alongside* Kong, nginx, and Envoy — it is not a replacement. See [nginx auth_request →](https://docs.fairvisor.com/docs/gateway/nginx/) · [Envoy ext_authz →](https://docs.fairvisor.com/docs/gateway/envoy/) · [Kong / Traefik →](https://docs.fairvisor.com/docs/gateway/) for integration patterns.

## Quick start

> **Runnable quickstart:** `examples/quickstart/` — `docker compose up -d` and run your first enforce/reject test in under a minute. See [`examples/quickstart/README.md`](examples/quickstart/README.md).
>
> **Recipes:** `examples/recipes/` — deployable team budgets, runaway agent guard, and circuit-breaker examples.
>
> **Sample artifacts:** `fixtures/` — canonical request/response fixtures for enforce, reject (TPM, TPD, prompt-too-large), and provider-native error bodies (OpenAI, Anthropic, Gemini).

### 1. Create a policy

@@ -156,11 +161,15 @@ curl -s -w "\nHTTP %{http_code}\n" \

## LLM token budget in 30 seconds

The fastest path is **wrapper mode**: Fairvisor sits in front of the LLM API, enforces budgets, and strips the upstream key from the client. No gateway changes needed — just point your client at Fairvisor instead of OpenAI.

**1. Policy** — one rule, per-org TPM + daily cap:

```json
{
"id": "llm-budget",
"spec": {
"selector": { "pathPrefix": "/" },
"mode": "enforce",
"rules": [
{
        "key": "jwt:org_id",
        "tokens_per_minute": 60000,
        "tokens_per_day": 1200000
      }
    ]
  }
}
```

**2. Call the API** — token format `Bearer <client-jwt>:<upstream-key>`:

```bash
curl https://your-fairvisor-host/openai/v1/chat/completions \
  -H "Authorization: Bearer eyJhbGc...:sk-proj-..." \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'
```

Fairvisor parses the JWT claims (no signature validation — the JWT is trusted as-is), extracts `org_id`, charges tokens against the budget, strips the `Authorization` header, and forwards with the upstream key. The upstream never sees the client JWT.
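The token handling described above can be sketched in a few lines of plain shell — an illustrative sketch only, not Fairvisor source: split the composite token on the first `:`, then base64url-decode the JWT payload segment (no signature check). The claim values and fake JWT are made up for the example.

```shell
# Illustrative sketch (not Fairvisor source): split the composite Bearer token
# and read the org_id claim from the JWT payload -- no signature validation.

# Build a fake JWT so the example is self-contained (payload is base64url).
payload=$(printf '{"org_id":"acme","user_id":"u1"}' | base64 | tr -d '=\n' | tr '+/' '-_')
composite="hdr.$payload.sig:sk-proj-abc123"   # <client-jwt>:<upstream-key>

client_jwt="${composite%%:*}"                 # part before the first ':'
upstream_key="${composite#*:}"                # part after the first ':'

# Extract the JWT payload segment and re-pad base64url before decoding.
claims_b64=$(printf '%s' "$client_jwt" | cut -d. -f2 | tr '_-' '/+')
case $(( ${#claims_b64} % 4 )) in
  2) claims_b64="$claims_b64==" ;;
  3) claims_b64="$claims_b64=" ;;
esac
claims=$(printf '%s' "$claims_b64" | base64 -d)

echo "claims: $claims"                        # claims: {"org_id":"acme","user_id":"u1"}
echo "upstream auth: Bearer $upstream_key"    # upstream auth: Bearer sk-proj-abc123
```

The upstream key itself never needs decoding — it is forwarded as-is once the client JWT has been stripped.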

When the budget is exhausted:

```http
HTTP/1.1 429 Too Many Requests
X-Fairvisor-Reason: tpm_exceeded
Retry-After: 12
RateLimit-Limit: 60000
RateLimit-Remaining: 0
```

Each organization gets its own independent 60k TPM / 1.2M TPD budget. Works with OpenAI, Anthropic, Azure OpenAI, Mistral, and any OpenAI-compatible endpoint.

The selector matches the incoming wrapper path. Use `pathPrefix: "/"` to cover all providers, or `pathPrefix: "/openai"` to limit to one provider only.

> **Decision service / reverse proxy mode:** if you already have a gateway, use `selector: { "pathPrefix": "/v1/chat" }` and call `POST /v1/decision` from your existing `auth_request` or `ext_authz` hook instead.

## How a request flows

**Decision service mode** — Fairvisor runs as a sidecar. Your existing gateway calls `/v1/decision` via `auth_request` (nginx) or `ext_authz` (Envoy) and handles forwarding itself.
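A minimal nginx hookup for this mode might look like the sketch below. It is illustrative only: the upstream names (`fairvisor:8080`, `upstream_backend`) are assumptions, while the `X-Original-Method` / `X-Original-URI` headers follow the decision endpoint contract described in this README.

```nginx
location / {
    auth_request /_fairvisor_decision;   # consult Fairvisor before proxying
    proxy_pass http://upstream_backend;
}

location = /_fairvisor_decision {
    internal;
    proxy_pass http://fairvisor:8080/v1/decision;
    proxy_method POST;                   # decision endpoint expects POST
    proxy_pass_request_body off;
    proxy_set_header Content-Length "";
    proxy_set_header X-Original-Method $request_method;
    proxy_set_header X-Original-URI $request_uri;
}
```

Note that stock `auth_request` only understands 2xx/401/403 from the subrequest; surfacing Fairvisor's 429 and rejection headers to the client typically needs an extra `error_page` hook — see the nginx integration docs linked above.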

**Reverse proxy mode** — Fairvisor sits inline. Traffic arrives at Fairvisor directly, gets evaluated, and is proxied to the upstream if allowed. No separate gateway needed.

**Wrapper mode** — Fairvisor acts as a transparent LLM proxy. Clients send requests to Fairvisor's OpenAI-compatible endpoint (`/openai/v1/chat/completions`, `/anthropic/v1/messages`, `/gemini/v1/generateContent`). Fairvisor enforces token budgets and cost limits, strips the client auth header, injects the upstream API key, and forwards the request. No changes needed in the client — swap the base URL and you're done.

All three modes use the same policy bundle and return the same rejection headers.

When a request is rejected:

@@ -206,40 +237,72 @@ Headers follow [RFC 9333 RateLimit Fields](https://www.rfc-editor.org/rfc/rfc9333)

### Architecture

**Decision service mode** — sidecar: your gateway calls `/v1/decision`, handles forwarding itself.

```mermaid
sequenceDiagram
participant C as Client
participant G as Your Gateway<br/>(nginx / Envoy / Kong)
participant F as Fairvisor Edge<br/>decision_service
participant U as Upstream service

C->>G: Request
G->>F: POST /v1/decision<br/>(auth_request / ext_authz)
alt allow
F-->>G: 204 No Content
G->>U: Forward request
U-->>G: Response
G-->>C: Response
else reject
F-->>G: 429 + RateLimit headers
G-->>C: 429 Too Many Requests
end
```

No Redis, no external state — all counters live in `ngx.shared.dict`.

**Reverse proxy mode** — inline: Fairvisor handles both enforcement and proxying.

```mermaid
sequenceDiagram
participant C as Client
participant F as Fairvisor Edge<br/>reverse_proxy
participant U as Upstream service

C->>F: Request
alt allow
F->>U: Forward request
U-->>F: Response
F-->>C: Response
else reject
F-->>C: 429 + RFC 9333 headers
end
```

**Wrapper mode** — transparent LLM proxy: swap the base URL, no other client changes needed.

```mermaid
sequenceDiagram
participant C as Client
participant F as Fairvisor Edge<br/>wrapper
participant U as Upstream LLM<br/>(OpenAI / Anthropic / Gemini)

C->>F: POST /openai/v1/chat/completions<br/>Authorization: Bearer CLIENT_JWT:UPSTREAM_KEY
F->>F: 1. Parse JWT claims (org_id, user_id)
F->>F: 2. Enforce TPM / TPD / cost budget
alt budget ok
F->>U: POST /v1/chat/completions<br/>Authorization: Bearer UPSTREAM_KEY
U-->>F: 200 OK + token usage
F->>F: 3. Count tokens · refund unused reservation
F-->>C: 200 OK (Authorization stripped from reply)
else budget exceeded
F-->>C: 429 X-Fairvisor-Reason: tpm_exceeded
end
```

Supported upstream paths: `/openai/*`, `/anthropic/*`, `/gemini/*`, `/grok/*`.

All three modes use the same policy bundle and produce the same rejection headers.

## Enforcement capabilities

@@ -274,17 +337,6 @@ Policies are versioned JSON — commit them to Git, review changes in PRs, roll

## Performance


### Latest measured latency @ 10,000 RPS

| Percentile | Decision service | Reverse proxy | Raw nginx (baseline) |
|---|---|---|---|

@@ -305,7 +357,8 @@

**No external datastore.** All enforcement state lives in in-process shared memory (`ngx.shared.dict`). No Redis, no Postgres, no network round-trips in the decision path.

Reproduce: see [fairvisor/benchmark](https://github.com/fairvisor/benchmark) — the canonical benchmark source of truth for Fairvisor Edge performance numbers.


## Deployment

@@ -349,25 +402,16 @@ If the SaaS is unreachable, the edge keeps enforcing with the last-known policy
## Project layout

```
src/fairvisor/ runtime modules (OpenResty/LuaJIT)
cli/ command-line tooling
spec/ unit and integration tests (busted)
tests/e2e/ Docker-based E2E tests (pytest)
examples/quickstart/ runnable quickstart (docker compose up -d)
examples/recipes/ deployable policy recipes (team budgets, agent guard, failover)
fixtures/ canonical request/response sample artifacts
helm/ Helm chart
docker/ Docker artifacts
docs/ reference documentation
```

## License
@@ -377,3 +421,4 @@
---

**Docs:** [docs.fairvisor.com](https://docs.fairvisor.com/docs/) · **Website:** [fairvisor.com](https://fairvisor.com) · **Quickstart:** [5 minutes to enforcement](https://docs.fairvisor.com/docs/quickstart/)

108 changes: 108 additions & 0 deletions examples/quickstart/README.md
@@ -0,0 +1,108 @@
# Fairvisor Edge — Quickstart

Go from `git clone` to working policy enforcement in one step.

## Prerequisites

- Docker with Compose V2 (`docker compose version`)
- Port 8080 free on localhost

## Start

```bash
docker compose up -d
```

Wait for the edge service to report healthy:

```bash
docker compose ps
# edge should show "healthy"
```

## Verify enforcement

This quickstart runs in `FAIRVISOR_MODE=reverse_proxy`. Requests to `/v1/*`
are enforced by the TPM policy and forwarded to a local mock LLM backend.
No real API keys are required.

**Allowed request** — should return `200`:

```bash
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d @../../fixtures/normal_request.json
```

Expected response body shape matches `../../fixtures/allow_response.json`.

**Over-limit request** — should return `429`:

```bash
curl -s -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d @../../fixtures/over_limit_request.json
```

Expected response body shape: `../../fixtures/reject_tpm_exceeded.json`.
The response will also include:
- `X-Fairvisor-Reason: tpm_exceeded`
- `Retry-After: 60`
- `RateLimit-Limit: 100` (matches the quickstart policy `tokens_per_minute`)
- `RateLimit-Remaining: 0`

## How the policy works

The quickstart policy (`policy.json`) enforces a TPM limit keyed on `ip:address`:

- `tokens_per_minute: 100` — allows roughly 2 small requests per minute
- `tokens_per_day: 1000` — daily cap
- `default_max_completion: 50` — pessimistic reservation per request when `max_tokens` is not set

Sending `over_limit_request.json` (which sets `max_tokens: 200000`) immediately
exceeds the 100-token per-minute budget and triggers a `429`.
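The budget arithmetic can be checked with a few lines of shell. This is a sketch under assumed semantics inferred from the policy description above (each request reserves `max_tokens` when the body sets it, otherwise `default_max_completion`) — it is not Fairvisor source.

```shell
# Assumed reservation semantics for the quickstart policy (sketch, not source):
# a request reserves max_tokens if the body sets it, else default_max_completion.
tpm_limit=100
default_max_completion=50

reserve() { echo "${1:-$default_max_completion}"; }   # $1 = request's max_tokens, "" if unset

used=0
for req in "" ""; do                    # two small requests without max_tokens
  used=$(( used + $(reserve "$req") ))
done
echo "after two small requests: $used/$tpm_limit tokens reserved"   # 100/100

big=$(reserve 200000)                   # over_limit_request.json sets max_tokens: 200000
if [ $(( used + big )) -gt "$tpm_limit" ]; then
  echo "429 X-Fairvisor-Reason: tpm_exceeded"
fi
```

This is why the second "allowed" request already exhausts the minute budget, and why the over-limit fixture is rejected immediately rather than after streaming any tokens.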

## Wrapper mode (real provider routing)

Wrapper mode routes requests to real upstream providers using provider-prefixed paths
and a composite Bearer token. It requires real provider API keys and cannot be
demonstrated with this mock stack.

**Path and auth format:**

```
POST /openai/v1/chat/completions
Authorization: Bearer CLIENT_JWT:UPSTREAM_KEY
```

Where:
- `CLIENT_JWT` — signed JWT identifying the calling client/tenant (used for policy enforcement)
- `UPSTREAM_KEY` — real upstream API key forwarded to the provider (e.g. `sk-...` for OpenAI)

Fairvisor strips the composite header, injects the correct provider auth before forwarding,
and **never returns upstream auth headers to the caller**
(see `../../fixtures/allow_response.json`).

**Provider-prefixed paths:**

| Path prefix | Upstream | Auth header injected |
|---|---|---|
| `/openai/v1/...` | `https://api.openai.com/v1/...` | `Authorization: Bearer UPSTREAM_KEY` |
| `/anthropic/v1/...` | `https://api.anthropic.com/v1/...` | `x-api-key: UPSTREAM_KEY` |
| `/gemini/v1beta/...` | `https://generativelanguage.googleapis.com/v1beta/...` | `x-goog-api-key: UPSTREAM_KEY` |

To run in wrapper mode, change the compose env to `FAIRVISOR_MODE: wrapper` and
supply real credentials in the `Authorization` header.
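For example, a compose override along these lines — the `edge` service name is taken from this quickstart's `docker compose ps` output and is an assumption if your compose file names the service differently:

```yaml
# docker-compose.override.yml -- switch the quickstart edge into wrapper mode
services:
  edge:
    environment:
      FAIRVISOR_MODE: wrapper   # was: reverse_proxy
```

Then run `docker compose up -d` again and point clients at the provider-prefixed paths above.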

## Teardown

```bash
docker compose down
```

## Next steps

- See `../recipes/` for team budgets, runaway agent guard, and circuit-breaker scenarios
- See `../../fixtures/` for all sample request/response artifacts
- See [fairvisor/benchmark](https://github.com/fairvisor/benchmark) for performance benchmarks
- See [docs/install/](../../docs/install/) for Kubernetes, VM, and SaaS deployment options