feat(deploy): Phase 3 — per-IP rate limit on canonical demo proxy#315

Merged
blove merged 6 commits into main from claude/canonical-demo-rate-limit on May 14, 2026
@blove (Contributor) commented May 14, 2026

Summary

Phase 3 of the canonical-demo deployment plan. Caps anonymous OpenAI spend on `demo.cacheplane.ai` by limiting `POST /api/threads/*/runs/stream` to 10 requests per minute per IP. Backed by Neon Postgres (already provisioned for this team) — not Upstash.

Architecture

  • `scripts/rate-limit.ts` — `checkRateLimit(ip)` helper runs DELETE+INSERT+COUNT against a self-pruning `rate_limit_events` table. Uses `@neondatabase/serverless` HTTP driver (no connection pool needed for Vercel serverless).
  • `scripts/langgraph-proxy.ts` — new `ProxyConfig.checkRateLimit` hook. Gates only `POST /api/threads/*/runs/stream` (the path that costs OpenAI tokens). Other endpoints bypass.
  • `scripts/demo-middleware.ts` — wires the hook for the demo wrapper. `scripts/examples-middleware.ts` (cockpit-examples) intentionally unchanged.
  • `migrations/0001_rate_limit_events.sql` — idempotent `CREATE TABLE IF NOT EXISTS` + composite index on `(ip, ts DESC)`.

Fail-open: if `DATABASE_URL` is unset at module load or Postgres throws mid-request, the proxy logs a warning and allows the request through. For a marketing demo, staying available matters more than strict protection during a rare dependency outage.
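A minimal sketch of the DELETE+INSERT+COUNT flow and the fail-open behaviour described above, not the actual `scripts/rate-limit.ts`. The `SqlTag` type, the `makeCheckRateLimit` factory, the result shape, the `ip` filter on the DELETE, and a `ts` column that defaults to `now()` are all assumptions made for illustration. The SQL function is injected so the sketch runs without a database; in the real helper it would presumably come from `@neondatabase/serverless`.

```typescript
// Hypothetical shapes: the PR does not show the real signatures.
type Row = Record<string, unknown>;
type SqlTag = (strings: TemplateStringsArray, ...values: unknown[]) => Promise<Row[]>;

interface RateLimitResult {
  allowed: boolean;
  retryAfterSeconds?: number;
}

const LIMIT = 10; // requests per window, per the PR summary
const WINDOW_SECONDS = 60; // one-minute window
const WINDOW_INTERVAL = `${WINDOW_SECONDS} seconds`; // passed as a value, cast to ::interval

// `sql` is null when DATABASE_URL is unset (assumed wiring), making the check a no-op.
function makeCheckRateLimit(sql: SqlTag | null) {
  return async function checkRateLimit(ip: string): Promise<RateLimitResult> {
    if (!sql) return { allowed: true };
    try {
      // 1. Prune this IP's events that have fallen out of the window.
      await sql`DELETE FROM rate_limit_events
                WHERE ip = ${ip} AND ts < now() - ${WINDOW_INTERVAL}::interval`;
      // 2. Record the current request (assumes ts defaults to now()).
      await sql`INSERT INTO rate_limit_events (ip) VALUES (${ip})`;
      // 3. Count events inside the window, including the one just inserted.
      const rows = await sql`SELECT count(*)::int AS count FROM rate_limit_events
                             WHERE ip = ${ip} AND ts > now() - ${WINDOW_INTERVAL}::interval`;
      const count = Number(rows[0]?.count ?? 0);
      return count <= LIMIT
        ? { allowed: true }
        : { allowed: false, retryAfterSeconds: WINDOW_SECONDS };
    } catch (err) {
      // Fail-open: a database error must not take the demo down.
      console.warn("rate limit check failed; allowing request", err);
      return { allowed: true };
    }
  };
}
```

Because the request is inserted before it is counted, the tenth request sees a count of 10 and is still allowed; the eleventh sees 11 and is denied.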

External setup (already done)

  • ✅ Neon DB provisioned (reuses the existing `cacheplane` Vercel-Neon integration store)
  • ✅ `DATABASE_URL` auto-set on `cacheplane-demo` project by the integration
  • ✅ Migration applied to the Neon DB via `@neondatabase/serverless`

Spec & Plan

  • `docs/superpowers/specs/2026-05-13-canonical-demo-rate-limit-design.md`
  • `docs/superpowers/plans/2026-05-13-canonical-demo-rate-limit.md`

Test plan

  • 5 unit tests in `scripts/rate-limit.spec.ts` (no env → noop, under limit, at-limit boundary, over limit, fail-open on SQL throw)
  • 3 new tests in `scripts/langgraph-proxy.spec.ts` (non-gated bypass, gated-allowed, gated-denied with 429+Retry-After)
  • Bundle includes `@neondatabase` + 3 `rate_limit_events` SQL references
  • After merge, fire 12 `/runs/stream` requests from one IP — expect first 10 = 200, last 2 = 429 with `Retry-After: 60`

🤖 Generated with Claude Code

blove and others added 6 commits May 13, 2026 21:23
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Caps anonymous OpenAI spend on demo.cacheplane.ai by limiting
POST /api/threads/*/runs/stream to 10 requests per minute per IP.
Backed by Neon Postgres (already provisioned) instead of Upstash —
saves a vendor. Self-cleaning via inline DELETE+INSERT+COUNT in a
single transaction. Fail-open if Postgres is unreachable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…gration

Foundation for Phase 3 per-IP rate limiting on the canonical demo
proxy. The migration creates a self-pruning events table (composite
index on (ip, ts DESC)). @neondatabase/serverless is the HTTP-fetch
driver compatible with Vercel Node serverless functions — no
connection pool required.

The migration will be applied to the existing Neon database in a
separate step controlled by the deploying engineer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a small helper that runs DELETE+INSERT+COUNT against a
rate_limit_events table to enforce a sliding-window per-IP rate
limit. Uses @neondatabase/serverless (HTTP fetch driver) so it
works in Vercel Node serverless functions without a connection
pool.

Fail-open by design — if DATABASE_URL is unset at module init or a
SQL call throws at runtime, allows the request through and logs a
warning. Marketing demo > strict protection during a rare dep
outage.

5 unit tests cover: missing env (no-op), below limit, at-limit
boundary, over limit (429 + retry-after), SQL throw (fail-open).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a checkRateLimit hook to ProxyConfig. When configured (only on
the demo wrapper), the proxy gates POST /api/threads/{id}/runs/stream
requests through the provided hook before forwarding. Denied requests
get 429 + Retry-After header + JSON body.

Non-gated requests (GET /api/info, POST /api/threads, etc.) bypass
the hook entirely — protection lives only on the path that actually
burns OpenAI tokens.

3 new unit tests cover: non-gated bypass, gated-allowed forwards,
gated-denied returns 429 without calling fetch.
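
The gating described above could look roughly like the sketch below. It is an illustration under stated assumptions rather than the code in `scripts/langgraph-proxy.ts`: the names `isGated`, `gate`, `GateDecision`, and `DeniedResponse`, the JSON body shape, and the fallback of 60 seconds are hypothetical. Only the gated method and path, the 429 status, and the `Retry-After` header come from the commit message.

```typescript
// Hypothetical types; the real ProxyConfig hook signature is not shown in the PR.
interface GateDecision {
  allowed: boolean;
  retryAfterSeconds?: number;
}

interface DeniedResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

type RateLimitHook = (ip: string) => Promise<GateDecision>;

// Matches /api/threads/{id}/runs/stream with exactly one path segment as the id.
const GATED_PATH = /^\/api\/threads\/[^/]+\/runs\/stream\/?$/;

function isGated(method: string, pathname: string): boolean {
  return method.toUpperCase() === "POST" && GATED_PATH.test(pathname);
}

// Returns a 429 description when the request is denied, or null when the
// caller should forward the request upstream as usual.
async function gate(
  method: string,
  pathname: string,
  ip: string,
  hook?: RateLimitHook,
): Promise<DeniedResponse | null> {
  // No hook configured (e.g. the examples wrapper) or a non-gated path: bypass.
  if (!hook || !isGated(method, pathname)) return null;

  const decision = await hook(ip);
  if (decision.allowed) return null;

  const retryAfter = decision.retryAfterSeconds ?? 60; // assumed default
  return {
    status: 429,
    headers: {
      "Content-Type": "application/json",
      "Retry-After": String(retryAfter),
    },
    body: JSON.stringify({ error: "rate_limited", retryAfterSeconds: retryAfter }),
  };
}
```

Checking the path before calling the hook keeps non-gated endpoints free of any database round trip.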

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The shared langgraph-proxy factory accepts an optional checkRateLimit
hook (added in the previous commit). The demo wrapper now provides
it; the examples wrapper leaves it unset, so examples remains
without a rate limit (separate decision).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
vercel Bot commented May 14, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| cacheplane | Ready | Preview, Comment | May 14, 2026 5:33am |

@blove blove merged commit 6465a98 into main May 14, 2026
16 checks passed
blove added a commit that referenced this pull request May 14, 2026
…lue, not inside a quoted literal (#316)

Phase 3 (#315) introduced a per-IP rate limit that was a silent
no-op in production. Symptom: 12 streaming requests in a row all
returned 200; rate_limit_events table had 0 rows.

Root cause: the SQL used `interval '${WINDOW_SECONDS} seconds'`
inside a tagged-template literal. The @neondatabase/serverless
driver substitutes `${...}` placeholders as $N parameters, but
parameters cannot appear inside a Postgres string literal. The
driver emitted `interval '$2 seconds'` and the planner rejected it
with `invalid input syntax for type interval`. The proxy's
fail-open catch then allowed the request through.

Fix: build `WINDOW_INTERVAL = '60 seconds'` at module load and
splice it as a parameterized value cast to ::interval:
  `ts < now() - ${WINDOW_INTERVAL}::interval`
That emits `ts < now() - $2::interval`, which Postgres evaluates
correctly.

Also added `AND ts > now() - ${WINDOW_INTERVAL}::interval` to the
SELECT — the DELETE+SELECT now use the same window boundary so the
count can't accidentally include rows that the DELETE didn't yet
prune.
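
To make the root cause concrete, here is a toy tagged-template function that numbers its placeholders the way a parameterizing driver does. It is a stand-in written for illustration, not the `@neondatabase/serverless` implementation, and the `ip` filter in the query is an assumption inferred from the interval landing at `$2`.

```typescript
// Toy stand-in for a parameterizing SQL tag: every ${...} becomes $1, $2, ...
// regardless of where it sits in the surrounding SQL text.
function toParameterized(strings: TemplateStringsArray, ...values: unknown[]) {
  // reduce() without an initial value starts at index 1 with acc = strings[0],
  // so the placeholder inserted before strings[i] is numbered $i.
  const text = strings.reduce((acc, part, i) => `${acc}$${i}${part}`);
  return { text, params: values };
}

const clientIp = "203.0.113.7";
const windowSeconds = 60;

// Broken: the placeholder sits inside a quoted string literal, so Postgres
// sees the literal text '$2 seconds' rather than a bound parameter.
const broken = toParameterized`DELETE FROM rate_limit_events WHERE ip = ${clientIp} AND ts < now() - interval '${windowSeconds} seconds'`;

// Fixed: the whole interval is one bound value, cast on the SQL side.
const windowInterval = `${windowSeconds} seconds`;
const fixed = toParameterized`DELETE FROM rate_limit_events WHERE ip = ${clientIp} AND ts < now() - ${windowInterval}::interval`;

console.log(broken.text); // ends with: interval '$2 seconds'
console.log(fixed.text); // ends with: $2::interval
console.log(fixed.params); // [ '203.0.113.7', '60 seconds' ]
```

Because the failure surfaced only as a thrown SQL error, the fail-open catch hid it, which is why the limiter looked healthy while doing nothing.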

Smoke-tested against the live Neon DB:
  Request 1: count=1, allowed=true
  ...
  Request 10: count=10, allowed=true
  Request 11: count=11, allowed=false  ← rate limited
  Request 12: count=12, allowed=false

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>