feat(deploy): Phase 3, per-IP rate limit on canonical demo proxy (#315)
Merged
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Caps anonymous OpenAI spend on demo.cacheplane.ai by limiting POST /api/threads/*/runs/stream to 10 requests per minute per IP. Backed by Neon Postgres (already provisioned) instead of Upstash — saves a vendor. Self-cleaning via inline DELETE+INSERT+COUNT in a single transaction. Fail-open if Postgres is unreachable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…gration Foundation for Phase 3 per-IP rate limiting on the canonical demo proxy. The migration creates a self-pruning events table (composite index on (ip, ts DESC)). @neondatabase/serverless is the HTTP-fetch driver compatible with Vercel Node serverless functions — no connection pool required. The migration will be applied to the existing Neon database in a separate step controlled by the deploying engineer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a small helper that runs DELETE+INSERT+COUNT against a rate_limit_events table to enforce a sliding-window per-IP rate limit. Uses @neondatabase/serverless (HTTP fetch driver) so it works in Vercel Node serverless functions without a connection pool. Fail-open by design — if DATABASE_URL is unset at module init or a SQL call throws at runtime, allows the request through and logs a warning. Marketing demo > strict protection during a rare dep outage. 5 unit tests cover: missing env (no-op), below limit, at-limit boundary, over limit (429 + retry-after), SQL throw (fail-open). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
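A minimal sketch of the check described above, not the helper from this PR. To stay runnable without a database, the DELETE+INSERT+COUNT round trip is replaced by an injected function with an in-memory stand-in. The names `makeCheckRateLimit`, `RecordAndCount`, and `makeInMemoryStore` are hypothetical, and returning the full window length as the retry-after value is an assumption (the commit message does not state how it is computed).

```typescript
// Hypothetical names throughout; the real helper's signatures are not shown in this PR.
type RateLimitResult = { allowed: boolean; count: number; retryAfterSeconds?: number };

// Stands in for the DELETE+INSERT+COUNT round trip: prunes old events, records
// one event for `ip`, and returns how many events that IP has in the window.
type RecordAndCount = (ip: string, windowSeconds: number) => Promise<number>;

const LIMIT = 10; // requests per window, per IP (from the PR summary)
const WINDOW_SECONDS = 60;

function makeCheckRateLimit(recordAndCount: RecordAndCount | null) {
  return async function checkRateLimit(ip: string): Promise<RateLimitResult> {
    // No store configured (e.g. DATABASE_URL unset): behave as a no-op.
    if (recordAndCount === null) return { allowed: true, count: 0 };
    try {
      const count = await recordAndCount(ip, WINDOW_SECONDS);
      if (count > LIMIT) {
        // Assumption: advertise the whole window as the retry delay.
        return { allowed: false, count, retryAfterSeconds: WINDOW_SECONDS };
      }
      return { allowed: true, count };
    } catch (err) {
      // Fail-open: a store failure must not block the request.
      console.warn("rate limit check failed; allowing request", err);
      return { allowed: true, count: 0 };
    }
  };
}

// In-memory stand-in for the rate_limit_events table, for illustration only.
function makeInMemoryStore(now: () => number): RecordAndCount {
  const events = new Map<string, number[]>();
  return async (ip, windowSeconds) => {
    const cutoff = now() - windowSeconds * 1000;
    const kept = (events.get(ip) ?? []).filter((ts) => ts > cutoff); // DELETE
    kept.push(now()); // INSERT
    events.set(ip, kept);
    return kept.length; // COUNT
  };
}
```

Denied requests are still recorded in this sketch, so the count keeps climbing past the limit while a client keeps retrying inside the window.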
Adds a checkRateLimit hook to ProxyConfig. When configured (only on
the demo wrapper), the proxy gates POST /api/threads/{id}/runs/stream
requests through the provided hook before forwarding. Denied requests
get 429 + Retry-After header + JSON body.
Non-gated requests (GET /api/info, POST /api/threads, etc.) bypass
the hook entirely — protection lives only on the path that actually
burns OpenAI tokens.
3 new unit tests cover: non-gated bypass, gated-allowed forwards,
gated-denied returns 429 without calling fetch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
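The gating described in this commit can be sketched roughly as follows. Only `checkRateLimit` and `ProxyConfig` are named in the commit message; the request and response shapes, the `forward` callback, the `isGated` helper, and the JSON body fields are assumptions made to keep the example self-contained.

```typescript
// Hypothetical shapes; the real proxy's request/response types are not shown in this PR.
type RateLimitDecision = { allowed: boolean; retryAfterSeconds?: number };

interface ProxyRequest { method: string; path: string; ip: string }
interface ProxyResponse { status: number; headers: Record<string, string>; body: string }

interface ProxyConfig {
  forward: (req: ProxyRequest) => Promise<ProxyResponse>;
  // Optional: only the demo wrapper is described as providing it.
  checkRateLimit?: (ip: string) => Promise<RateLimitDecision>;
}

// Matches /api/threads/{id}/runs/stream with exactly one path segment for {id}.
const GATED_PATH = /^\/api\/threads\/[^/]+\/runs\/stream$/;

function isGated(req: ProxyRequest): boolean {
  return req.method === "POST" && GATED_PATH.test(req.path);
}

async function handle(config: ProxyConfig, req: ProxyRequest): Promise<ProxyResponse> {
  if (config.checkRateLimit && isGated(req)) {
    const decision = await config.checkRateLimit(req.ip);
    if (!decision.allowed) {
      // Assumed default of 60 seconds when the hook gives no value.
      const retryAfterSeconds = decision.retryAfterSeconds ?? 60;
      return {
        status: 429,
        headers: {
          "Retry-After": String(retryAfterSeconds),
          "Content-Type": "application/json",
        },
        // Assumed body shape; the commit only says "JSON body".
        body: JSON.stringify({ error: "rate_limited", retryAfterSeconds }),
      };
    }
  }
  // Non-gated requests, and gated requests that were allowed, are forwarded.
  return config.forward(req);
}
```

Keeping the hook optional means a wrapper that never sets it forwards every request unchanged, which matches the non-gated bypass behavior described above.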
The shared langgraph-proxy factory accepts an optional checkRateLimit hook (added in the previous commit). The demo wrapper now provides it; the examples wrapper leaves it unset, so examples remains un-rate-limited (separate decision). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
blove added a commit that referenced this pull request on May 14, 2026
…lue, not inside a quoted literal (#316)

Phase 3 (#315) introduced a per-IP rate limit that was a silent no-op in production.

Symptom: 12 streaming requests in a row all returned 200; rate_limit_events table had 0 rows.

Root cause: the SQL used `interval '${WINDOW_SECONDS} seconds'` inside a tagged-template literal. The @neondatabase/serverless driver substitutes `${...}` placeholders as $N parameters, but parameters cannot appear inside a Postgres string literal. The driver emitted `interval '$2 seconds'` and the planner rejected it with `invalid input syntax for type interval`. The proxy's fail-open catch then allowed the request through.

Fix: build `WINDOW_INTERVAL = '60 seconds'` at module load and splice it as a parameterized value cast to ::interval:

  `ts < now() - ${WINDOW_INTERVAL}::interval`

That emits `ts < now() - $2::interval`, which Postgres evaluates correctly.

Also added `AND ts > now() - ${WINDOW_INTERVAL}::interval` to the SELECT. The DELETE and SELECT now use the same window boundary, so the count can't accidentally include rows that the DELETE didn't yet prune.

Smoke-tested against the live Neon DB:

  Request 1: count=1, allowed=true
  ...
  Request 10: count=10, allowed=true
  Request 11: count=11, allowed=false ← rate limited
  Request 12: count=12, allowed=false

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
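The root cause can be reproduced without a database. The tag function below is a simplified stand-in written for this illustration, not the driver's code: it only mimics the behavior described above, where each `${...}` becomes a numbered `$N` placeholder. The `sqlText` name, the sample IP, and the exact DELETE statement are assumptions; the table and column names come from the commit messages.

```typescript
// Simplified stand-in for a parameterizing SQL tag: every interpolation becomes
// $1, $2, ... in the query text and is collected into a params array.
function sqlText(
  strings: TemplateStringsArray,
  ...values: unknown[]
): { text: string; params: unknown[] } {
  const text = strings.reduce(
    (acc, part, i) => acc + (i > 0 ? `$${i}` : "") + part,
    "",
  );
  return { text, params: values };
}

const ip = "203.0.113.7"; // sample address, for illustration only
const WINDOW_SECONDS = 60;

// Broken: the placeholder lands inside a quoted SQL literal, so the server
// sees the literal text '$2 seconds' rather than a bound parameter.
const broken = sqlText`DELETE FROM rate_limit_events WHERE ip = ${ip} AND ts < now() - interval '${WINDOW_SECONDS} seconds'`;

// Fixed: pass the whole interval as one string parameter and cast it.
const WINDOW_INTERVAL = `${WINDOW_SECONDS} seconds`;
const fixed = sqlText`DELETE FROM rate_limit_events WHERE ip = ${ip} AND ts < now() - ${WINDOW_INTERVAL}::interval`;

console.log(broken.text);
console.log(fixed.text, fixed.params);
```

The general rule this illustrates is that a bound parameter can replace a whole value but not a fragment of a quoted literal, so any unit or suffix has to be folded into the parameter itself or applied with a cast or arithmetic outside the quotes.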
Summary
Phase 3 of the canonical-demo deployment plan. Caps anonymous OpenAI spend on `demo.cacheplane.ai` by limiting `POST /api/threads/*/runs/stream` to 10 requests per minute per IP. Backed by Neon Postgres (already provisioned for this team) — not Upstash.
Architecture
Fail-open: if `DATABASE_URL` is unset at module load or Postgres throws mid-request, the proxy logs a warning and allows the request through. Marketing demo > strict protection during a rare dependency outage.
External setup (already done)
Spec & Plan
Test plan
🤖 Generated with Claude Code