fix(idempotency): gate side-effecting handlers with Valkey claim to prevent redelivery duplicates#212
Conversation
…revent redelivery duplicates Closes #202 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (10)
📝 WalkthroughWalkthroughThis PR implements webhook delivery idempotency using Valkey. A new ChangesWebhook Delivery Idempotency
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR adds a Valkey-backed, delivery-ID–keyed idempotency claim (SET ... NX EX 259200) to prevent GitHub webhook redeliveries from re-running side-effecting webhook handlers (LLM calls, workflow dispatch, and GitHub writes), while remaining fail-open when Valkey is unavailable.
Changes:
- Introduces
claimDelivery(deliveryId, log)and uses it to early-return on redeliveries in the side-effecting webhook handlers. - Updates webhook tests to accommodate the new async gate and adds focused unit tests for
claimDelivery. - Updates docs (CLAUDE.md + architecture docs) to describe the new idempotency mechanism and the
review.tsexemption.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/webhook/idempotency.ts |
New Valkey-based delivery claim helper for webhook idempotency with fail-open semantics. |
src/webhook/events/issue-comment.ts |
Adds claimDelivery gate before dispatch/side effects on issue comment triggers. |
src/webhook/events/review-comment.ts |
Adds claimDelivery gate before dispatch/side effects on review comment triggers. |
src/webhook/events/issues.ts |
Adds claimDelivery gate before label-trigger dispatch path for issues. |
src/webhook/events/pull-request.ts |
Adds claimDelivery gate before label-trigger dispatch path for pull requests. |
src/webhook/events/review.ts |
Documents intentional exemption from delivery-claim gating (reactor wake only). |
test/webhook/idempotency.test.ts |
New unit tests covering claim-once, TTL args, and fail-open behaviors. |
test/webhook/events/issues.test.ts |
Adds microtask draining helper to stabilize assertions with the new async gate. |
docs/build/architecture.md |
Updates architecture docs to reflect handler-level delivery claiming. |
CLAUDE.md |
Updates project docs describing idempotency behavior and fail-open semantics. |
…from review Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
Production webhook handlers bypassed the documented idempotency router, so GitHub redeliveries (same
X-GitHub-Deliveryheader, replayed up to 3 days on auto-retry or operator redelivery) re-ran the full dispatch path, producing duplicate LLM charges, duplicateworkflow_runsrows, and duplicate GitHub replies.This PR introduces a durable Valkey-backed claim at the top of each side-effecting handler.
claimDelivery(deliveryId, log)issues aSET <key> 1 NX EX 259200(3-day TTL matching GitHub's redelivery window) and returnstrueexactly once per delivery ID. Redeliveries getfalseand the handler returns immediately. The gate is fail-open: when Valkey is unconfigured, configured-but-disconnected (gated onisValkeyHealthy()so a down connection skips the SET rather than blocking on Bun's default offline queue), or throws, it returnstrueand processing continues, degrading to at-least-once rather than dropping webhooks. Theidx_workflow_runs_inflightpartial-unique index remains the durable backstop when the Valkey claim is skipped.Flow
flowchart TD classDef old fill:#c0392b,color:#ffffff classDef added fill:#1a5276,color:#ffffff classDef gate fill:#1e8449,color:#ffffff classDef neutral fill:#ecf0f1,color:#2c3e50 WHK["Webhook arrives<br/>X-GitHub-Delivery D"]:::neutral OWN["Owner allowlist check"]:::neutral DSP_OLD["BEFORE: dispatch, LLM call,<br/>GitHub write on every delivery"]:::old CLM["claimDelivery<br/>SET D NX EX 259200"]:::gate DRP["Redelivery: return early,<br/>no side effects"]:::added DSP_NEW["First delivery or fail-open:<br/>dispatch, LLM call, GitHub write"]:::added IDX["idx_workflow_runs_inflight<br/>partial-unique index<br/>durable backstop"]:::neutral WHK --> OWN OWN -.->|before, no gate| DSP_OLD OWN --> CLM CLM -->|false| DRP CLM -->|true| DSP_NEW DSP_NEW --> IDXChanges
src/webhook/idempotency.ts(new):claimDelivery(deliveryId, log)wrapsisValkeyHealthy()+ ValkeySET NX EX 259200. Fail-open on null client, disconnected client, and any thrown error.src/webhook/events/issue-comment.ts,events/review-comment.ts, label branches ofevents/issues.tsandevents/pull-request.ts:if (!(await claimDelivery(deliveryId, log))) return;inserted after the owner-allowlist gate, before any LLM or GitHub write.src/webhook/events/review.ts: intentionally ungated. Added a doc comment explaining the exemption (fires only an idempotent reactor wake, no dispatch or write).test/webhook/idempotency.test.ts(new): 6 unit cases covering claim-once, reject-redelivery, independent IDs,NX + EX 259200assertion, fail-open on null client, fail-open on throw, and fail-open without issuing SET when configured-but-disconnected.test/webhook/events/issues.test.ts: added aflushMicrotasksdrain to push past the new async Valkey gate, preventing the now-deferred dispatch from leaking across tests.CLAUDE.md+docs/build/architecture.md: rewrote the Idempotency section to document theclaimDeliveryValkey gate, thereview.tsexemption, fail-open semantics, and theidx_workflow_runs_inflightdurable backstop. Retired the stale in-memoryMap/isAlreadyProcesseddescription (it lives on the deadrouter.ts processRequestpath that production handlers bypass).Out of scope, tracked in issue #211: chat-thread DB re-check,
markRedeliverycatch rewrite, retiring the deadprocessRequestMap.Test plan
test/webhook/idempotency.test.ts(6 cases, 100% coverage ofclaimDelivery)test/webhook/events/issues.test.tsmicrotask-drain update; all webhook tests pass in isolation (CI runs each file in its own process)Closes #202
🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
Documentation