You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The webhook idempotency claim function claimDelivery at src/webhook/idempotency.ts:36 has three terminal outcomes but emits a structured event field on only one of them, breaking parity with every other recently-added structured event family in this repo (retry.* issue #225, dispatcher.offer.* issue #161, github.api.*src/utils/octokit-observability.ts:29, pipeline.stage issue #166).
Concretely, walking the four code paths:
src/webhook/idempotency.ts:46-49 — Valkey unconfigured or unhealthy → log.warn({ deliveryId }, "claimDelivery: Valkey unavailable, proceeding (fail-open)"). No event tag, no reason discriminator.
src/webhook/idempotency.ts:61 — SET … NX EX returned "OK" (claim won, the happy path). No log line at all. An operator cannot count successful claims, so the dedup-hit-rate denominator is invisible.
src/webhook/idempotency.ts:62-66 — SET … NX EX returned null (duplicate redelivery). Emits event: "dedup-skip" — the one structured tag, but the verb breaks the dotted-namespace convention every other observability issue has standardised on (retry.attempt_failed, pipeline.completed, dispatcher.offer.sent).
src/webhook/idempotency.ts:67-72 — Bun RedisClient.send threw → log.warn(..., "claimDelivery: Valkey error, proceeding (fail-open)"). No event tag, no reason discriminator, and indistinguishable from outcome 1 in a structured query without parsing the free-text message.
This is exactly the same gap that #225 closed for retryWithBackoff and #207 closed for the dispatcher: every terminal outcome of a load-bearing chokepoint needs an enumerable event so operator queries (event:idempotency.duplicate_skipped) can compute rates without resorting to message-field regex.
The fix is the established template (mirroring src/utils/retry-log-fields.ts): add src/webhook/idempotency-log-fields.ts with a z.strictObjectIdempotencyLogFieldsSchema discriminated on event over the four-tag family idempotency.{claimed, duplicate_skipped, failed_open} (with reason: "unavailable" | "error" discriminator on failed_open), wire it into claimDelivery's four emit sites, retire the legacy dedup-skip literal, and document the new fields under the "Idempotency log fields" heading in docs/operate/observability.md (the existing table at lines 22-47 is the canonical place).
Diagram
flowchart TD
WH["Webhook delivery<br/>X-GitHub-Delivery"] --> CD["claimDelivery<br/>idempotency.ts line 36"]
CD --> VC{"Valkey configured<br/>AND healthy?"}
VC -->|no| FOU["fail-open path<br/>line 47<br/>current: warn msg only<br/>proposed: idempotency.failed_open<br/>reason=unavailable"]
VC -->|yes| SNX["SET key NX EX 259200<br/>lines 54-60"]
SNX -->|reply OK| WON["claim won<br/>line 61<br/>current: NO LOG<br/>proposed: idempotency.claimed"]
SNX -->|reply null| DUP["duplicate redelivery<br/>lines 62-65<br/>current: event=dedup-skip<br/>proposed: idempotency.duplicate_skipped"]
SNX -->|throws| FOE["fail-open path<br/>lines 67-71<br/>current: warn msg only<br/>proposed: idempotency.failed_open<br/>reason=error"]
WON --> HOK["handler proceeds<br/>side effects run"]
FOU --> HOK
FOE --> HOK
DUP --> HSK["handler returns<br/>NO side effects"]
classDef cTrigger fill:#1f4e79,color:#ffffff
classDef cCall fill:#196f3d,color:#ffffff
classDef cDecision fill:#7d6608,color:#ffffff
classDef cOk fill:#196f3d,color:#ffffff
classDef cDup fill:#1f4e79,color:#ffffff
classDef cFailopen fill:#943126,color:#ffffff
classDef cHandler fill:#4a235a,color:#ffffff
class WH cTrigger
class CD cCall
class SNX cCall
class VC cDecision
class WON cOk
class DUP cDup
class FOU cFailopen
class FOE cFailopen
class HOK cHandler
class HSK cHandler
Loading
Rationale
claimDelivery is one of two load-bearing dedup defences (the other being the idx_workflow_runs_inflight partial-unique index, per the "Idempotency" section of CLAUDE.md). Without structured outcome tags, the operator queries that would matter on an incident are impossible:
"Dedup hit rate" = count(event:idempotency.duplicate_skipped) / count(event:idempotency.{claimed,duplicate_skipped}). With outcome 2 emitting nothing, the denominator can only be reconstructed by counting downstream pipeline starts — fragile and lossy. A sudden spike in this rate would indicate GitHub redeliveries (manual operator replay, automatic retry, or webhook misconfiguration causing the same delivery to land multiple times) and is exactly the kind of operational signal observability is for.
"Fail-open rate" = count(event:idempotency.failed_open) / total claims. With the unavailable and error paths sharing only a free-text message, an alert like "fail-open rate > 1% sustained for 10m" requires message:~"fail-open" regex matching, which is brittle and breaks the field-vs-message-content separation the retry.* schema (src/utils/retry-log-fields.ts:48-95) deliberately enforces.
reason discriminator matters because unavailable (Valkey unset/disconnected → infrastructure config) and error (Bun RedisClient.send threw mid-flight → transient network or server-side error) have completely different runbook responses but are currently the same warn line.
The patch surface is tiny: one new file (the Zod schema, ~40 lines mirroring retry-log-fields.ts), four edits in idempotency.ts (two new emit sites, two reshapes), and a co-located test asserting field parity (the existing pattern in src/utils/retry-log-fields.test.ts). No new npm dependencies, no runtime cost on the hot path (the schema is a drift-prevention contract, not a runtime validator — same policy as RetryLogFieldsSchema).
Volume policy follows precedent: idempotency.claimed at debug (one per webhook would be too loud at info for a busy installation, mirroring github.api.request at line 117 of src/utils/octokit-observability.ts); idempotency.duplicate_skipped at info (the operational signal of interest); idempotency.failed_open at warn (degraded mode that should page if sustained).
References
Internal:
src/webhook/idempotency.ts:36-74 — the claimDelivery function; the four code paths cited above.
src/utils/retry-log-fields.ts:34-96 — the canonical template for an event-discriminated .strict() Zod schema for a structured event family (the proposed idempotency-log-fields.ts would mirror this file directly).
src/utils/octokit-observability.ts:29-54 — second template, with the as const event-keys map plus .strict() schema pattern, including the volume-policy precedent (debug for the per-request line at line 117, warn for low-quota and rate-limit-error lines).
src/orchestrator/log-fields.ts:29-66 — third template, dispatcher offer events, also discriminated on event, also pairs each event with queue_wait_ms correlation fields (precedent for adding correlation scalars like deliveryId to the schema).
docs/operate/observability.md:22-47 — the canonical log-fields table; the proposed idempotency.* rows would land here alongside the existing retry.*, dispatcher.offer.*, and github.api.* rows.
CLAUDE.md — "Idempotency" bullet under "Key Concepts" describes the fail-open contract; the proposed reason discriminator gives operators the structured way to distinguish the two failure modes that the prose already names.
Webhook Idempotency and Deduplication (Hooklistener, 2026) — establishes the standard observability surface for a dedup layer ("idempotency hits per source to understand expected vs. anomalous patterns") that this proposal makes computable.
How to Implement Webhook Idempotency (Hookdeck) — confirms the SET-NX-EX pattern used here and recommends logging the unique event ID alongside processing status / deduplication actions for the audit trail (which the proposed idempotency.claimed + idempotency.duplicate_skipped events provide as a stable schema).
Handling webhook deliveries (GitHub Docs) — confirms the X-GitHub-Delivery ID is the canonical dedup key and is preserved across manual redeliveries, justifying the 3-day TTL the schema fields will accompany.
Add src/webhook/idempotency-log-fields.ts mirroring src/utils/retry-log-fields.ts: an IDEMPOTENCY_LOG_EVENTS = { claimed, duplicate_skipped, failed_open } as const map and a z.discriminatedUnion("event", [...]) schema whose failed_open branch carries reason: z.enum(["unavailable", "error"]) and the shared branch shape includes deliveryId: z.string().min(1) plus a ttl_seconds: z.number().int().positive() field on claimed so the TTL is greppable per emit.
Update src/webhook/idempotency.ts to (a) emit idempotency.claimed at debug on the line-61 success branch, (b) replace the line-47 warn with idempotency.failed_open at warn with reason: "unavailable", (c) replace the line-63 event: "dedup-skip" with event: "idempotency.duplicate_skipped" at info, and (d) replace the line-68 warn with idempotency.failed_open at warn with reason: "error". Keep the human-readable log message text alongside for log-grep ergonomics; the event field is the structured-query handle.
Add src/webhook/idempotency-log-fields.test.ts asserting each emit site's pino-event shape passes IdempotencyLogFieldsSchema.parse (the colocated-test pattern used by src/utils/retry-log-fields.test.ts and gated by bun run check:test-globs, see CLAUDE.md "Code Conventions").
Document the new fields in docs/operate/observability.md: add idempotency.{claimed, duplicate_skipped, failed_open} rows to the field table (lines 22-47) and a short "Idempotency log fields" subsection after "Retry log fields", noting the volume policy (debug / info / warn) and the reason discriminator semantics. The check:docs-citations script enforces the cited line numbers stay accurate.
Areas Evaluated
src/webhook/idempotency.ts — the four code paths of claimDelivery and their current log shape; the file's contract docstring at lines 5-15.
All five callers of claimDelivery: src/webhook/events/issue-comment.ts:92, src/webhook/events/issues.ts:100, src/webhook/events/pull-request.ts:184, src/webhook/events/review-comment.ts:117 (gated paths) and the doc-only mention in src/webhook/events/review.ts:14 (un-gated by design).
src/orchestrator/valkey.ts:28-58 — getValkeyClient / isValkeyHealthy semantics that drive the line-46 healthy-or-not branch in claimDelivery.
src/utils/retry-log-fields.ts and src/utils/octokit-observability.ts as schema templates; src/orchestrator/log-fields.ts as a third dispatcher-side template for an event-discriminated family.
src/logger.ts — the root logger config and the createChildLogger API the handlers already build; no logger-side change is needed for this proposal because the events would flow through the existing child logger.
Closed and open research-labelled observability issues (gh issue list --label research --state all) to confirm no overlap; the two idempotency-labelled prior issues (fix(idempotency): production webhook handlers bypass router dedup and feat(idempotency): close duplicate-processing window) are feature work on the correctness of the layer, not on its observability surface.
docs/operate/observability.md:22-47 log-fields table to identify the canonical insertion point for the new field rows.
Generated by the scheduled research action on 2026-06-14
Finding
The webhook idempotency claim function
claimDeliveryatsrc/webhook/idempotency.ts:36has three terminal outcomes but emits a structuredeventfield on only one of them, breaking parity with every other recently-added structured event family in this repo (retry.*issue #225,dispatcher.offer.*issue #161,github.api.*src/utils/octokit-observability.ts:29,pipeline.stageissue #166).Concretely, walking the four code paths:
src/webhook/idempotency.ts:46-49— Valkey unconfigured or unhealthy →log.warn({ deliveryId }, "claimDelivery: Valkey unavailable, proceeding (fail-open)"). Noeventtag, noreasondiscriminator.src/webhook/idempotency.ts:61—SET … NX EXreturned"OK"(claim won, the happy path). No log line at all. An operator cannot count successful claims, so the dedup-hit-rate denominator is invisible.src/webhook/idempotency.ts:62-66—SET … NX EXreturnednull(duplicate redelivery). Emitsevent: "dedup-skip"— the one structured tag, but the verb breaks the dotted-namespace convention every other observability issue has standardised on (retry.attempt_failed,pipeline.completed,dispatcher.offer.sent).src/webhook/idempotency.ts:67-72— BunRedisClient.sendthrew →log.warn(..., "claimDelivery: Valkey error, proceeding (fail-open)"). Noeventtag, noreasondiscriminator, and indistinguishable from outcome 1 in a structured query without parsing the free-text message.This is exactly the same gap that #225 closed for
retryWithBackoffand #207 closed for the dispatcher: every terminal outcome of a load-bearing chokepoint needs an enumerableeventso operator queries (event:idempotency.duplicate_skipped) can compute rates without resorting to message-field regex.The fix is the established template (mirroring
src/utils/retry-log-fields.ts): addsrc/webhook/idempotency-log-fields.tswith az.strictObjectIdempotencyLogFieldsSchemadiscriminated oneventover the four-tag familyidempotency.{claimed, duplicate_skipped, failed_open}(withreason: "unavailable" | "error"discriminator onfailed_open), wire it intoclaimDelivery's four emit sites, retire the legacydedup-skipliteral, and document the new fields under the "Idempotency log fields" heading indocs/operate/observability.md(the existing table at lines 22-47 is the canonical place).Diagram
flowchart TD WH["Webhook delivery<br/>X-GitHub-Delivery"] --> CD["claimDelivery<br/>idempotency.ts line 36"] CD --> VC{"Valkey configured<br/>AND healthy?"} VC -->|no| FOU["fail-open path<br/>line 47<br/>current: warn msg only<br/>proposed: idempotency.failed_open<br/>reason=unavailable"] VC -->|yes| SNX["SET key NX EX 259200<br/>lines 54-60"] SNX -->|reply OK| WON["claim won<br/>line 61<br/>current: NO LOG<br/>proposed: idempotency.claimed"] SNX -->|reply null| DUP["duplicate redelivery<br/>lines 62-65<br/>current: event=dedup-skip<br/>proposed: idempotency.duplicate_skipped"] SNX -->|throws| FOE["fail-open path<br/>lines 67-71<br/>current: warn msg only<br/>proposed: idempotency.failed_open<br/>reason=error"] WON --> HOK["handler proceeds<br/>side effects run"] FOU --> HOK FOE --> HOK DUP --> HSK["handler returns<br/>NO side effects"] classDef cTrigger fill:#1f4e79,color:#ffffff classDef cCall fill:#196f3d,color:#ffffff classDef cDecision fill:#7d6608,color:#ffffff classDef cOk fill:#196f3d,color:#ffffff classDef cDup fill:#1f4e79,color:#ffffff classDef cFailopen fill:#943126,color:#ffffff classDef cHandler fill:#4a235a,color:#ffffff class WH cTrigger class CD cCall class SNX cCall class VC cDecision class WON cOk class DUP cDup class FOU cFailopen class FOE cFailopen class HOK cHandler class HSK cHandlerRationale
claimDeliveryis one of two load-bearing dedup defences (the other being theidx_workflow_runs_inflightpartial-unique index, per the "Idempotency" section ofCLAUDE.md). Without structured outcome tags, the operator queries that would matter on an incident are impossible:count(event:idempotency.duplicate_skipped) / count(event:idempotency.{claimed,duplicate_skipped}). With outcome 2 emitting nothing, the denominator can only be reconstructed by counting downstream pipeline starts — fragile and lossy. A sudden spike in this rate would indicate GitHub redeliveries (manual operator replay, automatic retry, or webhook misconfiguration causing the same delivery to land multiple times) and is exactly the kind of operational signal observability is for.count(event:idempotency.failed_open) / total claims. With the unavailable and error paths sharing only a free-text message, an alert like "fail-open rate > 1% sustained for 10m" requiresmessage:~"fail-open"regex matching, which is brittle and breaks the field-vs-message-content separation theretry.*schema (src/utils/retry-log-fields.ts:48-95) deliberately enforces.reasondiscriminator matters becauseunavailable(Valkey unset/disconnected → infrastructure config) anderror(BunRedisClient.sendthrew mid-flight → transient network or server-side error) have completely different runbook responses but are currently the same warn line.The patch surface is tiny: one new file (the Zod schema, ~40 lines mirroring
retry-log-fields.ts), four edits inidempotency.ts(two new emit sites, two reshapes), and a co-located test asserting field parity (the existing pattern insrc/utils/retry-log-fields.test.ts). No new npm dependencies, no runtime cost on the hot path (the schema is a drift-prevention contract, not a runtime validator — same policy asRetryLogFieldsSchema).Volume policy follows precedent:
idempotency.claimedatdebug(one per webhook would be too loud atinfofor a busy installation, mirroringgithub.api.requestat line 117 ofsrc/utils/octokit-observability.ts);idempotency.duplicate_skippedatinfo(the operational signal of interest);idempotency.failed_openatwarn(degraded mode that should page if sustained).References
Internal:
src/webhook/idempotency.ts:36-74— theclaimDeliveryfunction; the four code paths cited above.src/utils/retry-log-fields.ts:34-96— the canonical template for anevent-discriminated.strict()Zod schema for a structured event family (the proposedidempotency-log-fields.tswould mirror this file directly).src/utils/octokit-observability.ts:29-54— second template, with theas constevent-keys map plus.strict()schema pattern, including the volume-policy precedent (debugfor the per-request line at line 117,warnfor low-quota and rate-limit-error lines).src/orchestrator/log-fields.ts:29-66— third template, dispatcher offer events, also discriminated onevent, also pairs each event withqueue_wait_mscorrelation fields (precedent for adding correlation scalars likedeliveryIdto the schema).docs/operate/observability.md:22-47— the canonical log-fields table; the proposedidempotency.*rows would land here alongside the existingretry.*,dispatcher.offer.*, andgithub.api.*rows.CLAUDE.md— "Idempotency" bullet under "Key Concepts" describes the fail-open contract; the proposedreasondiscriminator gives operators the structured way to distinguish the two failure modes that the prose already names.External:
idempotency.claimed+idempotency.duplicate_skippedevents provide as a stable schema).X-GitHub-DeliveryID is the canonical dedup key and is preserved across manual redeliveries, justifying the 3-day TTL the schema fields will accompany.event-name pattern the proposal extends (the closedretry.*issue feat(observability): add structured retry.* events #225 already citeshttp.request.resend_countas semantic precedent forattempt; anidempotency.duplicate_skippedevent slots into the same convention).Suggested Next Steps
src/webhook/idempotency-log-fields.tsmirroringsrc/utils/retry-log-fields.ts: anIDEMPOTENCY_LOG_EVENTS = { claimed, duplicate_skipped, failed_open } as constmap and az.discriminatedUnion("event", [...])schema whosefailed_openbranch carriesreason: z.enum(["unavailable", "error"])and the shared branch shape includesdeliveryId: z.string().min(1)plus attl_seconds: z.number().int().positive()field onclaimedso the TTL is greppable per emit.src/webhook/idempotency.tsto (a) emitidempotency.claimedatdebugon the line-61 success branch, (b) replace the line-47 warn withidempotency.failed_openatwarnwithreason: "unavailable", (c) replace the line-63event: "dedup-skip"withevent: "idempotency.duplicate_skipped"atinfo, and (d) replace the line-68 warn withidempotency.failed_openatwarnwithreason: "error". Keep the human-readable log message text alongside for log-grep ergonomics; theeventfield is the structured-query handle.src/webhook/idempotency-log-fields.test.tsasserting each emit site's pino-event shape passesIdempotencyLogFieldsSchema.parse(the colocated-test pattern used bysrc/utils/retry-log-fields.test.tsand gated bybun run check:test-globs, seeCLAUDE.md"Code Conventions").docs/operate/observability.md: addidempotency.{claimed, duplicate_skipped, failed_open}rows to the field table (lines 22-47) and a short "Idempotency log fields" subsection after "Retry log fields", noting the volume policy (debug/info/warn) and thereasondiscriminator semantics. Thecheck:docs-citationsscript enforces the cited line numbers stay accurate.Areas Evaluated
src/webhook/idempotency.ts— the four code paths ofclaimDeliveryand their current log shape; the file's contract docstring at lines 5-15.claimDelivery:src/webhook/events/issue-comment.ts:92,src/webhook/events/issues.ts:100,src/webhook/events/pull-request.ts:184,src/webhook/events/review-comment.ts:117(gated paths) and the doc-only mention insrc/webhook/events/review.ts:14(un-gated by design).src/orchestrator/valkey.ts:28-58—getValkeyClient/isValkeyHealthysemantics that drive the line-46 healthy-or-not branch inclaimDelivery.src/utils/retry-log-fields.tsandsrc/utils/octokit-observability.tsas schema templates;src/orchestrator/log-fields.tsas a third dispatcher-side template for anevent-discriminated family.src/logger.ts— the root logger config and thecreateChildLoggerAPI the handlers already build; no logger-side change is needed for this proposal because the events would flow through the existing child logger.research-labelled observability issues (gh issue list --label research --state all) to confirm no overlap; the twoidempotency-labelled prior issues (fix(idempotency): production webhook handlers bypass router dedupandfeat(idempotency): close duplicate-processing window) are feature work on the correctness of the layer, not on its observability surface.docs/operate/observability.md:22-47log-fields table to identify the canonical insertion point for the new field rows.Generated by the scheduled research action on 2026-06-14