
[Feature] Generic outbound webhook delivery infrastructure #218

@Polliog

Description


Feature Description

Build a reusable outbound webhook delivery system that handles HMAC payload signing, retry with exponential backoff, dead letter queue, SSRF protection, and delivery logging. Existing alert notification code migrates onto it. The same primitive becomes available for future features that need to deliver events to external systems.

Problem/Use Case

Outbound HTTP delivery exists today inside the alert notification path, but it's specific to that one feature. Several upcoming features will need the same plumbing: digest reports (#155), workflow integrations, custom event subscriptions. Building each one independently would duplicate retry logic, signing, error handling, and SSRF defense — and the result would be inconsistent quality (the first one done well, the rest "good enough").

There's also a security dimension: outbound HTTP from a server-side application is an SSRF vector if not handled carefully. Centralizing it in one well-tested module is much safer than scattering it across features.

Proposed Solution

A webhookDispatcher module:

webhookDispatcher.enqueue({
  url: string,
  payload: unknown,
  organizationId: string,
  eventType: string,
  signingSecret?: string,
  headers?: Record<string, string>,
  metadata?: Record<string, unknown>,
})
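For illustration, the options above map onto a type like the following, combined with the deterministic BullMQ job-ID scheme described under Implementation Details. The type name, the `jobId` helper, and the assumption that an event id accompanies each enqueue are sketches, not the final API:

```typescript
// Sketch only: field names mirror the enqueue signature above; the
// job-id scheme follows the Implementation Details section.
interface EnqueueOptions {
  url: string;
  payload: unknown;
  organizationId: string;
  eventType: string;
  signingSecret?: string;
  headers?: Record<string, string>;
  metadata?: Record<string, unknown>;
}

// Deterministic BullMQ job ID, so retries triggered by upstream errors
// deduplicate and receivers can rely on a stable event id.
function jobId(opts: EnqueueOptions, eventId: string): string {
  return `webhook:${opts.organizationId}:${opts.eventType}:${eventId}`;
}
```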

What it does:

  1. SSRF protection — full DNS resolution of the target host, reject if it resolves to a private/loopback/link-local address. Re-resolve at request time to defend against DNS rebinding.
  2. HMAC signing — when a signing secret is provided, sign the body with HMAC-SHA256 and add X-Logtide-Signature and X-Logtide-Timestamp headers. Standard, documented, easy for receivers to verify.
  3. Retry with backoff — exponential backoff (e.g. 1s, 5s, 25s, 2m, 10m), max attempts configurable, only retry on transient failures (5xx, network errors, timeouts).
  4. Dead letter queue — after final failure, the delivery lands in a DLQ table with full request/response info for inspection and manual replay.
  5. Delivery log — every attempt (success or failure) is recorded with timestamp, status code, duration, response excerpt. Visible in a dashboard view.
  6. Per-organization concurrency limit — prevent one tenant's slow webhook receiver from saturating the worker pool.
Backed by BullMQ with deterministic job IDs to make idempotency easy on the consumer side. Each event has a stable id; receivers can deduplicate.
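A minimal sketch of the private-range check from step 1, assuming the guard runs against already-resolved A records. IPv4 only here; a real implementation also has to cover AAAA records, IPv6 ULA/link-local, and the request-time IP pinning described below:

```typescript
import { isIP } from "node:net";

// Blocked IPv4 ranges: RFC 1918 private space plus loopback,
// link-local, and 0.0.0.0/8. Sketch, not an exhaustive list.
const BLOCKED_V4: Array<[string, number]> = [
  ["10.0.0.0", 8],
  ["172.16.0.0", 12],
  ["192.168.0.0", 16],
  ["127.0.0.0", 8],
  ["169.254.0.0", 16],
  ["0.0.0.0", 8],
];

// Convert a dotted-quad IPv4 string to an unsigned 32-bit integer.
function v4ToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => (acc << 8) + Number(octet), 0) >>> 0;
}

// Fail closed: anything that is not a well-formed public IPv4 address
// is treated as blocked.
function isBlockedV4(ip: string): boolean {
  if (isIP(ip) !== 4) return true;
  const addr = v4ToInt(ip);
  return BLOCKED_V4.some(([base, bits]) => {
    const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
    return ((addr & mask) >>> 0) === ((v4ToInt(base) & mask) >>> 0);
  });
}
```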

Alternatives Considered

  • Continue scattering webhook delivery per feature. Worse SSRF posture, inconsistent retry behavior, no central observability. Rejected.
  • Use an external service (e.g. Hookdeck, Svix). Adds a third-party dependency that breaks self-hosted/air-gapped deployments and contradicts the privacy-first philosophy of the project.
  • Build a minimal version now and add observability/DLQ later. Tempting, but the migration cost of moving alert notifications onto the new dispatcher is paid once. Better to land it complete.

Implementation Details (Optional)

  • SSRF guard: resolve DNS, check that no resolved A/AAAA record is in private space (RFC 1918, ULA, link-local, loopback, multicast, reserved). For the actual HTTP request, pin to the resolved IP to prevent TOCTOU rebinding. Allow an opt-in allowPrivateNetworks flag for trusted on-prem deployments.
  • HMAC signing: X-Logtide-Signature: t=<unix>,v1=<hex>, where the signed string is <unix>.<body>. Document the verification snippet for receivers.
  • Job IDs: webhook:${organizationId}:${eventType}:${eventId} — deterministic, deduplicates retries triggered by upstream errors.
  • DLQ: a separate table webhook_deliveries_failed with the full job, last response, last error. A dashboard view lists DLQ entries per org with a "retry" button (which re-enqueues with a fresh job).
  • Delivery log: capacity-bounded — keep last N attempts per webhook (configurable, default 1000). Deeper history would need a separate storage decision.
  • Existing alert notification code becomes the first consumer. No behavioral regression — same delivery semantics, just centralized.
  • Coordinates with the lifecycle hooks issue: beforeWebhookDispatch is the right place for downstream platforms to inspect or reject deliveries.
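As a starting point for the documented verification snippet mentioned above, a receiver-side check of the proposed `t=<unix>,v1=<hex>` format (signed string `<unix>.<body>`) could look like this; the function name and the 5-minute tolerance default are assumptions:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verifies an X-Logtide-Signature header of the form "t=<unix>,v1=<hex>",
// where the HMAC-SHA256 input is `<unix>.<body>`. Sketch for receivers.
function verifySignature(
  header: string,        // e.g. "t=1700000000,v1=ab12..."
  rawBody: string,       // the exact bytes of the request body
  secret: string,
  toleranceSeconds = 300,
): boolean {
  const parts = Object.fromEntries(
    header.split(",").map((kv) => kv.split("=") as [string, string]),
  );
  const timestamp = Number(parts["t"]);
  const given = parts["v1"];
  if (!Number.isFinite(timestamp) || !given) return false;

  // Reject stale timestamps to limit the replay window.
  const now = Math.floor(Date.now() / 1000);
  if (Math.abs(now - timestamp) > toleranceSeconds) return false;

  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`)
    .digest("hex");

  // Constant-time comparison to avoid timing side channels.
  const a = Buffer.from(given, "hex");
  const b = Buffer.from(expected, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}
```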

Priority

  • Critical - Blocking my usage of Logtide
  • High - Would significantly improve my workflow
  • Medium - Nice to have
  • Low - Minor enhancement

Target Users

  • Operators integrating Logtide alerts with external systems (PagerDuty, Slack, custom internal tools)
  • Teams building automation around Logtide events (CI triggers, ticket creation, custom workflows)
  • Future features requiring reliable event delivery (digest reports, custom event subscriptions)

Contribution

  • I would like to work on implementing this feature

Metadata

Assignees

No one assigned

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests