An Alertmanager webhook receiver that turns alert lifecycle transitions into immutable OpenTelemetry log events, with Redis/Valkey for deduplication and state tracking.
Many teams struggle with alert fatigue:
- alerts are not always treated as urgent incidents
- response times are slow or inconsistent
- people compensate by increasing `repeat_interval`, silencing alerts, or ignoring them
Over time this can reduce trust in alerting and create noisy monitoring setups.
Alertmanager is excellent at reliable notification delivery and is designed around a simple assumption: if nobody acted, keep reminding. That behavior is exactly right for urgent incidents, but it can be noisy for signals that still matter without requiring immediate action.
This receiver keeps Alertmanager as the notification engine while turning alert lifecycle transitions into immutable OpenTelemetry log events. Those events can be stored in log backends such as VictoriaLogs, Parseable, or Loki and queried later with rich label-based filters.
That makes it easier to keep truly urgent alerts for human response while still retaining searchable history for lower-priority signals, review, reporting, and work planning.
```
Alertmanager (send_resolved: true)
        │
        │  POST /webhook
        ▼
alert-event-receiver
        ├─ reads alert.Fingerprint as the alert identity key
        ├─ checks Redis / Valkey for state
        ├─ emits OTLP log record (firing | resolved)
        └─ updates state in Redis / Valkey
        │
        │  OTLP/gRPC or OTLP/HTTP
        ▼
OTel Collector (optional) → VictoriaLogs / Loki / Grafana Cloud / …
```
All configuration is via environment variables.
| Variable | Default | Description |
|---|---|---|
| `ADDRESS` | `:9011` | HTTP listen address (for example `:9011` or `127.0.0.1:9011`) |
| Variable | Default | Description |
|---|---|---|
| `REDIS_ADDR` | `localhost:6379` | Redis/Valkey address |
| `REDIS_PASSWORD` | (empty) | Password |
| `REDIS_DB` | `0` | Database number |
| `CLOSED_STATE_TTL` | `24h` | TTL for closed-state tombstones (prevents duplicate resolved events) |
| `IDEMPOTENCY_TTL` | `168h` (7d) | TTL for idempotency keys |
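To illustrate how these variables could map onto typed configuration, here is a minimal sketch; the `Config` struct, `Load`, and `getEnv` names are hypothetical and not the actual `internal/config` API.

```go
// Hypothetical sketch of loading the Redis-related variables above with
// defaults; not the actual internal/config implementation.
package config

import (
	"os"
	"strconv"
	"time"
)

type Config struct {
	RedisAddr      string
	RedisPassword  string
	RedisDB        int
	ClosedStateTTL time.Duration
	IdempotencyTTL time.Duration
}

// getEnv returns the variable's value, or the given default when it is unset or empty.
func getEnv(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}

func Load() (Config, error) {
	db, err := strconv.Atoi(getEnv("REDIS_DB", "0"))
	if err != nil {
		return Config{}, err
	}
	closedTTL, err := time.ParseDuration(getEnv("CLOSED_STATE_TTL", "24h"))
	if err != nil {
		return Config{}, err
	}
	idempTTL, err := time.ParseDuration(getEnv("IDEMPOTENCY_TTL", "168h"))
	if err != nil {
		return Config{}, err
	}
	return Config{
		RedisAddr:      getEnv("REDIS_ADDR", "localhost:6379"),
		RedisPassword:  os.Getenv("REDIS_PASSWORD"),
		RedisDB:        db,
		ClosedStateTTL: closedTTL,
		IdempotencyTTL: idempTTL,
	}, nil
}
```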
The OTel SDK reads its standard environment variables automatically. You only need to set the relevant ones for your deployment:
| Variable | Default | Description |
|---|---|---|
| `OTEL_EXPORTER_OTLP_PROTOCOL` | `grpc` | `grpc` or `http/protobuf` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://localhost:4317` | Collector or backend endpoint |
| `OTEL_EXPORTER_OTLP_HEADERS` | (empty) | Auth headers (e.g. `Authorization=Bearer ...`) |
| `OTEL_SERVICE_NAME` | `alert-event-receiver` | Service name on emitted records |
| `OTEL_RESOURCE_ATTRIBUTES` | (empty) | Extra resource attributes (e.g. `cluster=prod-eu1,tenant=platform`) |
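For orientation, here is a minimal sketch of OTLP log initialization that relies on these variables; it hard-codes the gRPC exporter and is not a copy of `internal/telemetry/otel.go`.

```go
// Sketch only: one way to initialize the OTel logs SDK so that the
// OTEL_EXPORTER_OTLP_* variables above are honored. The gRPC exporter is
// hard-coded here; protocol selection via env var is not shown.
package telemetry

import (
	"context"

	"go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc"
	"go.opentelemetry.io/otel/log/global"
	sdklog "go.opentelemetry.io/otel/sdk/log"
)

func InitLogs(ctx context.Context) (*sdklog.LoggerProvider, error) {
	// The exporter reads OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS itself.
	exp, err := otlploggrpc.New(ctx)
	if err != nil {
		return nil, err
	}
	// The default resource picks up OTEL_SERVICE_NAME and OTEL_RESOURCE_ATTRIBUTES.
	provider := sdklog.NewLoggerProvider(
		sdklog.WithProcessor(sdklog.NewBatchProcessor(exp)),
	)
	global.SetLoggerProvider(provider)
	return provider, nil
}
```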
The receiver stores only working state in Redis/Valkey. It does not store the full alert history there. Full lifecycle history is emitted as OTLP log records.
There are two key families:
One hash per alert fingerprint:

```
alertstate:{fingerprint}
```

Example:

```
alertstate:4a4f0d2b7c9e1a23
```
Fields currently stored:

- `status` → `firing` or `closed`
- `first_firing_at` → first known firing timestamp for the current lifecycle
- `last_seen_at` → last time a firing notification was seen
- `starts_at` → Alertmanager `startsAt` value
- `alertname` → copied from alert labels
- `label.*` → every alert label is stored with a `label.` prefix
Example hash content:

```
status=firing
first_firing_at=2026-04-15T09:22:58Z
last_seen_at=2026-04-15T09:23:14Z
starts_at=2026-04-15T09:22:58Z
alertname=HighErrorRate
label.severity=warning
label.service=checkout
label.instance=checkout-7d8f9
```
Behavior (a minimal Go sketch follows the list):

- When an alert is `firing`, the hash is written/updated and made persistent.
- When an alert is `resolved`, the same hash is marked `status=closed` and a TTL is applied using `CLOSED_STATE_TTL`.
- That short-lived closed tombstone prevents duplicate late `resolved` deliveries from creating extra resolved events.
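A minimal sketch of this behavior, assuming the `github.com/redis/go-redis/v9` client; the function names are illustrative, not the project's `state.Store` interface.

```go
// Illustrative sketch of the state-hash lifecycle described above, using
// github.com/redis/go-redis/v9. Function names are hypothetical.
package example

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// markFiring writes/updates the hash and removes any TTL (open alerts persist).
func markFiring(ctx context.Context, rdb *redis.Client, fp string, fields map[string]any) error {
	key := "alertstate:" + fp
	if err := rdb.HSet(ctx, key, fields).Err(); err != nil {
		return err
	}
	// Open alerts are working state: drop any TTL left over from a previous cycle.
	return rdb.Persist(ctx, key).Err()
}

// markClosed flips the hash to a tombstone and lets it expire after CLOSED_STATE_TTL.
func markClosed(ctx context.Context, rdb *redis.Client, fp string, closedTTL time.Duration) error {
	key := "alertstate:" + fp
	if err := rdb.HSet(ctx, key, "status", "closed").Err(); err != nil {
		return err
	}
	return rdb.Expire(ctx, key, closedTTL).Err()
}
```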
One short-lived string key per transition delivery:

```
alertidemp:{fingerprint}:{transition}:{unix_timestamp}
```

Examples:

```
alertidemp:4a4f0d2b7c9e1a23:firing:1776244978
alertidemp:4a4f0d2b7c9e1a23:resolved:1776246061
```
Behavior:

- The key is written with Redis `SET NX`.
- If it already exists, the receiver treats the delivery as a duplicate and drops it.
- The TTL is controlled by `IDEMPOTENCY_TTL`.
- The stored value is currently just `1`; the key name carries the useful information.
Deduplication happens in two layers:
The receiver uses `alert.Fingerprint` from the Alertmanager webhook payload as the identity of one alert instance.
That means:
- repeated notifications for the same alert instance reuse the same Redis state key
- a new fingerprint is treated as a different alert instance
- the receiver does not recompute identity from labels itself
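The payload shape involved looks roughly like this; the struct below is a trimmed, illustrative sketch rather than the full `internal/models` definition.

```go
// Trimmed sketch of the Alertmanager webhook payload shape used for identity.
// The field set matches the JSON examples later in this document; see
// internal/models for the real types.
package example

import "time"

type WebhookPayload struct {
	Receiver    string  `json:"receiver"`
	GroupKey    string  `json:"groupKey"`
	ExternalURL string  `json:"externalURL"`
	Alerts      []Alert `json:"alerts"`
}

type Alert struct {
	Status      string            `json:"status"`      // "firing" or "resolved"
	Fingerprint string            `json:"fingerprint"` // identity key for one alert instance
	StartsAt    time.Time         `json:"startsAt"`
	EndsAt      time.Time         `json:"endsAt"`
	Labels      map[string]string `json:"labels"`
	Annotations map[string]string `json:"annotations"`
}
```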
For each transition, the receiver creates an idempotency key:

```
alertidemp:{fingerprint}:{transition}:{unix_timestamp}
```
Timestamp source:

- `firing` → `startsAt` (or `now` if missing)
- `resolved` → `endsAt` (or `now` if missing)
The key is written with `SET NX`:
- if the key is new, the event is processed
- if the key already exists, the event is dropped as a duplicate delivery
This protects against webhook retries and repeated deliveries for the same transition.
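A sketch of this check under the same go-redis assumption as the earlier example; the helper names are hypothetical.

```go
// Sketch of the idempotency check: build the per-transition key and claim it
// with SET NX. Assumes github.com/redis/go-redis/v9.
package example

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// transitionTime picks startsAt for firing and endsAt for resolved, falling back to now.
func transitionTime(transition string, startsAt, endsAt time.Time) time.Time {
	switch {
	case transition == "firing" && !startsAt.IsZero():
		return startsAt
	case transition == "resolved" && !endsAt.IsZero():
		return endsAt
	default:
		return time.Now()
	}
}

// claimTransition returns true if this delivery is the first one seen for the transition.
func claimTransition(ctx context.Context, rdb *redis.Client, fp, transition string, ts time.Time, ttl time.Duration) (bool, error) {
	key := fmt.Sprintf("alertidemp:%s:%s:%d", fp, transition, ts.Unix())
	// SET NX: only the first writer wins; duplicates are dropped by the caller.
	return rdb.SetNX(ctx, key, "1", ttl).Result()
}
```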
The receiver also uses the alert state hash to suppress repeated lifecycle events (sketched in Go after this list):

- if state is already `firing` and another `firing` arrives, no new event is emitted
- if state is `closed` and a new `firing` arrives, that is treated as a new cycle and a new firing event is emitted
- if a `resolved` arrives without matching open state, the receiver still emits a resolved orphan event and writes a short-lived closed tombstone
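A compact sketch of these rules as a decision function; the outcome names loosely mirror the `alertreceiver_state_writes_total` results listed in the metrics table, but the function itself is illustrative, not the `internal/processor` implementation.

```go
// Illustrative decision sketch for the lifecycle rules above.
package example

// decide maps (previous state, incoming status) to an action.
// prev is "" when no alertstate hash exists for the fingerprint.
func decide(prev, incoming string) (emitEvent bool, outcome string) {
	switch {
	case incoming == "firing" && prev == "firing":
		return false, "already_firing" // suppressed: still the same open cycle
	case incoming == "firing" && prev == "closed":
		return true, "reopened" // new cycle after a tombstone
	case incoming == "firing":
		return true, "opened" // first time this fingerprint is seen
	case incoming == "resolved" && prev == "firing":
		return true, "closed"
	case incoming == "resolved":
		return true, "resolved_without_open_state" // orphan resolve, still emitted
	default:
		return false, "ignored"
	}
}
```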
When an alert is open (`status=firing`), the receiver removes any TTL from `alertstate:{fingerprint}`.
Reason:
- open alerts are active working state and should not disappear while the alert is still firing
Implication:
- if Redis is cleared or keys are evicted externally, the receiver may later treat a resolved event as an orphan
Applies to:
- `alertstate:{fingerprint}` when the alert has been marked `closed`
Default: `24h`
Reason:
- keep a short-lived tombstone so repeated late `resolved` deliveries do not create duplicate resolved events
- keep enough recent closed state to distinguish a genuine reopen from a duplicate late resolve
If you increase it:
- better protection against very late duplicate `resolved` deliveries
- more Redis memory used by recently closed alerts
- `closed` state entries stay around longer in operational queries and metrics
If you decrease it:
- less Redis memory used by closed alerts
- greater chance that a late duplicate `resolved` arrives after the tombstone expired and is emitted again as a resolved orphan
Applies to:
- `alertidemp:{fingerprint}:{transition}:{unix_timestamp}`
Default: `168h` (7d)
Reason:
- suppress duplicate deliveries and retries over a longer time window than the closed-state tombstone alone
If you increase it:
- better protection against delayed retries or replayed webhook deliveries
- more Redis memory used by idempotency keys
If you decrease it:
- less Redis memory used by idempotency keys
- greater chance that the same transition is accepted again after the idempotency key expires
As a rule of thumb:
- increase `CLOSED_STATE_TTL` if late `resolved` deliveries are common
- increase `IDEMPOTENCY_TTL` if webhook retries/replays can happen over long periods
- decrease them only if Redis memory pressure matters more than long-window duplicate suppression
If you tune these values, the trade-off is simple:
- longer TTLs = more duplicate protection, more Redis retention
- shorter TTLs = less Redis retention, more risk of duplicate lifecycle events
Connect to Redis/Valkey:

```bash
redis-cli -h localhost -p 6379
```

If you use a password:

```bash
redis-cli -h localhost -p 6379 -a "$REDIS_PASSWORD"
```

If you use a non-default DB:

```bash
redis-cli -h localhost -p 6379 -n 2
```

List and inspect alert state keys:

```bash
redis-cli --scan --pattern 'alertstate:*'
redis-cli HGETALL 'alertstate:4a4f0d2b7c9e1a23'
redis-cli HMGET 'alertstate:4a4f0d2b7c9e1a23' status first_firing_at last_seen_at starts_at alertname
redis-cli TTL 'alertstate:4a4f0d2b7c9e1a23'
```

Interpretation of the TTL value:

- `-1` → key exists and has no TTL (typically an open/firing alert)
- `-2` → key does not exist
- positive integer → seconds until the closed tombstone expires

List and inspect idempotency keys:

```bash
redis-cli --scan --pattern 'alertidemp:*'
redis-cli GET 'alertidemp:4a4f0d2b7c9e1a23:resolved:1776246061'
redis-cli TTL 'alertidemp:4a4f0d2b7c9e1a23:resolved:1776246061'
```

Inspect everything for a single fingerprint, including its stored labels:

```bash
redis-cli --scan --pattern 'alertstate:4a4f0d2b7c9e1a23'
redis-cli --scan --pattern 'alertidemp:4a4f0d2b7c9e1a23:*'
redis-cli HGETALL 'alertstate:4a4f0d2b7c9e1a23' | grep '^label\.'
```

The example below shows a minimal lifecycle for one alert fingerprint.
```json
{
  "groupKey": "{}:{alertname=\"HighErrorRate\"}",
  "receiver": "event-webhook",
  "externalURL": "https://alertmanager.example",
  "alerts": [
    {
      "status": "firing",
      "fingerprint": "4a4f0d2b7c9e1a23",
      "startsAt": "2026-04-15T09:22:58Z",
      "labels": {
        "alertname": "HighErrorRate",
        "severity": "warning",
        "service": "checkout",
        "instance": "checkout-7d8f9"
      },
      "annotations": {
        "summary": "Checkout error rate is high"
      }
    }
  ]
}
```

Expected Redis writes:
- `alertstate:4a4f0d2b7c9e1a23` hash with `status=firing` and `label.*` fields
- `alertidemp:4a4f0d2b7c9e1a23:firing:1776244978` with value `1` and `IDEMPOTENCY_TTL`
Inspect with redis-cli:
```bash
redis-cli HGETALL 'alertstate:4a4f0d2b7c9e1a23'
redis-cli TTL 'alertstate:4a4f0d2b7c9e1a23'
redis-cli GET 'alertidemp:4a4f0d2b7c9e1a23:firing:1776244978'
redis-cli TTL 'alertidemp:4a4f0d2b7c9e1a23:firing:1776244978'
```

At this point, the state key TTL should usually be `-1` (open alert, persisted).
```json
{
  "groupKey": "{}:{alertname=\"HighErrorRate\"}",
  "receiver": "event-webhook",
  "externalURL": "https://alertmanager.example",
  "alerts": [
    {
      "status": "resolved",
      "fingerprint": "4a4f0d2b7c9e1a23",
      "startsAt": "2026-04-15T09:22:58Z",
      "endsAt": "2026-04-15T09:41:01Z",
      "labels": {
        "alertname": "HighErrorRate",
        "severity": "warning",
        "service": "checkout",
        "instance": "checkout-7d8f9"
      }
    }
  ]
}
```

Expected Redis writes/updates:
- `alertstate:4a4f0d2b7c9e1a23` updated to `status=closed`
- `alertstate:*` gets a TTL from `CLOSED_STATE_TTL`
- `alertidemp:4a4f0d2b7c9e1a23:resolved:1776246061` with value `1` and `IDEMPOTENCY_TTL`
Inspect with redis-cli:
```bash
redis-cli HMGET 'alertstate:4a4f0d2b7c9e1a23' status first_firing_at last_seen_at starts_at alertname
redis-cli TTL 'alertstate:4a4f0d2b7c9e1a23'
redis-cli GET 'alertidemp:4a4f0d2b7c9e1a23:resolved:1776246061'
redis-cli TTL 'alertidemp:4a4f0d2b7c9e1a23:resolved:1776246061'
```

If you send the exact same transition again, the existing `alertidemp:*` key causes it to be dropped as a duplicate.
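For local testing, one of the example payloads above can be replayed against a running receiver. The sketch below assumes the default `:9011` listen address and a local `payload.json` file (an illustrative file name).

```go
// Sketch: replay one of the example payloads above against a locally running
// receiver. Assumes the default ADDRESS of :9011 and that payload.json
// contains one of the JSON bodies shown earlier.
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"os"
)

func main() {
	body, err := os.ReadFile("payload.json")
	if err != nil {
		panic(err)
	}
	resp, err := http.Post("http://localhost:9011/webhook", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```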
- If an alert is currently open, expect `status=firing` and `TTL = -1`.
- If an alert was recently resolved, expect `status=closed` and a positive TTL.
- If a duplicate delivery was suppressed, look for a matching `alertidemp:*` key.
- If no `alertstate:*` key exists for a resolved alert, that can still be valid: the receiver emits a resolved orphan event and writes a short-lived closed tombstone.
- Go 1.23+
- Redis or Valkey
- An OTLP-compatible log backend (or OTel Collector)
Build:

```bash
go build -o alert_event_receiver ./cmd/server
```

Run locally:

```bash
# Start Redis
docker run -d -p 6379:6379 redis:7

# Start an OTel Collector or any OTLP-compatible backend, then:
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_RESOURCE_ATTRIBUTES="cluster=local,tenant=dev"

go run ./cmd/server
```

| Path | Method | Description |
|---|---|---|
| `/webhook` | POST | Alertmanager webhook receiver |
| `/metrics` | GET | Prometheus metrics (RED pattern) |
| `/healthz` | GET | Health check |
| Metric | Labels | Description |
|---|---|---|
| `alertreceiver_webhook_requests_total` | `status` (2xx/4xx/5xx) | Webhook request rate |
| `alertreceiver_webhook_duration_seconds` | `status` | Webhook latency histogram |
| `alertreceiver_events_emitted_total` | `transition` (firing/resolved) | Events emitted via OTLP |
| `alertreceiver_emit_errors_total` | `transition` | OTLP emit failures |
| `alertreceiver_redis_errors_total` | `operation` | Redis operation errors |
| `alertreceiver_redis_ops_total` | `operation`, `result` | Redis operation outcomes |
| `alertreceiver_redis_op_duration_seconds` | `operation` | Redis operation latency histogram |
| `alertreceiver_redis_idemp_setnx_total` | `result` (set/exists/error) | Idempotency SET NX outcomes |
| `alertreceiver_duplicates_dropped_total` | — | Events dropped by idempotency check |
| `alertreceiver_state_writes_total` | `result` | Alert transition outcomes such as opened, reopened, closed, resolved_without_open_state, already_firing |
| `alertreceiver_state_entries` | `status` (firing/closed) | Current Redis-backed alert state entries maintained from write-path updates |
| `alertreceiver_idempotency_keys_created_total` | `transition` | Idempotency keys created successfully |
| `alertreceiver_resolved_orphans_total` | — | Resolved alerts seen without matching open state |
| `alertreceiver_closed_ttl_seconds` | — | TTL applied to closed alert tombstones |
Structured JSON written to stdout. All errors against Redis and the OTLP backend are logged with context (fingerprint, alertname, operation, error).
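A minimal sketch of that kind of JSON `slog` setup; the attribute names in the usage comment are illustrative, not the receiver's exact log schema.

```go
// Minimal sketch of a JSON slog logger on stdout.
package example

import (
	"log/slog"
	"os"
)

func newLogger() *slog.Logger {
	return slog.New(slog.NewJSONHandler(os.Stdout, nil))
}

// Example usage with contextual fields similar to those described above:
//   logger.Error("redis write failed",
//       "fingerprint", "4a4f0d2b7c9e1a23",
//       "alertname", "HighErrorRate",
//       "operation", "hset",
//       "err", err)
```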
```bash
go test ./...
```

Project layout:

```text
cmd/server/        main — wires dependencies and starts HTTP server
internal/
  config/          env var config loading
  models/          Alertmanager webhook payload + LifecycleEvent types
  processor/       alert state transition logic
  state/           Redis/Valkey Store interface + implementation
  telemetry/
    logger.go      JSON slog logger (stdout)
    metrics.go     Prometheus RED metrics
    otel.go        OTel SDK init (log provider)
    emitter.go     OTelEmitter — maps LifecycleEvent → OTel LogRecord
  webhook/         HTTP handler for Alertmanager webhook
docs/
  architecture.md  Full design document
```
VictoriaLogs supports direct OTLP ingestion over HTTP. Configuration example:
```bash
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:9428/insert/opentelemetry
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf

# Define the fields that should be used as stream labels; by default they are
# derived from the log resource attributes.
export OTEL_EXPORTER_OTLP_HEADERS="VL-Stream-Fields=service.name,am.external_url,alert.alertname,alert.fingerprint,alert.status,alert.transition"

# Alternatively use the logs-specific var (takes precedence over OTEL_EXPORTER_OTLP_HEADERS):
# export OTEL_EXPORTER_OTLP_LOGS_HEADERS="VL-Stream-Fields=alert.alertname,alert.label.severity,alert.state_write_result,alert.status,alert.transition"
```

VictoriaLogs queries use LogsQL.
Firing transitions for the `checkout` service in the last hour:

```
_time:1h alert.label.service:checkout alert.transition:firing
```

Resolved alerts that were firing for more than 15 minutes:

```
alert.transition:resolved alert.duration_seconds:>900
```
Assuming one event per transition:
```
_time:24h event.kind:alert_transition
| stats by (alert.alertname, alert.label.service) count() as transitions
| sort by (transitions desc)
| limit 20
```
```
_time:7d alert.transition:resolved
| stats by (alert.alertname) avg(alert.duration_seconds) as avg_duration, count() as total
| sort by (avg_duration desc)
```
If there is a `ticket_id` label:

```
alert.transition:resolved alert.label.ticket_id:""
```