feat: wire an opt-in Slack notifier and fold alert delivery into Run 2 #41
Merged
The observability overlay's Alertmanager routed only to a `null` receiver, so all 11 alert rules paged nobody. The earlier plan deferred wiring a real notifier to a post-Run-2 "~5-minute change", but the alert delivery path can only be tested end to end: a wrong webhook URL passes every config load and healthcheck, then fails silently in the real incident.

Make Slack delivery an opt-in that the burn-in itself can validate:

- `deploy/alertmanager.slack.yml`: a Slack-delivering config; the webhook URL is read at notification time from a tmpfs file via `api_url_file`, never stored in the repo.
- `docker-compose.observability.yml`: the `alertmanager` service gains an entrypoint that materializes `INFLOOP_ALERT_SLACK_WEBHOOK_URL` into that tmpfs file and selects the Slack config when set, or the null-receiver `alertmanager.yml` when not, mirroring the `prometheus` scrape-token service. The local burn-in default is unchanged.
- `burn-in-preflight.test.ts`: three static wiring checks, including a guard that the two configs' route blocks stay in sync.
- docs: Run 2 Phase 3 now validates observed alert delivery; the production-readiness decision records the deferral as withdrawn.

`bun test` 1255 pass / 1 skip / 0 fail; `tsc` clean; `amtool check-config` passes on both configs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
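The route-block guard listed above can be approximated with a plain text comparison. This is a minimal sketch, not the test in `burn-in-preflight.test.ts`: it assumes `route:` is a top-level key in both files, compares the block line by line after dropping blank and comment lines, and accepts an optional `normalize` callback on the assumption that the two configs may legitimately differ in the default receiver name. The names `topLevelBlock` and `routesInSync` are hypothetical.

```typescript
// Hypothetical sketch: a static guard that two Alertmanager configs keep the
// same top-level `route:` block. Plain string processing, no YAML parser.

/** Lines of a top-level YAML block such as `route:`, with blank lines,
 *  full-line comments, and trailing whitespace removed. */
function topLevelBlock(yaml: string, key: string): string[] {
  const lines = yaml.split(/\r?\n/);
  const start = lines.findIndex((line) => line.startsWith(`${key}:`));
  if (start === -1) throw new Error(`no top-level "${key}:" block`);
  const block: string[] = [lines[start].replace(/\s+$/, "")];
  for (const line of lines.slice(start + 1)) {
    // A non-indented, non-comment line begins the next top-level key.
    if (/^[^\s#]/.test(line)) break;
    const trimmed = line.replace(/\s+$/, "");
    if (trimmed === "" || trimmed.trim().startsWith("#")) continue;
    block.push(trimmed);
  }
  return block;
}

/** True when both configs have the same `route:` block after `normalize`,
 *  which lets the caller mask lines allowed to differ (an assumption). */
function routesInSync(
  a: string,
  b: string,
  normalize: (line: string) => string = (line) => line,
): boolean {
  const ra = topLevelBlock(a, "route").map(normalize);
  const rb = topLevelBlock(b, "route").map(normalize);
  return ra.length === rb.length && ra.every((line, i) => line === rb[i]);
}

// Example: mask receiver names, assuming they differ between the two configs.
// const maskReceiver = (line: string) => line.replace(/receiver:.*/, "receiver: <masked>");
// routesInSync(nullReceiverYaml, slackYaml, maskReceiver);
```

A text comparison is deliberately strict: any reordering or edit inside `route:` fails the guard, which errs toward flagging drift rather than missing it. Parsing the YAML instead would tolerate formatting-only changes at the cost of a dependency.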
What
The observability overlay's Alertmanager routed only to a `null` receiver, so all 11 alert rules paged nobody. The Run 2 plan deferred wiring a real notifier to a post-Run-2 "~5-minute change", but the alert delivery path can only be tested end to end: a wrong webhook URL passes every config load and healthcheck, then fails silently in the real incident. That is worse than no alerting, because operators trust it. This PR makes Slack delivery an opt-in that the burn-in itself validates, folding it into Run 2 Phase 3 instead of deferring it.
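The opt-in reduces to one branch on `INFLOOP_ALERT_SLACK_WEBHOOK_URL`. The real entrypoint is shell inside `docker-compose.observability.yml`; the TypeScript below is only a hypothetical model of that branch, written as a pure function so both sides can be reasoned about without Docker. The container paths and the `planStartup` name are assumptions, and the sketch treats an empty value the same as an unset one.

```typescript
// Hypothetical TypeScript model of the alertmanager entrypoint branch.
// The real logic is shell in docker-compose.observability.yml; every path
// below is an assumed container path, not taken from the repo.

interface AlertmanagerStartup {
  configFile: string; // value that would be passed to `alertmanager --config.file`
  webhookFile?: string; // tmpfs file that `api_url_file` points at (Slack mode only)
  webhookContents?: string; // secret to write into webhookFile before startup
}

const NULL_CONFIG = "/etc/alertmanager/alertmanager.yml"; // assumed mount path
const SLACK_CONFIG = "/etc/alertmanager/alertmanager.slack.yml"; // assumed mount path
const WEBHOOK_FILE = "/run/alertmanager/slack_webhook_url"; // assumed tmpfs path

function planStartup(env: Record<string, string | undefined>): AlertmanagerStartup {
  const url = env["INFLOOP_ALERT_SLACK_WEBHOOK_URL"];
  if (!url) {
    // Unset or empty: keep the local burn-in default (the null receiver).
    return { configFile: NULL_CONFIG };
  }
  // Set: the URL goes only into the tmpfs file, never into the config file,
  // and the Slack-delivering config is selected.
  return { configFile: SLACK_CONFIG, webhookFile: WEBHOOK_FILE, webhookContents: url };
}

// Usage with the real process environment (Node or Bun):
// const plan = planStartup(process.env);
```

Because `api_url_file` is read at notification time, the rendered config never contains the secret. The flip side, as the paragraph above notes, is that a wrong URL only surfaces when a notification is actually sent, which is why delivery has to be observed during the burn-in.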
Changes
- `deploy/alertmanager.slack.yml` (new): a Slack-delivering Alertmanager config. The webhook URL is read at notification time from a tmpfs file via `api_url_file`, never stored in the repo.
- `deploy/docker-compose.observability.yml`: the `alertmanager` service gains an entrypoint that materializes `INFLOOP_ALERT_SLACK_WEBHOOK_URL` into that tmpfs file and selects the Slack config when the variable is set, or the null-receiver `alertmanager.yml` when it is not. This mirrors the sibling `prometheus` scrape-token service exactly. The local burn-in default is unchanged.
- `deploy/alertmanager.yml`: header comment rewritten; behaviour unchanged (still the `null` receiver default).
- `.env.example`: new optional `INFLOOP_ALERT_SLACK_WEBHOOK_URL` entry.
- `deploy/burn-in-preflight.test.ts`: three static wiring checks, including a guard that the two configs' `route:` blocks stay in sync.
- `burn-in-run2-plan.md` Phase 3 now validates observed alert delivery; `production-readiness-decision.md` records the deferral as withdrawn; `burn-in.md` Phase 3 and sign-off updated.

Validation
- `bun test`: 1255 pass / 1 skip / 0 fail (+3 new tests)
- `bun run typecheck` (`tsc`): clean
- `amtool check-config`: SUCCESS on both `alertmanager.yml` and `alertmanager.slack.yml`
- `docker compose config` merges cleanly; entrypoint branch logic verified both ways (unset → null receiver, set → Slack config)

Follow-up after merge
The `burn-in-run2-candidate` tag must be moved forward to the merge commit. It currently points at `0425c9f`, which lacks `alertmanager.slack.yml`, so a Run 2 operator running `git checkout burn-in-run2-candidate` would not get this change. (Not done in this PR because the tag must point at a commit actually on `main`.)

🤖 Generated with Claude Code