Open-source alert management and incident response platform. Ingest alerts from any monitoring source, deduplicate them, auto-correlate into incidents, and manage the response — all from a single dashboard.
Think PagerDuty / OpsGenie, but open-source and self-hosted.
- JWT-based authentication — Secure login with username/password, 8-hour token expiry
- Role-based access control (RBAC) — Three roles: Admin (full access), User (read + acknowledge/resolve), Viewer (read-only)
- Default admin account — Auto-seeded on first startup with configurable credentials
- First-login password change — Admin account requires password change on first login
- API key backward compatibility — Webhook ingestion continues to use
X-API-Keyheader, existing integrations unaffected - User management — Admin panel to create, edit, and deactivate user accounts
- Flexible rotations — Hourly, daily, weekly, or custom rotation intervals
- Member management — Add team members to schedules with ordered rotation positions
- Timezone-aware handoffs — Configure handoff time and timezone per schedule
- Temporary overrides — Swap on-call duty for a time range with reason tracking
- "Who's On Call" view — Real-time display of the current on-call person per schedule
- Multi-level escalation — Define escalation levels with configurable timeouts (1-1440 minutes)
- Mixed targets — Each level can notify users directly or the current on-call from a schedule
- Repeat support — Policies can repeat through all levels N times before stopping
- Service-to-policy mapping — Map services to escalation policies using glob patterns (e.g.,
billing-*,*) - Priority ordering — When multiple mappings match, the lowest priority number wins
- Severity filtering — Optionally restrict mappings to specific severity levels
- 6 built-in webhook normalizers — Generic, Prometheus Alertmanager, Grafana, Splunk, Datadog, and Email ingest
- Pluggable architecture — Each provider has its own normalizer that maps vendor-specific payloads to Solace's internal format
- Auto-severity mapping — Provider-specific priority/severity levels are normalized to Solace's 5-level model (critical, high, warning, low, info)
- Fingerprint-based dedup — SHA256 hash of identity fields (source, name, service, host, labels) ensures identical alerts merge rather than duplicate
- Configurable dedup window — Default 5 minutes; identical alerts within the window increment
duplicate_count - Occurrence timeline — Every duplicate arrival is tracked with a timestamp for frequency analysis
- Automatic service-based grouping — Alerts from the same service within a configurable time window (default 10 min) are grouped into a single incident
- Severity auto-promotion — Incident severity always reflects the worst alert severity
- Auto-resolve — When all alerts in an incident resolve, the incident auto-resolves
- Full status workflow — Firing → Acknowledged → Resolved, plus Suppressed and Archived states
- Acknowledge & resolve — One-click actions from the dashboard or via API
- Bulk operations — Select multiple alerts and acknowledge or resolve them in one action
- Archive — Archive resolved alerts older than N days to keep the dashboard clean
- Incident timeline — Every action (created, alert added, severity changed, acknowledged, resolved) is recorded as a timestamped event
- Incident detail view — See all correlated alerts, event audit trail, and incident metadata in one place
- Cascade actions — Acknowledging/resolving an incident applies to all its alerts
- Slack — Block Kit formatted messages with severity color coding, alert counts, service info, and dashboard links
- Microsoft Teams — Adaptive Card messages via incoming webhook or Power Automate workflow URLs
- Email — HTML-formatted incident notifications via SMTP with correlated alert tables
- Generic Webhook (Outbound) — JSON payload with full incident and alert data, optional shared secret for HMAC verification, custom headers support
- PagerDuty — Events API v2 integration; triggers, resolves, and dedup keys sync incidents to PagerDuty services
- Per-channel filters — Filter notifications by severity and/or service
- Rate limiting — Per-channel, per-incident cooldown prevents notification spam
- Delivery logs — Every notification attempt is logged with status (pending/sent/failed) and error details
- Test button — Send a test notification through any channel from the UI
- Time-based suppression — Define start/end times for maintenance windows
- Flexible matchers — Match by service (list), severity (list), or label key-value pairs
- AND logic — All matchers must match for an alert to be suppressed
- CRUD management — Create, edit, and view active/expired windows from the UI
- Tags — Free-form string tags with add/remove from UI or API; stored as JSONB with GIN index for fast queries
- Investigation notes — Timestamped notes with author attribution and full CRUD
- External ticket linking — Link alerts to Jira, GitHub, or any URL; auto-prepends
https://if missing - Runbook URL — Editable from the alert detail panel; manually paste a URL or auto-attach via runbook rules
- Runbook rules — Pattern-based rules that auto-attach runbook URLs to incoming alerts. Define a service glob pattern (e.g.,
payment-*), an optional name pattern, and a URL template with variables ({service},{host},{name},{environment}). First matching rule wins (priority-ordered). "Save as Rule" checkbox on the alert panel creates a rule from the current alert in one click. - Raw payload — Full original webhook payload preserved for forensic inspection
- Configurable TTL — Firing alerts auto-resolve after a configurable time-to-live (default 24 hours, 0 to disable)
- Admin-only control — Only admins can adjust the TTL at runtime via Settings; env var
ALERT_TTL_SECONDSfor persistence across restarts - Smart exclusions — Acknowledged alerts are excluded from auto-expire (someone is working on it)
- Freshness tracking — Duplicate arrivals reset the expiry timer via
last_received_at, so actively recurring issues stay open - Auto-expired tag — Expired alerts are tagged
auto-expiredto distinguish from manual resolution - Cascade — Auto-expired alerts trigger incident resolution and WebSocket events like any other resolve
- Alert volume trends — Hourly area chart showing alert ingest rate over time
- MTTA/MTTR trends — Daily line chart tracking mean time to acknowledge and resolve
- Top noisy services — Bar chart ranking services by alert volume
- Severity breakdown — Distribution of alerts across severity levels
- Time range selector — Toggle between 7-day, 14-day, and 30-day views
- Integrated into Statistics — Expands the existing Statistics view, no separate navigation
- Dead-man switch — Register expected check-ins; if a service doesn't ping within the interval + grace period, Solace fires an alert
- HTTP health checks — Periodically GET a URL; if the response is non-2xx or times out, Solace fires an alert
- Automatic recovery — When a failed heartbeat recovers, Solace sends a resolved alert to close the incident
- Full pipeline integration — Heartbeat alerts go through the standard ingestion pipeline (dedup, correlation, notifications, escalation)
- CRUD management — Create, edit, delete, and monitor heartbeats from the Heartbeats tab
- Slug-based ping endpoint — Dead-man pings use
POST /api/v1/heartbeats/{slug}/pingwith API key auth
- Light and dark themes — Toggle between a high-contrast dark ops-console theme and a clean light theme; preference persisted in localStorage
- Real-time updates — WebSocket connection with automatic reconnect and fallback polling
- Keyboard shortcuts —
j/knavigation,aacknowledge,rresolve,Escclose,?help - Search & filter — Full-text search across name, service, host, tags with status/severity/service filters
- Sortable columns — Sort by time, severity, name, service, duplicate count, or status
- Pagination — Configurable page size with server-side pagination
- Stats bar — Live counts of alerts by status/severity, incident counts, MTTA, and MTTR
- Full REST API — Every feature is accessible via API (alerts, incidents, silences, notifications, on-call, stats, settings)
- OpenAPI docs — Auto-generated Swagger UI at
/docs - Health checks — Liveness (
/health) and readiness (/health/ready) endpoints for Kubernetes probes - WebSocket events — Real-time event stream for
alert.created,incident.updated,incident_created,severity_changed,incident_resolved - Dual auth — JWT Bearer tokens for user sessions,
X-API-Keyheader for webhook ingestion and external integrations
Prometheus ──┐
Grafana ─────┤ ┌─────────────┐ ┌────────────┐
Datadog ─────┼─▶ Webhook API ──▶ │ Normalizer │ ──▶ │ Dedup │
Splunk ──────┤ (X-API-Key) │ (pluggable) │ │ Engine │
Email ───────┤ └─────────────┘ └─────┬──────┘
Custom ──────┘ │
┌─────▼──────┐
│ Silence │
│ Check │
└─────┬──────┘
│
┌─────▼──────┐ ┌──────────────┐
│ Correlation │──▶ │ Notifications │
│ Engine │ └──────┬───────┘
└─────┬──────┘ │
│ ┌──────▼───────┐
│ │ Escalation │
│ │ Engine │
│ └──────┬───────┘
│ │
┌────────────▼───────────────────▼┐
│ PostgreSQL + Redis │
└────────────┬────────────────────┘
│
┌────────────▼────────────┐
│ React Dashboard (WS) │
│ JWT Auth + RBAC │
│ (Vite + Tailwind) │
└─────────────────────────┘
git clone https://github.com/springdom/solace.git
cd solace
docker compose up --build- Dashboard: http://localhost:3000
- API: http://localhost:8000
- API Docs: http://localhost:8000/docs
Default login: admin / admin (you'll be prompted to change the password on first login)
# Generic webhook
curl -X POST http://localhost:8000/api/v1/webhooks/generic \
-H "Content-Type: application/json" \
-d '{
"name": "HighCPU",
"severity": "critical",
"service": "payment-api",
"host": "web-01",
"description": "CPU usage above 95% for 10 minutes",
"tags": ["production", "us-east-1"]
}'
# Prometheus Alertmanager
curl -X POST http://localhost:8000/api/v1/webhooks/prometheus \
-H "Content-Type: application/json" \
-d '{
"version": "4",
"status": "firing",
"alerts": [{
"status": "firing",
"labels": {
"alertname": "DiskFull",
"instance": "db-01:9090",
"job": "postgres",
"severity": "critical"
},
"annotations": {
"summary": "Disk 95% full on db-01"
},
"startsAt": "2024-01-15T10:00:00.000Z",
"endsAt": "0001-01-01T00:00:00Z"
}]
}'
# Grafana unified alerting
curl -X POST http://localhost:8000/api/v1/webhooks/grafana \
-H "Content-Type: application/json" \
-d '{
"alerts": [{
"status": "firing",
"labels": { "alertname": "HighMemory", "grafana_folder": "Infrastructure" },
"annotations": { "summary": "Memory above 90%", "severity": "high" },
"startsAt": "2024-01-15T10:00:00.000Z",
"endsAt": "0001-01-01T00:00:00Z",
"values": { "B": 92.5 }
}]
}'
# Datadog monitor webhook
curl -X POST http://localhost:8000/api/v1/webhooks/datadog \
-H "Content-Type: application/json" \
-d '{
"id": "123456789",
"title": "CPU is high on web-01",
"text": "CPU utilization above threshold",
"alert_status": "triggered",
"priority": "P1",
"hostname": "web-01",
"org": { "name": "MyOrg" },
"tags": "env:production,service:payment-api"
}'
# Splunk webhook alert
curl -X POST http://localhost:8000/api/v1/webhooks/splunk \
-H "Content-Type: application/json" \
-d '{
"result": {
"host": "web-01",
"severity": "critical",
"service": "payment-api",
"message": "CPU usage above 95% for 10 minutes"
},
"sid": "scheduler_admin_HighCPU_at_17000000_132",
"search_name": "High CPU Usage Alert"
}'Alerts from the same service auto-group into a single incident:
# These two alerts will be correlated into ONE incident
curl -X POST http://localhost:8000/api/v1/webhooks/generic \
-H "Content-Type: application/json" \
-d '{"name":"HighCPU","severity":"critical","service":"payment-api","host":"web-01"}'
curl -X POST http://localhost:8000/api/v1/webhooks/generic \
-H "Content-Type: application/json" \
-d '{"name":"HighMemory","severity":"high","service":"payment-api","host":"web-02"}'
# This creates a SEPARATE incident (different service)
curl -X POST http://localhost:8000/api/v1/webhooks/generic \
-H "Content-Type: application/json" \
-d '{"name":"HighErrorRate","severity":"warning","service":"auth-service"}'# Slack
curl -X POST http://localhost:8000/api/v1/notifications/channels \
-H "Content-Type: application/json" \
-d '{
"name": "Ops Slack",
"channel_type": "slack",
"config": { "webhook_url": "https://hooks.slack.com/services/YOUR/HOOK/URL" },
"filters": { "severity": ["critical", "high"] }
}'
# Microsoft Teams
curl -X POST http://localhost:8000/api/v1/notifications/channels \
-H "Content-Type: application/json" \
-d '{
"name": "DevOps Teams",
"channel_type": "teams",
"config": { "webhook_url": "https://your-org.webhook.office.com/..." },
"filters": { "severity": ["critical"] }
}'
# PagerDuty
curl -X POST http://localhost:8000/api/v1/notifications/channels \
-H "Content-Type: application/json" \
-d '{
"name": "PagerDuty On-Call",
"channel_type": "pagerduty",
"config": { "routing_key": "YOUR_PAGERDUTY_INTEGRATION_KEY" },
"filters": { "severity": ["critical"] }
}'
# Generic outbound webhook
curl -X POST http://localhost:8000/api/v1/notifications/channels \
-H "Content-Type: application/json" \
-d '{
"name": "Automation Webhook",
"channel_type": "webhook",
"config": {
"webhook_url": "https://your-service.com/hooks/solace",
"secret": "optional-shared-secret",
"headers": { "X-Custom-Header": "value" }
}
}'| Method | Endpoint | Description |
|---|---|---|
POST |
/api/v1/auth/login |
Login with username/password, returns JWT |
GET |
/api/v1/auth/me |
Get current user profile |
POST |
/api/v1/auth/change-password |
Change password |
| Method | Endpoint | Description |
|---|---|---|
GET |
/api/v1/users |
List users |
POST |
/api/v1/users |
Create user |
PUT |
/api/v1/users/{id} |
Update user profile/role |
POST |
/api/v1/users/{id}/reset-password |
Reset user password |
DELETE |
/api/v1/users/{id} |
Deactivate user |
| Method | Endpoint | Description |
|---|---|---|
GET |
/health |
Liveness check |
GET |
/health/ready |
Readiness check (DB + Redis) |
| Method | Endpoint | Description |
|---|---|---|
POST |
/api/v1/webhooks/generic |
Generic webhook |
POST |
/api/v1/webhooks/prometheus |
Prometheus Alertmanager |
POST |
/api/v1/webhooks/grafana |
Grafana unified alerting |
POST |
/api/v1/webhooks/datadog |
Datadog monitor webhook |
POST |
/api/v1/webhooks/splunk |
Splunk saved search webhook |
POST |
/api/v1/webhooks/email_ingest |
Email-based alert ingestion |
| Method | Endpoint | Description |
|---|---|---|
GET |
/api/v1/alerts |
List alerts (filterable, sortable, paginated) |
GET |
/api/v1/alerts/{id} |
Get alert by ID |
POST |
/api/v1/alerts/{id}/acknowledge |
Acknowledge alert |
POST |
/api/v1/alerts/{id}/resolve |
Resolve alert |
PUT |
/api/v1/alerts/{id}/tags |
Replace all tags |
POST |
/api/v1/alerts/{id}/tags/{tag} |
Add a single tag |
DELETE |
/api/v1/alerts/{id}/tags/{tag} |
Remove a tag |
GET |
/api/v1/alerts/{id}/notes |
List investigation notes |
POST |
/api/v1/alerts/{id}/notes |
Add a note |
PUT |
/api/v1/alerts/notes/{id} |
Edit a note |
DELETE |
/api/v1/alerts/notes/{id} |
Delete a note |
GET |
/api/v1/alerts/{id}/history |
Get occurrence timeline |
PUT |
/api/v1/alerts/{id}/ticket |
Set external ticket URL |
PUT |
/api/v1/alerts/{id}/runbook |
Set runbook URL (optionally create rule) |
POST |
/api/v1/alerts/bulk/acknowledge |
Bulk acknowledge |
POST |
/api/v1/alerts/bulk/resolve |
Bulk resolve |
POST |
/api/v1/alerts/archive |
Archive old resolved alerts |
| Method | Endpoint | Description |
|---|---|---|
GET |
/api/v1/incidents |
List incidents (filterable, sortable, paginated) |
GET |
/api/v1/incidents/{id} |
Get incident with alerts + event timeline |
POST |
/api/v1/incidents/{id}/acknowledge |
Acknowledge incident + all alerts |
POST |
/api/v1/incidents/{id}/resolve |
Resolve incident + all alerts |
| Method | Endpoint | Description |
|---|---|---|
GET |
/api/v1/silences |
List silence windows (filterable by state) |
POST |
/api/v1/silences |
Create silence window |
GET |
/api/v1/silences/{id} |
Get silence window |
PUT |
/api/v1/silences/{id} |
Update silence window |
DELETE |
/api/v1/silences/{id} |
Delete silence window |
| Method | Endpoint | Description |
|---|---|---|
GET |
/api/v1/notifications/channels |
List all channels |
POST |
/api/v1/notifications/channels |
Create channel (slack/teams/email/webhook/pagerduty) |
GET |
/api/v1/notifications/channels/{id} |
Get channel |
PUT |
/api/v1/notifications/channels/{id} |
Update channel |
DELETE |
/api/v1/notifications/channels/{id} |
Delete channel |
POST |
/api/v1/notifications/channels/{id}/test |
Send test notification |
GET |
/api/v1/notifications/logs |
List delivery logs |
| Method | Endpoint | Description |
|---|---|---|
GET |
/api/v1/oncall/schedules |
List schedules (paginated, active_only filter) |
POST |
/api/v1/oncall/schedules |
Create schedule (admin) |
GET |
/api/v1/oncall/schedules/{id} |
Get schedule |
PUT |
/api/v1/oncall/schedules/{id} |
Update schedule (admin) |
DELETE |
/api/v1/oncall/schedules/{id} |
Delete schedule (admin) |
GET |
/api/v1/oncall/schedules/{id}/current |
Get who is currently on call |
POST |
/api/v1/oncall/schedules/{id}/overrides |
Create temporary override (admin) |
DELETE |
/api/v1/oncall/overrides/{id} |
Delete override (admin) |
| Method | Endpoint | Description |
|---|---|---|
GET |
/api/v1/oncall/policies |
List escalation policies |
POST |
/api/v1/oncall/policies |
Create policy (admin) |
GET |
/api/v1/oncall/policies/{id} |
Get policy |
PUT |
/api/v1/oncall/policies/{id} |
Update policy (admin) |
DELETE |
/api/v1/oncall/policies/{id} |
Delete policy (admin) |
| Method | Endpoint | Description |
|---|---|---|
GET |
/api/v1/oncall/mappings |
List service-to-policy mappings |
POST |
/api/v1/oncall/mappings |
Create mapping (admin) |
DELETE |
/api/v1/oncall/mappings/{id} |
Delete mapping (admin) |
| Method | Endpoint | Description |
|---|---|---|
GET |
/api/v1/runbooks/rules |
List runbook rules |
POST |
/api/v1/runbooks/rules |
Create rule (admin) |
PUT |
/api/v1/runbooks/rules/{id} |
Update rule (admin) |
DELETE |
/api/v1/runbooks/rules/{id} |
Delete rule (admin) |
| Method | Endpoint | Description |
|---|---|---|
GET |
/api/v1/stats |
Dashboard statistics (counts, MTTA, MTTR) |
GET |
/api/v1/stats/trends |
Time-series analytics (alert volume, MTTA/MTTR daily, top services) |
GET |
/api/v1/settings |
Application configuration (includes alert TTL) |
PUT |
/api/v1/settings/alert-ttl |
Update alert auto-expire TTL (admin only) |
| Method | Endpoint | Description |
|---|---|---|
GET |
/api/v1/heartbeats |
List all heartbeats |
POST |
/api/v1/heartbeats |
Create heartbeat (admin only) |
PUT |
/api/v1/heartbeats/{id} |
Update heartbeat (admin only) |
DELETE |
/api/v1/heartbeats/{id} |
Delete heartbeat (admin only) |
POST |
/api/v1/heartbeats/{slug}/ping |
Record dead-man check-in (API key auth) |
| Endpoint | Description |
|---|---|
GET /api/v1/ws?token={jwt_or_api_key} |
Real-time event stream |
All settings are configurable via environment variables:
| Variable | Default | Description |
|---|---|---|
DATABASE_URL |
postgresql+asyncpg://solace:solace@localhost:5432/solace |
PostgreSQL connection |
REDIS_URL |
redis://localhost:6379/0 |
Redis connection |
API_KEY |
"" |
API key for webhook ingestion (empty = no auth in dev) |
SECRET_KEY |
change-me-to-a-random-secret-key |
Secret for JWT signing |
ADMIN_USERNAME |
admin |
Default admin username (created on first startup) |
ADMIN_PASSWORD |
admin |
Default admin password |
ADMIN_EMAIL |
admin@solace.local |
Default admin email |
JWT_EXPIRE_MINUTES |
480 |
JWT token expiry (8 hours) |
DEDUP_WINDOW_SECONDS |
300 |
Window for deduplicating identical alerts (5 min) |
CORRELATION_WINDOW_SECONDS |
600 |
Window for correlating alerts into incidents (10 min) |
NOTIFICATION_COOLDOWN_SECONDS |
300 |
Per-channel, per-incident notification cooldown (5 min) |
SOLACE_DASHBOARD_URL |
http://localhost:3000 |
Dashboard URL (used in notification links) |
APP_ENV |
development |
Environment (development / production) |
LOG_LEVEL |
INFO |
Logging level |
ALERT_TTL_SECONDS |
86400 |
Auto-expire firing alerts after N seconds (0 = disabled, default 24h) |
ALERT_EXPIRE_CHECK_INTERVAL_SECONDS |
60 |
How often to check for expired alerts |
HEARTBEAT_CHECK_INTERVAL_SECONDS |
30 |
How often to run heartbeat monitoring checks |
SMTP_HOST |
"" |
SMTP server for email notifications |
SMTP_PORT |
587 |
SMTP port |
SMTP_USER |
"" |
SMTP username |
SMTP_PASSWORD |
"" |
SMTP password |
SMTP_USE_TLS |
true |
Enable STARTTLS |
SMTP_FROM_ADDRESS |
solace@localhost |
Sender address for email notifications |
Backend: Python 3.12+, FastAPI, async SQLAlchemy (asyncpg), Alembic, PostgreSQL, Redis, python-jose (JWT), passlib (bcrypt)
Frontend: React 18, TypeScript, Vite, Tailwind CSS, Zustand
Deployment: Docker Compose, Kubernetes-ready health probes
pip install -e ".[dev]"
pytest tests/ -vruff check backend/# Start PostgreSQL and Redis
# Create database: CREATE DATABASE solace;
# Run migrations
alembic upgrade head
# Start API server
uvicorn backend.main:app --reload --port 8000
# Start frontend (separate terminal)
cd frontend && npm install && npm run dev-
Multi-source webhook ingestion (Generic, Prometheus, Grafana, Datadog, Splunk, Email)
-
Fingerprint-based deduplication with configurable window
-
Service-based automatic incident correlation
-
Full alert lifecycle (firing, acknowledged, resolved, suppressed, archived)
-
Incident management with event audit trail
-
Notification channels: Slack, Microsoft Teams, Email, Webhook (outbound), PagerDuty
-
Notification filters, rate limiting, delivery logs, and test button
-
Silence / maintenance windows with flexible matchers
-
Alert tagging and investigation notes
-
External ticket URL linking (Jira, GitHub, etc.)
-
Runbook URL support with editable UI and auto-attach rules
-
Bulk acknowledge/resolve operations
-
Archive old resolved alerts
-
Dashboard stats (MTTA, MTTR, counts by status/severity)
-
Real-time WebSocket updates with fallback polling
-
Keyboard shortcuts for fast navigation
-
Light and dark theme toggle
-
JWT authentication with default admin account
-
Role-based access control (admin, user, viewer)
-
User management (create, edit, deactivate)
-
On-call scheduling (hourly/daily/weekly/custom rotations)
-
Temporary on-call overrides
-
Escalation policies with multi-level targets
-
Service-to-policy mapping with glob patterns and priority ordering
-
Alert auto-expire with configurable TTL (admin-controlled)
-
Analytics dashboard with time-series trends (alert volume, MTTA/MTTR, top services)
-
Heartbeat / dead-man monitoring with HTTP health checks
- Background escalation checker (auto-escalate if not ack'd in N minutes)
- SSO integration (Google, GitHub, SAML)
- SMS and voice call notifications (Twilio)
- Status pages (public incident status)
- Plugin system (custom normalizers, notification channels, enrichment hooks)
- Alert pattern detection and noise scoring
- Post-incident review and retrospectives
- Topology-aware correlation (service dependency graph)
Apache 2.0