CNV-80608: PR12 add alerts_effective_active_at_timestamp_seconds metric#16

Open
sradco wants to merge 1 commit into alert-mgmt-restructured-11-orphan-gc from alerts-effective-metric

Conversation

@sradco
Owner

@sradco sradco commented Mar 8, 2026

Expose a Prometheus gauge metric whose value is the activeAt Unix
timestamp for every effective alert (firing, pending, silenced).

Labels include all alert labels after relabeling, plus enrichment labels
and alertstate.

Annotations are excluded since they are available from the alert rule definition.

Signed-off-by: Shirly Radco <sradco@redhat.com>
Co-authored-by: AI Assistant <noreply@cursor.com>
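
To make the description above concrete, here is a minimal sketch of what one sample of the gauge could look like in the Prometheus text exposition format. It is not the collector from this PR: the `alert` struct, `demoAlert`, and `sampleLine` are hypothetical names invented for illustration, and only the metric name and the `alertstate` label come from the description.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// metricName is the gauge name taken from the PR title.
const metricName = "alerts_effective_active_at_timestamp_seconds"

// alert is a hypothetical, simplified alert shape used only in this sketch.
type alert struct {
	Labels   map[string]string
	State    string // "firing", "pending" or "silenced"
	ActiveAt time.Time
}

// escaper applies the label-value escaping of the Prometheus text format
// (backslash, double quote and newline).
var escaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// sampleLine renders one gauge sample: the alert's labels plus alertstate,
// with activeAt as a Unix timestamp in seconds. Annotations are not included.
func sampleLine(a alert) string {
	all := make(map[string]string, len(a.Labels)+1)
	for name, value := range a.Labels {
		all[name] = value
	}
	all["alertstate"] = a.State

	names := make([]string, 0, len(all))
	for name := range all {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic label order

	pairs := make([]string, 0, len(names))
	for _, name := range names {
		pairs = append(pairs, fmt.Sprintf(`%s="%s"`, name, escaper.Replace(all[name])))
	}
	return fmt.Sprintf("%s{%s} %d", metricName, strings.Join(pairs, ","), a.ActiveAt.Unix())
}

// demoAlert returns a made-up alert for the usage example.
func demoAlert() alert {
	return alert{
		Labels:   map[string]string{"alertname": "HighLatency", "severity": "warning"},
		State:    "firing",
		ActiveAt: time.Unix(1773000000, 0),
	}
}

func main() {
	fmt.Println(sampleLine(demoAlert()))
}
```

Because the value is a timestamp rather than a 0/1 flag, a query can derive how long an alert has been active by subtracting the sample value from the current time.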

Comment thread pkg/server.go Outdated
Comment thread pkg/metrics/alerts_collector.go Outdated
@sradco sradco changed the title add alerts_effective_active_at_timestamp_seconds metric CNV-80608: PR16 - add alerts_effective_active_at_timestamp_seconds metric Mar 9, 2026
@sradco sradco force-pushed the alert-mgmt-15-orphan-arc-gc branch 2 times, most recently from 760bab8 to f62057b Compare March 10, 2026 10:37
@sradco sradco force-pushed the alerts-effective-metric branch from 5d89003 to efd5a1a Compare March 10, 2026 10:40
@sradco sradco force-pushed the alert-mgmt-15-orphan-arc-gc branch from f62057b to bad1af3 Compare March 10, 2026 10:45
@sradco sradco force-pushed the alerts-effective-metric branch 2 times, most recently from e8fff27 to 56481c7 Compare March 10, 2026 11:16
@sradco
Owner Author

sradco commented Mar 10, 2026

Please note: I found a bug in the metric value. It was based on the StartAt field from Alertmanager, which I thought held the alert start time, but it actually holds the time the alert reached Alertmanager, which is not what we need.
I will update it to use the ActiveAt field from Prometheus instead.
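
A minimal sketch of the fix described above, assuming alerts from both sources can be matched by some identity key. The `alert` struct, the `keyFn` parameter, `byAlertname`, and `demo` are hypothetical; only the `enrichActiveAt` name appears later in this thread, and its real signature takes two arguments rather than the three used here.

```go
package main

import (
	"fmt"
	"time"
)

// alert is a hypothetical, simplified shape; the plugin's real type is not
// shown in this conversation.
type alert struct {
	Labels   map[string]string
	State    string
	ActiveAt time.Time
}

// enrichActiveAt copies ActiveAt from the matching Prometheus alert onto each
// Alertmanager alert. keyFn decides when two alerts count as the same alert.
// It returns the number of Alertmanager alerts that were enriched.
func enrichActiveAt(amAlerts, promAlerts []alert, keyFn func(map[string]string) string) int {
	byKey := make(map[string]time.Time, len(promAlerts))
	for _, p := range promAlerts {
		byKey[keyFn(p.Labels)] = p.ActiveAt
	}
	enriched := 0
	for i := range amAlerts {
		if activeAt, ok := byKey[keyFn(amAlerts[i].Labels)]; ok {
			amAlerts[i].ActiveAt = activeAt
			enriched++
		}
	}
	return enriched
}

// byAlertname is a deliberately naive key, used only to keep the demo short.
func byAlertname(labels map[string]string) string { return labels["alertname"] }

// demo enriches two made-up Alertmanager alerts from one Prometheus alert.
func demo() ([]alert, int) {
	prom := []alert{
		{Labels: map[string]string{"alertname": "HighLatency"}, State: "firing", ActiveAt: time.Unix(1000, 0)},
	}
	am := []alert{
		{Labels: map[string]string{"alertname": "HighLatency"}, State: "firing"},
		{Labels: map[string]string{"alertname": "Unmatched"}, State: "firing"},
	}
	n := enrichActiveAt(am, prom, byAlertname)
	return am, n
}

func main() {
	am, n := demo()
	fmt.Println(n, am[0].ActiveAt.Unix(), am[1].ActiveAt.IsZero())
}
```

Alerts with no Prometheus counterpart keep a zero ActiveAt in this sketch; how the real code treats that case is not stated here.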

@sradco sradco force-pushed the alert-mgmt-15-orphan-arc-gc branch from bad1af3 to 6ff63c9 Compare March 10, 2026 13:46
@sradco sradco force-pushed the alerts-effective-metric branch from 56481c7 to ba86543 Compare March 10, 2026 13:46
@sradco sradco force-pushed the alert-mgmt-15-orphan-arc-gc branch from 6ff63c9 to 6b2d009 Compare March 10, 2026 14:04
@sradco sradco force-pushed the alerts-effective-metric branch from ba86543 to 6cfc7e4 Compare March 10, 2026 14:04
@sradco sradco force-pushed the alert-mgmt-15-orphan-arc-gc branch from 6b2d009 to 1c95614 Compare March 11, 2026 09:06
@sradco sradco force-pushed the alerts-effective-metric branch from 6cfc7e4 to 5c81f95 Compare March 11, 2026 09:06
@sradco sradco force-pushed the alert-mgmt-15-orphan-arc-gc branch from 1c95614 to 6ace03c Compare March 11, 2026 09:27
@sradco sradco force-pushed the alerts-effective-metric branch 2 times, most recently from 6ec1d22 to 8ad2c3a Compare March 11, 2026 10:13
Comment thread pkg/k8s/prometheus_alerts.go Outdated
Comment on lines +357 to +358
enrichActiveAt(alerts, promAlerts)
return append(alerts, filterAlertsByState(promAlerts, "pending")...), nil

why not filter first, then enrich?

Comment on lines 299 to 301
if promErr != nil {
	return nil, promErr
}

this should be done immediately after the failed call

promAlerts, promErr := pa.getAlertsViaProxy(ctx, namespace, promRouteName, source)

if amErr == nil {
if promErr == nil {

if we handled the error right after the call, this check wouldn't be needed
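
The two comments above can be sketched as follows. This is an illustration of the pattern only: `fetch`, `okFetch`, `failFetch`, and `collect` are hypothetical stand-ins, and the fallback taken when the Alertmanager call fails is an assumption, not the behavior of the code under review.

```go
package main

import (
	"errors"
	"fmt"
)

// fetch is a hypothetical stand-in for a call such as getAlertsViaProxy.
type fetch func() ([]string, error)

// okFetch and failFetch build fake fetchers for the demo.
func okFetch(names ...string) fetch {
	return func() ([]string, error) { return names, nil }
}

func failFetch(msg string) fetch {
	return func() ([]string, error) { return nil, errors.New(msg) }
}

// collect checks each error immediately after the call that produced it,
// so later code never has to re-check whether promErr is nil.
func collect(fromProm, fromAM fetch) ([]string, error) {
	promAlerts, err := fromProm()
	if err != nil {
		return nil, fmt.Errorf("prometheus alerts: %w", err)
	}

	amAlerts, err := fromAM()
	if err != nil {
		// Assumption for this sketch: fall back to Prometheus data alone.
		return promAlerts, nil
	}

	// promAlerts is known to be valid here; no "promErr == nil" guard needed.
	return append(amAlerts, promAlerts...), nil
}

func main() {
	alerts, err := collect(okFetch("PendingAlert"), okFetch("FiringAlert"))
	fmt.Println(alerts, err)
}
```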

if promErr == nil {
	enrichActiveAt(amAlerts, promAlerts)
}
pending := filterAlertsByState(promAlerts, "pending")

we are only filtering if no amErr

why would we return all promAlerts if no amErr
but return only the pending if amErr


// alertFingerprint builds a stable identity key from an alert's labels,
// excluding metadata labels injected by this plugin (source, backend).
func alertFingerprint(labels map[string]string) string {

what's the difference between using this implementation and the one for the id label?
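
For readers following the thread, one way a fingerprint matching the doc comment above could work is sketched below. The body of alertFingerprint is not shown in this conversation, so the sorting, quoting, and SHA-256 hashing here are assumptions, and `sketchFingerprint` is a hypothetical name chosen to avoid implying it is the PR's implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// metadataLabels are the plugin-injected labels named in the doc comment
// (source, backend); they are excluded from the identity key.
var metadataLabels = map[string]bool{"source": true, "backend": true}

// sketchFingerprint builds a key that does not depend on map iteration order
// and ignores the metadata labels.
func sketchFingerprint(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		if !metadataLabels[name] {
			names = append(names, name)
		}
	}
	sort.Strings(names) // stable order regardless of map iteration

	var b strings.Builder
	for _, name := range names {
		// %q quotes both parts so separators inside values cannot collide.
		fmt.Fprintf(&b, "%q=%q;", name, labels[name])
	}

	sum := sha256.Sum256([]byte(b.String()))
	return hex.EncodeToString(sum[:8]) // 16 hex characters
}

func main() {
	fromAM := map[string]string{"alertname": "HighLatency", "severity": "warning", "source": "platform"}
	fromProm := map[string]string{"severity": "warning", "alertname": "HighLatency", "backend": "prometheus"}
	fmt.Println(sketchFingerprint(fromAM) == sketchFingerprint(fromProm))
}
```

Two label sets that differ only in source or backend produce the same key, which is what lets an Alertmanager alert be matched to its Prometheus counterpart.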

Comment thread pkg/server.go
managementRouter := managementrouter.New(managementClient)
router.PathPrefix("/api/v1/alerting").Handler(managementRouter)

metricsHandler, err := managementClient.MetricsHandler(ctx, k8sconfig)

like it


maybe pkg/monitoring/management/metrics/alerts_collector.go

or even
pkg/management/metrics/alerts_collector.go


in pkg/management/management.go

"github.com/openshift/monitoring-plugin/pkg/management/metrics" -> metrics.NewHandler(ctx, c, kubeConfig)

instead of

"github.com/openshift/monitoring-plugin/pkg/metrics" -> metrics.NewHandler(ctx, c, kubeConfig)

// backend, component, layer) plus "alertstate". Thanos-sourced alerts are
// filtered out to avoid duplicates. Annotations are excluded because they
// are available from the alert rule definition.
type AlertsCollector struct {

should only be instantiated if isLeader
and not receive a function to check


should not belong in metrics package
metrics, specifically the new collector, depends on being the leader
not the other way around

leader election is a server responsibility
server should manage the leader election
and then, if it takes the lead, instantiate the collector
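
A rough sketch of the ownership suggested above, with the server reacting to leadership changes and the collector knowing nothing about elections. Every name here (`registry`, `server`, `onStartedLeading`, `onStoppedLeading`) is hypothetical, and the election mechanism itself is left out; the callbacks only stand in for whatever hooks that mechanism would provide.

```go
package main

import "fmt"

// registry is a hypothetical stand-in for whatever holds registered collectors.
type registry struct{ collectors []string }

func (r *registry) register(name string) { r.collectors = append(r.collectors, name) }

func (r *registry) unregister(name string) {
	kept := r.collectors[:0]
	for _, c := range r.collectors {
		if c != name {
			kept = append(kept, c)
		}
	}
	r.collectors = kept
}

// server owns leader election; the collector receives no isLeader function.
type server struct{ reg *registry }

// onStartedLeading would be invoked by the election mechanism when this
// replica takes the lead: only then is the collector instantiated.
func (s *server) onStartedLeading() { s.reg.register("alerts_collector") }

// onStoppedLeading removes the collector when leadership is lost.
func (s *server) onStoppedLeading() { s.reg.unregister("alerts_collector") }

func main() {
	s := &server{reg: &registry{}}
	fmt.Println(len(s.reg.collectors)) // not leader yet
	s.onStartedLeading()
	fmt.Println(len(s.reg.collectors)) // collector exists only while leading
	s.onStoppedLeading()
	fmt.Println(len(s.reg.collectors))
}
```

With this split, the dependency points one way: the server decides when the collector exists, and the metrics package needs no knowledge of leadership.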

@sradco sradco force-pushed the alert-mgmt-15-orphan-arc-gc branch from 6ace03c to 53c6cf9 Compare March 11, 2026 11:50
@sradco sradco force-pushed the alerts-effective-metric branch 6 times, most recently from ce1decd to dc78a65 Compare March 12, 2026 18:48
@sradco sradco changed the base branch from alert-mgmt-15-orphan-arc-gc to alert-mgmt-restructured-11-orphan-gc March 12, 2026 18:48
@sradco sradco changed the title CNV-80608: PR16 - add alerts_effective_active_at_timestamp_seconds metric CNV-80608: PR12 - add alerts_effective_active_at_timestamp_seconds metric Mar 12, 2026
@sradco sradco force-pushed the alert-mgmt-restructured-11-orphan-gc branch from f99316a to 2aeb592 Compare March 25, 2026 14:47
@sradco sradco force-pushed the alerts-effective-metric branch from dc78a65 to 0147856 Compare March 25, 2026 14:47
@sradco sradco force-pushed the alert-mgmt-restructured-11-orphan-gc branch from 0e3ceaa to 5f87dde Compare April 14, 2026 14:31
@sradco sradco force-pushed the alerts-effective-metric branch from 497e2db to 0398569 Compare April 14, 2026 14:31
@sradco sradco force-pushed the alert-mgmt-restructured-11-orphan-gc branch from 5f87dde to f5bda9f Compare April 23, 2026 08:17
@sradco sradco force-pushed the alerts-effective-metric branch from 0398569 to cb7aa1d Compare April 23, 2026 08:17
@sradco sradco force-pushed the alert-mgmt-restructured-11-orphan-gc branch from f5bda9f to 7276833 Compare April 28, 2026 11:44
@sradco sradco force-pushed the alerts-effective-metric branch from cb7aa1d to cfc4f9f Compare April 28, 2026 11:44
@sradco sradco force-pushed the alert-mgmt-restructured-11-orphan-gc branch from 7276833 to 528338b Compare April 28, 2026 11:58
@sradco sradco force-pushed the alerts-effective-metric branch from cfc4f9f to 3edd517 Compare April 28, 2026 11:58
@sradco sradco force-pushed the alert-mgmt-restructured-11-orphan-gc branch 4 times, most recently from b6094a8 to a262f88 Compare April 29, 2026 14:25
@sradco sradco force-pushed the alerts-effective-metric branch 2 times, most recently from db8d155 to 5c7c6d4 Compare April 29, 2026 14:52
@sradco sradco force-pushed the alert-mgmt-restructured-11-orphan-gc branch from a262f88 to f2f4bbe Compare April 29, 2026 14:52
@sradco sradco force-pushed the alerts-effective-metric branch from 5c7c6d4 to d24450d Compare April 30, 2026 08:44
@sradco sradco force-pushed the alert-mgmt-restructured-11-orphan-gc branch from f2f4bbe to 6014450 Compare April 30, 2026 08:44
@sradco sradco force-pushed the alerts-effective-metric branch from d24450d to 8cbd67c Compare May 6, 2026 11:39
@sradco sradco force-pushed the alert-mgmt-restructured-11-orphan-gc branch from 6014450 to 598abd6 Compare May 6, 2026 11:39
@sradco sradco force-pushed the alerts-effective-metric branch from 8cbd67c to a217801 Compare May 13, 2026 11:18
@sradco sradco force-pushed the alert-mgmt-restructured-11-orphan-gc branch from 598abd6 to 5b090e4 Compare May 13, 2026 11:18
@sradco sradco force-pushed the alerts-effective-metric branch from a217801 to 59c226e Compare May 13, 2026 11:32
@sradco sradco force-pushed the alert-mgmt-restructured-11-orphan-gc branch 2 times, most recently from 672bc3b to a24fa21 Compare May 13, 2026 13:06
@sradco sradco force-pushed the alerts-effective-metric branch from 59c226e to 31921ea Compare May 13, 2026 13:06
@sradco sradco force-pushed the alert-mgmt-restructured-11-orphan-gc branch from a24fa21 to ed02e88 Compare May 14, 2026 07:13
@sradco sradco force-pushed the alerts-effective-metric branch from 31921ea to e2a91c0 Compare May 14, 2026 07:13
@sradco sradco force-pushed the alert-mgmt-restructured-11-orphan-gc branch from ed02e88 to 6a001cd Compare May 14, 2026 07:16
@sradco sradco force-pushed the alerts-effective-metric branch from e2a91c0 to 66459e8 Compare May 14, 2026 07:16

2 participants