operator SLO: mount the CR-state ConfigMap into kube-state-metrics; enable alerting in prod #33

@stxkxs

Follow-up from the eks-agent-platform/gitops consolidation (nanohype/eks-agent-platform#33).

The operator chart (charts/operator 0.2.0) ships a CustomResourceStateMetrics ConfigMap (kube-state-metrics-customresource-config in ns eks-agent-platform-monitoring) that defines the kube_customresource_* metrics. The ConfigMap is inert until kube-state-metrics runs with --custom-resource-state-config-file pointing at it.
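For orientation, a CustomResourceStateMetrics config generally has the shape sketched below. This is not the chart's actual ConfigMap content: the group, version, kind, metric name, and phase list are hypothetical placeholders, chosen only to show how an entry turns into a kube_customresource_* series.

```yaml
# Illustrative sketch only; every value under groupVersionKind and metrics is hypothetical.
kind: CustomResourceStateMetrics
spec:
  resources:
    - groupVersionKind:
        group: example.invalid      # hypothetical API group
        version: v1alpha1           # hypothetical version
        kind: Agent                 # hypothetical kind
      labelsFromPath:
        name: [metadata, name]
        namespace: [metadata, namespace]
      metrics:
        - name: status_phase        # exposed as kube_customresource_status_phase
          help: "Current phase of the custom resource"
          each:
            type: StateSet
            stateSet:
              labelName: phase
              path: [status, phase]
              list: [Pending, Ready, Failed]   # hypothetical phase values
```

kube-state-metrics only parses a file like this when it is passed via --custom-resource-state-config-file, which is why the ConfigMap alone produces no series.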

Without that mount, the seven agent persona dashboards (dashboards/base/platform/agent-*.yaml) and the operator-slo CR-status alerts silently show no data.

Fix: determine where kube-state-metrics runs in this catalog. It is not a standalone observability addon today, so confirm whether it ships with grafana-agent, a Prometheus stack, or elsewhere. Then pass the --custom-resource-state-config-file flag and mount the ConfigMap into the KSM pod. Because a pod can only mount ConfigMaps from its own namespace, this may mean relocating the ConfigMap to the KSM namespace.
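A minimal sketch of the wiring, assuming KSM runs as a plain Deployment in the same namespace as the ConfigMap (or that the ConfigMap has been copied there). The container name, volume name, mount path, and the config.yaml key are assumptions; the real key depends on how the operator chart renders the ConfigMap.

```yaml
# Hypothetical fragment of the kube-state-metrics Deployment pod template.
spec:
  template:
    spec:
      containers:
        - name: kube-state-metrics            # assumed container name
          args:
            # Flag named in this issue; the path must match the mount below.
            - --custom-resource-state-config-file=/etc/customresourcestate/config.yaml
          volumeMounts:
            - name: customresource-state-config
              mountPath: /etc/customresourcestate
              readOnly: true
      volumes:
        - name: customresource-state-config
          configMap:
            # ConfigMap shipped by charts/operator 0.2.0; must be in the pod's namespace.
            name: kube-state-metrics-customresource-config
```

If KSM turns out to be installed through a Helm chart, the same three pieces (extra arg, volume, volume mount) would go into that chart's values instead. KSM's service account also needs list and watch permission on the custom resources, otherwise the metrics stay empty even with the file mounted.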

Also: flip slo.alerting.enabled: true in addons/ai-platform/operator/values-production.yaml once the pagerduty-platform + slack-webhook-{incidents,finance,ops,eng,platform} Secrets are provisioned in production (the AlertmanagerConfig receivers reference them).
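The values change itself is small. A sketch, assuming the key path in the file matches the dotted form used above:

```yaml
# addons/ai-platform/operator/values-production.yaml
# Enable only after the pagerduty-platform and slack-webhook-* Secrets exist in production,
# since the AlertmanagerConfig receivers reference them.
slo:
  alerting:
    enabled: true
```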
