Follow-up from the eks-agent-platform/gitops consolidation (nanohype/eks-agent-platform#33).
The operator chart (charts/operator 0.2.0) ships a CustomResourceStateMetrics ConfigMap (kube-state-metrics-customresource-config in ns eks-agent-platform-monitoring) that makes the kube_customresource_* metrics exist. It is inert until kube-state-metrics runs with --custom-resource-state-config-file pointing at it.
Without that mount, the seven agent persona dashboards (dashboards/base/platform/agent-*.yaml) and the operator-slo CR-status alerts silently return no data.
Fix: determine where kube-state-metrics runs in this catalog. It is not a standalone observability addon today, so confirm whether it ships with grafana-agent, a Prometheus stack, or elsewhere. Then pass the --custom-resource-state-config-file flag to it and mount the ConfigMap (or relocate the ConfigMap to the KSM namespace).
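
A minimal sketch of the wiring, shown as the relevant fragment of a kube-state-metrics Deployment. The Deployment name, container name, namespace placeholder, mount path, and the config.yaml key are assumptions for illustration; only the flag and the ConfigMap name come from this issue. A Pod can only mount a ConfigMap from its own namespace, so this works as written only if KSM runs in eks-agent-platform-monitoring or the ConfigMap is relocated next to it.

```yaml
# Fragment of the kube-state-metrics Deployment (not a complete manifest).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics            # assumption: actual name depends on how KSM is installed
  namespace: <ksm-namespace>          # must match the namespace that holds the ConfigMap
spec:
  template:
    spec:
      containers:
        - name: kube-state-metrics    # assumption: container name
          args:
            # Keep any args the install already sets; add this one.
            - --custom-resource-state-config-file=/etc/customresourcestate/config.yaml
          volumeMounts:
            - name: customresourcestate-config
              mountPath: /etc/customresourcestate   # assumption: any read-only path works
              readOnly: true
      volumes:
        - name: customresourcestate-config
          configMap:
            name: kube-state-metrics-customresource-config
            # assumption: the ConfigMap stores the config under the key "config.yaml"
```

If KSM is installed through a Helm chart, the same flag and mount would normally be set through that chart's values rather than by editing the Deployment directly. KSM's ServiceAccount also needs list and watch permissions on the operator's custom resources, otherwise the kube_customresource_* series stay empty even with the flag set.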
Also: flip slo.alerting.enabled: true in addons/ai-platform/operator/values-production.yaml once the pagerduty-platform + slack-webhook-{incidents,finance,ops,eng,platform} Secrets are provisioned in production (the AlertmanagerConfig receivers reference them).
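
For reference, the dotted path corresponds to the following nesting in addons/ai-platform/operator/values-production.yaml. This is a sketch of the one setting only; any sibling keys already in the file are left out.

```yaml
# addons/ai-platform/operator/values-production.yaml (fragment)
# Enable only after these Secrets exist in production:
#   pagerduty-platform
#   slack-webhook-incidents, slack-webhook-finance, slack-webhook-ops,
#   slack-webhook-eng, slack-webhook-platform
slo:
  alerting:
    enabled: true
```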