AKS + KEDA Kafka scaler: how to avoid alert noise from normal scale-down events without missing real pod failures? #5657

@anlkorkut

Description

@anlkorkut

We are using KEDA on AKS with the Kafka scaler to autoscale an application based on Kafka load.
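For reference, our setup is shaped roughly like the ScaledObject below. The names, topic, bootstrap address, and `lagThreshold` are placeholders rather than our actual configuration:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: app-consumer-scaler      # placeholder name
spec:
  scaleTargetRef:
    name: app-consumer           # placeholder Deployment name
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092        # placeholder
        consumerGroup: app-consumer-group   # placeholder
        topic: app-topic                    # placeholder
        lagThreshold: "50"                  # illustrative value
```

So replica count routinely moves up and down with consumer lag, which is exactly what triggers the alerts described below.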

Our problem is that normal scale-down events trigger AppDynamics alerts, and our production support team treats them as incidents even though they are expected autoscaling behavior.

A workaround proposed by our monitoring team is to alert only when the app drops to 0 or 1 replicas and to ignore replica changes above 1. That seems risky, though, because real failures at higher scale could be missed: if the app drops from 4 replicas to 3 due to OOMKilled containers, crashes, restart loops, or unhealthy pods, that looks much like a normal scale-down and might never be surfaced.

We are looking for best-practice guidance for AKS + KEDA Kafka scaler on:

- how to distinguish expected Kafka-driven scale-down from actual pod/application failures
- what signals should be used for alerting instead of raw replica count changes
- how to detect real issues such as:
  - OOMKilled
  - CrashLoopBackOff
  - repeated restarts
  - unhealthy / not-ready pods
  - partial degradation while replica count is still greater than 1
- whether there is a recommended AKS/KEDA/HPA monitoring pattern to reduce false positives from autoscaling without hiding real incidents
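To make the question concrete: the direction we imagine is alerting on pod-level failure signals (for example kube-state-metrics, queried via Prometheus) instead of replica-count changes, along the lines of the sketch below. The alert names, durations, and thresholds are illustrative, and we are not sure this is the recommended pattern on AKS:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pod-failure-alerts       # placeholder name
spec:
  groups:
    - name: pod-failures
      rules:
        # Container last terminated by the OOM killer
        - alert: PodOOMKilled
          expr: kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
          for: 1m
        # Container stuck in CrashLoopBackOff
        - alert: PodCrashLooping
          expr: kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} == 1
          for: 5m
        # More than 3 restarts in 15 minutes
        - alert: PodRestartingOften
          expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
        # Fewer available replicas than the desired count.
        # Since the HPA/KEDA updates spec.replicas on scale-down,
        # this comparison should stay quiet during normal autoscaling
        # but fire on partial degradation at any replica count.
        - alert: DeploymentDegraded
          expr: kube_deployment_status_replicas_available < kube_deployment_spec_replicas
          for: 10m
```

The appeal of the last rule is that it compares actual health against the autoscaler's own desired count, so it is independent of whether the desired count is 2 or 10. Is something like this the right approach, or is there a better-established pattern?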

Our goal is to avoid paging on normal autoscaling activity while still detecting genuine production issues.
