AKS + KEDA Kafka scaler: how to avoid alert noise from normal scale-down events without missing real pod failures? #5657

@anlkorkut

Description

@anlkorkut

We are using KEDA on AKS with the Kafka scaler to autoscale an application based on Kafka load.
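For reference, our setup is shaped roughly like the ScaledObject below. The names, topic, bootstrap address, and `lagThreshold` are placeholders rather than our actual configuration:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: app-consumer-scaler      # placeholder name
spec:
  scaleTargetRef:
    name: app-consumer           # placeholder Deployment name
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092        # placeholder
        consumerGroup: app-consumer-group   # placeholder
        topic: app-topic                    # placeholder
        lagThreshold: "50"                  # illustrative value
```

So replica count routinely moves up and down with consumer lag, which is exactly what triggers the alerts described below.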

Our problem is that normal scale-down events trigger AppDynamics alerts, and our production support team treats them as incidents even though they are expected autoscaling behavior.

A workaround proposed by our monitoring team is to alert only when the app drops to 0 or 1 replicas and to ignore replica changes above 1. That seems risky, though, because real failures at higher scale could be missed: if the app drops from 4 replicas to 3 due to OOMKilled containers, crashes, restart loops, or unhealthy pods, that looks much like a normal scale-down and might never be surfaced.

We are looking for best-practice guidance for AKS + KEDA Kafka scaler on:

- how to distinguish expected Kafka-driven scale-down from actual pod/application failures
- what signals should be used for alerting instead of raw replica count changes
- how to detect real issues such as:
  - OOMKilled
  - CrashLoopBackOff
  - repeated restarts
  - unhealthy / not-ready pods
  - partial degradation while replica count is still greater than 1
- whether there is a recommended AKS/KEDA/HPA monitoring pattern to reduce false positives from autoscaling without hiding real incidents
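To make the question concrete: the direction we imagine is alerting on pod-level failure signals (for example kube-state-metrics, queried via Prometheus) instead of replica-count changes, along the lines of the sketch below. The alert names, durations, and thresholds are illustrative, and we are not sure this is the recommended pattern on AKS:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pod-failure-alerts       # placeholder name
spec:
  groups:
    - name: pod-failures
      rules:
        # Container last terminated by the OOM killer
        - alert: PodOOMKilled
          expr: kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
          for: 1m
        # Container stuck in CrashLoopBackOff
        - alert: PodCrashLooping
          expr: kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} == 1
          for: 5m
        # More than 3 restarts in 15 minutes
        - alert: PodRestartingOften
          expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
        # Fewer available replicas than the desired count.
        # Since the HPA/KEDA updates spec.replicas on scale-down,
        # this comparison should stay quiet during normal autoscaling
        # but fire on partial degradation at any replica count.
        - alert: DeploymentDegraded
          expr: kube_deployment_status_replicas_available < kube_deployment_spec_replicas
          for: 10m
```

The appeal of the last rule is that it compares actual health against the autoscaler's own desired count, so it is independent of whether the desired count is 2 or 10. Is something like this the right approach, or is there a better-established pattern?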

Our goal is to avoid paging on normal autoscaling activity while still detecting genuine production issues.
