Description
What you would like to be added?
Proposal: Add Horizontal Pod Autoscaling for the KAI Admission Service
KAI includes an admission service responsible for handling the pod mutation and validation webhooks for pods that are scheduled by KAI.
Today, if a user observes high load on the admission service, they can manually increase the number of replicas via the KAI configuration. While this works, manually tuning replica counts is often suboptimal and reactive. In most cases, configuring Horizontal Pod Autoscaling (HPA) for the admission service deployment would provide a more robust and adaptive solution.
This enhancement proposes adding optional support for HPA on the admission service, allowing it to scale automatically based on load.
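For context, the Kubernetes HPA controller documents its core scaling rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, skipping any change when the ratio is within a tolerance of 1.0 (0.1 by default). The following is a minimal Python sketch of that rule, not KAI code. The function name, the replica bounds, and the sample request-rate numbers are assumptions made for illustration, and the real controller applies further logic (stabilization windows, handling of unready pods) that is omitted here.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the documented HPA rule for a single metric.

    All parameter names and default bounds are illustrative assumptions.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")

    ratio = current_value / target_value

    # Within the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)

    # The result is always bounded by the configured min/max replicas.
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical example: 2 admission pods each see 150 req/s on average,
# with a target of 100 req/s per pod.
print(desired_replicas(2, 150, 100))
```

With a per-pod target, a sustained overload scales the deployment out proportionally, while small fluctuations around the target leave it unchanged.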
Open questions and considerations
- Autoscaling metrics
  Which metrics best reflect the actual load on the admission service pods?
  Examples to consider include:
  - CPU and/or memory utilization
  - Request rate to the webhook endpoints
  - Admission request latency
  - In-flight or queued admission requests (if available)
- Manual replica configuration
  Users should still be able to explicitly configure a fixed number of replicas for the admission service via the KAI configuration.
  We need to define how manual replica settings interact with HPA (e.g., mutually exclusive, or using the configured replica count as the HPA minimum).
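To make the two interaction options for manual replicas concrete, here is a rough Python sketch of how a configuration could be resolved under each policy. Everything in it is hypothetical: the field names (`replicas`, `hpa_enabled`, `hpa_max_replicas`), the policy names, and the fallback of 1 replica are assumptions for illustration and do not reflect the existing KAI configuration schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdmissionScaling:
    """Hypothetical scaling settings for the admission service."""
    replicas: Optional[int] = None          # manually configured replica count
    hpa_enabled: bool = False               # whether HPA is requested
    hpa_max_replicas: Optional[int] = None  # upper bound for HPA


def resolve(cfg, policy="replicas_as_min"):
    """Return what would be rendered: a fixed replica count or HPA bounds."""
    # Assumed default of 1 replica when nothing is configured.
    manual = cfg.replicas if cfg.replicas is not None else 1

    if not cfg.hpa_enabled:
        return {"mode": "fixed", "replicas": manual}

    if cfg.hpa_max_replicas is None:
        raise ValueError("hpa_max_replicas is required when HPA is enabled")

    if policy == "mutually_exclusive":
        # Option 1: reject configurations that set both knobs.
        if cfg.replicas is not None:
            raise ValueError("replicas and HPA cannot both be set")
        min_replicas = 1
    elif policy == "replicas_as_min":
        # Option 2: the manual replica count becomes the HPA minimum.
        min_replicas = manual
    else:
        raise ValueError(f"unknown policy: {policy}")

    if cfg.hpa_max_replicas < min_replicas:
        raise ValueError("hpa_max_replicas must be >= the minimum replicas")

    return {"mode": "hpa", "min_replicas": min_replicas,
            "max_replicas": cfg.hpa_max_replicas}


print(resolve(AdmissionScaling(replicas=3, hpa_enabled=True, hpa_max_replicas=8)))
```

The "replicas as minimum" policy keeps existing configurations valid when HPA is turned on, whereas the mutually exclusive policy is stricter but makes the source of the replica count unambiguous.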
Why is this needed?
Horizontal pod autoscaling will allow the deployment to automatically add or remove admission pods according to the load on the pod webhooks.