Add blog post on AKS Configurable Scheduler Profiles #5505
colinmixonn wants to merge 44 commits into master from
Conversation
This blog post introduces AKS Configurable Scheduler Profiles, highlighting their benefits for optimizing resource utilization and improving scheduling strategies for web-distributed and AI workloads. It covers configuration examples for GPU utilization, pod distribution across topology domains, and memory-optimized scheduling.
Added a new tag for Scheduler with relevant details.
Updated blog post on AKS Configurable Scheduler Profiles to improve clarity and correctness, including sections on GPU utilization, pod distribution, and memory-optimized scheduling.
Corrected typos and improved clarity in the blog post about AKS Configurable Scheduler Profiles.
Updated the blog to clarify the objectives of configuring AKS Configurable Scheduler Profiles, improved section titles, and ensured consistency in terminology.
Clarified the objectives and improved the wording in the blog post about AKS Configurable Scheduler Profiles.
website/blog/2025-12-16-aks-config-scheduler-profiles-preview/index.md
Pull request overview
This pull request adds a new blog post announcing the preview of AKS Configurable Scheduler Profiles, a feature that enables fine-grained control over pod scheduling strategies to optimize resource utilization and improve workload performance.
Key Changes
- Introduces a new "scheduler" tag to categorize blog posts related to pod placement and scheduling optimization
- Adds comprehensive blog post covering three main scheduling use cases: GPU bin-packing for AI workloads, pod distribution across topology domains for resilience, and memory-optimized scheduling with PVC-aware placement
- Provides YAML configuration examples and best practices for implementing custom scheduler profiles
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 20 comments.
| File | Description |
|---|---|
| website/blog/tags.yml | Adds new "scheduler" tag for categorizing posts about pod placement and scheduling techniques |
| website/blog/2025-12-16-aks-config-scheduler-profiles-preview/index.md | New blog post introducing AKS Configurable Scheduler Profiles with configuration examples for GPU utilization, topology distribution, and memory-optimized scheduling |
…index.md Co-authored-by: Diego Casati <diego.casati@gmail.com>
…index.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…index.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…index.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…index.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
```yaml
scheduler:
  label: Scheduler
  permalink: /scheduler
  description: Techniques and capabilities to control pod placement, improve resource efficiency, and tune scheduling for diverse AKS workloads.
```

Add a blank line after the new scheduler tag entry so tag blocks stay consistently separated (most tags in this file have an empty line between entries, e.g., after scaling).

Suggested change: keep the entry as-is and append a blank line after the `description:` line.
```markdown
Lastly, you will find [best practices](#best-practices-and-configuration-considerations) to help guide how you consider both individual plugin configurations, your custom scheduler configuration, and your Deployment design holistically.

<!-- truncate -->
```

This post is missing the hero image right after `<!-- truncate -->`. The post directory currently contains only index.md, so nothing will render as the header/preview image. Add an image file to this folder and include it immediately after the truncate marker with descriptive alt text.

Suggested change:

```markdown

```
```markdown
As a reminder, there are many parameters the scheduler considers across the [scheduling cycle][scheduling-framework/#interfaces] before a pod is placed on a node that impacts how a pod is assigned. This section is meant to help guide how you consider both individual plugin configurations, your custom scheduler configuration, and your Deployment design holistically.

1. Ensure the intended deployment is assigned to the _correct_ scheduler profile.
2. Ensure the custom scheduler profile complements the implementation of Deployments, StorageClasses, and PersistentVolumeClaim's. Misalignment can lead to pending pods and degraded workload performance, even when the scheduler is functioning as expected.
```

Avoid apostrophes for plurals. This should be PersistentVolumeClaims (or "PersistentVolumeClaims (PVCs)"), not PersistentVolumeClaim's.
…index.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…index.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
```markdown
10. Pair `PodTopologySpread` with Pod Disruption Budgets (PDBs) and multi‑replica strategies for HA during upgrades.
```

Use plural without an apostrophe: "Pod Disruption Budget's (PDB)" should be "pod disruption budgets (PDBs)" (or "PodDisruptionBudgets (PDBs)" if you want to match the Kubernetes resource name).
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
```markdown
5. Assign `priorityClassName` for workloads that should preempt others, this is critical if you use the DefaultPreemption plugin.
6. If you use the `ImageLocality` plugin, use DaemonSets or node pre-pulling for latency-sensitive images, otherwise the benefit may be minimal.
7. If your cluster is large, a low `PercentageOfNodesToScore` speeds scheduling by reducing the number of nodes scored, _but_ it may reduce optimal placement.
8. If you enable a plugin in the `plugins:multipoint` section but do not define it in `pluginConfig`, AKS uses the default configuration for that plugin.
```

This reference to `plugins:multipoint` looks like a typo/inaccurate key name. In the earlier examples in this post the scheduler config uses `plugins` with a `multiPoint` section; please align this sentence with the actual kube-scheduler configuration field name so readers don't copy a non-working configuration.

Suggested change:

```markdown
8. If you enable a plugin in the `plugins.multiPoint` section but do not define it in `pluginConfig`, AKS uses the default configuration for that plugin.
```
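For reference, a minimal sketch of how `plugins.multiPoint` and `pluginConfig` fit together in a kube-scheduler configuration; the scheduler name, plugin choice, and weights here are illustrative, not taken from the post:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: example-scheduler   # illustrative name
    plugins:
      # Note the field is "multiPoint" (camelCase), not "multipoint".
      multiPoint:
        enabled:
          - name: NodeResourcesFit
            weight: 2
    # Without a matching pluginConfig entry, the plugin runs with its defaults.
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
```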
```yaml
---
title: "AKS configurable scheduler profiles (preview)"
description: "Optimize AKS scheduling with configurable scheduler profiles that improve GPU utilization and align pod placement to your critical workloads at scale."
date: 2026-01-23
```

The folder name (2025-12-16-aks-config-scheduler-profiles-preview) and the front matter `date: 2026-01-23` don't match. In this repo, the YYYY-MM-DD-... directory prefix typically matches the post date (for example, website/blog/2025-07-25-aks-lts-announcement/index.md:2-4). Please either rename the folder to 2026-01-23-... or change the front matter date to 2025-12-16 to keep URLs and chronology consistent.

Suggested change:

```yaml
date: 2025-12-16
```
Updated the AKS scheduler profiles preview with new date and refined content. Removed outdated sections and optimized descriptions for clarity.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
```markdown
Lastly, you will find [best practices](#best-practices-and-configuration-considerations) to help guide how you consider both individual plugin configurations, your custom scheduler configuration, and your Deployment design holistically.

<!-- truncate -->
```

This post is missing a hero image after the `<!-- truncate -->` marker. The repo blog guidelines call for a hero image immediately after `<!-- truncate -->` (see .github/instructions/website.blog.instructions.md).

Suggested change:

```markdown

```
```markdown
1. [Increase GPU utilization by bin packing GPU-backed nodes](#increase-gpu-utilization-by-bin-packing-gpu-backed-nodes)
2. [Increase resilience by distributing pods across topology domains](#increase-resilience-by-distributing-pods-across-topology-domains)
3. [Optimize data locality with memory and PVC-aware scheduling](#optimize-data-locality-with-memory-and-pvc-aware-scheduling)

Lastly, you will find [best practices](#best-practices-and-configuration-considerations) to help guide how you consider both individual plugin configurations, your custom scheduler configuration, and your Deployment design holistically.
```

The intro links to sections for topology-domain distribution and PVC-aware scheduling, but those headings aren't present later in the post, so these anchor links will be broken. Either add the missing sections or update the objectives/links to match the current content.

Suggested change:

```markdown
1. Increase GPU utilization by bin packing GPU-backed nodes
2. Increase resilience by distributing pods across topology domains
3. Optimize data locality with memory and PVC-aware scheduling

Lastly, you will find best practices and configuration considerations to help guide how you consider both individual plugin configurations, your custom scheduler configuration, and your Deployment design holistically.
```
```markdown
Below you will find example configurations for common workload objectives.

:::note
Adjust VM SKUs in `NodeAffinity`, shift utilization curves or weights, and use the right zones for your cluster(s) in the configurations below.
```

The note says to "Adjust VM SKUs in NodeAffinity" and "use the right zones", but the configurations below don't include NodeAffinity or any zone-related settings. Consider updating the note to match the examples (or add the relevant fields to the examples).

Suggested change:

```markdown
Treat these examples as starting points. Adjust resource weights, utilization thresholds, and plugin parameters to match your VM SKUs, workload patterns, and cluster topology.
```
```markdown
As a reminder, there are many parameters the scheduler considers across the [scheduling cycle][scheduling-framework/#interfaces] before a pod is placed on a node that impacts how a pod is assigned. This section is meant to help guide how you consider both individual plugin configurations, your custom scheduler configuration, and your Deployment design holistically.

1. Ensure the intended deployment is assigned to the _correct_ scheduler profile.
2. Ensure the custom scheduler profile compliments the implementation of Deployments, StorageClasses, and PersistentVolumeClaims. Misalignment can lead to pending pods and degraded workload performance, even when the scheduler is functioning as expected.
```

Spelling/word choice: "compliments" should be "complements" here (meaning "fits well with"), not "praises".

Suggested change:

```markdown
2. Ensure the custom scheduler profile complements the implementation of Deployments, StorageClasses, and PersistentVolumeClaims. Misalignment can lead to pending pods and degraded workload performance, even when the scheduler is functioning as expected.
```
```markdown
### ResourceToCapacity

**This scheduler configuration ensures workloads needing large memory footprints are placed on nodes that provide sufficient RAM and maintain proximity to their volumes, enabling fast, zone‑aligned PVC binding for optimal data locality.**
```

This paragraph describes PVC/data-locality behavior ("zone‑aligned PVC binding" / "proximity to their volumes"), but the configuration shown in this section only enables NodeResourcesFit. Either adjust the description to match the example, or update the example to include the relevant PVC/topology scheduling pieces.

Suggested change:

```markdown
**This scheduler configuration prioritizes workloads with large memory footprints onto nodes that provide sufficient RAM, while still accounting for CPU and ephemeral storage utilization for balanced resource usage.**
```
````markdown
```

### ResourceToCapacity
````

The heading "ResourceToCapacity" doesn't align with the earlier section links/objectives (topology distribution, PVC-aware scheduling), which makes the post structure hard to follow. Consider renaming this heading to match the objective it's meant to cover (or adjust the objectives).

Suggested change:

```markdown
### PVC-aware scheduling with RequestedToCapacityRatio
```
```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: cpu-binpacking-scheduler
```

In this example, schedulerName is cpu-binpacking-scheduler, but the surrounding text emphasizes memory/PVC-aware scheduling. Consider updating the scheduler name and/or surrounding explanation so readers can tell what this profile is intended to optimize.

Suggested change:

```yaml
  - schedulerName: memory-binpacking-scheduler
```
Updated the section title and enhanced the description of AKS Configurable Scheduler Profiles to emphasize optimization and testing.
Removed the 'Best Practices and Configuration Considerations' section to streamline content and focus on next steps for AKS Configurable Scheduler Profiles.
```markdown
### ResourceToCapacity

**This scheduler configuration ensures workloads needing large memory footprints are placed on nodes that provide sufficient RAM and maintain proximity to their volumes, enabling fast, zone‑aligned PVC binding for optimal data locality.**
```

The section title "ResourceToCapacity" doesn't match the example below, which configures NodeResourcesFit with a RequestedToCapacityRatio scoring strategy. Rename the section so readers can connect it to the configured behavior.
```markdown
With AKS Configurable Scheduler Profiles, teams gain fine-grained control over pod placement strategies like bin-packing, topology distribution, and resource-based scoring that directly address the challenges of resilience and resource utilization for web-distributed workloads and AI workloads. By leveraging these advanced scheduling plugins, AKS users can ensure their workloads make full use of available GPU capacity, reduce idle time, and avoid costly overprovisioning. This not only improves ROI but also accelerates innovation by allowing more jobs to run concurrently and reliably.

- For best practices using the kube-scheduler visit [kube-scheduler best practices][best-practices-advanced-scheduler]
- Configure your workload specific scheduler using the [AKS Configurable Scheduler][concepts-scheduler-configuration]
```

Spelling: "compliments" should be "complements" (meaning "fits well with").
Updated authors and tags formatting in the blog post.
```markdown
[scheduling-framework/#interfaces]: https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/#interfaces
[memory-optimized-vm]: https://learn.microsoft.com/azure/virtual-machines/sizes/overview?tabs=breakdownseries%2Cgeneralsizelist%2Ccomputesizelist%2Cmemorysizelist%2Cstoragesizelist%2Cgpusizelist%2Cfpgasizelist%2Chpcsizelist#memory-optimized
[supported-in-tree-scheduling-plugins]: https://learn.microsoft.com/azure/aks/concepts-scheduler-configuration#supported-in-tree-scheduling-plugins
```

The reference link definitions for scheduling-framework/#interfaces and memory-optimized-vm are currently unused in the post. Either reference them in the content or remove them to avoid dead/unused link clutter.

Suggested change: remove the first two definitions, keeping only `[supported-in-tree-scheduling-plugins]`.
Updated the blog post to clarify the objectives of configuring AKS Configurable Scheduler Profiles, changing the focus from general workload objectives to increased node utilization. Adjusted the list of objectives to reflect the new focus and improved clarity.
```markdown
The AKS default scheduler scores nodes for workload placement based on a _LeastAllocated_ strategy, to spread across the nodes in a cluster. However, this behavior can result in inefficient resource utilization, as nodes with higher allocation are not favored. You can use `NodeResourcesFit` to control how pods are assigned to nodes based on available resources (CPU, GPU, memory, etc.), including favoring nodes with high resource utilization, within the set configuration.

For example, scheduling pending jobs on nodes with a higher relative GPU utilization, users can reduce costs and increase GPU Utilization while maintaining performance.
```

Minor capitalization/consistency: "GPU Utilization" is capitalized mid-sentence here; it should be "GPU utilization".

Suggested change:

```markdown
For example, scheduling pending jobs on nodes with a higher relative GPU utilization, users can reduce costs and increase GPU utilization while maintaining performance.
```
```markdown
A scheduler profile is a set of one or more in-tree scheduling plugins and configurations that dictate how to schedule a pod. Previously, the scheduler configuration wasn't accessible to users. Starting from Kubernetes version 1.33, you can now configure and set a scheduler profile for the AKS scheduler on your cluster. AKS supports 18 in-tree Kubernetes [scheduling plugins][supported-in-tree-scheduling-plugins]. The plugins can be generally grouped into three categories:

1. Scheduling constraints and order-based plugins
2. Node selection constraints scheduling plugins
3. Resource and topology optimization scheduling plugins

Below you will find example configurations for common workload objectives.
```

The PR description mentions configuration examples for "pod distribution across topology domains", but this post doesn't include a concrete example for topology spread/distribution (only a brief mention of topology distribution). Either add the missing example/section (for example, using plugins like PodTopologySpread), or adjust the PR description and the post's claims to match what's included.
```markdown
[scheduling-framework/#interfaces]: https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/#interfaces
[memory-optimized-vm]: https://learn.microsoft.com/azure/virtual-machines/sizes/overview?tabs=breakdownseries%2Cgeneralsizelist%2Ccomputesizelist%2Cmemorysizelist%2Cstoragesizelist%2Cgpusizelist%2Cfpgasizelist%2Chpcsizelist#memory-optimized
```

Two reference-style link definitions appear to be unused ([scheduling-framework/#interfaces] and [memory-optimized-vm]). The repo's markdownlint config enables MD053 ("reference definitions should be needed"), so these can cause lint failures. Please either use these references in the text or remove them.

Suggested change: remove both unused definitions.
```markdown
Lastly, you will find [best practices](#best-practices-and-configuration-considerations) to help guide how you consider both individual plugin configurations, your custom scheduler configuration, and your Deployment design holistically.

<!-- truncate -->
```

The blog guidelines expect a hero image immediately after the `<!-- truncate -->` marker, but this post goes straight into an H2 heading. Please add a hero image (with descriptive alt text) after the truncate marker.

Suggested change:

```markdown

```
```markdown


### ResourceToCapacity


```

There are multiple consecutive blank lines around the ### ResourceToCapacity heading, which is likely to trip the repo's markdownlint rules (MD012: max 1 consecutive blank line). Please collapse these to a single blank line before and after the heading.
````markdown
**This scheduler configuration ensures workloads needing large memory footprints are placed on nodes that provide sufficient RAM and maintain proximity to their volumes, enabling fast, zone‑aligned PVC binding for optimal data locality.**

```yaml
apiVersion: aks.azure.com/v1alpha1
kind: SchedulerConfiguration
metadata:
  name: upstream
spec:
  rawConfig: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
      - schedulerName: cpu-binpacking-scheduler
```
````

The paragraph describing this example claims volume proximity and "zone‑aligned PVC binding", but the provided scheduler profile only configures NodeResourcesFit scoring (RequestedToCapacityRatio) and doesn't configure any volume-binding/topology plugins. Please update the description to match what the config actually does (or add the missing plugin configuration that enables the described behavior).