silogen · AVSuni · Apr 13, 2026 · Mar 24, 2026 · Mar 24, 2026 · Mar 25, 2026
diff --git a/apis/config/v1alpha1/kaiwoconfig_types.go b/apis/config/v1alpha1/kaiwoconfig_types.go
@@ -78,7 +78,8 @@ type KaiwoConfigSpec struct {
 	DynamicallyUpdateDefaultClusterQueue bool `json:"dynamicallyUpdateDefaultClusterQueue,omitempty"`
 
 	// DefaultTopologyName is the name of the default Kueue Topology used for Topology Aware Scheduling.
-	// Auto-generated ResourceFlavors reference this topology when DynamicallyUpdateDefaultClusterQueue is enabled.
+	// Auto-generated ResourceFlavors reference this topology to enable TAS capability.
+	// Workloads opt in to TAS by setting preferredTopologyLabel or requiredTopologyLabel.
 	// +kubebuilder:default="default-topology"
 	DefaultTopologyName string `json:"defaultTopologyName,omitempty"`
 }
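
Reviewer note: the field documented in this hunk could be set as in the sketch below. This is an illustration under assumptions, not part of the PR: the API group and version are inferred from the CRD file name and the `apis/config/v1alpha1` path, and the resource name `kaiwo` is hypothetical.

```yaml
# Minimal KaiwoConfig sketch; only the field touched by this hunk is shown.
apiVersion: config.kaiwo.silogen.ai/v1alpha1
kind: KaiwoConfig
metadata:
  name: kaiwo  # hypothetical resource name
spec:
  # Name of the Kueue Topology that auto-generated ResourceFlavors reference.
  # The value below matches the kubebuilder default shown in the diff.
  defaultTopologyName: default-topology
```

Per the updated comment, setting this field only makes TAS available on the flavor; workloads still have to opt in through `preferredTopologyLabel` or `requiredTopologyLabel`.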

diff --git a/apis/kaiwo/v1alpha1/common_types.go b/apis/kaiwo/v1alpha1/common_types.go
@@ -124,10 +124,11 @@ type CommonMetaSpec struct {
 	// Duration specifies the maximum duration over which the workload can run. This is useful for avoiding workloads running indefinitely.
 	Duration *metav1.Duration `json:"duration,omitempty"`
 
-	// PreferredTopologyLabel specifies the preferred topology label for scheduling the workload. This is used to influence how the workload is distributed across nodes in the cluster.
-	// If not specified, Kaiwo will use the default topology labels defined in the default topology of KaiwoQueueConfig starting at the host level.
+	// PreferredTopologyLabel specifies the preferred topology label for scheduling the workload (opt-in).
+	// When set, Kueue's Topology Aware Scheduling is activated for this workload.
 	// The levels are evaluated one-by-one going up from the level indicated by the label. If the PodSet cannot fit within a given topology label then the next topology level up is considered.
-	// If the PodSet cannot fit at the highest topology level, then it is distributed among multiple topology domains
+	// If the PodSet cannot fit at the highest topology level, then it is distributed among multiple topology domains.
+	// If not specified, no topology-aware scheduling is applied.
 	PreferredTopologyLabel string `json:"preferredTopologyLabel,omitempty"`
 
 	// RequiredTopologyLabel specifies the required topology label for scheduling the workload. This is used to ensure that the workload is scheduled on nodes that match the specified topology label.

diff --git a/config/crd/bases/config.kaiwo.silogen.ai_kaiwoconfigs.yaml b/config/crd/bases/config.kaiwo.silogen.ai_kaiwoconfigs.yaml
@@ -56,7 +56,8 @@ spec:
                 default: default-topology
                 description: |-
                   DefaultTopologyName is the name of the default Kueue Topology used for Topology Aware Scheduling.
-                  Auto-generated ResourceFlavors reference this topology when DynamicallyUpdateDefaultClusterQueue is enabled.
+                  Auto-generated ResourceFlavors reference this topology to enable TAS capability.
+                  Workloads opt in to TAS by setting preferredTopologyLabel or requiredTopologyLabel.
                 type: string
               dynamicallyUpdateDefaultClusterQueue:
                 default: false

diff --git a/config/crd/bases/kaiwo.silogen.ai_kaiwojobs.yaml b/config/crd/bases/kaiwo.silogen.ai_kaiwojobs.yaml
@@ -8751,9 +8751,10 @@ spec:
                 type: object
               preferredTopologyLabel:
                 description: |-
-                  PreferredTopologyLabel specifies the preferred topology label for scheduling the workload. This is used to influence how the workload is distributed across nodes in the cluster.
-                  If not specified, Kaiwo will use the default topology labels defined in the default topology of KaiwoQueueConfig starting at the host level.
+                  PreferredTopologyLabel specifies the preferred topology label for scheduling the workload (opt-in).
+                  When set, Kueue's Topology Aware Scheduling is activated for this workload.
                   The levels are evaluated one-by-one going up from the level indicated by the label. If the PodSet cannot fit within a given topology label then the next topology level up is considered.
+                  If the PodSet cannot fit at the highest topology level, then it is distributed among multiple topology domains.
                 type: string
               priorityClass:
                 description: WorkloadPriorityClass specifies the name of Kueue `WorkloadPriorityClass`

diff --git a/config/crd/bases/kaiwo.silogen.ai_kaiwoservices.yaml b/config/crd/bases/kaiwo.silogen.ai_kaiwoservices.yaml
@@ -8471,9 +8471,10 @@ spec:
                 type: object
               preferredTopologyLabel:
                 description: |-
-                  PreferredTopologyLabel specifies the preferred topology label for scheduling the workload. This is used to influence how the workload is distributed across nodes in the cluster.
-                  If not specified, Kaiwo will use the default topology labels defined in the default topology of KaiwoQueueConfig starting at the host level.
+                  PreferredTopologyLabel specifies the preferred topology label for scheduling the workload (opt-in).
+                  When set, Kueue's Topology Aware Scheduling is activated for this workload.
                   The levels are evaluated one-by-one going up from the level indicated by the label. If the PodSet cannot fit within a given topology label then the next topology level up is considered.
+                  If the PodSet cannot fit at the highest topology level, then it is distributed among multiple topology domains.
                 type: string
               priorityClass:
                 description: WorkloadPriorityClass specifies the name of Kueue `WorkloadPriorityClass`

diff --git a/crd/crds.yaml b/crd/crds.yaml
@@ -6511,7 +6511,8 @@ spec:
                 default: default-topology
                 description: |-
                   DefaultTopologyName is the name of the default Kueue Topology used for Topology Aware Scheduling.
-                  Auto-generated ResourceFlavors reference this topology when DynamicallyUpdateDefaultClusterQueue is enabled.
+                  Auto-generated ResourceFlavors reference this topology to enable TAS capability.
+                  Workloads opt in to TAS by setting preferredTopologyLabel or requiredTopologyLabel.
                 type: string
               dynamicallyUpdateDefaultClusterQueue:
                 default: false
@@ -15438,9 +15439,10 @@ spec:
                 type: object
               preferredTopologyLabel:
                 description: |-
-                  PreferredTopologyLabel specifies the preferred topology label for scheduling the workload. This is used to influence how the workload is distributed across nodes in the cluster.
-                  If not specified, Kaiwo will use the default topology labels defined in the default topology of KaiwoQueueConfig starting at the host level.
+                  PreferredTopologyLabel specifies the preferred topology label for scheduling the workload (opt-in).
+                  When set, Kueue's Topology Aware Scheduling is activated for this workload.
                   The levels are evaluated one-by-one going up from the level indicated by the label. If the PodSet cannot fit within a given topology label then the next topology level up is considered.
+                  If the PodSet cannot fit at the highest topology level, then it is distributed among multiple topology domains.
                 type: string
               priorityClass:
                 description: WorkloadPriorityClass specifies the name of Kueue `WorkloadPriorityClass`
@@ -51487,9 +51489,10 @@ spec:
                 type: object
               preferredTopologyLabel:
                 description: |-
-                  PreferredTopologyLabel specifies the preferred topology label for scheduling the workload. This is used to influence how the workload is distributed across nodes in the cluster.
-                  If not specified, Kaiwo will use the default topology labels defined in the default topology of KaiwoQueueConfig starting at the host level.
+                  PreferredTopologyLabel specifies the preferred topology label for scheduling the workload (opt-in).
+                  When set, Kueue's Topology Aware Scheduling is activated for this workload.
                   The levels are evaluated one-by-one going up from the level indicated by the label. If the PodSet cannot fit within a given topology label then the next topology level up is considered.
+                  If the PodSet cannot fit at the highest topology level, then it is distributed among multiple topology domains.
                 type: string
               priorityClass:
                 description: WorkloadPriorityClass specifies the name of Kueue `WorkloadPriorityClass`

diff --git a/docs/docs/admin/configuration.md b/docs/docs/admin/configuration.md
@@ -29,10 +29,16 @@ You can modify this automatically created configuration or create your own `kaiw
 *   **`resourceFlavors`**: Defines the types of hardware resources available in the cluster, corresponding to Kueue `ResourceFlavor` resources.
     *   `name`: A unique name for the flavor (e.g., `amd-mi300-8gpu`, `nvidia-a100-40gb`, `cpu-standard`).
     *   `nodeLabels`: A map of labels that nodes must possess to be considered part of this flavor. This is crucial for scheduling pods onto the correct hardware. Example: `{"kaiwo/nodepool": "amd-mi300-nodes"}`.
+    *   `topologyName`: (Optional) The name of a Kueue `Topology` (defined in `spec.topologies`) that this flavor references. When set, the flavor enables [Topology Aware Scheduling (TAS)](#topology-aware-scheduling-tas) for workloads that opt in. See the TAS section below for details.
     *   `taints`: (Optional) A list of Kubernetes taints associated with this flavor. Pods scheduled to this flavor will need corresponding tolerations. Kaiwo automatically adds tolerations for GPU taints if `ADD_TAINTS_TO_GPU_NODES` is enabled.
 
     !!! info "Auto-Discovery vs. Explicit Definition"
-        If `spec.resourceFlavors` is empty or omitted in the `kaiwo` `KaiwoQueueConfig`, the operator's startup logic attempts to **auto-discover** node pools and create corresponding flavors as described above. While convenient for initial setup, explicitly defining `resourceFlavors` in the `KaiwoQueueConfig` provides more precise control and is generally recommended for production environments. Explicitly defined flavors will override any auto-discovered ones during reconciliation.
+        If `spec.resourceFlavors` is empty or omitted in the `kaiwo` `KaiwoQueueConfig`, the operator's startup logic attempts to **auto-discover** node pools and create corresponding flavors as described above. Auto-discovered flavors automatically reference the default topology (configured via `KaiwoConfig.spec.defaultTopologyName`, defaulting to `default-topology`) to enable TAS capability. While convenient for initial setup, explicitly defining `resourceFlavors` in the `KaiwoQueueConfig` provides more precise control and is generally recommended for production environments. Explicitly defined flavors will override any auto-discovered ones during reconciliation.
+
+    !!! warning "ResourceFlavor Immutability"
+        Kueue makes `ResourceFlavor` specs **immutable** once `topologyName` is set. When the spec of such a flavor changes in the `KaiwoQueueConfig` (e.g., `topologyName` is changed or removed), the Kaiwo controller deletes and recreates the flavor automatically. If the old flavor is still in use by a `ClusterQueue`, Kueue's `resource-in-use` finalizer may delay the deletion; the controller converges on subsequent reconciliation cycles.
+
+*   **`topologies`**: Defines Kueue `Topology` resources that describe the physical or logical topology of the cluster (e.g., rack, block, host hierarchy). Topologies are referenced by `resourceFlavors` via the `topologyName` field. Each topology specifies a list of `levels`, where each level is a node label key (e.g., `kaiwo/topology-block`, `kaiwo/topology-rack`, `kubernetes.io/hostname`).
 
 *   **`clusterQueues`**: Defines the Kueue `ClusterQueue` resources managed by Kaiwo.
     *   `name`: The name of the `ClusterQueue` (e.g., `team-a-queue`, `default-gpu-queue`).
@@ -52,37 +58,41 @@ kind: KaiwoQueueConfig
 metadata:
   name: kaiwo # Must be named 'kaiwo' (or DEFAULT_KAIWO_QUEUE_CONFIG_NAME)
 spec:
+  topologies:
+    - name: gpu-topology
+      levels:
+        - kaiwo/topology-block
+        - kaiwo/topology-rack
+        - kubernetes.io/hostname
+
   resourceFlavors:
     - name: amd-mi300-8gpu
       nodeLabels:
-        kaiwo/nodepool: amd-mi300-nodes # Nodes with this label belong to this flavor
-        # Add other identifying labels if needed, e.g., topology.amd.com/gpu-count: '8'
-      # taints: # Optional, define if specific taints apply ONLY to these nodes
-      # - key: "amd.com/gpu"
-      #   operator: "Exists"
-      #   effect: "NoSchedule"
+        kaiwo/nodepool: amd-mi300-nodes
+      topologyName: gpu-topology  # Enables TAS for workloads using this flavor
     - name: cpu-high-mem
       nodeLabels:
         kaiwo/nodepool: cpu-high-mem-nodes
+      # No topologyName: TAS is not available for this flavor
 
   clusterQueues:
-    - name: ai-research-queue # Name of the ClusterQueue
-      namespaces: # Auto-create/manage LocalQueues in these namespaces
+    - name: ai-research-queue
+      namespaces:
         - ai-research-ns-1
         - ai-research-ns-2
-      spec: # Standard Kueue ClusterQueueSpec
+      spec:
         queueingStrategy: BestEffortFIFO
         resourceGroups:
-          - coveredResources: ["cpu", "memory", "amd.com/gpu"] # Resources managed by this group
+          - coveredResources: ["cpu", "memory", "amd.com/gpu"]
             flavors:
-              - name: amd-mi300-8gpu # Reference to a defined resourceFlavor
+              - name: amd-mi300-8gpu
                 resources:
                   - name: "cpu"
-                    nominalQuota: "192" # Total CPU quota for this flavor in this queue
+                    nominalQuota: "192"
                   - name: "memory"
-                    nominalQuota: "1024Gi" # Total Memory quota
+                    nominalQuota: "1024Gi"
                   - name: "amd.com/gpu"
-                    nominalQuota: "8" # Total GPU quota
+                    nominalQuota: "8"
           - coveredResources: ["cpu", "memory"]
             flavors:
               - name: cpu-high-mem
@@ -91,8 +101,6 @@ spec:
                     nominalQuota: "256"
                   - name: "memory"
                     nominalQuota: "2048Gi"
-        # cohort: "gpu-cohort" # Optional: Group queues for borrowing/preemption
-        # preemption: ...
 
   workloadPriorityClasses:
     - name: high-priority
@@ -105,10 +113,15 @@ spec:
 
 The `KaiwoQueueConfigController` acts as a translator, continuously ensuring that the Kueue resources in your cluster accurately reflect the configuration defined in the single `kaiwo` `KaiwoQueueConfig` resource. It monitors this resource and automatically manages the lifecycle of the associated Kueue objects:
 
+*   **`spec.topologies` -> Kueue `Topology`:**
+    *   Each entry defines a Kueue `Topology` resource describing the cluster's physical or logical topology hierarchy.
+    *   Topologies are synced before ResourceFlavors to ensure flavors can reference them immediately.
+
 *   **`spec.resourceFlavors` -> Kueue `ResourceFlavor`:**
     *   Each entry in this list directly defines a Kueue `ResourceFlavor`.
-    *   The controller ensures a corresponding `ResourceFlavor` exists for each entry, creating or updating it as necessary based on the specified `name`, `nodeLabels`, and `taints`.
+    *   The controller ensures a corresponding `ResourceFlavor` exists for each entry, creating or updating it as necessary based on the specified `name`, `nodeLabels`, `topologyName`, and `taints`.
     *   If an entry is removed from this list, the controller deletes the corresponding `ResourceFlavor`.
+    *   For flavors with `topologyName` set, Kueue makes the spec immutable. If a spec change is needed, the controller handles this transparently via delete-and-recreate.
 
 *   **`spec.clusterQueues` -> Kueue `ClusterQueue` and `LocalQueue`:**
     *   Each entry in this list defines a Kueue `ClusterQueue`. The controller translates the structure into a standard `ClusterQueueSpec` and ensures the resource exists and matches the definition. Removing an entry deletes the corresponding `ClusterQueue`.
@@ -127,6 +140,28 @@ The `KaiwoQueueConfigController` acts as a translator, continuously ensuring tha
 
 The controller updates the `status.status` field of the `KaiwoQueueConfig` resource (`Pending`, `Ready`, or `Failed`) to indicate the current state of synchronization between the desired configuration and the actual Kueue resources in the cluster. This continuous reconciliation keeps the Kueue setup aligned with the central `KaiwoQueueConfig`.
 
+### Topology Aware Scheduling (TAS)
+
+Kaiwo integrates with Kueue's [Topology Aware Scheduling](https://kueue.sigs.k8s.io/docs/concepts/topology_aware_scheduling/) to place workload pods close together in the cluster topology (e.g., same rack, same network block), which can improve performance for distributed training workloads.
+
+**How it works:**
+
+TAS in Kaiwo is a two-layer opt-in system:
+
+1.  **Infrastructure layer (admin):** ResourceFlavors must reference a `Topology` via the `topologyName` field to *enable* TAS capability. Without this, workloads cannot use TAS even if they request it. Auto-discovered flavors automatically reference the default topology.
+2.  **Workload layer (user):** Individual workloads opt in to TAS by setting `preferredTopologyLabel` or `requiredTopologyLabel` in their spec. If neither is set, the workload is scheduled normally without topology constraints, even if the underlying flavor supports TAS.
+
+**Configuration:**
+
+1.  Define a `Topology` in `spec.topologies` with the appropriate hierarchy of node labels.
+2.  Reference that topology in your `ResourceFlavor` via `topologyName`.
+3.  Ensure the nodes in the cluster carry every label listed in the topology's `levels` (e.g., `kaiwo/topology-block`, `kaiwo/topology-rack`, `kubernetes.io/hostname`).
+
+Users then activate TAS on individual workloads by setting `preferredTopologyLabel` or `requiredTopologyLabel`. See the [Scheduling guide](/scientist/scheduling#topology-aware-scheduling-tas) for workload-level configuration.
+
+!!! note "Default Topology"
+    The default topology name is configured in `KaiwoConfig` via `spec.defaultTopologyName` (defaults to `default-topology`). Auto-generated ResourceFlavors always reference this topology. The operator also creates this default topology with levels `kaiwo/topology-block`, `kaiwo/topology-rack`, and `kubernetes.io/hostname`.
+
 ## KaiwoConfig CRD
 
 The Kaiwo Operator's runtime configuration is managed through the `KaiwoConfig` Custom Resource Definition (CRD). This approach allows Kubernetes administrators to dynamically adjust operator behavior without requiring a restart. The operator always retrieves the most recent configuration values during each reconcile loop.
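
Reviewer note: the workload-level opt-in described in the new TAS section could be illustrated with a manifest like the one below. This is a minimal sketch, not taken from the PR: the workload name is hypothetical, the API group and version are inferred from the CRD file name and the `apis/kaiwo/v1alpha1` path, the placement of `preferredTopologyLabel` directly under `spec` is assumed from the CRD hunks, and all other workload fields are left out.

```yaml
# Hypothetical KaiwoJob that opts in to Topology Aware Scheduling.
apiVersion: kaiwo.silogen.ai/v1alpha1
kind: KaiwoJob
metadata:
  name: tas-example-job        # hypothetical name
  namespace: ai-research-ns-1  # namespace reused from the docs example
spec:
  # Ask Kueue to try fitting the PodSet within a single rack first.
  # If it does not fit, the next topology level up is considered.
  preferredTopologyLabel: kaiwo/topology-rack
  # Remaining workload fields are omitted from this sketch.
```

If neither `preferredTopologyLabel` nor `requiredTopologyLabel` is set, the workload is scheduled without topology constraints, even when the flavor references a topology.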

diff --git a/docs/docs/reference/crds/config.kaiwo.silogen.ai.md b/docs/docs/reference/crds/config.kaiwo.silogen.ai.md
@@ -74,7 +74,7 @@ _Appears in:_
 | `defaultClusterQueueName` _string_ | DefaultClusterQueueName is the name of the default cluster queue that is used for workloads that don't explicitly specify a cluster queue. | kaiwo |  |
 | `defaultClusterQueueCohortName` _string_ | DefaultClusterQueueCohortName is the name of the default cohort that is used for the default cluster queue.<br />ClusterQueues in the same cohort can share resources. | kaiwo |  |
 | `dynamicallyUpdateDefaultClusterQueue` _boolean_ | DynamicallyUpdateDefaultClusterQueue defines whether the Kaiwo operator should dynamically update default "kaiwo" clusterqueue.<br />If set to true, the operator will make sure that the default clusterqueue is always up to date and reflects total resources available.<br />If nodes are added or removed, the operator will update the default clusterqueue to reflect the current state of the cluster. | false |  |
-| `defaultTopologyName` _string_ | DefaultTopologyName is the name of the default Kueue Topology used for Topology Aware Scheduling.<br />Auto-generated ResourceFlavors reference this topology when DynamicallyUpdateDefaultClusterQueue is enabled. | default-topology |  |
+| `defaultTopologyName` _string_ | DefaultTopologyName is the name of the default Kueue Topology used for Topology Aware Scheduling.<br />Auto-generated ResourceFlavors reference this topology to enable TAS capability.<br />Workloads opt in to TAS by setting preferredTopologyLabel or requiredTopologyLabel. | default-topology |  |
 
 
 #### KaiwoGpuPreemptionConfig