Summary
Audit the pinned versions of every runtime component referenced by AICR recipes and update them to current stable upstream releases where appropriate.
Motivation
Component versions in recipes/registry.yaml, recipes/overlays/*.yaml, and recipes/components/*/values.yaml drift over time as upstream projects ship security fixes, bug fixes, and Kubernetes compatibility updates. Without a periodic refresh, AICR-generated bundles deploy stale versions that may miss CVE fixes or fail against newer Kubernetes versions.
Scope
Check and (where appropriate) update versions for every component declared in recipes/registry.yaml, including (non-exhaustive list — registry is authoritative):
- GPU stack: gpu-operator, nvidia-device-plugin, dcgm-exporter
- Schedulers / CRDs: kai-scheduler, kueue, volcano
- Inference: dynamo-platform, kgateway
- Training: kubeflow-trainer
- Observability: kube-prometheus-stack, prometheus-adapter
- Platform: nfd, cert-manager, node-feature-discovery
- Any other components in recipes/registry.yaml not listed here
Also check overlay-level pins in recipes/overlays/*.yaml and any version constraints embedded in mixins (recipes/mixins/*.yaml).
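One way to make the audit repeatable is a small script that compares each pinned version against the latest stable upstream release and flags anything that is behind. The sketch below is a hypothetical Python helper, not part of AICR. It assumes the pins have already been loaded from recipes/registry.yaml into a plain name-to-version dict (the registry schema is not shown here), and that the "latest" map is filled in by hand from each project's release page. Components missing from that map are reported as unchecked, because the registry, not the list above, is authoritative.

```python
from __future__ import annotations

from dataclasses import dataclass


def parse_version(raw: str) -> tuple[int, int, int]:
    """Parse 'v1.2.3', '1.2.3' or '1.2' into a comparable (major, minor, patch) tuple.

    Assumption: pins are plain MAJOR.MINOR[.PATCH]. Pre-release and build
    suffixes ('-rc.1', '+meta') are stripped rather than ordered per SemVer,
    which is adequate when comparing stable releases only.
    """
    core = raw.strip().lstrip("v").split("-", 1)[0].split("+", 1)[0]
    parts = core.split(".")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unrecognized version: {raw!r}")
    nums = [int(p) for p in parts] + [0] * (3 - len(parts))
    return nums[0], nums[1], nums[2]


@dataclass
class Finding:
    component: str
    pinned: str
    latest: str


def find_stale(pins: dict[str, str], latest: dict[str, str]) -> tuple[list[Finding], list[str]]:
    """Return (components pinned below latest, components with no latest version recorded)."""
    stale: list[Finding] = []
    unchecked: list[str] = []
    for name in sorted(pins):
        if name not in latest:
            unchecked.append(name)  # not yet looked up upstream
            continue
        if parse_version(pins[name]) < parse_version(latest[name]):
            stale.append(Finding(name, pins[name], latest[name]))
    return stale, unchecked


# Placeholder versions for illustration only; they are not real upstream releases.
pins = {"gpu-operator": "v1.0.0", "kueue": "v0.5.0", "cert-manager": "v1.2.0"}
latest = {"gpu-operator": "v1.1.0", "kueue": "v0.5.0"}
stale, unchecked = find_stale(pins, latest)
for f in stale:
    print(f"{f.component}: {f.pinned} -> {f.latest}")  # gpu-operator: v1.0.0 -> v1.1.0
print("unchecked:", unchecked)  # unchecked: ['cert-manager']
```

Comparing integer tuples rather than raw strings avoids the classic trap where "v1.10.0" sorts before "v1.9.0" lexically.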
Definition of done
- … recipes/registry.yaml comments or a short note in docs/contributor/data.md).
- make qualify passes locally on the updated recipes.
- … make kwok-test-all).
- docs/user/component-catalog.md is regenerated/updated if any displayName, repository, or default chart changes.
Out of scope
- Adding new components (handled in separate issues).
- Upgrading recipes to a different Kubernetes minor (separate compatibility effort).
- Removing components.
Notes
- Compatibility constraint: AICR targets Kubernetes 1.33+. Verify that each upgraded component supports this floor.
- Constraints in recipes (spec.constraints[*].name = "K8s.server.version") should be re-evaluated when upgrading.
- For Helm-installed components, confirm that the upstream chart version matches the app version expected by nodeScheduling and any value overrides we set.
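The Kubernetes floor check from the notes can be sketched the same way. The hypothetical helpers below assume that each component's supported Kubernetes range is read by hand from its upstream compatibility matrix, and that a K8s.server.version constraint value is written as an operator followed by a version, such as ">= 1.32". AICR's actual constraint syntax is not shown in this issue, so treat the parser as illustrative only.

```python
from __future__ import annotations

import operator

FLOOR = (1, 33)  # AICR's stated minimum Kubernetes version (1.33+)

# Hypothetical operator set; AICR's real constraint grammar may differ.
_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq, ">": operator.gt, "<": operator.lt}


def minor(v: str) -> tuple[int, int]:
    """Reduce 'v1.33.2', '1.33' or '1.33.0-rc.1' to (major, minor)."""
    major, minor_, *_ = v.strip().lstrip("v").split(".")
    return int(major), int(minor_)


def supports_floor(min_k8s: str, max_k8s: str | None, floor: tuple[int, int] = FLOOR) -> bool:
    """True if the component's supported range [min_k8s, max_k8s] includes the floor.

    max_k8s=None means the upstream project states no upper bound.
    """
    if minor(min_k8s) > floor:
        return False
    return max_k8s is None or minor(max_k8s) >= floor


def constraint_admits(expr: str, server_version: str) -> bool:
    """Evaluate a constraint value such as '>= 1.32' against a server version."""
    text = expr.strip()
    for sym in (">=", "<=", "==", ">", "<"):  # check two-character operators first
        if text.startswith(sym):
            return _OPS[sym](minor(server_version), minor(text[len(sym):]))
    raise ValueError(f"unsupported constraint: {expr!r}")


# A constraint of ">= 1.30" still admits a 1.32 server, i.e. a cluster below the
# 1.33 floor, which suggests the constraint may need tightening after an upgrade.
print(constraint_admits(">= 1.30", "1.32"))  # True
```

Comparing only (major, minor) reflects that compatibility statements for Kubernetes are usually made per minor release; patch versions are ignored deliberately.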