Releases: NVIDIA/aicr

v0.13.0

16 May 00:18
Immutable release. Only release title and notes can be modified.
v0.13.0
134b23d

This release focuses on scaling out our recipe matrix, evidence-based recipe validation, additional deployer targets, and a hardened component supply chain.

Highlights

Recipe Evidence: Evidence captured during cluster validation lets users and contributors verify, without access to the validating cluster, that a recipe actually deployed and delivered the expected performance characteristics. aicr validate now emits a Recipe Evidence v1 bundle, and a new aicr evidence verify command validates that evidence from either a local directory or a signed OCI image. This closes the loop between recipe authorship, deployment, and audit.

New Deployers: The bundle command now supports Helmfile and Flux alongside the existing Argo CD and raw-Helm targets. AICR also adds a URL-portable argocd-helm bundle option so users can apply a single manifest without local chart access. Helm vendoring is supported for air-gapped environments; an image-mirroring option is still to come (see NVIDIA/aicr#743).

Overlays & Components

  • Added deployment validation to EKS GB200
  • Added Slinky platform support with Slurm operator
  • Added Talos Linux support via new os-talos mixin and bundler preManifestFiles
  • Updated AKS H100 Dynamo to match working cluster state
  • Migrated GB200 kernel-module-params to preManifestFiles
  • Fixed AKS H100 RDMA network operator dependency and metrics

Other Improvements

  • New documentation site is live at docs.nvidia.com/aicr with per-release versioning
  • New diff command to detect configuration drift between recipes and live state
  • Unified file-based config across snapshot, recipe, bundle, and validate for easier reproducibility
  • Reliable cluster identity derived from snapshot measurements for easier correlation over time
  • --storage-class flag on the bundle command for registry-driven storage-class injection
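
The notes do not describe how cluster identity is derived from snapshot measurements. One common way to get a stable identity from a set of measurements is to hash a canonical encoding of the fields that are not expected to change between snapshots. The sketch below illustrates that general idea only; the function name, the measurement keys, and the choice of SHA-256 over canonical JSON are all assumptions, not AICR's actual scheme.

```python
import hashlib
import json


def cluster_fingerprint(measurements: dict, stable_keys: list[str]) -> str:
    """Hash only the measurements assumed to be stable over time.

    Hypothetical helper: the key names and hashing scheme are illustrative
    and are not taken from AICR.
    """
    # Keep only the stable fields so volatile values (timestamps, load)
    # do not change the identity.
    subset = {k: measurements[k] for k in stable_keys if k in measurements}
    # Canonical JSON: sorted keys and fixed separators give the same bytes
    # regardless of the order in which fields were collected.
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two snapshots of the same (hypothetical) cluster taken at different times.
first = {"provider": "eks", "region": "us-west-2", "gpu_model": "GB200", "taken_at": "t1"}
later = {"taken_at": "t2", "gpu_model": "GB200", "region": "us-west-2", "provider": "eks"}
stable = ["provider", "region", "gpu_model"]

same_cluster = cluster_fingerprint(first, stable) == cluster_fingerprint(later, stable)
```

With this approach, two snapshots correlate over time as long as the fields chosen as stable really are stable, so the choice of those fields is the main design decision.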

Supply Chain: A new CycloneDX 1.6 BOM generator publishes a per-recipe container image inventory as an in-repo artifact, with strict validation that rejects bare scalar image references missing a tag, digest, or registry host. A growing number of component chart versions now also explicitly digest-pin image references.
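
To make the two checks above concrete, here is a minimal sketch of what such validation could look like. It rests on one reading of the notes: a "bare scalar" is a string that carries none of a tag, a digest, or a registry host, so it cannot be reliably identified as an image reference. The function names are hypothetical, the host detection follows the usual Docker convention (first path segment contains a dot or a port, or is "localhost"), and the sha256-only digest rule mirrors the "Enforce sha256 specifically in digest-pin gate" changelog entry. AICR's actual rules may differ.

```python
import re

# Assumption: a digest pin is "@sha256:" followed by 64 lowercase hex characters.
_SHA256_PIN = re.compile(r"@sha256:[0-9a-f]{64}$")


def split_ref(ref: str) -> dict:
    """Split an image reference into registry host, tag, and digest."""
    name, _, digest = ref.partition("@")
    first, slash, rest = name.partition("/")
    # Docker convention: the first segment is a registry host only if it
    # contains a dot or a port, or is exactly "localhost".
    has_host = bool(slash) and ("." in first or ":" in first or first == "localhost")
    path = rest if has_host else name
    # Look for the tag in the last path segment so a host port is not
    # mistaken for a tag.
    last_segment = path.rsplit("/", 1)[-1]
    tag = last_segment.partition(":")[2]
    return {"host": first if has_host else "", "tag": tag, "digest": digest}


def is_bare_scalar(ref: str) -> bool:
    """True when the string has no tag, no digest, and no registry host."""
    parts = split_ref(ref)
    return not (parts["host"] or parts["tag"] or parts["digest"])


def is_digest_pinned(ref: str) -> bool:
    """True only for references pinned with a sha256 digest."""
    return bool(_SHA256_PIN.search(ref))


pinned = "nvcr.io/nvidia/cuda@sha256:" + "a" * 64
for ref in ["nginx", "nginx:1.27", "nvcr.io/nvidia/cuda", pinned]:
    print(ref, "bare" if is_bare_scalar(ref) else "ok", is_digest_pinned(ref))
```

Under these assumptions, "nginx" is rejected as bare, while "nginx:1.27" and "nvcr.io/nvidia/cuda" pass the bare-scalar check but are not digest-pinned.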

Thanks to @ayuskauskas, @dims, @dtzar, @faganihajizada, @haarchri, @Jont828, @lockwobr, @njhensley, @pdmack, @sanjeevrg89, @xdu31, @yuanchen8911, and @mchmarny.

Changelog

New Features

  • (tools) Add install-rc helper for latest RC binary by @mchmarny
  • (cli) Add --config support to snapshot command by @mchmarny
  • (recipes) Update AKS H100 Dynamo recipe to match working cluster state by @Jont828
  • (bom) Add CycloneDX 1.6 image BOM generator by @mchmarny
  • (ci) Add self-hosted renovate alongside dependabot by @njhensley
  • (recipes) Pin nfd and k8s-ephemeral-storage-metrics chart versions by @mchmarny
  • (bom) Publish container image inventory as a doc artifact by @mchmarny
  • (bundler) Add --storage-class flag for registry-driven injection by @dtzar
  • (recipes) Pin chart versions for NVIDIA-owned components (#748 Phase B) by @mchmarny
  • (recipes) Digest-pin explicit image references by @mchmarny
  • (cli) Unified --config flag for recipe and bundle by @mchmarny
  • (tools) Add s3c supply-chain presence checker by @mchmarny
  • (bundler) URL-portable argocd-helm bundle (#664, #665) by @lockwobr
  • (docs) Add versioned docs dropdown with CI content pinning by @pdmack
  • (tools) Add local Talos cluster + snapshot chainsaw test by @ayuskauskas
  • (fingerprint) Cluster identity projection from snapshot measurements by @njhensley
  • Add support for helm vendoring by @lockwobr
  • (oci) Expose URIScheme constant and Ensure/TrimScheme helpers by @njhensley
  • (cli) Add aicr diff for configuration drift detection by @sanjeevrg89
  • (config) Add --config support to aicr validate by @njhensley
  • (validator) Apply hybrid resource pattern to ValidatorCatalog by @xdu31
  • (recipe) Extract Validation as standalone type with hybrid resource pattern by @xdu31
  • Add os-talos mixin and bundler preManifestFiles support by @ayuskauskas
  • (flux) Add bundle flux option by @haarchri
  • (evidence) Emit Recipe Evidence v1 bundle from aicr validate by @njhensley
  • (evidence) Add aicr evidence verify (directory input) by @njhensley
  • (evidence) Add aicr evidence verify (signed OCI bundles) by @njhensley
  • (recipes) Add deployment validation to GB200/EKS recipes by @njhensley
  • (bundler) Add helmfile deployer by @lockwobr
  • (recipes) Add Slinky slurm-operator as platform-slurm by @faganihajizada

Bug Fixes

  • (validator) Accept pre-release tags as release versions by @mchmarny
  • (bundler) Synthesize GKE ResourceQuota for critical-priority pods by @mchmarny
  • (bundler) Split helmfile bundle into CRD + main sub-helmfiles by @mchmarny
  • (bundler) Wire PreManifestFiles through flux deployer with terminal-aware dependsOn by @yuanchen8911
  • (bundler) Carry localformat createNamespace into helmfile.yaml by @yuanchen8911
  • (ci) Harden Fern docs CI and configure custom domain by @pdmack
  • (docs) Replace bare angle-bracket URL that breaks MDX parser by @pdmack
  • (recipes) Fully-qualify image refs in component manifests by @mchmarny
  • Set network operator as a dependency for AKS H100 RDMA and fix chart values/metrics by @Jont828
  • (recipes) Document aws-efa regional ECR override pattern by @mchmarny
  • (bom) Reject bare scalars without tag, digest, or registry host by @mchmarny
  • (validators) Bump aiperf-bench to python:3.13 to clear CVEs by @mchmarny
  • (recipes) Track nri-device-injector by tag, ignore tcpxo image by @njhensley
  • (api) Sync OpenAPI platform enum with Go criteria type by @mchmarny
  • (bundler) Suppress kubectl auth prompt in undeploy.sh post-flight by @mchmarny
  • (fern) Drop https scheme from instances URL by @pdmack
  • (recipes) Migrate GB200 kernel-module-params to preManifestFiles by @mchmarny
  • (validator) Write ValidationInput wire shape to ConfigMap by @njhensley
  • (validator) Make ExtractResult sidecar-safe by reading 'validator' container explicitly by @xdu31
  • (validator) Per-run RBAC names to prevent concurrent-run races by @yuanchen8911
  • (evidence) Fix a regression in CNCF AI conformance evidence collection by @yuanchen8911
  • (ci) Populate frozen version content in preview build and surface fern errors by @pdmack
  • (validator) Surface skip reason in CTRF, treat missing constraint as skip by @ayuskauskas
  • (bundler) Stratify helmfile bundle by DAG level by @lockwobr
  • (recipes) Fix stale kgateway-crds path in slinky-slurm-operator-crds comment by @yuanchen8911
  • (recipes) Align overlay network-operator pins to v26.1.1 by @yuanchen8911
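
Two of the bundler fixes above concern ordering a helmfile bundle by dependency level. The changelog does not show how AICR does this, but the general technique of stratifying a dependency graph into levels can be sketched as a level-by-level topological sort (Kahn's algorithm grouped by round). The function name and the component names below are hypothetical.

```python
def dag_levels(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into levels so each node depends only on earlier levels.

    `deps` maps a component to the set of components it depends on.
    """
    remaining = {node: set(d) for node, d in deps.items()}
    # Dependencies that never appear as keys are treated as roots.
    for d in deps.values():
        for dep in d:
            remaining.setdefault(dep, set())

    levels = []
    while remaining:
        # Everything with no unmet dependencies can be deployed in this round.
        ready = sorted(n for n, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle detected")
        levels.append(ready)
        for n in ready:
            del remaining[n]
        # Mark this round's nodes as satisfied for everything still waiting.
        for d in remaining.values():
            d.difference_update(ready)
    return levels


# Hypothetical component graph, for illustration only.
components = {
    "cert-manager": set(),
    "gpu-operator": {"cert-manager"},
    "network-operator": {"cert-manager"},
    "workload": {"gpu-operator", "network-operator"},
}
levels = dag_levels(components)
# levels == [["cert-manager"], ["gpu-operator", "network-operator"], ["workload"]]
```

Components within one level have no dependencies on each other, so a deployer could in principle apply each level as a unit and wait for it to settle before moving to the next.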

Other Tasks

  • (demos) Add config-driven GKE CUJ with evidence verify by @mchmarny
  • Add top level THIRD_PARTY_NOTICES by @ayuskauskas
  • (bom) Wrap auto-generated image inventory with hand-written prose by @mchmarny
  • (recipes) Enforce sha256 specifically in digest-pin gate (CodeRabbit follow-up to #778) by @mchmarny
  • (adr) Add ADR-006 container image pinning policy by @mchmarny
  • (go) .go-version as single source of truth for Go toolchain by @mchmarny
  • (renovate) Hand workflow bumps to dependabot, disable dashboard by @njhensley
  • Update copyright headers to NVIDIA CORPORATION & AFFILIATES by @ayuskauskas
  • Update golang version by @lockwobr
  • (design) Add ADR-007 verifiable recipe test evidence by @njhensley
  • (tests) Use host aicr binary in snapshot deploy-agent test by @pdmack
  • (design) Add ADR-008 KWOK CI deployer matrix ...

v0.12.1

01 May 17:13
v0.12.1
eec81c5

Changelog

New Features

Bug Fixes

Other Tasks

v0.12.0

24 Apr 23:31
v0.12.0
7db4275

Changelog

New Features

Bug Fixes

v0.11.1

21 Mar 10:22
v0.11.1
bc05c6b

Changelog

New Features

Bug Fixes

Other Tasks

v0.11.0

20 Mar 18:41
v0.11.0
15d9554

Changelog

New Features

Bug Fixes

Other Tasks

v0.10.16

16 Mar 18:21
v0.10.16
e07c9e6

Changelog

Bug Fixes

Other Tasks

  • 06c7428: refactor(validator): unify GKE NCCL to TrainJob+MPI, match EKS pattern (#403) (@xdu31)

v0.10.15

13 Mar 16:23
v0.10.15
482dc78

Changelog

Other Tasks

v0.10.14

13 Mar 12:22
v0.10.14
6c5b24e

Changelog

Bug Fixes

Other Tasks

v0.10.13

13 Mar 00:44
v0.10.13
5ec8017

Changelog

New Features

Bug Fixes

  • a5d501b: fix(bundler): skip components with overrides.enabled: false (#382) (@xdu31)
  • 8550939: fix(install): cosign version grep fails silently due to pipefail (#384) (@lockwobr)
  • d802b3d: fix(test): update offline e2e to skip disabled aws-ebs-csi-driver (@mchmarny)
  • 9bb2c7b: fix(validator): remove helm-values check (Helm values stored in secrets, never available in snapshot) (#388) (@xdu31)

Other Tasks

v0.10.12

12 Mar 12:18
v0.10.12
3c7f155

Changelog

Bug Fixes