From b6343b77a6b8bb98aab21900f985357960a7bb8e Mon Sep 17 00:00:00 2001
From: Yuan Chen
Date: Wed, 13 May 2026 17:58:03 -0700
Subject: [PATCH] chore(docs): catch up container-images.md BOM + document the regen rule
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two coupled changes:

1. Pure BOM regen of docs/user/container-images.md via `make bom-docs`
   to reflect current registry + chart state. No code changes, no
   registry pins moved; this picks up drift accumulated since main's
   last BOM commit:
   - slinky-slurm-operator and slinky-slurm-operator-crds BOM rows
     added (registry entries landed in #866 without a corresponding
     BOM regen)
   - kubeflow-trainer's bundled pytorch image bumped upstream
   - busybox bumped upstream in some chart's rendered templates

2. Add the BOM-regen rule to contributor and agent docs, surfaced by
   the same #872 review that flagged the drift:
   - .claude/CLAUDE.md (auto-synced to AGENTS.md) — short rule in the
     "Common Tasks" section: after any change to
     recipes/registry.yaml, a component values file, or a chart pin,
     run `make bom-docs` and commit the regenerated
     container-images.md in the same PR
   - docs/contributor/data.md — new "Regenerating the BOM" subsection
     under Component Configuration covering when to run it, how it
     can surface upstream chart drift, and how to handle unrelated
     drift (split into a catch-up PR vs land it together)
   - Makefile — corrected the bom-check target's help text from
     "CI gate, opt-in locally" to "opt-in; not wired into
     qualify/lint/merge gate" to match the actual enforcement story

Surfaced as an unrelated diff in #872 (aws-efa v0.5.26 bump) when
`make bom-docs` was rerun against the rebased branch. Splitting this
catch-up out so #872's diff is scoped to aws-efa content.

Honest enforcement story: an earlier revision of these docs (and an
earlier revision of this commit message) claimed `make qualify` and
CI catch stale BOMs. That was incorrect — verified directly against
the Makefile:
- `qualify` depends on test-coverage / lint / e2e / scan /
  license-check, none of which run `bom-check`
- `lint` runs `bom-pinning-check` (chart-pin verification per
  ADR-006), not `bom-check` (BOM doc freshness)
- The merge gate has no PR-time BOM-staleness check

The corrected docs say so explicitly. Wiring `bom-check` into the
gate is a desirable follow-up but intentionally out of scope here —
it would change CI behavior for every PR and likely block unrelated
PRs that happen to expose accumulated upstream drift, which deserves
its own discussion. This also explains how #866 merged with a stale
BOM: nothing in the existing gate would have caught it.

Test plan:
- `make qualify` passes (Go tests, golangci-lint + yamllint,
  agents-sync check, chart-pin verification, 20/20 chainsaw,
  vulnerability scan, license headers)
- No code changes, no chart pins moved, no behavior change
---
 .claude/CLAUDE.md             |  2 ++
 AGENTS.md                     |  2 ++
 Makefile                      |  2 +-
 docs/contributor/data.md      | 14 ++++++++++++++
 docs/user/container-images.md | 19 +++++++++++++++----
 5 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index 9f9e5fb1b..72a68d8ec 100644
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -267,6 +267,8 @@ slog.Error("operation failed", "error", err, "component", "gpu-collector")
 
 **Note:** A component must have either `helm` OR `kustomize` configuration, not both.
 
+**After any change to `recipes/registry.yaml`, a component's values file, or a chart version pin (in registry, overlay, or mixin):** run `make bom-docs` and commit the regenerated `docs/user/container-images.md` in the same PR. The BOM is rendered fresh from each Helm chart's actual templates, so an unbumped pin can still pick up upstream image drift — running it locally is the only reliable way to know whether the doc needs an update. `make bom-check` verifies the committed BOM matches a fresh regen, but it is **opt-in only** — not wired into `make qualify`, `make lint`, or the merge gate today. Do not rely on either to catch a missed regen.
+
 **Using mixins for shared OS/platform content:**
 ```yaml
 # Leaf overlay referencing mixins instead of duplicating content
diff --git a/AGENTS.md b/AGENTS.md
index 4126eec50..beadd6843 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -267,6 +267,8 @@ slog.Error("operation failed", "error", err, "component", "gpu-collector")
 
 **Note:** A component must have either `helm` OR `kustomize` configuration, not both.
 
+**After any change to `recipes/registry.yaml`, a component's values file, or a chart version pin (in registry, overlay, or mixin):** run `make bom-docs` and commit the regenerated `docs/user/container-images.md` in the same PR. The BOM is rendered fresh from each Helm chart's actual templates, so an unbumped pin can still pick up upstream image drift — running it locally is the only reliable way to know whether the doc needs an update. `make bom-check` verifies the committed BOM matches a fresh regen, but it is **opt-in only** — not wired into `make qualify`, `make lint`, or the merge gate today. Do not rely on either to catch a missed regen.
+
 **Using mixins for shared OS/platform content:**
 ```yaml
 # Leaf overlay referencing mixins instead of duplicating content
diff --git a/Makefile b/Makefile
index 6de95b1a7..f48c50369 100644
--- a/Makefile
+++ b/Makefile
@@ -291,7 +291,7 @@ bom-docs: ## Regenerates the auto-generated section of $(BOM_DOC_PATH) from the
 	echo "Updated $(BOM_DOC_PATH) (prose preserved, auto-generated section refreshed)"
 
 .PHONY: bom-check
-bom-check: ## Verifies $(BOM_DOC_PATH) is up to date with the live registry (CI gate, opt-in locally)
+bom-check: ## Verifies $(BOM_DOC_PATH) is up to date with the live registry (opt-in; not wired into qualify/lint/merge gate)
 	@set -e; \
 	$(MAKE) bom-docs; \
 	if ! git diff --quiet -- $(BOM_DOC_PATH); then \
diff --git a/docs/contributor/data.md b/docs/contributor/data.md
index fc89986f3..3d38999a9 100644
--- a/docs/contributor/data.md
+++ b/docs/contributor/data.md
@@ -613,6 +613,20 @@ componentRefs:
 
 The system performs **topological sort** to compute deployment order, ensuring dependencies are deployed before dependents. The resulting order is exposed in `RecipeResult.DeploymentOrder`.
 
+### Regenerating the BOM
+
+`docs/user/container-images.md` is an auto-generated bill of materials listing every container image AICR pulls across all components. It is regenerated by `make bom-docs`, which renders each Helm chart against its live OCI source and extracts image references from the rendered templates.
+
+**Run `make bom-docs` and commit the regenerated `docs/user/container-images.md` in the same PR whenever you:**
+
+- Add or remove a component in `recipes/registry.yaml`
+- Bump a chart version (in `registry.yaml`, an overlay, or a mixin)
+- Modify a component's `values.yaml` in a way that changes which images render (image repo override, subchart enable/disable, etc.)
+
+The regen can also surface drift from *upstream* chart updates — when a chart bumps an image inside its own templates without a registry pin change on our side. That drift will appear in the BOM diff whether you expected it or not. Land it as part of the same PR that triggered the regen, or split it out as a separate "BOM catch-up" PR if the unrelated diff would obscure the primary change.
+
+**Freshness is not gated.** `make bom-check` verifies the committed `docs/user/container-images.md` matches a fresh regen, but it is opt-in — neither `make qualify` nor `make lint` runs it today, and the merge gate has no PR-time BOM-staleness check (it only runs `bom-pinning-check`, which is the chart-pin verification per ADR-006). Run `make bom-docs` explicitly whenever you touch a component; do not rely on local qualify or CI to catch a missed regen. Wiring `bom-check` into the gate is a desirable follow-up.
+
 ## Criteria Matching Algorithm
 
 The recipe system uses an **asymmetric rule matching algorithm** where recipe criteria (rules) match against user queries (candidates).
diff --git a/docs/user/container-images.md b/docs/user/container-images.md
index ac5e46c2e..6cb3e74cb 100644
--- a/docs/user/container-images.md
+++ b/docs/user/container-images.md
@@ -19,8 +19,8 @@ A machine-readable **CycloneDX 1.6 JSON** companion to this page is produced by
 
 ## Summary
 
-- Components: **22**
-- Unique images: **69**
+- Components: **24**
+- Unique images: **71**
 - Distinct registries: **11**
 
 Registries: `602401143452.dkr.ecr.us-west-2.amazonaws.com`, `cr.kgateway.dev`, `docker.io`, `gcr.io`, `ghcr.io`, `gke.gcr.io`, `nvcr.io`, `public.ecr.aws`, `quay.io`, `registry.k8s.io`, `us-docker.pkg.dev`
@@ -51,6 +51,8 @@ Registries: `602401143452.dkr.ecr.us-west-2.amazonaws.com`, `cr.kgateway.dev`, `
 | nvidia-dra-driver-gpu | helm | nvidia/nvidia-dra-driver-gpu | 25.12.0 | 1 |
 | nvsentinel | helm | nvsentinel | v1.3.0 | 6 |
 | prometheus-adapter | helm | prometheus-community/prometheus-adapter | 5.3.0 | 1 |
+| slinky-slurm-operator | helm | slurm-operator | 1.1.0 | 2 |
+| slinky-slurm-operator-crds | helm | slurm-operator-crds | 1.1.0 | 0 |
 
 ## Images by component
 
@@ -141,7 +143,7 @@ _No images extracted._
 ### kubeflow-trainer
 
 - `ghcr.io/kubeflow/trainer/trainer-controller-manager:v2.2.0`
-- `pytorch/pytorch:2.9.1-cuda12.8-cudnn9-runtime@sha256:7b324d212a4450795b49edba9949b7cdc72429148a64e974334bfe5774d51385`
+- `pytorch/pytorch:2.11.0-cuda12.8-cudnn9-runtime@sha256:eee11b3b3872a8c838e35ef48f08b2d5def2080902c7f666831310ca1a0ef2be`
 - `registry.k8s.io/jobset/jobset:v0.11.0`
 
 ### kueue
@@ -150,7 +152,7 @@ _No images extracted._
 
 ### network-operator
 
-- `busybox:1.36@sha256:73aaf090f3d85aa34ee199857f03fa3a95c8ede2ffd4cc2cdb5b94e566b11662`
+- `busybox:1.37@sha256:1487d0af5f52b4ba31c7e465126ee2123fe3f2305d638e7827681e7cf6c83d5e`
 - `nvcr.io/nvidia/cloud-native/network-operator:v26.1.1`
 - `nvcr.io/nvidia/doca/doca_telemetry:1.22.5-doca3.1.0-host`
 - `nvcr.io/nvidia/mellanox/doca-driver:doca3.2.0-25.10-1.2.8.0-2`
@@ -190,6 +192,15 @@ _No images extracted._
 
 - `registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.12.0`
 
+### slinky-slurm-operator
+
+- `ghcr.io/slinkyproject/slurm-operator-webhook:1.1.0`
+- `ghcr.io/slinkyproject/slurm-operator:1.1.0`
+
+### slinky-slurm-operator-crds
+
+_No images extracted._
+
 ## How to read this list