chore(docs): catch up container-images.md BOM + document the regen rule #878

Merged
mchmarny merged 1 commit into NVIDIA:main from yuanchen8911:chore/bom-docs-catchup
May 14, 2026

@yuanchen8911
Contributor

@yuanchen8911 yuanchen8911 commented May 14, 2026

Summary

Two coupled changes, surfaced by Mark's review on #872:

  1. Pure BOM regen of docs/user/container-images.md via make bom-docs to reflect current registry + chart state.
  2. Document the BOM-regen rule in contributor + agent docs so this kind of drift doesn't accumulate silently.

Motivation / Context

Surfaced while rebasing #872 (aws-efa v0.5.26 bump) — running make bom-docs on the rebased branch pulled in three categories of drift that have accumulated since main's last BOM commit:

| Change | Cause |
|---|---|
| slinky-slurm-operator + slinky-slurm-operator-crds BOM rows added | Registry entries landed in #866 without a corresponding BOM regen |
| pytorch image bump (kubeflow-trainer's bundled image) | Upstream chart bumped the image inside its templates; we pin the chart version, not the image inside it |
| busybox image bump | Same shape: chart-rendered template content shifted upstream |

The fact that this accumulated without anyone noticing is itself the signal — the existing docs don't surface the regen requirement clearly enough. So this PR both lands the catch-up and updates the guidance.

Related: #866 (slinky-slurm registry entry), #872 (parent PR that triggered the discovery)

Type of Change

  • Documentation update

Component(s) Affected

  • Docs/examples — docs/user/container-images.md (auto-generated section), docs/contributor/data.md, .claude/CLAUDE.md (auto-synced to AGENTS.md)

Implementation Notes

File 1: docs/user/container-images.md — pure make bom-docs regen against current registry + chart contents. The tool preserves the prose section; only the auto-generated section changes.

File 2: .claude/CLAUDE.md (auto-synced to AGENTS.md). Adds a one-line rule under the "Common Tasks" section: after any change to recipes/registry.yaml, a component values file, or a chart pin (registry / overlay / mixin), run make bom-docs and commit the regenerated docs/user/container-images.md in the same PR. The rule also notes that the BOM is rendered fresh from each chart's actual templates, so even an unbumped pin can pick up upstream image drift, and that make bom-check exists for verifying freshness but is opt-in only (not wired into make qualify, make lint, or the merge gate today).

File 3: docs/contributor/data.md — new "Regenerating the BOM" subsection under "Component Configuration" covering:

  • When to run make bom-docs (registry add/remove, chart-pin bump, values-file change)
  • How the regen can surface unrelated upstream chart drift
  • How to handle unrelated drift in a PR (land it together or split into a catch-up PR like this one)
  • The enforcement story: make bom-check exists but is opt-in; neither make qualify nor make lint runs it, and the merge gate has no PR-time BOM-staleness check — contributors must run make bom-docs explicitly
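As a rough illustration of the "when to run" rule above, the check can be approximated by filtering the list of changed paths. This is a sketch under a stated assumption, not part of the PR: the function name needs_bom_regen is hypothetical, and treating everything under recipes/ as a BOM input is a guess (only recipes/registry.yaml is named in this PR).

```shell
#!/bin/sh
# Sketch only: read changed paths (one per line) on stdin and succeed if any
# of them looks like a BOM input.
# ASSUMPTION: the registry, component values files, and overlay/mixin chart
# pins all live under recipes/. Only recipes/registry.yaml is confirmed by
# this PR; the recipes/ prefix as a catch-all is hypothetical.
needs_bom_regen() {
  grep -q '^recipes/'
}

# Hypothetical usage from a feature branch (not executed here):
#   if git diff --name-only origin/main...HEAD | needs_bom_regen; then
#     make bom-docs
#     git add docs/user/container-images.md
#   fi
```

A path filter like this cannot detect the upstream chart drift described above, because that drift appears without any file in the repo changing. It only covers the cases where a contributor edits a BOM input directly.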

Honest enforcement story. An earlier revision of these docs claimed make qualify and CI catch stale BOMs. That was incorrect (caught by external review). Verified directly against the Makefile:

  • qualify: test-coverage lint e2e scan license-check — does not depend on bom-check
  • lint runs bom-pinning-check (chart-pin verification per ADR-006), not bom-check (BOM doc freshness)
  • The merge gate has no equivalent
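One way to repeat the Makefile check for the qualify line is to print a target's direct prerequisites and look for bom-check among them. The helpers below are a hedged sketch: direct_prereqs and has_direct_prereq are hypothetical names, and they only understand simple single-line "target: a b c" rules.

```shell
#!/bin/sh
# Sketch only: print the direct prerequisites of a Makefile target.
# Handles simple single-line "target: a b c" rules; it does not follow
# transitive dependencies, expand variables, or inspect recipe lines.
# $1 = target name, $2 = path to the Makefile
direct_prereqs() {
  sed -n "s/^$1:[[:space:]]*//p" "$2"
}

# Succeed when target $1 in Makefile $2 lists $3 as a direct prerequisite.
has_direct_prereq() {
  direct_prereqs "$1" "$2" | tr -s ' \t' '\n\n' | grep -qxF "$3"
}

# Hypothetical usage against the repo's Makefile (not executed here):
#   direct_prereqs qualify Makefile
#   has_direct_prereq qualify Makefile bom-check || echo "bom-check not a direct prerequisite"
```

Because the helpers look only at the rule line, they say nothing about targets invoked from inside a recipe. Confirming what lint actually runs still means reading its recipe.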

The corrected docs now say so explicitly. Wiring bom-check into make qualify / make lint / the merge gate is a desirable follow-up but intentionally out of scope here: it would change CI behavior for every PR and likely block unrelated PRs that happen to expose accumulated upstream drift, which deserves its own discussion. (Also flagged: the comment on the bom-check target at Makefile:294 called it "CI gate, opt-in locally"; this PR corrects it to match the actual enforcement story.)
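For anyone picking up that follow-up discussion, the general shape of a generated-file freshness check is "regenerate, then fail if the file changed". The sketch below is generic and is not a description of how the bom-check target is implemented, which this PR does not show; the function name is_fresh is hypothetical.

```shell
#!/bin/sh
# Sketch only: run a regeneration command and succeed only if FILE is
# byte-for-byte unchanged afterwards.
# Returns 0 if fresh, 1 if the command changed the file, 2 on setup errors.
# $1 = generated file; remaining arguments = the command that regenerates it
is_fresh() {
  target_file=$1
  shift
  snapshot=$(mktemp) || return 2
  cp "$target_file" "$snapshot" || { rm -f "$snapshot"; return 2; }
  "$@" || { rm -f "$snapshot"; return 2; }
  cmp -s "$target_file" "$snapshot"
  rc=$?
  rm -f "$snapshot"
  return "$rc"
}

# Hypothetical usage (not executed here):
#   is_fresh docs/user/container-images.md make bom-docs || echo "BOM is stale"
```

Note that the regenerated file is left in the working tree, so a stale result can be committed directly. A gate built this way would fail on upstream drift as well as on contributor omissions, which is the blocking behavior the paragraph above flags.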

Testing

make qualify   # Go tests, golangci-lint + yamllint, agents-sync check,
               # chart-pin verification (bom-pinning-check), 20/20 chainsaw,
               # vulnerability scan, license headers — all green

No code changes, no chart pins moved, no behavior change.

Risk Assessment

  • Low — Pure doc regen + doc additions. No registry pins moved, no Go code, no chart contents touched. The BOM diff is provably equivalent to running make bom-docs against current main; the rule additions are non-binding text.

Rollout notes: None — pure content catch-up.

Checklist

  • Tests pass locally (make qualify)
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality (n/a — doc-only)
  • I updated docs if user-facing behavior changed (this PR is the doc update)
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

@coderabbitai

coderabbitai Bot commented May 14, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 2b9c1bd6-4e27-41ac-9cef-948906fd6c86

📥 Commits

Reviewing files that changed from the base of the PR and between 78288d4 and b6343b7.

📒 Files selected for processing (5)
  • .claude/CLAUDE.md
  • AGENTS.md
  • Makefile
  • docs/contributor/data.md
  • docs/user/container-images.md

📝 Walkthrough

The PR adds contributor-facing instructions to regenerate the container image BOM and regenerates the BOM documentation. docs/contributor/data.md, AGENTS.md, and .claude/CLAUDE.md now require running make bom-docs and committing docs/user/container-images.md when recipes/registry.yaml, component values, or chart pins change. The regenerated docs/user/container-images.md increases the BOM counts to 24 components and 71 images, adds slinky-slurm-operator and slinky-slurm-operator-crds, updates kubeflow-trainer’s pytorch to 2.11.0, and updates network-operator’s busybox to 1.37.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main changes: BOM catch-up regen and documentation of the regeneration rule. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, clearly explaining the two coupled changes and their motivations. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@yuanchen8911 yuanchen8911 force-pushed the chore/bom-docs-catchup branch from 5bdbaad to 8f9a6c9 Compare May 14, 2026 01:15
@yuanchen8911 yuanchen8911 changed the title chore(docs): catch up container-images.md BOM against current state chore(docs): catch up container-images.md BOM + document the regen rule May 14, 2026
@yuanchen8911 yuanchen8911 requested a review from mchmarny May 14, 2026 01:25
@yuanchen8911 yuanchen8911 force-pushed the chore/bom-docs-catchup branch 2 times, most recently from 78288d4 to b111fc7 Compare May 14, 2026 01:37
@yuanchen8911 yuanchen8911 requested a review from a team as a code owner May 14, 2026 01:37
Two coupled changes:

1. Pure BOM regen of docs/user/container-images.md via `make bom-docs`
   to reflect current registry + chart state. No code changes, no
   registry pins moved; this picks up drift accumulated since main's
   last BOM commit:
   - slinky-slurm-operator and slinky-slurm-operator-crds BOM rows
     added (registry entries landed in NVIDIA#866 without a corresponding
     BOM regen)
   - kubeflow-trainer's bundled pytorch image bumped upstream
   - busybox bumped upstream in some chart's rendered templates

2. Add the BOM-regen rule to contributor and agent docs, surfaced by
   the same NVIDIA#872 review that flagged the drift:
   - .claude/CLAUDE.md (auto-synced to AGENTS.md) — short rule in the
     "Common Tasks" section: after any change to recipes/registry.yaml,
     a component values file, or a chart pin, run `make bom-docs` and
     commit the regenerated container-images.md in the same PR
   - docs/contributor/data.md — new "Regenerating the BOM" subsection
     under Component Configuration covering when to run it, how it can
     surface upstream chart drift, and how to handle unrelated drift
     (split into a catch-up PR vs land it together)
   - Makefile — corrected the bom-check target's help text from
     "CI gate, opt-in locally" to "opt-in; not wired into qualify/
     lint/merge gate" to match the actual enforcement story

Surfaced as an unrelated diff in NVIDIA#872 (aws-efa v0.5.26 bump) when
`make bom-docs` was rerun against the rebased branch. Splitting this
catch-up out so NVIDIA#872's diff is scoped to aws-efa content.

Honest enforcement story: an earlier revision of these docs (and an
earlier revision of this commit message) claimed `make qualify` and
CI catch stale BOMs. That was incorrect — verified directly against
the Makefile:
- `qualify` depends on test-coverage / lint / e2e / scan / license-
  check, none of which run `bom-check`
- `lint` runs `bom-pinning-check` (chart-pin verification per ADR-006),
  not `bom-check` (BOM doc freshness)
- The merge gate has no PR-time BOM-staleness check
The corrected docs say so explicitly. Wiring `bom-check` into the gate
is a desirable follow-up but intentionally out of scope here — it
would change CI behavior for every PR and likely block unrelated PRs
that happen to expose accumulated upstream drift, which deserves its
own discussion.

This also explains how NVIDIA#866 merged with a stale BOM: nothing in the
existing gate would have caught it.

Test plan:
- `make qualify` passes (Go tests, golangci-lint + yamllint, agents-
  sync check, chart-pin verification, 20/20 chainsaw, vulnerability
  scan, license headers)
- No code changes, no chart pins moved, no behavior change
@yuanchen8911 yuanchen8911 force-pushed the chore/bom-docs-catchup branch from b111fc7 to b6343b7 Compare May 14, 2026 01:38
@yuanchen8911 yuanchen8911 requested a review from lockwobr May 14, 2026 01:40
@mchmarny mchmarny merged commit 6557e0c into NVIDIA:main May 14, 2026
34 checks passed

3 participants