fix(ci): centralize GPU CI runtime pins #710
Walkthrough
This change refactors how configuration values are passed through GitHub Actions workflows.
Estimated code review effort: 2 (Simple), ~12 minutes
Pre-merge checks: 4 passed
mchmarny left a comment
Clean follow-up to #694. Centralizes the two GPU CI pins through load-versions while keeping .settings.yaml as the source. The optional action inputs preserve override flexibility, the env-based pass-through is consistent with how other pinned values flow today, and the script-side guards correctly handle both empty and literal "null" from yq. End-to-end signal is strong: GPU smoke, H100 inference, and H100 training all green on this commit. LGTM.
Summary
Centralizes the GPU CI runtime pins added in #694 behind the shared load-versions action. The GPU Operator chart version and snapshot-agent CUDA image remain in .settings.yaml, and the consuming scripts now receive those values from composite-action inputs/env instead of reading .settings.yaml directly.
Motivation / Context
This is a small follow-up to merged PR #694. In review, we moved the GPU Operator chart version and snapshot-agent CUDA image into .settings.yaml; this PR finishes that cleanup by making those pins follow the same load-versions path as other CI tool/image pins.
This keeps .settings.yaml as the source and .github/actions/load-versions as the shared read path.
Fixes: N/A
Related: #694
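The shared read path described above could be sketched roughly as follows. This is an illustrative shell fragment, not the action's actual code: the helper name emit_pin and the yq key paths in the trailing comment are assumptions. Only the output names gpu_operator_chart_version and snapshot_agent_cuda_image, the .settings.yaml file, and the GITHUB_OUTPUT file mechanism come from the PR or from GitHub Actions itself.

```shell
#!/bin/sh
set -eu

# emit_pin NAME VALUE  (hypothetical helper name)
# Appends NAME=VALUE to the file named by $GITHUB_OUTPUT, which is how a
# step inside a composite action exposes a step output.
emit_pin() {
  printf '%s=%s\n' "$1" "$2" >> "${GITHUB_OUTPUT:?GITHUB_OUTPUT is not set}"
}

# Example wiring. The key paths below are assumed, not taken from the repository:
#   emit_pin gpu_operator_chart_version "$(yq '.gpu_operator.chart_version' .settings.yaml)"
#   emit_pin snapshot_agent_cuda_image  "$(yq '.snapshot_agent.cuda_image' .settings.yaml)"
```

Keeping the file read in one place means a consuming action only needs to map a step output to an env variable, which matches the pass-through the PR describes.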
Type of Change
Component(s) Affected
cmd/aicr, pkg/cli
cmd/aicrd, pkg/api, pkg/server
pkg/recipe
pkg/bundler, pkg/component/*
pkg/collector, pkg/snapshotter
pkg/validator
pkg/errors, pkg/k8s
docs/, examples/
.settings.yaml
Implementation Notes
Added gpu_operator_chart_version and snapshot_agent_cuda_image outputs to .github/actions/load-versions.
Updated runtime-install to load the GPU Operator chart version through load-versions for Helm-mode installs and pass it to install-gpu-operator-helm.sh via GPU_OPERATOR_CHART_VERSION.
Updated aicr-build to load the snapshot-agent CUDA image through load-versions when build_snapshot_agent=true and pass it to build-snapshot-agent.sh via SNAPSHOT_AGENT_CUDA_IMAGE.
Testing
Scoped CI checks passed locally. Full make qualify was not run because this is a CI-only composite-action wiring change with no Go, recipe, or user-facing behavior changes.
Risk Assessment
Rollout notes: Existing GPU CI callers keep using the same pinned values. The scripts now fail clearly if their parent action does not provide the required env value.
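A script-side guard of the kind described here, one that rejects both an empty value and the literal string "null" that yq prints for a missing key, might look roughly like this. The function name require_pin and the error wording are assumptions for illustration; the actual scripts may differ.

```shell
#!/bin/sh
set -eu

# require_pin NAME VALUE  (hypothetical helper name)
# Returns 0 if VALUE is usable. Otherwise prints an error to stderr and
# returns 1. Rejects an empty value and the literal string "null".
require_pin() {
  case "$2" in
    ''|null)
      echo "error: $1 must be provided by the calling action (got '$2')" >&2
      return 1
      ;;
  esac
  return 0
}

# Example use at the top of a consuming script:
#   require_pin GPU_OPERATOR_CHART_VERSION "${GPU_OPERATOR_CHART_VERSION:-}" || exit 1
```

Passing "${VAR:-}" lets the guard report an unset variable with its own message instead of tripping `set -u` first.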
Checklist
make test with -race
make lint
git commit -S (GPG signing info)