
fix(recipes): correct nvsentinel registry default to OCI source#725

Merged
yuanchen8911 merged 1 commit into NVIDIA:main from yuanchen8911:chore/registry-nvsentinel-oci-default
Apr 30, 2026

@yuanchen8911 (Contributor) commented Apr 30, 2026

Summary

The nvsentinel entry in recipes/registry.yaml declares a defaultRepository (https://helm.ngc.nvidia.com/nvidia) and defaultChart (nvidia/nvsentinel) that don't actually serve the chart — the HTTPS NGC index doesn't carry nvsentinel; only oci://ghcr.io/nvidia/nvsentinel does. This PR aligns the registry defaults with the canonical OCI source that every overlay already uses.

Motivation / Context

Surfaced during the #715 cross-review and tracked as Phase-1 follow-up item #2 on #698. The mismatch is silent today, and the default is effectively dead code, because every nvsentinel-using overlay (base.yaml, etc.) explicitly sets source: oci://ghcr.io/nvidia + chart nvsentinel, so the registry default never resolves. But anyone relying on the registry defaults (e.g. an aicr bundle invocation that doesn't override the source) would hit the dead path.

The fix mirrors the same pattern the kai-scheduler entry uses post-#720: OCI registry path in defaultRepository, bare chart name in defaultChart.

   - name: nvsentinel
     ...
     helm:
-      defaultRepository: https://helm.ngc.nvidia.com/nvidia
-      defaultChart: nvidia/nvsentinel
+      defaultRepository: oci://ghcr.io/nvidia
+      defaultChart: nvsentinel
       defaultVersion: v1.3.0
       defaultNamespace: nvsentinel
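
How the two fields combine into the reference that gets pulled can be sketched in shell. This is a minimal illustration and not the aicr implementation: `compose_chart_ref` is a hypothetical helper, and the non-OCI branch (keep the chart name as given and pass the repository URL separately) is an assumption. Only the OCI result corresponds to the upstream.env values reported in this PR.

```shell
#!/bin/sh
# Hypothetical helper (not part of aicr): derive CHART and REPO values
# from a registry entry's defaultRepository and defaultChart.
compose_chart_ref() {
  repo="$1"
  chart="$2"
  case "$repo" in
    oci://*)
      # OCI: the chart is addressed as <registry path>/<bare chart name>,
      # so no separate repository URL is needed.
      CHART="${repo%/}/${chart}"
      REPO=''
      ;;
    *)
      # Assumed handling for classic HTTPS Helm repositories:
      # chart name and repository URL stay separate.
      CHART="$chart"
      REPO="$repo"
      ;;
  esac
}

compose_chart_ref 'oci://ghcr.io/nvidia' 'nvsentinel'
printf "CHART='%s'\nREPO='%s'\n" "$CHART" "$REPO"
```

With the corrected defaults this prints CHART='oci://ghcr.io/nvidia/nvsentinel' and REPO='', the same OCI form shown in the Testing section.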

Note on other NGC HTTPS entries

gpu-operator, network-operator, nodewright-operator, and nvidia-dra-driver-gpu also use defaultRepository: https://helm.ngc.nvidia.com/... — those are intentionally unchanged. The HTTPS NGC index genuinely serves those charts. nvsentinel is the only NVIDIA-published component that ships exclusively via OCI, which is why its registry default was broken.
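
To see at a glance which registry entries use OCI and which use an HTTPS index, a small scan along these lines could help. It is a sketch under assumptions: `list_repo_schemes` is a hypothetical helper, it assumes each entry has a `- name:` line followed later by a `defaultRepository:` line (the shape shown in the diff above), and the sample input is illustrative, with the HTTPS path left elided as in the PR text.

```shell
#!/bin/sh
# Hypothetical audit helper: print "<name> <scheme>" for each entry in a
# registry-style YAML fragment read from stdin.
list_repo_schemes() {
  awk '
    $1 == "-" && $2 == "name:" { name = $3 }
    $1 == "defaultRepository:" {
      scheme = "other"
      if ($2 ~ /^oci:\/\//) scheme = "oci"
      else if ($2 ~ /^https:\/\//) scheme = "https"
      print name, scheme
    }
  '
}

# Illustrative input only; not a copy of recipes/registry.yaml.
list_repo_schemes <<'EOF'
- name: gpu-operator
  helm:
    defaultRepository: https://helm.ngc.nvidia.com/...
- name: nvsentinel
  helm:
    defaultRepository: oci://ghcr.io/nvidia
EOF
```

The scan only classifies the URL scheme. Whether an HTTPS index actually serves a given chart still has to be checked against the index itself.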

Fixes: nvsentinel registry-default dead path
Related: #698 (Phase-1 follow-up item #2), #715 (Phase 1), #720 (Phase 2)

Type of Change

  • Bug fix (non-breaking change that fixes an issue)

Component(s) Affected

  • Recipe engine / data (pkg/recipe) — registry data only

Implementation Notes

  • 2-line registry data fix. No code, no overlays, no values.
  • Matches the kai-scheduler pattern after #720 (chore(recipes): bump kai-scheduler v0.14.1 and kubeflow-trainer 2.2.0): OCI registry + bare chart name.
  • Verified the OCI URL pulls cleanly: helm pull oci://ghcr.io/nvidia/nvsentinel --version v1.3.0 succeeds.
  • End-to-end bundle generation now produces CHART='oci://ghcr.io/nvidia/nvsentinel' with empty REPO in the per-component upstream.env (correct OCI form).

Testing

make lint                                      # 0 issues
go test -count=1 ./pkg/recipe/...              # ok
helm pull oci://ghcr.io/nvidia/nvsentinel \
         --version v1.3.0                      # pulls cleanly

# End-to-end bundle generation
$ aicr recipe --service eks --accelerator h100 --intent training --os ubuntu -o recipe.yaml
$ aicr bundle -r recipe.yaml -o /tmp/bundle
$ cat /tmp/bundle/*-nvsentinel/upstream.env
CHART='oci://ghcr.io/nvidia/nvsentinel'
REPO=''
VERSION='v1.3.0'
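
The manual `cat` inspection above could be turned into a repeatable check. The following is a sketch only: `check_oci_upstream_env` is a hypothetical helper, not part of aicr, and it assumes upstream.env is a shell-sourceable file with the CHART and REPO variables shown above.

```shell
#!/bin/sh
# Hypothetical check (not part of aicr): succeed only if the given
# upstream.env uses the OCI form, i.e. CHART is an oci:// reference
# and REPO is empty.
check_oci_upstream_env() {
  env_file="$1"
  # Make sure "." reads the file by path instead of searching PATH.
  case "$env_file" in
    */*) ;;
    *) env_file="./$env_file" ;;
  esac
  (
    # Source in a subshell so CHART/REPO/VERSION do not leak to the caller.
    . "$env_file" || exit 2
    case "$CHART" in
      oci://*) [ -z "$REPO" ] ;;
      *) false ;;
    esac
  )
}

# Demo against a temporary file with the values reported in this PR.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
CHART='oci://ghcr.io/nvidia/nvsentinel'
REPO=''
VERSION='v1.3.0'
EOF
if check_oci_upstream_env "$tmp"; then
  echo "OCI form OK"
else
  echo "unexpected form"
fi
rm -f "$tmp"
```

In practice the function would be pointed at the generated file, for example the path matched by /tmp/bundle/*-nvsentinel/upstream.env.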

Risk Assessment

  • Low — Single-file, 2-line registry data correction. The new defaults match what every overlay already uses, so re-bundled artifacts are byte-identical to today's. No code, no overlays, no values surfaces touched.

Rollout notes: No migration steps. The broken default is dead code today (overlays override it), so the fix has no behavior impact on existing bundles. Fresh bundles where the user doesn't override the source via overlay will now resolve correctly instead of failing.

Checklist

  • Tests pass locally
  • Linter passes (make lint)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality (N/A — registry data fix matching existing overlay convention)
  • I updated docs if user-facing behavior changed (N/A — internal data fix)
  • Changes follow existing patterns in the codebase (matches the kai-scheduler entry after #720, chore(recipes): bump kai-scheduler v0.14.1 and kubeflow-trainer 2.2.0)
  • Commits are cryptographically signed (git commit -S)

@coderabbitai (Bot) commented Apr 30, 2026

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between 5e8ca09 and eb80519.

📒 Files selected for processing (1)
  • recipes/registry.yaml

Walkthrough

The recipes/registry.yaml file was updated to change the Helm defaults for the nvsentinel component: helm.defaultRepository was changed from https://helm.ngc.nvidia.com/nvidia to oci://ghcr.io/nvidia, and helm.defaultChart was changed from nvidia/nvsentinel to nvsentinel. defaultVersion: v1.3.0 and defaultNamespace: nvsentinel remain unchanged. No other component configuration parameters were modified.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: correcting the nvsentinel registry default to use the OCI source, which is the primary fix in this changeset. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, explaining the motivation, implementation, testing, and risk assessment for the registry default correction. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@yuanchen8911 yuanchen8911 force-pushed the chore/registry-nvsentinel-oci-default branch from 74902e9 to 5e8ca09 Compare April 30, 2026 23:01
@yuanchen8911 yuanchen8911 changed the title from "fix(recipes): correct nvsentinel registry default to its actual OCI source" to "fix(recipes): correct nvsentinel registry default to OCI source" Apr 30, 2026
The `nvsentinel` registry entry declared:

    defaultRepository: https://helm.ngc.nvidia.com/nvidia
    defaultChart: nvidia/nvsentinel

But the chart isn't published to the HTTPS NGC index — only to the
OCI registry at `oci://ghcr.io/nvidia/nvsentinel`. The defaults are
silently ignored today: every nvsentinel-using overlay sets its own
`source: oci://ghcr.io/nvidia` + chart `nvsentinel`, so the broken
HTTPS default never resolves. But anyone relying on the registry
defaults (e.g. via `aicr bundle` without explicit overlay overrides
on this entry) would hit the dead path.

Update the defaults to match what every overlay already uses:

    defaultRepository: oci://ghcr.io/nvidia
    defaultChart: nvsentinel

Same shape as the kai-scheduler entry post-NVIDIA#720 (OCI registry path
in `defaultRepository`, bare chart name in `defaultChart`). Verified
locally:

  $ helm pull oci://ghcr.io/nvidia/nvsentinel --version v1.3.0
  Pulled.
  $ aicr bundle -r recipe.yaml -o /tmp/bundle
  ... generates upstream.env with
      CHART='oci://ghcr.io/nvidia/nvsentinel'
      REPO=''
      VERSION='v1.3.0'

Note: other NGC HTTPS entries in the registry (gpu-operator,
network-operator, nodewright-operator, nvidia-dra-driver-gpu) are
unchanged — those charts are genuinely served by the HTTPS NGC
index. nvsentinel is special because it ships only via OCI.

Refs: NVIDIA#698 (Phase 1 follow-up item #2)
@yuanchen8911 yuanchen8911 force-pushed the chore/registry-nvsentinel-oci-default branch from 5e8ca09 to eb80519 Compare April 30, 2026 23:02
@yuanchen8911 yuanchen8911 marked this pull request as ready for review April 30, 2026 23:10
@yuanchen8911 yuanchen8911 requested a review from a team as a code owner April 30, 2026 23:10
@yuanchen8911 yuanchen8911 enabled auto-merge (squash) April 30, 2026 23:10
@yuanchen8911 yuanchen8911 merged commit 39c8c29 into NVIDIA:main Apr 30, 2026
83 of 84 checks passed