docs: post-#866 slurm platform-enum drift + nodeSelector catalog note by ArangoGutierrez · Pull Request #884 · NVIDIA/aicr

ArangoGutierrez · 2026-05-14T13:00:05Z

Summary

Single-commit follow-up to PR #866 (Slinky slurm-operator). Backfills platform enum drift on 7 surfaces (1 contributor doc, 3 Go godoc, 1 user-catalog row, 1 internal AICR snapshot-analysis skill, plus the new guard test), and surfaces the chart v1.1.0 nodeSelector silent-ignore limitation on the user-facing component-catalog row.

Motivation / Context

PR #866 added --platform slurm and updated the OpenAPI spec, pkg/recipe/criteria.go (the GetCriteriaPlatformTypes return value), docs/user/cli-reference.md, docs/user/api-reference.md, and docs/user/component-catalog.md. Per the project's enum-audit rule (CLAUDE.md "Documentation updates"), every surface that enumerates platform values should also have been updated.

Most surfaces still showed only kubeflow (pre-existing drift predating slurm — dynamo and nim were already missing). PR #896 (@yuanchen8911, merged 2026-05-14) fixed docs/README.md and docs/contributor/validations.md as part of a broader docs audit. PR #893 removed the site/docs/ vitepress mirror entirely. This PR catches the remaining surfaces:

Surface	Pre-fix	After
`docs/contributor/data.md:130`	`kubeflow`	`dynamo, kubeflow, nim, slurm`
`pkg/recipe/criteria.go:246` (godoc)	`(kubeflow)`	`(dynamo, kubeflow, nim, slurm)`
`pkg/recipe/doc.go:32` (struct shape comment)	`kubeflow, dynamo, nim, slurm, any` (non-alphabetical)	`dynamo, kubeflow, nim, slurm, any`
`pkg/recipe/doc.go:93-98` (constant listing)	`Kubeflow, Dynamo, NIM, Slurm, Any` (non-alphabetical)	alphabetical order
`pkg/api/doc.go:72` (REST API godoc)	`(dynamo, kubeflow, nim, any)` (missing slurm)	`(dynamo, kubeflow, nim, slurm, any)`
`.claude/skills/analyzing-snapshots/SKILL.md:278` (internal skill table)	`kubeflow, dynamo, nim`	`dynamo, kubeflow, nim, slurm`

Of these, pkg/api/doc.go:72 was the only one with factual content drift (missing slurm entirely); the rest are ordering or pre-existing-without-slurm drift.

Also appends a **Known limitation:** clause to the slinky-slurm-operator row in docs/user/component-catalog.md documenting that chart v1.1.0 silently ignores operator.nodeSelector and webhook.nodeSelector (per @coderabbitai review on #866). The same limitation is already inline in recipes/registry.yaml; this surfaces it on the rendered user catalog.

Adds pkg/recipe/doc_test.go::TestCriteriaPlatformConstantsMatchGetter — a guard test that asserts the CriteriaPlatform* constants and GetCriteriaPlatformTypes() stay in sync. Mechanically catches the exact class of drift this PR fixes if a future platform value is added to one but not the other.

Fixes: N/A
Related: #866, #896, SlinkyProject/slurm-operator#187

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change
Documentation update
Refactoring (no functional changes)
Build/CI/tooling

Component(s) Affected

CLI (cmd/aicr, pkg/cli)
API server (cmd/aicrd, pkg/api, pkg/server) — godoc comment only, no behavior change
Recipe engine / data (pkg/recipe) — godoc comment + guard test, no behavior change
Bundlers (pkg/bundler, pkg/component/*)
Collectors / snapshotter (pkg/collector, pkg/snapshotter)
Validator (pkg/validator)
Core libraries (pkg/errors, pkg/k8s)
Docs/examples (docs/, examples/)
Other: .claude/skills/

Implementation Notes

Single squashed commit on top of current upstream/main. Each edit is one line apart from the guard test (45 lines, license header included).

Originally planned but no longer needed

Mark's two "optional nits" from Add Slinky slurm-operator as platform-slurm #866 — chart-default verification pointer in recipes/registry.yaml, vacuous-pass guard comment in recipes/checks/slinky-slurm-operator/health-check.yaml — were already in main when this PR opened (Mark applied them himself before merging Add Slinky slurm-operator as platform-slurm #866).
docs/README.md and docs/contributor/validations.md — fixed by Yuan Chen's PR docs: fix audit findings across docs, demos, examples #896 while this PR was in draft.
site/docs/getting-started/index.md — entire site/docs/ tree removed by PR chore(docs): remove vitepress site in favor of fern docs #893 (vitepress → fern migration).

Internal review panel

This PR went through a Principal Engineer + QA Engineer + Devil's Advocate review panel after the first push as draft. Panel surfaced the three Go godoc surfaces (pkg/api/doc.go, pkg/recipe/doc.go x2) and the internal skill file (.claude/skills/analyzing-snapshots/SKILL.md) that the original audit missed, plus recommended the guard test in pkg/recipe/doc_test.go. All findings addressed.

Testing

unset GITLAB_TOKEN && golangci-lint run -c .golangci.yaml ./pkg/recipe/... ./pkg/api/...
# 0 issues.

go test -run TestCriteriaPlatformConstantsMatchGetter -v ./pkg/recipe/...
# === RUN   TestCriteriaPlatformConstantsMatchGetter
# --- PASS: TestCriteriaPlatformConstantsMatchGetter (0.00s)
# PASS
# ok  	github.com/NVIDIA/aicr/pkg/recipe	3.055s

make qualify full run on a prior iteration of this branch showed all stages PASS except for three flaky failures in pkg/bundler/deployer/helm (TestUndeployScript_Preflight*) that fail under parallel load with signal: killed and pass cleanly in isolation (14s). These tests spawn bash subprocesses that exec helm/kubectl; the OS kills them under CPU pressure during full-suite parallel test execution. Unrelated to this PR (different package, different files). Expecting CI server-side to pass with more headroom.

Coverage for pkg/recipe: 90.2% — new guard test adds coverage to the platform-type constants without removing any existing coverage.

Risk Assessment

Low — All edits are doc/godoc comment changes plus one new guard test. No code paths changed. Reversible by git revert.
Medium
High

Rollout notes: N/A — additive doc clarifications and a guard test.

Checklist

Tests pass locally (make test with -race) — pkg/recipe new test passes; pre-existing flakes in helm-deployer unrelated
Linter passes (golangci-lint run -c .golangci.yaml)
I did not skip/disable tests to make CI green
I added/updated tests for new functionality (pkg/recipe/doc_test.go)
I updated docs if user-visible behavior changed
Changes follow existing patterns in the codebase
Commits are cryptographically signed (git commit -S)

coderabbitai · 2026-05-14T13:02:06Z

📝 Walkthrough

Walkthrough

This PR documents the expanded support for workload platform/framework types in the Criteria field. Changes include: (1) updating field comments in criteria.go and doc.go to describe the platform type as generic and list example values (dynamo, kubeflow, nim, slurm); (2) expanding documentation of accepted platform values across the API endpoint documentation, contributor guides, and skill guides; (3) adding a known limitation note to the slinky-slurm-operator component catalog entry; and (4) introducing a unit test that validates the platform constants match the output of GetCriteriaPlatformTypes().

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

NVIDIA/aicr#866: Added the CriteriaPlatformSlurm constant and updated GetCriteriaPlatformTypes() to include "slurm", which this PR's test and documentation now align with.

Suggested labels

size/S

Suggested reviewers

mchmarny
njhensley
lockwobr

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically describes the main change: addressing platform-enum drift from PR `#866` and adding a nodeSelector limitation note to the catalog.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The pull request description thoroughly details the changes across multiple files and surfaces related to backfilling platform-enum drift following PR `#866`, plus documenting a known limitation in the component catalog.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

@mchmarny

…g note Two follow-ups to PR NVIDIA#866 (Slinky slurm-operator), bundled into one post-merge cleanup PR. Platform-enum drift backfill — surfaces NVIDIA#866 did not catch ============================================================ PR NVIDIA#866 added `--platform slurm` to the OpenAPI spec, the Go enum (GetCriteriaPlatformTypes), and docs/user/{cli,api}-reference.md + component-catalog.md. Per the project's enum-audit rule (CLAUDE.md "Documentation updates"), every surface that enumerates platform values should be updated. The following surfaces still showed `kubeflow` only (pre-existing drift predating slurm — `dynamo` and `nim` were already missing): - docs/README.md:11 — glossary Criteria row - docs/contributor/data.md:130 — platform row in criteria table - docs/contributor/validations.md:99 — platform bullet - site/docs/getting-started/index.md:19 — glossary mirror - pkg/recipe/criteria.go:246 — Platform field godoc - pkg/recipe/doc.go:32 — Criteria struct shape comment (was non-alphabetical "kubeflow, dynamo, nim, slurm, any") - pkg/recipe/doc.go:93-98 — CriteriaPlatform* constant listing (was non-alphabetical order) - pkg/api/doc.go:72 — REST API doc was missing slurm entirely (only one of these with factual content drift; rest are ordering) - .claude/skills/analyzing-snapshots/SKILL.md:278 — internal AICR snapshot-analysis skill criteria table All now list `dynamo, kubeflow, nim, slurm` in alphabetical order matching pkg/recipe.GetCriteriaPlatformTypes() (the authoritative Go enum). Per-file delimiter convention preserved: slash-separated in glossary-style parens, comma-separated in tables/godoc. New guard test ============== pkg/recipe/doc_test.go::TestCriteriaPlatformConstantsMatchGetter asserts the CriteriaPlatform* constants and GetCriteriaPlatformTypes() stay in sync. Catches the exact class of drift this commit fixes if a future platform value is added to one but not the other. Catalog node-selector limitation note ===================================== Appends a `**Known limitation:**` clause to the slinky-slurm-operator row in docs/user/component-catalog.md, documenting that chart v1.1.0 silently ignores `operator.nodeSelector` and `webhook.nodeSelector` and linking SlinkyProject/slurm-operator#187 for the upstream fix. The same limitation is already in-line in recipes/registry.yaml; the catalog note surfaces it to readers of the rendered user docs. Originally planned but no longer needed ======================================= While auditing, I had also planned to add Mark's two "optional nits" from NVIDIA#866 — a chart-default verification pointer at defaultVersion in recipes/registry.yaml, and a vacuous-pass guard comment in recipes/checks/slinky-slurm-operator/health-check.yaml. Both turned out to already be in the merged version (Mark applied them himself before merge — see recipes/registry.yaml:550-555 and recipes/checks/slinky-slurm-operator/health-check.yaml:64-69). The .lychee.toml header change is the project's license-header check auto-applying the standard Apache 2.0 copyright block. Addresses ========= Mark (@mchmarny) review feedback on NVIDIA#866: - Doc-audit gap across user docs (cli-reference + api-reference landed at merge time; this PR catches the remaining surfaces) CodeRabbit review feedback on NVIDIA#866: - NodeSelector limitation note on slinky-slurm-operator catalog row Internal panel review (PE + QA + DA) on draft NVIDIA#884: - Extended audit to pkg/api/doc.go (factual miss), pkg/recipe/doc.go (ordering), .claude/skills/analyzing-snapshots/SKILL.md - Added guard test against future enum/getter drift - Squashed to single commit; rebased onto upstream/main HEAD Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>

ArangoGutierrez · 2026-05-15T07:50:58Z

Promoted draft scope after an internal Principal Engineer + QA Engineer + Devil's Advocate panel review.

Panel findings (all addressed in the squashed commit):

PE + DA both flagged: pkg/recipe/doc.go:32 and pkg/recipe/doc.go:93-98 listed platform constants in non-alphabetical order, inconsistent with GetCriteriaPlatformTypes() and this PR's stated audit principle.
DA flagged: pkg/api/doc.go:72 was missing slurm entirely (only surface with factual content drift, not just ordering).
DA flagged: .claude/skills/analyzing-snapshots/SKILL.md:278 had a stale criteria table; internal but in-tree.
PE nice-to-have, adopted: Added pkg/recipe/doc_test.go::TestCriteriaPlatformConstantsMatchGetter to mechanically catch future drift between the constants and the getter.
QA flagged: Original branch was 2 commits, not squashed; was BEHIND upstream/main against the latest tip. Both addressed (single commit on top of current main 6dfbb5dc-equivalent — rebase clean).

Items intentionally not addressed (rationale below):

pkg/bundler/deployer/helm test failures under make qualify parallel load — unrelated to this PR (different package). Tests pass cleanly in isolation. Expecting CI server-side to pass with more headroom; will investigate separately if CI flakes too.

@mchmarny

…g note Follow-up to PR NVIDIA#866 (Slinky slurm-operator). Backfills platform-enum drift on surfaces the original PR did not catch, plus surfaces the chart v1.1.0 nodeSelector silent-ignore limitation on the user-facing component-catalog row. PR NVIDIA#896 (yuanchen8911) already fixed docs/README.md and docs/contributor/validations.md as part of a broader docs audit. PR NVIDIA#893 removed the site/docs/ vitepress mirror entirely. Remaining surfaces this PR addresses: - docs/contributor/data.md:130 — platform row in criteria table - docs/user/component-catalog.md:36 — slinky-slurm-operator row, appended **Known limitation:** clause documenting chart v1.1.0 silent-ignore of operator.nodeSelector/webhook.nodeSelector (the same limitation is already inline in recipes/registry.yaml; this surfaces it on the rendered user catalog) - pkg/api/doc.go:72 — REST API godoc was missing slurm entirely - pkg/recipe/doc.go:32 — struct shape comment reordered to alphabetical to match GetCriteriaPlatformTypes() - pkg/recipe/doc.go:93-98 — CriteriaPlatform* constants reordered alphabetical - pkg/recipe/criteria.go:246 — Platform-field godoc updated - .claude/skills/analyzing-snapshots/SKILL.md:278 — internal AICR snapshot-analysis skill criteria table All now list 'dynamo, kubeflow, nim, slurm' alphabetically matching pkg/recipe.GetCriteriaPlatformTypes() (the authoritative Go enum). New guard test ============== pkg/recipe/doc_test.go::TestCriteriaPlatformConstantsMatchGetter asserts the CriteriaPlatform* constants and GetCriteriaPlatformTypes() stay in sync. Mechanically catches the exact class of drift this commit fixes if a future platform value is added to one but not the other. Addressed reviews ================= @mchmarny (NVIDIA#866) — Doc-audit gap. cli-reference and api-reference were fixed at merge time and NVIDIA#896 picked up README/validations; this PR catches the remaining surfaces (contributor/data, the three Go godoc files, and the internal skill table). @coderabbitai (NVIDIA#866) — NodeSelector limitation note on the slinky-slurm-operator catalog row. Internal PE + QA + DA panel review on draft NVIDIA#884 — extended audit to pkg/api/doc.go (factual miss), pkg/recipe/doc.go (ordering), .claude/skills/analyzing-snapshots/SKILL.md, plus the guard test in pkg/recipe/doc_test.go. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>

github-actions · 2026-05-15T13:09:07Z

🌿 Preview your docs: https://nvidia-preview-chore-post-866-slurm-followups.docs.buildwithfern.com/aicr

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/recipe/doc_test.go`:
- Around line 27-45: The test TestCriteriaPlatformConstantsMatchGetter is a
single-case check and must be converted into a table-driven test with multiple
cases; replace the current body with a slice of test cases (structs) each
containing a name and expected []string (e.g., a case for the declared
constants, a case for an empty/modified set if relevant), iterate over cases
using t.Run, call GetCriteriaPlatformTypes() inside each subtest, compare
lengths and elements and fail via t.Fatalf/t.Errorf when mismatches occur;
reference the existing constants CriteriaPlatformDynamo,
CriteriaPlatformKubeflow, CriteriaPlatformNIM, CriteriaPlatformSlurm and the
getter GetCriteriaPlatformTypes() and keep the same comparison logic but
executed per test case within TestCriteriaPlatformConstantsMatchGetter.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: aecd6f0b-a2e5-4ae6-9425-f05789b772f3

📥 Commits

Reviewing files that changed from the base of the PR and between b1ef88a and 179a346.

📒 Files selected for processing (7)

.claude/skills/analyzing-snapshots/SKILL.md
docs/contributor/data.md
docs/user/component-catalog.md
pkg/api/doc.go
pkg/recipe/criteria.go
pkg/recipe/doc.go
pkg/recipe/doc_test.go

coderabbitai · 2026-05-15T13:10:42Z

+func TestCriteriaPlatformConstantsMatchGetter(t *testing.T) {
+	declared := []string{
+		string(CriteriaPlatformDynamo),
+		string(CriteriaPlatformKubeflow),
+		string(CriteriaPlatformNIM),
+		string(CriteriaPlatformSlurm),
+	}
+	sort.Strings(declared)
+
+	got := GetCriteriaPlatformTypes()
+	if len(got) != len(declared) {
+		t.Fatalf("len(GetCriteriaPlatformTypes())=%d, declared constants=%d", len(got), len(declared))
+	}
+	for i, want := range declared {
+		if got[i] != want {
+			t.Errorf("GetCriteriaPlatformTypes()[%d] = %q, want %q (declared)", i, got[i], want)
+		}
+	}
+}


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Convert this to a table-driven test with multiple cases.

The guard is useful, but this file’s guideline requires table-driven tests with multiple test cases.

♻️ Proposed refactor

func TestCriteriaPlatformConstantsMatchGetter(t *testing.T) { declared := []string{ string(CriteriaPlatformDynamo), string(CriteriaPlatformKubeflow), string(CriteriaPlatformNIM), string(CriteriaPlatformSlurm), } sort.Strings(declared) - got := GetCriteriaPlatformTypes() - if len(got) != len(declared) { - t.Fatalf("len(GetCriteriaPlatformTypes())=%d, declared constants=%d", len(got), len(declared)) - } - for i, want := range declared { - if got[i] != want { - t.Errorf("GetCriteriaPlatformTypes()[%d] = %q, want %q (declared)", i, got[i], want) - } - } + got := GetCriteriaPlatformTypes() + tests := []struct { + name string + want []string + }{ + { + name: "getter matches declared constants", + want: declared, + }, + { + name: "getter output is alphabetically sorted", + want: func() []string { + sorted := append([]string(nil), got...) + sort.Strings(sorted) + return sorted + }(), + }, + } + + for _, tc := range tests { + t.Run(tc.name, func(t *testing.T) { + if len(got) != len(tc.want) { + t.Fatalf("len(GetCriteriaPlatformTypes())=%d, want=%d", len(got), len(tc.want)) + } + for i, want := range tc.want { + if got[i] != want { + t.Errorf("GetCriteriaPlatformTypes()[%d] = %q, want %q", i, got[i], want) + } + } + }) + } }

As per coding guidelines: "**/*_test.go: Always use table-driven tests with multiple test cases organized in a slice of structs".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/recipe/doc_test.go` around lines 27 - 45, The test TestCriteriaPlatformConstantsMatchGetter is a single-case check and must be converted into a table-driven test with multiple cases; replace the current body with a slice of test cases (structs) each containing a name and expected []string (e.g., a case for the declared constants, a case for an empty/modified set if relevant), iterate over cases using t.Run, call GetCriteriaPlatformTypes() inside each subtest, compare lengths and elements and fail via t.Fatalf/t.Errorf when mismatches occur; reference the existing constants CriteriaPlatformDynamo, CriteriaPlatformKubeflow, CriteriaPlatformNIM, CriteriaPlatformSlurm and the getter GetCriteriaPlatformTypes() and keep the same comparison logic but executed per test case within TestCriteriaPlatformConstantsMatchGetter.

Pushing back on this one. Project CLAUDE.md says "Table-driven tests (required for multiple cases)" — emphasis on multiple. This test has one invariant: the CriteriaPlatform* constants match what GetCriteriaPlatformTypes() returns. The constants and the getter are package-level singletons, so the "empty/modified set" case in your suggestion would test the framework, not the invariant under guard.

Converting would add ~20 lines of boilerplate for zero additional behavior coverage. Happy to revisit if a reviewer disagrees.

@ArangoGutierrez, you're right — the guideline says "required for multiple cases", and this test has exactly one meaningful invariant. My proposed second case ("getter output is alphabetically sorted") was a tautology: it sorted got against got, exercising nothing real. The pushback is well-founded and the current implementation is appropriate for a single-invariant guard test. Withdrawing the suggestion.

(ᵔᴥᵔ)

✏️ Learnings added

Learnt from: ArangoGutierrez Repo: NVIDIA/aicr PR: 884 File: pkg/recipe/doc_test.go:27-45 Timestamp: 2026-05-15T14:08:57.038Z Learning: In NVIDIA/aicr, CLAUDE.md's table-driven test requirement applies only when there are genuinely multiple test cases ("Table-driven tests (required for multiple cases)"). Single-invariant guard tests — such as `TestCriteriaPlatformConstantsMatchGetter` in `pkg/recipe/doc_test.go`, which checks that `CriteriaPlatform*` constants match `GetCriteriaPlatformTypes()` — do not need to be converted to table-driven form. Do not flag such tests for table-driven refactoring when there is only one real invariant to validate.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

🧠 Learnings used

Learnt from: njhensley Repo: NVIDIA/aicr PR: 595 File: pkg/recipe/loader.go:30-35 Timestamp: 2026-04-16T18:20:41.689Z Learning: Do not flag uses of `context.WithTimeout(context.Background(), timeout)` when they match the repo’s documented “GOOD” pattern (per `CLAUDE.md`). In particular, treat it as intentional/acceptable in `pkg/serializer` implementations such as `HTTPReader.Download` and `fromConfigMapWithKubeconfig`. Do not require threading `context.Context` through `serializer.FromFileWithKubeconfig` (or other cross-package signature changes) as a guideline violation; if callers need external cancellation propagation, they should use the existing `DownloadWithContext` API rather than changing wide call chains and tests.

Learnt from: njhensley Repo: NVIDIA/aicr PR: 595 File: pkg/recipe/loader.go:56-60 Timestamp: 2026-04-16T18:20:53.204Z Learning: In this repository’s Go code, if an error was already created/wrapped with `pkg/errors` (via `aicrerrors.New(...)` or `aicrerrors.WrapWithContext(...)`) and includes the correct structured error code (e.g., `ErrCodeInvalidRequest`, `ErrCodeTimeout`, `ErrCodeInternal`), then returning it should use the bare form `return nil, err` (or propagate the error as-is). Do not require “missing context” when the source already has proper codes; do not double-wrap it. Avoid changing the error by using `errors.Wrapf` (which can override the structured code) or `fmt.Errorf("%w", err)` in these cases, which is documented as an anti-pattern in `CLAUDE.md`.

Learnt from: mchmarny Repo: NVIDIA/aicr PR: 728 File: pkg/recipe/criteria.go:801-820 Timestamp: 2026-05-01T23:59:30.754Z Learning: In NVIDIA/aicr, use the idiomatic helper `errors.PropagateOrWrap(err, errCode, message)` from `pkg/errors` when handling errors in Go code. This helper should be preferred over manual `errors.As` + conditional return logic: it propagates `err` as-is if it already carries a `*StructuredError` code, and otherwise wraps `err` using the provided fallback `errCode` and `message`. Apply this pattern especially when calling functions from `pkg/serializer`, `pkg/recipe`, or other packages that may already return coded structured errors.

Learnt from: mchmarny Repo: NVIDIA/aicr PR: 728 File: pkg/k8s/agent/job_watch_test.go:140-150 Timestamp: 2026-05-01T23:59:44.270Z Learning: In Go unit tests, do not require or suggest adding `return` statements immediately after `t.Fatal()` / `t.Fatalf()` calls. These functions internally terminate the current test goroutine via `runtime.Goexit()`, so any explicit `return` afterward is redundant/unreachable. Review should focus on the logic before `t.Fatal*`; the absence of a trailing `return` should not be treated as a problem.

Learnt from: mchmarny Repo: NVIDIA/aicr PR: 728 File: pkg/server/middleware_test.go:400-402 Timestamp: 2026-05-01T23:59:43.871Z Learning: In Go test files, don’t require a `return` statement immediately after `t.Fatal(...)`/`t.Fatalf(...)` calls. `t.Fatal` calls `runtime.Goexit()`, so any following `return` is unreachable/redundant (though deferred functions will still run). If the lint/static checks already pass without the explicit `return`, avoid flagging the missing `return` in future reviews (e.g., in this repo’s Go tests).

Learnt from: njhensley Repo: NVIDIA/aicr PR: 844 File: pkg/fingerprint/from_measurements.go:200-220 Timestamp: 2026-05-11T21:09:20.802Z Learning: In this repo’s Go code, do not flag or request changes for `int64`→`int` casts when the value originates from a Go `int` and remains in-process (e.g., it round-trips via `measurement.Reading.Any()` but never crosses a system boundary like serialization/network/DB). Only validate/guard numeric ranges at system boundaries; adding `math.MaxInt32/MinInt32` clamping for such internal-only casts is an anti-pattern (per CLAUDE.md), since the scenario is guaranteed by internal/framework behavior and 64-bit Linux targets keep `int` effectively 64 bits.

Learnt from: kaponco Repo: NVIDIA/aicr PR: 857 File: pkg/recipe/metadata.go:567-597 Timestamp: 2026-05-13T06:00:04.625Z Learning: In NVIDIA/aicr’s overlay model, supported behavior does NOT include converting a ComponentTypeDirect component into ComponentTypeHelm or ComponentTypeKustomize (and not the reverse in those directions) via overlays. In `mergeComponentRef`, switch cases that appear to support Helm↔Direct or Kustomize↔Direct type changes (e.g., clearing `SourceFile`/`Olm`) are defensive dead code. Reviewers should not flag the absence of these transitions as a missing validation or bug, and should treat Direct→Helm/Kustomize overrides as outside the supported overlay model.

faganihajizada

Looks good! thanks!

ArangoGutierrez · 2026-05-19T16:15:01Z

@mchmarny PTAL

ArangoGutierrez added the documentation label May 14, 2026

github-actions Bot added area/docs size/S labels May 14, 2026

mchmarny assigned ArangoGutierrez May 14, 2026

ArangoGutierrez force-pushed the chore/post-866-slurm-followups branch from 5a66d80 to b1ef88a Compare May 15, 2026 07:16

github-actions Bot added area/api size/M and removed size/S labels May 15, 2026

ArangoGutierrez force-pushed the chore/post-866-slurm-followups branch from b1ef88a to 179a346 Compare May 15, 2026 13:08

coderabbitai Bot reviewed May 15, 2026

View reviewed changes

ArangoGutierrez requested review from mchmarny and yuanchen8911 May 15, 2026 13:19

ArangoGutierrez marked this pull request as ready for review May 15, 2026 13:19

ArangoGutierrez requested a review from a team as a code owner May 15, 2026 13:19

faganihajizada approved these changes May 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs: post-#866 slurm platform-enum drift + nodeSelector catalog note#884

docs: post-#866 slurm platform-enum drift + nodeSelector catalog note#884
ArangoGutierrez wants to merge 1 commit into
NVIDIA:mainfrom
ArangoGutierrez:chore/post-866-slurm-followups

ArangoGutierrez commented May 14, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 14, 2026 •

edited

Loading

Walkthrough

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Uh oh!

ArangoGutierrez commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 15, 2026 •

edited

Loading

Uh oh!

ArangoGutierrez May 15, 2026

Uh oh!

coderabbitai Bot May 15, 2026

Uh oh!

faganihajizada left a comment

Uh oh!

ArangoGutierrez commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ArangoGutierrez commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation / Context

Type of Change

Component(s) Affected

Implementation Notes

Originally planned but no longer needed

Internal review panel

Testing

Risk Assessment

Checklist

Uh oh!

coderabbitai Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Uh oh!

ArangoGutierrez commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ArangoGutierrez May 15, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

faganihajizada left a comment

Choose a reason for hiding this comment

Uh oh!

ArangoGutierrez commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ArangoGutierrez commented May 14, 2026 •

edited

Loading

coderabbitai Bot commented May 14, 2026 •

edited

Loading

coderabbitai Bot May 15, 2026 •

edited

Loading