Skip to content

docs: post-#866 slurm platform-enum drift + nodeSelector catalog note#884

Open
ArangoGutierrez wants to merge 1 commit into
NVIDIA:mainfrom
ArangoGutierrez:chore/post-866-slurm-followups
Open

docs: post-#866 slurm platform-enum drift + nodeSelector catalog note#884
ArangoGutierrez wants to merge 1 commit into
NVIDIA:mainfrom
ArangoGutierrez:chore/post-866-slurm-followups

Conversation

@ArangoGutierrez
Copy link
Copy Markdown
Contributor

@ArangoGutierrez ArangoGutierrez commented May 14, 2026

Summary

Single-commit follow-up to PR #866 (Slinky slurm-operator). Backfills platform enum drift on 7 surfaces (1 contributor doc, 3 Go godoc, 1 user-catalog row, 1 internal AICR snapshot-analysis skill, plus the new guard test), and surfaces the chart v1.1.0 nodeSelector silent-ignore limitation on the user-facing component-catalog row.

Motivation / Context

PR #866 added --platform slurm and updated the OpenAPI spec, pkg/recipe/criteria.go (the GetCriteriaPlatformTypes return value), docs/user/cli-reference.md, docs/user/api-reference.md, and docs/user/component-catalog.md. Per the project's enum-audit rule (CLAUDE.md "Documentation updates"), every surface that enumerates platform values should also have been updated.

Most surfaces still showed only kubeflow (pre-existing drift predating slurm — dynamo and nim were already missing). PR #896 (@yuanchen8911, merged 2026-05-14) fixed docs/README.md and docs/contributor/validations.md as part of a broader docs audit. PR #893 removed the site/docs/ vitepress mirror entirely. This PR catches the remaining surfaces:

Surface Pre-fix After
docs/contributor/data.md:130 kubeflow dynamo, kubeflow, nim, slurm
pkg/recipe/criteria.go:246 (godoc) (kubeflow) (dynamo, kubeflow, nim, slurm)
pkg/recipe/doc.go:32 (struct shape comment) kubeflow, dynamo, nim, slurm, any (non-alphabetical) dynamo, kubeflow, nim, slurm, any
pkg/recipe/doc.go:93-98 (constant listing) Kubeflow, Dynamo, NIM, Slurm, Any (non-alphabetical) alphabetical order
pkg/api/doc.go:72 (REST API godoc) (dynamo, kubeflow, nim, any) (missing slurm) (dynamo, kubeflow, nim, slurm, any)
.claude/skills/analyzing-snapshots/SKILL.md:278 (internal skill table) kubeflow, dynamo, nim dynamo, kubeflow, nim, slurm

Of these, pkg/api/doc.go:72 was the only one with factual content drift (missing slurm entirely); the rest are ordering or pre-existing-without-slurm drift.

Also appends a **Known limitation:** clause to the slinky-slurm-operator row in docs/user/component-catalog.md documenting that chart v1.1.0 silently ignores operator.nodeSelector and webhook.nodeSelector (per @coderabbitai review on #866). The same limitation is already inline in recipes/registry.yaml; this surfaces it on the rendered user catalog.

Adds pkg/recipe/doc_test.go::TestCriteriaPlatformConstantsMatchGetter — a guard test that asserts the CriteriaPlatform* constants and GetCriteriaPlatformTypes() stay in sync. Mechanically catches the exact class of drift this PR fixes if a future platform value is added to one but not the other.

Fixes: N/A
Related: #866, #896, SlinkyProject/slurm-operator#187

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change
  • Documentation update
  • Refactoring (no functional changes)
  • Build/CI/tooling

Component(s) Affected

  • CLI (cmd/aicr, pkg/cli)
  • API server (cmd/aicrd, pkg/api, pkg/server) — godoc comment only, no behavior change
  • Recipe engine / data (pkg/recipe) — godoc comment + guard test, no behavior change
  • Bundlers (pkg/bundler, pkg/component/*)
  • Collectors / snapshotter (pkg/collector, pkg/snapshotter)
  • Validator (pkg/validator)
  • Core libraries (pkg/errors, pkg/k8s)
  • Docs/examples (docs/, examples/)
  • Other: .claude/skills/

Implementation Notes

Single squashed commit on top of current upstream/main. Each edit is one line apart from the guard test (45 lines, license header included).

Originally planned but no longer needed

Internal review panel

This PR went through a Principal Engineer + QA Engineer + Devil's Advocate review panel after the first push as draft. Panel surfaced the three Go godoc surfaces (pkg/api/doc.go, pkg/recipe/doc.go x2) and the internal skill file (.claude/skills/analyzing-snapshots/SKILL.md) that the original audit missed, plus recommended the guard test in pkg/recipe/doc_test.go. All findings addressed.

Testing

unset GITLAB_TOKEN && golangci-lint run -c .golangci.yaml ./pkg/recipe/... ./pkg/api/...
# 0 issues.

go test -run TestCriteriaPlatformConstantsMatchGetter -v ./pkg/recipe/...
# === RUN   TestCriteriaPlatformConstantsMatchGetter
# --- PASS: TestCriteriaPlatformConstantsMatchGetter (0.00s)
# PASS
# ok  	github.com/NVIDIA/aicr/pkg/recipe	3.055s

make qualify full run on a prior iteration of this branch showed all stages PASS except for three flaky failures in pkg/bundler/deployer/helm (TestUndeployScript_Preflight*) that fail under parallel load with signal: killed and pass cleanly in isolation (14s). These tests spawn bash subprocesses that exec helm/kubectl; the OS kills them under CPU pressure during full-suite parallel test execution. Unrelated to this PR (different package, different files). Expecting CI server-side to pass with more headroom.

Coverage for pkg/recipe: 90.2% — new guard test adds coverage to the platform-type constants without removing any existing coverage.

Risk Assessment

  • Low — All edits are doc/godoc comment changes plus one new guard test. No code paths changed. Reversible by git revert.
  • Medium
  • High

Rollout notes: N/A — additive doc clarifications and a guard test.

Checklist

  • Tests pass locally (make test with -race) — pkg/recipe new test passes; pre-existing flakes in helm-deployer unrelated
  • Linter passes (golangci-lint run -c .golangci.yaml)
  • I did not skip/disable tests to make CI green
  • I added/updated tests for new functionality (pkg/recipe/doc_test.go)
  • I updated docs if user-visible behavior changed
  • Changes follow existing patterns in the codebase
  • Commits are cryptographically signed (git commit -S)

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 14, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR documents the expanded support for workload platform/framework types in the Criteria field. Changes include: (1) updating field comments in criteria.go and doc.go to describe the platform type as generic and list example values (dynamo, kubeflow, nim, slurm); (2) expanding documentation of accepted platform values across the API endpoint documentation, contributor guides, and skill guides; (3) adding a known limitation note to the slinky-slurm-operator component catalog entry; and (4) introducing a unit test that validates the platform constants match the output of GetCriteriaPlatformTypes().

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • NVIDIA/aicr#866: Added the CriteriaPlatformSlurm constant and updated GetCriteriaPlatformTypes() to include "slurm", which this PR's test and documentation now align with.

Suggested labels

size/S

Suggested reviewers

  • mchmarny
  • njhensley
  • lockwobr
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: addressing platform-enum drift from PR #866 and adding a nodeSelector limitation note to the catalog.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The pull request description thoroughly details the changes across multiple files and surfaces related to backfilling platform-enum drift following PR #866, plus documenting a known limitation in the component catalog.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

@ArangoGutierrez ArangoGutierrez force-pushed the chore/post-866-slurm-followups branch from 5a66d80 to b1ef88a Compare May 15, 2026 07:16
ArangoGutierrez added a commit to ArangoGutierrez/aicr that referenced this pull request May 15, 2026
…g note

Two follow-ups to PR NVIDIA#866 (Slinky slurm-operator), bundled into one
post-merge cleanup PR.

Platform-enum drift backfill — surfaces NVIDIA#866 did not catch
============================================================

PR NVIDIA#866 added `--platform slurm` to the OpenAPI spec, the Go enum
(GetCriteriaPlatformTypes), and docs/user/{cli,api}-reference.md +
component-catalog.md. Per the project's enum-audit rule (CLAUDE.md
"Documentation updates"), every surface that enumerates platform
values should be updated. The following surfaces still showed
`kubeflow` only (pre-existing drift predating slurm — `dynamo` and
`nim` were already missing):

- docs/README.md:11 — glossary Criteria row
- docs/contributor/data.md:130 — platform row in criteria table
- docs/contributor/validations.md:99 — platform bullet
- site/docs/getting-started/index.md:19 — glossary mirror
- pkg/recipe/criteria.go:246 — Platform field godoc
- pkg/recipe/doc.go:32 — Criteria struct shape comment (was
  non-alphabetical "kubeflow, dynamo, nim, slurm, any")
- pkg/recipe/doc.go:93-98 — CriteriaPlatform* constant listing
  (was non-alphabetical order)
- pkg/api/doc.go:72 — REST API doc was missing slurm entirely
  (only one of these with factual content drift; rest are ordering)
- .claude/skills/analyzing-snapshots/SKILL.md:278 — internal AICR
  snapshot-analysis skill criteria table

All now list `dynamo, kubeflow, nim, slurm` in alphabetical order
matching pkg/recipe.GetCriteriaPlatformTypes() (the authoritative Go
enum). Per-file delimiter convention preserved: slash-separated in
glossary-style parens, comma-separated in tables/godoc.

New guard test
==============

pkg/recipe/doc_test.go::TestCriteriaPlatformConstantsMatchGetter
asserts the CriteriaPlatform* constants and GetCriteriaPlatformTypes()
stay in sync. Catches the exact class of drift this commit fixes if
a future platform value is added to one but not the other.

Catalog node-selector limitation note
=====================================

Appends a `**Known limitation:**` clause to the slinky-slurm-operator
row in docs/user/component-catalog.md, documenting that chart v1.1.0
silently ignores `operator.nodeSelector` and `webhook.nodeSelector`
and linking SlinkyProject/slurm-operator#187 for the upstream fix.
The same limitation is already in-line in recipes/registry.yaml; the
catalog note surfaces it to readers of the rendered user docs.

Originally planned but no longer needed
=======================================

While auditing, I had also planned to add Mark's two "optional nits"
from NVIDIA#866 — a chart-default verification pointer at defaultVersion in
recipes/registry.yaml, and a vacuous-pass guard comment in
recipes/checks/slinky-slurm-operator/health-check.yaml. Both turned
out to already be in the merged version (Mark applied them himself
before merge — see recipes/registry.yaml:550-555 and
recipes/checks/slinky-slurm-operator/health-check.yaml:64-69).

The .lychee.toml header change is the project's license-header check
auto-applying the standard Apache 2.0 copyright block.

Addresses
=========

Mark (@mchmarny) review feedback on NVIDIA#866:
  - Doc-audit gap across user docs (cli-reference + api-reference
    landed at merge time; this PR catches the remaining surfaces)

CodeRabbit review feedback on NVIDIA#866:
  - NodeSelector limitation note on slinky-slurm-operator catalog row

Internal panel review (PE + QA + DA) on draft NVIDIA#884:
  - Extended audit to pkg/api/doc.go (factual miss), pkg/recipe/doc.go
    (ordering), .claude/skills/analyzing-snapshots/SKILL.md
  - Added guard test against future enum/getter drift
  - Squashed to single commit; rebased onto upstream/main HEAD

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
@ArangoGutierrez
Copy link
Copy Markdown
Contributor Author

Promoted draft scope after an internal Principal Engineer + QA Engineer + Devil's Advocate panel review.

Panel findings (all addressed in the squashed commit):

  1. PE + DA both flagged: pkg/recipe/doc.go:32 and pkg/recipe/doc.go:93-98 listed platform constants in non-alphabetical order, inconsistent with GetCriteriaPlatformTypes() and this PR's stated audit principle.
  2. DA flagged: pkg/api/doc.go:72 was missing slurm entirely (only surface with factual content drift, not just ordering).
  3. DA flagged: .claude/skills/analyzing-snapshots/SKILL.md:278 had a stale criteria table; internal but in-tree.
  4. PE nice-to-have, adopted: Added pkg/recipe/doc_test.go::TestCriteriaPlatformConstantsMatchGetter to mechanically catch future drift between the constants and the getter.
  5. QA flagged: Original branch was 2 commits, not squashed; was BEHIND upstream/main against the latest tip. Both addressed (single commit on top of current main 6dfbb5dc-equivalent — rebase clean).

Items intentionally not addressed (rationale below):

  • pkg/bundler/deployer/helm test failures under make qualify parallel load — unrelated to this PR (different package). Tests pass cleanly in isolation. Expecting CI server-side to pass with more headroom; will investigate separately if CI flakes too.

…g note

Follow-up to PR NVIDIA#866 (Slinky slurm-operator). Backfills platform-enum
drift on surfaces the original PR did not catch, plus surfaces the
chart v1.1.0 nodeSelector silent-ignore limitation on the user-facing
component-catalog row.

PR NVIDIA#896 (yuanchen8911) already fixed docs/README.md and
docs/contributor/validations.md as part of a broader docs audit. PR
NVIDIA#893 removed the site/docs/ vitepress mirror entirely. Remaining
surfaces this PR addresses:

- docs/contributor/data.md:130 — platform row in criteria table
- docs/user/component-catalog.md:36 — slinky-slurm-operator row,
  appended **Known limitation:** clause documenting chart v1.1.0
  silent-ignore of operator.nodeSelector/webhook.nodeSelector
  (the same limitation is already inline in recipes/registry.yaml;
  this surfaces it on the rendered user catalog)
- pkg/api/doc.go:72 — REST API godoc was missing slurm entirely
- pkg/recipe/doc.go:32 — struct shape comment reordered to alphabetical
  to match GetCriteriaPlatformTypes()
- pkg/recipe/doc.go:93-98 — CriteriaPlatform* constants reordered
  alphabetical
- pkg/recipe/criteria.go:246 — Platform-field godoc updated
- .claude/skills/analyzing-snapshots/SKILL.md:278 — internal AICR
  snapshot-analysis skill criteria table

All now list 'dynamo, kubeflow, nim, slurm' alphabetically matching
pkg/recipe.GetCriteriaPlatformTypes() (the authoritative Go enum).

New guard test
==============

pkg/recipe/doc_test.go::TestCriteriaPlatformConstantsMatchGetter
asserts the CriteriaPlatform* constants and GetCriteriaPlatformTypes()
stay in sync. Mechanically catches the exact class of drift this
commit fixes if a future platform value is added to one but not the
other.

Addressed reviews
=================

@mchmarny (NVIDIA#866) — Doc-audit gap. cli-reference and api-reference were
fixed at merge time and NVIDIA#896 picked up README/validations; this PR
catches the remaining surfaces (contributor/data, the three Go godoc
files, and the internal skill table).

@coderabbitai (NVIDIA#866) — NodeSelector limitation note on the
slinky-slurm-operator catalog row.

Internal PE + QA + DA panel review on draft NVIDIA#884 — extended audit
to pkg/api/doc.go (factual miss), pkg/recipe/doc.go (ordering),
.claude/skills/analyzing-snapshots/SKILL.md, plus the guard test
in pkg/recipe/doc_test.go.

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
@ArangoGutierrez ArangoGutierrez force-pushed the chore/post-866-slurm-followups branch from b1ef88a to 179a346 Compare May 15, 2026 13:08
@github-actions
Copy link
Copy Markdown
Contributor

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/recipe/doc_test.go`:
- Around line 27-45: The test TestCriteriaPlatformConstantsMatchGetter is a
single-case check and must be converted into a table-driven test with multiple
cases; replace the current body with a slice of test cases (structs) each
containing a name and expected []string (e.g., a case for the declared
constants, a case for an empty/modified set if relevant), iterate over cases
using t.Run, call GetCriteriaPlatformTypes() inside each subtest, compare
lengths and elements and fail via t.Fatalf/t.Errorf when mismatches occur;
reference the existing constants CriteriaPlatformDynamo,
CriteriaPlatformKubeflow, CriteriaPlatformNIM, CriteriaPlatformSlurm and the
getter GetCriteriaPlatformTypes() and keep the same comparison logic but
executed per test case within TestCriteriaPlatformConstantsMatchGetter.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: aecd6f0b-a2e5-4ae6-9425-f05789b772f3

📥 Commits

Reviewing files that changed from the base of the PR and between b1ef88a and 179a346.

📒 Files selected for processing (7)
  • .claude/skills/analyzing-snapshots/SKILL.md
  • docs/contributor/data.md
  • docs/user/component-catalog.md
  • pkg/api/doc.go
  • pkg/recipe/criteria.go
  • pkg/recipe/doc.go
  • pkg/recipe/doc_test.go

Comment thread pkg/recipe/doc_test.go
Comment on lines +27 to +45
func TestCriteriaPlatformConstantsMatchGetter(t *testing.T) {
declared := []string{
string(CriteriaPlatformDynamo),
string(CriteriaPlatformKubeflow),
string(CriteriaPlatformNIM),
string(CriteriaPlatformSlurm),
}
sort.Strings(declared)

got := GetCriteriaPlatformTypes()
if len(got) != len(declared) {
t.Fatalf("len(GetCriteriaPlatformTypes())=%d, declared constants=%d", len(got), len(declared))
}
for i, want := range declared {
if got[i] != want {
t.Errorf("GetCriteriaPlatformTypes()[%d] = %q, want %q (declared)", i, got[i], want)
}
}
}
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot May 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Convert this to a table-driven test with multiple cases.

The guard is useful, but this file’s guideline requires table-driven tests with multiple test cases.

♻️ Proposed refactor
 func TestCriteriaPlatformConstantsMatchGetter(t *testing.T) {
 	declared := []string{
 		string(CriteriaPlatformDynamo),
 		string(CriteriaPlatformKubeflow),
 		string(CriteriaPlatformNIM),
 		string(CriteriaPlatformSlurm),
 	}
 	sort.Strings(declared)
 
-	got := GetCriteriaPlatformTypes()
-	if len(got) != len(declared) {
-		t.Fatalf("len(GetCriteriaPlatformTypes())=%d, declared constants=%d", len(got), len(declared))
-	}
-	for i, want := range declared {
-		if got[i] != want {
-			t.Errorf("GetCriteriaPlatformTypes()[%d] = %q, want %q (declared)", i, got[i], want)
-		}
-	}
+	got := GetCriteriaPlatformTypes()
+	tests := []struct {
+		name string
+		want []string
+	}{
+		{
+			name: "getter matches declared constants",
+			want: declared,
+		},
+		{
+			name: "getter output is alphabetically sorted",
+			want: func() []string {
+				sorted := append([]string(nil), got...)
+				sort.Strings(sorted)
+				return sorted
+			}(),
+		},
+	}
+
+	for _, tc := range tests {
+		t.Run(tc.name, func(t *testing.T) {
+			if len(got) != len(tc.want) {
+				t.Fatalf("len(GetCriteriaPlatformTypes())=%d, want=%d", len(got), len(tc.want))
+			}
+			for i, want := range tc.want {
+				if got[i] != want {
+					t.Errorf("GetCriteriaPlatformTypes()[%d] = %q, want %q", i, got[i], want)
+				}
+			}
+		})
+	}
 }

As per coding guidelines: "**/*_test.go: Always use table-driven tests with multiple test cases organized in a slice of structs".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/recipe/doc_test.go` around lines 27 - 45, The test
TestCriteriaPlatformConstantsMatchGetter is a single-case check and must be
converted into a table-driven test with multiple cases; replace the current body
with a slice of test cases (structs) each containing a name and expected
[]string (e.g., a case for the declared constants, a case for an empty/modified
set if relevant), iterate over cases using t.Run, call
GetCriteriaPlatformTypes() inside each subtest, compare lengths and elements and
fail via t.Fatalf/t.Errorf when mismatches occur; reference the existing
constants CriteriaPlatformDynamo, CriteriaPlatformKubeflow, CriteriaPlatformNIM,
CriteriaPlatformSlurm and the getter GetCriteriaPlatformTypes() and keep the
same comparison logic but executed per test case within
TestCriteriaPlatformConstantsMatchGetter.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushing back on this one. Project CLAUDE.md says "Table-driven tests (required for multiple cases)" — emphasis on multiple. This test has one invariant: the CriteriaPlatform* constants match what GetCriteriaPlatformTypes() returns. The constants and the getter are package-level singletons, so the "empty/modified set" case in your suggestion would test the framework, not the invariant under guard.

Converting would add ~20 lines of boilerplate for zero additional behavior coverage. Happy to revisit if a reviewer disagrees.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ArangoGutierrez, you're right — the guideline says "required for multiple cases", and this test has exactly one meaningful invariant. My proposed second case ("getter output is alphabetically sorted") was a tautology: it sorted got against got, exercising nothing real. The pushback is well-founded and the current implementation is appropriate for a single-invariant guard test. Withdrawing the suggestion.

(ᵔᴥᵔ)


✏️ Learnings added
Learnt from: ArangoGutierrez
Repo: NVIDIA/aicr PR: 884
File: pkg/recipe/doc_test.go:27-45
Timestamp: 2026-05-15T14:08:57.038Z
Learning: In NVIDIA/aicr, CLAUDE.md's table-driven test requirement applies only when there are genuinely multiple test cases ("Table-driven tests (required for multiple cases)"). Single-invariant guard tests — such as `TestCriteriaPlatformConstantsMatchGetter` in `pkg/recipe/doc_test.go`, which checks that `CriteriaPlatform*` constants match `GetCriteriaPlatformTypes()` — do not need to be converted to table-driven form. Do not flag such tests for table-driven refactoring when there is only one real invariant to validate.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: njhensley
Repo: NVIDIA/aicr PR: 595
File: pkg/recipe/loader.go:30-35
Timestamp: 2026-04-16T18:20:41.689Z
Learning: Do not flag uses of `context.WithTimeout(context.Background(), timeout)` when they match the repo’s documented “GOOD” pattern (per `CLAUDE.md`). In particular, treat it as intentional/acceptable in `pkg/serializer` implementations such as `HTTPReader.Download` and `fromConfigMapWithKubeconfig`. Do not require threading `context.Context` through `serializer.FromFileWithKubeconfig` (or other cross-package signature changes) as a guideline violation; if callers need external cancellation propagation, they should use the existing `DownloadWithContext` API rather than changing wide call chains and tests.

Learnt from: njhensley
Repo: NVIDIA/aicr PR: 595
File: pkg/recipe/loader.go:56-60
Timestamp: 2026-04-16T18:20:53.204Z
Learning: In this repository’s Go code, if an error was already created/wrapped with `pkg/errors` (via `aicrerrors.New(...)` or `aicrerrors.WrapWithContext(...)`) and includes the correct structured error code (e.g., `ErrCodeInvalidRequest`, `ErrCodeTimeout`, `ErrCodeInternal`), then returning it should use the bare form `return nil, err` (or propagate the error as-is). Do not require “missing context” when the source already has proper codes; do not double-wrap it. Avoid changing the error by using `errors.Wrapf` (which can override the structured code) or `fmt.Errorf("%w", err)` in these cases, which is documented as an anti-pattern in `CLAUDE.md`.

Learnt from: mchmarny
Repo: NVIDIA/aicr PR: 728
File: pkg/recipe/criteria.go:801-820
Timestamp: 2026-05-01T23:59:30.754Z
Learning: In NVIDIA/aicr, use the idiomatic helper `errors.PropagateOrWrap(err, errCode, message)` from `pkg/errors` when handling errors in Go code. This helper should be preferred over manual `errors.As` + conditional return logic: it propagates `err` as-is if it already carries a `*StructuredError` code, and otherwise wraps `err` using the provided fallback `errCode` and `message`. Apply this pattern especially when calling functions from `pkg/serializer`, `pkg/recipe`, or other packages that may already return coded structured errors.

Learnt from: mchmarny
Repo: NVIDIA/aicr PR: 728
File: pkg/k8s/agent/job_watch_test.go:140-150
Timestamp: 2026-05-01T23:59:44.270Z
Learning: In Go unit tests, do not require or suggest adding `return` statements immediately after `t.Fatal()` / `t.Fatalf()` calls. These functions internally terminate the current test goroutine via `runtime.Goexit()`, so any explicit `return` afterward is redundant/unreachable. Review should focus on the logic before `t.Fatal*`; the absence of a trailing `return` should not be treated as a problem.

Learnt from: mchmarny
Repo: NVIDIA/aicr PR: 728
File: pkg/server/middleware_test.go:400-402
Timestamp: 2026-05-01T23:59:43.871Z
Learning: In Go test files, don’t require a `return` statement immediately after `t.Fatal(...)`/`t.Fatalf(...)` calls. `t.Fatal` calls `runtime.Goexit()`, so any following `return` is unreachable/redundant (though deferred functions will still run). If the lint/static checks already pass without the explicit `return`, avoid flagging the missing `return` in future reviews (e.g., in this repo’s Go tests).

Learnt from: njhensley
Repo: NVIDIA/aicr PR: 844
File: pkg/fingerprint/from_measurements.go:200-220
Timestamp: 2026-05-11T21:09:20.802Z
Learning: In this repo’s Go code, do not flag or request changes for `int64`→`int` casts when the value originates from a Go `int` and remains in-process (e.g., it round-trips via `measurement.Reading.Any()` but never crosses a system boundary like serialization/network/DB). Only validate/guard numeric ranges at system boundaries; adding `math.MaxInt32/MinInt32` clamping for such internal-only casts is an anti-pattern (per CLAUDE.md), since the scenario is guaranteed by internal/framework behavior and 64-bit Linux targets keep `int` effectively 64 bits.

Learnt from: kaponco
Repo: NVIDIA/aicr PR: 857
File: pkg/recipe/metadata.go:567-597
Timestamp: 2026-05-13T06:00:04.625Z
Learning: In NVIDIA/aicr’s overlay model, supported behavior does NOT include converting a ComponentTypeDirect component into ComponentTypeHelm or ComponentTypeKustomize (and not the reverse in those directions) via overlays. In `mergeComponentRef`, switch cases that appear to support Helm↔Direct or Kustomize↔Direct type changes (e.g., clearing `SourceFile`/`Olm`) are defensive dead code. Reviewers should not flag the absence of these transitions as a missing validation or bug, and should treat Direct→Helm/Kustomize overrides as outside the supported overlay model.

@ArangoGutierrez ArangoGutierrez marked this pull request as ready for review May 15, 2026 13:19
@ArangoGutierrez ArangoGutierrez requested a review from a team as a code owner May 15, 2026 13:19
Copy link
Copy Markdown
Contributor

@faganihajizada faganihajizada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! thanks!

@ArangoGutierrez
Copy link
Copy Markdown
Contributor Author

@mchmarny PTAL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants