feat(validation): scope strict perf floor to accelerator-bound recipes #1009
No actionable comments were generated in the recent review.
Walkthrough

The change adds an `Accelerator` field to overlay classification and surfaces it in `classification.String()`. `requiresPerformance()` now returns false when `Accelerator` is empty or equals `CriteriaAcceleratorAny` (in addition to the `IsKind` cases). `classifyOverlay` copies `Criteria.Accelerator` into `classification.Accelerator`. `TestClassifyOverlay` is expanded with an accelerator dimension and updated expectations, so performance is required only for concrete, non-any accelerators.

Estimated code review effort: 3 (Moderate), ~20 minutes. Pre-merge checks: 4 of 4 passed.
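A minimal Go sketch of the classification change described in the walkthrough. The names `Accelerator`, `CriteriaAcceleratorAny`, `CriteriaAcceleratorH100`, `classifyOverlay`, and `requiresPerformance` come from this PR; the constant string values, the `IsKind` boolean field, the `Intent` field, and the simplified intent branch are assumptions standing in for the real floor-test code, which is not shown here.

```go
package main

import "fmt"

// CriteriaAcceleratorType is named in the PR; the constant values are assumed.
type CriteriaAcceleratorType string

const (
	CriteriaAcceleratorAny  CriteriaAcceleratorType = "any"  // assumed value
	CriteriaAcceleratorH100 CriteriaAcceleratorType = "h100" // assumed value
)

// Criteria is a hypothetical, trimmed-down stand-in for an overlay's criteria.
type Criteria struct {
	Accelerator CriteriaAcceleratorType
	Intent      string // hypothetical field
}

// classification is a hypothetical stand-in for the floor test's per-overlay
// classification; IsKind models the pre-existing Kind exemption as a bool.
type classification struct {
	IsKind      bool
	Intent      string
	Accelerator CriteriaAcceleratorType
}

// String surfaces the accelerator alongside the other classification fields.
func (c classification) String() string {
	return fmt.Sprintf("kind=%t intent=%s accelerator=%s", c.IsKind, c.Intent, c.Accelerator)
}

// classifyOverlay copies Criteria.Accelerator into the classification.
func classifyOverlay(isKind bool, crit Criteria) classification {
	return classification{IsKind: isKind, Intent: crit.Intent, Accelerator: crit.Accelerator}
}

// requiresPerformance checks, in order: Kind exemption, then the
// accelerator-unbound exemption (unset or "any"), then intent branches
// (simplified here to a hypothetical training/inference switch).
func requiresPerformance(c classification) bool {
	if c.IsKind {
		return false
	}
	if c.Accelerator == "" || c.Accelerator == CriteriaAcceleratorAny {
		return false
	}
	switch c.Intent {
	case "training", "inference":
		return true
	default:
		return false
	}
}

func main() {
	cases := []classification{
		classifyOverlay(false, Criteria{Intent: "training", Accelerator: CriteriaAcceleratorH100}),
		classifyOverlay(false, Criteria{Intent: "training"}),
		classifyOverlay(false, Criteria{Intent: "training", Accelerator: CriteriaAcceleratorAny}),
		classifyOverlay(true, Criteria{Intent: "training", Accelerator: CriteriaAcceleratorH100}),
	}
	for _, c := range cases {
		fmt.Printf("%s -> requiresPerformance=%t\n", c, requiresPerformance(c))
	}
}
```

Placing the accelerator check after the Kind check and before the intent branches means an accelerator-unbound intermediate is exempt regardless of which intent it declares.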
Force-pushed from 02944c0 to bb5ffbe.
…VIDIA#1005)

Refine pkg/recipe/validation_phase_floor_test.go::requiresPerformance to exempt accelerator-unbound classifications (Criteria.Accelerator unset or CriteriaAcceleratorAny). The perf threshold is accelerator-specific (H100 EKS >= 300 GB/s NCCL, GB200 EKS >= 40 / 500 GB/s NET / NVLS, etc.), so no meaningful constraint exists at the intent layer. Concrete-leaf descendants continue to carry their own threshold via per-phase replace.

Drops strict-mode warnings from 10 -> 5 by exempting the 5 accelerator-unbound intermediates (aks-training, eks-training, gke-cos-training, lke-training, oke-training).

NIM perf block deferred
-----------------------

An earlier revision of this PR added an inference-perf block to recipes/overlays/h100-eks-ubuntu-inference-nim.yaml mirroring the H100 Dynamo placeholder. Codex's review flagged that the inference-perf check in validators/performance/inference_perf_constraint.go:195-197 short-circuits with "skipped - dynamo-platform not in recipe components" whenever the resolved recipe lacks dynamo-platform. NIM recipes declare k8s-nim-operator instead, so the placeholder would satisfy the letter of the floor test but never actually run. Reverted that edit; the validator extension is tracked in NVIDIA#1010.

Remaining 5 strict-mode failures
--------------------------------

- gb200-oke-ubuntu-inference-dynamo (OCI testbed, NVIDIA#1007)
- h100-aks-ubuntu-inference-dynamo (Azure testbed, NVIDIA#1006)
- h100-eks-ubuntu-inference-nim (validator support, NVIDIA#1010)
- rtx-pro-6000-lke-training (Linode testbed, NVIDIA#1008)
- rtx-pro-6000-lke-ubuntu-training (Linode testbed, NVIDIA#1008)
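To make the 10 -> 5 arithmetic concrete, here is a hypothetical Go sketch that applies the exemption to the ten overlays named in this PR. The `overlay` type, the `exempt` and `strictFailures` helpers, and the accelerator strings are illustrative assumptions (the concrete accelerators are inferred from the overlay names, and `"any"` is an assumed spelling of `CriteriaAcceleratorAny`). The strict/warn branch only approximates how `AICR_VALIDATION_FLOOR_STRICT=1` might be consulted; it is not the real test.

```go
package main

import (
	"fmt"
	"os"
)

// overlay is a hypothetical record: overlay name plus its resolved
// accelerator ("" means unset).
type overlay struct {
	Name        string
	Accelerator string
}

// exempt mirrors the rule described above: accelerator-unbound overlays
// carry no perf floor at the intent layer.
func exempt(o overlay) bool {
	return o.Accelerator == "" || o.Accelerator == "any"
}

// strictFailures returns the names of overlays that lack a performance
// block and are not exempt.
func strictFailures(missingPerf []overlay) []string {
	var out []string
	for _, o := range missingPerf {
		if !exempt(o) {
			out = append(out, o.Name)
		}
	}
	return out
}

func main() {
	// The ten overlays that lacked a perf block before this PR.
	missingPerf := []overlay{
		{"aks-training", ""}, {"eks-training", ""}, {"gke-cos-training", ""},
		{"lke-training", ""}, {"oke-training", ""},
		{"gb200-oke-ubuntu-inference-dynamo", "gb200"},
		{"h100-aks-ubuntu-inference-dynamo", "h100"},
		{"h100-eks-ubuntu-inference-nim", "h100"},
		{"rtx-pro-6000-lke-training", "rtx-pro-6000"},
		{"rtx-pro-6000-lke-ubuntu-training", "rtx-pro-6000"},
	}
	failures := strictFailures(missingPerf)
	fmt.Printf("%d -> %d\n", len(missingPerf), len(failures)) // prints "10 -> 5"

	// Hypothetical reporting: fail in strict mode, otherwise only warn.
	strict := os.Getenv("AICR_VALIDATION_FLOOR_STRICT") == "1"
	for _, name := range failures {
		if strict {
			fmt.Println("FAIL:", name)
		} else {
			fmt.Println("warn:", name)
		}
	}
}
```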
Force-pushed from bb5ffbe to 60a0903.
Summary
Refine `pkg/recipe/validation_phase_floor_test.go::requiresPerformance` to exempt accelerator-unbound classifications (`Criteria.Accelerator` unset or `CriteriaAcceleratorAny`). The perf threshold is accelerator-specific (H100 EKS ≥ 300 GB/s NCCL, GB200 EKS ≥ 40 / 500 GB/s NET / NVLS, RTX Pro 6000 ≥ TBD), so no meaningful constraint exists at the intent layer; concrete-leaf descendants continue to carry their own threshold via per-phase replace.

Drops `AICR_VALIDATION_FLOOR_STRICT=1` warnings from 10 → 5 by exempting the 5 accelerator-unbound intermediates (`aks-training`, `eks-training`, `gke-cos-training`, `lke-training`, `oke-training`).

Motivation / Context
Closes #1005. The original PR description claimed 10 → 4 by also adding an `inference-perf` placeholder to `h100-eks-ubuntu-inference-nim`. Codex's review (P2) flagged that the `inference-perf` check in `validators/performance/inference_perf_constraint.go:195-197` short-circuits with `"skipped - dynamo-platform not in recipe components"` whenever the resolved recipe lacks `dynamo-platform`. NIM recipes declare `k8s-nim-operator` instead, so the placeholder would satisfy the letter of the floor test but never actually run. The NIM block was reverted; the validator extension is tracked in #1010.

Fixes: #1005
Related: #969, #1001, #1006, #1007, #1008, #1010
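Why the NIM placeholder would never run can be illustrated with a hypothetical Go sketch of the guard described above. `evaluateInferencePerf`, `checkResult`, and `hasComponent` are invented names, not the contents of `inference_perf_constraint.go`; only the skip message and the `dynamo-platform` / `k8s-nim-operator` component names come from this PR.

```go
package main

import "fmt"

// checkResult is a hypothetical result type for a constraint check.
type checkResult struct {
	Skipped bool
	Message string
}

// hasComponent reports whether the resolved recipe lists the named component.
func hasComponent(components []string, name string) bool {
	for _, c := range components {
		if c == name {
			return true
		}
	}
	return false
}

// evaluateInferencePerf models the short-circuit: without dynamo-platform,
// the check is skipped before any measurement happens.
func evaluateInferencePerf(components []string) checkResult {
	if !hasComponent(components, "dynamo-platform") {
		return checkResult{Skipped: true, Message: "skipped - dynamo-platform not in recipe components"}
	}
	// Placeholder: a real implementation would run the benchmark here.
	return checkResult{Message: "ran inference-perf"}
}

func main() {
	dynamo := evaluateInferencePerf([]string{"dynamo-platform"})
	nim := evaluateInferencePerf([]string{"k8s-nim-operator"})
	fmt.Println(dynamo.Skipped, dynamo.Message)
	fmt.Println(nim.Skipped, nim.Message)
}
```

Under this sketch the NIM recipe prints `true skipped - dynamo-platform not in recipe components`, so a placeholder block would count toward the floor without ever producing a measurement.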
Type of Change
Component(s) Affected
- `cmd/aicr`, `pkg/cli`
- `cmd/aicrd`, `pkg/api`, `pkg/server`
- `pkg/recipe`: floor test refinement
- `pkg/validator`: floor contract change

Implementation Notes
- `requiresPerformance` ordering: the accelerator-unset check sits after the Kind check (Kind classifications already return false earlier) and before the intent-specific branches.
- The `Accelerator` field uses `CriteriaAcceleratorType`; the empty string covers the YAML-unset case, and `CriteriaAcceleratorAny` covers the explicit `accelerator: any` case (used in wildcard overlays, though those are already filtered out at `enumerateGateableOverlays`).
- `TestClassifyOverlay` is updated with the new `accelerator` field on existing cases (all `CriteriaAcceleratorH100`, since their resolved-criteria origins all bind H100) and three new cases covering the exempted paths.
- The `gb200-eks-ubuntu-inference-dynamo` overlay already has a real `inference-perf` block (added out-of-band), and the NIM equivalent waits on #1010 (feat(validator): support NIM workloads in inference-perf so NIM recipes can ship a real performance gate).

Remaining strict-mode failures after this PR (5)
- `gb200-oke-ubuntu-inference-dynamo`
- `h100-aks-ubuntu-inference-dynamo`
- `h100-eks-ubuntu-inference-nim`
- `rtx-pro-6000-lke-training`
- `rtx-pro-6000-lke-ubuntu-training`

Testing
Risk Assessment
Rollout notes: No migration. The floor-test contract is internal CI machinery; no user-facing API surface changes.
Checklist
- `make test` with `-race`
- `make lint`
- `TestClassifyOverlay`
- `git commit -S`