Summary
The `inference-perf` check in `validators/performance/inference_perf_constraint.go:193-197` short-circuits with `status: "skipped - dynamo-platform not in recipe components"` whenever the resolved recipe's `componentRefs` lacks `dynamo-platform`. A NIM recipe (`h100-eks-ubuntu-inference-nim` and any future NIM leaves) declares `k8s-nim-operator` instead, so the validator silently skips. Adding a placeholder `inference-perf` block to a NIM overlay would satisfy the letter of the floor test but ship no real runtime gate.
Motivation / Context
Surfaced during Codex review of #1009. The original PR added a placeholder `inference-perf` block to `h100-eks-ubuntu-inference-nim`, which the strict floor test accepted but the validator would never actually run. PR #1009 was then updated to revert the NIM block (see the #1009 commit history), leaving NIM coverage absent until this issue is closed.
Contributor docs (`docs/contributor/validations.md`, search for `inference-perf`) currently describe the check as requiring inference plus Dynamo, with a `DynamoGraphDeployment` workload. The skip behavior is intentional under that contract, but it means every NIM recipe ships without performance validation.
Proposed scope
Pick one of the following directions (file additional follow-ups if the chosen direction isn't a single PR):
Direction A — Extend `inference-perf` to NIM
- Detect `k8s-nim-operator` (or `dynamo-platform`) at runtime; pick the corresponding deployment path.
- For NIM, deploy a representative `NIMService` (or equivalent CR per the NIM operator schema) and benchmark against it instead of a `DynamoGraphDeployment`.
- Emit the same output metrics (`inference-throughput`, `inference-ttft-p99`) so existing constraint names continue to work.
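A minimal sketch of the runtime detection that Direction A describes, not the validator's actual code. The component names `dynamo-platform` and `k8s-nim-operator` and the resource kinds come from this issue. Everything else (`Runtime`, `detectRuntime`, `workloadKind`, the `gpu-operator` entry, and the reworded skip message) is hypothetical and only for illustration.

```go
package main

import "fmt"

// Runtime names the inference stack a recipe deploys (hypothetical type).
type Runtime string

const (
	RuntimeNone   Runtime = ""
	RuntimeDynamo Runtime = "dynamo"
	RuntimeNIM    Runtime = "nim"
)

// detectRuntime returns the first supported runtime found in the recipe's
// component names. If both components are present, list order decides.
func detectRuntime(components []string) Runtime {
	for _, c := range components {
		switch c {
		case "dynamo-platform":
			return RuntimeDynamo
		case "k8s-nim-operator":
			return RuntimeNIM
		}
	}
	return RuntimeNone
}

// workloadKind maps a runtime to the custom resource the check would deploy.
// The second return value is false when there is nothing to deploy.
func workloadKind(rt Runtime) (string, bool) {
	switch rt {
	case RuntimeDynamo:
		return "DynamoGraphDeployment", true
	case RuntimeNIM:
		return "NIMService", true
	}
	return "", false
}

func main() {
	recipes := [][]string{
		{"gpu-operator", "k8s-nim-operator"}, // "gpu-operator" is a placeholder name
		{"gpu-operator", "dynamo-platform"},
		{"gpu-operator"},
	}
	for _, comps := range recipes {
		kind, ok := workloadKind(detectRuntime(comps))
		if !ok {
			// Hypothetical wording; the current message names dynamo-platform only.
			fmt.Println("skipped - no supported inference runtime in recipe components")
			continue
		}
		fmt.Println("deploy", kind)
	}
}
```

Under this shape the skip path stays, but it triggers only when neither runtime is declared, so a NIM recipe would reach a real deployment path instead of skipping silently.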
Direction B — Introduce a sibling check (e.g., `nim-inference-perf`)
- Mirrors the existing one but only runs when `k8s-nim-operator` is in components.
- NIM overlays declare `checks: [nim-inference-perf]` instead.
- Pros: keeps the two runtimes' deployment surfaces separate.
- Cons: requires constraint-name divergence if metrics differ.
Direction C — Generic harness with pluggable runtimes
- Refactor `inference-perf` to a small dispatch table keyed on detected runtime.
- Future runtimes (e.g., vLLM Production Stack, TRT-LLM) plug in by adding a deploy + collect implementation.
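One possible shape for the dispatch table in Direction C, sketched under assumptions. The `runtimeDriver` interface, `stubDriver`, `drivers`, `selectDriver`, `runCheck`, and the `Metrics` struct are all hypothetical names; only the component names, resource kinds, and metric names are taken from this issue. The stub drivers deploy nothing and return zero metrics.

```go
package main

import (
	"errors"
	"fmt"
)

// Metrics carries the two values the existing constraints are named after.
type Metrics struct {
	Throughput float64 // inference-throughput
	TTFTP99    float64 // inference-ttft-p99
}

// runtimeDriver is a hypothetical per-runtime plug-in: deploy + collect.
type runtimeDriver interface {
	Kind() string
	Deploy() error
	Collect() (Metrics, error)
}

var errNoRuntime = errors.New("no supported inference runtime in recipe components")

// stubDriver stands in for a real implementation; it performs no cluster calls.
type stubDriver struct{ kind string }

func (s stubDriver) Kind() string              { return s.kind }
func (s stubDriver) Deploy() error             { return nil }
func (s stubDriver) Collect() (Metrics, error) { return Metrics{}, nil }

// drivers is the dispatch table, keyed on the recipe component that
// signals each runtime. A new runtime plugs in by adding one entry.
var drivers = map[string]runtimeDriver{
	"dynamo-platform":  stubDriver{kind: "DynamoGraphDeployment"},
	"k8s-nim-operator": stubDriver{kind: "NIMService"},
}

// selectDriver returns the driver for the first recognized component.
func selectDriver(components []string) (runtimeDriver, error) {
	for _, c := range components {
		if d, ok := drivers[c]; ok {
			return d, nil
		}
	}
	return nil, errNoRuntime
}

// runCheck deploys the runtime's workload and collects its metrics.
func runCheck(components []string) (Metrics, error) {
	d, err := selectDriver(components)
	if err != nil {
		return Metrics{}, err
	}
	if err := d.Deploy(); err != nil {
		return Metrics{}, fmt.Errorf("deploy %s: %w", d.Kind(), err)
	}
	return d.Collect()
}

func main() {
	if _, err := runCheck([]string{"k8s-nim-operator"}); err != nil {
		fmt.Println("check did not run:", err)
		return
	}
	fmt.Println("check ran against the NIM driver")
}
```

Keeping `Metrics` runtime-agnostic is what would let the existing constraint names stay unchanged; a runtime that cannot report one of the two values would need a separate decision.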
A is the smallest change for now; C is the cleanest if more runtimes are coming.
Done when
- An `inference-perf` (or sibling) gate produces real numbers for a representative NIM microservice on H100 / EKS / Ubuntu.
- `h100-eks-ubuntu-inference-nim.yaml` declares the matching `performance.checks` block with empirically-grounded thresholds (or smoke-test floors, with a comment to that effect).
- The strict-mode floor test (`AICR_VALIDATION_FLOOR_STRICT=1 go test ./pkg/recipe/... -run TestOverlayValidationPhaseFloor`) no longer flags `h100-eks-ubuntu-inference-nim`.
Out of scope (track separately)
- Multi-model NIM benchmarking: pick one model (e.g., Qwen3 or Llama 3.1) to establish a baseline; extend later.
- AKS / GKE / OKE NIM leaves: file as testbeds become available.
Related