feat(validator): support NIM workloads in inference-perf so NIM recipes can ship a real performance gate #1010

@yuanchen8911

Summary

The inference-perf check in `validators/performance/inference_perf_constraint.go:193-197` short-circuits with `status: "skipped - dynamo-platform not in recipe components"` whenever the resolved recipe's `componentRefs` lacks `dynamo-platform`. NIM recipes (`h100-eks-ubuntu-inference-nim` and any future NIM leaves) declare `k8s-nim-operator` instead, so the validator silently skips them. Adding a placeholder `inference-perf` block to a NIM overlay would satisfy the letter of the floor test but ship no real runtime gate.
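
To make the failure mode concrete, here is a minimal sketch of the short-circuit described above. It is not the code at the cited lines: `componentRef`, `hasComponent`, `inferencePerfStatus`, and the `"run"` return value are hypothetical names chosen for illustration; only the skip message and the component names come from this issue.

```go
package main

import "fmt"

// componentRef is a hypothetical stand-in for an entry in the resolved
// recipe's componentRefs; the real type in the validator may differ.
type componentRef struct {
	Name string
}

// hasComponent reports whether the recipe declares the named component.
func hasComponent(refs []componentRef, name string) bool {
	for _, r := range refs {
		if r.Name == name {
			return true
		}
	}
	return false
}

// inferencePerfStatus mirrors the behavior described in this issue: without
// dynamo-platform the check reports "skipped" and never runs a benchmark.
// The "run" value is a placeholder for "proceed with the benchmark".
func inferencePerfStatus(refs []componentRef) string {
	if !hasComponent(refs, "dynamo-platform") {
		return "skipped - dynamo-platform not in recipe components"
	}
	return "run"
}

func main() {
	nim := []componentRef{{Name: "k8s-nim-operator"}}
	dynamo := []componentRef{{Name: "dynamo-platform"}}
	fmt.Println(inferencePerfStatus(nim))    // the NIM recipe is skipped
	fmt.Println(inferencePerfStatus(dynamo)) // the Dynamo recipe proceeds
}
```

A NIM overlay that declares the check therefore passes any test that only looks for the declaration, while the validator itself returns before measuring anything.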

Motivation / Context

Surfaced during Codex review of #1009. The original PR added a placeholder `inference-perf` block to `h100-eks-ubuntu-inference-nim`, which the strict floor accepted but the validator would never actually run. PR #1009 was updated to revert the NIM block (see the #1009 commit history), leaving NIM coverage genuinely absent until this issue is closed.

Contributor docs (`docs/contributor/validations.md`, search for `inference-perf`) currently describe the check as applying to inference recipes with Dynamo, using a `DynamoGraphDeployment` workload. The skip behavior is intentional under that contract, but it means every NIM recipe ships without performance validation.

Proposed scope

Pick one of the following directions (file additional follow-ups if the chosen direction isn't a single PR):

Direction A — Extend `inference-perf` to NIM

  • Detect `k8s-nim-operator` (or `dynamo-platform`) at runtime; pick the corresponding deployment path.
  • For NIM, deploy a representative `NIMService` (or equivalent CR per the NIM operator schema) and benchmark against it instead of a `DynamoGraphDeployment`.
  • Same output metrics (`inference-throughput`, `inference-ttft-p99`) so existing constraint names continue to work.

Direction B — Introduce a sibling check (e.g., `nim-inference-perf`)

  • Mirrors the existing one but only runs when `k8s-nim-operator` is in components.
  • NIM overlays declare `checks: [nim-inference-perf]` instead.
  • Pros: keeps the two runtimes' deployment surfaces separate.
  • Cons: requires constraint-name divergence if metrics differ.
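
Direction B's gating might be sketched as a small registry in which each check names the component it requires. The `check` type, `registry`, and `runnable` are hypothetical and do not reflect the validator's actual structure; `nim-inference-perf` is the example name proposed above.

```go
package main

import "fmt"

// check pairs a check name with the component that must be present
// in the recipe for the check to run.
type check struct {
	Name              string
	RequiredComponent string
}

// registry lists the existing check and the proposed sibling.
var registry = []check{
	{Name: "inference-perf", RequiredComponent: "dynamo-platform"},
	{Name: "nim-inference-perf", RequiredComponent: "k8s-nim-operator"},
}

// runnable returns the declared checks whose required component is present.
// Declared checks without their component are dropped (that is, skipped).
func runnable(declared, components []string) []string {
	present := make(map[string]bool, len(components))
	for _, c := range components {
		present[c] = true
	}
	want := make(map[string]bool, len(declared))
	for _, d := range declared {
		want[d] = true
	}
	var out []string
	for _, c := range registry {
		if want[c.Name] && present[c.RequiredComponent] {
			out = append(out, c.Name)
		}
	}
	return out
}

func main() {
	nimComponents := []string{"k8s-nim-operator"}
	fmt.Println(runnable([]string{"nim-inference-perf"}, nimComponents))
	fmt.Println(runnable([]string{"inference-perf"}, nimComponents))
}
```

The second call returns no checks, which is the silent skip this issue describes. Under Direction B the overlay avoids it by declaring the sibling check instead.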

Direction C — Generic harness with pluggable runtimes

  • Refactor `inference-perf` to a small dispatch table keyed on detected runtime.
  • Future runtimes (e.g., vLLM Production Stack, TRT-LLM) plug in by adding a deploy + collect implementation.
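
The dispatch table in Direction C could take roughly the following shape. This is a sketch only: `runtimeDriver`, `stubDriver`, `drivers`, `selectDriver`, and `metrics` are hypothetical names, and the stub performs no cluster calls. Keys are visited in sorted order so selection is deterministic when more than one runtime component is present.

```go
package main

import (
	"fmt"
	"sort"
)

// metrics holds the two outputs named in this issue.
type metrics struct {
	Throughput float64 // reported as inference-throughput
	TTFTP99    float64 // reported as inference-ttft-p99
}

// runtimeDriver is the pluggable unit: a deploy step plus a collect step.
type runtimeDriver interface {
	Deploy() error
	Collect() (metrics, error)
}

// stubDriver is a placeholder; workload records the custom resource kind
// a real driver would create.
type stubDriver struct{ workload string }

func (d stubDriver) Deploy() error             { return nil }
func (d stubDriver) Collect() (metrics, error) { return metrics{}, nil }

// drivers is the dispatch table, keyed on the component that identifies
// each runtime. A new runtime plugs in by adding an entry.
var drivers = map[string]runtimeDriver{
	"dynamo-platform":  stubDriver{workload: "DynamoGraphDeployment"},
	"k8s-nim-operator": stubDriver{workload: "NIMService"},
}

// selectDriver returns the first registered driver, in sorted key order,
// whose component appears in the recipe.
func selectDriver(components []string) (string, runtimeDriver, bool) {
	present := make(map[string]bool, len(components))
	for _, c := range components {
		present[c] = true
	}
	keys := make([]string, 0, len(drivers))
	for k := range drivers {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		if present[k] {
			return k, drivers[k], true
		}
	}
	return "", nil, false
}

func main() {
	key, d, ok := selectDriver([]string{"k8s-nim-operator"})
	if !ok {
		fmt.Println("skipped: no supported runtime")
		return
	}
	if err := d.Deploy(); err != nil {
		fmt.Println("deploy failed:", err)
		return
	}
	m, err := d.Collect()
	if err != nil {
		fmt.Println("collect failed:", err)
		return
	}
	fmt.Printf("%s: throughput=%v ttft-p99=%v\n", key, m.Throughput, m.TTFTP99)
}
```

Keeping `metrics` common to every driver is what lets existing constraint names keep working regardless of which runtime produced the numbers.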

A is the smallest change for now; C is the cleanest if more runtimes are coming.

Done when

  • An `inference-perf` (or sibling) gate produces real numbers for a representative NIM microservice on H100 / EKS / Ubuntu.
  • `h100-eks-ubuntu-inference-nim.yaml` declares the matching `performance.checks` block with empirically-grounded thresholds (or smoke-test floors, with a comment to that effect).
  • The strict-mode floor test (`AICR_VALIDATION_FLOOR_STRICT=1 go test ./pkg/recipe/... -run TestOverlayValidationPhaseFloor`) no longer flags `h100-eks-ubuntu-inference-nim`.

Out of scope (track separately)

  • Multi-model NIM benchmarking — pick one model (e.g., Qwen3 or Llama 3.1) to establish baseline; extend later.
  • AKS / GKE / OKE NIM leaves — file as testbed availability lands.

Labels: area/validator, enhancement (New feature or request), theme/validation (Constraint evaluation, health checks, and conformance evidence)