Summary
Add an inference-perf performance-phase constraint to recipes/overlays/h100-aks-ubuntu-inference-dynamo.yaml once an Azure testbed is available to produce empirically-grounded thresholds.
Affected overlay
h100-aks-ubuntu-inference-dynamo — Dynamo inference platform on H100 / AKS / Ubuntu
This is the only AKS overlay flagged by the strict-mode floor today (AICR_VALIDATION_FLOOR_STRICT=1). The training counterparts (h100-aks-training, h100-aks-ubuntu-training, h100-aks-ubuntu-training-kubeflow) already carry NCCL thresholds (>= 100 GB/s, sized for Azure's IB topology).
Blocker
No Azure testbed is available today for running inference-perf (Qwen/Qwen3-0.6B at concurrency=16/GPU) on AKS H100 nodes. Without testbed access we would be picking thresholds blind. Even the reference pattern at recipes/overlays/h100-eks-ubuntu-inference-dynamo.yaml:65-79 labels its own values "placeholder thresholds pending empirical tuning"; Azure deserves the same care.
Done when
- An Azure AKS H100 testbed produces baseline inference-perf numbers (throughput in tok/s, TTFT p99 in ms) for Dynamo serving Qwen/Qwen3-0.6B at concurrency=16/GPU.
- h100-aks-ubuntu-inference-dynamo.yaml gains a performance.checks: [inference-perf] block with the empirically tuned (or smoke-floor) thresholds.
- The overlay no longer appears in the AICR_VALIDATION_FLOOR_STRICT=1 floor test output.
Related