feat(validation): add performance-phase constraints to AKS overlays once Azure testbed lands #1006

@yuanchen8911

Summary

Add an inference-perf performance-phase constraint to recipes/overlays/h100-aks-ubuntu-inference-dynamo.yaml once an Azure testbed is available to produce empirically-grounded thresholds.

Affected overlay

  • h100-aks-ubuntu-inference-dynamo — Dynamo inference platform on H100 / AKS / Ubuntu

This is the only AKS overlay flagged by the strict-mode floor today (AICR_VALIDATION_FLOOR_STRICT=1). The training counterparts (h100-aks-training, h100-aks-ubuntu-training, h100-aks-ubuntu-training-kubeflow) already carry NCCL thresholds (>= 100 GB/s, sized for Azure's IB topology).

Blocker

No Azure testbed is available today for running inference-perf (Qwen/Qwen3-0.6B at concurrency=16/GPU) on AKS H100 nodes. Without testbed access we'd be picking thresholds blind. Even the reference pattern at recipes/overlays/h100-eks-ubuntu-inference-dynamo.yaml:65-79 describes its values as "placeholder thresholds pending empirical tuning", and the AKS overlay deserves the same care.

Done when

  • An Azure AKS H100 testbed produces baseline inference-perf numbers (throughput tok/s, TTFT p99 ms) for Dynamo serving Qwen/Qwen3-0.6B at concurrency=16/GPU.
  • h100-aks-ubuntu-inference-dynamo.yaml gains a performance.checks: [inference-perf] block with the empirically-tuned (or smoke-floor) thresholds.
  • The overlay disappears from the AICR_VALIDATION_FLOOR_STRICT=1 floor test output.
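
As a rough sketch of what the second item could look like, the fragment below shows a possible shape for the new block. Only `performance.checks: [inference-perf]` comes from this issue. The `constraints` key, the constraint names, and the value syntax are assumptions made for illustration and should be replaced with whatever recipes/overlays/h100-eks-ubuntu-inference-dynamo.yaml actually uses. The thresholds are deliberately left unfilled until the AKS baseline exists.

```yaml
# Sketch only, not the real schema.
# Taken from this issue: performance.checks: [inference-perf]
# Hypothetical: the `constraints` key, the constraint names, and the value format.
performance:
  checks: [inference-perf]
  constraints:                       # hypothetical key
    - name: inference-throughput     # hypothetical name; throughput in tok/s
      value: ">= <TBD>"              # fill in from the AKS H100 baseline run
    - name: inference-ttft-p99-ms    # hypothetical name; TTFT p99 in ms
      value: "<= <TBD>"              # fill in from the AKS H100 baseline run
```

Leaving the values as `<TBD>` keeps the fragment from suggesting numbers that no testbed has produced yet.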
