Summary
Add an NCCL all-reduce-bw performance constraint to the RTX Pro 6000 / LKE training overlays once a Linode LKE testbed is available to produce empirically-grounded thresholds.
Affected overlays
| Overlay | Required performance constraint kind |
|---|---|
| `rtx-pro-6000-lke-training` | `nccl-all-reduce-bw` |
| `rtx-pro-6000-lke-ubuntu-training` | `nccl-all-reduce-bw` |
These are the only LKE overlays flagged by the strict-mode floor today (`AICR_VALIDATION_FLOOR_STRICT=1`). The inference counterparts (`rtx-pro-6000-lke-inference`, `rtx-pro-6000-lke-ubuntu-inference`) are not gated on performance, because single-node inference is the primary RTX Pro 6000 use case there.
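A minimal Python sketch of what the strict-mode floor amounts to for these overlays. It assumes a hypothetical in-memory model: the `Overlay` dataclass, its `intent` and `performance_checks` fields, and the `floor_violations` / `strict_mode_enabled` helpers are illustrative names, not the actual AICR code. Only the env var and the check name come from this issue.

```python
import os
from dataclasses import dataclass, field

# Check kind the floor requires for training overlays (from this issue).
REQUIRED_TRAINING_CHECK = "nccl-all-reduce-bw"


@dataclass
class Overlay:
    # Hypothetical model: overlay name, intent ("training" or "inference"),
    # and the performance checks the overlay declares.
    name: str
    intent: str
    performance_checks: list = field(default_factory=list)


def floor_violations(overlays):
    """Return names of training overlays that lack the NCCL all-reduce-bw check."""
    return [
        o.name
        for o in overlays
        if o.intent == "training" and REQUIRED_TRAINING_CHECK not in o.performance_checks
    ]


def strict_mode_enabled(env=os.environ):
    # Strict mode is opted into with AICR_VALIDATION_FLOOR_STRICT=1.
    return env.get("AICR_VALIDATION_FLOOR_STRICT") == "1"


overlays = [
    Overlay("rtx-pro-6000-lke-training", "training"),
    Overlay("rtx-pro-6000-lke-ubuntu-training", "training"),
    Overlay("rtx-pro-6000-lke-inference", "inference"),
    Overlay("rtx-pro-6000-lke-ubuntu-inference", "inference"),
]

if strict_mode_enabled({"AICR_VALIDATION_FLOOR_STRICT": "1"}):
    # Only the two training overlays are reported; inference is not gated.
    print(floor_violations(overlays))
```

Once both training overlays declare `nccl-all-reduce-bw`, `floor_violations` returns an empty list, which mirrors the "Done when" condition below.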
Blocker
There is no Linode LKE testbed available today for running NCCL benchmarks on multi-node RTX Pro 6000 instances. The constraint value depends on the interconnect Linode actually exposes for these nodes, which could be plain Ethernet (sub-50 GB/s expected) or a higher-bandwidth fabric, so we shouldn't pick a threshold blind.
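For reading benchmark output against the interconnect: nccl-tests reports all-reduce results as algbw = S / t and busbw = algbw × 2(n − 1) / n, where S is the message size in bytes, t the elapsed time, and n the number of ranks. Dividing a NIC line rate in Gb/s by 8 gives GB/s, so a 400 Gb/s link is 50 GB/s and a 100 Gb/s link is 12.5 GB/s. The sketch below assumes the `nccl-all-reduce-bw` constraint is compared against busbw and that the inter-node network is the bottleneck; the rank count and message size in the usage line are illustrative, not LKE measurements.

```python
def link_rate_gbytes(gbits_per_s: float) -> float:
    """Convert a NIC line rate in Gb/s to GB/s (8 bits per byte)."""
    return gbits_per_s / 8.0


def allreduce_bus_bw(size_bytes: int, seconds: float, n_ranks: int) -> float:
    """All-reduce bus bandwidth in GB/s, following the nccl-tests definition:
    algbw = S / t, busbw = algbw * 2 * (n - 1) / n."""
    if n_ranks < 2:
        raise ValueError("all-reduce bus bandwidth needs at least 2 ranks")
    alg_bw = size_bytes / seconds / 1e9
    return alg_bw * 2 * (n_ranks - 1) / n_ranks


# Illustrative inputs only: an 8 GiB all-reduce over 16 ranks taking 0.5 s.
print(allreduce_bus_bw(8 * 2**30, 0.5, 16))
print(link_rate_gbytes(400))
```

If measured busbw sits near the per-node line rate, the run is network-bound, which is the expected outcome for an Ethernet-only LKE setup.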
Design notes
- RTX Pro 6000 is a workstation-class card; multi-node NCCL throughput on LKE will likely be Ethernet-bound rather than RDMA-bound. The threshold should reflect realistic Linode networking, likely much lower than the 300 GB/s figure for H100 on EKS.
- If LKE multi-node training turns out to be impractical (no high-bandwidth interconnect), an alternative is to mark these overlays as `intent: training` but single-node-only and exempt them from the multi-node NCCL floor. File a follow-up if that's the conclusion.
Done when
- Linode LKE testbed produces baseline NCCL all-reduce-bw numbers on multi-node RTX Pro 6000 instances.
- Both overlays gain a `performance.checks: [nccl-all-reduce-bw]` block with an empirically tuned constraint.
- The overlays disappear from the `AICR_VALIDATION_FLOOR_STRICT=1` floor test output.
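One possible way to turn baseline runs into an "empirically tuned" constraint, sketched in Python. The policy here (median busbw across repeated runs, minus a safety margin, rounded down to a whole GB/s) and the `suggest_threshold` name are hypothetical choices for illustration, not an agreed AICR rule; the sample values in the usage line are placeholders, not measurements.

```python
import math
import statistics


def suggest_threshold(busbw_samples_gbps, margin=0.2):
    """Hypothetical policy: median busbw (GB/s) over repeated baseline runs,
    reduced by a safety margin and rounded down to a whole GB/s."""
    if not busbw_samples_gbps:
        raise ValueError("need at least one baseline measurement")
    if not 0 <= margin < 1:
        raise ValueError("margin must be in [0, 1)")
    # The median is less sensitive to a single noisy run than the mean or min.
    median = statistics.median(busbw_samples_gbps)
    return math.floor(median * (1 - margin))


# Placeholder inputs only; real values come from the LKE testbed.
print(suggest_threshold([20.0, 22.0, 21.0]))
```

The margin trades false failures on a noisy shared network against catching genuine regressions; it should be revisited once real run-to-run variance on LKE is known.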
Related