
feat(recipes): add concrete L40 / L40S service-bound overlays #1003

@yuanchen8911


Summary

The L40 accelerator is declared in pkg/recipe/criteria.go (CriteriaAcceleratorL40 = "l40") but has zero overlays in recipes/overlays/. A user running aicr recipe --accelerator l40 --service <any> cannot resolve a usable recipe.
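The gap can be illustrated with a minimal sketch that counts overlay files per accelerator. It assumes overlay file names start with the accelerator value followed by a dash (inferred from the file names proposed in this issue). The helper overlaysFor and the H100 file names in the sample listing are hypothetical, not taken from the repository.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// overlaysFor returns the overlay files whose first dash-separated token
// equals the accelerator value. The "<accelerator>-..." naming convention
// is an assumption, not something confirmed by the repository.
func overlaysFor(accelerator string, files []string) []string {
	var matches []string
	for _, f := range files {
		base := strings.TrimSuffix(filepath.Base(f), filepath.Ext(f))
		first, _, _ := strings.Cut(base, "-")
		if first == accelerator {
			matches = append(matches, f)
		}
	}
	return matches
}

func main() {
	// Hypothetical listing; a real audit could build this from
	// os.ReadDir("recipes/overlays").
	files := []string{
		"recipes/overlays/h100-eks-inference.yaml",
		"recipes/overlays/h100-eks-ubuntu-inference.yaml",
	}
	fmt.Println(len(overlaysFor("l40", files))) // prints 0
}
```

Matching on the whole first token rather than a string prefix keeps a hypothetical l40s-*.yaml file from being counted under l40.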

Motivation / Context

Surfaced as an explicit "out of scope (track separately)" item in #969 — the validation phase coverage audit excluded L40 because no overlays exist to audit. Filing this so the gap has a dedicated tracker.

L40 / L40S are workhorse inference cards (Ada Lovelace, 48 GB) and a common cost-per-throughput choice for inference workloads across cloud providers. Without overlays, AICR cannot serve the inference user segment without manual recipe authoring.

L40 / L40S cloud SKU availability

| Cloud | SKU |
|---|---|
| AWS EKS | g6e.{2,4,8,12,16,24,48}xlarge (L40S, 1–8x) |
| GCP GKE | g2-standard-* (L4 and L40S variants) |
| Azure AKS | Standard_NV*ads_A10_v5 (A10; note: not L40), L40S on newer NVads-v6 series as it lands |
| OCI OKE | BM.GPU.L40S.4 (4x L40S 48GB) |
| Lambda Labs / CoreWeave / etc. | L40S widely available |

Note: This issue intentionally groups L40 and L40S because the criteria type is a single l40 enum value. If L40S vs L40 needs distinct overlays (different power/TDP, different NCCL profiles), file a separate issue to split the enum.

Suggested scope

Minimum viable for the first PR, extensions as follow-ups:

PR 1 (minimum): L40S on EKS (best-attested cloud):

  • l40-eks-inference.yaml (primary use case)
  • l40-eks-ubuntu-inference.yaml
  • l40-eks-ubuntu-inference-dynamo.yaml and/or l40-eks-ubuntu-inference-nim.yaml (matches the H100 inference pattern)
  • Per-accelerator constraint: Deployment.gpu-operator.version floor — L40 supported since v23.6; recommend >= v23.9.0 baseline
  • No NCCL bandwidth threshold needed for single-node inference (the primary L40 use case); revisit if multi-node training overlays are added
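The gpu-operator version floor in the list above amounts to a version comparison. This is a rough sketch only, assuming plain vMAJOR.MINOR.PATCH strings with no pre-release suffix. The helpers parseVersion and meetsFloor are hypothetical and do not reflect how AICR actually evaluates constraints.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a string such as "v23.9.0" into up to three numeric
// parts; missing trailing parts are treated as 0.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) > 3 {
		return out, fmt.Errorf("unexpected version format: %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// meetsFloor reports whether version is greater than or equal to floor.
func meetsFloor(version, floor string) (bool, error) {
	v, err := parseVersion(version)
	if err != nil {
		return false, err
	}
	f, err := parseVersion(floor)
	if err != nil {
		return false, err
	}
	for i := range v {
		if v[i] != f[i] {
			return v[i] > f[i], nil
		}
	}
	return true, nil
}

func main() {
	const floor = "v23.9.0" // recommended baseline from this issue
	// Sample inputs chosen for illustration only.
	for _, v := range []string{"v23.6.0", "v23.9.0", "v24.3.0"} {
		ok, err := meetsFloor(v, floor)
		fmt.Println(v, ok, err)
	}
}
```

Comparing numeric parts rather than raw strings avoids ordering mistakes such as "v23.10.0" sorting before "v23.9.0".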

PR 2+: Same patterns for GKE, OKE; AKS once a non-A10 SKU is current there.
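To make "same patterns" concrete, the sketch below generates the PR 1 file names for each service. It assumes the naming scheme is accelerator, service, optional OS, intent, and optional platform joined by dashes, which is inferred from the file names listed for PR 1. The helper overlayName and the lowercase service identifiers gke and oke are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// overlayName joins the non-empty parts with "-" and appends ".yaml".
// The part order (accelerator, service, OS, intent, platform) is an
// assumption based on the file names proposed for PR 1.
func overlayName(parts ...string) string {
	var kept []string
	for _, p := range parts {
		if p != "" {
			kept = append(kept, p)
		}
	}
	return strings.Join(kept, "-") + ".yaml"
}

func main() {
	// "gke" and "oke" are assumed service identifiers for the follow-up PRs.
	for _, service := range []string{"eks", "gke", "oke"} {
		fmt.Println(overlayName("l40", service, "", "inference", ""))
		fmt.Println(overlayName("l40", service, "ubuntu", "inference", ""))
		fmt.Println(overlayName("l40", service, "ubuntu", "inference", "dynamo"))
	}
}
```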

Training overlays are deferred. L40 is rarely chosen for multi-node training; if a use case emerges, file a follow-up issue with the NCCL threshold spec.

Each PR should:

Out of scope (file separately)

  • L40 multi-node training overlays — defer until requested.
  • L40 vs L40S enum split — if SKU-specific config diverges, file a separate pkg/recipe/criteria.go change.
  • A10 / A40 / L4 — same Ada/Ampere inference-card class but distinct SKUs; not declared in criteria.go today.

Related
