RELOPS-2372: bump NVIDIA A10 GRID driver to 573.96 (vGPU 18.x) #1218
Merged
Azure is retiring vGPU 17.x for NVadsA10_v5 VMs on 2026-06-15 ahead of the vGPU 20.x (R595.x) host driver rollout (Service Health tracking ID 0YSB-WGZ). 553.62 sits in the 17.x branch and must move to 18.x before the deadline to avoid worker interruptions when Azure ships the host update.

573.96 is the current Azure-redistributed GRID 18.6 build for Windows 11 25H2. It drops Server 2019 from the installer's OS list (vGPU 18.x does not support 2019), which is fine for the 25H2 pool.

Also raised the kitchen serverspec check for the downloaded installer to assert a non-trivial size (>100 MB), so a placeholder or truncated file on the blob mirror is caught early. Full install verification still has to happen at worker-image-validation time because the GRID installer needs a reboot to complete.

Open follow-ups tracked on RELOPS-2372:
- Upload 573.96 .exe to the windows.ext_pkg_src Azure blob before merge
- Confirm whether win11-64-24h2-gpu also runs on A10 hardware; if so, the gpu.* entry (538.15) needs the same upgrade
Serverspec's winrm backend raises NotImplementedError on file.size, so the size check has to go through Get-Item .Length instead.
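A minimal sketch of how a size check routed through `Get-Item` might look. This is not the spec from the PR: the helper names are hypothetical, and reading "100 MB" as 100 * 1024 * 1024 bytes is an assumption.

```ruby
# Hypothetical helpers; the PR's actual spec code is not shown on this page.

# Assumption: "100 MB" is read as 100 MiB. The PR only says "> 100 MB".
MIN_INSTALLER_BYTES = 100 * 1024 * 1024

# Build a PowerShell expression that prints the file size in bytes.
# Single quotes are doubled, which is how PowerShell escapes them
# inside a single-quoted string.
def installer_size_command(path)
  escaped = path.gsub("'", "''")
  "(Get-Item -LiteralPath '#{escaped}').Length"
end

# Interpret the command's stdout. Anything that is not a plain integer
# (for example a PowerShell error message) counts as a failure.
def installer_large_enough?(stdout, min_bytes = MIN_INSTALLER_BYTES)
  text = stdout.to_s.strip
  return false unless text.match?(/\A\d+\z/)

  text.to_i > min_bytes
end

# Assumed usage inside a serverspec example group:
#   describe command(installer_size_command(installer_path)) do
#     its(:stdout) { should satisfy { |out| installer_large_enough?(out) } }
#   end
```

Keeping the parsing in a plain Ruby method means the threshold logic can be exercised without a WinRM connection. A 0-byte placeholder and a missing file both fail the check, because neither produces a number above the threshold.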
Integration test results: win11-64-25h2-alpha
Workflow run: https://github.com/mozilla-platform-ops/worker-images/actions/runs/26103935100
Result: 52/54 passed, 2 failed
Both failures are pre-existing and unrelated to this change.
Summary
- Bump `gpu_a10` in `data/os/Windows.yaml` from GRID 553.62 (vGPU 17.x) to 573.96 (vGPU 18.x) ahead of Azure's 2026-06-15 deadline for NVadsA10_v5 VMs (Service Health tracking ID `0YSB-WGZ`). After that date, Azure begins rolling out the vGPU 20.x (R595.x) host driver, which is incompatible with anything older than 18.x.
- Tighten the serverspec check for `win116425h2azure` to assert the downloaded installer is > 100 MB. The previous check only confirmed the file existed, which would still pass if the blob mirror served a placeholder or 0-byte file.

Blockers before merge
- The `573.96_grid_win10_win11_server2022_server2025_dch_64bit_international_azure_swl.exe` installer must be uploaded to the `windows.ext_pkg_src` Azure blob mirror first. Source: Azure N-series Windows driver setup (direct link). Until then, the `file { $driver_exe: source => ... }` resource will fail during converge.

Test plan
- Upload the installer to `windows.ext_pkg_src`
- `kitchen-windows.yml` converge + verify on `win11-64-25h2`
- `win11-64-25h2-gpu` worker image with the new driver
- `nvidia-smi` returns version `573.96` after first boot

Out of scope
- The `gpu.*` entry (538.15) used by `win11-64-24h2-gpu` via `win116424h2azure`. Need to confirm whether that pool runs on A10 hardware. If it does, it needs the same upgrade. Tracked on RELOPS-2372.
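
For the test-plan item that checks the `nvidia-smi` version after first boot, a small sketch of how the comparison could be automated. It assumes the output comes from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`, which prints one driver version per GPU. The helper name is hypothetical and not part of this PR.

```ruby
# Hypothetical helper; not part of the PR.

# Target version taken from this PR.
EXPECTED_DRIVER = '573.96'

# Expects the stdout of:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
# Returns true only if at least one GPU is listed and every listed GPU
# reports the expected driver version.
def driver_versions_match?(smi_output, expected = EXPECTED_DRIVER)
  versions = smi_output.to_s.lines.map(&:strip).reject(&:empty?)
  !versions.empty? && versions.all? { |version| version == expected }
end
```

Treating empty output as a failure matters here: if the driver did not finish installing, `nvidia-smi` may print nothing useful, and that should not be mistaken for a pass.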