Fix WebGPU EP crash on exit #27569

Open
fs-eire wants to merge 7 commits into main from fs-eire/fix-webgpu-crash-at-exit

@fs-eire fs-eire commented Mar 5, 2026

Description

Fixes a crash on exit in some scenarios.

  • When onnxruntime::webgpu::WebGpuContextFactory::Cleanup() is called:

    • This is when:

      • OrtEnv::~OrtEnv() is called (for embedded WebGPU EP), or
      • ReleaseEpFactory() is called (for WebGPU plugin EP)
    • This will cause:

      • the destructor of onnxruntime::webgpu::WebGpuContextFactory is called, so that:
        • the default WGPU instance is released
        • each active WebGpuContext instance is destroyed
      • the destruction may call into TLS of DXC threads, hosted in dxcompiler.dll
      • dxcompiler.dll may already be unloaded (if the cleanup is called from _at_exit)
    • Solution

      • explicitly load the DXC DLLs and keep a reference to them
      • implement the destructor of onnxruntime::webgpu::WebGpuContextFactory explicitly, so that resources are released in the expected order
  • When onnxruntime::webgpu::WebGpuContextFactory::Cleanup() is NOT called:

    • This is when the process is being killed
    • This will cause:
      • DLLs are unloaded in an unexpected order
    • Solution
      • intentionally leak static members in onnxruntime::webgpu::WebGpuContextFactory
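
The two solutions above can be sketched in portable C++. This is a minimal model, not the code in this PR: `ModuleRef`, `Context`, `ContextFactory`, and `TeardownLog` are hypothetical stand-ins, the DLL load order is assumed, and no real DLL is loaded. It shows an explicit destructor that destroys contexts first, releases the instance second, and drops module references last, and a factory held through a raw static pointer so that nothing is destroyed if `Cleanup()` never runs.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Teardown events, recorded so the order can be inspected.
// The vector is intentionally leaked (never deleted) so it stays usable
// even while other static objects are being destroyed.
std::vector<std::string>& TeardownLog() {
  static auto* log = new std::vector<std::string>();
  return *log;
}

// Hypothetical stand-in for a held DLL reference.
struct ModuleRef {
  explicit ModuleRef(std::string n) : name(std::move(n)) {}
  ~ModuleRef() { TeardownLog().push_back("unload " + name); }
  std::string name;
};

// Hypothetical stand-in for a context whose destructor needs the modules.
struct Context {
  explicit Context(int i) : id(i) {}
  ~Context() { TeardownLog().push_back("destroy context " + std::to_string(id)); }
  int id;
};

class ContextFactory {
 public:
  static ContextFactory& Get() {
    if (instance_ == nullptr) instance_ = new ContextFactory();
    return *instance_;
  }

  // Explicit teardown. If this is never called, instance_ is simply leaked
  // and no destructor runs during process exit.
  static void Cleanup() {
    delete instance_;
    instance_ = nullptr;
  }

  void CreateContext(int id) { contexts_.push_back(std::make_unique<Context>(id)); }

 private:
  ContextFactory() {
    // Assumed load order; the PR only says both DLLs are loaded and held.
    modules_.push_back(std::make_unique<ModuleRef>("dxil.dll"));
    modules_.push_back(std::make_unique<ModuleRef>("dxcompiler.dll"));
  }

  ~ContextFactory() {
    // 1. Destroy contexts while the modules are still referenced.
    while (!contexts_.empty()) contexts_.pop_back();
    // 2. Release the default instance (modelled here as a log entry).
    TeardownLog().push_back("release instance");
    // 3. Drop module references last, in reverse load order.
    assert(contexts_.empty() && "contexts must be gone before modules are dropped");
    while (!modules_.empty()) modules_.pop_back();
  }

  static ContextFactory* instance_;  // raw pointer, not a static object
  std::vector<std::unique_ptr<ModuleRef>> modules_;
  std::vector<std::unique_ptr<Context>> contexts_;
};

ContextFactory* ContextFactory::instance_ = nullptr;
```

Calling `ContextFactory::Cleanup()` after creating contexts logs the context destructions before the instance release and the module unloads. Skipping `Cleanup()` logs nothing, because the raw pointer has no destructor for the runtime to invoke at exit.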

@fs-eire fs-eire requested a review from Copilot March 5, 2026 22:57

Copilot AI left a comment

Pull request overview

This PR fixes a crash-on-exit issue in the WebGPU execution provider caused by DLL unload ordering. When WebGpuContextFactory::Cleanup() runs, dependent DLLs like dxcompiler.dll may have already been unloaded, leading to crashes during resource destruction.

Changes:

  • Explicitly loads and holds references to dxil.dll and dxcompiler.dll to prevent premature unloading.
  • Changes default_instance_ from wgpu::Instance (C++ RAII wrapper) to raw WGPUInstance for explicit lifetime control.
  • Reorders cleanup in WebGpuContextFactory::Cleanup() to ensure resources are released before DLL handles.
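
The difference between an RAII wrapper and a raw handle can be illustrated with a small sketch. Everything here is hypothetical: `FakeInstance`, `FakeCreateInstance`, `FakeInstanceRelease`, `ScopedInstance`, and `Factory` are illustrative names that model a ref-counted C handle, not the WebGPU API or the code in this PR. The point is that a wrapper releases whenever its destructor happens to run, while a raw static handle is released only when `Cleanup()` is called.

```cpp
#include <cassert>

// Hypothetical ref-counted native object behind a C-style handle.
struct FakeInstanceImpl {
  int ref_count = 1;
};
using FakeInstance = FakeInstanceImpl*;

// Number of native objects currently alive (for inspection only).
int& LiveInstances() {
  static int n = 0;
  return n;
}

FakeInstance FakeCreateInstance() {
  ++LiveInstances();
  return new FakeInstanceImpl();
}

void FakeInstanceRelease(FakeInstance h) {
  if (h == nullptr) return;
  assert(h->ref_count > 0);
  if (--h->ref_count == 0) {
    --LiveInstances();
    delete h;
  }
}

// RAII wrapper: the release happens in the destructor, whenever that runs.
// As a static member, that would be during static destruction at exit.
class ScopedInstance {
 public:
  explicit ScopedInstance(FakeInstance h) : h_(h) {}
  ~ScopedInstance() { FakeInstanceRelease(h_); }
  ScopedInstance(const ScopedInstance&) = delete;
  ScopedInstance& operator=(const ScopedInstance&) = delete;

 private:
  FakeInstance h_;
};

// Raw handle: trivially destructible, so nothing runs at exit unless
// Cleanup() is called explicitly at a point the owner chooses.
struct Factory {
  static FakeInstance default_instance;

  static void EnsureInstance() {
    if (default_instance == nullptr) default_instance = FakeCreateInstance();
  }

  static void Cleanup() {
    FakeInstanceRelease(default_instance);
    default_instance = nullptr;
  }
};

FakeInstance Factory::default_instance = nullptr;
```

With the raw handle, the owner decides where the release sits relative to other teardown steps, at the cost of having to call `Cleanup()` on every path that should release it.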

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File | Description
onnxruntime/core/providers/webgpu/webgpu_context.h | Reorders static members, changes default_instance_ to raw WGPUInstance, adds modules_ and modules_dxc_loaded_ fields.
onnxruntime/core/providers/webgpu/webgpu_context.cc | Moves context map allocation inside instance creation block, adds DXC DLL loading logic, implements explicit cleanup with correct ordering.

fs-eire and others added 5 commits March 5, 2026 15:56
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


@fs-eire fs-eire marked this pull request as ready for review March 6, 2026 00:50