Upstream main has had the xpu_platform_plugin function and the "xpu" entry in the builtin_platform_plugins dict since PR sgl-project#17920.
The all_to_all_4D method calls self._maybe_wait() on the output of ft_c.all_to_all_single, but the method was never defined, causing an AttributeError at runtime during multi-GPU inference.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
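For illustration only, here is a minimal sketch of what a helper like _maybe_wait could look like. It is not the code from this PR: the class names SeqAllToAllSketch and FakeAsyncResult are hypothetical, and the duck-typed check for a wait() method is an assumption standing in for whatever type check the real implementation uses on the result of the collective.

```python
class FakeAsyncResult:
    """Hypothetical stand-in for an async collective result that exposes wait()."""

    def __init__(self, value):
        self._value = value
        self.waited = False

    def wait(self):
        # Mark the result as synchronized and hand back the payload.
        self.waited = True
        return self._value


class SeqAllToAllSketch:
    """Hypothetical container for the helper; the name is not from the PR."""

    def _maybe_wait(self, tensor):
        # Assumption: an asynchronous result exposes a callable wait().
        # Anything else is treated as already materialized and returned as-is.
        wait = getattr(tensor, "wait", None)
        if callable(wait):
            return wait()
        return tensor


helper = SeqAllToAllSketch()
pending = FakeAsyncResult([1, 2, 3])
resolved = helper._maybe_wait(pending)   # waits, then returns the payload
passthrough = helper._maybe_wait(42)     # no wait() method, returned unchanged
```

Defining the helper on the same class that calls it is what removes the AttributeError; whether the real method waits on the result or simply passes it through depends on the collective API in use.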
Motivation
This PR builds on the XPU platform foundation merged in sgl-project#17920, adding the runtime-level changes needed to actually run diffusion inference on Intel XPU (Arc Pro B-series, etc.) with tensor parallelism support.
sgl-project#17920 added platform detection (XpuPlatform), the attention backend (xpu_backend.py), platform plugin registration, and basic sgl_kernel integration. This PR addresses the remaining gaps discovered during end-to-end testing on Intel Arc Pro B60 GPUs.
Test Results
Tested on Intel Arc Pro B60 with Z-Image-Turbo (BF16, 9-step turbo schedule, prompt="A golden retriever in the snow"):
TP=1:

TP=2:

Notes
All changes are gated behind current_platform.is_xpu() checks, so there is no impact on CUDA/ROCm/NPU paths.
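The gating pattern in the note above can be sketched as follows. This is a simplified illustration, not code from the PR: PlatformSketch and pick_sync_strategy are hypothetical names, and only the is_xpu() method name is taken from the description.

```python
class PlatformSketch:
    """Hypothetical platform object; only is_xpu() mirrors the PR description."""

    def __init__(self, device_type):
        self.device_type = device_type

    def is_xpu(self):
        return self.device_type == "xpu"


def pick_sync_strategy(platform):
    """Return a label for the code path taken (hypothetical helper)."""
    if platform.is_xpu():
        # XPU-only branch: new behavior lives here.
        return "xpu_path"
    # Every other platform falls through to the pre-existing path, unchanged.
    return "default_path"


xpu_choice = pick_sync_strategy(PlatformSketch("xpu"))
cuda_choice = pick_sync_strategy(PlatformSketch("cuda"))
```

Because the new branch is only entered when is_xpu() returns True, platforms such as CUDA, ROCm, and NPU keep executing the same code as before.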