feat(dag): fallback to CPU transport for TorchTensorType(transport='n… #64239
caosfourn wants to merge 1 commit into
Code Review
This pull request implements a fallback mechanism to CPU/shared-memory transport with a warning when TorchTensorType requires an accelerator but is used outside of a Compiled Graph (i.e., _communicator_id is None). It also adds corresponding unit tests. A review comment points out that the warning message incorrectly refers to transport='nccl' instead of transport='accelerator', which is the correct option.
79a793e to 1a7ba86
…ccl') in non-compiled graphs Signed-off-by: CaosFourN <huynhdnhannd@gmail.com>
1a7ba86 to 5c3bf9a
Description
This PR implements a fallback mechanism to CPU/Shared Memory transport for TorchTensorType(transport="nccl") when executed outside of Compiled Graphs (i.e. in traditional non-compiled DAGs).
Currently, specifying the "nccl" or "accelerator" transport outside of compiled graphs leads to an AssertionError (or crashes) because the communicator group (communicator_id) and communicator context have not been initialized by the Compiled Graph compiler.
To support debugging and rapid prototyping in non-compiled mode, this PR intercepts cases where no communicator has been set up inside TorchTensorType.create_channel(), emits a UserWarning, and automatically falls back to SharedMemoryType().create_channel().
Related issues
Related to #43328
Additional information
Implementation Details:
python/ray/experimental/channel/torch_tensor_type.py:
self._communicator_id and self._communicator are both None when self.requires_accelerator() is true.
SharedMemoryType channel creation.
python/ray/dag/tests/experimental/test_non_compiled_nccl_dag.py:
UserWarning and fall back to a functional CPU execution path without crashing.
Testing:
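The fallback and the test expectation described above can be illustrated with a small self-contained sketch. It does not import Ray and is not the code from this PR: TorchTensorTypeSketch, SharedMemoryTypeSketch, check_fallback, the string placeholders returned as channels, and the warning text are all hypothetical stand-ins, assumed only to mirror the described behavior (no communicator set up, so warn and use shared memory).

```python
import warnings


class SharedMemoryTypeSketch:
    """Hypothetical stand-in for the CPU/shared-memory channel type."""

    def create_channel(self):
        # A real implementation would build a channel; a string is enough here.
        return "shared-memory-channel"


class TorchTensorTypeSketch:
    """Hypothetical stand-in; not Ray's actual TorchTensorType."""

    def __init__(self, transport="accelerator", communicator_id=None,
                 communicator=None):
        self.transport = transport
        self._communicator_id = communicator_id
        self._communicator = communicator

    def requires_accelerator(self):
        # Assumption: both spellings mentioned in the PR request an accelerator.
        return self.transport in ("nccl", "accelerator")

    def create_channel(self):
        no_communicator = (
            self._communicator_id is None and self._communicator is None
        )
        if self.requires_accelerator() and no_communicator:
            # No communicator was set up (e.g. outside a Compiled Graph):
            # warn and fall back instead of failing an assertion.
            warnings.warn(
                f"transport={self.transport!r} has no communicator; "
                "falling back to shared-memory transport.",
                UserWarning,
                stacklevel=2,
            )
            return SharedMemoryTypeSketch().create_channel()
        if self.requires_accelerator():
            return f"accelerator-channel:{self._communicator_id}"
        return SharedMemoryTypeSketch().create_channel()


def check_fallback():
    """Create a channel without a communicator and record emitted warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        channel = TorchTensorTypeSketch(transport="nccl").create_channel()
    return channel, [w.category for w in caught]
```

In this sketch, check_fallback() returns the shared-memory placeholder together with a list containing UserWarning, which is the shape of assertion a test for the fallback path would likely make.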