[CRL] Add Device API symmem and multicast support#471
Open
MC952-arch wants to merge 17 commits into
Open
Conversation
…ticast, add adaptor stubs
…/leak/adaptor issues
There was a problem hiding this comment.
Pull request overview
This PR introduces default-path “symmetric window” support (VMM-backed flat VA mapping plus optional multicast/NVLS), unifies the public flagcxWindow handle across vendor/default paths, and updates device/CCL adaptor plumbing plus new unit/perf tests to exercise the new behavior.
Changes:
- Add a new symmetric-heap implementation (
sym_heap) to backflagcxCommWindowRegister/Deregisteron the non-vendor path, including VMM flat VA mapping and multicast setup. - Unify window handling by defining
struct flagcxWindowinflagcx.h, splitting vendor window state intoflagcxInnerWindow, and wiring host→kernel window population in device traits. - Add new symmem unit/MPI tests and update one-sided perf tests to use the (now internal) one-sided registration entry point.
Reviewed changes
Copilot reviewed 54 out of 54 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| test/unittest/symmem/test_sym_window_struct.cpp | New local unit tests for window struct layout/flags and basic argument validation. |
| test/unittest/symmem/symmem_test.cpp | MPI fixture for symmetric-memory tests (handler/comm/streams/buffers setup). |
| test/unittest/symmem/Makefile | Adds symmem unit vs MPI test build/run targets. |
| test/unittest/symmem/main_mpi.cpp | MPI gtest runner main for symmem MPI tests. |
| test/unittest/symmem/include/symmem_test.hpp | Test fixture declarations/constants for symmem tests. |
| test/unittest/symmem/coll_sym_register.cpp | MPI tests for window register/deregister behavior and fields. |
| test/unittest/symmem/coll_sym_devmem.cpp | MPI tests for flagcxDevMemCreate/Destroy with/without windows. |
| test/unittest/symmem/coll_sym_access.cpp | MPI tests for cross-GPU access via flat VA mapping. |
| test/unittest/Makefile | Adds symmem to the unittest component list. |
| test/perf/host_api/test_put.cpp | Switches one-sided registration call to heteroComm + includes internal header. |
| test/perf/host_api/test_one_side_register.cpp | Same as above for the registration correctness perf test. |
| test/perf/host_api/test_get.cpp | Same as above for the get benchmark. |
| flagcx/kernels/custom_allreduce.cu | Makes PTX multimem + multicast kernel code available beyond vendor-only guards. |
| flagcx/include/flagcx.h | Defines unified struct flagcxWindow and removes public flagcxOneSideRegister declaration. |
| flagcx/flagcx.cc | Wires sym-heap fallback for window register/deregister; adds multicast capability detection and VMM alloc path for staged buffers; updates one-sided register signature usage. |
| flagcx/core/sym_heap.cc | New symmetric-heap implementation: VMM export/import, flat VA mapping, multicast setup, MR registration. |
| flagcx/core/ipcsocket.cc | Switches IPC socket type to UNIX datagram sockets. |
| flagcx/core/include/sym_heap.h | New header defining flagcxSymWindow and sym-heap register/deregister APIs. |
| flagcx/core/include/onesided.h | Exposes flagcxOneSideRegister(flagcxHeteroComm_t, ...) and includes comm type. |
| flagcx/core/include/dev_comm_state.h | Adds stagedVmmAlloc to track VMM-backed staged allocations for cleanup. |
| flagcx/adaptor/include/nvidia_adaptor.h | Moves vendor window definition to flagcxInnerWindow and removes flagcxWindow from this header. |
| flagcx/adaptor/include/flagcx_device_adaptor.h | Adds adaptor hooks for sym VMM export/import + multicast lifecycle. |
| flagcx/adaptor/include/flagcx_ccl_adaptor.h | Changes CCL adaptor window APIs to use flagcxInnerWindow_t instead of flagcxWindow_t. |
| flagcx/adaptor/include/device_api/nvidia_comm_traits.h | Adds host-side window population from unified flagcxWindow_t (vendor path). |
| flagcx/adaptor/include/device_api/flagcx_device.h | Updates device-memory/window capability comments to reflect symmetric vs IPC modes. |
| flagcx/adaptor/include/device_api/default_comm_traits.h | Extends default Window traits with symmetric (flat VA + multicast) vs asymmetric (IPC) modes and host-side population. |
| flagcx/adaptor/flagcx_device.cc | Unifies kernel Window allocation/population; adds symmetric-window handling and IPC fallback; updates comm properties multicast query. |
| flagcx/adaptor/device/tsmicro_adaptor.cc | Adds sym VMM/multicast stub functions to adaptor vtable. |
| flagcx/adaptor/device/tops_adaptor.cc | Adds sym VMM/multicast stub functions to adaptor vtable. |
| flagcx/adaptor/device/musa_adaptor.cc | Adds sym VMM/multicast stub functions to adaptor vtable. |
| flagcx/adaptor/device/mlu_adaptor.cc | Adds sym VMM/multicast stub functions to adaptor vtable. |
| flagcx/adaptor/device/maca_adaptor.cc | Adds sym VMM/multicast stub functions to adaptor vtable. |
| flagcx/adaptor/device/kunlunxin_adaptor.cc | Adds sym VMM/multicast stub functions to adaptor vtable. |
| flagcx/adaptor/device/ixcuda_adaptor.cc | Adds sym VMM/multicast stub functions to adaptor vtable. |
| flagcx/adaptor/device/hip_adaptor.cc | Adds sym VMM/multicast stub functions to adaptor vtable. |
| flagcx/adaptor/device/ducuda_adaptor.cc | Adds sym VMM/multicast stub functions to adaptor vtable. |
| flagcx/adaptor/device/cuda_adaptor.cc | Implements CUDA VMM export/import + flat VA map/unmap + multicast create/bind/teardown/free. |
| flagcx/adaptor/device/cann_adaptor.cc | Adds sym VMM/multicast stub functions to adaptor vtable. |
| flagcx/adaptor/ccl/xccl_adaptor.cc | Updates window API types to flagcxInnerWindow_t. |
| flagcx/adaptor/ccl/tccl_adaptor.cc | Updates window API types to flagcxInnerWindow_t. |
| flagcx/adaptor/ccl/rccl_adaptor.cc | Updates window API types to flagcxInnerWindow_t. |
| flagcx/adaptor/ccl/pccl_adaptor.cc | Updates window API types to flagcxInnerWindow_t. |
| flagcx/adaptor/ccl/nccl_adaptor.cc | Updates window API types and ensures inner-window allocation/cleanup on register failure. |
| flagcx/adaptor/ccl/musa_mccl_adaptor.cc | Updates window API types to flagcxInnerWindow_t. |
| flagcx/adaptor/ccl/mpi_adaptor.cc | Updates window API types to flagcxInnerWindow_t. |
| flagcx/adaptor/ccl/mccl_adaptor.cc | Updates window API types to flagcxInnerWindow_t. |
| flagcx/adaptor/ccl/ixnccl_adaptor.cc | Updates window API types to flagcxInnerWindow_t. |
| flagcx/adaptor/ccl/hccl_adaptor.cc | Updates window API types to flagcxInnerWindow_t. |
| flagcx/adaptor/ccl/gloo_adaptor.cc | Updates window API types to flagcxInnerWindow_t. |
| flagcx/adaptor/ccl/eccl_adaptor.cc | Updates window API types to flagcxInnerWindow_t. |
| flagcx/adaptor/ccl/dunccl_adaptor.cc | Updates window API types to flagcxInnerWindow_t. |
| flagcx/adaptor/ccl/cncl_adaptor.cc | Updates window API types to flagcxInnerWindow_t. |
| flagcx/adaptor/ccl/bootstrap_adaptor.cc | Updates window API types to flagcxInnerWindow_t. |
| adaptor_plugin/ccl/example/plugin.cc | Updates example plugin window API signatures to flagcxInnerWindow_t. |
Comments suppressed due to low confidence (1)
test/unittest/symmem/coll_sym_access.cpp:161
- The file ends with a section header comment about verifying multicast base, but no test follows. This dangling comment makes it unclear whether a multicast test was intended but omitted. Either add the missing test or remove the unused section header.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
f69ad39 to
e10faf3
Compare
9e9ed27 to
8c9db2d
Compare
8c9db2d to
7b62bfa
Compare
a8dfb97 to
9291684
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 59 out of 59 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
flagcx/flagcx.cc:1273
- state->hasMulticast is set from reqs.intraMulticast (and on the default path, from symMulticastSupported) before ensuring the staged buffers actually have a multicast-capable window/mapping. If commWindowRegister returns flagcxNotSupported and comm->heteroComm is null, sendStagedWin stays null but hasMulticast can remain true, causing the custom allreduce to take the multicast path and kernels to dereference a null multicast pointer. Consider defaulting hasMulticast=false and only enabling it after staged window registration succeeds, with validation for both sym-heap (mcBase != nullptr) and vendor windows (or at least require sendStagedWin != nullptr).
5e084e6 to
5424883
Compare
5424883 to
f6c3036
Compare
3026a77 to
6cf31ef
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR Category
CRL
PR Types
New Features
PR Description