Skip to content

[CRL] Add Device API symmem and multicast support#471

Open
MC952-arch wants to merge 17 commits into
flagos-ai:mainfrom
MC952-arch:symmetric-memory
Open

[CRL] Add Device API symmem and multicast support#471
MC952-arch wants to merge 17 commits into
flagos-ai:mainfrom
MC952-arch:symmetric-memory

Conversation

@MC952-arch
Copy link
Copy Markdown
Collaborator

PR Category

CRL

PR Types

New Features

PR Description

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces default-path “symmetric window” support (VMM-backed flat VA mapping plus optional multicast/NVLS), unifies the public flagcxWindow handle across vendor/default paths, and updates device/CCL adaptor plumbing plus new unit/perf tests to exercise the new behavior.

Changes:

  • Add a new symmetric-heap implementation (sym_heap) to back flagcxCommWindowRegister/Deregister on the non-vendor path, including VMM flat VA mapping and multicast setup.
  • Unify window handling by defining struct flagcxWindow in flagcx.h, splitting vendor window state into flagcxInnerWindow, and wiring host→kernel window population in device traits.
  • Add new symmem unit/MPI tests and update one-sided perf tests to use the (now internal) one-sided registration entry point.

Reviewed changes

Copilot reviewed 54 out of 54 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
test/unittest/symmem/test_sym_window_struct.cpp New local unit tests for window struct layout/flags and basic argument validation.
test/unittest/symmem/symmem_test.cpp MPI fixture for symmetric-memory tests (handler/comm/streams/buffers setup).
test/unittest/symmem/Makefile Adds symmem unit vs MPI test build/run targets.
test/unittest/symmem/main_mpi.cpp MPI gtest runner main for symmem MPI tests.
test/unittest/symmem/include/symmem_test.hpp Test fixture declarations/constants for symmem tests.
test/unittest/symmem/coll_sym_register.cpp MPI tests for window register/deregister behavior and fields.
test/unittest/symmem/coll_sym_devmem.cpp MPI tests for flagcxDevMemCreate/Destroy with/without windows.
test/unittest/symmem/coll_sym_access.cpp MPI tests for cross-GPU access via flat VA mapping.
test/unittest/Makefile Adds symmem to the unittest component list.
test/perf/host_api/test_put.cpp Switches one-sided registration call to heteroComm + includes internal header.
test/perf/host_api/test_one_side_register.cpp Same as above for the registration correctness perf test.
test/perf/host_api/test_get.cpp Same as above for the get benchmark.
flagcx/kernels/custom_allreduce.cu Makes PTX multimem + multicast kernel code available beyond vendor-only guards.
flagcx/include/flagcx.h Defines unified struct flagcxWindow and removes public flagcxOneSideRegister declaration.
flagcx/flagcx.cc Wires sym-heap fallback for window register/deregister; adds multicast capability detection and VMM alloc path for staged buffers; updates one-sided register signature usage.
flagcx/core/sym_heap.cc New symmetric-heap implementation: VMM export/import, flat VA mapping, multicast setup, MR registration.
flagcx/core/ipcsocket.cc Switches IPC socket type to UNIX datagram sockets.
flagcx/core/include/sym_heap.h New header defining flagcxSymWindow and sym-heap register/deregister APIs.
flagcx/core/include/onesided.h Exposes flagcxOneSideRegister(flagcxHeteroComm_t, ...) and includes comm type.
flagcx/core/include/dev_comm_state.h Adds stagedVmmAlloc to track VMM-backed staged allocations for cleanup.
flagcx/adaptor/include/nvidia_adaptor.h Moves vendor window definition to flagcxInnerWindow and removes flagcxWindow from this header.
flagcx/adaptor/include/flagcx_device_adaptor.h Adds adaptor hooks for sym VMM export/import + multicast lifecycle.
flagcx/adaptor/include/flagcx_ccl_adaptor.h Changes CCL adaptor window APIs to use flagcxInnerWindow_t instead of flagcxWindow_t.
flagcx/adaptor/include/device_api/nvidia_comm_traits.h Adds host-side window population from unified flagcxWindow_t (vendor path).
flagcx/adaptor/include/device_api/flagcx_device.h Updates device-memory/window capability comments to reflect symmetric vs IPC modes.
flagcx/adaptor/include/device_api/default_comm_traits.h Extends default Window traits with symmetric (flat VA + multicast) vs asymmetric (IPC) modes and host-side population.
flagcx/adaptor/flagcx_device.cc Unifies kernel Window allocation/population; adds symmetric-window handling and IPC fallback; updates comm properties multicast query.
flagcx/adaptor/device/tsmicro_adaptor.cc Adds sym VMM/multicast stub functions to adaptor vtable.
flagcx/adaptor/device/tops_adaptor.cc Adds sym VMM/multicast stub functions to adaptor vtable.
flagcx/adaptor/device/musa_adaptor.cc Adds sym VMM/multicast stub functions to adaptor vtable.
flagcx/adaptor/device/mlu_adaptor.cc Adds sym VMM/multicast stub functions to adaptor vtable.
flagcx/adaptor/device/maca_adaptor.cc Adds sym VMM/multicast stub functions to adaptor vtable.
flagcx/adaptor/device/kunlunxin_adaptor.cc Adds sym VMM/multicast stub functions to adaptor vtable.
flagcx/adaptor/device/ixcuda_adaptor.cc Adds sym VMM/multicast stub functions to adaptor vtable.
flagcx/adaptor/device/hip_adaptor.cc Adds sym VMM/multicast stub functions to adaptor vtable.
flagcx/adaptor/device/ducuda_adaptor.cc Adds sym VMM/multicast stub functions to adaptor vtable.
flagcx/adaptor/device/cuda_adaptor.cc Implements CUDA VMM export/import + flat VA map/unmap + multicast create/bind/teardown/free.
flagcx/adaptor/device/cann_adaptor.cc Adds sym VMM/multicast stub functions to adaptor vtable.
flagcx/adaptor/ccl/xccl_adaptor.cc Updates window API types to flagcxInnerWindow_t.
flagcx/adaptor/ccl/tccl_adaptor.cc Updates window API types to flagcxInnerWindow_t.
flagcx/adaptor/ccl/rccl_adaptor.cc Updates window API types to flagcxInnerWindow_t.
flagcx/adaptor/ccl/pccl_adaptor.cc Updates window API types to flagcxInnerWindow_t.
flagcx/adaptor/ccl/nccl_adaptor.cc Updates window API types and ensures inner-window allocation/cleanup on register failure.
flagcx/adaptor/ccl/musa_mccl_adaptor.cc Updates window API types to flagcxInnerWindow_t.
flagcx/adaptor/ccl/mpi_adaptor.cc Updates window API types to flagcxInnerWindow_t.
flagcx/adaptor/ccl/mccl_adaptor.cc Updates window API types to flagcxInnerWindow_t.
flagcx/adaptor/ccl/ixnccl_adaptor.cc Updates window API types to flagcxInnerWindow_t.
flagcx/adaptor/ccl/hccl_adaptor.cc Updates window API types to flagcxInnerWindow_t.
flagcx/adaptor/ccl/gloo_adaptor.cc Updates window API types to flagcxInnerWindow_t.
flagcx/adaptor/ccl/eccl_adaptor.cc Updates window API types to flagcxInnerWindow_t.
flagcx/adaptor/ccl/dunccl_adaptor.cc Updates window API types to flagcxInnerWindow_t.
flagcx/adaptor/ccl/cncl_adaptor.cc Updates window API types to flagcxInnerWindow_t.
flagcx/adaptor/ccl/bootstrap_adaptor.cc Updates window API types to flagcxInnerWindow_t.
adaptor_plugin/ccl/example/plugin.cc Updates example plugin window API signatures to flagcxInnerWindow_t.
Comments suppressed due to low confidence (1)

test/unittest/symmem/coll_sym_access.cpp:161

  • The file ends with a section header comment about verifying multicast base, but no test follows. This dangling comment makes it unclear whether a multicast test was intended but omitted. Either add the missing test or remove the unused section header.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread flagcx/flagcx.cc Outdated
Comment thread flagcx/flagcx.cc Outdated
Comment thread flagcx/core/sym_heap.cc
Comment thread flagcx/core/sym_heap.cc Outdated
Comment thread flagcx/adaptor/device/cuda_adaptor.cc
Comment thread adaptor_plugin/ccl/example/plugin.cc
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 58 out of 58 changed files in this pull request and generated 3 comments.

Comment thread flagcx/flagcx.cc Outdated
Comment thread flagcx/adaptor/device/cuda_adaptor.cc Outdated
Comment thread flagcx/core/sym_heap.cc Outdated
@MC952-arch MC952-arch force-pushed the symmetric-memory branch 2 times, most recently from 9e9ed27 to 8c9db2d Compare May 12, 2026 03:09
@MC952-arch MC952-arch requested a review from Copilot May 12, 2026 03:12
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 58 out of 58 changed files in this pull request and generated 5 comments.

Comment thread flagcx/include/flagcx.h Outdated
Comment thread flagcx/include/flagcx.h
Comment thread flagcx/adaptor/device/cuda_adaptor.cc
Comment thread flagcx/adaptor/include/flagcx_ccl_adaptor.h
Comment thread flagcx/flagcx.cc Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 59 out of 59 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

flagcx/flagcx.cc:1273

  • state->hasMulticast is set from reqs.intraMulticast (and on the default path, from symMulticastSupported) before ensuring the staged buffers actually have a multicast-capable window/mapping. If commWindowRegister returns flagcxNotSupported and comm->heteroComm is null, sendStagedWin stays null but hasMulticast can remain true, causing the custom allreduce to take the multicast path and kernels to dereference a null multicast pointer. Consider defaulting hasMulticast=false and only enabling it after staged window registration succeeds, with validation for both sym-heap (mcBase != nullptr) and vendor windows (or at least require sendStagedWin != nullptr).

Comment thread flagcx/adaptor/flagcx_device.cc
@MC952-arch MC952-arch force-pushed the symmetric-memory branch 2 times, most recently from 5e084e6 to 5424883 Compare May 12, 2026 07:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants