feat(shmem): add MORI_MULTITHREAD_SUPPORT for SPMT #308
Merged
15 commits
3b429e7 feat(shmem): add MORI_MULTITHREAD_SUPPORT for single-process multi-th… (jhchouuu)
0ac5e0f fix(shmem): SPMT review fixes — pybind dup, peer access, finalize leak (jhchouuu)
80b6536 feat(jax): enable EP dispatch/combine in single-process multi-thread … (jhchouuu)
82fb144 fix(jax): clear_ep_handle_cache must touch only the calling thread's … (jhchouuu)
21c7326 refactor(spmt): centralize kMaxGpusPerNode, drop dead code, RAII devi… (jhchouuu)
295d308 shmem: restore Finalized enum value and CheckStatusValid check (jhchouuu)
367188b examples: drop multithread_multi_gpu.cpp (superseded by Python SPMT t… (jhchouuu)
dfe9b79 fix(pre-commit): apply black/clang-format/cmake-format auto-fixes (jhchouuu)
709851c review: ctypes signatures, dedup ep_config parse, spdlog null guard (jhchouuu)
db3ae7d test(jax-spmt): add real data verification (matches multi-process test) (jhchouuu)
be5d104 fix(pre-commit): apply black auto-fix on test_dispatch_combine_jax_spmt (jhchouuu)
616c7b7 feat(shmem): add SDMA transport support for SPMT (single-process mult… (jhchouuu)
36d9ad3 fix(pre-commit): fix ambiguous variable name in SDMA test (jhchouuu)
02224f8 fix(pre-commit): apply clang-format auto-fix (jhchouuu)
bb571d9 refactor: remove SPMT JAX tests, fix HostName return type (jhchouuu)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,33 @@ | ||
| // Copyright © Advanced Micro Devices, Inc. All rights reserved. | ||
| // | ||
| // MIT License | ||
| // | ||
| // Permission is hereby granted, free of charge, to any person obtaining a copy | ||
| // of this software and associated documentation files (the "Software"), to deal | ||
| // in the Software without restriction, including without limitation the rights | ||
| // to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
| // copies of the Software, and to permit persons to whom the Software is | ||
| // furnished to do so, subject to the following conditions: | ||
| // | ||
| // The above copyright notice and this permission notice shall be included in all | ||
| // copies or substantial portions of the Software. | ||
| // | ||
| // THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
| // IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
| // FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
| // AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
| // LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
| // OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
| // SOFTWARE. | ||
| #pragma once | ||
| | ||
| // Tiny header so any TU (device or host) can size per-GPU arrays without | ||
| // pulling in shmem/ops headers. Bump this when supporting nodes with > 8 GPUs | ||
| // (e.g. future MI400 platforms) — every per-GPU array sized by this constant | ||
| // will pick up the new value automatically. | ||
| | ||
| namespace mori { | ||
| | ||
| inline constexpr int kMaxGpusPerNode = 8; | ||
| | ||
| } // namespace mori | ||
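
To illustrate the header comment above, here is a minimal sketch of how a translation unit might size a per-GPU array from the constant. The constant is copied from the diff; `PeerTable`, `peer_ptrs`, `Set`, and `Get` are hypothetical names invented for this example and are not taken from the PR. It assumes C++17, which `inline constexpr` already requires.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

namespace mori {
// Mirrors the constant introduced by the header in this diff.
inline constexpr int kMaxGpusPerNode = 8;
}  // namespace mori

// Hypothetical per-GPU table (not part of the PR). Its storage is sized by
// kMaxGpusPerNode, so changing the constant resizes it at the next rebuild.
struct PeerTable {
  std::array<void*, mori::kMaxGpusPerNode> peer_ptrs{};  // all nullptr initially

  // Stores a pointer for `gpu`; returns false if the index is out of range.
  bool Set(int gpu, void* ptr) {
    if (gpu < 0 || gpu >= mori::kMaxGpusPerNode) return false;
    peer_ptrs[static_cast<std::size_t>(gpu)] = ptr;
    return true;
  }

  // Returns the stored pointer, or nullptr if the index is out of range.
  void* Get(int gpu) const {
    if (gpu < 0 || gpu >= mori::kMaxGpusPerNode) return nullptr;
    return peer_ptrs[static_cast<std::size_t>(gpu)];
  }
};

// Compile-time check that the array really tracks the constant.
static_assert(sizeof(PeerTable::peer_ptrs) ==
                  sizeof(void*) * mori::kMaxGpusPerNode,
              "per-GPU array must be sized by kMaxGpusPerNode");
```

The bounds checks matter here because a fixed compile-time cap silently turns into an out-of-bounds access if a node exposes more devices than the constant allows.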
I think with CPX it is 32. Not sure if it can be split even more than that.
Yes, we will have 3272 GPUs at rack level. It's best not to hardcode this. Or should we get the maximum number of GPUs from the build script?
Yes, CPX and rack level will have more than 8 GPUs. We have already tried this on CPX, but it was limited to a single card before... Additionally, rack level is also included in our plan...
So currently, kMaxGpusPerNode equals 8...
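
The build-script idea raised in this thread could look roughly like the sketch below. This is not what the PR does. The macro name `MORI_MAX_GPUS_PER_NODE`, the `sketch` namespace, and the `FitsNode` helper are all assumptions made up for illustration. The build system would pass something like `-DMORI_MAX_GPUS_PER_NODE=32` for CPX configurations, and the header falls back to 8 when nothing is supplied.

```cpp
#include <cassert>

// Hypothetical macro (not defined by the PR): a build script may override it
// on the compiler command line, e.g. -DMORI_MAX_GPUS_PER_NODE=32.
#ifndef MORI_MAX_GPUS_PER_NODE
#define MORI_MAX_GPUS_PER_NODE 8  // default matches the value in the diff
#endif

namespace sketch {

// Same role as mori::kMaxGpusPerNode, but configurable at build time.
inline constexpr int kMaxGpusPerNode = MORI_MAX_GPUS_PER_NODE;
static_assert(kMaxGpusPerNode > 0, "kMaxGpusPerNode must be positive");

// Hypothetical runtime guard: returns true only if the number of devices
// visible on the node fits within the compiled-in limit.
inline bool FitsNode(int visible_gpus) {
  return visible_gpus > 0 && visible_gpus <= kMaxGpusPerNode;
}

}  // namespace sketch
```

A guard like `FitsNode` would let initialization fail loudly on a node with more devices than the build allows, rather than overrunning arrays sized by the constant.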