-
Notifications
You must be signed in to change notification settings - Fork 240
Reorganize graph tests and add device-side launch support #1512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
/ok to test 6e30c53 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved from test_graph.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved from test_graph.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved from test_graph.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved from test_graph.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved from test_graph.py
6e30c53 to
5c63733
Compare
|
/ok to test 5c63733 |
- Reorganize graph tests into tests/graph/ subdirectory: - test_basic.py: basic graph construction tests - test_conditional.py: conditional node tests (if, if-else, switch, while) - test_advanced.py: child graphs, update, stream lifetime - test_options.py: debug print, complete options, build mode - test_capture_alloc.py: graph memory resource tests (moved from test_graph_mem.py) - test_device_launch.py: new device-side graph launch tests - Add Graph.handle property to expose CUgraphExec handle, consistent with Stream.handle and Event.handle patterns - Add tests/helpers/graph_kernels.py with shared kernel compilation helpers - Device-side graph launch tests require Hopper (sm_90+) architecture
5c63733 to
da6205c
Compare
|
/ok to test da6205c |
| def _find_cudadevrt_library(): | ||
| """Find the CUDA device runtime static library using pathfinder. | ||
| Uses load_nvidia_dynamic_lib("cudart") to locate the CUDA runtime, | ||
| then derives the static library path from that location. | ||
| Returns: | ||
| Path to libcudadevrt.a (Linux) or cudadevrt.lib (Windows), or None if not found. | ||
| """ | ||
| try: | ||
| loaded = load_nvidia_dynamic_lib("cudart") | ||
| except Exception: | ||
| return None | ||
|
|
||
| lib_dir = os.path.dirname(loaded.abs_path) | ||
|
|
||
| if IS_WINDOWS: | ||
| # Windows: .dll is in bin/, .lib is in lib/x64/ | ||
| base = os.path.dirname(lib_dir) | ||
| path = os.path.join(base, "lib", "x64", "cudadevrt.lib") | ||
| if os.path.isfile(path): | ||
| return path | ||
| else: | ||
| # Linux: .so and .a are both in lib/ | ||
| path = os.path.join(lib_dir, "libcudadevrt.a") | ||
| if os.path.isfile(path): | ||
| return path | ||
|
|
||
| return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure of the best way to locate the cudadevrt library (needed for device launch). I'm open to suggestions here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have the existing requirement that CUDA_HOME needs to be set when running the cuda-core tests: did you consider finding the .lib/.a files relative to CUDA_HOME? I'm guessing that's probably pretty simple.
In any case, could you please reference #716 from the helper function, and maybe add a comment under 716 to point back to the helper function, after this PR is merged?
If you want to keep the current implementation:
try:
loaded = load_nvidia_dynamic_lib("cudart")
except DynamicLibNotFoundError:
return NoneThat's far less likely to mask bugs.
|
| self._mnff.close() | ||
|
|
||
| @property | ||
| def handle(self) -> driver.CUgraphExec: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we decide if we want to expose CUgraph or CUgraphExec? Both can find their use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a great question that I tried to sidestep in this change. (I'm working on another change to add resource handles to the graph module.)
My understanding is that class GraphBuilder corresponds to CUgraph and class Graph corresponds to CUgraphExec.
Following the existing pattern, after moving to resource handles, we would update the internal names to clarify:
GraphBuilder._mnff.graph→GraphBuilder._mnff._h_graph(CUgraph)Graph._mnff.graph→Graph._mnff._h_graph_exec(CUgraphExec)
And the property types would be:
GraphBuilder.handle -> driver.CUgraphGraph.handle -> driver.CUgraphExec
FYI, I added this property to expose CUgraphExec for the following code in test_device_launch.py:
inner_graph_handle = int(inner_graph.handle)
...
launch(gb_outer, LaunchConfig(grid=1, block=1), launcher_kernel, inner_graph_handle)
Summary
This PR reorganizes the graph tests into a new
tests/graph/subdirectory, adds a publicGraph.handleproperty, and introduces tests for device-side graph launch.Closes #1331
Changes
Test Reorganization
tests/graph/subdirectory following the pattern oftests/memory_ipc/test_graph.pyinto logical groupings:test_basic.py: basic graph construction and topology teststest_conditional.py: conditional node tests (if, if-else, switch, while)test_advanced.py: child graphs, update, stream lifetimetest_options.py: debug print, complete options, build modetest_graph_mem.py(renamed totest_capture_alloc.py)tests/helpers/graph_kernels.pyAPI Addition
Graph.handleproperty to expose the underlyingCUgraphExechandleStream.handleandEvent.handleDevice-Side Graph Launch Tests
test_device_launch.pywith tests for device-side graph launchcudaGraphLaunch()from device code viacudaStreamGraphTailLaunchcuda.pathfinder.find_nvidia_header_directory()to locatelibcudadevrt.aTest Coverage
tests/graph/