-
Notifications
You must be signed in to change notification settings - Fork 74
Move Parallelism to __init__.py #5783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
It can be used in tests as well as benchmarks.
|
!test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR refactors the Parallelism enum by moving it from benchmark_utils.py to __init__.py to make it more accessible for both tests and benchmarks.
Major Issues Found:
- Two files (
test_overlap.pyandtest_deepseek_v3.py) still use the old absolute importfrom benchmark_utils importinstead of relative imports, which will cause import failures
Changes Made:
- Created new
__init__.pywithParallelismenum definition - Removed
Parallelismfrombenchmark_utils.py - Updated imports in
test_transformer.pyandtest_transformer_engine.pyto use relative imports
Confidence Score: 2/5
- This PR has critical import issues that will cause test failures
- The refactoring itself is well-executed for the 4 changed files, but two other files in the same directory (
test_overlap.pyandtest_deepseek_v3.py) still import frombenchmark_utilsusing absolute imports instead of relative imports. These files will fail at runtime when they try to importget_benchmark_fns, breaking the test suite. test_overlap.pyandtest_deepseek_v3.pyneed their imports updated to use relative imports (.benchmark_utils)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| tests/python/multidevice/init.py | 5/5 | New file correctly defines Parallelism enum with proper documentation links |
| tests/python/multidevice/benchmark_utils.py | 5/5 | Correctly removed Parallelism class, kept only benchmark utility functions |
| tests/python/multidevice/test_transformer.py | 5/5 | Updated imports correctly using relative imports for Parallelism and get_benchmark_fns |
| tests/python/multidevice/test_transformer_engine.py | 5/5 | Updated imports correctly, properly separated third-party and local imports |
Sequence Diagram
sequenceDiagram
participant Init as __init__.py
participant BenchUtils as benchmark_utils.py
participant TestTrans as test_transformer.py
participant TestTransEngine as test_transformer_engine.py
participant TestOverlap as test_overlap.py (NOT UPDATED)
participant TestDeepSeek as test_deepseek_v3.py (NOT UPDATED)
Note over Init: Defines Parallelism enum
Note over BenchUtils: Defines get_benchmark_fns()
TestTrans->>Init: from . import Parallelism
TestTrans->>BenchUtils: from .benchmark_utils import get_benchmark_fns
TestTransEngine->>Init: from . import Parallelism
TestTransEngine->>BenchUtils: from .benchmark_utils import get_benchmark_fns
TestOverlap--xBenchUtils: from benchmark_utils import... (BROKEN)
Note over TestOverlap: Missing relative import dot
TestDeepSeek--xBenchUtils: from benchmark_utils import... (BROKEN)
Note over TestDeepSeek: Missing relative import dot
Additional Comments (2)
|
Description
|
| Relevant files | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| Enhancement |
|
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 No relevant tests |
| 🔒 No security concerns identified |
| ⚡ Recommended focus areas for review |
Formatting inconsistency
|
Test failures
-
(High, 96)
CUDA 'system not yet initialized' across nvFuser matmul & Hopper test suites on dlcluster_h100Test Name H100 Source ArgsortParameterizedWithBlockAndBatch.SharedMemoryRequirement/1024_3_1_1 ❌ Link ArgsortParameterizedWithBlockAndBatch.SharedMemoryRequirement/512_2_1_0 ❌ Link ArgsortParameterizedWithBlockAndBatch.SharedMemoryRequirement/512_3_0_1 ❌ Link BlockSizeAndItemsPerThread/ArgSortComprehensiveTest.ComprehensiveValidation/BlockSize64_ItemsPerThread1 ❌ Link ClusterReductionTest.SimpleFusionNotAllReduce/cluster_16_dtype_float ❌ Link ClusterReductionTest.SimpleFusionNotAllReduce/cluster_5_dtype_float ❌ Link ClusterReductionTest.SimpleFusionNotAllReduce/cluster_6_dtype_float ❌ Link FusionProfilerTest.Profile3Segments ❌ Link General/HopperPlusMatmulSchedulerTest.FusedMultiplySum/KK_512_256_128_MmaMacro_m128_n128_k16_tma_store ❌ Link General/HopperPlusMatmulSchedulerTest.FusedMultiplySumBiasNeg/MK_512_256_128_MmaMacro_m128_n128_k16_tma_store ❌ Link ... with 86 more test failures omitted. Check internal logs. -
(Low, 1)
bfloat16 cross_entropy numerical mismatch in Thunder vs. Torch (test_ops::test_core_vs_torch_consistency_cross_entropy)Test Name H100 Source thunder.tests.test_ops.test_core_vs_torch_consistency_cross_entropy_nvfuser_cuda_thunder.dtypes.bfloat16 ❌ -
(Low, 1)
Minor float32 numerical mismatch in thunder.tests.test_ops instance_norm nvFuser CUDA suiteTest Name GB200 Source thunder.tests.test_ops.test_core_vs_torch_consistency_instance_norm_nvfuser_cuda_thunder.dtypes.float32 ❌
It can be used in tests as well as benchmarks.