
Conversation

@wujingyue
Collaborator

It can be used in tests as well as benchmarks.

@wujingyue wujingyue requested a review from Priya2698 January 9, 2026 06:52
@wujingyue
Collaborator Author

!test

Contributor

@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

This PR refactors the Parallelism enum by moving it from benchmark_utils.py to __init__.py to make it more accessible for both tests and benchmarks.

Major Issues Found:

  • Two files (test_overlap.py and test_deepseek_v3.py) still use the old absolute import (`from benchmark_utils import ...`) instead of a relative import, which will cause import failures

Changes Made:

  • Created new __init__.py with the Parallelism enum definition (sketched below)
  • Removed Parallelism from benchmark_utils.py
  • Updated imports in test_transformer.py and test_transformer_engine.py to use relative imports
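
For reference, a minimal sketch of what the relocated enum in tests/python/multidevice/__init__.py could look like, based on the member names listed in the walkthrough below (the exact enum body and documentation links in the actual file may differ):

    # Sketch of tests/python/multidevice/__init__.py; the real file also carries
    # documentation links for each parallelism type.
    from enum import Enum, auto


    class Parallelism(Enum):
        # Shard weights across devices (tensor parallelism).
        TENSOR_PARALLEL = auto()
        # Additionally shard activations along the sequence dimension.
        SEQUENCE_PARALLEL = auto()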

Confidence Score: 2/5

  • This PR has critical import issues that will cause test failures
  • The refactoring itself is well-executed for the 4 changed files, but two other files in the same directory (test_overlap.py and test_deepseek_v3.py) still import from benchmark_utils using absolute imports instead of relative imports. These files will fail at runtime when they try to import get_benchmark_fns, breaking the test suite.
  • test_overlap.py and test_deepseek_v3.py need their imports updated to use relative imports (.benchmark_utils)

Important Files Changed

File Analysis

Filename Score Overview
tests/python/multidevice/__init__.py 5/5 New file correctly defines Parallelism enum with proper documentation links
tests/python/multidevice/benchmark_utils.py 5/5 Correctly removed Parallelism class, kept only benchmark utility functions
tests/python/multidevice/test_transformer.py 5/5 Updated imports correctly using relative imports for Parallelism and get_benchmark_fns
tests/python/multidevice/test_transformer_engine.py 5/5 Updated imports correctly, properly separated third-party and local imports

Sequence Diagram

sequenceDiagram
    participant Init as __init__.py
    participant BenchUtils as benchmark_utils.py
    participant TestTrans as test_transformer.py
    participant TestTransEngine as test_transformer_engine.py
    participant TestOverlap as test_overlap.py (NOT UPDATED)
    participant TestDeepSeek as test_deepseek_v3.py (NOT UPDATED)

    Note over Init: Defines Parallelism enum
    Note over BenchUtils: Defines get_benchmark_fns()
    
    TestTrans->>Init: from . import Parallelism
    TestTrans->>BenchUtils: from .benchmark_utils import get_benchmark_fns
    
    TestTransEngine->>Init: from . import Parallelism
    TestTransEngine->>BenchUtils: from .benchmark_utils import get_benchmark_fns
    
    TestOverlap--xBenchUtils: from benchmark_utils import... (BROKEN)
    Note over TestOverlap: Missing relative import dot
    
    TestDeepSeek--xBenchUtils: from benchmark_utils import... (BROKEN)
    Note over TestDeepSeek: Missing relative import dot

@greptile-apps
Contributor

greptile-apps bot commented Jan 9, 2026

Additional Comments (2)

tests/python/multidevice/test_overlap.py
Import will break after this PR. Should be from .benchmark_utils import get_benchmark_fns (note the relative import dot)


tests/python/multidevice/test_deepseek_v3.py
Import will break after this PR. Should be from .benchmark_utils import get_benchmark_fns (note the relative import dot)
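
For both files, the fix the bot is asking for is a one-line change to the import (a sketch; the surrounding imports in those files are not shown):

    # Before (absolute import, which the review says breaks after this PR):
    # from benchmark_utils import get_benchmark_fns

    # After (relative import, matching test_transformer.py and test_transformer_engine.py):
    from .benchmark_utils import get_benchmark_fns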

@github-actions

github-actions bot commented Jan 9, 2026

Description

  • Move Parallelism enum from benchmark_utils.py to __init__.py

  • Update import statements in test files to use relative imports

  • Create new __init__.py file to expose Parallelism enum

  • Remove duplicate enum definition and imports from benchmark_utils.py
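
A rough usage sketch of the relocated enum from inside the package (the parametrization below is illustrative, not taken from the actual tests):

    import pytest

    from . import Parallelism


    @pytest.mark.parametrize(
        "parallelism", [Parallelism.TENSOR_PARALLEL, Parallelism.SEQUENCE_PARALLEL]
    )
    def test_transformer_forward(parallelism):
        # Hypothetical test body; the real tests also pull get_benchmark_fns
        # from .benchmark_utils when they double as benchmarks.
        ...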

Changes walkthrough

Relevant files (Enhancement)

tests/python/multidevice/__init__.py (+12/-0)
Create __init__.py with Parallelism enum

  • Create new __init__.py file in the multidevice tests directory
  • Define Parallelism enum with TENSOR_PARALLEL and SEQUENCE_PARALLEL values
  • Add documentation links for each parallelism type

tests/python/multidevice/benchmark_utils.py (+0/-8)
Remove Parallelism enum from benchmark_utils

  • Remove the Parallelism enum definition
  • Remove unused enum imports (auto, Enum)
  • Keep existing benchmark utility functions unchanged

tests/python/multidevice/test_transformer.py (+2/-1)
Update imports for relocated Parallelism enum

  • Update imports to use a relative import for Parallelism
  • Change `from benchmark_utils import ...` to `from . import Parallelism`
  • Split imports to separate Parallelism from the benchmark functions

tests/python/multidevice/test_transformer_engine.py (+5/-1)
Update imports for relocated Parallelism enum

  • Update imports to use a relative import for Parallelism
  • Change `from benchmark_utils import ...` to `from . import Parallelism`
  • Adds an unused enum import that should be removed
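
Per the walkthrough and the sequence diagram above, the updated import section in test_transformer.py presumably ends up along these lines (a sketch, not the verbatim diff):

    from . import Parallelism
    from .benchmark_utils import get_benchmark_fns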

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Formatting inconsistency

Extra blank lines were added in the import section of test_transformer_engine.py (lines 6, 7, 9, 11), which creates inconsistent formatting compared to the other files in the PR. Consider maintaining consistent formatting across all modified files.

    import torch
    import torch.distributed as dist
    
    import transformer_engine.pytorch as te

Test failures

• (High, 96) CUDA 'system not yet initialized' across nvFuser matmul & Hopper test suites on dlcluster_h100

  Failing tests (H100):
  ArgsortParameterizedWithBlockAndBatch.SharedMemoryRequirement/1024_3_1_1
  ArgsortParameterizedWithBlockAndBatch.SharedMemoryRequirement/512_2_1_0
  ArgsortParameterizedWithBlockAndBatch.SharedMemoryRequirement/512_3_0_1
  BlockSizeAndItemsPerThread/ArgSortComprehensiveTest.ComprehensiveValidation/BlockSize64_ItemsPerThread1
  ClusterReductionTest.SimpleFusionNotAllReduce/cluster_16_dtype_float
  ClusterReductionTest.SimpleFusionNotAllReduce/cluster_5_dtype_float
  ClusterReductionTest.SimpleFusionNotAllReduce/cluster_6_dtype_float
  FusionProfilerTest.Profile3Segments
  General/HopperPlusMatmulSchedulerTest.FusedMultiplySum/KK_512_256_128_MmaMacro_m128_n128_k16_tma_store
  General/HopperPlusMatmulSchedulerTest.FusedMultiplySumBiasNeg/MK_512_256_128_MmaMacro_m128_n128_k16_tma_store
  ... with 86 more test failures omitted. Check internal logs.

• (Low, 1) bfloat16 cross_entropy numerical mismatch in Thunder vs. Torch (test_ops::test_core_vs_torch_consistency_cross_entropy)

  Failing test (H100):
  thunder.tests.test_ops.test_core_vs_torch_consistency_cross_entropy_nvfuser_cuda_thunder.dtypes.bfloat16

• (Low, 1) Minor float32 numerical mismatch in thunder.tests.test_ops instance_norm nvFuser CUDA suite

  Failing test (GB200):
  thunder.tests.test_ops.test_core_vs_torch_consistency_instance_norm_nvfuser_cuda_thunder.dtypes.float32
