Describe the Bug
I'm trying to build the version of Apex used in NeMo v2.2.1 (commit 810ffae), as listed here:
I'm configuring the build with the options listed here:
--build-option=--cpp_ext --cuda_ext --fast_layer_norm --distributed_adam --deprecated_fused_adam --group_norm
The build is failing in a number of different environments with this error:
...
/usr/local/lib/python3.12/dist-packages/torch/include/ATen/Dispatch.h:199:48: error: cannot convert ‘const at::DeprecatedTypeProperties’ to ‘c10::ScalarType’
199 | at::ScalarType _st = ::detail::scalar_type(the_type); \
| ^~~~~~~~
| |
| const at::DeprecatedTypeProperties
Minimal Steps/Code to Reproduce the Bug
The error can be reliably reproduced inside nvcr.io/nvidia/pytorch:25.02-py3:
- Build log: output.txt
$> docker run -it nvcr.io/nvidia/pytorch:25.02-py3
root@3adeea6b839b:/# mkdir -p /repos
root@3adeea6b839b:/# cd /repos
root@3adeea6b839b:/repos# pip install packaging build
root@3adeea6b839b:/repos# git clone https://github.com/NVIDIA/apex
root@3adeea6b839b:/repos# cd apex
root@3adeea6b839b:/repos/apex# export apex_commit=810ffae374a2b9cb4b5c5e28eaeca7d7998fca0c
root@3adeea6b839b:/repos/apex# git checkout "${apex_commit}"
root@3adeea6b839b:/repos/apex# pip wheel -v --no-build-isolation \
--disable-pip-version-check \
--no-cache-dir \
--config-settings "--build-option=--cpp_ext --cuda_ext --fast_layer_norm --distributed_adam --deprecated_fused_adam --group_norm" . \
2>&1 | tee output
...
/usr/local/lib/python3.12/dist-packages/torch/include/ATen/Dispatch.h:199:48: error: cannot convert ‘const at::DeprecatedTypeProperties’ to ‘c10::ScalarType’
199 | at::ScalarType _st = ::detail::scalar_type(the_type); \
| ^~~~~~~~
| |
| const at::DeprecatedTypeProperties
/usr/local/lib/python3.12/dist-packages/torch/include/ATen/Dispatch.h:227:3: note: in expansion of macro ‘AT_DISPATCH_SWITCH’
227 | AT_DISPATCH_SWITCH( \
| ^~~~~~~~~~~~~~~~~~
/repos/apex/csrc/mlp.cpp:69:3: note: in expansion of macro ‘AT_DISPATCH_FLOATING_TYPES_AND_HALF’
69 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(inputs[0].type(), "mlp_forward", [&] {
...
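For context, this is my read of the failure rather than anything stated in the Apex sources beyond the log above: `Tensor::type()` returns an `at::DeprecatedTypeProperties`, while the `AT_DISPATCH_*` macros in the PyTorch 2.7 headers shipped in this container only accept a `c10::ScalarType`, so the implicit conversion that older PyTorch releases tolerated no longer compiles. A minimal sketch of the kind of call-site change that avoids the error, with a hypothetical function standing in for the real code around `csrc/mlp.cpp:69`:

```cpp
#include <ATen/ATen.h>
#include <ATen/Dispatch.h>

// Hypothetical stand-in for the dispatch call at csrc/mlp.cpp:69.
void mlp_forward_dispatch(const at::Tensor& input) {
  // Failing pattern: Tensor::type() returns at::DeprecatedTypeProperties,
  // which these AT_DISPATCH_* macros no longer convert to c10::ScalarType:
  //   AT_DISPATCH_FLOATING_TYPES_AND_HALF(input.type(), "mlp_forward", ...);

  // Working pattern: pass the c10::ScalarType directly.
  AT_DISPATCH_FLOATING_TYPES_AND_HALF(input.scalar_type(), "mlp_forward", [&] {
    // Inside the lambda, scalar_t is the resolved element type
    // (float, double, or at::Half).
    const scalar_t* data = input.data_ptr<scalar_t>();
    (void)data;  // the real code would launch the MLP kernel here
  });
}
```

Only the `.type()` to `.scalar_type()` change is the point here; the function and variable names are illustrative.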
Expected Behavior
I expect the build to succeed and a wheel to be produced.
Environment
PyTorch version: 2.7.0a0+ecf3bae40a.nv25.02
Is debug build: False
CUDA used to build PyTorch: 12.8
ROCM used to build PyTorch: N/A
OS: Ubuntu 24.04.1 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Clang version: Could not collect
CMake version: version 3.31.4
Libc version: glibc-2.39
Python version: 3.12.3 (main, Jan 17 2025, 18:03:48) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-4.18.0-553.16.1.el8_10.x86_64-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: 12.8.61
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA H100 PCIe
Nvidia driver version: 570.124.06
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_graph.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.9.7.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.7.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7413 24-Core Processor
CPU family: 25
Model: 1
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
Stepping: 1
Frequency boost: enabled
CPU(s) scaling MHz: 66%
CPU max MHz: 3630.8101
CPU min MHz: 1500.0000
BogoMIPS: 5300.06
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd amd_ppin brs arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
Virtualization: AMD-V
L1d cache: 1.5 MiB (48 instances)
L1i cache: 1.5 MiB (48 instances)
L2 cache: 24 MiB (48 instances)
L3 cache: 256 MiB (8 instances)
NUMA node(s): 2
NUMA node0 CPU(s): 0-23,48-71
NUMA node1 CPU(s): 24-47,72-95
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Mitigation; Safe RET
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] nvidia-cudnn-frontend==1.10.0
[pip3] nvtx==0.2.5
[pip3] onnx==1.17.0
[pip3] optree==0.14.0
[pip3] pynvjitlink==0.3.0
[pip3] pytorch-triton==3.2.0+git0d4682f0b.nvinternal
[pip3] torch==2.7.0a0+ecf3bae40a.nv25.2
[pip3] torch_geometric==2.5.3
[pip3] torch_tensorrt==2.6.0a0
[pip3] torchprofile==0.0.4
[pip3] torchvision==0.22.0a0
[conda] Could not collect