[pull] main from NVIDIA:main by pull[bot] · Pull Request #605 · phu0ngng/TransformerEngine

pull · 2026-05-13T04:32:04Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

@ptrendx

* [NVRTC] Warn on CUDA version mismatch after compilation failure

  When NVRTC kernel compilation fails, detect whether the linked NVRTC library and the CUDA headers used for compilation are from different CUDA versions, and if so emit an actionable note to stderr pointing the user toward NVTE_CUDA_INCLUDE_DIR / CUDA_HOME / LD_LIBRARY_PATH.

  The header version is obtained by compiling a tiny probe program that embeds CUDA_VERSION (from cuda.h) into a static_assert failure message, so the macro is resolved by the actual preprocessor rather than by parsing header text. All probe failures are silent; the check is purely informational and never causes a premature error.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
  Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Move CUDA header version check to CUDA runtime utils

  Still buggy, include_directory_version returns CUDA runtime version instead of header version.

  Signed-off-by: Tim Moon <tmoon@nvidia.com>

* [NVRTC] Fix CUDA header version detection

  The NVRTC probe approach was broken: NVRTC pre-defines CUDART_VERSION to its own version before processing any includes, so the probe always returned the NVRTC version regardless of the headers on the include path.

  Fix by reading cuda_runtime_api.h as text and parsing the "#define CUDART_VERSION <integer>" line directly. This is immune to NVRTC's internal macro management, and the format has been stable across all CUDA versions.

  Also decode raw CUDA version integers to "major.minor" strings in the error message for readability.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
  Signed-off-by: Tim Moon <tmoon@nvidia.com>

* [NVRTC] Add unit tests for CUDA header detection

  Test that the CUDA include directory is found and that its version matches the compile-time CUDART_VERSION. Also export transformer_engine::cuda::* symbols and tighten the rtc export pattern in the version script.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
  Signed-off-by: Tim Moon <tmoon@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Tweak version message

  Suggestion from @ptrendx

  Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

* Remove test

  Test required exposing CUDA utility functions externally, which is beyond the scope of this work.

  Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Przemyslaw Tredak <ptredak@nvidia.com>
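For readers who want a concrete picture of the final approach described in the commit message above (parse the "#define CUDART_VERSION <integer>" line as text, then decode the integer to "major.minor"), here is a minimal Python sketch. It is an illustration only: the change itself lives in Transformer Engine's C++ code, and the function names `parse_cudart_version`, `format_cuda_version`, and `mismatch_note`, the exact wording of the note, and the assumption that the library version is available as an integer in the same encoding are all hypothetical. The decoding assumes the usual `major * 1000 + minor * 10` encoding of CUDA version integers.

```python
import re
from typing import Optional

# Matches a line such as "#define CUDART_VERSION 12040" (whitespace tolerant).
_CUDART_DEFINE = re.compile(
    r"^[ \t]*#[ \t]*define[ \t]+CUDART_VERSION[ \t]+(\d+)", re.MULTILINE
)


def parse_cudart_version(header_text: str) -> Optional[int]:
    """Return the raw CUDART_VERSION integer found in header text, or None."""
    match = _CUDART_DEFINE.search(header_text)
    return int(match.group(1)) if match else None


def format_cuda_version(version: int) -> str:
    """Decode a raw CUDA version integer (major*1000 + minor*10) to 'major.minor'."""
    return f"{version // 1000}.{(version % 1000) // 10}"


def mismatch_note(header_version: Optional[int], library_version: int) -> Optional[str]:
    """Build an informational note, or None if there is nothing to report.

    A missing header version is treated as "unknown" and stays silent,
    mirroring the commit's intent that the check never raises on its own.
    """
    if header_version is None or header_version == library_version:
        return None
    return (
        f"Note: CUDA headers are version {format_cuda_version(header_version)} "
        f"but the NVRTC library is version {format_cuda_version(library_version)}. "
        "Check NVTE_CUDA_INCLUDE_DIR, CUDA_HOME, and LD_LIBRARY_PATH."
    )


if __name__ == "__main__":
    sample_header = "/* excerpt */\n#define CUDART_VERSION 12040\n"
    print(mismatch_note(parse_cudart_version(sample_header), 13000))
```

Reading the header as text sidesteps the problem the commit describes, where NVRTC pre-defines CUDART_VERSION before any include is processed.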

…tion (#2666)

* Replace the make_empty implementation to use C++ implementation for the known quantizers

  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Fix lint

  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Handle the device passed as string

  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Fix

  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Fix

  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Fixes

  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Replace the make_empty implementation to use C++ implementation for the known quantizers

  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Fix lint

  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Handle the device passed as string

  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Fix

  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Fixes

  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Fix duplicate create_empty_quantized_tensor after merge

  The merge with main introduced duplicate function definition, declaration, and pybind registration for create_empty_quantized_tensor. Remove the duplicates.

  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Fix device index resolution in create_tensor

  Change the device parameter from at::Device with default torch::kCUDA to std::optional<at::Device> with default nullopt. When no device is specified, resolve to the current CUDA device via c10::cuda::current_device(), ensuring the device always has a valid index. This fixes autograd engine assertions when tensors created without an explicit device are used in backward passes.

  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Guard make_empty for custom quantizers without C++ converter

  Custom quantizers that set self.custom = True and don't override make_empty() will now get a clear NotImplementedError instead of hitting an opaque C++ NVTE_ERROR("Unexpected type for quantizer").

  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Fix the device from the passed data case

  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

---------

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: vthumbe1503 <vthumbe@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: vthumbe1503 <vthumbe@nvidia.com>
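The "Guard make_empty for custom quantizers" change above can be pictured with a small, self-contained Python sketch. Only the `custom` attribute, the `make_empty` method name, and the NotImplementedError behavior come from the commit message; the class layout, the `_make_empty_builtin` helper, the error text, and the dictionary return values are hypothetical stand-ins and not Transformer Engine's actual code.

```python
class Quantizer:
    """Hypothetical base class standing in for a quantizer with a fast path."""

    def __init__(self, custom: bool = False):
        # Per the commit message, custom quantizers set self.custom = True.
        self.custom = custom

    def make_empty(self, shape):
        # Guard: a custom quantizer has no built-in converter, so fail early
        # with a clear Python error instead of an opaque low-level one.
        if self.custom:
            raise NotImplementedError(
                f"{type(self).__name__} is a custom quantizer and must "
                "override make_empty()"
            )
        return self._make_empty_builtin(shape)

    def _make_empty_builtin(self, shape):
        # Placeholder for the built-in (C++-backed in the real project) path.
        return {"shape": tuple(shape), "backend": "builtin"}


class UnfinishedCustomQuantizer(Quantizer):
    """Custom quantizer that forgot to override make_empty()."""

    def __init__(self):
        super().__init__(custom=True)


class WorkingCustomQuantizer(Quantizer):
    """Custom quantizer that supplies its own make_empty()."""

    def __init__(self):
        super().__init__(custom=True)

    def make_empty(self, shape):
        return {"shape": tuple(shape), "backend": "python"}


if __name__ == "__main__":
    print(Quantizer().make_empty([2, 3]))
    print(WorkingCustomQuantizer().make_empty([2, 3]))
    try:
        UnfinishedCustomQuantizer().make_empty([2, 3])
    except NotImplementedError as err:
        print("guarded:", err)
```

The point of the guard is that the failure surfaces at the Python boundary with the offending class named, rather than deep inside the converter.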

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

timmoon10 and others added 3 commits May 12, 2026 17:01

Added the CODEOWNERS file (#2980)

76c2a9e

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

pull Bot locked and limited conversation to collaborators May 13, 2026

pull Bot added the ⤵️ pull label May 13, 2026

pull Bot merged commit 76c2a9e into phu0ngng:main May 13, 2026
9 of 10 checks passed

pull Bot had a problem deploying to github-pages May 13, 2026 04:34 Failure

pull[bot] merged 3 commits into phu0ngng:main from NVIDIA:main
