[pull] main from NVIDIA:main #605

Merged
pull[bot] merged 3 commits into phu0ngng:main from NVIDIA:main
May 13, 2026

pull[bot] commented May 13, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

timmoon10 and others added 3 commits May 12, 2026 17:01
* [NVRTC] Warn on CUDA version mismatch after compilation failure

When NVRTC kernel compilation fails, detect whether the linked NVRTC
library and the CUDA headers used for compilation are from different
CUDA versions, and if so emit an actionable note to stderr pointing
the user toward NVTE_CUDA_INCLUDE_DIR / CUDA_HOME / LD_LIBRARY_PATH.

The header version is obtained by compiling a tiny probe program that
embeds CUDA_VERSION (from cuda.h) into a static_assert failure message,
so the macro is resolved by the actual preprocessor rather than by
parsing header text.  All probe failures are silent; the check is
purely informational and never causes a premature error.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
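
The informational check described in this commit message can be sketched as a pure function over two version integers. This is only an illustration, not the code from the commit: the function name `cuda_version_mismatch_note`, the message wording, and the convention that a non-positive value means "detection failed" are all assumptions. Only the environment variable names come from the commit message.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the PR): returns a note when the two versions
// differ, and an empty string otherwise. Both arguments are assumed to use the
// CUDA integer encoding (1000 * major + 10 * minor).
std::string cuda_version_mismatch_note(int nvrtc_version, int header_version) {
  // Assumed convention: a non-positive value means detection failed.
  // The check is informational, so it stays silent in that case.
  if (nvrtc_version <= 0 || header_version <= 0) return "";
  if (nvrtc_version == header_version) return "";
  return "Note: NVRTC library version (" + std::to_string(nvrtc_version) +
         ") does not match CUDA header version (" +
         std::to_string(header_version) +
         "). Check NVTE_CUDA_INCLUDE_DIR, CUDA_HOME, and LD_LIBRARY_PATH.";
}
```

A caller would print the returned string to stderr only when it is non-empty, and only after a compilation failure has already been reported.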

* Move CUDA header version check to CUDA runtime utils

Still buggy: include_directory_version returns the CUDA runtime version instead of the header version.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* [NVRTC] Fix CUDA header version detection

The NVRTC probe approach was broken: NVRTC pre-defines CUDART_VERSION
to its own version before processing any includes, so the probe always
returned the NVRTC version regardless of the headers on the include path.

Fix by reading cuda_runtime_api.h as text and parsing the
"#define CUDART_VERSION <integer>" line directly. This is immune to
NVRTC's internal macro management, and the format has been stable across
all CUDA versions.

Also decode raw CUDA version integers to "major.minor" strings in the
error message for readability.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
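
The text-parsing approach and the "major.minor" decoding described in this commit message might look roughly like the sketch below. The function names `parse_cudart_version` and `decode_cuda_version` are hypothetical, and the sketch works on a string that has already been read from `cuda_runtime_api.h`, so it makes no assumption about how the PR locates or opens the file. It only matches lines written exactly as `#define CUDART_VERSION <integer>`, with `#define` as a single token.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: scan header text line by line for
// "#define CUDART_VERSION <integer>". Returns -1 if no such line is found.
int parse_cudart_version(const std::string& header_text) {
  std::istringstream lines(header_text);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream tokens(line);
    std::string directive, name;
    int value = 0;
    // The extraction fails (and the line is skipped) if any token is missing
    // or the third token does not start with an integer.
    if (tokens >> directive >> name >> value && directive == "#define" &&
        name == "CUDART_VERSION") {
      return value;
    }
  }
  return -1;
}

// CUDA encodes versions as 1000 * major + 10 * minor, e.g. 12040 is 12.4.
std::string decode_cuda_version(int version) {
  assert(version >= 0);
  return std::to_string(version / 1000) + "." +
         std::to_string((version % 1000) / 10);
}
```

Because the macro is read as plain text, whatever value NVRTC pre-defines for `CUDART_VERSION` has no influence on the result.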

* [NVRTC] Add unit tests for CUDA header detection

Test that the CUDA include directory is found and that its version
matches the compile-time CUDART_VERSION.

Also export transformer_engine::cuda::* symbols and tighten the rtc
export pattern in the version script.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Tweak version message

Suggestion from @ptrendx

Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

* Remove test

Test required exposing CUDA utility functions externally, which is beyond the scope of this work.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Przemyslaw Tredak <ptredak@nvidia.com>
…tion (#2666)

* Replace the make_empty implementation to use C++ implementation for the
known quantizers

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix lint

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Handle the device passed as string

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Fix

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Replace the make_empty implementation to use C++ implementation for the
known quantizers

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix lint

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Handle the device passed as string

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Fix

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Fixes

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Fix duplicate create_empty_quantized_tensor after merge

The merge with main introduced a duplicate function definition,
declaration, and pybind registration for create_empty_quantized_tensor.
Remove the duplicates.

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Fix device index resolution in create_tensor

Change the device parameter from at::Device with default torch::kCUDA
to std::optional<at::Device> with default nullopt. When no device is
specified, resolve to the current CUDA device via
c10::cuda::current_device(), ensuring the device always has a valid
index. This fixes autograd engine assertions when tensors created
without an explicit device are used in backward passes.

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
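
The optional-device pattern described in this commit message can be illustrated without PyTorch. In the sketch below, `Device`, `g_current_cuda_device`, `current_cuda_device`, and `resolve_device` are hypothetical stand-ins chosen for illustration; the real change uses `at::Device` and `c10::cuda::current_device()`, and its exact code is not shown on this page.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for at::Device: a device type plus an index.
struct Device {
  std::string type;
  int index;
};

// Hypothetical stand-in for c10::cuda::current_device().
int g_current_cuda_device = 0;
int current_cuda_device() { return g_current_cuda_device; }

// When no device is given, resolve to the current CUDA device so the
// result always carries a concrete index. An explicit device is returned
// unchanged.
Device resolve_device(std::optional<Device> device = std::nullopt) {
  if (!device.has_value()) {
    return Device{"cuda", current_cuda_device()};
  }
  return *device;
}
```

Defaulting to `std::nullopt` rather than to a device type without an index lets the function distinguish "caller did not specify a device" from "caller asked for a specific device".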

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Guard make_empty for custom quantizers without C++ converter

Custom quantizers that set self.custom = True and don't override
make_empty() will now get a clear NotImplementedError instead of
hitting an opaque C++ NVTE_ERROR("Unexpected type for quantizer").

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* Fix the device from the passed data case

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: vthumbe1503 <vthumbe@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: vthumbe1503 <vthumbe@nvidia.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
pull[bot] locked and limited conversation to collaborators May 13, 2026
pull[bot] added the ⤵️ pull label May 13, 2026
pull[bot] merged commit 76c2a9e into phu0ngng:main May 13, 2026
9 of 10 checks passed
pull[bot] had a problem deploying to github-pages May 13, 2026 04:34 Failure
2 participants