[pull] main from NVIDIA:main#605
Merged
Merged
Conversation
* [NVRTC] Warn on CUDA version mismatch after compilation failure When NVRTC kernel compilation fails, detect whether the linked NVRTC library and the CUDA headers used for compilation are from different CUDA versions, and if so emit an actionable note to stderr pointing the user toward NVTE_CUDA_INCLUDE_DIR / CUDA_HOME / LD_LIBRARY_PATH. The header version is obtained by compiling a tiny probe program that embeds CUDA_VERSION (from cuda.h) into a static_assert failure message, so the macro is resolved by the actual preprocessor rather than by parsing header text. All probe failures are silent; the check is purely informational and never causes a premature error. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Tim Moon <tmoon@nvidia.com> * Move CUDA header version check to CUDA runtime utils Still buggy, include_directory_version returns CUDA runtime version instead of header version. Signed-off-by: Tim Moon <tmoon@nvidia.com> * [NVRTC] Fix CUDA header version detection The NVRTC probe approach was broken: NVRTC pre-defines CUDART_VERSION to its own version before processing any includes, so the probe always returned the NVRTC version regardless of the headers on the include path. Fix by reading cuda_runtime_api.h as text and parsing the "#define CUDART_VERSION <integer>" line directly. This is immune to NVRTC's internal macro management, and the format has been stable across all CUDA versions. Also decode raw CUDA version integers to "major.minor" strings in the error message for readability. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Tim Moon <tmoon@nvidia.com> * [NVRTC] Add unit tests for CUDA header detection Test that the CUDA include directory is found and that its version matches the compile-time CUDART_VERSION. Also export transformer_engine::cuda::* symbols and tighten the rtc export pattern in the version script. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Tweak version message Suggestion from @ptrendx Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> * Remove test Test required exposing CUDA utility functions externally, which is beyond the scope of this work. Signed-off-by: Tim Moon <tmoon@nvidia.com> --------- Signed-off-by: Tim Moon <tmoon@nvidia.com> Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Przemyslaw Tredak <ptredak@nvidia.com>
…tion (#2666) * Replace the make_empty implementation to use C++ implementation for the known quantizers Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix lint Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Handle the device passed as string Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Fix Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixes Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Replace the make_empty implementation to use C++ implementation for the known quantizers Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix lint Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Handle the device passed as string Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Fix Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Fixes Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Fix duplicate create_empty_quantized_tensor after merge The merge with main introduced duplicate function definition, declaration, and pybind registration for create_empty_quantized_tensor. Remove the duplicates. Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Fix device index resolution in create_tensor Change the device parameter from at::Device with default torch::kCUDA to std::optional<at::Device> with default nullopt. When no device is specified, resolve to the current CUDA device via c10::cuda::current_device(), ensuring the device always has a valid index. This fixes autograd engine assertions when tensors created without an explicit device are used in backward passes. Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Guard make_empty for custom quantizers without C++ converter Custom quantizers that set self.custom = True and don't override make_empty() will now get a clear NotImplementedError instead of hitting an opaque C++ NVTE_ERROR("Unexpected type for quantizer"). Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * Fix the device from the passed data case Signed-off-by: Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Przemek Tredak <ptredak@nvidia.com> Signed-off-by: vthumbe1503 <vthumbe@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: vthumbe1503 <vthumbe@nvidia.com>
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )