Add Windows support#354
Conversation
|
Regarding Incidentally, doing this review would unblock #323 |
|
/ok to test dd1ffc9 |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughSummary by CodeRabbit
suggestion: WalkthroughAdds Windows build/test support: enables Windows symbol export and MSVC link option, adds a CUDA profiler API installer and CI invocation, re-enables Windows PR job, makes CUPTI import platform-aware, adjusts CUDA/MSVC compile options and C++ detection, and configures cross-platform test runtime library paths. ChangesWindows build and test support
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ffe8731a-1bea-4fe3-bd00-ae6f639bb863
📒 Files selected for processing (8)
CMakeLists.txtci/build_common.shcmake/NVBenchCUPTI.cmakecmake/NVBenchConfigTarget.cmakenvbench/config.cuh.intesting/axes_metadata.cutesting/cmake/CMakeLists.txttesting/cmake/test_export/CMakeLists.txt
|
I will revert the changes to |
|
@mfranzrebsal The #362 to enable MSVC build of NVBench has been merged, but it is presently unconditionally skipped due to known build failure this PR fixes. Please merge main into this branch, and revert c632eb2 to reenable the PR. The expectation is that CI build using MSVC would complete successfully now. |
edec813 to
787e435
Compare
|
The commit you mention is nowhere to be found, either in main or my rebased branch. I think we are good? |
|
/ok to test 787e435 |
|
@mfranzrebsal Right now the CI has Windows build job disabled in pr.yml#L82-83. Please push a change to remove these two lines to enable the job. |
Remove gate that disables Windows NVBench build job in pr.yaml
|
/ok to test 78b674b |
|
Windows build job fails with: On Linux, the compilation command for This folder should contain "cuda_profiler_api.h" though. |
|
Per my agent, the devcontainer does not install CUDA Profiler API component. I will try installing CUDA Profiler API in the container next |
|
/ok to test 460e14f |
|
/ok to test c6cd097 |
Attempt to fix "LINK : fatal error LNK1561: entry point must be defined" when building benchmarks which need main function provided by static library libnvbench_main after NVIDIA#350
|
/ok to test f8c0554 |
|
/ok to test 6c44ec6 |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
ci/windows/install_cuda_profiler_api.ps1 (1)
90-90: ⚡ Quick winsuggestion: add bounded retry/backoff around
Invoke-WebRequestto reduce transient network flake in CI and avoid expensive reruns.-Invoke-WebRequest -Uri $cudaVersionUrl -OutFile $installer -UseBasicParsing +$maxAttempts = 3 +for ($attempt = 1; $attempt -le $maxAttempts; $attempt++) { + try { + Invoke-WebRequest -Uri $cudaVersionUrl -OutFile $installer -UseBasicParsing + break + } catch { + if ($attempt -eq $maxAttempts) { throw } + Start-Sleep -Seconds (5 * $attempt) + } +}As per coding guidelines, "For CI and build scripts, focus on matrix correctness, targeted build/test behavior, cache/artifact handling, environment setup, GPU availability assumptions, clear failures, and avoiding unnecessary expensive jobs."
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 59672d05-2943-4844-a992-53b60f173412
📒 Files selected for processing (11)
.github/workflows/build-windows.yml.github/workflows/pr.ymlCMakeLists.txtci/windows/install_cuda_profiler_api.ps1cmake/NVBenchCUPTI.cmakecmake/NVBenchConfigTarget.cmakenvbench/CMakeLists.txtnvbench/config.cuh.intesting/axes_metadata.cutesting/cmake/CMakeLists.txttesting/cmake/test_export/CMakeLists.txt
💤 Files with no reviewable changes (1)
- .github/workflows/pr.yml
|
/ok to test 7f2a6dc |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
.github/workflows/build-windows.yml (1)
120-135: suggestion: Installing the CUDA profiler component during every build adds a network dependency and extra job time to the Windows lane. If you control the RAPIDS image, it would be more stable to preinstallcuda_profiler_api_<version>there and keep this step only as a fallback path. As per coding guidelines, "For CI and build scripts, focus on matrix correctness, targeted build/test behavior, cache/artifact handling, environment setup, GPU availability assumptions, clear failures, and avoiding unnecessary expensive jobs."
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 19822ea1-862f-4ee9-bd03-71d5e885d3ac
📒 Files selected for processing (11)
.github/workflows/build-windows.yml.github/workflows/pr.ymlCMakeLists.txtci/windows/install_cuda_profiler_api.ps1cmake/NVBenchCUPTI.cmakecmake/NVBenchConfigTarget.cmakenvbench/CMakeLists.txtnvbench/config.cuh.intesting/axes_metadata.cutesting/cmake/CMakeLists.txttesting/cmake/test_export/CMakeLists.txt
💤 Files with no reviewable changes (1)
- .github/workflows/pr.yml
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
ci/windows/install_cuda_profiler_api.ps1 (2)
64-76: ⚡ Quick winimportant: Fail fast on permanent download errors.
A 404/403 from this URL is deterministic here. Retrying it three times just burns Windows CI minutes; rethrow client errors immediately and keep retries for transient failures such as network timeouts or 5xx responses.
As per coding guidelines, "For CI and build scripts, focus on matrix correctness, targeted build/test behavior, cache/artifact handling, environment setup, GPU availability assumptions, clear failures, and avoiding unnecessary expensive jobs."
129-136: ⚡ Quick winimportant: Add a timeout around the installer process.
Start-Process -Waitcan hang indefinitely on a broken or interactive installer, which would pin the whole Windows job. UseWait-Process -TimeoutorProcess.WaitForExit(timeout)and fail explicitly on timeout.As per coding guidelines, "For CI and build scripts, focus on matrix correctness, targeted build/test behavior, cache/artifact handling, environment setup, GPU availability assumptions, clear failures, and avoiding unnecessary expensive jobs."
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: dbd3ed7a-ce12-4db4-a683-6c07f6ac0c35
📒 Files selected for processing (11)
.github/workflows/build-windows.yml.github/workflows/pr.ymlCMakeLists.txtci/windows/install_cuda_profiler_api.ps1cmake/NVBenchCUPTI.cmakecmake/NVBenchConfigTarget.cmakenvbench/CMakeLists.txtnvbench/config.cuh.intesting/axes_metadata.cutesting/cmake/CMakeLists.txttesting/cmake/test_export/CMakeLists.txt
💤 Files with no reviewable changes (1)
- .github/workflows/pr.yml
4xx responses now fail immediately, and the installer is bounded to 15 minutes before being killed and reported as a timeout.
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
/ok to test adabe4a |
There was a problem hiding this comment.
Actionable comments posted: 2
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 9d57dc12-bdfa-4833-b076-d34bc7f13830
📒 Files selected for processing (11)
.github/workflows/build-windows.yml.github/workflows/pr.ymlCMakeLists.txtci/windows/install_cuda_profiler_api.ps1cmake/NVBenchCUPTI.cmakecmake/NVBenchConfigTarget.cmakenvbench/CMakeLists.txtnvbench/config.cuh.intesting/axes_metadata.cutesting/cmake/CMakeLists.txttesting/cmake/test_export/CMakeLists.txt
💤 Files with no reviewable changes (1)
- .github/workflows/pr.yml
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
/ok to test c6347b5 |
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 07518bbb-5d90-41ad-81ed-2e80401e387c
📒 Files selected for processing (11)
.github/workflows/build-windows.yml.github/workflows/pr.ymlCMakeLists.txtci/windows/install_cuda_profiler_api.ps1cmake/NVBenchCUPTI.cmakecmake/NVBenchConfigTarget.cmakenvbench/CMakeLists.txtnvbench/config.cuh.intesting/axes_metadata.cutesting/cmake/CMakeLists.txttesting/cmake/test_export/CMakeLists.txt
💤 Files with no reviewable changes (1)
- .github/workflows/pr.yml
| for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) { | ||
| try { | ||
| Remove-Item $OutFile -ErrorAction SilentlyContinue | ||
| Invoke-WebRequest -Uri $Uri -OutFile $OutFile -UseBasicParsing | ||
| return | ||
| } catch { | ||
| $statusCode = Get-HttpStatusCodeFromError -ErrorRecord $_ | ||
| if ($statusCode -ge 400 -and $statusCode -lt 500) { | ||
| throw "Download failed with non-retryable HTTP status $statusCode from '$Uri'. $_" | ||
| } | ||
|
|
||
| if ($attempt -eq $MaxAttempts) { | ||
| throw | ||
| } | ||
|
|
||
| $delaySeconds = 5 * $attempt | ||
| Write-Warning "Download failed on attempt $attempt of $MaxAttempts. Retrying in $delaySeconds seconds. $_" | ||
| Start-Sleep -Seconds $delaySeconds |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n ci/windows/install_cuda_profiler_api.ps1 | sed -n '73,110p'Repository: NVIDIA/nvbench
Length of output: 1514
important: Add -TimeoutSec 300 to the Invoke-WebRequest call on line 91.
Without a timeout, a stalled download hangs indefinitely until the job-level timeout, bypassing the retry-backoff logic and wasting expensive Windows runner resources. Set -TimeoutSec so the exponential backoff can actually recover from hung connections.
|
/ok to test 177c7b0 |
I made the necessary changes for NVBench to build and the tests to pass. I also tested that it works when using NVBench inside of my own project, and made sure that the scripts
ci/build_nvbench.shandci/test_nvbench.shwork. How should support testing be enabled for the CI? I have a local commit with some changes, but did not want to add them here and possibly trip the CI run.Here is a summary of the changes, disclaimer that they were made by Claude and verified by me, but I am in no way a CMake expert:
#: 1
File: CMakeLists.txt
Change: CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON
Failure without it: Link error LNK1181: cannot open input file 'lib\nvbench.lib'. MSVC only generates a .lib import library when the DLL exports symbols. NVBench has no
__declspec(dllexport) annotations, so without this CMake flag, no import library is produced and all downstream targets fail to link.
#: 2
File: cmake/NVBenchCUPTI.cmake
Change: IMPORTED_IMPLIB instead of IMPORTED_LOCATION on Win32
Failure without it: CMake generate error IMPORTED_IMPLIB not set for imported target "nvbench::cupti". On Windows, find_library locates .lib import libraries. A SHARED IMPORTED
target
on Windows requires the .lib path via IMPORTED_IMPLIB (the import library), not IMPORTED_LOCATION (which expects the .dll).
#: 3
File: cmake/NVBenchConfigTarget.cmake
Change: FMT_UNICODE=0, -Xcompiler=/utf-8, --diag_suppress=27
Failure without it: Build errors in every .cu file. (a) fmtlib 11 static-asserts that /utf-8 mode is active — MSVC's host compiler satisfies this with -Xcompiler=/utf-8, but cudafe
evaluates the check independently and always fails, requiring FMT_UNICODE=0 for CUDA. (b) fmtlib's lookup tables use out-of-range char32_t sentinel values that cudafe rejects,
requiring --diag_suppress=27.
#: 4
File: cmake/NVBenchConfigTarget.cmake
Change: AND NOT WIN32 on INSTALL_RPATH
Failure without it: No failure. INSTALL_RPATH is a Unix/ELF concept silently ignored on Windows. The guard is purely a hygiene fix.
#: 5
File: nvbench/config.cuh.in
Change: MSVC_LANG instead of _cplusplus
Failure without it: Build error #error: "NVBench requires a C++17 compiler." in every .cxx file. MSVC reports __cplusplus as 199711L (C++98) regardless of actual standard, unless
/Zc:__cplusplus is passed. _MSVC_LANG always reflects the real standard level.
#: 6
File: testing/axes_metadata.cu
Change: #include
Failure without it: Build error namespace "std" has no member "back_inserter". MSVC's STL doesn't transitively include from like GCC's libstdc++ does.
#: 7
File: testing/cmake/CMakeLists.txt
Change: Forward CMAKE_CUDA_HOST_COMPILER, CMAKE_LINKER, CMAKE_RC_COMPILER, CMAKE_MT
Failure without it: Test failure CUDA_ARCHITECTURES is set to "native", but no NVIDIA GPU was detected. The sub-project cmake configure can't compile/link the GPU query program
#: 8
File: testing/cmake/CMakeLists.txt
Change: ENVIRONMENT "PATH=..." with nvbench bin + CUPTI lib dirs
Failure without it: No failure when run via the build script (which pre-sets PATH). Needed for robustness when ctest is invoked directly — the Windows equivalent of the
LD_LIBRARY_PATH setup the sub-project already has for Unix.
#: 9
File: testing/cmake/test_export/CMakeLists.txt
Change: Add Windows PATH setup for sub-project tests (parallel to existing Unix LD_LIBRARY_PATH)
Reason: The original code only set LD_LIBRARY_PATH on Unix and did nothing on Windows. The sub-project's test_bench.exe and nvbench-ctl.exe need
nvbench.dll and CUPTI DLLs at runtime. On Unix the build tree embeds RUNPATH into the binary so the executable finds libnvbench.so without environment
help; only CUPTI and the install tree need LD_LIBRARY_PATH. Windows has no RUNPATH equivalent — DLL lookup always goes through PATH — so the
sub-project
must set PATH for both tree types. Previously this worked only because the outer test in testing/cmake/CMakeLists.txt set an ENVIRONMENT property on
the ctest --build-and-test process, which the inner CTest happened to inherit. This fix makes the sub-project self-sufficient: it reads the nvbench DLL and CUPTI library locations from imported target properties and sets PATH itself. The shared code resolves the imported configuration once, then
branches only for the CUPTI property name (IMPORTED_IMPLIB on Windows vs IMPORTED_LOCATION on Unix, since find_library locates .lib import libraries on Windows) and the environment variable format.