Skip to content

[Comgr] Skip embedded libc++ -idirafter when system C++ headers exist; fix VFS path mismatch with clang resource dir#2478

Open
yxsamliu wants to merge 1 commit into
amd-mainfrom
amd/dev/yaxunl/fix-comgr-libcxx-stdlibcxx-conflict
Open

[Comgr] Skip embedded libc++ -idirafter when system C++ headers exist; fix VFS path mismatch with clang resource dir#2478
yxsamliu wants to merge 1 commit into
amd-mainfrom
amd/dev/yaxunl/fix-comgr-libcxx-stdlibcxx-conflict

Conversation

@yxsamliu
Copy link
Copy Markdown

This commit fixes two related issues with comgr's embedded libc++ headers
support that surface together when compiling HIP with -nogpuinc on
real-world Linux hosts.

Issue 1: libstdc++/libc++ header conflict (ROCm-issue-2445)

When -nogpuinc is active and embedded libc++ headers are loaded into
the VFS, comgr injected -idirafter <llvm>/include/c++/v1. On distros
that ship libstdc++ at clang's default include path (RHEL/manylinux/
Ubuntu), this produces a hybrid include chain. The host libstdc++
<array>/<stdexcept> pull in a transitive <stddef.h> from the
embedded libc++ overlay, whose #include_next then has no successor
under -nogpuinc and fails.

Example: #include <array> → libstdc++ <array> → libc++ stddef.h
(via VFS) → #include_next <stddef.h> → no next file → error.

The embedded set is partial by design (LIBCXX_USER_HEADERS in
cmake/LibcxxHeaders.cmake), so it can only safely substitute when no
host C++ standard library is present.

Fix: detect host libstdc++ / libc++ on disk (honoring --sysroot=) and
skip the -idirafter injection when found. Also skip when the user
passed -nostdinc++, -nostdinc, or -nostdlibinc. Add
AMD_COMGR_USE_EMBEDDED_LIBCXX={auto,force,disable} env override.

Issue 2: VFS embed paths drift from clang's resource-dir path

The clang Driver and the VFS embed code each constructed a clang binary
path from LLVM_PATH, but using different string-joining functions.
When LLVM_PATH was unset, the two disagreed:

Driver: (Twine("") + "/bin/clang").str() → "/bin/clang"
Embed: SmallString<256>(""); append("bin","clang") → "bin/clang"

Both then fed clang::GetResourcesPath(), which strips two parents and
appends lib/clang/<ver>:

"/bin/clang" → "/lib/clang/23" (absolute)
"bin/clang" → "lib/clang/23" (relative)

So clang searched /lib/clang/23/include/stddef.h while comgr planted
the file at lib/clang/23/include/stddef.h. Mismatch → header not
found.

Fix: extract a single env::getClangBinaryPath() helper used by both
the Driver and the VFS embed code, so the two paths cannot drift again.
When LLVM_PATH is unset, locate the real clang via dladdr on the
loaded libamd_comgr.so and probe sibling layouts (<prefix>/llvm/bin/ clang for ROCm, <prefix>/bin/clang for standard installs); fall back
to /bin/clang only if neither is found, which keeps Driver and VFS
consistent even on unusual layouts.

Tests: added compile_hip_stdlibcxx_conflict_test (system-libstdc++
chain), compile_hip_with_system_libcxx_test (asserts skip-detection
fires when system libc++ present), compile_hip_distroless_test
(forces embedded path via env override). All three skip cleanly on
Windows; the conflict test also skips when no clang resource dir is
reachable on disk (no false failure when prerequisites aren't present).

…; fix VFS path mismatch with clang resource dir

This commit fixes two related issues with comgr's embedded libc++ headers
support that surface together when compiling HIP with `-nogpuinc` on
real-world Linux hosts.

Issue 1: libstdc++/libc++ header conflict (ROCm-issue-2445)

When `-nogpuinc` is active and embedded libc++ headers are loaded into
the VFS, comgr injected `-idirafter <llvm>/include/c++/v1`. On distros
that ship libstdc++ at clang's default include path (RHEL/manylinux/
Ubuntu), this produces a hybrid include chain. The host libstdc++
`<array>`/`<stdexcept>` pull in a transitive `<stddef.h>` from the
embedded libc++ overlay, whose `#include_next` then has no successor
under `-nogpuinc` and fails.

Example: `#include <array>` → libstdc++ `<array>` → libc++ `stddef.h`
(via VFS) → `#include_next <stddef.h>` → no next file → error.

The embedded set is partial by design (LIBCXX_USER_HEADERS in
cmake/LibcxxHeaders.cmake), so it can only safely substitute when no
host C++ standard library is present.

Fix: detect host libstdc++ / libc++ on disk (honoring `--sysroot=`) and
skip the `-idirafter` injection when found. Also skip when the user
passed `-nostdinc++`, `-nostdinc`, or `-nostdlibinc`. Add
`AMD_COMGR_USE_EMBEDDED_LIBCXX={auto,force,disable}` env override.

Issue 2: VFS embed paths drift from clang's resource-dir path

The clang Driver and the VFS embed code each constructed a clang binary
path from `LLVM_PATH`, but using different string-joining functions.
When `LLVM_PATH` was unset, the two disagreed:

  Driver: `(Twine("") + "/bin/clang").str()`        → "/bin/clang"
  Embed:  `SmallString<256>(""); append("bin","clang")` → "bin/clang"

Both then fed `clang::GetResourcesPath()`, which strips two parents and
appends `lib/clang/<ver>`:

  "/bin/clang"  → "/lib/clang/23"   (absolute)
  "bin/clang"   → "lib/clang/23"    (relative)

So clang searched `/lib/clang/23/include/stddef.h` while comgr planted
the file at `lib/clang/23/include/stddef.h`. Mismatch → header not
found.

Fix: extract a single `env::getClangBinaryPath()` helper used by both
the Driver and the VFS embed code, so the two paths cannot drift again.
When `LLVM_PATH` is unset, locate the real clang via `dladdr` on the
loaded libamd_comgr.so and probe sibling layouts (`<prefix>/llvm/bin/
clang` for ROCm, `<prefix>/bin/clang` for standard installs); fall back
to `/bin/clang` only if neither is found, which keeps Driver and VFS
consistent even on unusual layouts.

Tests: added compile_hip_stdlibcxx_conflict_test (system-libstdc++
chain), compile_hip_with_system_libcxx_test (asserts skip-detection
fires when system libc++ present), compile_hip_distroless_test
(forces embedded path via env override). All three skip cleanly on
Windows; the conflict test also skips when no clang resource dir is
reachable on disk (no false failure when prerequisites aren't present).
@yxsamliu yxsamliu requested review from chinmaydd and lamb-j as code owners May 11, 2026 03:06
@lamb-j
Copy link
Copy Markdown
Collaborator

lamb-j commented May 11, 2026

Can this be split into two separate PRs for the two issues? Also comments are a bit verbose (ok if needed for the complexity of the code, but otherwise shorter would be nice)

@lamb-j
Copy link
Copy Markdown
Collaborator

lamb-j commented May 11, 2026

Let me revive #1536, I think it gets at your second issue

@yxsamliu
Copy link
Copy Markdown
Author

Thanks Jacob, makes sense to split. Plan: PR A will be a small follow-up to #1536 that updates the VFS embed site in AMDGPUCompiler ctor (~line 2417) to also use getClangExecutable() — right now it still builds the clang path via getLLVMPath() + sys::path::append, which is the half causing the resource-dir mismatch with the Driver, and that site needs updating anyway once getLLVMPath() is gone. PR B will be the libstdc++/libc++ skip logic stacked on PR A, with the comments trimmed down. I'll wait for #1536 to settle before posting either.

@chinmaydd chinmaydd added the comgr Related to Code Object Manager label May 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comgr Related to Code Object Manager

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants