Fix new api #220
Open
songqing-sq wants to merge 8 commits into
…ed dependency resolution
Each commit fixes one problem, is independently reviewable, and includes verified
reproduction evidence from actual Bazel builds.
Net change to apt/ is identical to dev: git diff reorg-commits dev -- apt/ is empty.
Test setup used for all Bazel evidence:
All Bazel builds were run on a remote Alibaba Cloud Linux 3 (RHEL-based) machine
with Bazel 8.5.1. Ubuntu was not used because Ubuntu's /lib/x86_64-linux-gnu/ path layout
makes dangling absolute symlinks from Debian packages resolve accidentally, masking the bugs.
/tmp/repro/MODULE.bazel uses local_path_override to point at the relevant code version
for each reproduction, and apt.install fetches packages from the Debian bookworm snapshot at
https://sonic-build.alibaba-inc.com/debian_snapshot/20260410.
Commit 1 - fix dangling symbolic link: support intra-package symlinks
Problem
Debian packages contain two categories of symlinks:
- Symlinks whose target lives in another package (e.g. libssl.so → libssl.so.3 where libssl.so.3 lives in libssl3).
- Self-symlinks, whose target lives in the same package (e.g. libcurses.so → libncurses.so where both live in libncurses-dev).
The deb_import discovery loop adds every .so-matching path to so_files regardless
of whether it is a regular file or a symlink. Then self-symlinks are identified and
popped from symlinks. But at the point symlink_outs is computed they are already gone:
deb_export extracts outs from the tar archive in a single action. Since libcurses.so
is a symlink entry in the tar, the extraction creates a symlink on the filesystem.
Bazel 8.6.0 is lenient about symlinks in declared file outputs when the tar
covers both the symlink and its target, so the build succeeds, but the generated BUILD
is structurally wrong: libcurses.so is declared as a regular output, not a
ctx.actions.symlink() output.
Reproduction
The bug affects two categories of symlinks: absolute symlinks (e.g.
lib64/ld-linux-x86-64.so.2 → /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 in libc6)
and self-symlinks (e.g. libcurses.so → libncurses.so in libncurses-dev). Both
categories land in outs instead of symlink_outs. Ubuntu masks the absolute-symlink
case because /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 exists on Ubuntu; Alibaba
Cloud Linux does not have this path, so the symlink is dangling and Bazel rejects it.
Tar listing - libc6 package contains lib64/ld-linux-x86-64.so.2 as an absolute symlink:
Buggy generated BUILD - lib64/ld-linux-x86-64.so.2 in outs; symlink_outs empty:
Actual error on remote (Bazel 8.5.1, Alibaba Cloud Linux 3):
Fixed code (with local_path_override pointing at /tmp/rules_fixed) succeeds:
Structural evidence for self-symlinks (libncurses-dev):
Buggy BUILD puts libcurses.so in outs alongside regular files; symlink_outs
omits it; no self_symlinks attribute. On RHEL the relative symlink still resolves
(both files land in the same extracted directory), but the generated BUILD is
structurally incorrect.
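The distinction between regular outputs, self-symlinks, and symlinks that point outside the package can be illustrated with a small Python sketch. It is not the Starlark code from this PR: the function name `classify_entries` and the `(path, link_target)` input shape are assumptions chosen for illustration, and only the three categories themselves come from the description above.

```python
import posixpath


def classify_entries(entries):
    """Split package entries into regular outs, self-symlinks, and other symlinks.

    `entries` is a list of (path, link_target) pairs where link_target is None
    for regular files. This input shape is hypothetical, not the rule's real model.
    """
    # Paths of regular files shipped by this package.
    regular = {path for path, target in entries if target is None}
    outs, self_symlinks, external_symlinks = [], {}, {}
    for path, target in entries:
        if target is None:
            outs.append(path)
            continue
        if target.startswith("/"):
            # Absolute target: interpret it relative to the package root.
            resolved = posixpath.normpath(target.lstrip("/"))
        else:
            # Relative target: resolve against the directory holding the link.
            resolved = posixpath.normpath(
                posixpath.join(posixpath.dirname(path), target)
            )
        if resolved in regular:
            # Target is a regular file in the same package: a self-symlink.
            self_symlinks[path] = resolved
        else:
            # Target is not a regular file in this package.
            external_symlinks[path] = resolved
    return outs, self_symlinks, external_symlinks
```

With the two examples from the reproduction, `libcurses.so` ends up in the self-symlink map and `lib64/ld-linux-x86-64.so.2` in the external map, so neither is declared as a plain file output.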
Fix
Fixed generated BUILD (/tmp/repro_bazel_out/external/rules_distroless++apt+bookworm_libncurses-dev-amd64_6.4-4/BUILD.bazel):
deb_export generates an explicit ctx.actions.symlink() for each self_symlinks entry,
using the package's own outs as the target.
Commit 2 - apt: add merge_directory rule and :directory target to dependency_set
Problem
Cross-compilation toolchains (sysroots) need a single merged directory tree containing
all headers and libraries. The hub repo's dependency_set produced only :data (a
flat list of files) with no merged directory. Downstream toolchains had to reconstruct
a directory tree manually.
Reproduction
Buggy hub BUILD - no :directory target (code at ebfd74a):
Fix
The new merge_directory Starlark rule accepts multiple DirectoryInfo providers and
creates a single unified directory via symlinks.
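The idea of merging several directory trees into one via symlinks can be sketched outside Bazel in plain Python. This is only an illustration of the technique: `merge_directories` is a hypothetical helper, it works on filesystem paths rather than DirectoryInfo providers, and failing on a path conflict is an assumption, since the rule's actual conflict policy is not described here.

```python
import os
import tempfile


def merge_directories(sources, dest):
    """Symlink every file under each source tree into dest, keeping relative paths."""
    for src in sources:
        for root, _dirs, files in os.walk(src):
            rel = os.path.relpath(root, src)
            target_dir = os.path.normpath(os.path.join(dest, rel))
            os.makedirs(target_dir, exist_ok=True)
            for name in files:
                link = os.path.join(target_dir, name)
                if os.path.lexists(link):
                    # Assumed policy: two sources providing the same path is an error.
                    raise FileExistsError("conflicting path: " + link)
                os.symlink(os.path.abspath(os.path.join(root, name)), link)
    return dest
```

Directories are created for real and only files are linked, so two packages that both ship files under usr/include can coexist in the merged tree.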
Fixed hub BUILD:
Commit 3 - deb_export/deb_import: support GNU ld linker scripts in .so files
Problem
Some .so files in -dev packages are not ELF shared libraries; they are GNU ld
linker scripts: text files with GROUP(), INPUT(), or OUTPUT_FORMAT() directives.
Verified from the fetched libc6-dev package (same content as on system):
The absolute paths (/lib/x86_64-linux-gnu/libm.so.6) do not exist inside Bazel's
hermetic sandbox. The buggy code copies this file verbatim into the Bazel output tree.
Reproduction
Repro sources (/tmp/repro/, module pointing at /tmp/rules_before3/):
libncurses.so in Bazel output - raw linker script, not rewritten:
The linker script makes the linker pull in libncurses.so.6 from the deb package.
That .so.6 was built on Debian 12 against GLIBC 2.34; the remote machine has an
older GLIBC.
Actual build error on remote (Bazel 8.5.1, Alibaba Cloud Linux 3):
libm.so and libc.so in libc6-dev also contain absolute-path linker scripts:
Paths like /lib/x86_64-linux-gnu/libm.so.6 do not exist in Bazel's hermetic
sandbox. Any build that links against libc6-dev:m or libc6-dev:c in a fully
hermetic container (RBE, Docker with no bind-mounts) fails with the corresponding
cannot find error.
(On the remote machine /lib/x86_64-linux-gnu/libm.so.6 happened to exist, so
libm.so via additional_linker_inputs didn't break directly; the GLIBC error
from the ncurses linker script is the concrete on-machine failure.)
Evidence: fixed BUILD — linker scripts detected, paths rewritten
After commit 3 (code at HEAD), deb_import classifies libncurses.so, libm.so,
etc. as linker scripts at fetch time and rewrites absolute paths to Bazel-relative
$$BINDIR/external/<repo>/... paths.
Fixed generated BUILD - libm.so classified as linkscript:
Follow-on regression: self-symlink whose target is a linker script
Adding linkscript_outs as a new output category breaks a case in libncurses-dev:
libcurses.so → libncurses.so is a self-symlink and libncurses.so is itself a linker
script.
Tar listing:
After adding linkscript support, libncurses.so moves from outs to linkscript_outs.
The self-symlink resolution code in deb_export only looked up the symlink target in
outs. With libncurses.so now absent from outs, libcurses.so's target is
unresolvable:
Fixed generated BUILD confirms both are now present:
Design
Commit 4 - deb_import: replace so_library with per-.so cc_import and readelf-based dependency resolution
Problem
The previous architecture modeled C++ library dependencies at package granularity:
one so_library rule per package, bundling all .so files together. This caused three
independent bugs that share the same root cause.
Bug A - cc_shared_library fails to link symbols
so_library creates one empty GNU ld script per package directory to use as an
interface library:
cc_shared_library uses interface_library for symbol resolution, not dynamic_library.
The empty script exports nothing.
Reproduction
Buggy generated BUILD - so_library with all .so files; per-lib cc_import targets use libXxx_import naming:
cc_import names like :ncurses (fixed naming) do not exist, only :libncurses_import:
Bug A - empty interface library breaks cc_shared_library:
so_library writes a comment-only linker script as the interface_library for every .so:
cc_shared_library uses the interface_library to determine which symbols the shared
library provides. Since the interface library is empty, the resulting cc_shared_library
output does not record any dynamic linkage to the deb package's .so. When a downstream
cc_binary links against it via dynamic_deps, boost symbols are unresolved.
Reproduction (code at 1aeef2d, before commit 4):
Fixed code (after commit 4, per-.so cc_import without empty interface library)
builds successfully:
Bug B - .pc -D defines silently dropped
pkgconfig.bzl parses Cflags: -DFOO=1 into a defines list, but neither
_CC_IMPORT_TMPL nor _CC_LIBRARY_TMPL had a {defines} placeholder. For example,
hiredis exports -D_FILE_OFFSET_BITS=64 via its .pc file. Without this define,
consumers compile with the wrong file offset type.
Why refactor instead of patch
Bug A requires running readelf -dW to get each .so's NEEDED dependencies; once
that is in place, the natural output is one cc_import per .so. At that point
so_library is structurally obsolete. The refactor fixes all three bugs and improves
dependency precision from package-level to library-level.
New Architecture
Fixed generated BUILD (/tmp/repro_bazel_out/external/rules_distroless++apt+bookworm_libncurses-dev-amd64_6.4-4/BUILD.bazel):
Dependency resolution via readelf:
LC_ALL=C is required: non-C locales render [ as 【 (fullwidth bracket), breaking
the grep pattern.
Additional fixes included in this commit
- All -I paths from .pc Cflags are honored (before: only the first was used)
- defines from .pc -D flags are now emitted into cc_library(defines = [...])
Commit 5 - apt: add per-.so aliases to hub repo
Problem
Consumers had to use architecture-specific internal Bazel labels:
Reproduction
Buggy hub BUILD - only a single package-level alias (code at ebfd74a):
Querying a specific library target fails:
Fix
The module extension scans each .deb's file listing and generates one alias() per
.so with select() for each architecture.
Fixed hub BUILD - per-.so aliases:
Consumers use:
Commit 6 - deb_import: detect multiarch include dirs from header file paths
Problem
Some packages ship architecture-specific headers in usr/include/<triplet>/ (e.g.
usr/include/x86_64-linux-gnu/) but do not list this directory in .pc Cflags.
Primary headers include these with bare filenames.
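Such directories can be recognized from the package's file listing alone, without consulting any .pc file. The Python sketch below shows one way to do that, keyed on the -linux- substring of the multiarch triplet. The function name and the header-suffix filter are assumptions made for illustration.

```python
def multiarch_include_dirs(paths):
    """Return usr/include/<triplet> directories that contain header files."""
    found = []
    for path in paths:
        if path.startswith("./"):
            # Tar listings often prefix entries with "./".
            path = path[2:]
        parts = path.split("/")
        # Need usr/include/<component>/<at least one more segment>.
        if len(parts) < 4 or parts[0] != "usr" or parts[1] != "include":
            continue
        component = parts[2]
        # Assumed filter: only files with common header suffixes count.
        if "-linux-" in component and path.endswith((".h", ".hpp")):
            directory = "usr/include/" + component
            if directory not in found:
                found.append(directory)
    return found
```

A path such as usr/include/linux/types.h is not picked up, because its first component does not contain -linux- with both surrounding hyphens.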
Verified from liblua5.1-0-dev_5.1.5-9build1_amd64.deb:
Reproduction
Package: linux-libc-dev, which ships all kernel headers, including
usr/include/x86_64-linux-gnu/asm/types.h. It has no .pc file, and hdrs_includes
derives only from .pc Cflags, so hdrs_includes is empty and the multiarch subdir is never added.
Repro source:
Buggy generated BUILD for linux-libc-dev (module pointing at /tmp/rules_before6/):
Fixed generated BUILD (module pointing at /tmp/rules_fixed/):
Note on hard error: bazel build //:asm_test with buggy code succeeded on the
remote machine because the linux-sandbox allowed reading the host's
/usr/include/asm/types.h. The compiler's .d dependency file confirmed that the host
include was used. In a hermetic build (RBE, Docker without bind-mounts of
/usr/include), the buggy code produces:
Fix
At fetch time, scan actual header file paths. For any header found under
usr/include/<first-component>/ where the component matches -linux- (multiarch
triplet pattern), add that directory to includes:
The fix covers linux-libc-dev (kernel headers, no .pc), libc6-dev
(C runtime headers, no .pc), and any other package that ships multiarch-dir headers
without listing the directory in its .pc Cflags.
Commit 7 - deb_import: support .ipp inline template files in -dev packages
Problem
Boost and other C++ libraries use .ipp files (inline template implementations) that
are #included directly by public headers. The file scan only recognized *.h and
*.hpp, so .ipp files fell through to the else: continue branch and were never
added to outs.
Verified from libboost1.74-dev_1.74.0-18ubuntu2_amd64.deb:
Reproduction
Buggy file-scan loop - .ipp has no matching branch:
Buggy outs construction - .ipp never included:
When a translation unit includes any header that #includes a .ipp file:
Fix