Describe the bug
It appears that gcs_file.integrity, not gcs_file.url, is the actual deciding factor in which file is downloaded [1].
TL;DR:
- Get two valid integrity hashes for files in the same bucket.
- Set gcs_file.url to fileA.
- Set gcs_file.integrity to that of fileB.
- Notice that fileB is what's actually downloaded.
To Reproduce
Create two gcs_file rules. The first one (keep-sorted in this case) is correct. The second one reuses the integrity of the first, e.g. because of a copy-paste error:
load("@rules_gcs//gcs:repo_rules.bzl", "gcs_file")

BUCKET = "redacted/"
KEEP_SORTED_BINARY = "keep-sorted-v0.2.0-8c6ebc8-cgo0"
TXTPBFMT_BINARY = "txtpbfmt-20231218-084445f-cgo0"

gcs_file(
    name = "keep-sorted",
    downloaded_file_path = "keep-sorted",
    executable = True,
    integrity = "sha256-fJNSs7rMk1mhP0tW3vPTxXf7G4VE5BZ/GNF6AevePGs=",
    url = "gs://" + BUCKET + KEEP_SORTED_BINARY,
)

gcs_file(
    name = "txtpbfmt",
    downloaded_file_path = "txtpbfmt",
    executable = True,
    # integrity = "sha256-d01zAfX06+omGiFg4snN24BDxPolTLhOYRm2khDsS7M=",  # correct hash
    integrity = "sha256-fJNSs7rMk1mhP0tW3vPTxXf7G4VE5BZ/GNF6AevePGs=",  # duplicate hash of keep-sorted
    url = "gs://" + BUCKET + TXTPBFMT_BINARY,  # Note: different URL
)
The above happens to be in a module_extension called pyle3_deps in private/extensions.bzl, but I don't think that should matter. Maybe it does because of footnote 1.
Create a target that consumes both binaries, or do whatever else gets Bazel to actually download them.
Investigate the binaries in the external path:
$ # the "good" one is 3.5M and hashes to 7c9352b (which is the same as the encoded integrity)
$ ls -lah $(bazel info output_base)/external/+pyle3_deps+keep-sorted/keep-sorted
INFO: Invocation ID: b3e0038e-b4af-4e3e-8d6c-5a428e3985ad
-rwxr-x--x 1 dthor primarygroup 3.5M Mar 6 21:20 /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+keep-sorted/keep-sorted
$ sha256sum !$
sha256sum $(bazel info output_base)/external/+pyle3_deps+keep-sorted/keep-sorted
INFO: Invocation ID: 7c696b7a-015d-483c-9b5d-ee88c8f42369
7c9352b3bacc9359a13f4b56def3d3c577fb1b8544e4167f18d17a01ebde3c6b /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+keep-sorted/keep-sorted
$
$ # The "bad" one is... also 3.5M and hashes to 7c9352b ??
$ ls -lah $(bazel info output_base)/external/+pyle3_deps+txtpbfmt/txtpbfmt
INFO: Invocation ID: 8c555ffd-79ca-408d-b5ba-c5ca7734a51c
-rwxr-x--x 1 dthor primarygroup 3.5M Mar 28 03:22 /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+txtpbfmt/txtpbfmt
$ sha256sum !$
sha256sum $(bazel info output_base)/external/+pyle3_deps+txtpbfmt/txtpbfmt
INFO: Invocation ID: 709562fb-e5a1-4c52-ac40-f9cad6217092
7c9352b3bacc9359a13f4b56def3d3c577fb1b8544e4167f18d17a01ebde3c6b /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+txtpbfmt/txtpbfmt
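The claim that 7c9352b matches the encoded integrity can be checked by decoding the base64 part of the integrity string into hex. A minimal sketch in Python; the helper name sri_to_hex is made up for illustration and is not part of rules_gcs or Bazel, and it assumes a sha256 integrity string:

```python
import base64


def sri_to_hex(integrity: str) -> str:
    """Convert an integrity string like 'sha256-<base64>' to a hex digest."""
    algo, _, b64 = integrity.partition("-")
    if algo != "sha256":
        raise ValueError(f"unsupported algorithm: {algo!r}")
    # The payload is the raw digest in standard base64; show it as hex so it
    # can be compared directly with sha256sum output.
    return base64.b64decode(b64, validate=True).hex()


# Integrity used (twice) in the reproduction above.
print(sri_to_hex("sha256-fJNSs7rMk1mhP0tW3vPTxXf7G4VE5BZ/GNF6AevePGs="))
# 7c9352b3bacc9359a13f4b56def3d3c577fb1b8544e4167f18d17a01ebde3c6b

# Integrity marked "correct hash" for txtpbfmt.
print(sri_to_hex("sha256-d01zAfX06+omGiFg4snN24BDxPolTLhOYRm2khDsS7M="))
# 774d7301f5f4ebea261a2160e2c9cddb8043c4fa254cb84e6119b69210ec4bb3
```

The first digest is the one sha256sum reports for both files on disk, so both external paths hold the file that the integrity value describes, regardless of the url.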
We can even run both and assert they're the same program:
$ $(bazel info output_base)/external/+pyle3_deps+keep-sorted/keep-sorted --version
INFO: Invocation ID: 65da1aa3-9244-4798-823d-9d8849e408d0
unknown flag: --version
Usage of /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+keep-sorted/keep-sorted:
--color string Whether to color debug output. One of "always", "never", or "auto" (default "auto")
--lines line ranges Line ranges of the form "start:end". Only processes keep-sorted blocks that overlap with the given line ranges. Can only be used when fixing a single file. (default [])
--mode mode Determines what mode to run this tool in. One of ["fix" "lint"]
-v, --verbose count Log more verbosely
unknown flag: --version
$
$ # same output for a "different" file.
$ $(bazel info output_base)/external/+pyle3_deps+txtpbfmt/txtpbfmt --version
INFO: Invocation ID: 54257d07-2aba-42e1-8656-24a7a94d8f2e
unknown flag: --version
Usage of /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+txtpbfmt/txtpbfmt:
--color string Whether to color debug output. One of "always", "never", or "auto" (default "auto")
--lines line ranges Line ranges of the form "start:end". Only processes keep-sorted blocks that overlap with the given line ranges. Can only be used when fixing a single file. (default [])
--mode mode Determines what mode to run this tool in. One of ["fix" "lint"]
-v, --verbose count Log more verbosely
unknown flag: --version
Fix the bad target by setting the correct integrity:
integrity = "sha256-d01zAfX06+omGiFg4snN24BDxPolTLhOYRm2khDsS7M=", # correct hash
# integrity = "sha256-fJNSs7rMk1mhP0tW3vPTxXf7G4VE5BZ/GNF6AevePGs=", # duplicate hash of keep-sorted
url = "gs://" + BUCKET + TXTPBFMT_BINARY, # Note: different URL
Get Bazel to pull the files again, and inspect:
$ # The good one hasn't changed, nothing interesting here
$ ls -lah $(bazel info output_base)/external/+pyle3_deps+keep-sorted/keep-sorted
INFO: Invocation ID: e59e4022-0456-42bc-bebe-f91d9cdbbfb1
-rwxr-x--x 1 dthor primarygroup 3.5M Mar 6 21:20 /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+keep-sorted/keep-sorted
$ sha256sum $(bazel info output_base)/external/+pyle3_deps+keep-sorted/keep-sorted
INFO: Invocation ID: 3f26b282-eb35-49cf-ad15-dfe0d1924443
7c9352b3bacc9359a13f4b56def3d3c577fb1b8544e4167f18d17a01ebde3c6b /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+keep-sorted/keep-sorted
$
$ # But now txtpbfmt is fixed!
$ ls -lah $(bazel info output_base)/external/+pyle3_deps+txtpbfmt/txtpbfmt
INFO: Invocation ID: 3a28cf67-7a98-4f00-9dbf-b9ca24335211
-rwxr-x--x 1 dthor primarygroup 2.8M Mar 28 03:39 /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+txtpbfmt/txtpbfmt
$ sha256sum $(bazel info output_base)/external/+pyle3_deps+txtpbfmt/txtpbfmt
INFO: Invocation ID: 368864b8-fb0d-4126-89b1-48a5794be42e
774d7301f5f4ebea261a2160e2c9cddb8043c4fa254cb84e6119b69210ec4bb3 /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+txtpbfmt/txtpbfmt
$
$ # Confirmed by checking the help of the binary
$ $(bazel info output_base)/external/+pyle3_deps+txtpbfmt/txtpbfmt --version
INFO: Invocation ID: 0858fd54-a480-4ba5-93f6-12b092f60c16
flag provided but not defined: -version
Usage of /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+txtpbfmt/txtpbfmt:
-allow_triple_quoted_strings
Allow Python-style """ or ''' delimited strings in input.
-alsologtostderr
log to standard error as well as files
-dry_run
Enable dry run mode.
...
Expected behavior
The actual txtpbfmt binary (the one at url) should be downloaded, and a checksum integrity error should be shown because its hash does not match the declared integrity.
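A minimal sketch of the expected check, assuming the bytes fetched from url are hashed after download and compared with the declared integrity. This is not the rules_gcs implementation; verify_integrity and IntegrityError are hypothetical names, and only sha256 is handled:

```python
import base64
import hashlib


class IntegrityError(Exception):
    """Raised when downloaded bytes do not match the declared integrity."""


def verify_integrity(data: bytes, integrity: str) -> None:
    """Hash the downloaded bytes and compare against 'sha256-<base64>'."""
    algo, _, expected_b64 = integrity.partition("-")
    if algo != "sha256":
        raise ValueError(f"unsupported algorithm: {algo!r}")
    # Compute the digest from the bytes that were actually downloaded,
    # never from the declared value.
    actual_b64 = base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")
    if actual_b64 != expected_b64:
        raise IntegrityError(
            f"checksum mismatch: expected sha256-{expected_b64}, "
            f"got sha256-{actual_b64}"
        )
```

With a check like this, the copy-pasted integrity on txtpbfmt would fail loudly and report the hash of the file that was actually fetched.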
Environment
- OS name + version: Debian 13
- Version of the code: rules_gcs 1.0.0
Additional context
Technically two targets are not needed to reproduce - only two valid integrity values of files in the bucket. It's just easier to see "same" and "different" with two targets.
I haven't checked to see if using the integrity value of a file in a different bucket - or even a different directory within the same bucket - has a different effect.
This all came about because of some lazy copy-paste-edit I did. I didn't bother to change txtpbfmt's integrity because I thought Bazel would yell at me and give me the expected one. I was very confused when my textproto format script started sorting things!
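To make the cache-related guess from footnote 1 concrete, here is a toy model in Python of one way the symptom could arise: a cache that is looked up by the declared integrity before the URL is ever consulted. It is purely illustrative. ToyCache, its fetch method, and the fake bucket are invented for this sketch, and nothing here confirms that rules_gcs or Bazel works this way.

```python
import base64
import hashlib


def sri_sha256(data: bytes) -> str:
    """Return the 'sha256-<base64>' integrity string for data."""
    digest = hashlib.sha256(data).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")


class ToyCache:
    """Hypothetical cache keyed only by the declared integrity."""

    def __init__(self):
        self._blobs = {}
        self.downloads = []  # URLs that were actually fetched

    def fetch(self, url, integrity, download):
        # Suspected flaw: a hit on the declared integrity returns early,
        # so the URL plays no part in choosing the file.
        if integrity in self._blobs:
            return self._blobs[integrity]
        data = download(url)
        self.downloads.append(url)
        self._blobs[integrity] = data
        return data


# Fake bucket standing in for GCS; names and contents are placeholders.
BUCKET = {
    "gs://bucket/fileA": b"contents of fileA",
    "gs://bucket/fileB": b"contents of fileB",
}

cache = ToyCache()
integrity_b = sri_sha256(BUCKET["gs://bucket/fileB"])

# A correct rule for fileB populates the cache.
cache.fetch("gs://bucket/fileB", integrity_b, BUCKET.__getitem__)
# The copy-paste error: url points at fileA, integrity is fileB's.
result = cache.fetch("gs://bucket/fileA", integrity_b, BUCKET.__getitem__)

print(result == BUCKET["gs://bucket/fileB"])  # True: fileB's bytes come back
print(cache.downloads)  # ['gs://bucket/fileB']: fileA was never fetched
```

In this model no integrity error can ever appear, because the mismatching file is never downloaded and so never hashed.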
Footnotes
1. Or perhaps more correctly, which file gets added to the external directory. There may be some Bazel voodoo related to the cache dir. It's conceivable that rules_gcs downloads the correct file and then assigns it to the Bazel cache based on the integrity's hash rather than a computed hash, and then when Bazel goes through and makes the external directory it pulls the wrong file.