Integrity value changes which binary is downloaded #18

@dougthor42

Describe the bug
It appears that gcs_file.integrity, not gcs_file.url, is the actual deciding factor in which file is downloaded (see footnote 1).

TL;DR:

  1. Get two valid integrity hashes for files in the same bucket.
  2. Set gcs_file.url to fileA.
  3. Set gcs_file.integrity to that of fileB.
  4. Notice that fileB is what's actually downloaded.

To Reproduce

Create two gcs_file rules. The first one (keep-sorted in this case) is correct. The second one should use the same integrity as the first, e.g. because of a copy-paste error:

load("@rules_gcs//gcs:repo_rules.bzl", "gcs_file")

BUCKET = "redacted/"
KEEP_SORTED_BINARY = "keep-sorted-v0.2.0-8c6ebc8-cgo0"
TXTPBFMT_BINARY = "txtpbfmt-20231218-084445f-cgo0"
gcs_file(
    name = "keep-sorted",
    downloaded_file_path = "keep-sorted",
    executable = True,
    integrity = "sha256-fJNSs7rMk1mhP0tW3vPTxXf7G4VE5BZ/GNF6AevePGs=",
    url = "gs://" + BUCKET + KEEP_SORTED_BINARY,
)

gcs_file(
    name = "txtpbfmt",
    downloaded_file_path = "txtpbfmt",
    executable = True,
    # integrity = "sha256-d01zAfX06+omGiFg4snN24BDxPolTLhOYRm2khDsS7M=",  # correct hash
    integrity = "sha256-fJNSs7rMk1mhP0tW3vPTxXf7G4VE5BZ/GNF6AevePGs=",    # duplicate hash of keep-sorted
    url = "gs://" + BUCKET + TXTPBFMT_BINARY,  # Note: different URL
)

The above happens to be in a module_extension called pyle3_deps in private/extensions.bzl, but I don't think that should matter. Maybe it does, because of footnote 1.

Create a target that consumes both binaries. Or whatever you want to do to get Bazel to actually download things.

Investigate the binaries in the external path:

$ # the "good" one is 3.5M and hashes to 7c9352b (which is the same as the encoded integrity)
$ ls -lah $(bazel info output_base)/external/+pyle3_deps+keep-sorted/keep-sorted
INFO: Invocation ID: b3e0038e-b4af-4e3e-8d6c-5a428e3985ad
-rwxr-x--x 1 dthor primarygroup 3.5M Mar  6 21:20 /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+keep-sorted/keep-sorted
$ sha256sum !$
sha256sum $(bazel info output_base)/external/+pyle3_deps+keep-sorted/keep-sorted
INFO: Invocation ID: 7c696b7a-015d-483c-9b5d-ee88c8f42369
7c9352b3bacc9359a13f4b56def3d3c577fb1b8544e4167f18d17a01ebde3c6b  /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+keep-sorted/keep-sorted
$
$ # The "bad" one is... also 3.5M and hashes to 7c9352b ??
$ ls -lah $(bazel info output_base)/external/+pyle3_deps+txtpbfmt/txtpbfmt
INFO: Invocation ID: 8c555ffd-79ca-408d-b5ba-c5ca7734a51c
-rwxr-x--x 1 dthor primarygroup 3.5M Mar 28 03:22 /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+txtpbfmt/txtpbfmt
$ sha256sum !$
sha256sum $(bazel info output_base)/external/+pyle3_deps+txtpbfmt/txtpbfmt
INFO: Invocation ID: 709562fb-e5a1-4c52-ac40-f9cad6217092
7c9352b3bacc9359a13f4b56def3d3c577fb1b8544e4167f18d17a01ebde3c6b  /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+txtpbfmt/txtpbfmt
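
To make the "same as the encoded integrity" comparison concrete: the integrity string is the algorithm name, a dash, and the base64-encoded digest, while sha256sum prints the same digest in hex. A minimal Python sketch of the conversion (the helper name sri_to_hex is made up for illustration and is not part of rules_gcs or Bazel):

```python
import base64


def sri_to_hex(integrity: str) -> str:
    """Convert an integrity string like 'sha256-<base64>' to a hex digest."""
    algo, _, b64_digest = integrity.partition("-")
    if algo != "sha256":
        raise ValueError(f"unsupported algorithm: {algo!r}")
    return base64.b64decode(b64_digest).hex()


# The integrity value used for both targets in the repro above.
print(sri_to_hex("sha256-fJNSs7rMk1mhP0tW3vPTxXf7G4VE5BZ/GNF6AevePGs="))
# prints 7c9352b3bacc9359a13f4b56def3d3c577fb1b8544e4167f18d17a01ebde3c6b
```

That is the hash sha256sum reports for both files, so the file in the txtpbfmt directory matches the integrity it was given even though it did not come from the txtpbfmt URL.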

We can even run both and assert they're the same program:

$ $(bazel info output_base)/external/+pyle3_deps+keep-sorted/keep-sorted --version
INFO: Invocation ID: 65da1aa3-9244-4798-823d-9d8849e408d0
unknown flag: --version
Usage of /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+keep-sorted/keep-sorted:
      --color string        Whether to color debug output. One of "always", "never", or "auto" (default "auto")
      --lines line ranges   Line ranges of the form "start:end". Only processes keep-sorted blocks that overlap with the given line ranges. Can only be used when fixing a single file. (default [])
      --mode mode           Determines what mode to run this tool in. One of ["fix" "lint"]
  -v, --verbose count       Log more verbosely
unknown flag: --version
$
$ # same output for a "different" file.
$ $(bazel info output_base)/external/+pyle3_deps+txtpbfmt/txtpbfmt --version
INFO: Invocation ID: 54257d07-2aba-42e1-8656-24a7a94d8f2e
unknown flag: --version
Usage of /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+txtpbfmt/txtpbfmt:
      --color string        Whether to color debug output. One of "always", "never", or "auto" (default "auto")
      --lines line ranges   Line ranges of the form "start:end". Only processes keep-sorted blocks that overlap with the given line ranges. Can only be used when fixing a single file. (default [])
      --mode mode           Determines what mode to run this tool in. One of ["fix" "lint"]
  -v, --verbose count       Log more verbosely
unknown flag: --version

Fix the bad target by setting the correct integrity:

    integrity = "sha256-d01zAfX06+omGiFg4snN24BDxPolTLhOYRm2khDsS7M=",    # correct hash
    # integrity = "sha256-fJNSs7rMk1mhP0tW3vPTxXf7G4VE5BZ/GNF6AevePGs=",  # duplicate hash of keep-sorted
    url = "gs://" + BUCKET + TXTPBFMT_BINARY,  # Note: different URL

Get Bazel to pull the files again, and inspect:

$ # The good one hasn't changed, nothing interesting here
$ ls -lah $(bazel info output_base)/external/+pyle3_deps+keep-sorted/keep-sorted
INFO: Invocation ID: e59e4022-0456-42bc-bebe-f91d9cdbbfb1
-rwxr-x--x 1 dthor primarygroup 3.5M Mar  6 21:20 /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+keep-sorted/keep-sorted
$ sha256sum $(bazel info output_base)/external/+pyle3_deps+keep-sorted/keep-sorted
INFO: Invocation ID: 3f26b282-eb35-49cf-ad15-dfe0d1924443
7c9352b3bacc9359a13f4b56def3d3c577fb1b8544e4167f18d17a01ebde3c6b  /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+keep-sorted/keep-sorted
$
$ # But now txtpbfmt is fixed!
$ ls -lah $(bazel info output_base)/external/+pyle3_deps+txtpbfmt/txtpbfmt
INFO: Invocation ID: 3a28cf67-7a98-4f00-9dbf-b9ca24335211
-rwxr-x--x 1 dthor primarygroup 2.8M Mar 28 03:39 /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+txtpbfmt/txtpbfmt
$ sha256sum $(bazel info output_base)/external/+pyle3_deps+txtpbfmt/txtpbfmt
INFO: Invocation ID: 368864b8-fb0d-4126-89b1-48a5794be42e
774d7301f5f4ebea261a2160e2c9cddb8043c4fa254cb84e6119b69210ec4bb3  /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+txtpbfmt/txtpbfmt
$
$ # Confirmed by checking the help of the binary
$ $(bazel info output_base)/external/+pyle3_deps+txtpbfmt/txtpbfmt --version
INFO: Invocation ID: 0858fd54-a480-4ba5-93f6-12b092f60c16
flag provided but not defined: -version
Usage of /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/0f8c52850e7230283fc2f8033149fba2/external/+pyle3_deps+txtpbfmt/txtpbfmt:
  -allow_triple_quoted_strings
        Allow Python-style """ or ''' delimited strings in input.
  -alsologtostderr
        log to standard error as well as files
  -dry_run
        Enable dry run mode.
...

Expected behavior
The actual txtpbfmt binary should be downloaded and a checksum integrity error should be shown.

Environment

  • OS name + version: Debian 13
  • Version of the code: rules_gcs 1.0.0

Additional context
Technically two targets are not needed to reproduce - only two valid integrity values of files in the bucket. It's just easier to see "same" and "different" with two targets.

I haven't checked to see if using the integrity value of a file in a different bucket - or even a different directory within the same bucket - has a different effect.

This all came about because of some lazy copy-paste-edit I did. I didn't bother to change txtpbfmt's integrity because I thought Bazel would yell at me and give me the expected one. I was very confused when my textproto format script started sorting things!

Footnotes

  1. Or perhaps more correctly, which file gets added to the external directory. There may be some Bazel voodoo related to the cache dir. It's conceivable that rules_gcs downloads the correct file and then assigns it to the Bazel cache based on the integrity's hash rather than a computed hash, and then when Bazel goes through and makes the external directory it pulls the wrong file.
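
A toy model of one variant of this hypothesis, in Python. This is not rules_gcs or Bazel code; the class, the URLs, and the file contents are all invented. It only assumes a download cache keyed by the declared hash that is consulted before the URL is fetched:

```python
import hashlib


class ToyDownloadCache:
    """Toy cache keyed by the *declared* sha256, checked before any download."""

    def __init__(self):
        self._blobs = {}

    def fetch(self, url, declared_sha256, download):
        # Cache hit: the URL is never consulted.
        if declared_sha256 in self._blobs:
            return self._blobs[declared_sha256]
        data = download(url)
        actual = hashlib.sha256(data).hexdigest()
        if actual != declared_sha256:
            raise ValueError(f"checksum mismatch for {url}: got {actual}")
        self._blobs[declared_sha256] = data
        return data


# Hypothetical bucket contents standing in for the two binaries.
remote = {
    "gs://bucket/fileA": b"contents of A",
    "gs://bucket/fileB": b"contents of B",
}
hash_a = hashlib.sha256(remote["gs://bucket/fileA"]).hexdigest()

cache = ToyDownloadCache()
first = cache.fetch("gs://bucket/fileA", hash_a, remote.__getitem__)
# Same declared hash, different URL: served from the cache, no error.
second = cache.fetch("gs://bucket/fileB", hash_a, remote.__getitem__)
print(second == remote["gs://bucket/fileA"])  # prints True
```

In this model, the same mismatched request against an empty cache raises the checksum error I expected to see. If something like this is what's happening, it would also explain why the problem only shows up once a file with the copied integrity has already been fetched.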
