
Add a Guix-based reproducible build pipeline for cuprated #615

Open
kim0 wants to merge 5 commits into Cuprate:main from kim0:guix-reproducible-build


kim0 commented May 17, 2026

Add a Guix-based reproducible build pipeline for cuprated

Provides a deterministic, hermetic build flow for cuprated on Linux:
pinned Guix-time-machine toolchain, byte-stable source archive, sealed
guix shell --container --pure build, and a CI smoke job that runs the
whole flow twice on every PR and asserts byte-identical artifacts. A
signed JSON manifest binds the artifact to its source.

All changes are confined to contrib/guix/ and one new
.github/workflows/ file.

Reproducibility evidence

Built the PR's merge commit (dec0bbee5c18f2dc6baae9e9ca7bb0c4d041931d)
on two completely different hosts and got bit-identical artifacts:

| Host | cuprated-0.0.9-x86_64-unknown-linux-gnu.tar.gz sha256 |
| --- | --- |
| GitHub Actions ubuntu-24.04 (Azure VM, 16 GB / 4 vCPU) | 416cef3b90aa63d50e80dfc52a16dbd4392e76d8e59bb86d733cf6251c0f12c3 |
| Bare-metal x86_64 Linux (125 GB / 48 vCPU) | 416cef3b90aa63d50e80dfc52a16dbd4392e76d8e59bb86d733cf6251c0f12c3 |

Same merge commit, different hosts, identical down to the byte. The
known drift points (kernel, glibc, libstdc++, hostname, tmpdir paths,
build parallelism, locale) are all neutralized.

Earlier rounds of verification on an older HEAD also produced
bit-identical artifacts across two different x86_64
microarchitectures, Intel Xeon Gold (Sapphire Rapids) and AMD EPYC
9454P (Zen 4), confirming the build is host-CPU-vendor independent.

The smoke job re-verifies in-pipeline determinism on every CI run: it
runs two independent builds in separate work trees and compares the
artifact, distsrc, build metadata, rustc/cargo versions, and
guix-describe output byte for byte.
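
The comparison step can be pictured with a minimal sketch like the one below. It is not the PR's smoke-reproducible.sh. The function names `hash_tree` and `compare_runs` are hypothetical, and it assumes GNU findutils and coreutils are available.

```bash
# Hash every regular file under a directory, keyed by relative path, in a
# locale-independent sorted order so two listings can be compared directly.
hash_tree() {  # usage: hash_tree <dir>
  (cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum)
}

# Succeed only if both runs produced the same files with the same contents.
compare_runs() {  # usage: compare_runs <run1-dir> <run2-dir>
  a="$(hash_tree "$1")" || return 2
  b="$(hash_tree "$2")" || return 2
  if [ "$a" != "$b" ]; then
    printf 'run outputs differ:\n--- %s\n%s\n--- %s\n%s\n' "$1" "$a" "$2" "$b" >&2
    return 1
  fi
}
```

Calling `compare_runs out/run1 out/run2` returns non-zero and prints both listings as soon as a single file differs or is missing from one side.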

Guarantees of the build flow

  • Pinned toolchain. Guix channel commit + manifest of every
    build-relevant package. Reproduced via guix time-machine; nothing
    depends on the host's compilers, libc, or system tools.
  • Deterministic source archive. git archive + cargo vendor --locked,
    packaged with sorted-name, fixed-mtime, fixed-uid/gid tar flags.
    Byte-stable for a given commit.
  • Hermetic build. guix shell --container --pure with
    SOURCE_DATE_EPOCH, --remap-path-prefix, -ffile-prefix-map,
    -C codegen-units=1, OPENSSL_NO_VENDOR=1. A distsrc
    content-equivalence check (diff -rq against git archive of the
    claimed commit) catches a tampered source archive before any
    compilation. The embedded git commit is bound to the distsrc, not to
    whatever outer checkout happens to host the build.
  • Path safety. Output paths are derived from validated
    build-metadata fields; . / .. and non-basename segments are
    explicitly rejected before any path is composed.
  • Signed attestation. Stable, sorted-key JSON manifest binding
    artifact sha256 ↔ distsrc sha256 ↔ builder identity ↔ Guix channel
    commit ↔ rustc/cargo versions. Detached GPG signature; fails closed
    if gpg is missing rather than writing an unsigned attestation.
  • CI bootstrap is itself signed. The Guix binary tarball is pinned
    by both SHA256 and the Guix release signer's GPG fingerprint
    (A28BF40C…3D8351, Efraim Flashner). Both checks run before any
    extraction. Keyserver fetch is tried across three well-known servers,
    each in its own scratch GPG homedir, with VALIDSIG asserted inside
    each iteration before the homedir is committed.
  • Mechanical regression guard. Every build log is grepped for
    -march=native, -mcpu=native, and target-cpu=native. A future
    cc-rs / rustc / CMake change quietly re-introducing host-CPU codegen
    surfaces directly in the CI job summary.
  • All actions pinned by full commit SHA, not by mutable tag.
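
To illustrate the path-safety guarantee, here is a minimal sketch of a component check. It is not the code in guix-attest or guix-checksums, and the function name `is_safe_component` is hypothetical. The commit notes in this PR mention an allow-list of `[A-Za-z0-9._+-]`; the sketch uses the same character set and adds the explicit `.` / `..` rejection.

```bash
# Accept a value only if it is safe to use as a single path component.
is_safe_component() {  # usage: is_safe_component <value>
  case "$1" in
    '' | . | ..) return 1 ;;          # empty, current dir, parent dir
    *[!A-Za-z0-9._+-]*) return 1 ;;   # any character outside the allow-list, including '/'
  esac
  return 0
}
```

A character allow-list alone is not enough, because `.` and `..` consist only of allowed characters; that is why they are rejected by name before the character check.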

Threat model

contrib/guix/README.md enumerates the trust roots — Guix substitute
keys, Guix binary bootstrap tarball, channel commit + introduction,
release signing key, git tree integrity — and is explicit about what
reproducibility does and does not protect against. Notably, it does
not detect a compromised toolchain that produces deterministic but
backdoored output; that problem is deliberately deferred to Guix's own
attestation chain.


Closes #470.

Ahmed Kamal and others added 5 commits May 17, 2026 23:40

Adds a Guix-first build pipeline that produces byte-identical
cuprated-<version>-x86_64-unknown-linux-gnu.tar.gz artifacts from this
repository.

Pipeline layout:

  contrib/guix/
    channels.scm                pinned Guix instance (commit + channel
                                introduction for the official channel)
    manifest.scm                pinned build profile (bash, coreutils,
                                git, gcc-toolchain, cmake, make,
                                pkg-config, openssl, perl, python,
                                rust, rust:cargo, gzip, tar, findutils,
                                diffutils, gawk, nss-certs)
    guix-mk-distsrc             create a deterministic source archive
                                (cargo vendor --locked + deterministic
                                tar flags) inside guix shell --container
    guix-build                  build cuprated inside a hermetic
                                guix shell --container --pure profile;
                                captures guix-describe.json on the
                                outer host (the container has no `guix`)
    guix-checksums              aggregate SHA256SUMS over output tarballs
                                only (ldd output is intentionally
                                excluded as host-loader-variable)
    guix-verify                 integrity check vs sidecar .SHA256SUM
                                (documented narrowly: integrity, not
                                authenticity)
    guix-attest                 GPG-sign a canonical JSON attestation
                                (builder_id, distsrc sha, artifact sha,
                                guix channel commit, rustc/cargo
                                versions); fail-closed when gpg missing
    libexec/build.sh            inner build driver:
                                  - per-run mktemp build root
                                  - CARGO_NET_OFFLINE set before any
                                    cargo invocation
                                  - distsrc content-equivalence check
                                    (diff -rq vs git archive of the
                                    claimed git_commit)
                                  - CXXFLAGS workaround for Guix
                                    gcc-15.2 libstdc++'s undefined
                                    _GLIBCXX_HAVE_FENV_H
    libexec/package.sh          deterministic tar.gz packaging of
                                cuprated + license + service file
    smoke-reproducible.sh       self-check that builds twice and
                                compares ALL determinism-sensitive
                                outputs (distsrc, artifact, metadata,
                                rustc/cargo versions, guix-describe);
                                fails if any build log contains
                                -march=native / -mcpu=native /
                                target-cpu=native; preserves work dirs
                                on failure for debugging
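
The distsrc content-equivalence check listed under libexec/build.sh could look roughly like the sketch below. It is not the PR's implementation: `trees_equivalent` and `export_commit` are hypothetical names, and the excluded paths are the ones this PR's documentation notes name as added by mk-distsrc. It assumes GNU diffutils.

```bash
# Compare an extracted distsrc tree with a reference tree, ignoring the
# paths that the archive step adds on top of the git contents.
# NOTE: --exclude matches basenames at any depth, which is looser than
# excluding only the top-level entries; a real check may want to be stricter.
trees_equivalent() {  # usage: trees_equivalent <distsrc-dir> <reference-dir>
  diff -rq \
    --exclude=vendor --exclude=.cargo --exclude=.cuprate-distsrc.json \
    "$1" "$2"
}

# Materialize the reference tree for a commit (requires a git checkout).
export_commit() {  # usage: export_commit <repo> <commit> <dest-dir>
  mkdir -p "$3" && git -C "$1" archive --format=tar "$2" | tar -x -C "$3"
}
```

A build driver would call `export_commit` for the commit the distsrc claims, then refuse to compile unless `trees_equivalent` succeeds.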

Determinism inputs are pinned at every layer:

  - Guix instance      pinned by commit in channels.scm
  - Build profile      pinned via Guix commit + manifest.scm
  - Source tree        pinned by git commit, verified at build time
                        via diff -rq against git archive
  - Rust deps          Cargo.lock + cargo vendor --locked --versioned-dirs
  - Build flags        --remap-path-prefix, -ffile-prefix-map,
                        codegen-units=1
  - Time               SOURCE_DATE_EPOCH = git commit time
  - Tar/gzip metadata  sorted names, fixed mtime/uid/gid/mode,
                        --pax-option strips atime/ctime, gzip -n
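
As an illustration of the tar/gzip row, the sketch below packs a directory with the kind of flags described. The exact flags used by the PR's scripts are not shown here, so treat these as an assumption: the function name is hypothetical, the flag set follows the reproducibility recipe in the GNU tar manual, and it needs GNU tar 1.28 or newer.

```bash
# Pack a directory so the bytes depend only on file names, contents, and
# normalized modes, not on timestamps, owners, or directory read order.
make_deterministic_tarball() {  # usage: <src-dir> <out.tar.gz> <epoch-seconds>
  tar --sort=name \
      --mtime="@$3" \
      --owner=0 --group=0 --numeric-owner \
      --mode='go+u,go-w' \
      --format=posix \
      --pax-option='exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime' \
      -C "$(dirname "$1")" -cf - "$(basename "$1")" \
    | gzip -n > "$2"   # -n keeps the original name and timestamp out of the gzip header
}
```

The third argument would typically be SOURCE_DATE_EPOCH. Because the function ends in a pipeline, callers running under bash may want `set -o pipefail` so a tar failure is not masked by gzip.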

Runs ./contrib/guix/smoke-reproducible.sh on every PR that touches
the pipeline scripts or workspace Cargo metadata, and on demand via
workflow_dispatch.

Trust-anchor pinning:

  - actions/checkout pinned by full commit SHA
    (de0fac2e... = v6.0.2), not by mutable tag
  - Guix binary tarball verified by BOTH:
      * pinned SHA256
        (aa41025489c5061543e9c48873eaa829b900b2da75d40f9648913622f5f47817)
      * pinned signer fingerprint
        (A28BF40C3E551372662D14F741AAE7DCCA3D8351 - Efraim Flashner,
         Guix release signer, expires 2029-01-18)
    Both checks run BEFORE the tarball is extracted; a single
    compromised anchor cannot bootstrap a malicious daemon.
  - No third-party actions other than actions/checkout. Disk
    cleanup, Guix install, and verification are all inline so the
    workflow file IS the full supply-chain spec.
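
The two-anchor check can be sketched as follows. This is not the workflow's inline code. All three function names are hypothetical, and `verify_signer` assumes the release key has already been imported into the given GPG homedir.

```bash
# Check a file against a pinned SHA256 digest (lowercase hex).
verify_sha256() {  # usage: verify_sha256 <file> <expected-hex>
  actual="$(sha256sum < "$1")" || return 2
  [ "${actual%% *}" = "$2" ]
}

# Succeed if gpg status output on stdin has a VALIDSIG line that names the
# fingerprint as one of its fields (signing key or primary key).
status_names_signer() {  # usage: status_names_signer <fingerprint>
  grep -Eq "^\[GNUPG:] VALIDSIG ([^ ]+ )*$1( |\$)"
}

# Verify a detached signature and require the pinned signer.
verify_signer() {  # usage: verify_signer <homedir> <sig-file> <data-file> <fingerprint>
  gpg --homedir "$1" --batch --status-fd 1 --verify "$2" "$3" 2>/dev/null \
    | status_names_signer "$4"
}
```

Running both `verify_sha256` and `verify_signer` before extraction, and aborting if either fails, gives the "both anchors must hold" behavior described above.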

The job also runs a top-level mechanical grep for native-arch flags
(-march=native / -mcpu=native / target-cpu=native) across every
build log, in addition to the same check inside the smoke script,
so a regression surfaces directly in the GH job summary.
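
A minimal sketch of such a log scan is shown below. The function name is hypothetical and the real workflow step may differ. It treats a grep error, such as an unreadable log, as a failure rather than a pass.

```bash
# Fail if any given log mentions a host-CPU-specific codegen flag.
assert_no_native_flags() {  # usage: assert_no_native_flags <log-file>...
  [ "$#" -gt 0 ] || { echo 'error: no log files given' >&2; return 2; }
  rc=0
  # -e keeps grep from parsing the leading '-' of the pattern as an option.
  grep -HnE -e '-march=native|-mcpu=native|target-cpu=native' -- "$@" || rc=$?
  case "$rc" in
    0) echo 'error: native-arch flag found in build log' >&2; return 1 ;;
    1) return 0 ;;   # grep found nothing: the logs are clean
    *) echo 'error: could not scan build logs' >&2; return 2 ;;
  esac
}
```

Distinguishing grep's "no match" (exit 1) from "error" (exit 2) matters here, because a guard that passes when the logs are missing would hide exactly the degraded case it exists to catch.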

The path filter is intentionally narrow because each run takes roughly
25-35 minutes. For PRs that change crate source without touching the
pipeline, trigger the smoke job manually via workflow_dispatch.

Operator-facing documentation for the Guix reproducible build flow.

Notable sections:

  Scope
    x86_64-unknown-linux-gnu only; aarch64 / macOS on the roadmap.

  RandomX
    Documents the years-old `cmake::Config::define("DARCH", "native")`
    typo in Cuprate/randomx-rs's build.rs. cmake reads ARCH (not
    DARCH), so the line is a silent no-op and the actual build uses
    CMake's ARCH=default - which produces compiler-capability-gated
    -maes/-mssse3/-mavx2 (all host-CPU-independent under a pinned
    toolchain). Filing a typo fix upstream would unblock the
    `RANDOMX_ARCH=native` env-var path for miners; until then, the
    smoke job's -march=native grep is the regression guard.

  Distsrc content equivalence
    Explains the libexec/build.sh check that compares the extracted
    distsrc source tree against git archive of the claimed commit,
    excluding only mk-distsrc-added paths (vendor/, .cargo/,
    .cuprate-distsrc.json). A tampered distsrc that lies about its
    git_commit fails this check before the build starts.

  Threat model
    Spells out the trust roots (Guix substitute keys, Guix binary
    tarball, channel commit+introduction, release signing key, git
    tree integrity) and what the pipeline protects against vs not.
    Substitute trust is described in Guix-precise terms (signed by
    authorized substitute keys), not the looser "content-addressed
    and signed".

  Known workarounds
    Documents the CXXFLAGS _GLIBCXX_HAVE_FENV_H workaround for
    Guix's gcc-15.2 libstdc++, with a clear `GUIX_SKIP_FENV_WORKAROUND=1`
    knob to test whether it's still needed.

Apply review feedback from a second round of GPT-5.5-pro oracle review:

- guix-attest, guix-checksums: reject `.`/`..` and non-basename path
  components for every metadata value used as a filename or directory
  name (version, rust_target, distsrc, sanitized identity). The existing
  `^[A-Za-z0-9._+-]+$` regex was correct against slashes but permitted
  `.` and `..` segments. Also pass `--` to `sha256sum` as belt-and-braces
  against filenames starting with `-`.
- workflow: rewrite the `cat "$key" | sudo …` comment to accurately
  describe why the pipe form is preferred (SC2024 cleanliness; the cat
  itself runs unprivileged) instead of overclaiming robustness against
  non-world-readable keys.
- smoke-reproducible.sh: turn the silent `cp`/`chmod` swallows in the
  EXIT trap into explicit warnings so a degraded workflow-level log scan
  is visible; export `build.log` and `mk-distsrc.log` per run alongside
  the cargo-verbose log. When `guix-build` exits 0 but no artifact
  appears, dump `ls -la $out_dir`, the last 100 lines of `build.log`,
  and the last 50 lines of `mk-distsrc.log` so the failure mode is no
  longer mute.
- workflow: add an `actions/upload-artifact@v5.0.0` step (pinned by full
  SHA) that uploads `contrib/guix/smoke-logs/` on failure, so the next
  recurrence of the silent-no-artifact failure has the full forensic
  set instead of just the timestamped tail.

The first GHA run after the round-5 forensics commit finally surfaced
the real reason GHA smoke kept silently producing no artifact:

  error: failed to sync
  Caused by:
    failed to unpack `windows-0.62.2/...`
  Caused by:
    No space left on device (os error 28)

The host /dev/root had 112 GB free after the Reclaim runner disk
step, so this is not a host-disk problem. It's the container's /tmp:
`guix shell --container` mounts a private tmpfs over /tmp, sized at
the kernel's tmpfs default (~50% of RAM ~= 8 GB on a 16 GB runner).
cuprate's full cargo-vendor tree exceeds that - `windows-0.62.2`
alone unpacks to a couple of GB.

Fix: redirect TMPDIR inside the container to a bind-mounted
`$repo_root/contrib/guix/.work/` directory. Bind-mounted host paths
are NOT shadowed by the container's tmpfs, so mktemp/cargo-vendor
target the host volume.
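
One way to express that redirect is sketched below. It is not the PR's actual invocation: the function name and the `GUIX_BIN` override are hypothetical, and it relies on `env` being present in the container profile. The `--container`, `--pure`, `--manifest`, and `--share` options are standard `guix shell` options.

```bash
# Run a command in a Guix container with TMPDIR pointed at a host-backed
# directory instead of the container's private tmpfs.
# <work-dir> should be an absolute path so it means the same thing inside
# and outside the container.
run_with_host_tmp() {  # usage: run_with_host_tmp <work-dir> <manifest.scm> <command> [args...]
  work_dir="$1"; manifest="$2"; shift 2
  mkdir -p "$work_dir" || return 1
  "${GUIX_BIN:-guix}" shell --container --pure \
    --manifest="$manifest" \
    --share="$work_dir" \
    -- env TMPDIR="$work_dir" "$@"
}
```

Setting `GUIX_BIN=echo` prints the command line instead of running it, which is a cheap way to inspect what would be passed to guix.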

While here, also fix the secondary bug that made this so confusing
to diagnose: bash's `set -e` is silently disabled inside `$(...)`
command substitutions unless `shopt -s inherit_errexit` is set. The
smoke script was assigning `distsrc="$(./guix-mk-distsrc ...)"`,
mk-distsrc was failing, but the empty distsrc fell through to
`guix-build --distsrc ""` which printed a misleading "missing
required --distsrc" instead of surfacing the original disk-full
error. With inherit_errexit, the substitution exits non-zero and
set -e fires at the assignment.
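
The effect of `inherit_errexit` can be seen in isolation with the small demonstration below. It is a constructed example, not code from the smoke script. Note that a bare `var="$(cmd)"` assignment already propagates the exit status of the last command in the substitution; the option matters when a failure happens earlier inside the substitution and a later command succeeds.

```bash
# A substitution whose first command fails but whose last command succeeds.
demo='f() { false; echo "reached-end"; }; out="$(f)"; echo "out=[$out]"'

# Default bash: errexit is cleared inside $(...), so `false` is ignored,
# f runs to completion, and the assignment reports success.
plain="$(bash -c "set -e; $demo" || echo "aborted")"

# With inherit_errexit (bash >= 4.4): f stops at `false`, the substitution
# exits non-zero, and the outer `set -e` aborts at the assignment.
strict="$(bash -c "set -e; shopt -s inherit_errexit; $demo" || echo "aborted")"

printf 'plain:  %s\nstrict: %s\n' "$plain" "$strict"
```

On bash 4.4 or newer, the first line should read `out=[reached-end]` and the second `aborted`.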

Also gitignore .work/ and the existing out/ + smoke-logs/ paths so
local runs don't dirty the working tree.

Cluster smoke is unaffected (32 GB RAM, plenty of container tmpfs).
GHA smoke should now actually finish.

github-actions bot added the A-workspace (Area: Changes to a root workspace file or general repo file), A-docs (Area: Related to documentation), and A-ci (Area: Related to CI) labels on May 17, 2026.

kim0 commented May 20, 2026

Howdy @hinto-janai, I was looking for a fun weekend project, so I worked on this PR. Would appreciate a review. Thx!


Development

Successfully merging this pull request may close these issues.

Bootstrappable builds
