Add Guix-based reproducible build pipeline under contrib/guix#2
Closed
kim0 wants to merge 5 commits into
Adds a Guix-first build pipeline that produces byte-identical
cuprated-<version>-x86_64-unknown-linux-gnu.tar.gz artifacts from this
repository.
Pipeline layout:

  contrib/guix/
    channels.scm           pinned Guix instance (commit + channel
                           introduction for the official channel)
    manifest.scm           pinned build profile (bash, coreutils,
                           git, gcc-toolchain, cmake, make,
                           pkg-config, openssl, perl, python,
                           rust, rust:cargo, gzip, tar, findutils,
                           diffutils, gawk, nss-certs)
    guix-mk-distsrc        create a deterministic source archive
                           (cargo vendor --locked + deterministic
                           tar flags) inside guix shell --container
    guix-build             build cuprated inside a hermetic
                           guix shell --container --pure profile;
                           captures guix-describe.json on the
                           outer host (the container has no `guix`)
    guix-checksums         aggregate SHA256SUMS over output tarballs
                           only (ldd output is intentionally
                           excluded as host-loader-variable)
    guix-verify            integrity check vs sidecar .SHA256SUM
                           (documented narrowly: integrity, not
                           authenticity)
    guix-attest            GPG-sign a canonical JSON attestation
                           (builder_id, distsrc sha, artifact sha,
                           guix channel commit, rustc/cargo
                           versions); fail-closed when gpg missing
    libexec/build.sh       inner build driver:
                           - per-run mktemp build root
                           - CARGO_NET_OFFLINE set before any
                             cargo invocation
                           - distsrc content-equivalence check
                             (diff -rq vs git archive of the
                             claimed git_commit)
                           - CXXFLAGS workaround for Guix
                             gcc-15.2 libstdc++'s undefined
                             _GLIBCXX_HAVE_FENV_H
    libexec/package.sh     deterministic tar.gz packaging of
                           cuprated + license + service file
    smoke-reproducible.sh  self-check that builds twice and
                           compares ALL determinism-sensitive
                           outputs (distsrc, artifact, metadata,
                           rustc/cargo versions, guix-describe);
                           fails if any build log contains
                           -march=native / -mcpu=native /
                           target-cpu=native; preserves work dirs
                           on failure for debugging
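The "build twice and compare" idea behind smoke-reproducible.sh can be sketched as below. This is a minimal illustration, not the script in this PR: `hash_tree` and `compare_runs` are hypothetical helper names, and the sketch assumes GNU findutils and coreutils and that each run writes its outputs into its own directory.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: hash every file under a run directory in a stable
# order, then compare the two listings as plain strings.

hash_tree() {
  # $1: directory holding one run's outputs
  # Sorting NUL-separated names under LC_ALL=C keeps the order host-independent.
  ( cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum -- )
}

compare_runs() {
  # $1, $2: output directories of two independent builds
  if [ "$(hash_tree "$1")" = "$(hash_tree "$2")" ]; then
    echo identical
  else
    echo DIFFERENT
    return 1
  fi
}
```

Calling `compare_runs run-a/out run-b/out` returns non-zero as soon as any file name or content differs between the two runs.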
Determinism inputs are pinned at every layer:

  - Guix instance      pinned by commit in channels.scm
  - Build profile      pinned via Guix commit + manifest.scm
  - Source tree        pinned by git commit, verified at build time
                       via diff -rq against git archive
  - Rust deps          Cargo.lock + cargo vendor --locked --versioned-dirs
  - Build flags        --remap-path-prefix, -ffile-prefix-map,
                       codegen-units=1
  - Time               SOURCE_DATE_EPOCH = git commit time
  - Tar/gzip metadata  sorted names, fixed mtime/uid/gid/mode,
                       --pax-option strips atime/ctime, gzip -n
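The tar/gzip metadata pinning listed above can be illustrated with GNU tar roughly as follows. This is a sketch under stated assumptions: `det_targz` is a hypothetical name, the exact flag set used by guix-mk-distsrc and libexec/package.sh may differ, and GNU tar 1.28 or newer is assumed for `--sort=name`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a tar.gz of one directory entry to stdout with
# all host-variable metadata normalized.

det_targz() {
  # $1: parent directory, $2: entry name inside it,
  # $3: timestamp in seconds, e.g. from `git log -1 --format=%ct`
  LC_ALL=C tar --sort=name --format=posix \
    --pax-option='exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime' \
    --mtime="@$3" --owner=0 --group=0 --numeric-owner \
    --mode='go=rX,u+rw,a-s' \
    -C "$1" -cf - "$2" | gzip -n   # -n: no name or timestamp in the gzip header
}
```

Two trees with the same contents but different on-disk mtimes should then hash identically, for example `det_targz /tmp/a pkg "$SOURCE_DATE_EPOCH" | sha256sum`. In a real script, `set -o pipefail` would be needed so that a tar failure is not masked by gzip.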
added 2 commits
May 17, 2026 23:48
Runs ./contrib/guix/smoke-reproducible.sh on every PR that touches
the pipeline scripts or workspace Cargo metadata, and on demand via
workflow_dispatch.
Trust-anchor pinning:

- actions/checkout pinned by full commit SHA
  (de0fac2e... = v6.0.2), not by mutable tag
- Guix binary tarball verified by BOTH:
  * pinned SHA256
    (aa41025489c5061543e9c48873eaa829b900b2da75d40f9648913622f5f47817)
  * pinned signer fingerprint
    (A28BF40C3E551372662D14F741AAE7DCCA3D8351 - Efraim Flashner,
    Guix release signer, expires 2029-01-18)
  Both checks run BEFORE the tarball is extracted; a single
  compromised anchor cannot bootstrap a malicious daemon.
- No third-party actions other than actions/checkout. Disk
  cleanup, Guix install, and verification are all inline so the
  workflow file IS the full supply-chain spec.
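The two-anchor check on the Guix binary tarball can be sketched as two small predicates that both have to pass before extraction. This is an illustration rather than the workflow's actual step: `verify_sha256` and `status_has_validsig` are hypothetical names, and the sketch assumes the primary-key fingerprint is the last field of GnuPG's `VALIDSIG` status line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a "hash AND signer" gate.

verify_sha256() {
  # $1: file, $2: expected lowercase hex digest
  local actual
  actual="$(sha256sum -- "$1" | cut -d' ' -f1)"
  [ "$actual" = "$2" ]
}

status_has_validsig() {
  # $1: expected primary-key fingerprint.
  # Reads `gpg --status-fd 1 --verify SIG FILE` output on stdin and succeeds
  # only if a VALIDSIG line names that fingerprint in its last field.
  awk -v want="$1" '
    $1 == "[GNUPG:]" && $2 == "VALIDSIG" && $NF == want { ok = 1 }
    END { exit ok ? 0 : 1 }'
}

# Intended use (file names are placeholders):
#   verify_sha256 guix-binary.tar.xz "$PINNED_SHA256" &&
#   gpg --status-fd 1 --verify guix-binary.tar.xz.sig guix-binary.tar.xz 2>/dev/null |
#     status_has_validsig "$PINNED_FPR" &&
#   tar -xf guix-binary.tar.xz
```

Chaining with `&&` keeps the extraction unreachable unless both anchors agree.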
The job also runs a top-level mechanical grep for native-arch flags
(-march=native / -mcpu=native / target-cpu=native) across every
build log, in addition to the same check inside the smoke script,
so a regression surfaces directly in the GH job summary.
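A mechanical grep of this kind might look like the sketch below. `scan_native_flags` is a hypothetical name and the log directory layout is an assumption; the three flag spellings are the ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if any build log mentions a host-CPU-specific flag.

scan_native_flags() {
  # $1: directory containing build logs
  local hits
  # -e is required because the pattern contains alternatives starting with '-'.
  hits="$(grep -rlE -e '(-march|-mcpu|target-cpu)=native' -- "$1" || true)"
  if [ -n "$hits" ]; then
    printf 'native-arch flag found in:\n%s\n' "$hits" >&2
    return 1
  fi
}
```

Running `scan_native_flags contrib/guix/smoke-logs` prints the offending files to stderr and returns non-zero, which is enough to fail a CI step.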
Path filter is intentionally narrow (~25-35 min per run). For PRs
that change crate source without touching the pipeline, trigger
the smoke manually via workflow_dispatch.
Operator-facing documentation for the Guix reproducible build flow.

Notable sections:

  Scope
    x86_64-unknown-linux-gnu only; aarch64 / macOS on the roadmap.

  RandomX
    Documents the years-old `cmake::Config::define("DARCH", "native")`
    typo in Cuprate/randomx-rs's build.rs. cmake reads ARCH (not
    DARCH), so the line is a silent no-op and the actual build uses
    CMake's ARCH=default - which produces compiler-capability-gated
    -maes/-mssse3/-mavx2 (all host-CPU-independent under a pinned
    toolchain). Filing a typo fix upstream would unblock the
    `RANDOMX_ARCH=native` env-var path for miners; until then, the
    smoke job's -march=native grep is the regression guard.

  Distsrc content equivalence
    Explains the libexec/build.sh check that compares the extracted
    distsrc source tree against git archive of the claimed commit,
    excluding only mk-distsrc-added paths (vendor/, .cargo/,
    .cuprate-distsrc.json). A tampered distsrc that lies about its
    git_commit fails this check before the build starts.

  Threat model
    Spells out the trust roots (Guix substitute keys, Guix binary
    tarball, channel commit+introduction, release signing key, git
    tree integrity) and what the pipeline protects against vs not.
    Substitute trust is described in Guix-precise terms (signed by
    authorized substitute keys), not the looser "content-addressed
    and signed".

  Known workarounds
    Documents the CXXFLAGS _GLIBCXX_HAVE_FENV_H workaround for
    Guix's gcc-15.2 libstdc++, with a clear `GUIX_SKIP_FENV_WORKAROUND=1`
    knob to test whether it's still needed.
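The distsrc content-equivalence check described in the sections above can be approximated with GNU diff as follows. This is a sketch, not the code in libexec/build.sh: `check_distsrc_equiv` is a hypothetical name, and both trees are assumed to be already extracted to disk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare an extracted distsrc tree against a tree
# extracted from `git archive <claimed commit>`, ignoring only the paths
# that mk-distsrc adds.

check_distsrc_equiv() {
  # $1: extracted distsrc tree, $2: reference tree from git archive
  diff -rq \
    --exclude=vendor --exclude=.cargo --exclude=.cuprate-distsrc.json \
    -- "$1" "$2"
}
```

One caveat of this simple form: `--exclude` matches base names at any depth, so a nested directory that happens to be called `vendor` would also be skipped. A stricter check would compare only top-level names against the exclusion list.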
Apply review feedback from a second round of GPT-5.5-pro oracle review:

- guix-attest, guix-checksums: reject `.`/`..` and non-basename path
  components for every metadata value used as a filename or directory
  name (version, rust_target, distsrc, sanitized identity). The
  existing `^[A-Za-z0-9._+-]+$` regex was correct against slashes but
  permitted `.` and `..` segments. Also pass `--` to `sha256sum` as
  belt-and-braces against filenames starting with `-`.
- workflow: rewrite the `cat "$key" | sudo …` comment to accurately
  describe why the pipe form is preferred (SC2024 cleanliness; the cat
  itself runs unprivileged) instead of overclaiming robustness against
  non-world-readable keys.
- smoke-reproducible.sh: turn the silent `cp`/`chmod` swallows in the
  EXIT trap into explicit warnings so a degraded workflow-level log
  scan is visible; export `build.log` and `mk-distsrc.log` per run
  alongside the cargo-verbose log. When `guix-build` exits 0 but no
  artifact appears, dump `ls -la $out_dir`, the last 100 lines of
  `build.log`, and the last 50 lines of `mk-distsrc.log` so the
  failure mode is no longer mute.
- workflow: add an `actions/upload-artifact@v5.0.0` step (pinned by
  full SHA) that uploads `contrib/guix/smoke-logs/` on failure, so the
  next recurrence of the silent-no-artifact failure has the full
  forensic set instead of just the timestamped tail.
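The path-component rule from the first item can be sketched with a portable `case` statement. `is_safe_component` is a hypothetical name; the character class mirrors the `^[A-Za-z0-9._+-]+$` regex quoted above, with the extra `.` and `..` rejections added.

```shell
#!/usr/bin/env bash
# Hypothetical helper: accept a metadata value only if it is usable as a
# single file or directory name.

is_safe_component() {
  case "$1" in
    ''|.|..)                 return 1 ;;  # empty, current dir, parent dir
    *[!A-Za-z0-9._+-]*)      return 1 ;;  # any character outside the allow-list
    *)                       return 0 ;;
  esac
}
```

A value such as `-rf` still passes this filter, which is why passing `--` before file operands (as the commit does for `sha256sum`) remains a useful second guard.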
The first GHA run after the round-5 forensics commit finally surfaced
the real reason GHA smoke kept silently producing no artifact:
    error: failed to sync

    Caused by:
      failed to unpack `windows-0.62.2/...`

    Caused by:
      No space left on device (os error 28)
The host /dev/root had 112 GB free after the Reclaim runner disk
step, so this is not a host-disk problem. It's the container's /tmp:
`guix shell --container` mounts a private tmpfs over /tmp, sized at
the kernel's tmpfs default (~50% of RAM ~= 8 GB on a 16 GB runner).
cuprate's full cargo-vendor tree exceeds that - `windows-0.62.2`
alone unpacks to a couple of GB.
Fix: redirect TMPDIR inside the container to a bind-mounted
`$repo_root/contrib/guix/.work/` directory. Bind-mounted host paths
are NOT shadowed by the container's tmpfs, so mktemp/cargo-vendor
target the host volume.
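The redirect can be sketched as below. `use_host_tmpdir` is a hypothetical helper, and the `guix shell` line in the comment is an assumption about how the directory and variable would be passed into the container, not a quote from guix-build.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point TMPDIR at a directory on the host volume so
# that mktemp (and tools that honour TMPDIR) avoid the container's tmpfs.

use_host_tmpdir() {
  # $1: work directory on the bind-mounted host volume
  mkdir -p -- "$1"
  TMPDIR="$1"
  export TMPDIR
}

# Assumed container invocation (flags exist in `guix shell`; the exact
# command line used by this PR is not shown in the commit message):
#   guix shell --container --pure --share="$work" --preserve='^TMPDIR$' \
#     -m manifest.scm -- ./libexec/build.sh
```

After `use_host_tmpdir "$repo_root/contrib/guix/.work"`, a bare `mktemp -d` creates its directory under `.work/` rather than under `/tmp`.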
While here, also fix the secondary bug that made this so confusing
to diagnose: bash's `set -e` is silently disabled inside `$(...)`
command substitutions unless `shopt -s inherit_errexit` is set. The
smoke script was assigning `distsrc="$(./guix-mk-distsrc ...)"`,
mk-distsrc was failing, but the empty distsrc fell through to
`guix-build --distsrc ""` which printed a misleading "missing
required --distsrc" instead of surfacing the original disk-full
error. With inherit_errexit, the substitution exits non-zero and
set -e fires at the assignment.
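The effect of `inherit_errexit` can be reproduced in isolation with the sketch below, assuming bash 4.4 or newer. `demo_without` and `demo_with` are hypothetical names, and the failing command inside the substitution is a stand-in for whatever fails inside the real script.

```shell
#!/usr/bin/env bash
# Each demo runs a fresh bash so the option state is explicit.

demo_without() {
  # Default behaviour: errexit is cleared inside $(...), so the failing
  # `false` is ignored and the substitution carries on to `echo`.
  bash -c 'set -e
           out="$(false; echo fallthrough)"
           echo "status=$? out=$out"'
}

demo_with() {
  # With inherit_errexit the substitution stops at `false`, the assignment
  # returns non-zero, and the outer set -e aborts the script.
  bash -c 'set -e
           shopt -s inherit_errexit
           out="$(false; echo fallthrough)"
           echo "unreachable"' || echo "aborted with status $?"
}
```

`demo_without` prints `status=0 out=fallthrough`, while `demo_with` prints `aborted with status 1`. Note that a bare `var="$(cmd)"` already returns the status of the last command in the substitution; the option matters when an earlier command inside the substitution fails.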
Also gitignore .work/ and the existing out/ + smoke-logs/ paths so
local runs don't dirty the working tree.
Cluster smoke is unaffected (32 GB RAM, plenty of container tmpfs).
GHA smoke should now actually finish.
Replaced by upstream PR
Add a Guix-based reproducible build pipeline for cuprated
Provides a deterministic, hermetic build flow for `cuprated` on Linux:
pinned Guix-time-machine toolchain, byte-stable source archive, sealed
`guix shell --container --pure` build, and a CI smoke job that runs the
whole flow twice on every PR and asserts byte-identical artifacts. A
signed JSON manifest binds the artifact to its source.

All changes confined to `contrib/guix/` and one new
`.github/workflows/` file. No workspace `Cargo.toml` / `Cargo.lock`
changes, no `randomx-rs` changes (see RandomX below).

Reproducibility evidence
Built the PR's merge commit
(`dec0bbee5c18f2dc6baae9e9ca7bb0c4d041931d`) on two completely
different hosts and got bit-identical artifacts:

`cuprated-0.0.9-x86_64-unknown-linux-gnu.tar.gz` sha256:
- `ubuntu-24.04` (Azure VM, 16 GB / 4 vCPU): `416cef3b90aa63d50e80dfc52a16dbd4392e76d8e59bb86d733cf6251c0f12c3`
- second host: `416cef3b90aa63d50e80dfc52a16dbd4392e76d8e59bb86d733cf6251c0f12c3`

Same merge commit, completely different hosts, identical down to the
byte. Every potential drift point (kernel, glibc, libstdc++, hostname,
tmpdir paths, build parallelism, locale) is neutralized.
Earlier rounds of verification on an older HEAD also produced
bit-identical artifacts across two different x86_64
microarchitectures — Intel Xeon Gold (Sapphire Rapids) and AMD EPYC
9454P (Zen 4) — confirming the build is host-CPU-vendor independent.
The smoke job re-verifies in-pipeline determinism on every CI run (two
independent builds in separate work trees, with byte-identical
artifact + distsrc + build-metadata + rustc/cargo versions +
guix-describe all compared).
Guarantees of the build flow
- build-relevant package. Reproduced via `guix time-machine`; nothing
  depends on the host's compilers, libc, or system tools.
- `git archive` + `cargo vendor --locked` packaged with sorted-name,
  fixed-mtime, fixed-uid/gid tar flags. Byte-stable for a given commit.
- `guix shell --container --pure` with `SOURCE_DATE_EPOCH`,
  `--remap-path-prefix`, `-ffile-prefix-map`, `-C codegen-units=1`,
  `OPENSSL_NO_VENDOR=1`. A distsrc content-equivalence check
  (`diff -rq` against `git archive` of the claimed commit) catches a
  tampered source archive before any compilation. The embedded git
  commit is bound to the distsrc, not to whatever outer checkout
  happens to host the build.
- build-metadata fields; `.`/`..` and non-basename segments are
  explicitly rejected before any path is composed.
- artifact sha256 ↔ distsrc sha256 ↔ builder identity ↔ Guix channel
  commit ↔ rustc/cargo versions. Detached GPG signature; fails closed
  if `gpg` is missing rather than writing an unsigned attestation.
- by both SHA256 and the Guix release signer's GPG fingerprint
  (`A28BF40C…3D8351`, Efraim Flashner). Both checks run before any
  extraction. Keyserver fetch is tried across three well-known servers,
  each in its own scratch GPG homedir, with `VALIDSIG` asserted inside
  each iteration before the homedir is committed.
- `-march=native`, `-mcpu=native`, and `target-cpu=native`. A future
  cc-rs / rustc / CMake change quietly re-introducing host-CPU codegen
  surfaces directly in the CI job summary.
RandomX
`Cuprate/randomx-rs@567bdca`'s `build.rs` calls
`cmake::Config::new(…).define("DARCH", "native")`. Upstream
`tevador/RandomX`'s `CMakeLists.txt` has zero references to `DARCH`;
it reads `ARCH`, defaulting to `"default"` when unset. The
`.define("DARCH", …)` line is a years-old silent typo; the ARCH that
actually takes effect is the CMake default `"default"`, which produces
a host-CPU-independent build via compiler-capability-gated
`-maes -mssse3 -mavx2`.

This means no `randomx-rs` patches are required. The PR depends on
the upstream `Cuprate/randomx-rs` rev unchanged. The smoke job's
native-flag grep is the regression guard against a future "honest" fix
of the `DARCH` → `ARCH` typo silently re-introducing host-CPU codegen.

What this PR does NOT do

- `randomx-rs`. Upstream rev unchanged.
- `Cargo.toml` or `Cargo.lock`. Zero workspace dependency drift.
- advisory; release flow remains maintainer-driven.
Threat model
`contrib/guix/README.md` enumerates the trust roots (Guix substitute
keys, Guix binary bootstrap tarball, channel commit + introduction,
release signing key, git tree integrity) and is explicit about what
reproducibility does and does not protect against. Notably, it does
not detect a compromised toolchain that produces deterministic but
backdoored output; that problem is deliberately deferred to Guix's own
attestation chain.