ci: migrate main CI to Buildkite #453
Replaces .github/workflows/ci.yml with a Buildkite pipeline that fans the build/bats/cargo+lint/msrv jobs across linux+macos agents using matrix expansion. Service bring-up (vault, vaultwarden, infisical, bitwarden, gnome-keyring) moves into reusable .buildkite/scripts/ helpers so the pipeline yaml stays declarative. The release/release-plz/docs/autofix/semantic-pr-lint workflows stay on GitHub Actions since they're tightly coupled to GH features (Releases, PRs, Pages, autofix.ci, PR-title checks).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed from 9f76fb5 to d530266.
Greptile Summary
This PR migrates primary CI from GitHub Actions to Buildkite, replacing …

Confidence Score: 5/5
Safe to merge; all findings are P2 quality-of-life improvements with no blocking defects.

Only P2 issues found (flaky macOS vault readiness check, silent healthcheck loop timeouts). All previously flagged P1 concerns (cross-platform depends_on, LocalStack dropped, libudev-dev missing, unquoted env values, tranche count, hardcoded expensive filter) are resolved in the current code.

Files needing attention: .buildkite/scripts/setup-services.sh (macOS vault readiness and Linux service healthcheck timeout handling).
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[push / PR] --> BL[build-linux\nqueue: linux]
A --> BM[build-macos\nqueue: macos]
BL -->|artifact: fnox-linux| BAL0[bats-linux tranche 0]
BL -->|artifact: fnox-linux| BAL1[bats-linux tranche 1]
BL -->|artifact: fnox-linux| COL[ci-other-linux\ncargo test + lint + clippy]
BM -->|artifact: fnox-macos| BAM0[bats-macos tranche 0]
BM -->|artifact: fnox-macos| BAM1[bats-macos tranche 1]
BM -->|artifact: fnox-macos| COM[ci-other-macos\ncargo test + lint + clippy]
A --> MSRV[msrv\ncargo msrv verify\nqueue: linux]
BAL0 --- SS1[setup-services.sh\nvault · vaultwarden · infisical\nlocalstack · keychain]
BAL1 --- SS1
BAM0 --- SS2[setup-services.sh\nvault dev mode · parallel]
BAM1 --- SS2
COL --- KC[setup-keychain.sh\ngnome-keyring]
BAL0 --- BF[bats-filter.sh\nBATS_FILTER_TAGS]
BAL1 --- BF
BAM0 --- BF
BAM1 --- BF
Reviews (9): Last reviewed commit: "[autofix.ci] apply automated fixes"
Addresses review feedback on the Buildkite migration:

- Replace the build/ci-other os matrix with explicit per-OS steps (build-linux/build-macos, ci-other-linux/ci-other-macos). Buildkite `depends_on` against a matrix step waits on every permutation, which would let a macos build failure block linux tests and vice versa. Bats keeps a per-OS matrix on tranche only.
- Add `libudev-dev` apt install to build/ci-other/msrv on Linux to match the prior workflow; without it cargo build fails on fresh Ubuntu agents.
- Add LocalStack provisioning (KMS key alias `alias/fnox-testing` and a seeded Secrets Manager entry) plus AWS_* / LOCALSTACK_ENDPOINT exports. Without these the aws-kms / aws-sm / aws-ps bats tests silently skip.
- Quote append_env values with `printf '%q'` so secrets containing whitespace or shell metacharacters round-trip safely through BUILDKITE_ENV_FILE.
- Bring tranche split back to 2 (TRANCHE_COUNT=2, tranche 0/1) and add BATS_FILTER_TAGS=!expensive to match the prior workflow's default-PR test budget.
- Poll LocalStack and Vault health endpoints rather than relying on a fixed sleep, matching the prior workflow.
- shfmt: drop unnecessary quotes inside `[[ ]]` in setup-age.sh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
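A minimal sketch of the `printf '%q'` quoting described in this commit, assuming the env file is later read back with bash's `source` (the existing `export NAME=value` lines suggest a shell-sourced format). This is not the exact code from the PR, and `%q` is a bash `printf` feature rather than POSIX.

```bash
#!/usr/bin/env bash
# Sketch of append_env with printf '%q' quoting (bash only; '%q' is not POSIX).
# Each call appends `export NAME=<shell-quoted value>` to $BUILDKITE_ENV_FILE
# and also exports NAME in the current shell.
append_env() {
  local name="$1" value="$2"
  # %q escapes spaces, quotes, $, backticks and newlines, so sourcing the
  # file later reproduces the exact original value.
  printf 'export %s=%q\n' "$name" "$value" >>"$BUILDKITE_ENV_FILE"
  export "$name=$value"
}
```

A secret containing spaces, quotes, `$`, backticks or newlines comes back unchanged when the file is sourced, which the plain `echo "export $1=${2}"` form cannot guarantee.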
Code Review
This pull request migrates the CI pipeline from GitHub Actions to Buildkite, introducing a new pipeline configuration and setup scripts for services like Vaultwarden, HashiCorp Vault, and Infisical. Feedback focuses on improving script robustness by correctly quoting environment variables, ensuring dependencies like openssl are installed, avoiding Docker naming conflicts, and replacing brittle sleep commands with health checks.
append_env() {
  echo "export $1=${2}" >>"$BUILDKITE_ENV_FILE"
  export "$1=$2"
}
The append_env function does not quote the value when writing to $BUILDKITE_ENV_FILE. This can lead to malformed environment files if the value contains spaces or special characters. Additionally, Buildkite environment files typically use KEY="VALUE" format rather than the export command.
Suggested change:
 append_env() {
-  echo "export $1=${2}" >>"$BUILDKITE_ENV_FILE"
+  echo "$1=\"${2}\"" >>"$BUILDKITE_ENV_FILE"
   export "$1=$2"
 }
case "$(uname -s)" in
  Linux)
    sudo apt-get update
    sudo apt-get install -y parallel gnome-keyring libsecret-tools dbus-x11
The script uses openssl later (line 33) but it is not explicitly installed. While often present on base images, it's safer to include it in the apt-get install list to ensure the script is self-contained and robust across different agent environments.
Suggested change:
-    sudo apt-get install -y parallel gnome-keyring libsecret-tools dbus-x11
+    sudo apt-get install -y parallel gnome-keyring libsecret-tools dbus-x11 openssl
  -keyout /tmp/vaultwarden-certs/key.pem \
  -out /tmp/vaultwarden-certs/cert.pem \
  -subj "/CN=localhost" \
  -addext "subjectAltName=DNS:localhost,IP:127.0.0.1" 2>/dev/null || true
Ignoring errors from openssl with || true can lead to confusing failures in the subsequent docker run command if the certificates are not created. It is better to let the script fail here if certificate generation fails so the root cause is clear.
Suggested change:
-  -addext "subjectAltName=DNS:localhost,IP:127.0.0.1" 2>/dev/null || true
+  -addext "subjectAltName=DNS:localhost,IP:127.0.0.1" 2>/dev/null
  -subj "/CN=localhost" \
  -addext "subjectAltName=DNS:localhost,IP:127.0.0.1" 2>/dev/null || true

docker run -d --name vaultwarden \
Using a fixed container name (vaultwarden) will cause the job to fail if a previous run left a stale container or if multiple jobs share the same Docker host. Removing any existing container before starting ensures a clean state. Note that if multiple tranches run on the same host, they will also conflict on port 8080.
Suggested change:
+docker rm -f vaultwarden 2>/dev/null || true
 docker run -d --name vaultwarden \
  -v /tmp/vaultwarden-certs:/data/certs:ro \
  vaultwarden/server:latest

docker run -d --name vault --cap-add=IPC_LOCK \
Similar to the vaultwarden container, the vault container name should be cleared before starting to avoid conflicts with stale containers from previous runs.
Suggested change:
+docker rm -f vault 2>/dev/null || true
 docker run -d --name vault --cap-add=IPC_LOCK \

docker compose -f test/docker-compose.infisical-ci.yml up -d

sleep 5
DBUS_SESSION_BUS_ADDRESS=$(cat ~/.dbus-session/bus-address)
export DBUS_SESSION_BUS_ADDRESS
buildkite-agent meta-data set "DBUS_SESSION_BUS_ADDRESS" "$DBUS_SESSION_BUS_ADDRESS" || true
echo "export DBUS_SESSION_BUS_ADDRESS=$DBUS_SESSION_BUS_ADDRESS" >>"$BUILDKITE_ENV_FILE"
The environment variable value should be quoted when appended to $BUILDKITE_ENV_FILE to prevent issues with special characters in the D-Bus address, and the export prefix is unnecessary for Buildkite environment files.
Suggested change:
-echo "export DBUS_SESSION_BUS_ADDRESS=$DBUS_SESSION_BUS_ADDRESS" >>"$BUILDKITE_ENV_FILE"
+echo "DBUS_SESSION_BUS_ADDRESS=\"$DBUS_SESSION_BUS_ADDRESS\"" >>"$BUILDKITE_ENV_FILE"
- Install openssl explicitly and stop swallowing cert-generation errors so a cert failure surfaces here, not as an opaque vaultwarden startup error.
- `docker rm -f` vaultwarden/vault/localstack before each run so persistent agents don't collide with stale containers from prior jobs.
- Replace the trailing service-readiness pause with a vaultwarden health-check loop (vault and localstack already had loops).
- Quote DBUS_SESSION_BUS_ADDRESS via printf %q in setup-keychain.sh for symmetry with append_env, and drop a leftover buildkite-agent meta-data write that nothing reads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
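The health-check loop that replaces the fixed pause could look roughly like the helper below. The `wait_for` name, its argument order, and the probe in the example comment (vaultwarden's `/alive` endpoint on `https://localhost:8080`, the port the review comment mentions) are assumptions for illustration, not the PR's code. Unlike a loop that just falls through, it returns non-zero with an error message when the service never becomes ready, which addresses the "silent healthcheck loop timeouts" Greptile flagged.

```bash
#!/usr/bin/env bash
# Hypothetical readiness helper: runs a probe command until it succeeds,
# sleeping between attempts, and fails loudly instead of timing out silently.
# Usage: wait_for NAME ATTEMPTS INTERVAL_SECONDS PROBE_CMD [ARGS...]
wait_for() {
  local name="$1" attempts="$2" interval="$3"
  shift 3
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@" >/dev/null 2>&1; then
      echo "$name ready after $i attempt(s)"
      return 0
    fi
    sleep "$interval"
  done
  echo "ERROR: $name not ready after $attempts attempts" >&2
  return 1
}

# Example (URL and port are assumptions; -k accepts the self-signed cert):
# wait_for vaultwarden 30 2 curl -fsk https://localhost:8080/alive
```

Because the probe is an arbitrary command, the same helper covers the Vault and LocalStack health endpoints as well.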
- `retry.automatic.limit: 2` (3 total attempts) to match the prior `nick-fields/retry max_attempts: 3`. Buildkite's `limit` counts retries after the first run, so the previous `limit: 3` was 4 total attempts.
- setup-services.sh now sources setup-keychain.sh on Linux instead of duplicating the gnome-keyring/dbus install + DBUS export; the shared block was identical between the two scripts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The prior workflow set BATS_FILTER_TAGS to "" (run everything, including expensive integration tests) when the head branch was release-plz/*, and to "!expensive" otherwise. The migrated pipeline hardcoded "!expensive", which would have silently dropped the release-gate coverage. Move the decision into a small helper that writes BATS_FILTER_TAGS to BUILDKITE_ENV_FILE based on BUILDKITE_PULL_REQUEST + BUILDKITE_BRANCH.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
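A possible shape for that helper, assuming Buildkite's convention that `BUILDKITE_PULL_REQUEST` is `false` for non-PR builds and that `BUILDKITE_BRANCH` holds the PR's head branch otherwise. The `select_bats_filter` function name is hypothetical and the PR's actual `bats-filter.sh` may be structured differently; this sketch already includes the `export` that a later commit in this PR found missing.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of bats-filter.sh; meant to be *sourced* so the export
# reaches later commands in the same step.
# Release PRs (head branch release-plz/*) run everything, including tests
# tagged `expensive`; every other build skips them.
select_bats_filter() {
  if [[ "${BUILDKITE_PULL_REQUEST:-false}" != "false" && "${BUILDKITE_BRANCH:-}" == release-plz/* ]]; then
    BATS_FILTER_TAGS=""
  else
    # Single quotes keep `!` literal even if history expansion is enabled.
    BATS_FILTER_TAGS='!expensive'
  fi
  export BATS_FILTER_TAGS
  # Also record the choice in the env file when one is available.
  if [[ -n "${BUILDKITE_ENV_FILE:-}" ]]; then
    printf 'export BATS_FILTER_TAGS=%q\n' "$BATS_FILTER_TAGS" >>"$BUILDKITE_ENV_FILE"
  fi
}

select_bats_filter
```

Pushes to a release-plz/* branch that are not PR builds still get `!expensive`, mirroring the old workflow, where the head-branch check only applied to pull requests.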
setup-keychain.sh already installs gnome-keyring + libsecret-tools (and runs apt-get update). The inline apt-get install in ci-other-linux is reduced to just libudev-dev, which the helper doesn't cover.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Buildkite runs the items in `commands:` as one concatenated bash script, but each helper script previously ran as a subprocess, so its `export` calls and BUILDKITE_ENV_FILE writes never reached `mise run test:bats` / `cargo test`. As a result VAULT_*, AWS_*, BW_SESSION, INFISICAL_*, BATS_FILTER_TAGS, and DBUS_SESSION_BUS_ADDRESS were all lost.

Source bats-filter.sh, setup-services.sh, and setup-keychain.sh from the pipeline so their exports stay in the parent shell. setup-age.sh and redact.sh are left as subprocess invocations; they don't export anything the next command needs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
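A minimal, self-contained demonstration of the failure mode this commit fixes. The temporary helper file and the `VAULT_ADDR` value are illustrative stand-ins, not the PR's scripts.

```bash
#!/usr/bin/env bash
# An `export` in a helper only survives if the helper is sourced into the
# calling shell; running it as a child process discards the change.
helper="$(mktemp)"
# Stand-in for a setup helper; the variable name and value are illustrative.
cat >"$helper" <<'EOF'
export VAULT_ADDR="http://127.0.0.1:8200"
EOF

unset VAULT_ADDR
bash "$helper" # child process: its environment is discarded on exit
echo "after subprocess: ${VAULT_ADDR:-<unset>}"

source "$helper" # current shell: the export persists for later commands
echo "after source:     ${VAULT_ADDR:-<unset>}"

rm -f "$helper"
```

Run with bash, it prints `<unset>` for the subprocess case and the URL for the sourced case, which is why the pipeline now sources the helpers that export variables.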
The agents don't ship with mise, so `mise install` failed with
exit 127 ("command not found") on the first real run. Add a
small install-mise.sh helper that curls https://mise.run when mise is missing,
puts ~/.local/bin on PATH, and runs `mise trust --all` to silence
the first-run prompt. Source it from each step's commands so the
PATH change persists into `mise install` and the rest of the run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
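A rough sketch of such a helper, written as a function so it can be sourced. The `ensure_mise` name is hypothetical, and it assumes the https://mise.run installer places the binary in `~/.local/bin` (the directory this commit adds to PATH). The `--retry 5 --http1.1` curl flags are the ones a later commit in this PR adds for the macOS HTTP/2 framing error.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an install-mise.sh helper. Source this file and call
# ensure_mise so the PATH change persists into `mise install`.
ensure_mise() {
  # Assumption: the mise.run installer installs into ~/.local/bin.
  export PATH="$HOME/.local/bin:$PATH"
  if ! command -v mise >/dev/null 2>&1; then
    # --retry/--http1.1 ride out transient upstream and HTTP/2 framing errors.
    curl -fsSL --retry 5 --http1.1 https://mise.run | sh
    # The pipe hides curl failures, so re-check that mise actually landed.
    command -v mise >/dev/null 2>&1 || {
      echo "ERROR: mise install failed" >&2
      return 1
    }
  fi
  # Trust config files up front so the first mise run doesn't prompt.
  mise trust --all
}
```

Prepending to PATH before the `command -v` check means an agent that already ran the installer in an earlier job skips the download.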
… sets it

bats-filter.sh wrote BATS_FILTER_TAGS to BUILDKITE_ENV_FILE but never exported it. Since the agent doesn't re-read BUILDKITE_ENV_FILE between commands, the variable was unset by the time `mise run test:bats` ran. Add the missing `export` (matches the pattern in setup-services.sh's append_env); the file write stays as a fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Build #10 failed because mise hit the unauthenticated GitHub API rate limit while resolving 10+ tool versions (age, bitwarden, vault, infisical, communique, aube, cargo-msrv, cargo-edit, cargo-binstall, git-cliff). mise's own warning explicitly asks for GITHUB_TOKEN to be set. install-mise.sh now pulls GITHUB_TOKEN from the Buildkite cluster secret store (`buildkite-agent secret get GITHUB_TOKEN`) when the caller hasn't already set it. Requires a GITHUB_TOKEN secret in the endev/fnox pipeline's cluster.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The build still hit the rate limit on #11 even with the secret-fetch in place, most likely because the secret hasn't been created in the cluster yet. Print what `buildkite-agent secret get` returned (success or failure stderr) so we can tell whether the secret exists at all, versus the agent not being authorized to read it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 68f9d90.
setup-services.sh cleared stale vaultwarden / vault / localstack containers but left the docker-compose-managed Infisical stack (postgres + redis + infisical) running between builds. On persistent agents the leftover containers and ports cause the next `docker compose up -d` to fail or pick up stale state. Add `docker compose ... down --remove-orphans -v` before the up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
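The two cleanup steps this PR adds for persistent agents (`docker rm -f` on the named containers, and `docker compose down --remove-orphans -v` before `up -d`) can be sketched as one helper. The `reset_services` name is hypothetical; the container names and compose file path in the example come from the PR text.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove stale named containers, then tear down and
# recreate the docker-compose stack (including its volumes).
reset_services() {
  local compose_file="$1"
  shift
  local name
  for name in "$@"; do
    # `|| true`: "No such container" is the normal case on a clean agent.
    docker rm -f "$name" >/dev/null 2>&1 || true
  done
  # -v also removes the stack's volumes so stale database state is dropped.
  docker compose -f "$compose_file" down --remove-orphans -v
  docker compose -f "$compose_file" up -d
}

# Example with the names from the PR:
# reset_services test/docker-compose.infisical-ci.yml vaultwarden vault localstack
```

Running `down -v` before every `up` trades a slightly slower start for a guarantee that no Postgres or Redis state leaks from one build into the next.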
Build #13's macOS agent failed on:

curl: (16) Error in the HTTP2 framing layer

while fetching https://mise.run. Add --retry 5 + --http1.1 to dodge the transient HTTP/2 framing issue and recover from upstream blips without wedging the build.

(Build #13's macOS agent is also still 404'ing on the GITHUB_TOKEN cluster secret; that's an infra-side action separate from this patch.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…irely

Build #11/12/13 hit the unauthenticated GitHub API rate limit because the aqua-backend tools (age, vault, hk, shellcheck, …) only had version pins in mise.lock: mise still queried GitHub to resolve each pinned tag → asset URL, so 20+ API calls per `mise install` blew past the 60 requests/hour limit.

Run `mise lock --platform linux-x64,macos-arm64`, which fills in the direct download URL, checksum, and (where available) GitHub attestation provenance for every tool on both CI platforms. mise installs now go straight to the release-asset CDN with zero GitHub API calls.

Drop the now-unnecessary `buildkite-agent secret get GITHUB_TOKEN` diagnostic from install-mise.sh; no token is required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Summary
- Replaces `.github/workflows/ci.yml` with `.buildkite/pipeline.yml`. The pipeline fans `build → ci-bats (matrix × 2 tranches) → ci-other → msrv` across `queue: linux` and `queue: macos` agents using Buildkite matrix expansion, and uses `buildkite-agent artifact` for the build → test handoff (Buildkite has no first-class equivalent of `actions/upload-artifact`).
- Moves service bring-up into reusable helpers under `.buildkite/scripts/` so the pipeline YAML stays declarative.

Notes for reviewer
Things to set up on the Buildkite side before this lands cleanly:
- `queue=linux` and `queue=macos` agents (Linux agents need Docker; macOS agents need Homebrew).
- `AGE_SECRET` exposed to agents (env var or a secrets plugin); `setup-age.sh` skips silently when it's missing, mirroring the fork-PR behavior of the old workflow.
- … `.buildkite/pipeline.yml`.

The bats job keeps the same retry policy (3 attempts) and tranche split (`TRANCHE` 0/1 over `TRANCHE_COUNT=2`) as before; no test code changes.

Test plan
- … `.buildkite/pipeline.yml` and confirm all four jobs succeed on both linux and macos
- … `BUILDKITE_ENV_FILE`

🤖 Generated with Claude Code
Note
Medium Risk
CI is fully migrated to Buildkite with new artifact handoff and service-provisioning scripts, so misconfiguration of agents, env propagation, or Docker/service startup could break builds across platforms.
Overview
Migrates primary CI from GitHub Actions to Buildkite by deleting `.github/workflows/ci.yml` and adding a new `.buildkite/pipeline.yml` that builds per-OS, fans out `bats` tranches, runs `cargo test/lint/clippy`, and verifies MSRV, handing off the compiled `fnox` binary via Buildkite artifacts.

Adds reusable Buildkite scripts for tool bootstrap (`install-mise.sh`), selective `bats` tag filtering (`bats-filter.sh`), CI secret handling (`setup-age.sh`, `redact.sh`), and integration-service provisioning (`setup-keychain.sh`, `setup-services.sh`), including Vault/Vaultwarden/Infisical/LocalStack setup and env propagation via `BUILDKITE_ENV_FILE`.

Updates docs to reference Buildkite-based CI setup (Bitwarden/Vault testing), removes the GitHub Actions CI badge from `README.md`, and refreshes `mise.lock` with platform-pinned URLs/checksums.

Reviewed by Cursor Bugbot for commit 35f6136.