
ci: migrate main CI to Buildkite #453

Open
jdx wants to merge 15 commits into main from claude/eager-yonath-67b142


@jdx
Owner

@jdx jdx commented Apr 30, 2026

Summary

  • Replaces .github/workflows/ci.yml with .buildkite/pipeline.yml. The pipeline fans out from build into ci-bats (matrix × 2 tranches) and ci-other, with msrv running independently, across queue: linux and queue: macos agents using Buildkite matrix expansion, and uses buildkite-agent artifact for the build → test handoff (Buildkite has no first-class equivalent of actions/upload-artifact).
  • Splits the heavy service bring-up (vault, vaultwarden, infisical, bitwarden, gnome-keyring) into reusable scripts under .buildkite/scripts/ so the pipeline YAML stays declarative.
  • Leaves the release/release-plz/docs/autofix/semantic-pr-lint workflows on GitHub Actions — those are tightly coupled to GH-specific features (Releases, PRs, Pages, autofix.ci, PR-title checks) and don't have a clean Buildkite equivalent.
  • Updates the test docs (CRUSH.md, BITWARDEN_TESTING.md, VAULT_TESTING.md) to describe the new Buildkite-based service setup, and removes the now-broken GitHub Actions CI badge from the README.

Notes for reviewer

Things to set up on the Buildkite side before this lands cleanly:

  • Register agents tagged queue=linux and queue=macos (Linux agents need Docker; macOS agents need Homebrew).
  • Expose AGE_SECRET to agents (env var or a secrets plugin) — setup-age.sh skips silently when it's missing, mirroring the fork-PR behavior of the old workflow.
  • Create a Buildkite pipeline pointing at .buildkite/pipeline.yml.

The bats job keeps the prior workflow's retry policy (3 attempts) and tranche split (TRANCHE 0/1 over TRANCHE_COUNT=2), with no test code changes.
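To picture the TRANCHE / TRANCHE_COUNT split, here is a round-robin sketch. It assumes test files are dealt out by their index modulo TRANCHE_COUNT; the repo's real split lives in its `mise run test:bats` task, which this PR doesn't show, so `select_tranche` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the test files that belong to one tranche.
# Assumption: files are dealt round-robin by position, so tranche K gets
# files K, K+N, K+2N, ... where N is the tranche count.
select_tranche() {
  local tranche="$1" count="$2"
  shift 2
  local i=0 f
  for f in "$@"; do
    if ((i % count == tranche)); then
      printf '%s\n' "$f"
    fi
    i=$((i + 1))
  done
}

# Example: what one job of a two-tranche bats matrix would select.
TRANCHE="${TRANCHE:-0}"
TRANCHE_COUNT="${TRANCHE_COUNT:-2}"
select_tranche "$TRANCHE" "$TRANCHE_COUNT" test/a.bats test/b.bats test/c.bats
```

Every matrix job has to pass the files in the same order (a sorted glob, for example); otherwise two tranches could pick the same file while another file is skipped.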

Test plan

  • Configure Buildkite agents with the required queues and AGE_SECRET
  • Trigger a build of .buildkite/pipeline.yml and confirm all four jobs succeed on both linux and macos
  • Verify the bats tranches still run in parallel and that vault/vaultwarden/infisical/bitwarden tests pick up their env vars from BUILDKITE_ENV_FILE
  • Confirm the GH Actions workflows still in place (release, release-plz, docs, autofix, semantic-pr-lint) are unaffected

🤖 Generated with Claude Code


Note

Medium Risk
CI is fully migrated to Buildkite with new artifact handoff and service-provisioning scripts, so misconfiguration of agents, env propagation, or Docker/service startup could break builds across platforms.

Overview
Migrates primary CI from GitHub Actions to Buildkite by deleting .github/workflows/ci.yml and adding a new .buildkite/pipeline.yml that builds per-OS, fans out bats tranches, runs cargo test/lint/clippy, and verifies MSRV, handing off the compiled fnox binary via Buildkite artifacts.

Adds reusable Buildkite scripts for tool bootstrap (install-mise.sh), selective bats tag filtering (bats-filter.sh), CI secret handling (setup-age.sh, redact.sh), and integration-service provisioning (setup-keychain.sh, setup-services.sh) including Vault/Vaultwarden/Infisical/LocalStack setup and env propagation via BUILDKITE_ENV_FILE.

Updates docs to reference Buildkite-based CI setup (Bitwarden/Vault testing) and removes the GitHub Actions CI badge from README.md; refreshes mise.lock with platform-pinned URLs/checksums.

Reviewed by Cursor Bugbot for commit 35f6136. Bugbot is set up for automated code reviews on this repo.

Replaces .github/workflows/ci.yml with a Buildkite pipeline that fans the
build/bats/cargo+lint/msrv jobs across linux+macos agents using matrix
expansion. Service bring-up (vault, vaultwarden, infisical, bitwarden,
gnome-keyring) moves into reusable .buildkite/scripts/ helpers so the
pipeline yaml stays declarative.

The release/release-plz/docs/autofix/semantic-pr-lint workflows stay on
GitHub Actions since they're tightly coupled to GH features (Releases, PRs,
Pages, autofix.ci, PR-title checks).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jdx jdx force-pushed the claude/eager-yonath-67b142 branch from 9f76fb5 to d530266 (April 30, 2026 13:32)
@greptile-apps

greptile-apps Bot commented Apr 30, 2026

Greptile Summary

This PR migrates primary CI from GitHub Actions to Buildkite, replacing .github/workflows/ci.yml with .buildkite/pipeline.yml and a set of reusable setup scripts. The pipeline correctly splits builds per-OS (avoiding the cross-platform depends_on pitfall), restores LocalStack/AWS coverage, properly quotes env-file writes, and handles release-plz expensive-test gating via bats-filter.sh. Only P2 issues remain: macOS Vault startup uses an unconditional sleep 2 rather than a readiness poll (flaky on slow agents), and the Linux service healthcheck loops continue silently on timeout instead of failing with a diagnostic message.

Confidence Score: 5/5

Safe to merge; all findings are P2 quality-of-life improvements with no blocking defects.

Only P2 issues found (flaky macOS vault readiness check, silent healthcheck loop timeouts). All previously flagged P1 concerns (cross-platform depends_on, LocalStack dropped, libudev-dev missing, unquoted env values, tranche count, hardcoded expensive filter) are resolved in the current code.

.buildkite/scripts/setup-services.sh — macOS vault readiness and Linux service healthcheck timeout handling.

Important Files Changed

  • .buildkite/pipeline.yml
    New Buildkite pipeline: correctly splits build per-OS with scoped depends_on keys, 2-tranche bats matrix, libudev-dev installed on all Linux steps, and redact/keychain steps in ci-other. No new blocking issues beyond those already addressed in previous threads.
  • .buildkite/scripts/setup-services.sh
    Comprehensive service setup including LocalStack provisioning (restoring AWS coverage). Two P2 issues: macOS Vault readiness uses a bare sleep instead of a poll loop, and all three Linux healthcheck loops silently continue on timeout rather than failing with a diagnostic message.
  • .buildkite/scripts/bats-filter.sh
    Correctly mirrors the old GHA release-plz expensive-test logic using BUILDKITE_PULL_REQUEST + BUILDKITE_BRANCH. Properly writes to BUILDKITE_ENV_FILE with %q quoting.
  • .buildkite/scripts/install-mise.sh
    Installs mise if absent, sets PATH, and trusts the project's mise.toml. HTTP/1.1 retry is a documented workaround for macOS HTTP/2 flakes.
  • .buildkite/scripts/setup-keychain.sh
    Starts gnome-keyring on Linux and writes DBUS_SESSION_BUS_ADDRESS to BUILDKITE_ENV_FILE with proper %q quoting. macOS no-op is correct.
  • .buildkite/scripts/redact.sh
    Registers fnox secrets with Buildkite redaction and verifies round-trip; skips gracefully when no age key is present.
  • .buildkite/scripts/setup-age.sh
    Writes the AGE_SECRET to ~/.config/fnox/age.txt with chmod 600; silently skips when absent, mirroring fork-PR behavior.
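The setup-age.sh behavior summarized in the table (write AGE_SECRET to ~/.config/fnox/age.txt with mode 600, skip silently when it is unset) fits in a few lines. This is a minimal sketch; the function name `setup_age` is hypothetical and the real script may differ.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of setup-age.sh: install the CI age key if one is provided.
setup_age() {
  # Fork PRs (and agents without the secret) have no AGE_SECRET: skip quietly.
  if [[ -z "${AGE_SECRET:-}" ]]; then
    return 0
  fi
  local dir="$HOME/.config/fnox"
  mkdir -p "$dir"
  # umask 077 in a subshell: the file is never group/world-readable, even briefly.
  (
    umask 077
    printf '%s\n' "$AGE_SECRET" >"$dir/age.txt"
  )
  chmod 600 "$dir/age.txt"
}

# Run only when executed directly, so the function can also be sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  setup_age
fi
```

Creating the file under a restrictive umask, rather than relying only on the trailing `chmod`, avoids a window in which the key is readable by other users on a shared agent.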

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[push / PR] --> BL[build-linux\nqueue: linux]
    A --> BM[build-macos\nqueue: macos]

    BL -->|artifact: fnox-linux| BAL0[bats-linux tranche 0]
    BL -->|artifact: fnox-linux| BAL1[bats-linux tranche 1]
    BL -->|artifact: fnox-linux| COL[ci-other-linux\ncargo test + lint + clippy]

    BM -->|artifact: fnox-macos| BAM0[bats-macos tranche 0]
    BM -->|artifact: fnox-macos| BAM1[bats-macos tranche 1]
    BM -->|artifact: fnox-macos| COM[ci-other-macos\ncargo test + lint + clippy]

    A --> MSRV[msrv\ncargo msrv verify\nqueue: linux]

    BAL0 --- SS1[setup-services.sh\nvault · vaultwarden · infisical\nlocalstack · keychain]
    BAL1 --- SS1
    BAM0 --- SS2[setup-services.sh\nvault dev mode · parallel]
    BAM1 --- SS2

    COL --- KC[setup-keychain.sh\ngnome-keyring]
    BAL0 --- BF[bats-filter.sh\nBATS_FILTER_TAGS]
    BAL1 --- BF
    BAM0 --- BF
    BAM1 --- BF


Reviews (9): Last reviewed commit: "[autofix.ci] apply automated fixes"

Comment thread .buildkite/pipeline.yml
Comment thread .buildkite/scripts/setup-services.sh
Comment thread .buildkite/pipeline.yml Outdated
Comment thread .buildkite/scripts/setup-services.sh
Comment thread .buildkite/pipeline.yml Outdated
Addresses review feedback on the Buildkite migration:

- Replace the build/ci-other os matrix with explicit per-OS steps
  (build-linux/build-macos, ci-other-linux/ci-other-macos). Buildkite
  `depends_on` against a matrix step waits on every permutation, which
  would let a macos build failure block linux tests and vice versa.
  Bats keeps a per-OS matrix on tranche only.
- Add `libudev-dev` apt install to build/ci-other/msrv on Linux to
  match the prior workflow; without it cargo build fails on fresh
  Ubuntu agents.
- Add LocalStack provisioning (KMS key alias `alias/fnox-testing` and
  a seeded Secrets Manager entry) plus AWS_* / LOCALSTACK_ENDPOINT
  exports. Without these the aws-kms / aws-sm / aws-ps bats tests
  silently skip.
- Quote append_env values with `printf '%q'` so secrets containing
  whitespace or shell metacharacters round-trip safely through
  BUILDKITE_ENV_FILE.
- Bring tranche split back to 2 (TRANCHE_COUNT=2, tranche 0/1) and
  add BATS_FILTER_TAGS=!expensive to match the prior workflow's
  default-PR test budget.
- Poll LocalStack and Vault health endpoints rather than relying on
  a fixed sleep, matching the prior workflow.
- shfmt: drop unnecessary quotes inside `[[ ]]` in setup-age.sh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
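The `printf '%q'` quoting from this commit can be checked with a direct round-trip: write a value containing spaces and shell metacharacters through an append_env-style helper, then source the file in a fresh bash. This is a sketch under assumptions: the PR's exact format string isn't shown, so it keeps the original `export` prefix, and a temp file stands in for BUILDKITE_ENV_FILE.

```bash
#!/usr/bin/env bash
# append_env in the style of setup-services.sh: persist a bash-sourceable
# `export KEY=VALUE` line and export the variable into the current shell.
append_env() {
  # %q escapes spaces, quotes, $, ;, backticks, etc. so sourcing the file
  # restores the exact value.
  printf 'export %s=%q\n' "$1" "$2" >>"$BUILDKITE_ENV_FILE"
  export "$1=$2"
}

# Demo: a throwaway file stands in for BUILDKITE_ENV_FILE.
BUILDKITE_ENV_FILE="$(mktemp)"
trap 'rm -f "$BUILDKITE_ENV_FILE"' EXIT
append_env DEMO_SECRET 'p@ss word; $(whoami) "quoted" `tick`'

# Read it back in a fresh bash that does not inherit DEMO_SECRET.
roundtrip="$(bash -c 'unset DEMO_SECRET; source "$1"; printf %s "$DEMO_SECRET"' _ "$BUILDKITE_ENV_FILE")"
if [[ "$roundtrip" == "$DEMO_SECRET" ]]; then
  echo "round-trip ok"
fi
```

The trade-off compared with the `KEY="VALUE"` form suggested in review: double quotes alone still let embedded `"`, `$`, and backticks break or expand, while `%q` output is bash-specific syntax and only round-trips for a bash consumer.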
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request migrates the CI pipeline from GitHub Actions to Buildkite, introducing a new pipeline configuration and setup scripts for services like Vaultwarden, HashiCorp Vault, and Infisical. Feedback focuses on improving script robustness by correctly quoting environment variables, ensuring dependencies like openssl are installed, avoiding Docker naming conflicts, and replacing brittle sleep commands with health checks.

Comment on lines +15 to +18

    append_env() {
      echo "export $1=${2}" >>"$BUILDKITE_ENV_FILE"
      export "$1=$2"
    }
Contributor

medium

The append_env function does not quote the value when writing to $BUILDKITE_ENV_FILE. This can lead to malformed environment files if the value contains spaces or special characters. Additionally, Buildkite environment files typically use KEY="VALUE" format rather than the export command.

Suggested change
Before:
    append_env() {
      echo "export $1=${2}" >>"$BUILDKITE_ENV_FILE"
      export "$1=$2"
    }
After:
    append_env() {
      echo "$1=\"${2}\"" >>"$BUILDKITE_ENV_FILE"
      export "$1=$2"
    }

Comment thread .buildkite/scripts/setup-services.sh Outdated
    case "$(uname -s)" in
    Linux)
      sudo apt-get update
      sudo apt-get install -y parallel gnome-keyring libsecret-tools dbus-x11
Contributor

medium

The script uses openssl later (line 33) but it is not explicitly installed. While often present on base images, it's safer to include it in the apt-get install list to ensure the script is self-contained and robust across different agent environments.

Suggested change
Before:
    sudo apt-get install -y parallel gnome-keyring libsecret-tools dbus-x11
After:
    sudo apt-get install -y parallel gnome-keyring libsecret-tools dbus-x11 openssl

Comment thread .buildkite/scripts/setup-services.sh Outdated
    -keyout /tmp/vaultwarden-certs/key.pem \
    -out /tmp/vaultwarden-certs/cert.pem \
    -subj "/CN=localhost" \
    -addext "subjectAltName=DNS:localhost,IP:127.0.0.1" 2>/dev/null || true
Contributor

medium

Ignoring errors from openssl with || true can lead to confusing failures in the subsequent docker run command if the certificates are not created. It is better to let the script fail here if certificate generation fails so the root cause is clear.

Suggested change
Before:
    -addext "subjectAltName=DNS:localhost,IP:127.0.0.1" 2>/dev/null || true
After:
    -addext "subjectAltName=DNS:localhost,IP:127.0.0.1" 2>/dev/null
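Here is a sketch of the fail-fast version: with `|| true` gone, a failed `openssl req` stops the helper instead of surfacing later as a vaultwarden startup error. The leading `-x509 -newkey rsa:2048 -nodes -days 1` flags and the `gen_localhost_cert` name are assumptions (the head of the original command isn't shown), and `-addext` needs OpenSSL 1.1.1 or newer.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create a self-signed localhost cert for vaultwarden.
# Returns non-zero (and leaves openssl's stderr visible) if generation fails.
gen_localhost_cert() {
  local dir="$1"
  mkdir -p "$dir" || return 1
  openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -keyout "$dir/key.pem" \
    -out "$dir/cert.pem" \
    -subj "/CN=localhost" \
    -addext "subjectAltName=DNS:localhost,IP:127.0.0.1" || return 1
  # Guard against an empty file slipping through.
  [[ -s "$dir/cert.pem" && -s "$dir/key.pem" ]]
}

# Usage (assumed path from the snippet above):
# gen_localhost_cert /tmp/vaultwarden-certs || exit 1
```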

    -subj "/CN=localhost" \
    -addext "subjectAltName=DNS:localhost,IP:127.0.0.1" 2>/dev/null || true

    docker run -d --name vaultwarden \
Contributor

medium

Using a fixed container name (vaultwarden) will cause the job to fail if a previous run left a stale container or if multiple jobs share the same Docker host. Removing any existing container before starting ensures a clean state. Note that if multiple tranches run on the same host, they will also conflict on port 8080.

Suggested change
Before:
    docker run -d --name vaultwarden \
After:
    docker rm -f vaultwarden 2>/dev/null || true
    docker run -d --name vaultwarden \

      -v /tmp/vaultwarden-certs:/data/certs:ro \
      vaultwarden/server:latest

    docker run -d --name vault --cap-add=IPC_LOCK \
Contributor

medium

Similar to the vaultwarden container, the vault container name should be cleared before starting to avoid conflicts with stale containers from previous runs.

Suggested change
Before:
    docker run -d --name vault --cap-add=IPC_LOCK \
After:
    docker rm -f vault 2>/dev/null || true
    docker run -d --name vault --cap-add=IPC_LOCK \

Comment thread .buildkite/scripts/setup-services.sh Outdated

    docker compose -f test/docker-compose.infisical-ci.yml up -d

    sleep 5
Contributor

medium

Using a fixed sleep 5 is brittle. It may be too short if the system is under heavy load, leading to flaky tests, or unnecessarily long. A health check loop (e.g., using curl to check the Vault or Vaultwarden API) would be more robust and efficient.
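A health-check loop in the spirit of this suggestion, which also follows Greptile's point that a timeout should fail with a diagnostic rather than fall through, might look like this. It is a sketch: `wait_for` is a hypothetical helper, and the Vault port in the usage line is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical readiness helper: poll a command until it succeeds or time runs out.
# Usage: wait_for NAME TIMEOUT_SECONDS CMD [ARGS...]
wait_for() {
  local name="$1" timeout="$2"
  shift 2
  local deadline=$((SECONDS + timeout))
  until "$@" >/dev/null 2>&1; do
    if ((SECONDS >= deadline)); then
      # Fail loudly so the build log names the service that never came up.
      echo "error: $name not ready after ${timeout}s (check: $*)" >&2
      return 1
    fi
    sleep 1
  done
  echo "$name is ready"
}

# Example (assumed endpoint): Vault dev server health API on its default port.
# wait_for vault 60 curl -fsS http://127.0.0.1:8200/v1/sys/health
```

Polling returns as soon as the service answers, so fast agents don't pay for a worst-case sleep, and slow ones get up to the full timeout before the step fails.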

Comment thread .buildkite/scripts/setup-keychain.sh Outdated
    DBUS_SESSION_BUS_ADDRESS=$(cat ~/.dbus-session/bus-address)
    export DBUS_SESSION_BUS_ADDRESS
    buildkite-agent meta-data set "DBUS_SESSION_BUS_ADDRESS" "$DBUS_SESSION_BUS_ADDRESS" || true
    echo "export DBUS_SESSION_BUS_ADDRESS=$DBUS_SESSION_BUS_ADDRESS" >>"$BUILDKITE_ENV_FILE"
Contributor

medium

The environment variable value should be quoted when appended to $BUILDKITE_ENV_FILE to prevent issues with special characters in the D-Bus address, and the export prefix is unnecessary for Buildkite environment files.

Suggested change
Before:
    echo "export DBUS_SESSION_BUS_ADDRESS=$DBUS_SESSION_BUS_ADDRESS" >>"$BUILDKITE_ENV_FILE"
After:
    echo "DBUS_SESSION_BUS_ADDRESS=\"$DBUS_SESSION_BUS_ADDRESS\"" >>"$BUILDKITE_ENV_FILE"

- Install openssl explicitly and stop swallowing cert-generation
  errors so a cert failure surfaces here, not as an opaque
  vaultwarden startup error.
- `docker rm -f` vaultwarden/vault/localstack before each run so
  persistent agents don't collide with stale containers from prior
  jobs.
- Replace the trailing service-readiness pause with a vaultwarden
  health-check loop (vault and localstack already had loops).
- Quote DBUS_SESSION_BUS_ADDRESS via printf %q in setup-keychain.sh
  for symmetry with append_env, and drop a leftover
  buildkite-agent meta-data write that nothing reads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread .buildkite/pipeline.yml Outdated
Comment thread .buildkite/scripts/setup-services.sh Outdated
- `retry.automatic.limit: 2` (3 total attempts) to match the prior
  `nick-fields/retry max_attempts: 3`. Buildkite's `limit` counts
  retries after the first run, so the previous `limit: 3` was 4
  total attempts.
- setup-services.sh now sources setup-keychain.sh on Linux instead
  of duplicating the gnome-keyring/dbus install + DBUS export — the
  shared block was identical between the two scripts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
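Because Buildkite's automatic-retry `limit` counts retries after the first run, total attempts equal limit + 1. The small shell stand-in below mirrors that arithmetic. `run_with_retries` is hypothetical, and Buildkite itself re-runs the whole job, which this doesn't model.

```bash
#!/usr/bin/env bash
# Hypothetical emulation of `retry.automatic.limit`: one initial run plus up
# to LIMIT retries, so at most LIMIT + 1 attempts in total.
run_with_retries() {
  local limit="$1"
  shift
  local attempt
  for ((attempt = 1; attempt <= limit + 1; attempt++)); do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt/$((limit + 1)) failed" >&2
  done
  return 1
}

# limit 2 -> 3 attempts, matching nick-fields/retry max_attempts: 3.
# run_with_retries 2 mise run test:bats
```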
Comment thread .buildkite/pipeline.yml Outdated
The prior workflow set BATS_FILTER_TAGS to "" (run everything,
including expensive integration tests) when the head branch was
release-plz/*, and "!expensive" otherwise. The migrated pipeline
hardcoded "!expensive", which would have silently dropped the
release-gate coverage.

Move the decision into a small helper that writes BATS_FILTER_TAGS
to BUILDKITE_ENV_FILE based on BUILDKITE_PULL_REQUEST + BUILDKITE_BRANCH.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
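The decision reduces to a branch-pattern check. The helper below is a sourceable sketch in the style of bats-filter.sh. The PR doesn't show exactly how the real script combines BUILDKITE_PULL_REQUEST with BUILDKITE_BRANCH, so treating only PR builds from `release-plz/*` branches as release gates is an assumption, as is the `bats_filter_tags` name.

```bash
#!/usr/bin/env bash
# Hypothetical: choose bats tags from Buildkite's PR/branch variables.
# BUILDKITE_PULL_REQUEST is the PR number, or "false" for non-PR builds.
bats_filter_tags() {
  local pr="${1:-false}" branch="${2:-}"
  if [[ "$pr" != "false" && "$branch" == release-plz/* ]]; then
    printf '%s' ""            # release gate: run everything, expensive included
  else
    printf '%s' '!expensive'  # default budget: skip expensive tests
  fi
}

BATS_FILTER_TAGS="$(bats_filter_tags "${BUILDKITE_PULL_REQUEST:-false}" "${BUILDKITE_BRANCH:-}")"
# Export for later commands in this (sourcing) shell...
export BATS_FILTER_TAGS
# ...and persist it, quoted, for anything that reads the env file afterwards.
if [[ -n "${BUILDKITE_ENV_FILE:-}" ]]; then
  printf 'export BATS_FILTER_TAGS=%q\n' "$BATS_FILTER_TAGS" >>"$BUILDKITE_ENV_FILE"
fi
```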
Comment thread .buildkite/pipeline.yml Outdated
setup-keychain.sh already installs gnome-keyring + libsecret-tools
(and runs apt-get update). The inline apt-get install in
ci-other-linux is reduced to just libudev-dev, which the helper
doesn't cover.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread .buildkite/pipeline.yml
Buildkite runs the items in `commands:` as one concatenated bash
script, but each helper script previously ran as a subprocess —
its `export` calls and BUILDKITE_ENV_FILE writes never reached
`mise run test:bats` / `cargo test`, so VAULT_*, AWS_*, BW_SESSION,
INFISICAL_*, BATS_FILTER_TAGS, and DBUS_SESSION_BUS_ADDRESS were
all lost. Source bats-filter.sh, setup-services.sh, and
setup-keychain.sh from the pipeline so their exports stay in the
parent shell.

setup-age.sh and redact.sh are left as subprocess invocations —
they don't export anything the next command needs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
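The subprocess-versus-source difference is easy to reproduce outside Buildkite. Running `bash helper.sh` exports into a child process that then exits, while `source helper.sh` runs in the current shell. Below is a self-contained demo with a throwaway helper file; the variable name and value are illustrative.

```bash
#!/usr/bin/env bash
# Throwaway stand-in for a helper like setup-services.sh.
helper="$(mktemp)"
cat >"$helper" <<'EOF'
export VAULT_ADDR="http://127.0.0.1:8200"
EOF

unset VAULT_ADDR
bash "$helper"      # subprocess: its export dies with the child
after_bash="${VAULT_ADDR:-<unset>}"
echo "after bash:   $after_bash"

source "$helper"    # same shell: the export persists
after_source="${VAULT_ADDR:-<unset>}"
echo "after source: $after_source"
rm -f "$helper"
```

One caveat with sourcing: anything the helper does to the shell also leaks into the step's shell. An `exit` ends the whole command script, and `set -e` changes the behavior of later commands, so sourced helpers are safer using `return` and leaving shell options alone.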
Comment thread .buildkite/scripts/bats-filter.sh
jdx and others added 4 commits April 30, 2026 09:35
The agents don't ship with mise, so `mise install` failed with
exit 127 ("command not found") on the first real run. Add a
small install-mise.sh helper that curls https://mise.run on miss,
puts ~/.local/bin on PATH, and runs `mise trust --all` to silence
the first-run prompt. Source it from each step's commands so the
PATH change persists into `mise install` and the rest of the run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
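The install-on-miss logic can be sketched as a sourceable function. It includes the `--retry 5 --http1.1` curl flags that a later commit in this PR adds. `ensure_mise` is a hypothetical name. The sketch assumes the mise.run installer places the binary in ~/.local/bin, as the commit message describes.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an install-mise.sh-style helper. Source it so the
# PATH change survives into later commands such as `mise install`.
ensure_mise() {
  if ! command -v mise >/dev/null 2>&1; then
    local installer
    # --http1.1 avoids HTTP/2 framing errors; --retry 5 rides out transient
    # upstream failures. Fetching into a variable first means a failed
    # download returns non-zero instead of piping nothing into sh.
    installer="$(curl --fail --silent --show-error --location \
      --retry 5 --http1.1 https://mise.run)" || return 1
    printf '%s\n' "$installer" | sh || return 1
  fi
  # Prepend ~/.local/bin once, even if the helper is sourced repeatedly.
  case ":$PATH:" in
    *":$HOME/.local/bin:"*) ;;
    *) export PATH="$HOME/.local/bin:$PATH" ;;
  esac
}

# ensure_mise && mise trust --all && mise install
```

Downloading to a variable before executing the installer avoids the classic `curl | sh` pitfall: without `pipefail`, a failed fetch produces an empty pipe that `sh` happily runs as a successful no-op. Checking for the directory in PATH before prepending keeps repeated sourcing from growing PATH.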
… sets it

bats-filter.sh wrote BATS_FILTER_TAGS to BUILDKITE_ENV_FILE but never
exported it. Since the agent doesn't re-read BUILDKITE_ENV_FILE between
commands, the variable was unset by the time `mise run test:bats` ran.
Add the missing `export` (matches the pattern in setup-services.sh's
append_env) — the file write stays as a fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Build #10 failed because mise hit the unauthenticated GitHub API rate
limit while resolving 10+ tool versions (age, bitwarden, vault,
infisical, communique, aube, cargo-msrv, cargo-edit, cargo-binstall,
git-cliff). mise's own warning explicitly asks for GITHUB_TOKEN to be
set.

install-mise.sh now pulls GITHUB_TOKEN from the Buildkite cluster
secret store (`buildkite-agent secret get GITHUB_TOKEN`) when the
caller hasn't already set it. Requires a GITHUB_TOKEN secret in the
endev/fnox pipeline's cluster.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The build still hit the rate limit on #11 even with the secret-fetch
in place — most likely because the secret hasn't been created in the
cluster yet. Print what `buildkite-agent secret get` returned (success
or failure stderr) so we can tell whether the secret exists at all,
versus the agent not being authorized to read it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
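The sketch below fetches the token only when the caller hasn't set one, and it reports success or failure without echoing the secret itself. It assumes `buildkite-agent secret get NAME` prints the value on stdout and errors on stderr. The helper name is hypothetical, and a later commit in this PR drops this path entirely.

```bash
#!/usr/bin/env bash
# Hypothetical: populate GITHUB_TOKEN from the Buildkite cluster secret store.
load_github_token() {
  if [[ -n "${GITHUB_TOKEN:-}" ]]; then
    return 0 # caller already provided one
  fi
  local token err
  err="$(mktemp)"
  if token="$(buildkite-agent secret get GITHUB_TOKEN 2>"$err")" && [[ -n "$token" ]]; then
    export GITHUB_TOKEN="$token"
    echo "GITHUB_TOKEN loaded from cluster secrets" >&2
  else
    # Log only the agent's error text, never a secret value.
    echo "buildkite-agent secret get GITHUB_TOKEN failed: $(cat "$err")" >&2
  fi
  rm -f "$err"
  return 0 # unauthenticated mise still runs, just rate-limited
}
```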

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.



Reviewed by Cursor Bugbot for commit 68f9d90.

Comment thread .buildkite/scripts/setup-services.sh
jdx and others added 4 commits April 30, 2026 09:51
setup-services.sh cleared stale vaultwarden / vault / localstack
containers but left the docker-compose-managed Infisical stack
(postgres + redis + infisical) running between builds. On
persistent agents the leftover containers and ports cause the next
`docker compose up -d` to fail or pick up stale state.

Add `docker compose ... down --remove-orphans -v` before the up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
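Taken together, the cleanup steps in this PR reset both the named containers and the compose stack before anything is started. The container names and compose file below come from the PR. The `reset_services` function itself is an illustrative sketch, not the script's actual structure.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight cleanup for persistent agents.
INFISICAL_COMPOSE="test/docker-compose.infisical-ci.yml"

reset_services() {
  local name
  # Named containers left over from a previous job would collide on --name.
  for name in vaultwarden vault localstack; do
    docker rm -f "$name" >/dev/null 2>&1 || true
  done
  # Tear down the compose-managed stack, including orphans and its volumes,
  # so `up -d` starts from empty postgres/redis state.
  docker compose -f "$INFISICAL_COMPOSE" down --remove-orphans -v
}

# reset_services && docker compose -f "$INFISICAL_COMPOSE" up -d
```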
Build #13's macOS agent failed on:
  curl: (16) Error in the HTTP2 framing layer

while fetching https://mise.run. Add --retry 5 + --http1.1 to dodge
the transient HTTP/2 framing issue and recover from upstream blips
without wedging the build.

(Build #13's macOS agent is also still 404'ing on the GITHUB_TOKEN
cluster secret — that's an infra-side action separate from this
patch.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…irely

Build #11/12/13 hit the unauthenticated GitHub API rate limit because
the aqua-backend tools (age, vault, hk, shellcheck, …) only had
version pins in mise.lock — mise still queried GitHub to resolve each
pinned tag → asset URL, so 20+ API calls per `mise install` blew past
60/h.

Run `mise lock --platform linux-x64,macos-arm64`, which fills in the
direct download URL, checksum, and (where available) GitHub
attestation provenance for every tool on both CI platforms. mise
installs now go straight to the release-asset CDN with zero GitHub
API calls.

Drop the now-unnecessary `buildkite-agent secret get GITHUB_TOKEN`
diagnostic from install-mise.sh — no token required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Labels

None yet

Projects

None yet


1 participant