[CI] Prototype SPIRV-focused CI workflow #2451
Open
lamb-j wants to merge 13 commits into
Mirrors the workflow already running on ROCm/SPIRV-LLVM-Translator,
with the roles swapped: this side checks out llvm-project at PR head
and pulls SPIRV-LLVM-Translator at amd-staging tip. Same build chain
(LLVM + Clang + LLD + amd-llvm-spirv + device-libs + Comgr standalone)
and same three lit/gtest suites:
- check-amd-llvm-spirv non-blocking + sticky PR comment
(upstream Khronos churn ~1 fail/wk)
- check-llvm-codegen-spirv blocking
- check-comgr blocking (lit + gtest + ctest layers)
Catches breakage in compiler / SPIRV translator that would fail
downstream Comgr testing, without paying the cost of a full TheRock
build. Plan: once stable, promote to a TheRock stage-based workflow.
🔴 New failures (0) -- likely caused by this PR: (none)
🟢 Fixed by this PR (0) -- failing on baseline, passing here: (none)
The check-amd-llvm-spirv step is non-blocking because upstream Khronos
churn breaks ~1 lit test per week. Today the sticky comment just lists
N failing tests and tells the reviewer to compare against amd-staging
manually. It also misses the inverse signal: a Khronos-upstream merge
PR often *fixes* tests that are red on amd-staging.
Run check-amd-llvm-spirv twice: once with the PR-head llvm-project,
then again after swapping llvm-project to amd-staging tip. Diff the
two FAIL lists in the github-script step and partition into:
- new (in PR, not in baseline) -- likely caused by this PR
- fixed (in baseline, not in PR) -- resolved by this PR
- common (in both) -- pre-existing breakage
Headline picks the dominant bucket. Sticky comment marker unchanged
(<!-- spirv-ci:translator-lit -->), so existing comments update in
place.
The translator subdir at llvm/projects/SPIRV-LLVM-Translator/ is
untracked from llvm-project's tree, so the baseline `git checkout` of
amd-staging leaves the overlay alone. Build dir is reused; CMake
reconfigure + ninja incremental keep the second pass bounded by the PR
diff size: cheap on small PRs, larger on upstream-merge PRs.
Falls back to the legacy single-list shape if the baseline run didn't
produce a result.
Companion change in the ROCm/SPIRV-LLVM-Translator copy of this
workflow: ROCm/SPIRV-LLVM-Translator#188.
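The new / fixed / common partition is plain set arithmetic on the two FAIL lists. The commit does this in JavaScript inside the github-script step, so the shell version below is only an illustration of the logic, not the workflow's code. The function name `partition_fails` and the output file names are hypothetical, and it assumes one failing test name per line.

```shell
# Hypothetical helper: split two FAIL lists into new / fixed / common.
# Assumes one failing test name per line in each input file.
partition_fails() (
  pr_list="$1"; base_list="$2"; out_dir="$3"
  export LC_ALL=C   # sort and comm must agree on collation order
  # comm needs sorted input; -u also drops duplicate entries.
  sort -u "$pr_list"   > "$out_dir/pr.sorted"
  sort -u "$base_list" > "$out_dir/base.sorted"
  # -23: lines only in the PR list (likely caused by this PR)
  comm -23 "$out_dir/pr.sorted" "$out_dir/base.sorted" > "$out_dir/new.txt"
  # -13: lines only in the baseline list (resolved by this PR)
  comm -13 "$out_dir/pr.sorted" "$out_dir/base.sorted" > "$out_dir/fixed.txt"
  # -12: lines in both lists (pre-existing breakage)
  comm -12 "$out_dir/pr.sorted" "$out_dir/base.sorted" > "$out_dir/common.txt"
)
```

Called as `partition_fails pr.txt baseline.txt out/`, it leaves `new.txt`, `fixed.txt`, and `common.txt` in `out/`; the line counts of those files are what a headline would be picked from.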
Fixup to the previous commit. The baseline lit step combined ninja and
the grep capture into one shell script. Under bash -e + set -o
pipefail, the non-zero ninja exit (lit failures) aborted the script
before the grep ran, so build/spirv-fails-baseline.txt was never
written and the comment script fell through to "baseline comparison
unavailable".
Split the capture into its own step, mirroring the working PR-head
pattern.
Same fix applied in companion translator PR
ROCm/SPIRV-LLVM-Translator#189 (observed on
ROCm/SPIRV-LLVM-Translator#187 after #188 landed).
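The failure mode can be reproduced outside CI. The sketch below uses a stand-in for the lit run (it writes a simplified log, then exits non-zero) and a hypothetical grep capture. The log format, file names, and function names are all assumptions, not the workflow's actual step bodies.

```shell
# Stand-in for the lit run: writes a log, then exits non-zero like a
# check target with failing tests. The log format is simplified.
fake_lit='printf "FAIL: a.ll\nPASS: b.ll\n" > "$1/lit.log"; false'
# Hypothetical capture: collect the FAIL lines into their own file.
capture='grep "^FAIL" "$1/lit.log" > "$1/fails.txt"'

# One script: -e aborts at the failing run, so the capture never executes.
combined_step() {
  bash -e -o pipefail -c "$fake_lit; $capture" _ "$1"
}

# Two scripts: the run may fail (tolerated here, like continue-on-error),
# and the capture runs afterwards in its own shell.
split_steps() {
  bash -e -o pipefail -c "$fake_lit" _ "$1" || true
  bash -e -o pipefail -c "$capture" _ "$1"
}
```

Running `combined_step` on an empty directory leaves no `fails.txt` behind, while `split_steps` writes it, which matches the before and after behaviour described in the commit.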
When a workflow has multiple on: triggers (pull_request +
workflow_dispatch), GitHub disambiguates the emitted check context
with a trailing event suffix: the actual context becomes
"SPIRV CI - amd-staging / Build & Test (pull_request)" instead of the
bare "SPIRV CI - amd-staging / Build & Test" the required-check rule
expects. The required check stays "Pending — Required" forever and
blocks non-admin merges on amd-staging.
Drop workflow_dispatch: it was never used in practice, and
pull_request's synchronize/reopened types already cover the retriggers
we'd want.
Same fix in companion translator PR ROCm/SPIRV-LLVM-Translator#189.
Generic name was at risk of colliding with other workflows. Adds
"Linux" platform qualifier so future "Windows Build & Test" etc. can
slot in without further renaming. Doesn't list components (LLVM,
Comgr, translator) since the workflow will expand.
**ACTION REQUIRED on merge:** The amd-staging ruleset's required-check
context is currently "SPIRV CI - amd-staging / Build & Test", which
never actually matched (the matcher uses bare check_run.name, see
ROCm/SPIRV-LLVM-Translator#191). Update to "Linux Build & Test" so the
rule fixes both the rename and the long-standing matcher bug at once.
Companion change in the translator copy:
ROCm/SPIRV-LLVM-Translator#191.
Symmetric to the translator-side change in
ROCm/SPIRV-LLVM-Translator#194. Splits the single Linux Build & Test
job into 4:
Linux Build (required)
├─► Linux Test - SPIRV translator lit (informational, baseline-diff)
├─► Linux Test - LLVM SPIRV codegen (informational)
└─► Linux Test - Comgr (informational)
Build uploads `build/`, `build-comgr/`, `build-device-libs/` as a
single GHA artifact (`linux-build-tree`) after
`strip --strip-unneeded`. Test jobs do a fresh source checkout +
download the artifact.
Difference from the translator copy:
- PR head IS llvm-project (default checkout, no `repository:`); the
translator is overlaid at amd-staging tip under llvm/projects/.
- Translator-lit baseline-diff swaps llvm-project (not the translator)
via `git fetch origin amd-staging && git checkout FETCH_HEAD` from
cwd root. The translator overlay at
llvm/projects/SPIRV-LLVM-Translator is untracked from llvm-project's
tree, so the swap doesn't touch it.
- cmake source dir is `llvm` (not `llvm-project/llvm`).
ACTION REQUIRED on merge: same as #194, update the amd-staging-psdb
ruleset's required-check context from "Linux Build & Test" →
"Linux Build".
Mirrors TheRock's Multi-Arch CI shape: top-level dispatcher with
platform-variant jobs that call reusable per-platform workflows. Adding
Windows later = drop in spirv-ci-windows.yml + a windows_release job.
Files:
- spirv-ci.yml — top-level dispatcher (~25 lines), byte-identical
to the SPIRV-LLVM-Translator copy. Triggers on pull_request +
workflow_dispatch. Sole job (linux_release, name Linux::release)
calls spirv-ci-linux.yml.
- spirv-ci-linux.yml — workflow_call. Holds the build job + 3 test
jobs (factored from the prior single spirv-ci.yml). Differs from
the translator-side copy only in checkout blocks (which repo is
PR head vs which is pinned to amd-staging tip; same divergence as
the prior single-file structure).
Rendered check_run names in the PR rollup (workflow_call composes the
slash hierarchy):
- SPIRV Compiler CI / Linux::release / Build
- SPIRV Compiler CI / Linux::release / Test SPIRV translator lit
- SPIRV Compiler CI / Linux::release / Test LLVM SPIRV codegen
- SPIRV Compiler CI / Linux::release / Test Comgr
Convention alignment: Title Case workflow name with no branch suffix,
snake_case job IDs + display name override, no literal slashes in job
names, concurrency only on dispatcher, secrets: inherit at the
dispatcher, permissions read-only at workflow level with
pull-requests: write escalated only on the translator-lit job, both
pull_request and workflow_dispatch triggers (workflow_call sidesteps
the (pull_request) check-name suffix bug we hit before), pinned
container image.
Companion change: ROCm/SPIRV-LLVM-Translator#194.
ACTION REQUIRED on merge: update the amd-staging-psdb ruleset's
required-check context from "Linux Build" → "Linux::release / Build".
…ure) Same fix as ROCm/SPIRV-LLVM-Translator#194 follow-up. The previous commit moved pull-requests: write to job-level inside spirv-ci-linux.yml, but per GHA rules a called workflow can't exceed the caller's permission cap. The dispatcher's `contents: read` only cap caused startup_failure (no jobs run, empty rollup). Lift pull-requests: write to the dispatcher's workflow-level permissions. The called workflow's job-level grant on test_translator_lit still scopes the actual usage to that single job.
Same fix as ROCm/SPIRV-LLVM-Translator#194 follow-up. actions/upload-artifact@v4 strips executable bits and excludes hidden files by default. The first symptom: clang-23 came back non-executable in the Comgr test job ("Permission denied"). The second: cmake reconfigure on the test side failed inside FetchContent's SPIRV-Headers git update because the .git dir got dropped on upload. Tar the build trees before upload and untar after download — preserves both modes and hidden files in one shot.
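A minimal sketch of the tar round trip, assuming the build trees sit side by side under one workspace root. The helper names and the archive path are hypothetical; the commit does not show the actual step bodies.

```shell
# Hypothetical helpers; archive name and directory layout are assumptions.

# $1: archive path, $2: workspace root, remaining args: tree names
pack_build_trees() (
  archive="$1"; root="$2"; shift 2
  # tar records file modes and includes dot-directories such as .git.
  tar -cf "$archive" -C "$root" "$@"
)

# $1: archive path, $2: destination root
unpack_build_trees() {
  mkdir -p "$2"
  tar -xf "$1" -C "$2"
}
```

On the build side this would be something like `pack_build_trees trees.tar . build build-comgr build-device-libs`, with the single `trees.tar` file being what gets uploaded and downloaded.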
Same two wall-time bugs as ROCm/SPIRV-LLVM-Translator#194 follow-up:
1. `tar -xf` restored mtimes from when the build job produced files,
making source (freshly checked out in the test job) appear newer
than build outputs and triggering ninja cascade-rebuild. Switch to
`tar -xmf` so build outputs are newer.
2. Translator-lit job's llvm-project checkout used `fetch-depth: 0`
only to enable the baseline-swap `git checkout amd-staging`. The
swap step explicitly does `git fetch --depth=1` +
`git checkout FETCH_HEAD` which works on a shallow clone. Switch to
`fetch-depth: 1`.
Drive-by: trim verbose comments on the tar/untar steps.
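The effect of `-m` in the first item can be sketched as follows. The helper name `extract_fresh` is hypothetical; the only thing taken from the commit is the `tar -xmf` invocation itself.

```shell
# Hypothetical helper around the extraction described above.
# $1: archive path, $2: destination directory
extract_fresh() {
  mkdir -p "$2"
  # -m: do not restore archived modification times; each extracted file
  # gets the time of extraction, so it is newer than sources that were
  # checked out before this step ran.
  tar -xmf "$1" -C "$2"
}
```

With plain `tar -xf` the same files would keep the timestamps recorded by the build job, which is what made the freshly checked-out sources look newer.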
Same gate as ROCm/SPIRV-LLVM-Translator#194 follow-up. The lit steps stay continue-on-error so pre-existing amd-staging breakage doesn't block, but a final step exits non-zero when newFails > 0 (PR head FAILs not present in baseline). Real PR-introduced regressions turn the check red instead of just landing in the partition comment.
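A gate of this kind could look roughly like the sketch below. The function name is hypothetical and the real step may compute newFails differently; the sketch assumes both lists exist, hold one test name per line, and that the baseline list is non-empty.

```shell
# Hypothetical gate: fail only on tests that FAIL at PR head but are
# not in the baseline FAIL list.
# $1: PR-head FAIL list, $2: baseline FAIL list
gate_on_new_fails() (
  pr_list="$1"; base_list="$2"
  # -F fixed strings, -x whole-line match, -v invert, -f patterns from
  # file: prints PR-head lines that have no exact match in the baseline.
  # grep exits 1 when nothing is printed, hence the "|| true".
  new_fails=$(grep -Fxv -f "$base_list" "$pr_list" || true)
  if [ -n "$new_fails" ]; then
    echo "New failures not present in baseline:"
    echo "$new_fails"
    exit 1
  fi
  echo "No new failures."
)
```

For example, `gate_on_new_fails build/spirv-fails-pr.txt build/spirv-fails-baseline.txt` would return non-zero only when the first file lists a test the second does not.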
On llvm-project PRs the failing-check signal from the translator-lit
gate step is enough — a sticky partition comment would be noise.
Removed:
- The actions/github-script step that posted the partition comment
(~65 lines of inline JS)
- pull-requests: write permissions at both the workflow_call job
level (was scoping the comment) AND the dispatcher workflow level
(was the caller-cap that allowed it)
Net: llvm-project's SPIRV CI now runs with permissions: contents: read
only — matches TheRock's tightest-perms convention.
The partition logic still computes new/fixed/pre-existing internally
(the gate step uses spirv-fails-pr.txt + spirv-fails-baseline.txt to
decide whether to fail). Just no comment artifact.
Translator-side workflow keeps the comment — translator-author PRs
benefit from the inline context.
Same fix as ROCm/SPIRV-LLVM-Translator#197. tar -m sets per-file mtimes from sequential extraction order; build.ninja ends up older than CMakeCache.txt, triggering ninja's regen rule and cascade rebuild. Touch build.ninja explicitly to make it the newest file in the tree.
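As a sketch, the post-extraction touch might be wrapped like this. The function name is hypothetical, and skipping trees that have no `build.ninja` is an assumption rather than something the commit states.

```shell
# Hypothetical post-extraction step: bump build.ninja to the current
# time in each restored tree so it is not older than CMakeCache.txt.
# Arguments: one or more build directories.
freshen_build_ninja() {
  for dir in "$@"; do
    # Assumption: trees without a build.ninja are simply skipped.
    if [ -f "$dir/build.ninja" ]; then
      touch "$dir/build.ninja"
    fi
  done
}
```

It would run right after the untar step, for example `freshen_build_ninja build build-comgr build-device-libs`.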
Summary
Introduces SPIRV-focused PR CI for ROCm/llvm-project `amd-staging`. Builds LLVM/Clang/translator/Comgr in one job, runs SPIRV-relevant test suites in parallel test jobs that consume a GHA-artifact build tree. Catches breakage in the compiler / SPIRV translator that would fail downstream Comgr testing, without paying the cost of a full TheRock build.
PR rollup shows 4 separately-runnable checks.
Structure
2-file `workflow_call` chain modeled on TheRock's Multi-Arch CI shape:
- `spirv-ci.yml`: top-level dispatcher (~25 lines), pull_request + workflow_dispatch
- `spirv-ci-linux.yml`: Linux variant, build job + 3 parallel test jobs
Adding Windows later = drop in `spirv-ci-windows.yml` + a `Windows::release` job.
`runs-on: azure-linux-scale-rocm` with the manylinux container TheRock uses.
`permissions: contents: read` only at top level (no PR comment posted on llvm-project PRs; the failing check is enough signal).
Translator-lit baseline-diff
Translator-lit job runs lit against PR head, swaps llvm-project to `amd-staging` tip (the translator overlay at `llvm/projects/SPIRV-LLVM-Translator/` is untracked from llvm-project's tree, so the swap leaves it alone), reconfigures + incrementally rebuilds + reruns lit. Diffs the two FAIL lists into 🔴 New (PR-introduced) / 🟢 Fixed / ⚠️ Pre-existing buckets. The check fails on 🔴 New so real PR-introduced regressions block; pre-existing Khronos drift doesn't.
Companion change in the translator copy of this workflow (which also posts a sticky PR comment with the partition data): ROCm/SPIRV-LLVM-Translator#194.