[ci]: Auto-apply ci:run-all-archs label to submodule bump PRs#5745
[ci]: Auto-apply ci:run-all-archs label to submodule bump PRs#5745amd-chiranjeevi wants to merge 10 commits into
ci:run-all-archs label to submodule bump PRs#5745Conversation
| strategy: | ||
| fail-fast: false | ||
| matrix: | ||
| pytorch_git_ref: ["release/2.10", "release/2.11", "release/2.12"] |
There was a problem hiding this comment.
Keep PRs focused on one topic at a time please. I'm already making this change in #5729 (as announced on #5714 (comment))
There was a problem hiding this comment.
Thanks Scott for the heads up. I will revert pytorch changes in this PR.
There was a problem hiding this comment.
No problem, I mainly want us to avoid duplicating work. Each expansion of the CI matrix has the potential to add considerable load to our build and test systems, as well as increase the risk of flakes, so I want us to roll out these changes carefully.
In the case of the GPU targets/families, we probably want to have submodule update PRs run more jobs than regular PRs. We can currently achieve that with the ci:run-all-archs label https://github.com/ROCm/TheRock/blob/main/docs/development/ci_behavior_manipulation.md or we can add some dedicated code to the configure script for that. We could even have https://github.com/ROCm/TheRock/blob/main/build_tools/github_actions/bump_automation.py automatically apply that label to the bump PRs that it creates 🤔
BTW since I created the #5729 from a fork, it's affected by #5737 and the workflow jobs failed. I could recreate from the shared repository but we need to get the CI system back to working on all kinds of PRs. I also decided on that PR to start with just expanding the Linux matrix, leaving Windows alone, as I think/hope that should get us sufficient coverage with minimal extra configuration. Like with the GPU targets/families though, what we should really do is dynamically choose which jobs to run (more on submodule update PRs, less on routine build system changes, opt-in when a developer wants extra testing and sees something the CI system doesn't know to look for on its own yet, etc.)
There was a problem hiding this comment.
We could even have https://github.com/ROCm/TheRock/blob/main/build_tools/github_actions/bump_automation.py automatically apply that label to the bump PRs that it creates 🤔
Thanks Scott, this approach looks perfect, I'll proceed with this.
There was a problem hiding this comment.
now I am looking for what ci:run-all-archs label enable. as per current infra availability, only Linux gf94x, windows gfx110x tests are being tested on the prs.
ci:run-all-archs label to submodule bump PRs
| "fetch-gfx-targets": ["gfx950"], | ||
| "build_variants": ["release", "asan", "tsan"], | ||
| # Build-only on PRs (including ci:run-all-archs); tests run on push and nightly | ||
| "postsubmit_check_only_for_family": True, |
There was a problem hiding this comment.
is this the right change?
presubmit = PR
postsubmit = on push
wouldn't the label ci:run-all-archs cover this and we don't need to add this scenario?
There was a problem hiding this comment.
my biggest worry here, is for submodule bumps, we need to make sure specific archs only runs sanity checks (and not full tests)
| # If postsubmit_check_only_for_family is set, skip tests on PRs only | ||
| if platform_info.get("postsubmit_check_only_for_family", False) and not ( | ||
| is_push or is_schedule | ||
| ): | ||
| test_runs_on = "" | ||
| print( | ||
| f" {family_name}: postsubmit_check_only_for_family flag set, " | ||
| f"disabling test runner for pull request runs" | ||
| ) | ||
|
|
There was a problem hiding this comment.
i think we would still want gfx950 to run all tests since we have enough machines, but the machines that we do not have enough of, we should only run sanity
Motivation
Submodule bump PRs (rocm-systems, rocm-libraries) currently only build the default presubmit families (gfx94x, gfx110x, gfx1151, gfx120x), missing build coverage for older and nightly-only families. Instead of permanently adding families to the presubmit matrix (which adds CI load to every PR), automatically apply
ci:run-all-archswhenbump_automation.pycreates a bump PR. This gives bump PRs:nightly_check_only_for_familyor no hardware remain build-onlyTest Plan
bump_automation.pyhas theci:run-all-archslabel applied automatically.in the job matrix.
Test Result
To be updated
Submission Checklist