Skip to content

Disable PyTorch sccache on fork PRs via a computed cache_type#5816

Merged
subodh-dubey-amd merged 1 commit into
mainfrom
users/subodh-dubey-amd/pytorch-sccache-disable-on-forks
Jun 14, 2026
Merged

Disable PyTorch sccache on fork PRs via a computed cache_type#5816
subodh-dubey-amd merged 1 commit into
mainfrom
users/subodh-dubey-amd/pytorch-sccache-disable-on-forks

Conversation

@subodh-dubey-amd

@subodh-dubey-amd subodh-dubey-amd commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes #5737 (supersedes the Draft #5738). PyTorch CI jobs fail on PRs from forks at the sccache steps:

User: arn:aws:iam::692859939525:user/therock-external-upload is not authorized to perform:
sts:TagSession on resource: arn:aws:iam::324352301041:role/therock-ci
... AccessDenied ... s3:GetObject on therock-pytorch-sccache-ci/.../.sccache_check

Root cause

sccache uses an S3 bucket reached via an OIDC-assumed IAM role (therock-ci). Fork PRs don't get OIDC tokens, so the role can't be assumed. The gate github.repository_owner == 'ROCm' is always true on fork PRs (it's the base repo), and it only guarded the credentials step — Verify sccache and the build still passed --use-sccache, so sccache started and failed on S3 access.

Fix

Compute the effective cache type once, in build_tools/github_actions/compute_pytorch_cache_type.py (reusing _is_current_run_pr_from_fork() from s3_buckets.py), downgrading sccache -> none on fork PRs / non-ROCm repos. Every sccache step in the two *_pytorch_wheels_ci.yml workflows now keys off that single steps.cache.outputs.cache_type instead of repeating fork checks. ccache/none pass through unchanged; in-org runs keep sccache and its hard-fail semantics.

This follows the review direction on #5738 ("compute earlier… this all needs to go through scripts, not yml code") and limits changes to the _ci.yml workflows (the release / multi_arch_build_* workflows are never fork-triggered).

Files

  • build_tools/github_actions/compute_pytorch_cache_type.py (new)
  • build_tools/github_actions/tests/compute_pytorch_cache_type_test.py (new, 8 cases)
  • build_portable_linux_pytorch_wheels_ci.yml, build_windows_pytorch_wheels_ci.yml — add "Determine cache type" step; gate all sccache steps on its output.

Test plan

PyTorch sccache uses an S3 bucket reached through an OIDC-assumed IAM role
(therock-ci). Pull requests from forks (and runs in non-ROCm repos) cannot
assume that role, so the build failed: the credentials step was gated on
`github.repository_owner == 'ROCm'` (always true on fork PRs) and the verify /
build steps still passed `--use-sccache`, so sccache started and failed on
`s3:GetObject` (AccessDenied).

Instead of repeating fork checks across several `if:` conditions, compute the
effective cache type once in build_tools/github_actions/compute_pytorch_cache_type.py
(reusing _is_current_run_pr_from_fork from s3_buckets.py), downgrading
`sccache` -> `none` on fork PRs / non-ROCm repos, and gate every sccache step
in the two `*_pytorch_wheels_ci.yml` workflows on that single output. ccache and
none pass through unchanged; in-org runs keep sccache (and its hard-fail
semantics) intact.

Scope is the CI workflows only -- the release / multi_arch_build workflows are
never triggered by fork PRs.

Fixes #5737.

Co-authored-by: Cursor <cursoragent@cursor.com>

@ScottTodd ScottTodd left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think the longer term fix here will be to move the pytorch builds to AWS runners that have base credentials for writing to a dev cache regardless of whether the triggering workflow run is from a fork PR or not. Do we have that worked into the design @amd-shiraz ? We should have caches set up in that way for pytorch (e.g. sccache here), ROCm (e.g. ccache today), and other projects like JAX too.


sys.path.insert(0, str(Path(__file__).resolve().parent.parent))

from _therock_utils.s3_buckets import _is_current_run_pr_from_fork

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

underscore-prefixed functions like _is_current_run_pr_from_fork are considered private implementation details of the current file and should not typically be imported from other files. I'm willing to let this one slide though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaned up in #5848 — added a public is_current_run_pr_from_fork() wrapper and repointed the import. Thanks!

@subodh-dubey-amd subodh-dubey-amd merged commit ef3f9e6 into main Jun 14, 2026
127 of 146 checks passed
@subodh-dubey-amd subodh-dubey-amd deleted the users/subodh-dubey-amd/pytorch-sccache-disable-on-forks branch June 14, 2026 07:55
@github-project-automation github-project-automation Bot moved this from TODO to Done in TheRock Triage Jun 14, 2026
@subodh-dubey-amd

Copy link
Copy Markdown
Contributor Author

Verified on #5729 (a real fork PR) after merging main — Multi-Arch CI run 27493147226.

The fork path now behaves correctly, e.g. the Linux release/2.10 Build PyTorch job:

  • Determine cache type: cache_type='sccache' -> effective='none' (fork detected, no OIDC access)
  • Configure AWS Credentials for sccache / Verify sccache / Report cache stats: skipped
  • Build PyTorch wheels: success, no AccessDenied
  • Split PyTorch fat wheel: success

All 4 PyTorch build cells passed:

subodh-dubey-amd added a commit that referenced this pull request Jun 17, 2026
#5848)

Follow-up to #5816, addressing the review nit at
#5816 (comment)
(importing the private `_is_current_run_pr_from_fork` across files).

Adds a public `is_current_run_pr_from_fork()` wrapper in `s3_buckets.py`
and repoints `compute_pytorch_cache_type.py` (and its test mocks) to it.
No behavior change.

## Test plan
- [x] `compute_pytorch_cache_type_test.py` (8 cases) pass
- [x] Real fork-event smoke test: fork+sccache -> none

## Submission Checklist
Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
amd-chiranjeevi pushed a commit that referenced this pull request Jun 17, 2026
#5848)

Follow-up to #5816, addressing the review nit at
#5816 (comment)
(importing the private `_is_current_run_pr_from_fork` across files).

Adds a public `is_current_run_pr_from_fork()` wrapper in `s3_buckets.py`
and repoints `compute_pytorch_cache_type.py` (and its test mocks) to it.
No behavior change.

## Test plan
- [x] `compute_pytorch_cache_type_test.py` (8 cases) pass
- [x] Real fork-event smoke test: fork+sccache -> none

## Submission Checklist
Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Issue] PyTorch "Configure AWS Credentials for sccache" steps failing on PRs from forks

2 participants