Update to torch=2.10 and rocm=7.1 and Pin Versions #17

Open
michaelmckinsey1 wants to merge 3 commits into LBANN:main from michaelmckinsey1:new-versions

@michaelmckinsey1 michaelmckinsey1 commented Feb 12, 2026

  • Update wheels to PyTorch=2.10, cuda=12.9, rocm=7.1
  • Pin distconv to the latest commit and pin the ccl install to the latest release
    • Both pins are easy to update manually. I think this is saner than the debugging effort when things break because we point at develop
  • All 3 wheels are tested up to 100 epochs, on one node only
  • The plugin installed via the ccl installation script (install-rccl.sh) causes a segfault on more than 1 node

Version changes:

  • 2.8.0+cu126 -> 2.10.0+cu129
  • 2.8.0+rocm6.4 -> 2.10.0+rocm7.1
  • 2.8.0+rocm642 -> 2.10.0+rocm710
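
Since the versions are now pinned, a small sanity check on the installed wheel may help catch mismatches early. The following is only a sketch and not part of this PR: the helper names `split_wheel_version` and `matches_pin` are hypothetical, and it assumes the version strings follow the `release+local` shape listed above. It parses the release into integers because plain string comparison puts `2.10.0` before `2.8.0`.

```python
def split_wheel_version(version: str) -> tuple[tuple[int, ...], str]:
    """Split a string like '2.10.0+rocm7.1' into ((2, 10, 0), 'rocm7.1').

    Hypothetical helper for illustration; assumes a purely numeric release part.
    """
    release, _, local = version.partition("+")
    return tuple(int(part) for part in release.split(".")), local


def matches_pin(installed: str, pinned: str) -> bool:
    """Return True when both the release and the local tag match the pin."""
    return split_wheel_version(installed) == split_wheel_version(pinned)


# Comparing parsed tuples orders 2.10.0 after 2.8.0, unlike comparing raw strings.
old_release, _ = split_wheel_version("2.8.0+rocm6.4")
new_release, _ = split_wheel_version("2.10.0+rocm7.1")
is_upgrade = new_release > old_release
```

In an environment with PyTorch installed, something like `matches_pin(torch.__version__, "2.10.0+rocm7.1")` could be run as a post-install check; which pin string applies to which wheel is an assumption here.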

@michaelmckinsey1 michaelmckinsey1 self-assigned this Feb 12, 2026
@michaelmckinsey1 michaelmckinsey1 linked an issue Feb 12, 2026 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

Move to Torch 2.10 and Rocm 7.1
