
Muon CPU offload #4691

@pengdurice

Description

Is your feature request related to a problem? Please describe.
Training GLM5 on 6 nodes with the Muon optimizer and LoRA runs out of GPU memory (OOM).

Tag the @mcore-oncall to get oncall's attention to this issue.

Describe the solution you'd like
Offloading the Muon optimizer states to CPU memory should help with this issue.
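A minimal sketch of the idea, assuming PyTorch and a simplified Muon update: heavy-ball momentum followed by the quintic Newton-Schulz orthogonalization used in the public Muon reference code, without Nesterov momentum or shape-dependent learning-rate scaling. The names `CPUOffloadedMuon` and `newton_schulz_orthogonalize` are hypothetical and are not the API from #4475 or from Megatron Core. Each momentum buffer lives in CPU memory (pinned when CUDA is available) and is copied to the parameter's device only while that parameter is being updated. Only 2-D parameters are handled, since Muon is normally paired with another optimizer for the rest.

```python
import torch


def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately orthogonalize a 2-D matrix (quintic Newton-Schulz, as in the public Muon code)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g.float()
    transposed = x.size(0) > x.size(1)
    if transposed:  # iterate on the wide orientation so x @ x.T is the smaller Gram matrix
        x = x.T
    x = x / (x.norm() + eps)  # scale so all singular values are <= 1
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * s @ s) @ x
    if transposed:
        x = x.T
    return x.to(g.dtype)


class CPUOffloadedMuon:
    """Hypothetical Muon-style optimizer whose momentum buffers live in CPU memory."""

    def __init__(self, params, lr: float = 0.02, momentum: float = 0.95):
        # Muon only applies to matrices; other parameters would go to e.g. AdamW.
        self.params = [p for p in params if p.ndim == 2]
        self.lr = lr
        self.momentum = momentum
        self.cpu_momentum = []
        for p in self.params:
            buf = torch.zeros(p.shape, dtype=p.dtype, device="cpu")
            if torch.cuda.is_available():
                buf = buf.pin_memory()  # pinned memory allows asynchronous H2D copies
            self.cpu_momentum.append(buf)

    @torch.no_grad()
    def step(self) -> None:
        for p, buf_cpu in zip(self.params, self.cpu_momentum):
            if p.grad is None:
                continue
            # Bring this parameter's momentum to the compute device only for its update.
            buf = buf_cpu.to(p.device, non_blocking=True)
            buf.mul_(self.momentum).add_(p.grad)
            p.add_(newton_schulz_orthogonalize(buf), alpha=-self.lr)
            # Write the updated momentum back to host memory (blocking copy).
            buf_cpu.copy_(buf)
```

The trade-off is one host-to-device and one device-to-host copy per parameter per step. A production version would likely overlap these transfers with computation instead of blocking on the write-back as this sketch does.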

Describe alternatives you've considered
Increasing the number of GPUs can help, but offloading the Muon optimizer states to CPU, as is already supported for Adam, would also be helpful.

Additional context
A tentative PR implementing CPU offload has been opened: #4475. Feel free to review!
