Context
Issues #26 and #34 have tracked MPS support. This issue focuses specifically on the trainer-side auto-detection as a self-contained, beginner-friendly task.
Problem
trainer/finetune.py currently determines the compute device like this (paraphrased):
device = "cuda" if torch.cuda.is_available() else "cpu"
On Apple Silicon Macs, torch.backends.mps.is_available() returns True, but the trainer never checks for it, so training always falls back to CPU — which is 10–30× slower.
Fix
Update the device detection logic to:
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"
Then pass the selected device through to TrainingArguments / Trainer as appropriate.
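As a rough sketch, the detection order above could be factored into a small helper so the priority logic can be unit-tested without a GPU. The names pick_device and detect_device are hypothetical and do not come from trainer/finetune.py. The getattr guard assumes the trainer may run on older PyTorch builds that lack torch.backends.mps.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device string: CUDA first, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which backends are usable and pick one (hypothetical helper)."""
    # Imported lazily so pick_device can be tested without torch installed.
    import torch

    # Assumption: older PyTorch builds may not expose torch.backends.mps,
    # so look it up defensively instead of accessing it directly.
    mps = getattr(torch.backends, "mps", None)
    mps_available = mps is not None and mps.is_available()
    return pick_device(torch.cuda.is_available(), mps_available)
```

In the trainer, the existing one-liner would then become device = detect_device(). How the result reaches TrainingArguments / Trainer depends on the existing code and is not shown here.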
Caveats
bitsandbytes quantization (QLoRA) does not support MPS — fall back to plain LoRA automatically when MPS is detected and the user selected QLoRA, and surface a warning in the UI.
device_map="auto" from accelerate does not always handle MPS correctly; set device_map={"":"mps"} explicitly.
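One possible way to handle both caveats in a single place is sketched below. The function resolve_training_options and the "qlora" / "lora" method strings are hypothetical, warnings.warn stands in for whatever mechanism the UI uses to show warnings, and the sketch assumes the trainer otherwise keeps device_map="auto".

```python
import warnings


def resolve_training_options(device: str, method: str) -> dict:
    """Adjust the fine-tuning method and device_map for the detected device.

    Hypothetical helper: the option names are illustrative only.
    """
    # Assumption: non-MPS devices keep the existing device_map="auto" behaviour.
    options = {"method": method, "device_map": "auto"}

    if device == "mps":
        if method == "qlora":
            # bitsandbytes quantization does not support MPS, so fall back.
            warnings.warn(
                "QLoRA is not supported on MPS; falling back to plain LoRA."
            )
            options["method"] = "lora"
        # Pin the whole model to MPS instead of relying on device_map="auto".
        options["device_map"] = {"": "mps"}

    return options
```

For example, resolve_training_options("mps", "qlora") emits one warning and returns the LoRA method with device_map set to {"": "mps"}, while a CUDA device leaves the user's selection untouched.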
Acceptance criteria
trainer/finetune.py detects MPS and sets the device correctly