Context
Issue #26 tracks MPS support generally. The fine-tuning wizard (added in #33) defaults to device_map="auto" in both trainer/qlora.py and the inference cache in app/api.py. On Apple Silicon, bitsandbytes 4-bit quantization (used by QLoRA) is not supported — attempting to run QLoRA on MPS will either fall back to CPU or error out.
What needs to happen
- Detect device at wizard start: call GET /api/gpu on page mount in FinetuneState and store the backend (cuda / mps / cpu).
- Disable QLoRA on MPS: if backend == "mps", grey out the QLoRA technique button with tooltip: "QLoRA requires CUDA. Switch to LoRA for Apple Silicon." Auto-select LoRA instead.
- Set correct dtype in loader: trainer/loader.py should pass torch_dtype=torch.float16 and skip BitsAndBytesConfig when MPS is detected.
- Inference endpoint: app/api.py::infer hardcodes torch_dtype=torch.float16 and device_map="auto". On MPS this needs .to("mps") instead of device_map="auto" (which is CUDA-only).
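The items above come down to one decision per backend, which could be kept in a single place and shared by the wizard, the loader, and the inference endpoint. Below is a minimal sketch of that idea, not the project's actual code. The names detect_backend, LoadPlan, plan_load, and qlora_button_state are hypothetical. The CPU branch (LoRA, float32, no 4-bit) is an assumption, since the issue only specifies CUDA and MPS behaviour.

```python
from dataclasses import dataclass
from typing import Optional

# Tooltip text taken from the issue description.
QLORA_TOOLTIP = "QLoRA requires CUDA. Switch to LoRA for Apple Silicon."


def detect_backend() -> str:
    """Return "cuda", "mps", or "cpu"; falls back to "cpu" if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is absent on older torch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


@dataclass(frozen=True)
class LoadPlan:
    technique: str             # "qlora" or "lora"
    use_4bit: bool             # whether the loader should build a BitsAndBytesConfig
    dtype_name: str            # torch dtype name, resolved later via getattr(torch, ...)
    device_map: Optional[str]  # value for from_pretrained(device_map=...), or None
    move_to: Optional[str]     # device for model.to(...) after loading, or None


def plan_load(backend: str, requested: str = "qlora") -> LoadPlan:
    """Map a backend and the requested technique to loader settings (hypothetical helper)."""
    if backend == "cuda":
        return LoadPlan(requested, requested == "qlora", "float16", "auto", None)
    if backend == "mps":
        # No bitsandbytes 4-bit on MPS: force LoRA, skip quantization, move explicitly.
        return LoadPlan("lora", False, "float16", None, "mps")
    # Assumption (not stated in the issue): treat CPU like MPS, but in float32.
    return LoadPlan("lora", False, "float32", None, None)


def qlora_button_state(backend: str) -> dict:
    """State the wizard could use to render the QLoRA technique button."""
    disabled = backend == "mps"
    return {"disabled": disabled, "tooltip": QLORA_TOOLTIP if disabled else ""}
```

In trainer/loader.py and app/api.py::infer, the plan would then be turned into from_pretrained arguments: torch_dtype=getattr(torch, plan.dtype_name), device_map=plan.device_map only when it is not None, and a model.to(plan.move_to) call when move_to is set.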
Acceptance criteria