feat: wire MPS (Apple Silicon) device into fine-tuning wizard #34

@SahilKumar75

Context

Issue #26 tracks MPS support generally. The fine-tuning wizard (added in #33) defaults to device_map="auto" in both trainer/qlora.py and the inference cache in app/api.py. On Apple Silicon, bitsandbytes 4-bit quantization (used by QLoRA) is not supported — attempting to run QLoRA on MPS will either fall back to CPU or error out.
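For reference, the three backends this issue distinguishes (cuda / mps / cpu) could be told apart roughly as follows. This is only a sketch: `detect_backend` is a hypothetical helper name, and whether GET /api/gpu already does something equivalent is not confirmed here. It relies on `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and falls back to "cpu" when torch is not installed.

```python
import importlib.util


def detect_backend() -> str:
    """Return "cuda", "mps", or "cpu" for the current machine.

    Hypothetical helper; the name and the CPU fallback are assumptions.
    """
    # Without torch there is nothing to accelerate, so report CPU.
    if importlib.util.find_spec("torch") is None:
        return "cpu"

    import torch

    if torch.cuda.is_available():
        return "cuda"

    # torch.backends.mps is missing on older PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"
```

CUDA is checked first so that a machine with an NVIDIA GPU is never classified as MPS.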

What needs to happen

  1. Detect device at wizard start — call GET /api/gpu on page mount in FinetuneState and store the backend (cuda / mps / cpu).

  2. Disable QLoRA on MPS — if backend == "mps", grey out the QLoRA technique button with tooltip: "QLoRA requires CUDA. Switch to LoRA for Apple Silicon." Auto-select LoRA instead.

  3. Set correct dtype in loader: trainer/loader.py should pass torch_dtype=torch.float16 and skip BitsAndBytesConfig when MPS is detected.

  4. Inference endpoint: app/api.py::infer hardcodes torch_dtype=torch.float16 and device_map="auto". On MPS this needs .to("mps") instead of device_map="auto" (which is CUDA-only).
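The four steps above boil down to one decision table keyed on the detected backend. A minimal sketch of that table follows, kept free of torch imports so it can be unit-tested anywhere. `LoadPlan`, `plan_load`, and `qlora_disabled_reason` are hypothetical names, not existing code in this repo. The CUDA and CPU rows are assumptions, since the issue only specifies the MPS behaviour.

```python
from dataclasses import dataclass
from typing import Optional

# Tooltip text taken verbatim from step 2.
QLORA_TOOLTIP = "QLoRA requires CUDA. Switch to LoRA for Apple Silicon."


@dataclass(frozen=True)
class LoadPlan:
    technique: str             # "lora" or "qlora"
    dtype_name: str            # resolve later with getattr(torch, dtype_name)
    use_bnb_4bit: bool         # build a BitsAndBytesConfig only when True
    device_map: Optional[str]  # pass to from_pretrained only when not None
    move_to: Optional[str]     # call model.to(move_to) only when not None


def qlora_disabled_reason(backend: str) -> Optional[str]:
    """Return the tooltip if the QLoRA button should be greyed out, else None."""
    return QLORA_TOOLTIP if backend == "mps" else None


def plan_load(backend: str, requested: str) -> LoadPlan:
    """Map (backend, requested technique) to loader settings."""
    if backend == "mps":
        # Steps 2-4: force LoRA, float16, no quantization, explicit .to("mps").
        return LoadPlan("lora", "float16", False, None, "mps")
    if backend == "cuda":
        # Assumption: keep the current behaviour (float16, device_map="auto").
        return LoadPlan(requested, "float16", requested == "qlora", "auto", None)
    # Assumption: CPU falls back to unquantized LoRA in float32.
    return LoadPlan("lora", "float32", False, None, None)
```

The loader and the inference endpoint could both consume the same plan: pass `torch_dtype=getattr(torch, plan.dtype_name)`, add `device_map` and a quantization config only when the plan asks for them, and call `model.to(plan.move_to)` afterwards when it is set. Keeping the rule in one place avoids the wizard and the backend disagreeing about what MPS supports.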

Acceptance criteria

  • Wizard detects MPS and auto-selects LoRA on Apple Silicon machines
  • QLoRA button is visually disabled with an explanatory tooltip on MPS
  • Training runs without error on M1/M2/M3 Mac using LoRA (no quantization)
  • Test chat inference works on MPS after training

Labels: enhancement
