this is a big task, but it's important.
the pytorch ecosystem is "dependency hell" at best, and rarely works well on platforms other than linux, especially for tasks with many deps like peft and bitsandbytes
the llama-cpp ecosystem uses CMake and is easy to get working on linux, windows, mac, and even WASM!
we're using pytorch for fine-tuning only because llama-cpp doesn't support GPU fine-tuning
the same will apply to stable-diffusion too!
this ticket is for
- updating llama-cpp downstream to support the gpu
- updating the worker to ditch pytorch while still supporting the same fine-tune job flow and status updates
- ok for the lora output to be gguf of course, not pth
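
a rough sketch of how the worker side could keep the same job flow without pytorch: launch the fine-tune as a subprocess and forward its output lines as status updates. everything named here (`run_finetune_job`, `on_status`, the state strings) is hypothetical and not taken from the existing worker, and the demo command is a stand-in, not a real llama-cpp invocation.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_finetune_job(
    cmd: Sequence[str],
    on_status: Callable[[str, str], None],
) -> int:
    """Run a fine-tune command and forward its output as status updates.

    `cmd` and `on_status` are hypothetical: the real worker would pass the
    llama-cpp fine-tune binary with its arguments, and a callback that
    reports to whatever status channel the job flow already uses.
    """
    on_status("started", " ".join(cmd))
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so no log line is lost
        text=True,
    )
    assert proc.stdout is not None
    # Stream output line by line so status updates arrive while training runs.
    for line in proc.stdout:
        line = line.rstrip()
        if line:
            on_status("progress", line)
    code = proc.wait()
    on_status("done" if code == 0 else "failed", f"exit code {code}")
    return code


if __name__ == "__main__":
    # Stand-in command that just prints two lines; replace with the real binary.
    demo_cmd = [sys.executable, "-c", "print('step 1'); print('step 2')"]
    run_finetune_job(demo_cmd, lambda state, detail: print(state, detail))
```

the point of the callback is that the job flow and status reporting stay the same whichever trainer runs underneath; only `cmd` and the output format (gguf instead of pth) change.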