Automated build pipeline for running quantised LLMs on a home NAS, started after ipex-llm stopped receiving active updates. Attempted GPU-accelerated inference via SYCL on an Intel Pentium Gold 8505 (48-EU UHD iGPU) as an alternative to Vulkan.
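As a rough sketch of what the SYCL build step in such a pipeline can look like (the oneAPI install path and source directory are assumptions; `GGML_SYCL=ON` with the icx/icpx compilers is the flag combination documented for llama.cpp's SYCL backend):

```python
import subprocess

# Assumed paths; adjust for the actual NAS environment.
ONEAPI_ENV = "/opt/intel/oneapi/setvars.sh"  # typical oneAPI install location
SRC_DIR = "llama.cpp"

def build_sycl(src_dir: str = SRC_DIR) -> None:
    """Configure and build llama.cpp with the SYCL backend.

    GGML_SYCL=ON and the icx/icpx compilers follow llama.cpp's SYCL
    build documentation; each step runs in a bash shell that has
    sourced the oneAPI environment so the compilers resolve.
    """
    cmds = [
        "cmake -B build -DGGML_SYCL=ON "
        "-DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx",
        "cmake --build build --config Release -j",
    ]
    for cmd in cmds:
        subprocess.run(
            f"source {ONEAPI_ENV} && {cmd}",
            shell=True, cwd=src_dir, check=True, executable="/bin/bash",
        )

if __name__ == "__main__":
    build_sycl()
```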
After benchmarking, CPU inference via Ollama outperformed the SYCL, Vulkan, and IPEX-LLM paths on this hardware: the iGPU's shared memory bandwidth and low EU count couldn't overcome kernel dispatch overhead, while llama.cpp's AVX2 CPU path was already well optimised for this workload.
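A comparison like this can be driven with llama.cpp's llama-bench tool, which the sketch below wraps; `-ngl 0` keeps every layer on the CPU while `-ngl 99` offloads fully to the iGPU. Since Ollama embeds llama.cpp, the CPU build stands in here for the Ollama path. The binary and model paths are placeholders, and the JSON field names (`avg_ts`, `n_prompt`, `n_gen`) match recent llama-bench builds but should be verified against your version:

```python
import json
import subprocess

MODEL = "models/q4_model.gguf"  # placeholder model path

# Assumed layout: one llama-bench binary per backend build directory.
BACKENDS = {
    "cpu-avx2": ("build-cpu/bin/llama-bench", 0),     # -ngl 0: CPU only
    "sycl":     ("build-sycl/bin/llama-bench", 99),   # -ngl 99: full offload
    "vulkan":   ("build-vulkan/bin/llama-bench", 99),
}

def bench(binary: str, ngl: int) -> list[dict]:
    """Run llama-bench once and return its parsed JSON results.

    -p 512 measures prompt processing, -n 128 measures token
    generation; -o json selects machine-readable output.
    """
    out = subprocess.run(
        [binary, "-m", MODEL, "-p", "512", "-n", "128",
         "-ngl", str(ngl), "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)

if __name__ == "__main__":
    for name, (binary, ngl) in BACKENDS.items():
        for run in bench(binary, ngl):
            print(f"{name:10s} prompt={run['n_prompt']:4d} "
                  f"gen={run['n_gen']:4d} {run['avg_ts']:7.1f} t/s")
```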
Also includes an LLMster Docker image builder, which produces a headless LM Studio server container for serving local inference endpoints on a NAS with no display environment.
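Once the LLMster container is running, the endpoint can be smoke-tested with a plain OpenAI-style chat completion call; LM Studio's local server exposes an OpenAI-compatible API (port 1234 is its default). The host and model name below are placeholders:

```python
import json
import urllib.request

# Placeholder host; LM Studio's local server listens on port 1234 by default.
ENDPOINT = "http://nas.local:1234/v1/chat/completions"

def chat(prompt: str, model: str = "local-model") -> str:
    """Send one chat completion request to the headless server."""
    body = json.dumps({
        "model": model,  # model identifier as loaded in the container
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }).encode("utf-8")
    req = urllib.request.Request(
        ENDPOINT, data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Say hello from the NAS."))
```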
Key learnings: GPU backend selection for LLM inference (SYCL vs Vulkan vs CPU), llama.cpp inference stack internals, containerised headless ML serving.