9596; fix M5 performance; check for llama#107
I also took the liberty of increasing the likelihood of getting new builds, as it seems the bot hasn't opened any PRs recently.
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( I do have some suggestions for making it better though... For recipe/recipe.yaml:
This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/27368233897. Examine the logs at this URL for more detail.
Hi! This is the friendly conda-forge automerge bot! I considered the following status checks when analyzing this PR:
Thus the PR was passing and merged! Have a great day!
We need the newer macOS SDK to support TensorOps on M5 chips; it makes a large difference in performance on my machine. With the older SDK, I was getting the following warning:
If we build with the newer SDK, the warning disappears and performance improves. For example, with

`llama-server -hf unsloth/Qwen3.6-27B-MTP-GGUF:Q4_K_M --spec-type draft-mtp --spec-draft-n-max 2`

throughput goes from 220 t/s (prefill) and 24 t/s (decode) to 550 t/s (prefill) and 27 t/s (decode).
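To put the reported numbers in relative terms, here is a minimal sketch that recomputes the speedup factors from the throughput figures quoted in the comment. The dictionary names and the `speedup` helper are hypothetical, introduced only for illustration; no new measurements are involved.

```python
# Throughput reported in the comment, in tokens per second (t/s).
before = {"prefill": 220.0, "decode": 24.0}  # built with the older SDK
after = {"prefill": 550.0, "decode": 27.0}   # built with the newer SDK


def speedup(old: float, new: float) -> float:
    """Return new/old, i.e. how many times faster the new build is."""
    return new / old


# Compute the speedup factor for each phase.
speedups = {phase: speedup(before[phase], after[phase]) for phase in before}

for phase, factor in speedups.items():
    # Print the factor and the equivalent percentage improvement.
    print(f"{phase}: {factor:.3f}x ({(factor - 1) * 100:.1f}% faster)")
```

This works out to a 2.5x speedup for prefill and a more modest 1.125x (12.5%) for decode.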