Could you please publish the source code on GitHub so it is easy to patch and recompile from source? Or upstream the Ampere-specific optimizations to the llama.cpp project? Currently, the optimized build is useless for many new models because it is based on an outdated version of llama.cpp and is missing later bug fixes.
Compare how easy it is to get stock llama.cpp running fast on AWS Graviton chips: https://github.com/aws/aws-graviton-getting-started/blob/main/machinelearning/llama.cpp.md