Ollama is just a llama.cpp wrapper that adds more complexity than benefits:
- Ollama doesn't have a web UI, llama.cpp does (so you don't need separate things like AnythingLLM)
- Inference is faster with llama.cpp
- You don't need the Modelfile bullshit to load GGUFs, just load them directly
Upd: you may also want to look at the llamafile project, just one binary that runs everywhere