Ollama does not come with (or require) node or python. It is written in Go. If you are writing a node or python app, then the official clients being announced here could be useful, but they are not runtimes, and they are not required to use ollama. That fundamental mistake in your message suggests to me that you haven't researched ollama enough. If you're going to criticize something, it's worth researching it more thoroughly first.
> does not expose the full capability of llama.cpp
As far as I’ve been able to tell, Ollama also exposes effectively everything llama.cpp offers. Maybe my use cases with llama.cpp weren’t advanced enough? Please feel free to list what is actually missing. Ollama allows you to deeply customize the parameters of models being served.
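To illustrate what I mean by customization (a rough sketch, not anything Ollama-specific beyond its documented HTTP API): you can pass an `options` object on a plain HTTP request to a local Ollama server, no node or python client involved, and those options map onto the usual llama.cpp-style sampling and context knobs. The model name, prompt, and option values below are just placeholders; it assumes Ollama is running on its default port (11434) and the model has already been pulled.

```python
import json
import urllib.request

# Sketch: call a local Ollama server directly over HTTP and override
# sampling/context parameters per request via "options".
# Assumes the default port (11434) and an already-pulled model named "llama3".
payload = {
    "model": "llama3",
    "prompt": "Why is the sky blue?",
    "stream": False,
    "options": {
        "temperature": 0.2,    # sampling temperature
        "num_ctx": 8192,       # context window size
        "top_p": 0.9,
        "repeat_penalty": 1.1,
    },
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```

The same parameters can also be baked into a Modelfile with PARAMETER directives if you'd rather set them once per model instead of per request.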
I already acknowledged that ollama was not a solution for every situation. For running on your own desktop, it is great. If you’re trying to deploy a multiuser LLM server, you probably want something else. If you’re trying to build a downloadable application, you probably want something else.
How much performance overhead does this runtime add, anyway? Each request to a model spends so much GPU time on actual text generation that the cost of processing the request and response strings, even in a slow, garbage-collected language, seems negligible in both latency and throughput.
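To put rough numbers on that intuition (a crude back-of-envelope sketch with assumed figures, not a benchmark of Ollama itself): round-tripping a prompt-sized JSON payload costs microseconds, while generating a few hundred tokens on a GPU costs seconds, so the string handling disappears in the noise.

```python
import json
import time

# Crude back-of-envelope: JSON handling vs. GPU generation time.
# The payload size, token count, and tokens/s below are assumptions.
payload = {"model": "llama3", "prompt": "x" * 8_000, "options": {"temperature": 0.2}}

iterations = 1_000
start = time.perf_counter()
for _ in range(iterations):
    json.loads(json.dumps(payload))
per_request = (time.perf_counter() - start) / iterations
print(f"JSON round-trip: {per_request * 1e6:.1f} µs per request")

# Assume 500 output tokens at ~40 tokens/s, i.e. ~12.5 s of pure generation.
generation_time = 500 / 40
print(f"string-handling overhead as a fraction of generation: {per_request / generation_time:.2e}")
```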