Could you please enlighten me regarding all these engines? I’m using llama.cpp and Ollama. Should I also try MLX, ONNX, vLLM, etc.? I’m not quite sure what the difference between all of these is. I’m running on CPU and sometimes GPU.
Ollama is a wrapper around llama.cpp, which uses the GGUF format (from the ggml project). ONNX is a different ML model format, and ONNX Runtime is the inference engine for it developed by Microsoft. vLLM is a high-throughput serving engine aimed mainly at datacenter GPUs, so it’s less relevant for CPU or Mac use. MLX is an ML framework from Apple built for Apple Silicon. If you want the fastest speed on macOS, most likely stick with MLX.
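To make the difference concrete, here’s a minimal sketch of what the two Mac-relevant stacks look like from Python. It assumes you’ve installed llama-cpp-python and mlx-lm, and the model path / repo id below are placeholders, not specific recommendations:

```python
# llama.cpp (the engine Ollama wraps): loads a GGUF file, runs on CPU
# with optional GPU offload via n_gpu_layers.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # -1 offloads all layers to the GPU; 0 keeps it CPU-only
)
out = llm("Q: What is GGUF? A:", max_tokens=64)
print(out["choices"][0]["text"])

# MLX (Apple Silicon only): loads weights converted to MLX format,
# typically pulled from the mlx-community org on Hugging Face.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")  # example repo id
print(generate(model, tokenizer, prompt="What is MLX?", max_tokens=64))
```

The practical difference: GGUF is a portable single-file format that runs anywhere llama.cpp does (CPU included), while MLX models are converted weights that only run on Apple Silicon, where they tend to be the fastest option.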