
It's a different inference engine with different capabilities, and it should be a lot faster on Nvidia cards. I don't have comparable benchmarks for llama.cpp, but if you find some, compare them against these:

https://nvidia.github.io/TensorRT-LLM/performance.html

https://github.com/lapp0/lm-inference-engines/
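If you want a rough llama.cpp-side tokens/sec number to hold against NVIDIA's published figures, something like this sketch using the llama-cpp-python bindings would do (the model path is a placeholder; for a fair comparison you'd also need to match quantization, batch size, and prompt/output lengths):

    import time
    from llama_cpp import Llama  # pip install llama-cpp-python

    # Placeholder GGUF model; offload all layers to GPU where supported.
    llm = Llama(model_path="llama-2-7b.Q4_K_M.gguf", n_gpu_layers=-1)

    prompt = "Explain the difference between TensorRT-LLM and llama.cpp."
    start = time.perf_counter()
    out = llm(prompt, max_tokens=256)
    elapsed = time.perf_counter() - start

    # The completion dict reports token counts in OpenAI-style "usage".
    n_tokens = out["usage"]["completion_tokens"]
    print(f"{n_tokens / elapsed:.1f} output tokens/sec")

That only measures single-request decode throughput; TensorRT-LLM's headline numbers also benefit from batching and in-flight request scheduling, so treat any single-stream comparison as a lower bound on the gap.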



