
The performance numbers would be interesting to know. Apparently it's 1-3 tokens/second.


ik_llama.cpp is a fork of llama.cpp that specializes in CPU inference; some benchmarks from a year ago: https://github.com/ikawrakow/ik_llama.cpp/discussions/164
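For anyone wanting to reproduce numbers like those: a minimal sketch of building the fork and running its bundled benchmark tool, assuming the standard llama.cpp CMake flow and a GGUF model you supply yourself (the model path and thread count below are placeholders, not from the linked discussion):

```shell
# Clone the fork and build it (CPU-only build, no GPU backends enabled)
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
cmake -B build
cmake --build build --config Release

# Run the benchmark tool against your own GGUF model,
# pinning the thread count to your physical core count
./build/bin/llama-bench -m /path/to/model.gguf -t 8
```

llama-bench reports prompt-processing and token-generation throughput separately, which is worth keeping in mind when comparing against single "tokens/second" figures quoted in threads like this one.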



