
The performance numbers would be interesting to know. Apparently it's 1-3 tokens/second.


ik_llama.cpp is a fork of llama.cpp that specializes in CPU inference; some benchmarks from a year ago: https://github.com/ikawrakow/ik_llama.cpp/discussions/164
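For anyone wanting to reproduce numbers like those: a minimal sketch of building the fork and running its bundled benchmark tool, assuming the standard llama.cpp CMake flow and a GGUF model you supply yourself (the model path and thread count below are placeholders, not from the linked discussion):

```shell
# Clone the fork and build it (CPU-only build, no GPU backends enabled)
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
cmake -B build
cmake --build build --config Release

# Run the benchmark tool against your own GGUF model,
# pinning the thread count to your physical core count
./build/bin/llama-bench -m /path/to/model.gguf -t 8
```

llama-bench reports prompt-processing and token-generation throughput separately, which is worth keeping in mind when comparing against single "tokens/second" figures quoted in threads like this one.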



