skybrian
18 days ago
on: I have written gemma3 inference in pure C
Knowing the performance is interesting. Apparently it's 1-3 tokens/second.
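A minimal sketch of how a tokens/second figure like that could be measured around a pure-C decode loop; generate_next_token() below is a hypothetical stub standing in for the project's actual decode step, not its real API.

    /* Minimal sketch, not the project's actual code: time a fixed
     * number of decode steps and report tokens/second.
     * generate_next_token() is a stub simulating the real work.
     * Build: cc -O2 tps.c -o tps */
    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <time.h>

    static int generate_next_token(void) {
        /* stand-in for the real decode step: ~300 ms per token */
        struct timespec ts = { 0, 300 * 1000 * 1000 };
        nanosleep(&ts, NULL);
        return 0;
    }

    int main(void) {
        const int n_tokens = 16;
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < n_tokens; i++)
            generate_next_token();
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%d tokens in %.2f s -> %.2f tokens/s\n",
               n_tokens, secs, n_tokens / secs);
        return 0;
    }

With the 300 ms stub this prints roughly 3.3 tokens/s, i.e. the same ballpark as the number quoted above.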
kgeist
18 days ago
ik_llama.cpp is a fork of llama.cpp that specializes in CPU inference; some benchmarks from a year ago:
https://github.com/ikawrakow/ik_llama.cpp/discussions/164