
A lot of people are reporting tokens per second at low context sizes without discussing how slow it gets at larger ones. In some cases they also aren't mentioning time to first token. If you dig around r/LocalLLaMA you can find posts with better benchmarking.
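For illustration, here's a minimal sketch of the kind of benchmark that separates the two numbers: time to first token (dominated by prompt processing, which gets worse as context grows) and steady-state decode speed in tokens per second. stream_tokens is a hypothetical stand-in for whatever streaming API you actually use (llama-cpp-python, Ollama, etc.); the sleeps just fake a model so the harness runs as-is.

    import time

    def stream_tokens(prompt: str, max_new_tokens: int):
        # Hypothetical stand-in for your model's streaming API
        # (llama-cpp-python, Ollama, vLLM, ...). The sleeps simulate a
        # model whose prompt processing scales with context length.
        time.sleep(len(prompt.split()) * 0.0001)  # fake prompt processing
        for _ in range(max_new_tokens):
            time.sleep(0.02)  # fake per-token decode latency
            yield "tok"

    def benchmark(context_tokens: int, max_new_tokens: int = 64) -> None:
        # Crude padding; a real benchmark should count tokens with
        # the model's own tokenizer.
        prompt = "word " * context_tokens

        start = time.perf_counter()
        first = None
        count = 0
        for _ in stream_tokens(prompt, max_new_tokens):
            if first is None:
                first = time.perf_counter()  # time to first token
            count += 1
        end = time.perf_counter()

        ttft = first - start
        # Decode-only speed: exclude prompt processing from the rate.
        tps = (count - 1) / (end - first) if count > 1 else 0.0
        print(f"context={context_tokens:>6}  TTFT={ttft:5.2f}s  decode={tps:5.1f} tok/s")

    # The interesting part is how both metrics degrade as context grows,
    # not the single flattering tok/s figure at a short prompt.
    for n in (512, 2048, 8192, 32768):
        benchmark(n)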


