Hacker News

There are no benchmarks for the 8B & 14B models, which are the most popular sizes on consumer hardware. Are they hiding something? Has anyone benchmarked them?

And why did they hide generalist benchmarks like MMLU-Pro & TruthfulQA?

I wish we had proper public benchmarks that are kept up to date. LMArena was proven useless by the Llama 4 scandal, and LiveBench is unrealistic and doesn't cover enough models.
