
Pretty much, yep. There was definitely a more significant jump in the middle, where 7B models went from being a complete waste of time to actually useful. Then going from crafting a sensible response to 80% of questions to 90% is a much smaller apparent increase, but it takes a lot more compute to achieve, as per the Pareto principle.


I see giant models as being like Intel chips over the last decade: big, powerful, expensive, and energy hogs.

Small models are like ARM: you get much of the performance you actually need for common consumer tasks, and they're very cheap to run and energy efficient.

We need both, but I personally spend most of my ML time training small models, and I'm very happy with the results.





