
Probably because the benchmark gains from larger models are, at this time, negligible. Scaling transformers up and iterating on attention might hit a dead end for more capable models beyond 2T parameters. But I'm not sure.


