
It activates only 37B parameters per token, but you don't know ahead of time which experts the router will pick, so you gotta store all 671B in (V)RAM.
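A toy sketch of why that is, assuming a plain top-k router (the expert count, k, and the scores below are made up for illustration and are not DeepSeek's actual config). Each token gets its own router scores, so different tokens land on different experts. The memory you need scales with total parameters, not active ones:

```python
import random

# Hypothetical toy sizes, chosen only for illustration.
NUM_EXPERTS = 8
TOP_K = 2

def route(token_scores, k=TOP_K):
    """Return indices of the k highest-scoring experts for one token."""
    return sorted(range(len(token_scores)), key=lambda i: token_scores[i], reverse=True)[:k]

def weight_memory_gb(params_billion, bytes_per_param):
    """Weight memory in GB: 1B params at 1 byte/param is ~1 GB."""
    return params_billion * bytes_per_param

rng = random.Random(0)
# Each token has its own router scores, so the selected experts vary token to token.
tokens = [[rng.random() for _ in range(NUM_EXPERTS)] for _ in range(5)]
experts_touched = set()
for scores in tokens:
    experts_touched.update(route(scores))
print("experts touched by 5 tokens:", sorted(experts_touched))

# Every expert must be resident even though only a subset runs per token.
print(weight_memory_gb(671, 1))  # total weights at 1 byte/param (e.g. FP8): 671
print(weight_memory_gb(37, 1))   # what a single token actually uses: 37
```

Even a handful of tokens tends to touch many different experts, which is why you can't just load the "active" 37B.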



But you don't need cluster networking or NVLink as much as you would when splitting up Llama 405B. You could even split the model across friends' machines over internet-level bandwidth.
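Rough back-of-envelope for the internet-bandwidth part, assuming a pipeline-style split where only each token's hidden-state vector crosses between machines (the hidden size, activation width, machine count and tokens/s are all hypothetical; this ignores prompt prefill, batching and protocol overhead):

```python
def bytes_per_token(hidden_size, bytes_per_value, boundaries):
    """Activation bytes sent over the network for one generated token.

    Assumes only the hidden-state vector crosses each machine boundary.
    """
    return hidden_size * bytes_per_value * boundaries

def kb_per_second(hidden_size, bytes_per_value, boundaries, tokens_per_second):
    """Approximate one-way link traffic in kilobytes per second."""
    return bytes_per_token(hidden_size, bytes_per_value, boundaries) * tokens_per_second / 1000

# Hypothetical: hidden size 7168, 2-byte activations, 4 machines (3 boundaries), 10 tok/s.
print(bytes_per_token(7168, 2, 3))    # 43008
print(kb_per_second(7168, 2, 3, 10))  # 430.08
```

Hundreds of KB/s per stream is home-connection territory, though latency per hop would still hurt tokens/s.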





