
Interesting. I had assumed that one of the innovations of DeepSeek and the modern GPT models was doing the pretraining itself in low precision, rather than only using low precision for later fine-tuning. I didn't realize you still need to accumulate at a higher precision anyway.
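Here is a minimal sketch of why the accumulator's precision matters even when the inputs are stored in low precision. It makes a few assumptions. IEEE fp16, produced by Python's `struct` `"e"` format, stands in for low-precision formats such as FP8. A plain Python float (fp64) stands in for the wider accumulator that hardware typically uses. `to_fp16` and `dot` are hypothetical helpers, not any framework's API, and this does not reproduce DeepSeek's actual training scheme.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # struct's 'e' format packs binary16, rounding ties to even.
    return struct.unpack("<e", struct.pack("<e", x))[0]

def dot(a, b, accumulate_in_fp16: bool) -> float:
    """Dot product of fp16 inputs, accumulating in fp16 or in a wider float."""
    acc = 0.0
    for x, y in zip(a, b):
        # Inputs and products are kept in low precision in both modes.
        prod = to_fp16(to_fp16(x) * to_fp16(y))
        acc = acc + prod
        if accumulate_in_fp16:
            # Round the running sum back to fp16 after every addition.
            acc = to_fp16(acc)
    return acc

n = 10_000
a = [1.0] * n
b = [1.0] * n
print(dot(a, b, accumulate_in_fp16=True))   # 2048.0
print(dot(a, b, accumulate_in_fp16=False))  # 10000.0
```

The fp16 sum stalls at 2048. At that magnitude the spacing between fp16 values is 2, so adding 1.0 lands exactly halfway and ties-to-even rounds it back down. The same inputs summed in a wider accumulator come out exact. With long reductions like the ones in large matrix multiplies, this is why low-precision inputs are usually paired with higher-precision accumulation.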

