laidoffamazon | 19 days ago | on: Writing Speed-of-Light Flash Attention for 5090 in...
Interesting. My assumption was that one of the innovations of DeepSeek and the modern GPT models was performing low-precision pretraining rather than just low-precision fine-tuning afterwards. I didn't realize you still need accumulation at a higher precision anyway.
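For concreteness, here is a minimal CUDA sketch (not from the thread, purely illustrative) of what "accumulate at a higher precision" means in practice: the operands are bf16, but each partial product is summed into an fp32 accumulator, which mirrors what tensor-core MMA instructions do in hardware. The kernel name and the single-warp launch are assumptions for brevity.

    // Dot product with bf16 inputs and an fp32 accumulator.
    // Assumes a launch with exactly one warp (32 threads) per block.
    #include <cuda_bf16.h>

    __global__ void dot_bf16_fp32_accum(const __nv_bfloat16* a,
                                        const __nv_bfloat16* b,
                                        float* out, int n) {
        // fp32 accumulator: bf16 has only 8 mantissa bits, so summing
        // thousands of terms in bf16 would drop most low-order
        // contributions to the running total.
        float acc = 0.0f;
        for (int i = threadIdx.x; i < n; i += blockDim.x) {
            // Widen each bf16 operand to fp32 before the multiply-add.
            acc += __bfloat162float(a[i]) * __bfloat162float(b[i]);
        }
        // Warp-level reduction of the per-thread partial sums.
        for (int offset = 16; offset > 0; offset >>= 1)
            acc += __shfl_down_sync(0xffffffff, acc, offset);
        if (threadIdx.x == 0) *out = acc;
    }

The same pattern holds at the training-framework level: FP8/bf16 storage and matmul inputs keep memory traffic and compute cheap, while the reduction dimension is still accumulated in fp32 to keep the result numerically stable.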