
For int8, we've run very precise ablations with deterministic data loading to show its incremental impact: https://cloud.google.com/blog/products/compute/accurate-quan...

Totally agreed with all your comments! MaxText mainline is currently a reference implementation for users who have their own scientific opinions on model architecture and convergence. We're also hoping to make MaxText compatible with some open-source models for customers who want to use known-good configurations.



