
You can measure batch size either in sequences or in tokens, and you'll see it both ways in the literature. In this case it doesn't matter much which you use, since 2048 is a typical pretraining sequence length. 300k sequences per batch is huge compared to typical batch sizes in the literature, which are closer to 2048 sequences, or about 4M tokens total.
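The conversion between the two conventions is just multiplication; a quick sketch using the figures mentioned above (2048 tokens per sequence, 2048 sequences for a typical batch, 300k sequences for the large one):

```python
# Batch-size arithmetic: sequences vs tokens.
# Values are the ones from the comment above, not from any specific paper.
seq_len = 2048            # typical pretraining sequence length, in tokens
typical_batch_seqs = 2048 # typical batch size, in sequences

tokens_per_batch = typical_batch_seqs * seq_len
print(f"{tokens_per_batch:,} tokens per batch")  # 4,194,304 -> "about 4M"

# The 300k-sequence batch, measured in tokens, for comparison:
big_batch_tokens = 300_000 * seq_len
print(f"{big_batch_tokens:,} tokens per batch")  # 614,400,000 -> roughly 150x larger
```

So the two conventions differ only by a factor of the sequence length, which is why the comparison is unambiguous once the sequence length is fixed.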



