I trained for maybe ~12 hours a day on some days; on others, especially around Christmas, I didn't train at all. I also lost a lot of days to trying out different things, or to the weights not saving to Drive before the Colab session timed out.

Having said that, I was training the full model with an accumulated batch size for a while, so it was taking more than 10 minutes per step. I've also been using pretty low learning rates for most of the later stages.
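
For anyone unfamiliar with the term, an "accumulated batch size" usually refers to gradient accumulation: gradients from several small micro-batches are summed before a single weight update, which is why one step can take so long. Here is a minimal sketch of the idea, assuming a toy one-parameter model in plain Python; the names `grad` and `accumulated_step` and the data are hypothetical and are not the actual training code.

```python
# Hypothetical toy setup: fit y = w * x with squared error, in plain Python.

def grad(w, batch):
    """Mean gradient of (w*x - y)^2 with respect to w over (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_step(w, micro_batches, lr):
    """One weight update built from several equally sized micro-batches."""
    total = 0.0
    for mb in micro_batches:
        # Scale each micro-batch gradient so the sum is the mean over all of them.
        total += grad(w, mb) / len(micro_batches)
    return w - lr * total

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
micro_batches = [data[:2], data[2:]]

# A single step on the full batch and an accumulated step over two
# micro-batches should land on the same weight.
w_full = 0.0 - 0.01 * grad(0.0, data)
w_accum = accumulated_step(0.0, micro_batches, lr=0.01)
```

The trade-off is memory for time: each micro-batch fits on the device, but one update costs as many forward and backward passes as there are micro-batches.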

Overall the model is currently at ~11k steps. The loss can actually go down further, but after playing with different checkpoints last week, the best one didn't seem to be the newest one, so I left it at that one.
