I train for maybe ~12 hours a day, some days, especially around Christmas I didn't. I also lost a lot of days when trying out different stuff or when the weights didn't save to drive before the Colab timed out.
Having said that, I was training the full model with an accumulated batch size for a while so it was taking > 10min per step. I've also been using pretty low learning rates for most of the latter stages.
Overall the model is currently at ~11k steps and the loss can actually go down further but after playing with different checkpoints last week, the best one didnt seem to be the newest one so I left it at that one.
Having said that, I was training the full model with an accumulated batch size for a while so it was taking > 10min per step. I've also been using pretty low learning rates for most of the latter stages.
Overall the model is currently at ~11k steps and the loss can actually go down further but after playing with different checkpoints last week, the best one didnt seem to be the newest one so I left it at that one.