How’d you deal with continuous training on Google Colab? I’ve noticed there are sometimes I/O errors when loading data from large directories, and runtime disconnects after a few hours that force me to reauthorize Drive access manually.
Always keeping it open in a browser tab is a big one. Working mostly from Drive and not letting Colab's local disk get close to full also helps. Also, avoid overwriting the same files over and over; write to different filenames instead. There are hidden quotas on "downloading/uploading" a single file, and you can hit them.
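A minimal sketch of the "new filename on each write" idea, assuming checkpoints are saved as raw bytes into a folder on mounted Drive. The helper names (save_checkpoint, free_space_gb), the step-numbered naming scheme and the keep_last pruning of old files are hypothetical illustrations, not part of the advice above. The exact Drive quota limits are not known, so this only avoids repeatedly rewriting one file; it does not guarantee staying under any quota.

```python
import os
import shutil
import tempfile


def free_space_gb(path: str = "/") -> float:
    """Return free space in GiB on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / (1024 ** 3)


def save_checkpoint(payload: bytes, directory: str, step: int,
                    prefix: str = "ckpt", keep_last: int = 3) -> str:
    """Write `payload` to a new step-numbered file instead of overwriting one file.

    Hypothetical helper: files beyond the newest `keep_last` are deleted
    (an added assumption) so the folder does not grow without bound.
    """
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"{prefix}_step{step:08d}.bin")
    with open(path, "wb") as f:
        f.write(payload)

    # Zero-padded step numbers sort correctly as plain strings.
    existing = sorted(
        name for name in os.listdir(directory)
        if name.startswith(f"{prefix}_step") and name.endswith(".bin")
    )
    for old in existing[:-keep_last]:
        os.remove(os.path.join(directory, old))
    return path


if __name__ == "__main__":
    # In Colab this would typically be a folder under /content/drive/MyDrive
    # after mounting Drive; a temp dir is used here so the sketch runs anywhere.
    out_dir = tempfile.mkdtemp()
    for step in range(5):
        save_checkpoint(b"weights", out_dir, step)
    print(sorted(os.listdir(out_dir)))
    print(f"{free_space_gb(out_dir):.1f} GiB free locally")
```

Each save goes to a fresh file, so no single file is rewritten many times, and checking free_space_gb on the local disk before large writes is one way to notice when the runtime is getting close to full.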
I still got occasional disconnects, but not often near the end.
They might have also made it a bit more stable at some point, or I might have just gotten better at avoiding the Colab pitfalls; not sure which.