The best system administrator is the one who has learned from their catastrophic fuck up.
To that point, I still have the same job I had before I ran "yum update" without knowing that it attempts in-place kernel upgrades, which resulted in a corrupted Red Hat installation on a server we could not turn off.
There is learning from a catastrophic fuck up, and then there is incompetence. Backups are like Day 1 of SysAdmin 101. I can't quite grasp how so many different backup systems were left unchecked. Every morning I receive messages saying everything is fine, yet I still go into the backup systems to make sure they actually ran, in case there was an issue with the system alerting me.
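Concretely, my morning check amounts to something like the sketch below. This is only a rough illustration: the backup directory, the file naming, and the thresholds are all made up, and your own jobs will produce different artifacts.

```python
#!/usr/bin/env python3
"""Sanity-check that last night's backup actually exists and isn't empty.

The path, naming pattern, and thresholds below are assumptions for
illustration only; adapt them to whatever your backup job really produces.
"""
import sys
import time
from pathlib import Path

BACKUP_DIR = Path("/backups")          # hypothetical backup destination
MAX_AGE_HOURS = 26                     # a daily job should be younger than this
MIN_SIZE_BYTES = 100 * 1024 * 1024     # anything smaller than ~100 MiB is suspicious


def main() -> int:
    # Find the most recently modified backup archive.
    dumps = sorted(BACKUP_DIR.glob("*.tar.gz"),
                   key=lambda p: p.stat().st_mtime,
                   reverse=True)
    if not dumps:
        print("FAIL: no backup files found at all")
        return 1

    newest = dumps[0]
    age_hours = (time.time() - newest.stat().st_mtime) / 3600
    size = newest.stat().st_size

    if age_hours > MAX_AGE_HOURS:
        print(f"FAIL: newest backup {newest.name} is {age_hours:.1f}h old")
        return 1
    if size < MIN_SIZE_BYTES:
        print(f"FAIL: newest backup {newest.name} is only {size} bytes")
        return 1

    print(f"OK: {newest.name}, {age_hours:.1f}h old, {size / 2**20:.0f} MiB")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

The point is that it inspects the artifact itself instead of trusting the "all green" email, and any failure here should page you through a different channel than the one that sent that email. Even then, the only real proof is doing an actual restore from time to time.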
> There is learning from a catastrophic fuck up, and then there is incompetence.
We all start at incompetence, but eventually we — wait for it — learn from our experiences. Would you believe that Caesar, Michael Jordan, and Steve Wozniak were once so incompetent that they couldn't even control their bowels or tie their shoes? They learned.
Is it possible that the guys in the team running GitLab's operations were misplaced? Certainly — that's a management issue. And I can guar-an-tee you that GitLab now has a team of ops guys who viscerally understand the need for good backups: they'd be insane to disperse that team to the winds.
There's no excuse for backups not being set up, period. For such a high-profile site, with the rigorous hiring circus they put candidates through, this doesn't fall under "a learning experience". I wish them luck, but this is just gross negligence.
6/6 failed backup procedures. Looks like they are going to be hiring a new sysadmin/devops person...