The best system administrator is the one who has learned from their catastrophic fuck up.
To that point, I still have the same job I had before I ran "yum update" without knowing that it attempts in-place kernel upgrades, which resulted in a corrupted Red Hat installation on a server we could not turn off.
There is learning from a catastrophic fuck up, and then there is incompetence. Backups are like Day 1 of SysAdmin 101. I can't quite grasp how so many different backup systems were left unchecked. Every morning I receive messages saying everything is fine, yet I still go into the backup systems to make sure they actually ran, in case there was an issue with the system alerting me.
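Concretely, my morning check amounts to something like the sketch below. This is only a rough illustration: the backup directory, the file naming, and the thresholds are all made up, and your own jobs will produce different artifacts.

```python
#!/usr/bin/env python3
"""Sanity-check that last night's backup actually exists and isn't empty.

The path, naming pattern, and thresholds below are assumptions for
illustration only; adapt them to whatever your backup job really produces.
"""
import sys
import time
from pathlib import Path

BACKUP_DIR = Path("/backups")          # hypothetical backup destination
MAX_AGE_HOURS = 26                     # a daily job should be younger than this
MIN_SIZE_BYTES = 100 * 1024 * 1024     # anything smaller than ~100 MiB is suspicious


def main() -> int:
    # Find the most recently modified backup archive.
    dumps = sorted(BACKUP_DIR.glob("*.tar.gz"),
                   key=lambda p: p.stat().st_mtime,
                   reverse=True)
    if not dumps:
        print("FAIL: no backup files found at all")
        return 1

    newest = dumps[0]
    age_hours = (time.time() - newest.stat().st_mtime) / 3600
    size = newest.stat().st_size

    if age_hours > MAX_AGE_HOURS:
        print(f"FAIL: newest backup {newest.name} is {age_hours:.1f}h old")
        return 1
    if size < MIN_SIZE_BYTES:
        print(f"FAIL: newest backup {newest.name} is only {size} bytes")
        return 1

    print(f"OK: {newest.name}, {age_hours:.1f}h old, {size / 2**20:.0f} MiB")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

The point is that it inspects the artifact itself instead of trusting the "all green" email, and any failure here should page you through a different channel than the one that sent that email. Even then, the only real proof is doing an actual restore from time to time.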
> There is learning from a catastrophic fuck up, and then there is incompetence.
We all start at incompetence, but eventually we — wait for it — learn from our experiences. Would you believe that Caesar, Michael Jordan, and Steve Wozniak were once so incompetent that they couldn't even control their bowels or tie their shoes? They learned.
Is it possible that the guys in the team running GitLab's operations were misplaced? Certainly — that's a management issue. And I can guar-an-tee you that GitLab now has a team of ops guys who viscerally understand the need for good backups: they'd be insane to disperse that team to the winds.
There's no excuse for backups not being set up, period. For such a high-profile site, with the rigorous hiring circus they put candidates through, this doesn't fall under "a learning experience". I wish them luck, but this is just gross negligence.
6/6 failed backup procedures. Looks like they are going to be hiring a new sysadmin/devops person...