
In a few years, the guy who did the `rm -rf` is going to be in a job interview and someone will recall bits of this report. Enough bits to remember the guy, not enough bits to remember that it wasn't his (individual) fault.

Transparency doesn't mean publicly throwing people under the bus.

I'm not a GitLab customer, so I'm relaxed. :)




Honestly, if I were interviewing the guy, that would almost be a bonus! Like, everyone makes mistakes, we're all human, but I can guarantee you that THAT person will never make that particular mistake ever again. And he's going to be 10 times more diligent than the average engineer in making sure there are good backup/restore procedures.


There's a probably apocryphal story like this about a guy forgetting to refuel a plane. The pilot made sure that guy was solely responsible for refuelling his plane in future, because he knew he'd never forget again.


Exactly.

I run backups on my computer before installing new software/fiddling with important settings/etc. because I've fucked up before.

I'll run backups of phones (or at least verify that they're present) before trying to fix issues on them, ever since nuking my mom's phone, which resulted in her losing pictures of my niece and nephew. (Luckily she had sent a lot of those pictures to us via e-mail, but still.)

We learn and adjust.
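
For what it's worth, it doesn't have to be anything fancy; a dated snapshot of whatever I'm about to touch is usually enough. A rough sketch of what I mean (the paths and backup mount here are just examples):

    # snapshot the stuff I'm about to fiddle with (paths are examples, not a prescription)
    STAMP=$(date +%Y%m%d-%H%M%S)
    DEST="/mnt/backup/pre-change-$STAMP"
    mkdir -p "$DEST"
    rsync -a ~/.config "$DEST/"
    # verify the copy actually landed before touching anything
    diff -rq ~/.config "$DEST/.config" && echo "snapshot ok: $DEST"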


I think that might have been in Chuck Yeager's (auto-?) biography. (Great read, BTW.)


I remember reading the story in "How to Win Friends and Influence People". The plane was mistakenly filled with jet fuel instead of gasoline and had to make an emergency landing, but the pilot was able to save everyone. When he got back, he told the mechanic that he wanted him to fuel his plane the next day, because he knew he'd never make the same mistake again.

If the pilot can forgive someone for a mistake that almost cost lives, I'm sure any good interviewer can forgive him for a mistake that cost data and will probably never be repeated.


I've heard this anecdote before and it never sat well with me. Forgetting to fuel a plane as a plane mechanic exposes a serious character flaw that could lead to something devastating if allowed to continue (perhaps next time he forgets to oil the engine? Grease the brakes?). Sensationalizing this story could actually do a lot of harm. The plane mechanic should have been fired for failing such an important task. If he showed incredible remorse and was responsible enough to own up to his mistakes, he should still have been stripped of all his other responsibilities and only allowed to fuel planes until he had proven himself enough to take on more responsibilities again.


When people are afraid of losing their jobs if they make an error, you can be pretty sure they will do everything in their power to hide the fact that they made an error, which is the exact opposite of the behavior you want. To allow process improvements, it must be absolutely clear that errors will not be punished but used to help everyone learn.


The JAL 2 mishap is legendary in the aviation world. Learning from mistakes is a big part of aviation safety.

https://en.wikipedia.org/wiki/Japan_Airlines_Flight_2#The_.2...

The captain basically got up before the NTSB and, when asked what happened, responded "I F__ked Up!" instead of trying to deflect blame onto an unforeseen system glitch or some other excuse. It's since been known as the "Asoh Defense".

They also have the NASA ASRS for reporting near misses and incidents without fear of FAA enforcement.

https://en.wikipedia.org/wiki/Aviation_Safety_Reporting_Syst...


It must be coupled with processes that guard against errors though. Defense in depth. I'd imagine the pilot has a tick sheet to go over before takeoff and fuel is an item on that sheet.


You imagine right. It's on every checklist: check fuel quantity and type.


I think you highly underestimate the number of mistakes like this on the flight line.

By an order of magnitude, it sounds like, from your comment. Even if you get 99.99% reliability (good luck with humans involved), think of the number of flight movements per day multiplied by the number of tasks that must be completed.

This is why there are redundant checks and checklists and systems in place. To catch human errors, as absolutely everyone in the business will eventually make a trivial yet critical mistake.

Demanding individual human perfection is great, but you'll find you end up with no workforce.


The aviation industry recognizes and accepts that people make mistakes, and that this is a simple fact of being human. Firing that mechanic without fixing the process wouldn't have done any good in the long run. Someone else would just make the same mistake. Maybe not the following week, maybe not the following month, but eventually, it would've happened again. The right answer is to fix the process.


Agreed. Case in point: the recent death of nearly the entire Chapecoense football team:

> According to the preliminary report, several decisions of the flight crew were incompatible with aviation regulations and rendered the flight unsafe. Insufficient flight planning (disregarding necessary fuel stops) and not declaring an emergency when the fuel neared exhaustion caused the crash.

https://en.wikipedia.org/wiki/LaMia_Flight_2933


That guy is going to be interviewing at some company with someone who's obsessive enough about outage reports to remember a then-obscure one years later, but enough of an idiot to not understand that people aren't personally to blame for this sort of stuff?

Sounds like even in that very contrived scenario the guy involved would dodge a bullet in not being hired by a bunch of idiots.


"I worked at GitLab."

Googles name + GitLab, finds postmortem

Highly likely, and now you don't get to tell your own story and emphasize what you want to.


Awesome postmortem - is there anything you would do differently today?

What's your most valuable lesson from that incident?

You're hired!


Also, maybe some people on here are perfect, but if you've used Unix for more than half your life (as I have), you've 'rm -rf'-ed some stuff.

I think people who've been through disasters have a much better understanding of the importance and methods of not ending up there than those with a perfectly clean record.

IOW, I'd hire the "rm -rf" guy first if he owns it.


I'd rather hire someone who learned what not to do than someone who hasn't yet.


Years ago I worked for a University. We lost power in our data centre. No big deal, right? Stuff comes back up, you realize which service dependencies you missed, set them to run at startup, change some VM dependency startup order, and you're good.

One of the SAN arrays didn't come up, and then started rebuilding itself. Our storage was one of those multi-million dollar contracts from IBM. They flew a guy out to the University and after a lot of work, they said the array was lost and unrecoverable.

Backups for some production VMs were on virtual tape... on the same shelves as production. O_o

At least a lot of our clusters were split between racks, so in many cases we could just clone another one. We learned that MS BizSpark, in a cluster, only puts the private key on half the machines. We had to recreate a bunch of BizSpark jobs based on what we could still see in the database and our old notes and password vaults. We had been planning on upgrading to a newer version of BizSpark on Server 2012 (it was on 2003), so this kinda forced us to. Shortly afterwards we learned how to write PowerShell scripts to back up those jobs and test the backups by redeploying them to lower environments.
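
The general idea (sketched here in shell rather than the actual PowerShell, with made-up script names) was just: export the job definitions, then prove the export by redeploying it to a throwaway environment.

    # all names here are hypothetical; the point is "back up, then test the backup"
    STAMP=$(date +%Y%m%d)
    EXPORT_DIR="/backups/jobs/$STAMP"
    mkdir -p "$EXPORT_DIR"
    ./export-jobs.sh --out "$EXPORT_DIR"            # dump job definitions to disk
    ./deploy-jobs.sh --env staging "$EXPORT_DIR"    # redeploy them into a lower environment
    ./smoke-test.sh --env staging \
      || echo "backup $STAMP does not restore cleanly"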

The sysadmin in charge of the backups was looking for a new job. You can't really fire people from universities easily, because it's very difficult to find IT staff who will take university wages. Word was out, though: if he didn't find new work, he was going to be let go. Not laid off, not made redundant, not have his position removed. He would be fired.


When we interview people one of the questions we like to ask is "What's the biggest thing you've accidentally deleted?"

When people answer that question honestly and with humility it is a big plus.


Might be a plus at your organization, and at another it might be devised as a trap. It's like the 'biggest weakness' question.

Oh, we can't hire someone who has made a mistake THAT big.


You don't want to work for a company that has that attitude anyway, honestly. That shows they have a poor attitude towards problems and probably will overreact to things like missing deadlines or pursuing a solution that ends up not working, etc.


You'll never know if it was a great company with a bad interviewer. It's better to use all the advantages you can to get through an interview and get the job to see for yourself. I don't think you can learn anything definitive from most interviews - they're mostly subjective, unscientific voodoo.


Knowing exactly how a potential employee handles an error he might have caused? This guy is going to be fighting off job offers, if he isn't already.


Good! I'd like to talk about what the engineer learned from the experience. Certainly if trawling through someone's public repos and records turns up a pattern of repeated mistakes, that should be considered - but the mistakes we all make from time to time are chances to learn.

So what I'd be interested in seeing is if the candidate did learn. The mistake is less important than the candidate demonstrating they moved past it as a stronger developer.

On the flip side - given a choice in situation, I'd prefer not to work for a place that dredges up my old bugs and uses them in isolation as a basis for their decision. That suggests the kind of environment I wouldn't enjoy being in.



