
Which makes me wonder... did they just pull a Murphy's, or are the services so unstable that they go down if no one's watching over them? Maybe the services already go down multiple times a day, but the outages are short?



It's unlikely that any gmail outage would go unnoticed, considering how much activity it gets 24/7.

Also, these guys are in engineering. They are very likely not even directly involved when there are outages. They build the systems and protocols to avoid and recover from outages, but don't actually perform the work themselves. It's developers vs. IT.


[I used to be a GMail SRE]

Correct, it's pretty much impossible for an outage to go unnoticed; the GMail on-call gets paged automatically.

SREs at GMail are engineers, yes, but they're very much directly involved with fixing outages - not so much at the 'try turning it off and then turning it on again' level, more the 'redirect all traffic away from this cluster into a different one, while we roll back the broken update'.

SRE is a combination of problem-solving when there are outages, and building tools to automate away the manual jobs involved in massive-scale system administration so that outages are less likely to occur.


Actually, one of the things they said in the AMA is that they don't have any concept of "level one" triage. Rather, they try as much as possible to direct pages to the engineers who built the software because that way it's more likely to get fixed properly and permanently.


I don't think there are only 5 people on the team. Could be a coincidence, or maybe some hacker timed it to perfection. To check downtime: http://downrightnow.com/gmail


http://queue.acm.org/detail.cfm?id=2371516#sidebar

Doesn't sound like an organisation that would miss services going down even briefly.



