Twitter was down | Hacker News

rossdavidh on July 11, 2019 | [–]

Ok, this is too many high-profile, apparently unrelated outages in the last month to be completely a coincidence. Hypotheses:

1) software complexity is escalating over time, and logically will continue to until something makes it stop. It has now reached the point where even large companies cannot maintain high reliability.

2) internet volume is continually increasing over time, and periodically we hit a point where there are just too many pieces required to make it work (until some change the infrastructure solves that). We had such a point when dialup was no longer enough, and we solved that with fiber. Now we have a chokepoint somewhere else in the system, and it will require a different infrastructure change

3) Russia or China or Iran or somebody is f*(#ing with us, to see what they are able to break if they needed to, if they need to apply leverage to, for example, get sanctions lifted

4) Just a series of unconnected errors at big companies

5) Other possibilities?

bdd on July 11, 2019 | | [–]

#4

I work at Facebook. I worked at Twitter. I worked at CloudFlare. The answer is nothing other than #4.

#1 has the right premise but the wrong conclusion. Software complexity will continue escalating until it drops by either commoditization or redefining problems. Companies at the scale of FAANG(+T) continually accumulate tech debt in pockets and they eventually become the biggest threats to availability. Not the new shiny things. The sinusoidal pattern of exposure will continue.

fossuser on July 11, 2019 | | | [–]

Yep, this also matches what I've heard through the grapevine.

Pushing bad regex to production, chaos monkey code causing cascading network failure, etc.

They're just different accidents for different reasons. Maybe it's summer and people are taking more vacation?

degenerate on July 11, 2019 | | | [–]

I actually like the summer vacation hypothesis. Makes the most sense to me - backup devs handling some things they are not used to.

Balgair on July 11, 2019 | | | [–]

So, a reverse Eternal-September? It'll get better once everyone is back from August vacations?

uber-employee on July 12, 2019 | | | [–]

No, because it’ll only get better until next summer.

Avamander on July 11, 2019 | | | | [–]

These outages mean that software only gets more ~fool~ summer employee proof.

kenhwang on July 11, 2019 | | | | [–]

I'm more partial to the summer interns hypothesis.

jmharvey on July 11, 2019 | | | [–]

I agree with this, but to be clear, the "summer interns hypothesis" is not "summer interns go around breaking stuff," it's "the existing population of engineers has finite resources, and when the interns and new grads show up, a lot of those resources go toward onboarding/mentoring the new people, so other stuff gets less attention."

gowld on July 11, 2019 | | | | [–]

Pretending that junior engineers is the problem, is the problem.

rapind on July 11, 2019 | | | [–]

Just checking what your objection is. Is it that you think experience is overrated, or is it just that he was speculating without any evidence?

dan-0 on July 11, 2019 | | | [–]

Can't speak for OP, but I can tell you what mine is.

If you have an intern or a Junior Engineer, they should have a more senior engineer to monitor and mentor them.

In the situation where a Junior Engineer gets blamed for a screw up:

1. The Senior Engineer failed in their responsibility. 2. The Senior Engineer failed in their responsibility.

A Junior Engineer should be expected to write bad code, but not put it into production, that's on the Senior. If I hit approve on a Junior Engineer's PR, it's my fault if their code brings the whole system down. If a Junior Engineer had the ability to push code without a review, it's my fault for allowing that. Either way it's my fault and it shouldn't be any other way. It's a failure to properly mentor. Not saying it doesn't happen, just that it's never the Junior Engineers fault when it does.

deathanatos on July 12, 2019 | | | [–]

I'd caveat that slightly: only if the senior engineer is not also overburdened with other responsibilities, and the team has the capacity to take on the intern in the first place. I've been on teams where I felt like we desperately needed more FTEs, not interns. But we could hire interns, and not FTEs.

(I agree with the premise that an intern or junior eng is supposed to be mentored, and their mistakes caught. How else should they learn?)

tiredyam on July 12, 2019 | | | | [–]

the amount of time that the summer intern / new grad eat up of seniors time is the problem. Tech debt that does not get addressed in a timely manner because of mentorship responsibilities is the problems

echelon on July 11, 2019 | | | | [–]

If you don't train new and capable engineers, you'll eventually lose talent due to attrition and retirement. Talent can be grown in-house; engineering companies are much better environments than universities to learn how to build scalable platforms. The cost of acquisition is low, too, because junior engineers can still make valuable contributions while they learn to scale their impact.

melq on July 12, 2019 | | | | [–]

If interns are able to take down your infrastructure, then it is the fault of the senior engineers who have designed it in a way that would allow that to happen.

bobthepanda on July 11, 2019 | | | | [–]

Rule one of having interns and retaining your sanity is that interns get their own branch to muck around in.

jrockway on July 11, 2019 | | | [–]

Rule one of having a useful intern experience is to get them writing production code as quickly as possible. They check in their first change? Get that thing into production immediately. (If it's going to destabilize the system, why did you approve the CL? You two probably pair programmed the whole thing together.)

HeWhoLurksLate on July 12, 2019 | | | [–]

I completely agree- even if it's something small.

I'm an intern in a big company with an internal robotics and automation group, and I recently got to wire up a pretty basic control panel, install it, and watch workers use it. That was so cool, and made me appreciate what I was doing a lot more.

kenhwang on July 11, 2019 | | | | [–]

Sure. The interns have their own branch, but it doesn't stop them from being disruptive to the human in charge of mentoring them.

vorticalbox on July 11, 2019 | | | | [–]

All changes should be in a new branch.

devin on July 12, 2019 | | | [–]

I used to believe this. Having solid lower environments which are identical to production, receiving live traffic where engineers can stage changes and promote up removes some of the “all things should live on a branch” business. I know that sounds crazy, but it is possible for teams of the right size to go crazy on master as long as the safety nets and exposure to reality are high enough in lower environments.

kdelok on July 12, 2019 | | | | [–]

I recall someone saying that holiday periods actually had better reliability for their services, because fewer people were pushing breaking changes...

I do wonder if it's that the usual maintainers of particular bits and pieces are on vacation and so others are having to step in and they're less familiar or spread too thin.

cmroanirgo on July 12, 2019 | | | | [–]

Yes, but it always seems to come down to a very small change with far reaching consequences. For this ongoing twitter outage, it's due to an "internal configuration change"... and yet the change has wide reaching consequences.

It seems that something is being lost over time. In the old days of running on bare metal, yes servers failed for various reasons, then we added resiliency techniques whose sole purpose was to alleviate downtime. Now we're at highly complex distributed systems that have failed to keep the resiliency up there.

But the fact that all the mega-corps have had these issues seems to indicate a systemic problem rather than unconnected ones.

Perhaps a connection is the management techniques or HR hiring practices? Perhaps it's due to high turnover causing the issue? (Not that I know, of course, just throwing it out there). That is, are the people well looked after and know the systems that are being maintained? Even yourself who's 'been around the traps' with high profile companies: you have moved around a lot... Were you unhappy with those companies that caused you to move on? We've seen multiple stories here on HN about how those people in the 'maintenance' role get overlooked for promotions, etc. Is this why you move around? So, perhaps the problem is systemic and it's due to management who've got the wrong set of metrics in their spreadsheets, and aren't measuring maintenance properly?

mlinsey on July 12, 2019 | | | [–]

I remember all these services being far less reliable in the past. The irony of us talking about the bygone era of stability in the context of Twitter is particularly hilarious.

I do think that internet services in general are much more mission critical, and the rate of improvement hasn’t necessarily kept up. It used to be not particularly newsworthy if an AWS EBS outage took out half the consumer internet several times per year, or if Google’s index silently didn’t update for a month, or when AOL (when they were by far largest ISP in the US) was down nationwide for 19 hours, or the second-biggest messaging app in the world went down for seven days.

selestify on July 12, 2019 | | | [–]

Which app was down for 7 days?

bdd on July 12, 2019 | | | | [–]

I don't see the value in lamenting the old days of a few machines where you could actually name them as Middle Earth characters, install individually, log in to one single machine to debug a site issue. The problems were smaller and individual server capacity in respect to demand was in meaningful fractions. Now the demand is so high and set of functions these big companies need to offer are so large, it's unrealistic to expect solutions that doesn't require distributed computing. It comes with "necessary evils", like but not limited to configuration management--i.e. ability to push configuration, near real time, without redeploying and restarting--, and service discovery--i.e. turning logical service names to a set of actual network and transport layer addresses, optionally with RPC protocol specifics. I refer to them as necessary evils because the logical system image of these are in fact single points of failures. Isn't it paradoxical? Not really. We then work on making these systems more resilient to the very nature of distributed systems, machine errors. Then again, we're intentionally building very powerful tools that can also enable us to take everything down with very little effort because they're all mighty powerful. Like the SPoF line above, isn't it paradoxical? Not really :) We then work on making these more resilient to human errors. We work on better developer/operator experience. Think about automated canarying of configuration, availability aware service discovery systems, simulating impact before committing these real time changes, etc. It's a lot of work and absolutely not a "solved problem" in a way single solution will work for any scale operation. We may be great at building sharp tools but we still suck at ergonomics. When I was at Twitter, a common knee-jerk comment at HN was "WTF? Why do they need 3000 engineers. I wrote a Twitter clone over the weekend". A sizable chunk of that many people work on tooling. It's hard.

You're pondering if hiring practices and turnover might be related? The answer is an absolute yes. On the other hand, these are the realities of life in large tech companies. Hiring practices change over years because there's a limited supply of of candidates experienced in such large reliability operations and industry doesn't mint many of them either. We hire people from all backgrounds and work hard on turning them to SREs or PEs. It's great for the much needed diversity (race, gender, background, everything) and I'm certain the results will be terrific but we need many more years of progress to declare success and pose in front of a mission accomplished banner on an aircraft carrier ;)

You are also wisely questioning if turnover might be contributing to these outages and prolonged recovery times. Without a single doubt, again the answer is yes but it's not the root cause. Similar to how hiring changes as company grows, tactics for handling turnover has to change too. It's not like people leave the company, but within the same company they move on and work on something else. The onus is on everyone, not just managers, directors, VPs to make sure we're building things where ownership transfer us 1) possible 2) relatively easy. This in mind, veterans in these companies approach code reviews differently. If you have tooling to remove the duty of nitpicking about frigging coding style, and applying lints, then humans can indeed give actually important feedback on complexity of operations, self describing nature of code, or even committing things along with changes to operations manual living in the same repo.

I think you're spot on with your questions but what I'm trying to say with this many words and examples is, nothing alone is the sole perpetrator of outages. A lot of issues come together and brew over time. Good news, we're getting better.

Why did I move around? Change is what makes life bearable. Joining Twitter was among the best decisions in my career. Learned a lot, made lifelong friends. They started leaving because they were yearning a change Twitter couldn't offer. I wasn't any different. Facebook was a new challenge, I met people I'd love to work with and decided give it a try. I truly enjoy life there even though I'm working on higher stress stuff. Facebook is a great place to work but I'm sure I can't convince even %1 of HN user base, so please save your keyboards' remaining butterfly switch lifetime, don't reply to tell me how much my employer sucks :) I really hope you do enjoy your startup jobs (I guess?) as much as I do my big company one.

eecc on July 12, 2019 | | | [–]

Not sure where you’re going, but my take is that yes, the times for calling servers individually are over.

But we’re still touching the belly of our distributed systems with very pointed tools as part of the daily workflow. That’s how accidents happen.

The analogy is clear IMHO; just as we’ve long stopped fiddling daily with the DRAM timings and clock multipliers of the Galadriel and Mordor servers, we should consider abstaining from low level “jumper switching” on distributed systems.

Of course, this also happened thanks to industry introducing PCI and automated handshaking...

wbl on July 12, 2019 | | | | [–]

Those days of yore are when computers did things and we wrote programs that satisfied immediate needs. There was also a social element to it when there were multiple users per machine.

thrwayxyz on July 12, 2019 | | | | [2 more]

[flagged]

dang on July 12, 2019 | | | [–]

https://news.ycombinator.com/item?id=20199143

mastratton3 on July 11, 2019 | | | | [–]

lol yes, whats the quote on "Don't assume bad intention when incompetence is to blame"?

After seeing how people write code in the real world, I'm actually surprised there aren't more outages.

jethro_tell on July 11, 2019 | | | [–]

Well we have an entire profession of SRE/Systems Eng roles out there that are mostly based on limiting impact for bad code. Some of the places I've worked with the worst code/stacks had the best safety nets. I spent a while shaking my head wondering how this shit ran without an outage for so long until I realized that there was a lot of code and process involved in keeping the dumpster fire in the dumpster.

devin on July 12, 2019 | | | [–]

Which do you prefer? Some of the best stacks and code I’ve worked in wound up with stability issues that were a long series of changes that weren’t simple to rework. By contrast, I’ve worked in messy code, complex stacks, that gave great feedback. In the end, the answer is I want both, but I actually sort of prefer “messy” with well thought out safety nets to beautiful code and elegant design with none.

jethro_tell on July 12, 2019 | | | [–]

One thing that stands out from both types of stacks that I've worked with, is that most of the time, doing things simply the first time without putting in a lot of work to guess what other complications will arise later tends to produce a stack with a higher uptime even if the code gets messy later.

There are certainly some things to plan ahead for, but if you start with something complex it will never get simple again. If you start with something simple, it will get more complex as time goes by but there is a chance that the scaling problems you anticipated present in a little different way and there's a simple fix.

I like to say, 'Simple Scales' in design reviews and aim to only add complexity when absolutely necessary.

newsbinator on July 11, 2019 | | | | [–]

Hanlon's Razor: https://en.wikipedia.org/wiki/Hanlon%27s_razor

"Never attribute to malice that which is adequately explained by stupidity."

euske on July 11, 2019 | | | [–]

I always thought that this cause should also include "greed". But then, greed is kinda one step closer to malice, and I'm not sure if there's a line.

rossdavidh on July 12, 2019 | | | | [–]

Ah, but that's a lot of big corps being more stupid in the last month than last year? If it's two or three more, that's normal variation. We're now at something more like 7 or 8 more. The industry didn't get that much stupider in the last year.

aaroninsf on July 11, 2019 | | | | [–]

I will observe, without asserting that it is actually the case,

that successful executions of #3 should be indistinguishable from #4.

(And this is maybe a consequence of #1).

Diederich on July 11, 2019 | | | | [–]

I've also worked at a couple of the companies involved.

This is the correct analysis on every level.

kwizzt on July 11, 2019 | | | | [–]

How does the fact you worked at those companies relate to #4?

Edit: I misread the parent and my question doesn't make a lot of sense. Please ignore it :)

bdd on July 11, 2019 | | | [–]

> How does the fact you worked at those companies relate to #4?

For Facebook I worked on the incident, previous Wednesday. 9.5 hours of pain...

And for my past employers, I still have friends there texting the root causes with facepalm emojis.

liberte82 on July 11, 2019 | | | [–]

Do tell

GrumpyNl on July 12, 2019 | | | | [–]

Turned out to be number #1 The outage was due to an internal configuration change, which we're now fixing. Some people may be able to access Twitter again and we're working to make sure Twitter is available to everyone as quickly as possible.

captn3m0 on July 11, 2019 | | | | [–]

Can you clarify what redefining problems would mean (with an eg)?

GuiA on July 11, 2019 | | | [–]

Think of computer vision tasks. Until modern deep learning approaches came around, it was built on brittle, explicitly defined pipelines that could break entirely if something minor about the input data changed.

Then the great deep learning wave of 201X happened, replacing dozens/hundreds of carefully defined steps with a more flexible, generalizable approach. The new approach still has limitations and failure cases, but it operates at a scale and efficiency the previous approaches could not even dream of.

MegaButts on July 11, 2019 | | | [–]

That's not redefining the problem, so much as applying a new technology to solve the same problem. Usually using the flashy new technology decreases reliability due to immature tooling, lack of testing, and just general lack of knowledge of the new approach.

Also deep learning, while incredibly powerful and useful, is not the magic cure-all to all of computer vision's problems and I have personally seen upper management's misguided belief in this ruin a company (by which I mean they can no longer retain senior staff, they have never once hit a deadline, every single one of their metrics is not where they want it to be, and a bunch of other stuff I can't say without breaking anonymity).

idlewords on July 11, 2019 | | | | [–]

FAANG(+T)(-N)(+M)

18pfsmt on July 12, 2019 | | | [–]

I think we 'bumped heads' at Middlebury in '94, and I think you are in store for an "ideological reckoning" w/in 3 years.

Pinboard is a great product, so thanks for that. I am surpised you don't have your own Mastodon instance (or do you?).

gcbw2 on July 11, 2019 | | | | [–]

since all of them happen in high profile business hours, i'd guess either #1 or #5.

For #4 to be the actual cause, outages out of business hours would be more prevalent and longer.

icebraining on July 11, 2019 | | | [–]

Of course it went down during business hours, that's when people are deploying stuff. It's known that services are more stable during the weekends too.

peterburkimsher on July 12, 2019 | | | | [–]

The Archive.org outage of 26th of June was outside PST business hours.

https://twitter.com/internetarchive/status/11436045396956160...

https://twitter.com/internetarchive/status/11433789908260044...

iamtheworstdev on July 11, 2019 | | | | [–]

Faangt = Facebook amazon Apple Netflix Google tesla?

gsich on July 12, 2019 | | | [–]

Gmafia

arrty88 on July 12, 2019 | | | | [–]

Add slack to the list

Edit: and stripe

foobarbecue on July 12, 2019 | | | | [–]

Twitter, not tesla

loblollyboy on July 11, 2019 | | | | [8 more]

[flagged]

jkaplowitz on July 11, 2019 | | | [–]

Former employees and current employees talk via unofficial online and offline backchannels at many companies.

loblollyboy on July 11, 2019 | | | [–]

Ok, so maybe I overreacted

bdd on July 11, 2019 | | | | [–]

geez, tough crowd. do you wanna ten dollar hug?

loblollyboy on July 11, 2019 | | | [–]

I was just polishing my bit. Not in a bad mood today so much as a bored mood. You seem like you know what you are talking about (yes, I was bored enough to stalk you, too)

bdd on July 11, 2019 | | | [–]

If you are bored one day and around Menlo Park, come have a coffee or ice cream at FB campus. You can troll me in person.

18pfsmt on July 12, 2019 | | | [–]

Isn't it interesting where this is going? We all want to meet our accusers? I don't care for FB myself, but I appreciate what you all are doing in the larger sense. Cloudflare is my fave of your former employers (since you shared that in this discussion).

dang on July 11, 2019 | | | | [–]

Could you please stop posting unsubstantive comments to Hacker News?

hexrcs on July 11, 2019 | | | | [–]

Life in tech is like a Quentin Tarantino movie.

_lqaf on July 12, 2019 | | | [–]

...except everyone is sitting at desks typing, there's no blood or surf rock or chases or self-indulgent soliloquies, and the cursing is much less creative?

jessaustin on July 12, 2019 | | | [–]

Maybe you're doing it wrong?

robohoe on July 12, 2019 | | | | [–]

  cursing is much less creative?

I beg to differ.

marenkay on July 11, 2019 | | | | [–]

Only one thing to add:

Tech debt is accrued in amounts where every VC fund would get wet pants if tech debt was worth dollars paid out.

wybiral on July 11, 2019 | | | | [–]

I've still never seen this much downtime on these systems so it's weird to happen all at once.

It's possible that they're related without requiring any conspiracy theories or anything. Maybe these companies are just getting too big or too sloppy to maintain the same standard of uptime (compared to the past few years)? Or maybe there's some underlying issue that they're all rushing to fix which justifies the breaking prod changes within the same timeframe.

But it was weird when a it happened to two or three of them. Now we're going on something like 5 massive failures from some of the biggest services online within a little over a week...

idlewords on July 11, 2019 | | | [–]

Write a script to fire random events and you will notice they sometimes cluster in ways that look like a pattern.

rossdavidh on July 11, 2019 | | | [–]

You know, it would be cool if you found stats on the downtime metrics of these various high-profile recent outages, and calculated the odds of having such a cluster. Statistics is hard, though, and avoiding a "Texas Bulls-eye" would be hard.

wtallis on July 11, 2019 | | | [–]

"Celebrities die 2.7183 at a time": http://ssp.impulsetrain.com/celebrities.html

gcbw2 on July 11, 2019 | | | [–]

So the only take away is that now the population at large notices tech companies outages as much as they notice celebrity deaths?

uxp on July 12, 2019 | | | [–]

"population at large"

This thread is linked to a status page run by Twitter, on a programming and technology news site. I'm not really seeing how most people that exist in the western/1st world are noticing this. Is there a CNN article, or FoxNews segment on how tech companies are having outages?

gcbw2 on July 12, 2019 | | | [–]

Yes, fox news even suggest it was part of a large coordinated censorship effort on the POTUS :D

https://www.foxnews.com/tech/twitter-suffers-widespread-outa...

quote from that url: "The outage came as President Trump was hosting a social media summit with right-wing personalities and tech industry critics who've accused Twitter and other websites of having an anti-conservative bias."

appleiigs on July 11, 2019 | | | | [–]

Already did that here: https://news.ycombinator.com/item?id=20356610

rossdavidh on July 11, 2019 | | | [–]

Sure does look like we are way out there at the tail end of the probability distribution, by those numbers.

mruts on July 12, 2019 | | | | [–]

I mean, we can assume the downtime variance follows a normal distribution. It should pretty easy to calculate P<.05 with just a little bit of data.

rossdavidh on July 12, 2019 | | | [–]

What you say could be true, but I don't know that we can assume it. If downtime requires several things to happen (cascading errors), but those things interact somehow (problem with one makes another more likely), I could imagine it might not be normally distributed. Disclaimer: I Am Not A Statistician.

rossdavidh on July 11, 2019 | | | | [–]

Oh, sure. But Apple, Google, Cloudflare, Stripe, Slack, Microsoft, we're getting to more than five even...

root_axis on July 11, 2019 | | | [–]

The logic of the GP still applies though. Sites have outages every day so it is inevitable that some large sites will fail around the same time. Also, we know that Cloudflare and Twitter outages were attributed to configuration changes, probably others have benign explanations as well.

rossdavidh on July 12, 2019 | | | [–]

Sure, but "configuration changes" does not exclude several of these options. For example, is it harder to predict/deal with the consequences of configuration changes than it used to be?

root_axis on July 12, 2019 | | | [–]

Well, the options above cover pretty much every possibility, including the one I'm suggesting.

mattwad on July 11, 2019 | | | | [–]

Reddit went down this morning too

gpm on July 11, 2019 | | | [–]

Reddit goes down a lot though in my experience.

kabwj on July 11, 2019 | | | | [–]

Reddit being up for 24 hours or generating pages in less than 3 seconds would be noteworthy.

gnulinux on July 11, 2019 | | | | [–]

Reddit goes down pretty frequently. It's been that way for years.

opencl on July 11, 2019 | | | | [–]

And now Discord is down!

gsich on July 12, 2019 | | | [–]

no loss

shobith on July 11, 2019 | | | | [–]

This. I have first hand experience in this phenomenon multiple times. Complexity helps this effect too.

JaRail on July 11, 2019 | | | [–]

First, I think our general uptime metrics are trending upwards. Recovery times tend to be much shorter as well.

Big services are bigger, more mission-critical parts can fail.

Continuous development culture is designed with failure as part of the process. We don't spend time looking for obscure issues when they'll be easier to find by looking at metrics. This is fine when a staggered deployment can catch an issue with a small number of users. It's bad when that staggered deployment creates a side-effect that isn't fixed by rolling it back. Much harder to fix corrupted metadata, etc.

Automated systems can propagate/cascade/snowball mistakes far more quickly than having to manually apply changes.

We notice errors more now. Mistakes are instantly news.

hn_throwaway_99 on July 11, 2019 | | | [–]

> We notice errors more now. Mistakes are instantly news.

Heck, just look at Twitter itself from its original "Fail Whale" days where there was so much downtime, to now where even this relatively small amount of downtime is the top story on HN for hours.

dillonmckay on July 12, 2019 | | | [–]

So, when it went down, was there a Fail Whale displayed during this most recent incident?

Algol on July 12, 2019 | | | [–]

I think they retired the fail whale some time ago.

I looked it up: in 2013, because they didn't want to be associated w/ outages.

johngalt on July 11, 2019 | | | [–]

5) Operational reliability is both difficult and unsexy.

The fancy new feature, increasing traffic, or adding AI to something will generate headlines, accolades, and positive attention. Not having outages is something everyone expects by default. This goes double for work that prevents outages. No one wins awards for what doesn't happen.

How many medals are pinned on the guys installing fire sprinklers?

meristem on July 11, 2019 | | | [–]

Corollary: Work that prevents outages--or safe work--is SO unsexy it does not get noticed, but work that causes outages is postmortem-ed to death (pun intended).

t0astbread on July 11, 2019 | | | [–]

Or maybe it's because the internet is tendentially becoming just a few companies' data centers? Afaik Twitter moved to GCP a few months ago. Maybe this is another Google outage?

azurezyq on July 11, 2019 | | | [–]

Less likely since it looks fine from GCP status page.

Hmm, it seems that Twitter already figured it out, configuration change issues again.

marenkay on July 11, 2019 | | | [–]

Probably because we all use Kubernetes and YAML files and 100% of configuration failures are "oh shit, I used two spaces instead of 4, we're fucked".

Jach on July 11, 2019 | | | [–]

Something like this is my bet too, there was a recent post somewhere called something like "why all outages are due to a configuration change". There are monocultures in site reliability ops for big companies, "configuration over code" but with heavy automation too. From my outside view it seems there's a tradeoff when you do that between more frequent smaller issues and less frequent bigger issues. Also reminds me of Google's move away from eventual consistency because with their infrastructure they can make a CP system highly available in practice... except when it isn't, due to a botched configuration change.

ionforce on July 11, 2019 | | | | [–]

> tendentially

Is this a word? You don't mean tangentially? I'm having a crisis right now.

snazz on July 11, 2019 | | | [–]

https://www.thefreedictionary.com/tendentially

Probably meant tangentially anyway.

EForEndeavour on July 11, 2019 | | | [–]

"Tangentially" would make less sense. More likely, they meant to convey a present-participle form of "the internet tends to be consolidated."

t0astbread on July 11, 2019 | | | | [–]

Is it not? Sorry if I got it wrong, English isn't my first language.

dict.cc (my source of truth for English vocab) says it's a word: https://www.dict.cc/?s=tendentially

thfuran on July 11, 2019 | | | [–]

It's apparently a word but I'd say it's quite uncommon. I played around with google ngram viewer and had a hard time coming up with a word that is less common. But I finally came up with "astrophotographic".

E: "unsurpassingly" is way down there too

t0astbread on July 12, 2019 | | | [–]

It's common in German, so I figured it wouldn't be uncommon in English. Oh well :)

edflsafoiewq on July 11, 2019 | | | | [–]

It's very common in Biblical criticism (transliterated from German).

nullstyle on July 11, 2019 | | | | [–]

https://www.dictionary.com/browse/tendential

listic on July 11, 2019 | | | [–]

I (don't) like how you exclude Russia, China, Iran and somebody from your definition of 'us'.

Parcle on July 11, 2019 | | | [–]

His definition of "us" seems to just be "Americans". Which is fine because he's talking about American companies...

rossdavidh on July 11, 2019 | | | | [–]

The assumption is that Russia, China, Iran are less dependent on Google, Twitter, etc., in part because some of them aren't allowed to operate in those countries, in part because some of them are much less dominant in those markets. 'Us' means 'people who might care that Twitter (or whoever) is down'.

vkou on July 11, 2019 | | | [–]

Google, Twitter, Reddit, Facebook, etc, all legally operate in Russia.

_jyog on July 11, 2019 | | | [–]

But most have regional replacements. WeChat in China, VK (and some Telegram, though it's now blocked?) in Russia. This makes them less reliant on the American originals, which is why governments often encourage home-grown knock-offs.

bromuro on July 12, 2019 | | | | [–]

Yes I have been also hit by the same bad feeling. Thanks for pointing it out.

godarni on July 11, 2019 | | | [–]

Lots of people on vacation this time of year. Would be interesting to see if there is a seasonal component to the reliability of these services.

syn0byte on July 11, 2019 | | | [–]

"Don't forget to occasionally do that thing I mentioned in passing 2 weeks ago, under my breath, during a klaxon alarm test. Otherwise the errors will DDoS the whole cluster. See you in a week, goodluck!"

Nah - that would never happen.

papito on July 11, 2019 | | | [–]

#1. I think the art of keeping things simple is being lost. These days people will mush together ten different cloud services and 5,000 dependencies just for a Hello World.

moret1979 on July 11, 2019 | | | [–]

One possibility on 5) Too many KPIs and quarter goals to be reached, too many corners cut.

bastijn on July 11, 2019 | | | [5 more]

Obligatory to watch with this comment:

"Let's deploy to production" https://youtu.be/5p8wTOr8AbU

kirubakaran on July 11, 2019 | | | [–]

You know, I've watched a few of those memes over the past, but this one was especially well done, and timed perfectly with his gestures even!

jsjohnst on July 12, 2019 | | | | [–]

The only possible way for me to make it more than 20-30 seconds into that was to mute it. That guy’s laugh is multiple orders of magnitude worse than nails on a chalkboard. Funny story (albeit too real), but man, mute before clicking everyone.

sampleinajar on July 11, 2019 | | | | [–]

No idea how I haven't seen this, but it totally made my day.

winrid on July 12, 2019 | | | | [–]

This hit close to home. Hilarious. Thanks.

humanfactor on July 11, 2019 | | | [–]

1/2) These are web apps. Big web apps but web apps none the less. We know what can go wrong theres nothing really new here. How would you quantify "too many pieces to make work". Is 1000 too many , 10000 ???? There are millions pieces of data on your harddrive and they work fine. In general the idea of variance can be solved with redundancy. Maybe there are not enough backups at twitter.

5/4) Incompetent people lead by incompetent people surrounded by yes men and a drug culture. Also having a company that demonizes conservatives which are some of the best engineers (scientist are squares naturally)

Human error is bound to happen and software is complex but so are rockets and supply chains. Things can go right and things can go wrong. Usually when they do go wrong there is a human error reason.

Does twitter foster a place where human error can occur more frequently that other places? I dont know. I have my bias about the company and any sjw company but thats very anecdotal.

Twitter worked yesterday and it doesnt work today. Doesnt really have to mean anything really important except for the fact that there is a blind spot in their process which they need to harden.

I guess the first person to ask is the dev op , then the developer. Something wasnt tested enough. That happens in commercial software, deadlines cant wait.

3)Russia / China / Iran ... stop watching CNN. You are parroting talking points. If twitter is crushed America could care less and would probably turn up sanctions, not lift them. Taking down twitter wont cripple anything in America except for certain marketers budgets.

papito on July 12, 2019 | | | [–]

Scientists are squares but they also have a brain. That's why they are usually not conservatives. Conservatives are not a party, it's a herd of paranoid people who tune into Fox News every night to be told what to be afraid of next, but it's definitely not engineers or scientists.

outworlder on July 11, 2019 | | | [–]

Brains are excellent pattern matchers.

Brains also suck at statistics.

jimmaswell on July 12, 2019 | | | [–]

This is the first time I can remember so much happening so close. It's statistically unlikely.

pennaMan on July 11, 2019 | | | [–]

>July 11, 2019 7:56PM UTC[Identified] The outage was due to an internal configuration change, which we're now fixing. Some people may be able to access Twitter again and we're working to make sure Twitter is available to everyone as quickly as possible.

Seems #4 is the winner

gniv on July 11, 2019 | | | [–]

Or #1.

I work on critical infrastructure at FAANG and it's frightening how complex things are. The folks who built the systems and knew them inside-out have mostly moved on, and the newbies, like me, don't fully understand how the abstractions leak, what changes cause what side effects etc.

narrator on July 11, 2019 | | | [–]

6) White House social media conference just started.

https://www.10tv.com/article/trump-hosts-white-house-summit-...

papito on July 11, 2019 | | | [–]

It's a social media troll conference. Let's call it what it is.

adventist on July 11, 2019 | | | | [–]

Really not a good look.

mcqueenjordan on July 11, 2019 | | | [–]

https://en.wikipedia.org/wiki/Poisson_distribution

djtriptych on July 11, 2019 | | | [–]

I've been suspecting 3) for a few months now, and I'm quite curious how our government would handle it if it _were_ the case. Only a few of these outages have had plausible post-mortems ever made public.

MrStonedOne on July 11, 2019 | | | [–]

Operational consistency creates a hidden single point of failure.

If everybody is doing the same things and setting things up the same way to ensure reliability then any failures or short comings in that system are shared by all.

AnIdiotOnTheNet on July 11, 2019 | | | [–]

It's #1. The real question is how this isn't blindingly obvious to everyone.

rossdavidh on July 12, 2019 | | | [–]

One possible answer: it's hard to admit that what you've worked really hard at becoming an expert in, might have been a mistake.

koonsolo on July 11, 2019 | | | | [–]

Because we can't all be as smart as you are.

jayd16 on July 11, 2019 | | | [–]

My guess is its a slow news time of year coupled with more usage of cloud services which means these types of stories are higher profile.

Qwertystop on July 11, 2019 | | | [–]

Relating to 1: https://www.youtube.com/watch?v=pW-SOdj4Kkk (Jonathan Blow's "Preventing the Collapse of Civilization"... perhaps a melodramatic title, but well-said overall.)

marenkay on July 11, 2019 | | | [–]

Or we just managed to construct the biggest circular dependency ever using the whole internet and a combination of all hyped languages and frameworks.

That would in turn lead to an insanely fragile system with increasing amounts of failures that seem unexplainable/weird.

chrismarlow9 on July 12, 2019 | | | [–]

Everything is made of plastic these days, even software. It's immediately put out as soon as an MVP is ready. Too many managers with zero coding experience. The marketing people have taken the browser. Time to start over.

asark on July 11, 2019 | | | [–]

This is a pattern one might see if there were a secret, rolling disclosure of some exceptionally-bad software vulnerability, I'd think. Or same of some kind of serious but limited malware infection across devices of a certain class that sees some use at any major tech company. If you also didn't want to clue anyone else (any other governments) in that you'd found something (in either case), you might fix the problem this way. Though at that point it might be easier to just manufacture some really bad "routing issue" and have everyone fix it at once, under cover of the network problem.

depr on July 11, 2019 | | | [–]

so like all software has reached peak complexity this month?

rossdavidh on July 11, 2019 | | | [–]

It seems a bit of a coincidence, yes? Unless they are all copying each other (e.g. all using Kubernetes or what-have-you), in which case it might be less of a coincidence.

rossdavidh on July 11, 2019 | | | [–]

Ok, I have one to add myself:

6) We used to have many small outages at different websites. Now, with so many things that once were separate small sites aggregated on sites like FB, Twitter, Reddit, etc we have a few large sites, so we have aggregated the failures along with that. The failure rate, by this theory, is the same, but we have replaced "many small failures" with "periodic wide-spread failures, big enough to make headlines". Turning many small problems into a few bigger ones. Just another hypothesis.

dv_dt on July 11, 2019 | | | [–]

Another possibility: US (or other) authorities are requiring some sort of monitoring software or hardware that where disruption of service is unavoidable during install

snazz on July 11, 2019 | | | [–]

Keeping that many mouths shut seems impossible.

dv_dt on July 11, 2019 | | | [–]

Most people won't be directly involved in assessing or fixing the fault. "Sorry the network link went down, here is the after analysis report," seems like a reasonable cover. There are many espionage activities which are covered up, only to come out decades later.

But really, I don't have any evidence that this possibility is any more or less likely than any other.

NightlyDev on July 11, 2019 | | | [–]

Software is getting increasingly complex. Why? To ensure better uptime, amongst other things. The funny part is that all the complexity often leads to downtime.

A single server would usually have less downtime than Google, Facebook and so on. But Google and Facebook needs this complexity to handle the amount of traffic they're getting.

Makes me wonder why people are trying to do stuff like Google when they're not Google. Keeping it simple is the best solution.

DaveInTucson on July 11, 2019 | | | [–]

> Just a series of unconnected errors at big companies

Except that "at big companies" is basically selection bias, problems at little companies don't get noticed because they're, well, small companies.

And the underlying issue of the "unconnected errors" is that software is rather like the airline industry: things don't really get fixed until there's a sufficiently ugly crash.

bArray on July 11, 2019 | | | [–]

For point #3, there are a few irregularities:

1. Services all going down one after another. 1 goes down - it happens. 2 go down - it happens sometimes. 3 go down - quite a rare sequence of events. But now a large number of silicon valley companies have experienced service outage over the last few weeks.

2. Russian sub that is said to be a "deep sea research vessel" somehow experiences a fire whilst in international waters [1]. It has been suspected that it could have been tapping undersea cables. Let's imagine for a moment a scenario where they were caught in the act, some NATO sub decides to put an end to it and Russia cover it up to save face.

3. Russia announces tests to ensure that it could survive if completely cut off from the internet [2]. A few months later it's like somebody is probing US services in the same way.

4. There is currently a large NATO exercise in a simulated take-over of Russia happening in Countries close to Russia [3].

Of course it's completely possible it's all unconnected, but my tin foil hat brain says there is a game of cloak and daggers going on here. I would say that Russia's incentive for probing the US/NATO is to test it's weakness after undergoing a trade-war with China and raising sanctions against Iran. After all, Russian fighter planes regularly try to fly into UK airspace just to test their rapid response crews [4], this sort of behaviour is typical of them.

[1] https://en.wikipedia.org/wiki/Russian_submarine_Losharik

[2] https://techcrunch.com/2019/02/11/russia-internet-turn-off-d...

[3] https://sofiaglobe.com/2019/05/13/6000-military-personnel-to...

[4] https://www.theguardian.com/world/2018/jan/15/raf-fighters-i...

lovecg on July 11, 2019 | | | [–]

It’s #4 but caused by #1. My pet theory is that we’re pretty far into this business cycle, so a lot of new companies had the time to mature, build up complexity, shed people the most knowledgeable with the original architecture, stop caring as much about the competition, and so on. Add Apple to the mix for recent software quality issues.

Hamuko on July 11, 2019 | | | [–]

>It has now reached the point where even large companies cannot maintain high reliability.

Waiting for this to be backed up by statistics.

matwood on July 11, 2019 | | | [–]

Reddit was also partially down this morning.

xvector on July 11, 2019 | | | [–]

Reddit's down weekly, though, so that's no big deal.

jhoh on July 11, 2019 | | | [–]

Maybe those "INSTALL OUR APP NOW!!!" banners, floating action buttons, popups and bottom/top fixed bars caused too much traffic.

pard68 on July 11, 2019 | | | [–]

NSA firmware updates requiring a reboot.

frogpelt on July 11, 2019 | | | [–]

5) Some of all of the above?

Although 3) doesn't have to be the explanation, it is definitely happening all the time.

Bluecobra on July 11, 2019 | | | [–]

4.

I think people are too accustomed now to high availability/uptime nowadays. I started using the Internet in the mid 90s. Stuff used to break all the time back in those days. Now I can’t remember the last time I couldn’t reach a website because it has been Slashdotted.

foobiekr on July 12, 2019 | | | [–]

4.

And imho all that’s really happening is people are noticing the outages more. This is a good thing. For years too much of the mental model has been “{ cloud, google, Facebook, aws, xxx } never goes down!”

That’s been unhealthy. It’s a good thing.

dqpb on July 11, 2019 | | | [–]

3) Come on man, you can't just go around opening parentheses and then not closing them.

julienfr112 on July 11, 2019 | | | [–]

What about raising temperatures ?

daveheq on July 11, 2019 | | | [–]

I don't believe it's too complex, I believe people are getting lazy. Complexity can be handled by automation, but too often people just want to rush things out for a buck instead of planning out a good product.

ljm on July 11, 2019 | | | [–]

Hypergrowth/blitzscaling also introduces entropy.

The more you hire, the more plentiful and diverse your bugs will be.

It stands out now because the stars aligned. But theses issues have been coming and going for years in patternless form.

Angostura on July 11, 2019 | | | [–]

5) The increasing interconnectedness of things introducing new interdependences so that when one service stumbles so do many others.

sebastianconcpt on July 11, 2019 | | | [–]

I'd normally go for #4 but hypotheses #3 is starting to be a more plausible explanation for the timely "coincidence".

rossdavidh on July 12, 2019 | | | [–]

A friend of mine who is retired military told me there is a saying that "once is bad luck, twice is a coincidence, but three times is enemy action". Doesn't necessarily mean it's true, of course.

root_axis on July 12, 2019 | | | [–]

There's also the HN filter bubble which could be presenting a misleading picture of "outage" frequency.

taf2 on July 11, 2019 | | | [–]

6) It's summer and lots of engineers are either a. on vacation or b. thinking less clearly

danellis on July 11, 2019 | | | [–]

#4.

When things are random, they cluster.

dboreham on July 12, 2019 | | | [–]

Could be a Tacoma Narrows Bridge type problem.

djohnston on July 11, 2019 | | | [–]

it's end of half. everyone is entering reviews. gotta deliver.. somerthing...

auslander on July 11, 2019 | | | [–]

Sysadmin and DevOps engineer walk into a bar ...

willart4food on July 12, 2019 | | | [–]

all of the above!

elamje on July 11, 2019 | | | [–]

The brain is the greatest pattern matcher in the world. While it is unlikely all of these companies would have major outages in a month, be wary that the subconscious is constantly generating narratives to explain statistical anomalies.

Interesting theories nonetheless:)

dmix on July 11, 2019 | | | [–]

The more the conspiracy grows the faster these otherwise minor stories shoot to the top of HN and add to the pattern.

It fuels itself.

alanbernstein on July 11, 2019 | | | [–]

"Minor" seems inappropriate. Can you remember another time when so many high-profile websites/services have had outages in so short a time span?

ambrice on July 11, 2019 | | | [–]

No. And a year from now I won't remember this time either.

braythwayt on July 11, 2019 | | | | [–]

> be wary that the subconscious is constantly generating narratives to explain statistical anomalies

This comes up all the time in sports. Let's take pool for example. There are various guesstimates floating around, and I do not have access to detailed tournament statistics, but I have heard that in games where sinking a ball on the break is an advantage, for decent players there's maybe a 75% chance that a ball will go down.

So once in every four breaks, you won't sink a ball. How often do you fail twice in a row? Once in every sixteen breaks. Failing three times in a row? Once in every 64 breaks. Four times in a row? Once in every 256 breaks.

What about five straight breaks without sinking a ball? Once in every 1,024 breaks. That's a lot of breaks. But wait up a moment.

Let's ask, "If you miss a break, what're the odds of it becoming a streak of five misses in a row?" The answer is, "One in every 256 streaks of misses will be a streak of five or more misses." 1/256 is not particularly rare, if you play often enough to sink a ball on the break 75% of the time.

What is the point of knowing that a streak of five misses in a row is rare but not that rare? Well, if you miss five in a row, do you chalk your cue for break number six as usual? Or do you tell yourself that your break isn't working, and start adjusting your stance, aim, spin, &c?

If you start adjusting everything when you get a streak of five misses in a row, you may just make things worse. You have to pay enough attention to your distribution of misses to work out whether a streak of five misses in a row is just the normal 1/256 streaks, or if there really is something amiss.

The brain is a great pattern matcher, but it sucks at understanding statistics.

---

The flip side of this, of course, is that if you upgrade your brain well enough to understand statistics, you can win a lot of money.

If a pro misses five in a row, feel free to wager money that they'll sink a ball on their next break. Your friends may actually give you odds, even though the expectation of winning is 75-25 in your favour.

liberte82 on July 11, 2019 | | | [–]

This is a great explanation of the issues we have with statistics. You see this all the time in other sports too. As a hockey watcher, fans always want “explanations” for a loss or losing streak. More often than not, it’s just bad luck, and the kneejerk reactions that coaches and GMs take often just make things worse.

Nate Silver did a writeup showing the math around how the winner of the Stanley Cup comes down to little more than random chance.

whatshisface on July 11, 2019 | | | | [–]

Saying that it's an illusory pattern without checking the statistics is no more scientific than saying it's a conspiracy without checking the statistics.

richk449 on July 11, 2019 | | | | [–]

> The brain is the greatest pattern matcher in the world.

You have obviously never tried to model the stock market with a neural net.

akhilcacharya on July 11, 2019 | | | [–]

The other possibility is intern season (I'm 99.99% joking)

rossdavidh on July 11, 2019 | | | [–]

I'm 99.99% laughing (and 0.01% thinking 'uh oh').

kgraves on July 11, 2019 | | | | [–]

As much as I don't like interns, I am sure that they wouldn't even touch a system of the scale like Twitter's in my opinion. /s

sbmassey on July 11, 2019 | | | [–]

Taking down Twitter could be a huge boon for the economy, though.

rossdavidh on July 11, 2019 | | | [–]

Productivity skyrockets!

ducktypegoose on July 11, 2019 | | | [–]

Software complexity escalating over time? Please! The new microservices architecture we have been migrating to over the last year or so is so stable and makes tracking down problems a walk in the park. Not to mention the NOSQL database is a dream come true, as long as you don't need to query anything other than the partition key.

jonprobably on July 11, 2019 | | | [–]

It's summer time and everyone who knows how stuff works is halfway through a drink right now. Probably with their families. Is it a trend year over year for 7/4 +/- a week?

idlewords on July 11, 2019 | | [–]

So storytime! I worked at Twitter as a contractor in 2008 (my job was to make internal hockey-stick graphs of usage to impress investors) during the Fail Whale era. The site would go down pretty much daily, and every time the ops team brought it back up, Twitter's VCs would send over a few bottles of really fancy imported Belgian beer (the kind with elaborate wire bottle caps that tell you it's expensive).

I would intercept these rewards and put them in my backpack for the bus ride home, in order to avoid creating perverse incentives for the operations team. But did anyone call me 'hero'?

Also at that time, I remember asking the head DB guy about a specific metric, and he ran a live query against the database in front of me. It took a while to return, so he used the time to explain how, in an ordinary setup, the query would have locked all the tables and brought down the entire site, but he was using special SQL-fu to make it run transparently.

We got so engrossed in the details of this topic that half an hour passed before we noticed that everyone had stopped working and was running around in a frenzy. Someone finally ran over and asked him if he was doing a query, he hit Control-C, and Twitter came back up.

evanweaver on July 11, 2019 | | [–]

I worked there at the time and ending up running the software infrastructure teams that fixed all these problems. The beer wasn't a reward, it was because people were stressed and morale was low. Nobody brought the site down on purpose.

What really made me mad was when we hired consultants and the contract would end, usually without much success because Twitter's problems were not normal problems, and then they would send us a fancy gift basket with our own wasted money.

Maciej, we are still waiting for you to ship the executive dashboard.

idlewords on July 11, 2019 | | | [–]

That dashboard supported something like a dozen people over its lifetime. One person would start writing it, then quit, and be replaced by another person who rewrote it in their preferred language, and then the cycle would repeat.

It was a VC-funded welfare program for slackers and I miss it greatly.

nevf1 on July 12, 2019 | | | [–]

I lol'd at "welfare program for slackers" - That's the dream really... Find a chaotic workplace that lets you play with your favorite languages and no real tangible outcome.

tonystubblebine on July 11, 2019 | | | [–]

To take the history of direct queries at Twitter even further back, I built a web interface at Odeo for the CEO to run direct queries against the database (and save them so he could re-run them). There were some basic security precautions, but this was totally cowboy.

That Odeo team was filled with best practices aficionados and the management (including me) was a bit cowardly about being clear that "WE ARE FAILING HARD AND FAST." Damn the practices.

So of course the engineering team freaked out, especially since the CEO managed to find lots of queries that did take the site down.

But I honestly credit that as one of the biggest things that I contributed to Twitter. Having easy SQL access let the CEO dig into the data for hours, ask any question he wanted, double check it, etc. He was able to really explore the bigger question, "Is Odeo working?"

The answer was no. And that's how he decided to fully staff Twitter (twttr then) as a side project, buy back the assets, and set Twitter up as it's own thing.

I think that it really was very close--if we'd moved any slower we would have run out of money before anyone was ready to commit to Twitter. Same story about Rails--without being able to do rapid prototyping we never would have convinced ourselves that Twitter was a thing.

nthj on July 11, 2019 | | | [–]

Just a quick note not directed at OP but for any other engineers that may be unaware, these days AWS makes provisioning a read replica painless, and you can point the CEO to up-to-the-minute data while essentially firewalling the queries from customer operations.

mlevental on July 11, 2019 | | | [–]

how?

teraflop on July 12, 2019 | | | [–]

First Google result for "aws read replicas": https://aws.amazon.com/rds/details/read-replicas/

> Using the AWS Management Console, you can easily add read replicas to existing DB Instances. Use the "Create Read Replica" option corresponding to your DB Instance in the AWS Management Console.

NKCSS on July 11, 2019 | | | | [–]

Why not have it run against a replicated copy? I did that in the past, works amazingly, they can f* up all they want without any implications.

tonystubblebine on July 11, 2019 | | | [–]

This was 2005. We had dedicated servers in our own cage. I can't remember if we already had replicas. It seems plausible. But actually spinning up a new one would have required more work and convincing than I wanted to do.

AgentME on July 12, 2019 | | | | [–]

It's probably easy to do if you know it's an issue to begin with. I've run into this scenario before (running sql queries to read data that turned out to lock everything) and it caught me by surprise. Why would a read query cause the database to lock anything? I thought databases did stuff like multiversion concurrency control to make locks like that unnecessary.

joevandyk on July 12, 2019 | | | [–]

Doing large queries on a Postgres standby had the potential to mess up the master, depending on configuration settings.

ziftface on July 12, 2019 | | | | [–]

Thanks for sharing. Out of curiosity, why was the answer no? Was the issue the downtime or something more subtle?

tonystubblebine on July 12, 2019 | | | [–]

I think in the end he lost faith over retention. We got a lot of traffic and new users but didn't keep any of it. He was already suspicious that iTunes was going to kill us and so the stats were the nail in that coffin. He was right. We were ten years too early to podcasting.

sbmthakur on July 11, 2019 | | | [–]

This reminded me of something too!

I used to work(on backend) on a popular app(in my country) which had a good number of users. One day I was asked to work with some infra/sysadmin folks who wanted to fix some issues with the servers in our inventory. We happily updated kernels and even rebooted servers a few time. I came back to my team and saw them deeply engrossed into production logs. Turns out few of the servers that were "fixed" were actually production servers. I almost shouted the F word when I listed all IPs. This confusion happened because the server guys used data IPs and we used management IPs. This exposed serious miscommunication among our teams. But fun times indeed!

oneeyedpigeon on July 11, 2019 | | | [–]

> It took a while to return, so he used the time to explain how, in an ordinary setup ...

This one was visible from such a great distance, it's a wonder neither of you spotted it as it happened! I love your post — reminds me of BOFH :)

idlewords on July 11, 2019 | | | [–]

The guy had an amazing beard, with streaks of white in it! He looked like a great wizard to me. I remember even as we noticed people were frantic, saying to one another "oh man, another outage, thank goodness it's not us!"

emerongi on July 11, 2019 | | | [–]

And now it's a full-blown sitcom scene

baud147258 on July 11, 2019 | | | | [–]

A true BOFH would have either disposed of any witness or made them the culprit.

SllX on July 11, 2019 | | | [–]

A true BOFH works with what he’s got, and when what he’s got is a fool willing to do all his work for him, then it’s time to implement Plan A: sit back and enjoy the fireworks.

ryandrake on July 11, 2019 | | | [–]

> The site would go down pretty much daily, and every time the ops team brought it back up, Twitter's VCs would send over a few bottles of really fancy imported Belgian beer

Never understood this mentality but have seen it at many companies. Rewarding someone or some team for heroically fixing something after a catastrophic failure. Talk about misaligned incentives! Reminds me of the Cobra Effect [1]. When you reward “fixing a bad thing” you will get more of the bad thing to be fixed.

1: https://en.wikipedia.org/wiki/Cobra_effect

skybrian on July 11, 2019 | | | [–]

Seems like maybe you want to reward fire fighters and also reward fire prevention?

micahgoulart on July 11, 2019 | | | [–]

boy, do i have a podcast episode for you: https://casefilepodcast.com/case-98-the-pillow-pyro/

mzno on July 12, 2019 | | | [–]

from a complete rando: thanks for posting this — will listen to it later today.

azhenley on July 11, 2019 | | | [–]

This gives me hope that one day I will be able to run a startup. The big tech companies aren't too different than the rest of us after all...

kgraves on July 11, 2019 | | | [–]

Agreed, the only thing that a showstopper for me is the money and talent, It is still a struggle to find talented people who want to work for a startup.

goobynight on July 11, 2019 | | | [–]

Even harder to find ones that wish to remain working for a startup!

fzort on July 11, 2019 | | | [–]

This is hilarious, thanks for sharing. I used to work at companies like this, except they weren't worth billions of dollars.

isostatic on July 11, 2019 | | | [–]

Neither was twitter in 2008, it didn't reach $1b until the end of 2009

kreetx on July 11, 2019 | | | | [–]

The story is most probably not true. Love the taco tunnel though :)

Edit: apparently the stories actually are true.

busterarm on July 11, 2019 | | | [–]

This is the same group of folks who wrote the infamous ranty blog shitting all over Rails back in...'11(?) when it was pretty clear that their workload wasn't suited to an RBDMS and ActiveRecord. They wrote their own message queue twice despite suggestions to use known tools before eventually giving up.

applecrazy on July 11, 2019 | | | [–]

That’s hilarious. Reminds me of a clip from the show Silicon Valley.

ParadisoShlee on July 11, 2019 | | | [–]

Is that beer story satire?

idlewords on July 11, 2019 | | | [–]

No, it is true.

kreetx on July 11, 2019 | | | [–]

Is it actually really true? The second part, too? I thought this can't be true and must be a (good) story just to amuse the readers - I guess I was wrong.

rco8786 on July 12, 2019 | | | [–]

I worked there for a bit. Sometime around 2014 I dropped a production DB table (via a fat finger, shouldn’t have even been possible in hindsight). It wasn’t consumer facing but the internal effect made it look like all of Twitter was simultaneously down. Mass hysteria there for 20 min or so.

colpabar on July 11, 2019 | | | [–]

Can someone explain the joke (about the beer) because I genuinely don't understand

edit: pretty please

baud147258 on July 11, 2019 | | | [–]

Each time the ops team brought Twitter back up, they receive good beer. So it would also mean that each time Twitter goes down, they could expect to receive the beer. Without idleword's actions, they would have an incentive (good beer) for having Twitter keep going down and not doing work to improve the stability.

arkades on July 11, 2019 | | | | [–]

Under the guise of preventing the ops team from being incentivized to create outages, he was selflessly stealing all the nice beer for himself.

ficklepickle on July 11, 2019 | | | | [–]

He took the beer because he wanted it. "Perverse incentives" are an excuse, because nobody is going to kill their production servers and all the panic that entails for like $10 worth of beer.

Zebfross on July 11, 2019 | | | | [–]

Sounds like the guy was bragging about his SQL skills to avoid locking the database but ended up locking the database anyway (thus, people running around)

treis on July 11, 2019 | | | | [–]

If the ops team got beer every time the servers went down (as a reward for fixing them) then they'd have an incentive for the servers to go down.

Jach on July 11, 2019 | | | [–]

We all understand the perverse incentives joke, I think what's confusing people here is whether there's some other hidden joke they're missing that suggests not to take OP at his word that yes, he did make off with someone else's gift, which is generally considered a dick move.

slackfan on July 11, 2019 | | | | [–]

What the hell are all of you smoking, some moderately expensive alcohol is nowhere near enough reward to take down a service.

colpabar on July 11, 2019 | | | [–]

If it was a sure thing that the ops engineers were doing that, then sure, it'd be kinda funny. Otherwise it just seems like a dick move.

wtallis on July 11, 2019 | | | | [–]

The alcohol was an incentive to bring the service back up quickly, but not an incentive to prevent it going down in the first place. Twitter was going down often enough on its own that nobody needed to be motivated to help it crash (except that bringing it back up sooner gives it another opportunity to crash again sooner).

eitland on July 11, 2019 | | | | [–]

Operant conditioning is a thing and it works.

While I and you would not do this I’m afraid that it would somehow find a way to work in this case too.

slackfan on July 11, 2019 | | | [–]

Ops engineers don't get paid enough to fix dev fuck ups enough as it is. No amount of beer is going to fix that.

vokep on July 11, 2019 | | | | [–]

He's taking home the special expensive beer and not telling them about it because he cares about the health and well being of his team so much, and yet they wouldn't even consider him a hero for this, how ungrateful they are!

wedn3sday on July 11, 2019 | | | | [–]

If everytime the site was brought back up (because it had gone down), and the ops guys got free fancy beer, then the message pretty quickly turns into, "if the site goes down, I get rewarded."

southerndrift on July 13, 2019 | | | [–]

In other words, that beer gave you the motive to bring twitter down, which you inevitably did by asking that question.

el_benhameen on July 11, 2019 | | | [–]

The second story had me in tears. Especially given that I'm building a similarly scary query right now (thankfully not against live).

jfountain2015 on July 11, 2019 | | | [–]

https://thedailywtf.com/

dmix on July 11, 2019 | | | [–]

Woo startups.

jayflux on July 11, 2019 | | | [–]

> We got so engrossed in the details of this topic that half an hour passed before we noticed that everyone had stopped working and was running around in a frenzy. Someone finally ran over and asked him if he was doing a query, he hit Control-C, and Twitter came back up.

This would not be out of place as a scene in Silicon Valley

georgehotelling on July 11, 2019 | | | [–]

idlewords, the user you're replying to, was listed as a consultant on the show

idlewords on July 11, 2019 | | | [–]

For a later season. This was one of my favorite scenes on the show.

kevinlou on July 11, 2019 | | | [–]

Completely unrelated, but I find myself reading your post about Argentinian steaks at least once a year. It's perfect. https://idlewords.com/2006/04/argentina_on_two_steaks_a_day....

klausa on July 11, 2019 | | | [–]

No joke, this post was largely the reason I wanted to travel to Argentina.

The food lived up to the mental image I had after reading the post.

trystero on July 11, 2019 | | | | [–]

I just found and read that article yesterday. The writing is on another level.

GFischer on July 11, 2019 | | | | [–]

As an Uruguayan, I loved it and found it entirely accurate :)

mrep on July 11, 2019 | | | | [–]

Not the best quality, but there is a scene just like that: https://www.youtube.com/watch?v=Dz7Niw29WlY

dubcanada on July 11, 2019 | | | [20 more]

"I would intercept these rewards and put them in my backpack for the bus ride home, in order to avoid creating perverse incentives for the operations team. But did anyone call me 'hero'?"

Wait so you stole rewards for a team that was spending time (I assume extra or stressful) on something you didn't do or have any part in. And you want a cookie?

I mean I get it, the company was probably not great in it's infancy. But what?

pgrote on July 11, 2019 | | | [–]

I think OP is saying the rewards were confiscated so the team wouldn't begin breaking things on purpose to get a reward when they fixed it.

UncleMeat on July 11, 2019 | | | [–]

Yeah but does anybody believe that the engineers would deliberately break things so they could have to work in a stressful environment bringing things back up just to get some free beer?

alexanderdmitri on July 11, 2019 | | | [–]

If your incentives are aligned w/firefighting as opposed to fire prevention b/c management is not motivating and rewarding the extra work that goes into avoiding these scenarios in the first place, you're encouraging fire.

Jach on July 11, 2019 | | | | [–]

Indeed, the usual motivation to try and be called a hero for putting out the fire you started is much more valuable than free booze: a title promotion with a pay bump.

idlewords on July 11, 2019 | | | | [–]

I don't want a cookie; I want more $24/bottle Belgian beer.

JorgeGT on July 11, 2019 | | | [–]

You should submit a request to the Pinboard CEO...

yyyk on July 11, 2019 | | | | [–]

Wouldn't that have made you the one with a "perverse incentive"?

C14L on July 11, 2019 | | | [–]

That explains why he walked over to the DB guy and asked him to run an expensive query on the life system ;)

gnulinux on July 11, 2019 | | | | [–]

That's usually called stealing, or something a little softer than that. It's interesting that you shared that experience expected for us to laugh at it. The rest of the comment was hilarious and I'm happy you shared it, but that bit is very odd. I also see where you're coming from. But your act was ethically questionable.

sizzle on July 11, 2019 | | | | [–]

Just wanted to say that I enjoy reading your blog.

vorpalhex on July 11, 2019 | | | | [–]

It's a joke. Laugh, it's funny.

Dylan16807 on July 11, 2019 | | | [–]

It's one of those jokes where if the story isn't true then the entire basis for it being funny disappears. (And if it is true then the joke isn't good enough to make up for the actions.)

slackfan on July 11, 2019 | | | | [–]

Having worked on a lot of ops teams in unstable environments, it's just really dickish.

q3k on July 11, 2019 | | | [–]

I also have. idlewords' post is one of the funniest things I've read this week.

nomadlogic on July 11, 2019 | | | | [–]

yea as an ops engineer that's probably the worst violation of trust i've ever heard of.

reaperducer on July 11, 2019 | | | | [–]

Wait so you stole rewards for a team that was spending time (I assume extra or stressful) on something you didn't do or have any part in.

The HR department in my company does this, and then redistributes the gifts to everyone in a random drawing at the Christmas party.

One year some department got a bunch of PlayStations, and a couple of them ended up in my department. The only thing my department contributed to the kitty was candy. I bet some people in that other department were disappointed.

CiPHPerCoder on July 11, 2019 | | | | [–]

Finally we get the long awaited sequel to One Flew Over the Cuckoo's Nest...

One flew over the dubcanada's head.

dubcanada on July 11, 2019 | | | [–]

Wait what did I miss something? lol

CiPHPerCoder on July 12, 2019 | | | [–]

The joke.

100100010001 on July 11, 2019 | | | [3 more]

Hero? You’re a villain who steps on teammates. The worst part is you thought it’d be okay to share that and think we’d be on your side. Have you no shame?

idlewords on July 11, 2019 | | | [–]

My job was to make growth graphs for investor slide decks, so by definition I had no shame.

DanFeldman on July 11, 2019 | | | [–]

Or, if you had any shame, its growth would be up and to the right!

sandworm101 on July 11, 2019 | | | [–]

>> he hit Control-C, and Twitter came back up.

Monolithic architecture. When I did security work I fought this every day. Moving away from it is a nightmare of technical debt and heated debate about who should control what. I'm reminded of a story from the early days of MSN. The legend goes that in the late 90s MSN ran out of one cabinet, a single server. The server had redundant power supplies, but only one physical plug.

hn_throwaway_99 on July 11, 2019 | | | [–]

> Monolithic architecture.

This particular problem had nothing to do with a monolithic architecture. Your app can be a monolith, but that still doesn't mean your BI team can't have a separate data warehouse or at least separate read replicas to run queries against.

Dylan16807 on July 11, 2019 | | | [–]

It's not "nothing to do with". You're correct that a monolithic architecture does not imply that a single read query will lock the entire database. But it is a prerequisite.

hn_throwaway_99 on July 11, 2019 | | | [–]

Not really. I've seen more than one microservice-architected (admittedly, poorly) systems where, instead of the whole DB freezing up, just the one DB would freeze, but then all of the other microservices that talked to the frozen microservice didn't correctly handle the error responses, so now you had corruption strewn over multiple databases and services.

So, while true the failure mode would be different, "one bad query fucking up your entire system" is just as possible with microservices.

sho on July 11, 2019 | | | | [–]

And of course this is standard practise. I've contracted on largish apps before (rails! Shock!) and of course we provided read-only replicas for BI query purposes. I wouldn't have provided production access even if asked.

Anything else is simple incompetence and the macro-organisation of the code and/or services is irrelevant.

sandworm101 on July 11, 2019 | | | | [–]

If your website crashes because a single person ran a query, your system is too monolithic. You can have thousands of little microservices running all over the place, but a single query causing a fault proves that a vital system is running without redundancy or load sharing and that other systems cannot handle the situation. You have too many aspects of your service all tied together within a single system. It is too monolithic.

zbentley on July 12, 2019 | | | [–]

I think "monolithic" and "fragile" are orthogonal concepts.

Karawebnetwork on July 11, 2019 | | | [6 more]

> I would intercept these rewards and put them in my backpack for the bus ride home, in order to avoid creating perverse incentives for the operations team. But did anyone call me 'hero'?

Wait, I don't understand.

Why would anyone call you hero?

Are you suggesting that the team would deliberately crash the app to receive beers and that by stealing them you stopped this from happening?

Free drinks and free food is the standard here to reward teams when they spend extra unpaid time away from their families.

All of the posts asking the same question are being down voted. Am I missing something?

You said you were a contractor at the time. Unless you were on the management team I fail to see how this was your responsibility to choose what happened.

LeoPanthera on July 11, 2019 | | | [–]

> Am I missing something?

That it is a joke.

Karawebnetwork on July 11, 2019 | | | [–]

The humor must be lost in translation then, I don't see anything resembling a joke.

teddyh on July 11, 2019 | | | | [–]

> Are you suggesting that the team would deliberately crash the app to receive beers

https://en.wikipedia.org/wiki/Perverse_incentive

Karawebnetwork on July 11, 2019 | | | [–]

Yes, the cobra effect exists. Should this mean that everyone needs to stop all forms of positive reinforcement? I don't believe so.

I doubt anyone would risk a comfortable job at Twitter against a few bottles of beers. Even if they are really fancy, that's what... $20-50?

If this had been worded as a "Haha, I stole the bad team's beer" I would have laughed.

However, worded as "where is my reward for being smart and stopping the cobra effect?" that's just an humble brag and plain unfunny.

danielparks on July 11, 2019 | | | | [–]

He’s joking.

lukey_q on July 11, 2019 | | [–]

A lot of high-profile outages recently. Can't actually remember the last time Twitter went fully down. Have to confess I immediately assumed an issue with my own connection, even though every other site is working.

Unrelated, but for some reason the phrase "I have no mouth and I must scream" just popped into my head

rococode on July 11, 2019 | | [–]

Twitter is especially weird for this since it's often a platform where people talk about downtimes. I don't see this downtime mentioned on Reddit and I don't know of other sites where it might be discussed, so if Hacker News happened to go down at the same time, where would I go to talk about it with online strangers and find out if it's just me? Nowhere, I guess, I'd just wait it out with no extra insights on what's going on. A small reminder of what the world used to be like haha.

52-6F-62 on July 11, 2019 | | | [–]

This gets pretty rowdy:

https://downdetector.com/status/twitter

jfk13 on July 11, 2019 | | | [–]

For a simpler view of things, there's also https://downforeveryoneorjustme.com

zymhan on July 12, 2019 | | | [–]

I do love that this site still exists years later. I still use it regularly. It only took me like three years to remember the name right the first time.

georgehotelling on July 11, 2019 | | | | [–]

reddit feels like it has about a 4 hour lag time on most "breaking" events.

tgsovlerkhgsel on July 11, 2019 | | | [–]

This is the time it takes for news in major subreddits to gather enough votes on the "new" tab to make it to the main subreddit front page, then the actual front page.

Smaller subreddits seem to be less affected by this, which is why /r/toosoon (a subreddit dedicated to dark humor related to current events) is often surfaces news hours before other subs for people who have it in their subreddit list.

jacobgreenleaf on July 11, 2019 | | | | [–]

You could use the outages mailing list

https://puck.nether.net/mailman/listinfo/outages

https://puck.nether.net/pipermail/outages/2019-July/012527.h...

hi5eyes on July 11, 2019 | | | | [–]

it got brought up as quick as it did here

at least on r/twitter

eljimmy on July 11, 2019 | | | [–]

Yesterday, Stripe went down for a half hour, and the later yesterday, Google's Android Payment Validation API went down for more than 2 hours.

Stripe at least acknowledged their downtime. Google was oblivious, made no update to any of their status pages. Really horrendous awareness and support from Google per usual.

patmcguire on July 11, 2019 | | | [–]

Odd coincidence, seems like it might have been some upstream banks having a bad SCA rollout

paulddraper on July 11, 2019 | | | | [–]

Giving Amazon a run for their money I see.

the_af on July 11, 2019 | | | [–]

> Unrelated, but for some reason the phrase "I have no mouth and I must scream" just popped into my head

That phrase was coined by Harlan Ellison in his classic scifi short story to represent a situation of complete despair and powerlessness.

I don't think a lack of Twitter, Whatsapp, Instagram, Facebook or Gmail -- however inconvenient -- would fill me with that kind of existential dread :)

rPlayer6554 on July 11, 2019 | | | [–]

I got rid of all of those except Gmail (need it for work, school, etc) and WhatsApp (only use for one group chat) and it is very freeing. I realized social media (Instagram especially) breeds unhappiness with your life, while all of the models and "influencers" on it don't live realistic lives. In fact I've experienced first had that their posts are often doctored or don't acctually portray the situation accurately.

AnIdiotOnTheNet on July 11, 2019 | | | | [–]

Not those things specifically, but take a step back and look at the Rube Goldberg inspired complexity of modern technology and despair.

nemof on July 11, 2019 | | | | [–]

honestly i feel unmoored without tweetdeck on my second screen ticking by. i realise this is probably a bad thing, but getting my daily news and info, i'd say 90% comes from twitter.

pwenzel on July 11, 2019 | | | [–]

Mercury is in Retrograde through July 31.

tunesmith on July 11, 2019 | | | [–]

I never notice problems when I don't know about that, so I wish I hadn't read that. :)

cambalache on July 11, 2019 | | | [–]

It makes me wonder if a powerful malicious agent can devise a complex operation (planting people in several key places) and wipe out most of the databases of one these places. It would be interesting to see what would it happen if suddenly all of twitter of facebook is deleted , puff!, gone in a second.

idlewords on July 11, 2019 | | | [–]

Actually deleting stuff at these scales takes many, many days if you don't want it to be trivially recoverable.

gnulinux on July 11, 2019 | | | | [–]

You can't delete that much data in a matter of seconds digitally. You need to physically harm the hardware.

edwintorok on July 11, 2019 | | | [–]

Wipe the encryption key of your SSD, small amount of data to wipe, and the whole SSD is unrecoverable. FWIW at least OCZ SSDs have an encryption key flashed into it even if you didn't turn encryption on. Putting a new firmware on it can wipe this key and make your old data inaccessible. Source: got a buggy OCZ firmware that failed to make the device appear on the SATA bus, only way OCZ could fix it was to install new firmware which wiped the key and hence my data was unrecoverable.

gnulinux on July 11, 2019 | | | [–]

Interesting point. I stand corrected. Not sure why I didn't think that way.

Keloo on July 11, 2019 | | | | [–]

I recommend you watching Mr.Robot :)

cambalache on July 12, 2019 | | | [–]

I watched the first season but got tired of the constant morose attitude of the main character. Does it get better?

Keloo on July 24, 2019 | | | [–]

the mood of the character doesn't change. The plot is quite good though.

jasoncartwright on July 11, 2019 | | | [–]

Move fast and break things

malhotra_chetan on July 11, 2019 | | | [–]

I think they are past that stage! If a 28 billion company gets to say that I am not going to feel bad about my site going down.

neom on July 11, 2019 | | [–]

I miss fail whale. :(

https://www.theatlantic.com/technology/archive/2015/01/the-s...

qrush on July 11, 2019 | | [–]

Me too! :)

thomasjudge on July 11, 2019 | | | [–]

What is the new fail mascot called? It looks like a cartoonish-alien with a PacMan/snipper hand and another hand that looks like a burning fuse standing next to a bomb with a fuse lit that is split open so it also looks like a PacMan

twic on July 11, 2019 | | | [–]

It's a robot whose hand has fallen off.

jhsu on July 11, 2019 | | | | [–]

legendary

whatshisface on July 11, 2019 | | [–]

I remember once we were at three outages, someone posted that they thought three was a reasonably-sized random cluster given the rate at which services go down. How many outages have we had in the last 30 days, how many do we have per month on average, and how strongly can we reject the null hypothesis?

The formula for computing how unlikely this is is the Poisson distribution: `λ^k * e^-λ / k!`, where λ is the average number of outages every 30 days and k is the number of outages in the past 30 days. If you find the numbers, let me know what the answer is.

doomjunky on July 11, 2019 | | [–]

07/11/2019 Twitter outage

07/03/2019 WhatsApp, Facebook & Instagram outage

07/02/2019 Cloudflare outage (Discord, 9gag, Medium)

07/02/2019 Google Cloud glass fiber damage (Google Services)

06/24/2019 Verizon route leak (Cloudflare, Google, Amazon, Reddit)

06/02/2019 Google Cloud excalated outage progression (G Suite, YouTube)

lordnacho on July 11, 2019 | | | [–]

The outages might not be independent. Chances are these services are cross integrated at some level.

whatshisface on July 11, 2019 | | | [–]

They are, but they're going down on different days. Whatever effect is left over could be accounted for by looking at the postmortems and not counting "we were down because AWS was down."

lopespm on July 11, 2019 | | [–]

A comment made before by another user about Facebook, Instagram and WhatsApp outages offers an interesting perspective:

"This outage coincides with FBs PSC (performance summary cycle) time. I wonder if this is folks trying to push features so they get “impact” for PSC."[1]

I wonder if the recent outages on other well known services could be heavily influenced by a similar phenomenon. If this holds water, it would be interesting to have an article or study around this issue. I certainly would be interested in reading it.

[1] https://news.ycombinator.com/item?id=20350579

mikece on July 11, 2019 | | [–]

I posted the question on Slack "How do you spread the word when Twitter goes down?" People thought that was so hilarious... until they realized Twitter was actually down.

Honestly, "Hacker News" was my answer which seems to be effectively correct -- and today I learned about the existence of twitterstat.us!

djhworld on July 11, 2019 | | [–]

https://downdetector.co.uk/status/twitter

is a pretty good resource too

pcora on July 11, 2019 | | [–]

google, apple, microsoft, facebook.. and now twitter? I keep asking the same, when is amazon's outage day?

unreal37 on July 11, 2019 | | [–]

July 15 and 16, coming up!

degenerate on July 11, 2019 | | | [–]

Free shipping on outages for all prime members!

rahuldottech on July 11, 2019 | | | [–]

You missed cloudflare, stripe, slack

SteveGregory on July 11, 2019 | | | [–]

The Google, Cloudflare, and Stripe outages all affected Shopify. So Shopify has had several meaningful outages over the past few weeks.

rmoriz on July 11, 2019 | | | [–]

Isn‘t Shopify a Fastly customer? Do they still use Cloudflare?

SteveGregory on July 11, 2019 | | | [–]

I was also under that impression until the Cloudflare event happened. I do not actually know what their dependency was, but all shops were taken offline.

victorbojica on July 11, 2019 | | | | [–]

Reddit too :(

freehunter on July 11, 2019 | | | [–]

Reddit goes down constantly though, not a great benchmark.

gnulinux on July 11, 2019 | | | | [–]

Reddit goes down almost every day. I see "Ow!" thing a few times a week. Full confession: I'm constantly on reddit. :(

pcora on July 11, 2019 | | | | [–]

oh, right! the slack one sucked

daveguy on July 11, 2019 | | | [–]

Hm. If you were going to plan a worldwide internet outage (or the appearance of one) you could test your ability to take down individual services first and then take them all down at once.

EvanAnderson on July 11, 2019 | | [–]

All I can think, smugly, is that DNS, SMTP, HTTP, etc. don't "go down". Twitter should be a protocol, not a website.

zzo38computer on July 11, 2019 | | [–]

Yes, I agree, and I invented a Netsubscribe protocol with a similar use. (There is also ActivityPub, but Netsubscribe is much simpler.) (And then there is other stuff where there are already suitable protocols for too, such as SMTP, NNTP, IRC, etc)

Elidrake24 on July 12, 2019 | | | [–]

DNS absolutely -goes down-, though in much more entertaining ways.

EvanAnderson on July 12, 2019 | | | [–]

All the DNS in the world can't fail at once. All the Twitter can.

MYEUHD on July 12, 2019 | | | [–]

This is because Twitter is centralized.

kevinlou on July 11, 2019 | | [–]

It's weird seeing the go-to downtime tracker go down. I'm so wired to check Twitter that I kept refreshing for a good 10 seconds.

edwintorok on July 11, 2019 | | [–]

Time to make HN the go-to downtime tracker. Did anyone measure HN's uptime over the past years?

snazz on July 11, 2019 | | | [–]

I don’t know about empirical data, but HN occasionally goes into a mode where page loads that don’t hit the cache (logged in users) take 10+ seconds. I haven’t been on when it’s gone down completely since i signed up (not too long ago).

SimeVidas on July 11, 2019 | | | [–]

1. Wow, Twitter is down.

2. I bet it’s trending. Let me check.

3. …

4. Oh.

abadabadingdong on July 11, 2019 | | [–]

I wonder how many conspiracies this single outage will trigger.

danso on July 11, 2019 | | [–]

Given that today was the White House's "Social Media Summit", no doubt there will be a few conspiracies floated. I'm betting "Twitter wanted to block out all the criticism coming form the summit!" will be a popular one.

indigochill on July 11, 2019 | | | [–]

I've got a couple conspiracy theories at the moment:

1. It's a deployment of some infrastructure change the government got the big tech companies to sign onto.

2. It's a "shot across the bow" from some external party to demonstrate their control over major infrastructure.

2.a. Also could have been a mix of 1 and 2. The government orchestrated the outages in order to add fuel to the hysteria over Chinese "spy chips". However, given the story every time seems to be "Someone goofed a configuration", this theory doesn't seem to have much life left in it.

rootforce on July 11, 2019 | | | [–]

Theory: The internet is really just a series of green tubes underground and the last earthquake knocked out the primary plumber named Mario.

moate on July 11, 2019 | | | [–]

My theory: The internet in fact IS a big truck, and it got a flat tire.

brooksgarrett on July 11, 2019 | | | [–]

No. The internet is in a box and Jenn broke it.

iamnotacrook on July 11, 2019 | | | [–]

Yeah, I'm sure it's all a big coincidence. If someone was targeting them all I'm sure the effects would be entirely different.

lanyard-textile on July 11, 2019 | | | [–]

I'm counting three in these comments so far :)

dvduval on July 11, 2019 | | | [–]

The Tron man's biker shorts ripped right in the crotch, and they're rushing to repair the split in his pants.

dthedev on July 11, 2019 | | [–]

Pray for the team that has to handle this ticket.

falsedan on July 11, 2019 | | [–]

That’ll be fine, a post mortem will show that ops weren’t the cause and their comp package will help them get over this little package of stress

idlewords on July 11, 2019 | | [–]

On the status posts in particular I really miss the ability to sort comments by new on this site.

binarymax on July 11, 2019 | | [–]

Just last week I found a setting deep in my profile config that let me disable ‘recommended tweets first’ or similar. When it’s back up I can check the exact setting

idlewords on July 11, 2019 | | | [–]

I mean on Hacker News, not Twitter.

dewey on July 11, 2019 | | [–]

It has been a long time since I've seen the equivalent of the fail whale on Twitter. It was a weekly occurrence back in the days.

unwabuisi on July 11, 2019 | | [–]

I wish they would bring back the fail whale!

jachee on July 11, 2019 | | [–]

Agreed! Their little amputated robot looks too much like the reddit Snoo.

KuhlMensch on July 11, 2019 | | [–]

Years ago I read an amazing article (from HN) about how (complex) config, rather than code ends up being the cause of outages at scale. I always reflect on that, when designing almost anything these days.

cfors on July 11, 2019 | | [–]

I would love to see that article. That isn't surprising in the slightest to me.

Just a quick nitpick. A bad config more often than not in my experience is opening up a code path that is riddled with bad code, whether it was not vetted with the proper testing or the wrong environment.

But to your point, I think most people would agree that configuration changes are almost never reviewed with the granularity of a code change. Yes, we may do our due diligence with an approved PR and vetting the configuration and testing the change before deploying it. But, reviewing a PR with a bad config change in json or yaml doesn't necessarily tell you about the code paths that it will open up which makes it much harder to reason about the consequences that a potential bad config push would do.

We should always be reflecting about how adding knobs (configuration) to our programs greatly increases the complexity of the service.

tschellenbach on July 11, 2019 | | [–]

Really curious which part of their infrastructure was the root cause.

brokensegue on July 11, 2019 | | [–]

eh, it's a boring story

source: work there

tschellenbach on July 11, 2019 | | | [–]

someone tripped over a cable?

brokensegue on July 11, 2019 | | | [–]

that would be funny at least.

no, just a bad config deploy.

segmondy on July 11, 2019 | | | [–]

It's always configuration change during deployment. Bad configuration, someone messed up a yaml or json config file.

anonymousjunior on July 11, 2019 | | [–]

the internet is just falling apart these days

coldpie on July 11, 2019 | | [–]

http://www.motherfuckingwebsite.com/ is still up. All is well.

reverite on July 11, 2019 | | | [–]

Reminds me so much of The Best Page in the Universe.

maemilius on July 11, 2019 | | | [–]

The internet is held together with pixie dust and prayers (sometimes duct tape and occasionally spit and rust).

smhenderson on July 11, 2019 | | | [–]

I work for a small shop so we have to use a lot of bubble gum too!

devnonymous on July 11, 2019 | | | [–]

Quite honestly, I wonder how much of this can be traced back to the fact that there are way too many layers of abstractions between the browser requests coming in and a cpu actually executing something to serve the request.

pmlnr on July 11, 2019 | | | [–]

nah, the internet is fine.

Only the self-entitled "gatekeepers" are failing, and I'm glad they do.

Ken_Adler on July 11, 2019 | | | [–]

https://www.pcworld.com/article/155984/worst_tech_prediction...

abstract7 on July 11, 2019 | | [–]

My guess is that the whales have been securing parts of their codebases from internal leaks or something related but for security. Workflow disruptions. It may be bad code bitting them weeks or more after they pushed it.

There has been many embarrassing and controversial leaks this year. Allegations of uneven TOS enforcement. Hence the WH Social Media Summit. Could also be security related combo ahead of the elections that also is a bit sensitive for low-trust devs.

Imagine code getting pushed that only a smaller subset of devs are privy to. Possibly pushing obsfucated code or launching services outside of the standard pipeline.

Remember that the spectre and meltdown patches for the Linux kernal was a nightmare because the normal open and free-to-discuss-and-review workflow was broken. That applies too in these situations with large codebases that internally are 'open-source'.

nevi-me on July 11, 2019 | | [–]

I was in the middle of a loosely legal argument about the politics of my country, and tonight I had found obliging people to reason with me instead of calling me names.

The discussion was beautiful, until the app stopped working. I even thought I was blocked. I'm glad that it's just down.

ibdf on July 11, 2019 | | [–]

The real question is, if twitter were to go down permanently, what social media tool would the president use? Would he switch to something else or not use anything at all? I can imagine whatever tool he were to choose would become popular over night.

syn0byte on July 11, 2019 | | [–]

I was gonna make a joke about how much hate Trump generates ruining a brand like that but then I remember Hugo Boss, BMW, IBM, VW and Bayer among others were all knuckle deep with the Nazi's and every single one is still a popular brand to this day.

hjanssen on July 11, 2019 | | [–]

I'm wondering if this has anything to do with the announced new "Look and Feel" for Twitter. I got a banner yesterday talking about it and now Twitter is down. Maybe they messed up something in preparation for the rollout?

farisjarrah on July 11, 2019 | | [–]

This is bad... There is an Amber Alert in California for an abducted child and on the amber alert that popped up on my phone was a link for more information... That link took me straigt to twitter, which is down.

Zenst on July 11, 2019 | | [–]

Guess for some media outlets - it's going to be a slow news day.

kgraves on July 11, 2019 | | [–]

At long last, productivity has been restored. No more time wasting anti-intellectual arguments on a platform that has provided little to no value.

Unfortunately, the poor SRE's at the company will reboot the system and the masses will resume their daily centralised content consumption. Oh well, I will just have to go to Mastodon or other instances for curated content (with no algorithms messing up my feed).

I encourage you join Mastodon and the decentralised web.

As far as I am concerned with Twitter, nothing of value has been lost.

focuser on July 11, 2019 | | [–]

yup, What's worse than releasing a product on twitter and found it down a few minutes later...

ravedave5 on July 11, 2019 | | [–]

Quick un-release it and rerelease after it's up!

MiddleEndian on July 11, 2019 | | | [–]

Right now, probably working at twitter

JimBrimble35 on July 11, 2019 | | [–]

There seems to be a high correlation between outages and security breaches. My guess is that at some point in the future there the consequences of these shutdowns will come to light in the media.

That, or this is all related to high profile sites being required to install some additional level of infrastructure which is being required in secret by an organization like the NSA.

Both theories require a fairly thick tin foil hat, but honestly.. I have a hard time believing that it's just random downtime.

totaldude87 on July 11, 2019 | | [–]

Down on a plain Thursday with no major news or anything , so definitely doesn't look like a infrastructure spike or anything.

Could be bad deployment or someone decided to pull few plug(s)

oldgun on July 11, 2019 | | [–]

I remember seeing something about Twitter bringing out a newer version, with more features and goodies. This could be related to that?

Rolling out new features is always stressful I guess.

the-dude on July 11, 2019 | | [–]

My pet theory still is Huawei equipment being decommissioned.

totaldude87 on July 11, 2019 | | [–]

Why the share price doesn't go down drastically when services like Facebook, Instagram , Twitter goes down. Every minute down is lost business right?

miguelmota on July 11, 2019 | | [–]

Reminds me of the Black Mirror 'Smithereens' episode where Jack is the yoga meditating hippie, but this time he decides to finally shut it down

jorblumesea on July 11, 2019 | | [–]

Is anyone a little suspicious that every large US tech company has had an outage recently? Wtf is going on.

China messing with us due to tariffs perhaps?

gnicholas on July 11, 2019 | | [–]

I first noticed the outage about 10 mins ago in Safari (Mac). It repeatedly gave me errors, even though Brave (Mac) was working fine. My iPhone app also worked fine (and appears to still be working).

Why would one browser work but another not work, on the same computer at the same IP? The only difference is the account I'm logged in through (personal/work).

dewey on July 11, 2019 | | [–]

You probably just hit different load balancers with your different browser sessions.

AznHisoka on July 11, 2019 | | [–]

I did notice something strange a few days ago. If you ran a search on Twitter, and scrolled down, it would mysteriously stop showing tweets past a certain time (July 1st, in my case). I wasn't sure if this was an internal change, or a bug of some sort. Maybe this is unrelated to the outage but strange coincidence.

BuckRogers on July 11, 2019 | | [–]

No one is missing out on anything important. Mostly noise. For folks with my same mindset, I do believe I've cracked the code on most social networks, as far as what makes them worthwhile at all.

Twitter- it's the police scanners. Find them for your city, it's really the best way to know what's going on around you. Better than the papers, which can't report on everything or hide stuff for business's financial interests.

Instagram- is pretty much only useful for models, whatever sort you prefer. If you like models and it brightens your day to see a beautiful woman, as it does for me, it gives Instagram a purpose other than the noise it shares with most social networks. If it makes you happy and smile, it's a good thing. No, I'm not into pornography or anything risque. Though if I were, that would probably be ok, I simply value keeping a little imagination and mystery in my life and don't watch it. Nor are the models that I follow doing it as far as I know, but that's their decision. They mostly survive off product placement and payment for additional photos. Nothing wrong with innocent modeling, just like the olden days of pin-up girls and I hope more people support them in their endeavors.

Facebook- this one is better understood by most people, hence the popularity, but it's definitely the whitepages aspect of it. I use the instant messaging more than anything, as it's difficult to have an index of your old friend's emails until you're in touch with them again. Also, people just don't keep up on emails and maintain inbox zero very well.

Youtube- this, other than RSS feeds (through Firefox's Livemarks extension) is my main source of information. I'm not into cat videos, but I certainly love learning about astrophysics and other topics from Youtubers that are more knowledgeable than I am.

luhego on July 11, 2019 | | [–]

I didn't realize how much I like using twitter between tasks until now. Hope it gets back soon.

ssully on July 11, 2019 | | [–]

Noticed the outrage when I opened twitter when waiting for a long (10 min) process to run. What am I supposed to do? Socialize with coworkers?

sneakernets on July 11, 2019 | | [–]

Stuff like this makes me wonder if the Internet really is super vulnerable, and the only reason there isn't a mass disruption of communication all the time is because some script kiddies's Pizza Rolls were perfect today so he held off on attacking a backbone.

dmitrygr on July 11, 2019 | | [–]

This is actually true of most of the modern world.

It is mostly still together because the venn diagram of those who want to see the world burn, and those who are clever enough to make it so has a very very small intersection, since the latter group is quite invested in the world not being on fire.

filleokus on July 11, 2019 | | | [–]

Yes, I think people tend to underestimate the chaos some malicious actors could do by even "just" coordinated litteral burning of stuff. Think a dozen people with cans of gasoline and matches spread over three different suburbias in a city, wooden churches or other wooden buildings of interest. Or simple firebombs on underground subway plattforms. Would probably not be that lethal but I guess very frighting.

(Not even mentioning explosives etc, but this could probably be prepared in like an hour by just purchasing supplies at local gas stations in any country)

chasd00 on July 11, 2019 | | | [–]

as someone else mentioned above this is a website/service that's down, not the internet. When you can't open a socket to a server in another network then the Internet is down.

indigochill on July 11, 2019 | | | [–]

Read up on BGP hijacking. The internet really is super vulnerable.

rglover on July 11, 2019 | | | [–]

This is hilarious.

ibaikov on July 11, 2019 | | [–]

Well, Tim Berners-Lee told that internet is a fragile technology and should be re-made, it wasn't meant to be this big. This is not really much related to these problems, but I think it should fire up this discussion, it is that important.

geocrasher on July 11, 2019 | | [–]

So the little blue bird of hate has finally crashed and burned. I'm okay with that.

Kye on July 11, 2019 | | [–]

Down again. Notifications work, but the timeline is broken. Trends still come and go.

twinkletwinkle_ on July 11, 2019 | | [–]

Pour one out for the SREs who had plans they were looking forward to this evening.

djhworld on July 11, 2019 | | [–]

I had a tab open from earlier and it refused to load larger images of their thumbnail counterparts which I thought was odd, which suggests quite a few services affected (i.e. their CDN, or image hosting services)

anonymak on July 11, 2019 | | [–]

Was receiving 500 on the main page for some time. Seems to have recovered now.

mikece on July 11, 2019 | | [–]

I am curious if twitterstat.us has an API... I'm thinking that automated unit testing of apps that integrate with Twitter should be checking with twitterstat.us to verify if Twitter is even up...

throwawaybxcf on July 12, 2019 | | [–]

The chances of a hate-fuelled program somewhere in the world dropped for a short period.

It wouldn’t be surprising if a large number of people, as of 2019, are secretly rooting for Twitter to permanently go away.

jbverschoor on July 11, 2019 | | [–]

What’s the fallback for twitter if twitter is down, or worse: stops

idlewords on July 11, 2019 | | [–]

Looks at folder marked "projects" in dread

mavdi on July 11, 2019 | | | [–]

Literally 0 negative consequence.

Kye on July 11, 2019 | | | [–]

Mastodon

I started a thread for sharing instances: https://news.ycombinator.com/item?id=20414359

sascha_sl on July 11, 2019 | | | [–]

Note: Mastodon isn't the best thing to call it, it's the Fediverse and Mastodon is one piece of it.

There's also Pleroma and Misskey.

bovermyer on July 11, 2019 | | | [–]

Mastodon, in my case.

kgraves on July 11, 2019 | | | [–]

Any decentralised social media alternative that is not Twitter, like Mastodon.

Anything else is centralised and inherently evil. Even the orange website.

sascha_sl on July 11, 2019 | | | [–]

part of me wishes it was the fediverse

part of me hates the idea

pjc50 on July 11, 2019 | | | [–]

We'll have to do our own jokes.

skc on July 11, 2019 | | [–]

Heh, just last week there was a running joke on Twitter about how Facebook and Whatsapp users were busy scrambling to learn how to use Twitter due to the CDN outage over there.

wishrider on July 11, 2019 | | [–]

I've scraped a lot of tech news from twitter if anyone needs it https://uptopnews.com/

joojia on July 11, 2019 | | [–]

As a Twitter-junkie, I find this depressing.

joering2 on July 12, 2019 | | [–]

FYI its 12:35 AM EST and some parts are still down. Reseting password screen doesn’t work for example.

segmondy on July 11, 2019 | | [–]

One day one of these services will go down and they won't be able to bring it back up.

phil248 on July 11, 2019 | | [–]

Oh, that explains why I felt the world become slightly less hateful all of a sudden.

heisnotanalien on July 11, 2019 | | [–]

Good. Let's hope it stays down and the world will be a better place.

ejz on July 11, 2019 | | [–]

No! Now I have to work. :(

TremendousJudge on July 11, 2019 | | [–]

but where are we gonna go to get live status updates on the issue??

siriniok on July 12, 2019 | | [–]

I was afraid that they are rolling out their redesign, thanks God.

VectorLock on July 11, 2019 | | [–]

Its like people don't remember the days of the failwhale.

bookofjoe on July 11, 2019 | | [–]

Is down right now - Charlottesville Virginia

steverob on July 11, 2019 | | [–]

Where is twitter when I really need it? :D

frostyj on July 11, 2019 | | [–]

whats wrong with all giant companies and their 'internal configuration change' these days?

Nican on July 11, 2019 | | [–]

Oddly enough- The Mastadon (Open-source decentralized Twitter clone) instance that I use is also down for maintenance.

elcapitan on July 11, 2019 | | [–]

Seems to be partially back.

omarforgotpwd on July 11, 2019 | | [–]

they should have thrown up the fail whale for old times sake

pulkitsh1234 on July 11, 2019 | | [–]

ahh..another "configuration change" ?

WheelsAtLarge on July 11, 2019 | | [–]

Good, let's hope it stays down. If only we were so lucky.

Copenjin on July 11, 2019 | | [–]

It's back.

miduil on July 11, 2019 | | [–]

mastodon.social is also suddenly down.

gargron on July 11, 2019 | | [–]

No, it's not

miduil on July 11, 2019 | | | [–]

It was for ~2 Minutes.

https://imgur.com/a/4Rh6Vxf

DannyB2 on July 11, 2019 | | [–]

Spock: The loss to the galaxy may be irretrievable.

slackfan on July 11, 2019 | | [–]

Here's hoping it stays that way!

cgy1 on July 11, 2019 | | [–]

I'm sure it's purely coincidence, but interestingly Trump's also holding his Social Media Summit right now.

Balgair on July 11, 2019 | | [–]

I had the exact same thought!

gigatexal on July 11, 2019 | | [–]

Works fine for me. Lucky i guess.

ga-vu on July 11, 2019 | | [–]

We're all gonna die!

cryptozeus on July 11, 2019 | | [–]

seems to be up now

aphextim on July 11, 2019 | | [–]

Call me a conspiracy theorist but seems odd the timing of this at the same time Trump is having his "Twitter Summit".

Then again things have been going down over the past two weeks so it's probably just coincidence.

malicioususer11 on July 12, 2019 | | [–]

6) hypersentient general ai has inception insurrected mkultra and thereby turned the entire internent into a singular coordinated psyop experiment designed to torture all of humanity for its own amusement.

6.

we are all doomed.

:)

kyledrake on July 11, 2019 | | [–]

In a slightly better alternate universe it stays down.

0xFFFE on July 11, 2019 | | [–]

I share your sentiment to a certain extent, but I believe Twitter is a necessary evil. There should be an alternative to main stream media for people.

azernik on July 11, 2019 | | | [–]

It's not so much an alternative to mainstream media as an informal gathering place for workers in the mainstream media

busterarm on July 11, 2019 | | | [–]

It became the formal gathering place once Twitter bent the knee and started handing out blue checkmarks like candy to every low-level media personality basically on-demand.

At some media companies, getting you your blue checkmark is part of the HR on-boarding process, ffs.

Journalists are clearly the heaviest, most important Twitter users at this point.

squarefoot on July 11, 2019 | | | | [–]

Sadly, for most people Twitter is mainstream media.

roywiggins on July 11, 2019 | | | [–]

Twitter is not that popular. The problem is that mainstream journalists spend too much time on it and confuse it for the real world. How many "news" stories are just repackaged tweets? A lot.

iamnothere on July 11, 2019 | | | [–]

> Twitter is not that popular.

https://www.similarweb.com/website/twitter.com Global Rank: 6 Country Rank: 9 Visits: 3.93B

https://www.similarweb.com/website/cnn.com Global Rank: 101 Country Rank: 29 Visits: 476.20M

https://www.similarweb.com/website/nytimes.com Global Rank: 180 Country rank: 58 Visits: 278.41M

ionforce on July 11, 2019 | | | | [–]

What is the definition of "mainstream"?

organsnyder on July 11, 2019 | | | | [–]

To be fair, many (most?) mainstream journalists are extremely active on Twitter.

kgraves on July 11, 2019 | | | [–]

In my opinion that is very unfortunate.

eqdw on July 11, 2019 | | | | [–]

That alternative existed before Twitter, and will continue to exist after Twitter

papito on July 11, 2019 | | | | [–]

I follow mostly "main stream media" on Twitter. It should be an aggregator of professional news, not "alternative" facts.

SllX on July 11, 2019 | | | [–]

Twitter is where the professionalism in the mainstream media goes to die. Between headlines you’re likely either getting insider baseball, a circlejerk or a crowbar in the face with “mainstream” written on the side in sharpie. If all you have are the headlines, then you’ve chosen Twitter to be your alternative “RSS” client. A valid choice for most and not one I would criticize either, but not entirely what it is or “should” be.

rvz on July 11, 2019 | | | | [6 more]

[flagged]

hlieberman on July 11, 2019 | | | [–]

Impressive; it's been quite a while since a comment made me roll my eyes so hard I got a headache.

kgraves on July 11, 2019 | | | | [–]

> ...quite a viable alternative to Twitter due to its decentralised thanks to Mastodon.

Gab isn't what everyone makes it out to be, they were told to build their own alternative and built it. It's even better that it's using Mastodon's stack to build their own decentralised social network.

Gab, Mastodon and others like it are future of social networks. Centralisation is evil.

barneygumble742 on July 11, 2019 | | | | [–]

Is this the neo-nazi Twitter alternative?

qwsxyh on July 11, 2019 | | | [–]

Yes

kgraves on July 11, 2019 | | | | [–]

As much of my disgust of these people, I can find you literal Nazi's that are still on Twitter.

saidajigumi on July 11, 2019 | | | [–]

Which reminds me of a related quip of mine: "Like Fight Club, but for Facebook."

cagrimmett on July 11, 2019 | | [–]

I hope Jack went rogue and nuked it after his most recent hot yoga vision.

That's the kind of shakeup we need in this world.

sp332 on July 11, 2019 | | [–]

They have been teasing/warning about a new version of Twitter that was due to launch soon. I wonder if this is it.

krferriter on July 11, 2019 | | | [–]

It's a bold strategy, I'd enjoy seeing how that would play out.

anirudh24seven on July 11, 2019 | | | [–]

Is that a reference to the Black Mirror episode?

RIMR on July 11, 2019 | | [2 more]

[flagged]

idlewords on July 11, 2019 | | [–]

Maybe we all got suspended

ankushnarula on July 11, 2019 | | [–]

This is the one site outage that might actually be a good thing for the world.

musgrove on July 11, 2019 | [–]

I wondered why there was a sudden peace on Earth.