Apple cloud services outage

octernion · on June 2, 2016

Days when multiple outages happen always remind me of this wonderful article: http://www.stilldrinking.org/programming-sucks

mxuribe · on June 2, 2016

I've never read this till now...it. is. absolute. genius!!! I wish I could give you multiple upvotes for this. Thanks for sharing this!

martinp · on June 3, 2016

You should also read James Mickens' USENIX articles if you haven't: http://mickens.seas.harvard.edu/wisdom-james-mickens

The humour and tone are similar to the "Programming Sucks" article. I recommend starting with "The Night Watch".

Fej · on June 3, 2016

I've read that before, and even have it bookmarked, but it was still worth it to read it again. It's a masterpiece.

...that's something scary to say about the tech industry, right there.

akulbe · on June 2, 2016

Maybe I'm crazy, but it seems too coincidental that multiple services

(Amazon, Facebook, Apple... that I know of, so far)

have been affected, on the same day.

zalmoxes · on June 2, 2016

It's related to Akamai DNS.

kornish · on June 2, 2016

Do you have a source?

zalmoxes · on June 2, 2016

Trying to access some APIs I work with (DEP, VPP) earlier today and seeing 'akadns' as the common provider.
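
One way to check a claim like this is to compare the names each hostname resolves through and see which provider domain they share. A minimal Python sketch, assuming the standard library's `socket.gethostbyname_ex`, which returns the canonical name and aliases the local resolver reports (this varies by platform and resolver, so treat the result as a hint rather than proof). The helper names `provider_of`, `common_providers` and `resolve_chain` are made up for illustration and are not from the thread.

```python
import socket
from collections import Counter


def provider_of(name, labels=2):
    """Return the last `labels` DNS labels of a name, e.g. 'akadns.net'.

    Naive on purpose: it knows nothing about public suffixes like 'co.uk'.
    """
    parts = name.rstrip(".").lower().split(".")
    return ".".join(parts[-labels:])


def common_providers(chains):
    """Given {hostname: [names seen while resolving it]}, return the
    provider domains that appear in every hostname's chain."""
    counts = Counter()
    for names in chains.values():
        # Count each provider at most once per hostname.
        counts.update({provider_of(n) for n in names})
    return sorted(p for p, c in counts.items() if c == len(chains))


def resolve_chain(hostname):
    """Canonical name plus aliases, as reported by the local resolver."""
    canonical, aliases, _addrs = socket.gethostbyname_ex(hostname)
    return [canonical, *aliases]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python cname_check.py host1 host2 ...
    chains = {h: resolve_chain(h) for h in sys.argv[1:]}
    print(common_providers(chains))
```

If every hostname passed on the command line shares a provider domain in its chain, that domain is printed; an empty list means no common provider was visible from this resolver.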

geerlingguy · on June 2, 2016

Charter (ISP) was also out for over an hour—first time this year it was a complete outage and not just a DNS issue.

feld · on June 2, 2016

I've been on Charter all day VPN'd into work with no interruption. Although I have a business connection at home, so it's possible I'm avoiding whatever middleboxes they jam residential traffic into.

vonklaus · on June 3, 2016

My ISP, Cox, had a server failure for several hours about 2 weeks ago. It was a DNS issue as well. I don't want to take 2 unrelated things and call them similar, but if some of the above posts are substantiated, several huge players losing peering/connectivity and 2 very large ISPs having non-trivial multi-hour, multi-region DNS outages is bad. Horrible coincidence.

I have been losing Google Docs intermittently all day (and I have been working nearly exclusively in Google Docs all day). Maybe it's just me, but I have been unable to upload photos (same one as yesterday to gdocs), and from the drive/docs dash I would lose connectivity and have severe lag loading up a doc I just worked in. Had to close the browser several times.

Keyframe · on June 2, 2016

Maybe one of the backbone carriers has a bad day?

jlgaddis · on June 2, 2016

Perhaps, but I (network engineer at an ISP) haven't heard anything like that, and all of these services have multiple upstreams and private peerings.

astrodust · on June 2, 2016

Comcast has as few peerings as they can legally get away with. Redundancy isn't high on their list of things to care about, apparently.

jlgaddis · on June 3, 2016

> ... as they can legally get away with.

Can you clarify a bit? I'm not sure what you mean.

An ISP isn't, by law, required to peer with anyone.

astrodust · on June 4, 2016

If they didn't peer with anyone they'd be a closed network and it would be illegal to advertise that as "internet access".

They're legally required to live up to the claims of their product. If they say X Mbit/s then there needs to be at least a plausible case made that they can achieve those results. Anything significantly less is to invite a class-action lawsuit.

Karawebnetwork · on June 3, 2016

Well that was strange. Wonder what it was.

draw_down · on June 2, 2016

cookiecaper · on June 2, 2016

New NSA program rolling out? O_O

kchoudhu · on June 2, 2016

You laugh, but I guarantee you that the NSA has systems reliability issues just like the rest of us.

jrockway · on June 2, 2016

The last person that guaranteed me something about the NSA is now exiled in Russia ;)

cookiecaper · on June 3, 2016

This comment turned out to be surprisingly controversial for a pretty obvious joke. I've observed it at +4, down to +1, back up to +3, and now it's at 0! Heh. Didn't mean to rustle jimmies of any NSA contractors that may read HN. I'm sure in real life your rollouts are much smoother than this. :P

linkregister · on June 3, 2016

I'm sure the NSA employees and contractors are just happy to see a non-threatening post about the agency.

The downvotes probably came from people tired of the joke. It is an oft-repeated joke.

draw_down · on June 4, 2016

Well, I agree it got too much attention, that's for sure.

sam_pointer · on June 2, 2016

No. That was the Syria/Iraq/Iran BGP outages from earlier today.

api · on June 2, 2016

Are they all on EC2? Maybe EC2 is throwing a fit.

luhn · on June 2, 2016

Amazon is, but Apple and Facebook both operate their own data centers.

Edit: I take that back, Apple uses several cloud providers, including AWS.

therein · on June 2, 2016

Apple does use S3 for storage. You didn't hear it from me.

ghshephard · on June 2, 2016

You just have to run Little Snitch and see all the AWS servers that Apple applications hit. I don't think it's a secret that a ton of Apple applications are backended by AWS.

niels_olson · on June 3, 2016

All hail Little Snitch. So much insight available in that app. I wish there were rule sets for Little Snitch like there are for AdBlock Plus.

AckSyn · on June 3, 2016

lost -i in the Terminal.app will do this as well.

With a little grep-fu you could have yourself a good watchdog.

blowski · on June 3, 2016

I get 'command not found' so I'm assuming I have to install something? But I can't find anything when I search Google because of the generic name.

syassami · on June 3, 2016

I believe he meant to write `lsof -i`
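
A minimal sketch of the "grep-fu" watchdog idea, written in Python rather than shell. It assumes the usual nine-column `lsof -i` layout (COMMAND, PID, USER, FD, TYPE, DEVICE, SIZE/OFF, NODE, NAME) and that connected sockets show up as `local->remote` in the NAME column. The function names are made up for illustration.

```python
import subprocess


def parse_lsof(output):
    """Parse `lsof -i` text into (command, remote_endpoint) pairs.

    Only rows whose NAME column contains '->' (connected sockets) are
    kept; listening sockets have no remote side.
    """
    pairs = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything too short.
        if len(fields) < 9 or fields[0] == "COMMAND":
            continue
        name = fields[8]
        if "->" in name:
            pairs.append((fields[0], name.split("->", 1)[1]))
    return pairs


def matching(pairs, needle):
    """Keep only connections whose remote endpoint contains `needle`."""
    return [p for p in pairs if needle in p[1]]


def run_lsof():
    # -i: internet sockets only; -P: keep port numbers numeric.
    # Hostnames are left resolved so they can be matched by domain.
    result = subprocess.run(["lsof", "-i", "-P"], capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    import sys

    needle = sys.argv[1] if len(sys.argv) > 1 else ""
    for command, remote in matching(parse_lsof(run_lsof()), needle):
        print(command, remote)
```

Run it with a substring to watch for as the first argument, for example `amazonaws`; with no argument it lists every connected socket.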

blowski · on June 3, 2016

That works a lot better! Thanks both.

AckSyn · on June 3, 2016

oh i'm still not used to the autocorrect on OS X. Thank you for correcting my error.

blowski · on June 3, 2016

Indeed, I've read multiple articles about it, some of which seem to have been fed by the Apple PR machine:

http://www.zdnet.com/article/apple-cuts-aws-spending-signs-w...

RKearney · on June 2, 2016

They also use AT&T and Azure for storage too.

You did hear that from me, or anyone else with a firewall running on OS X.

conradev · on June 3, 2016

and Google Cloud Storage as well!

tedmiston · on June 3, 2016

...AT&T for data storage?

sethhochberg · on June 3, 2016

https://www.business.att.com/enterprise/Portfolio/cloud/

They're a massive company. When you already own telecom networks and run your own datacenters, selling cloud infrastructure services isn't too far-fetched.

rconti · on June 2, 2016

They also use METRIC TONS of ultra high-end storage.

Actually that's not saying much, because a single cabinet is around a metric ton.

So it was unintentionally accurate.

nstj · on June 3, 2016

Apple discusses who it uses for cloud storage when discussing iCloud in their security guide (valid as of May 2016):

> The encrypted chunks of the file are stored, without any user-identifying information, using third-party storage services, such as Amazon S3 and Windows Azure.

[0]: https://www.apple.com/business/docs/iOS_Security_Guide.pdf

api · on June 2, 2016

BGP? Tier-1 issues?

alanh · on June 2, 2016

And also Azure

akulbe · on June 2, 2016

AWS status page is all green.

http://status.aws.amazon.com/

0xmohit · on June 3, 2016

I've heard that the Doppler effect https://en.wikipedia.org/wiki/Doppler_effect is responsible for the AWS status page showing up green perennially.

On a serious note, why would AWS admit that its services were not quite working well and subject itself to potential liabilities?

AWS reacts shamelessly even when your EC2 instance goes down on its own. For a sample of how they react, see https://news.ycombinator.com/item?id=11822298

cookiecaper · on June 2, 2016

Just for the record, our company has found that Amazon occasionally omits incidents from their status page, though I doubt they could get away with omitting something this significant.

bigiain · on June 3, 2016

"Occasionally"? <laugh>

jhardy54 · on June 3, 2016

</laugh>

Holy shit, you were laughing for 28 minutes straight. Are you alright?

inopinatus · on June 3, 2016

You can tell who grew up in the XHTML era rather than SGML or HTML5.

bigiain · on June 3, 2016

Cause you know <laugh /> is just a gulp shortcut for <chuckle> </chuckle> (which is fully IE6 compatible)

jhardy54 · on June 7, 2016

XHTML 1.0 Strict all day.

adt2bt · on June 2, 2016

All of the cloud providers have an incentive to downplay incidents and only post them to the status page when they're really bad. The bar is pretty high to get something on there.

NeutronBoy · on June 3, 2016

I mean, they outline them in their fineprint. Amazon doesn't consider the unavailability of 1 EC2 availability zone an outage because their position is that you should be architecting around that. 2 or more AZ outages in the same region and it'll show up on their status page IIRC.

evanriley · on June 2, 2016

The only Apple thing that I know was on AWS was the beats1 radio thing when it launched, not sure if it still is.

ghshephard · on June 2, 2016

Photos uses AWS for storage.

koenigdavidmj · on June 2, 2016

Do these things ever go red?

I mean, I understand the disincentive to do this, but for something like AWS (where error icons are even more subtle than this) I care about accuracy way more than looking scary.

Analemma_ · on June 2, 2016

Red? Not bloody likely. We should be cheering Apple for even using yellow, most of the time when things are on fire we don't even get that. Instead we get "green, with the tiny blue 'i' in the corner". Status pages are a bad joke.

cookiecaper · on June 2, 2016

Yeah. Like it or not, the status page reflects on the brand identity and having a status page that reflects the real condition of the infrastructure is immensely harmful to sales. I've found that most status pages suppress information about important events more than they publish it.

I wonder if there's an "engineer's feed" out there somewhere that looks like mumbo-jumbo to the layperson but could actually be useful in allowing companies to disclose issues without the softer side of the company flipping their lids.

michaelbuckbee · on June 2, 2016

I really appreciate Heroku's Status page - https://status.heroku.com/

Separately, many of the status pages out there set up with the Statuspage.io service also incorporate "Pingdom"-like features, where heartbeats and/or checks are automatically collected and notifications and updates are sent without intervention.

snuxoll · on June 2, 2016

Salesforce is one of the odd-balls that is 100% honest about issues with their service. One of their sandbox instances can have minor issues for 5 minutes (not often mission critical, just performance degradations) and I get an email - even for instances I do not use. I usually get a couple emails a day, and the trust.salesforce.com site shows the full history (including a colored timeline) of the availability of each individual instance.

It's nice to see they've carried on this same policy with Heroku. For all the flaws the company has, they put their money where their mouth is.

artursapek · on June 3, 2016

Heroku's has always impressed me

yardie · on June 2, 2016

I deployed a status board once. I was forced by the higher ups to take it down. Something about it sending the wrong message to our clients. Sending out emails once an hour with status updates was way more productive.

bigiain · on June 3, 2016

I have the "internal status dashboard" (read: "notifications of individual instance or app downtime") for us, and the "external status dashboard" (which is the app status from the front of the load balancer) for the higher ups to show to customers.

yardie · on June 3, 2016

Our network monitoring software has a dashboard we use internally. Our external monitor was a Google Apps project that was manually updated. Never got far enough into it to automate it, yet.

chillacy · on June 2, 2016

Github is pretty transparent with theirs too: https://status.github.com/

All the numbers usually tell a story

BuildTheRobots · on June 3, 2016

For all their faults (and there are many), OVH's weathermap [1] and status page [2] are immensely good and seem surprisingly honest.

On the flip side, I've had a total inability to use EE in North Yorkshire for about 2 weeks and yet their status checker reports absolutely no problems in my area (whilst Level 2 support admits they won't have an engineer on-site until next Wednesday now).

Your suggestion for an engineer feed would be greatly appreciated; there's nothing worse than spending half a day trying to narrow down an issue only to find out it's a known and unreported fault... "to err is to be human, to lie about it is ruddy infuriating".

[1] http://weathermap.ovh.net/

[2] http://status.ovh.net/

lttlrck · on June 2, 2016

Rackspace seems pretty honest in my experience.

msisk6 · on June 2, 2016

No better than anyone else here really. The sad reality is that often these status pages are manually updated by the service teams themselves and when there's an issue you're busy trying to fix the problem, not update the status page.

Jyaif · on June 2, 2016

At least they have the red icons ready ( <span class="serviceoutage"></span> to see what it looks like)

colinbartlett · on June 2, 2016

Very occasionally:

https://statusgator.com/services/apple

(Side project of mine)

tuna-piano · on June 3, 2016

Really cool idea / site.

Most startups / side projects these days seem to use a cloud hosting service... But your project is meant to work especially when the cloud hosting providers are down.

Where are you hosted? Or do you host on multiple services?

cmrx64 · on June 2, 2016

FYI: "An error occurred during a connection to statusgator.com. The OCSP server experienced an internal error. Error code: SEC_ERROR_OCSP_SERVER_ERROR"

colinbartlett · on June 2, 2016

Weird, thanks for the heads up. Will investigate.

avn2109 · on June 3, 2016

Quis custodiet ipsos custodes?

cookiecaper · on June 2, 2016

That's cool, but it'd be cooler if it did its own tests on what can be seen from the outside instead of exclusively mirroring data from corporate-run status pages, which can't be trusted.
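
A rough sketch of what such an outside-in check could look like, assuming plain HTTP reachability is a good enough signal. The `probe` and `http_fetch` names, the 2-second "degraded" threshold, and the rule that any answer below 500 counts as "up" are all arbitrary choices for illustration, not how any real monitoring service works.

```python
import time
import urllib.error
import urllib.request


def http_fetch(url, timeout=5.0):
    """Return the HTTP status code for url; raises OSError on network failure."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success code.
        return err.code


def probe(url, fetch=http_fetch, slow_after=2.0, clock=time.monotonic):
    """Classify a service as 'up', 'degraded' or 'down' as seen from here."""
    start = clock()
    try:
        code = fetch(url)
    except OSError:
        # URLError, timeouts and socket errors all derive from OSError.
        return "down"
    elapsed = clock() - start
    if code >= 500:
        return "down"
    return "degraded" if elapsed > slow_after else "up"


if __name__ == "__main__":
    import sys

    for url in sys.argv[1:]:
        print(url, probe(url))
```

A single vantage point only tells you what is reachable from where the script runs, so a real version would want probes from several networks before calling anything "down".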

colinbartlett · on June 2, 2016

There are a few services that do that, I was more interested in getting alerted when services reported problems.

Some find it useful, some don't.

vox_mollis · on June 2, 2016

Could be due to the limited nature of the outage. They state only some users are affected, which I can verify - App Store works fine for me, for instance.

_puk · on June 2, 2016

App Store (UK) was down for purchases for me ~1hr ago, but seems to be back up (though it asked for a password after restart)

einarvollset · on June 2, 2016

iTunes Connect and TestFlight in particular have been an absolute shambles for at least the last 3 months. Nobody there seems to give a shit or take responsibility (ranging from support up through to the highest levels).

api · on June 2, 2016

Total tangent but: I just had a hideously bad experience trying to get something done with my developer account. It's not the first time. The IRS is easier to deal with than the app store.

If iOS is their vision of the future of personal computing, then the future of personal computing is a walled garden presided over by a bureaucracy about as functional as a badly run DMV office.

There is absolutely no excuse for this kind of terrible customer support from a company with this kind of money. Their good UX near-monopoly and success seems to be making them lax toward their developer community... much like Microsoft became.

wwweston · on June 2, 2016

> There is absolutely no excuse for this kind of terrible customer support from a company with this kind of money.

Given the attention given to customer experience in other areas, one assumes Apple knows how to treat their customers.

One reasonable conclusion is that they don't see developers as customers.

"Sharecropper" is probably a much closer match to what they see.

eliaspro · on June 2, 2016

Their developer portal is horrible. It sometimes feels like it's run by a 10-person backyard shop:

- doesn't allow the use of password managers because their login forms are full of terrible JavaScript making sure nothing gets pasted

- using "security questions" for account recovery which could be easily "cracked" using social engineering

- SMS based 2FA

- max password length of 32 chars

- fixed keysize of 2048/RSA for APNS certificates, 4096 will be rejected

traskjd · on June 3, 2016

In fairness, 10 people would probably do a great job. It's either 1 person, or far far too many people :-)

Someone1234 · on June 2, 2016

In fairness to the IRS, they're actually REALLY easy to deal with. Or they used to be, I'm not sure how the budget cuts have impacted them but I found their CS quite good.

api · on June 2, 2016

Yeah, it's totally unfair to compare the IRS to the Apple developer program. That's low.

dragonwriter · on June 2, 2016

IME, recently, they are still really easy to deal with, but often glacially slow to act. (Which can be unnerving when there is an outstanding issue, since they periodically, automatically send out dire-sounding notices in the interim.)

mjcl · on June 3, 2016

I've still found them to be incredibly helpful when you get through to someone, but it can take an hour or two on hold. They're also helpful via mail correspondence, but letters seem to have a 30-45 day turnaround.

f0under · on June 2, 2016

Timely... been trying to apply a software update and download GarageBand for the last hour. Should've checked their status page. Guess I'll wait it out.

Any relation between this outage and Amazon's from earlier?

smoreilly · on June 2, 2016

Unlikely but a few of their datacenters are down the street from one another so could be related. We'll know when/if they tell us the cause.

visarga · on June 2, 2016

> a few of their datacenters are down the street from one another

They have backup power systems and multiple backbone connections.

dantiberian · on June 2, 2016

Apple uses Azure and AWS for a lot of their compute. It's not unlikely that an Amazon outage could affect their systems too.

toomuchtodo · on June 2, 2016

But is in the process of moving to their own datacenters:

http://fortune.com/2016/02/02/apple-data-center-move/

retbull · on June 2, 2016

Unlikely they have their own data centers.

lallysingh · on June 2, 2016

"Unlikely, they have their own data centers" or "Unlikely [that] they have their own data centers" ?

dreamsofdragons · on June 2, 2016

It doesn't matter, neither is factually correct.

Karawebnetwork · on June 2, 2016

I'm also getting some bad outage with Facebook's services. www.messenger.com outputs an error and the facebook website has half the components missing.

pgrote · on June 2, 2016

As am I.

There might be something larger going on ... amazon.com product search is down again, too.

achalkley · on June 2, 2016

Yep, my notifications were jacked.

vikrantvm · on June 2, 2016

Same here, but seems better now.

sigzero · on June 2, 2016

I am getting that as well.

rdl · on June 2, 2016

I noticed my iTunes Cloud music wasn't working earlier. Reminded me how much better all the iCloud stuff has actually gotten over the past 2y.

twinkletwinkle · on June 2, 2016

Some poor SREs have a shitty night ahead of them.

thanatropism · on June 2, 2016

Docker too.

https://news.ycombinator.com/item?id=11822562

pi-squared · on June 2, 2016

What if they are relying on Docker? https://news.ycombinator.com/item?id=11822562

carthief · on June 2, 2016

thank god game center is still up!

snake_case · on June 2, 2016

Could this be related? Today we got an email from Amazon that our EC2 instance server has degraded hardware. We have until June 16 to spin up a new server before they kill this one.

Sanddancer · on June 3, 2016

No. An email like that would be regarding a single box giving hardware warnings, like a stick of RAM sending out more ECC correction alerts than their threshold for acting. It wouldn't be because of activities much higher than that.

DanielDent · on June 3, 2016

I've gotten emails from them about a degraded instance and having until date X to spin up a new server.

... only I had already spun up a new server. Because the existing instance wasn't working. Getting their system to terminate the old instance required several attempts and force termination before it succeeded.

Considering their scale, I would expect better handling of the edge cases.

jorblumesea · on June 2, 2016

We were hit by a few AWS DNS issues over the past few days. Almost certainly a freaky coincidence.

tehwebguy · on June 2, 2016

So this is why my AppleTV won't connect to Netflix / Hulu right now?

I want a refund.

brazzledazzle · on June 2, 2016

You need a connection to Apple to use third party apps you already have installed? If true that's ridiculous.

scwoodal · on June 2, 2016

I was watching a movie purchased via iTunes and it stopped playing due to the outage. I switched over to Netflix without any problems.

brazzledazzle · on June 3, 2016

Okay, it seemed a bit outrageous.

zepto · on June 2, 2016

It's not true

pbarnes_1 · on June 2, 2016

Might be an Akamai outage.

mxuribe · on June 2, 2016

"...Just kidding!" https://www.youtube.com/watch?v=ucdZHR75iCM

kelvich · on June 2, 2016

Migrated to FoundationDB? =)

gk1 · on June 2, 2016

For those wondering about context: Apple acquired FoundationDB last year. http://techcrunch.com/2015/03/24/apple-acquires-durable-data...