Apple cloud services outage (apple.com)
242 points by DAddYE on June 2, 2016 | 119 comments



Days when multiple outages happen always remind me of this wonderful article: http://www.stilldrinking.org/programming-sucks


I'd never read this until now... it. is. absolute. genius!!! I wish I could give you multiple upvotes for this. Thanks for sharing!


You should also read James Mickens' USENIX articles if you haven't: http://mickens.seas.harvard.edu/wisdom-james-mickens

The humour and tone are similar to the "Programming Sucks" article. I recommend starting with "The Night Watch".


I've read that before, and even have it bookmarked, but it was still worth it to read it again. It's a masterpiece.

...that's something scary to say about the tech industry, right there.


Maybe I'm crazy, but it seems too coincidental that multiple services (Amazon, Facebook, Apple... that I know of, so far) have been affected on the same day.


It's related to Akamai DNS.


Do you have a source?


I was trying to access some APIs I work with (DEP, VPP) earlier today and kept seeing 'akadns' as the common provider.


Charter (ISP) was also out for over an hour. It was the first time this year that it was a complete outage and not just a DNS issue.


I've been on Charter all day VPN'd into work with no interruption. Although I have a business connection at home, so it's possible I'm avoiding whatever middleboxes they jam residential traffic into.


My ISP, Cox, had a server failure for several hours about 2 weeks ago. It was a DNS issue as well. I don't want to take 2 unrelated things and call them similar, but if some of the above posters are right, several huge players losing peering/connectivity and 2 very large ISPs having non-trivial, multi-hour, multi-region DNS outages is bad. Horrible coincidence.

I have been losing Google Docs intermittently all day (and I have been working nearly exclusively in Google Docs all day). Maybe it's just me, but I have been unable to upload photos (the same one as yesterday) to Google Docs, and from the Drive/Docs dashboard I would lose connectivity and get severe lag loading a doc I had just worked in. I had to close the browser several times.


Maybe one of the backbone carriers is having a bad day?


Perhaps, but I (network engineer at an ISP) haven't heard anything like that, and all of these services have multiple upstreams and private peerings.


Comcast has as few peerings as they can legally get away with. Redundancy isn't high on their list of things to care about, apparently.


> ... as they can legally get away with.

Can you clarify a bit? I'm not sure what you mean.

An ISP isn't, by law, required to peer with anyone.


If they didn't peer with anyone they'd be a closed network and it would be illegal to advertise that as "internet access".

They're legally required to live up to the claims of their product. If they say X Mbit/s then there needs to be at least a plausible case made that they can achieve those results. Anything significantly less is to invite a class-action lawsuit.


Well that was strange. Wonder what it was.


Why?


New NSA program rolling out? O_O


You laugh, but I guarantee you that the NSA has systems reliability issues just like the rest of us.


The last person that guaranteed me something about the NSA is now exiled in Russia ;)


This comment turned out to be surprisingly controversial for a pretty obvious joke. I've observed it at +4, down to +1, back up to +3, and now it's at 0! Heh. Didn't mean to rustle jimmies of any NSA contractors that may read HN. I'm sure in real life your rollouts are much smoother than this. :P


I'm sure the NSA employees and contractors are just happy to see a non-threatening post about the agency.

The downvotes probably came from people tired of the joke. It is an oft-repeated joke.


Well, I agree it got too much attention, that's for sure.


No. That was the Syria/Iraq/Iran BGP outages from earlier today.


Are they all on EC2? Maybe EC2 is throwing a fit.


Amazon is, but Apple and Facebook both operate their own data centers.

Edit: I take that back, Apple uses several cloud providers, including AWS.


Apple does use S3 for storage. You didn't hear it from me.


You just have to run Little Snitch and see all the AWS servers that Apple applications hit. I don't think it's a secret that a ton of Apple applications are backed by AWS.


All hail Little Snitch. So much insight available in that app. I wish there were rule sets for Little Snitch like there are for AdBlock Plus.


lost -i in the Terminal.app will do this as well.

With a little grep-fu you could have yourself a good watchdog.


I get 'command not found' so I'm assuming I have to install something? But I can't find anything when I search Google because of the generic name.


I believe he meant to write `lsof -i`


That works a lot better! Thanks both.


Oh, I'm still not used to the autocorrect on OS X. Thank you for correcting my error.
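A minimal sketch of that "grep-fu" watchdog in Python, assuming `lsof -i` prints its usual columns (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME) with the remote end of a connection after `->`. The function names `parse_lsof`, `watch` and `run_lsof` are hypothetical, and matching on a hostname such as `amazonaws.com` only works when lsof resolves names (i.e. without `-n`).

```python
import subprocess
from typing import NamedTuple


class Connection(NamedTuple):
    command: str
    pid: int
    remote: str  # "host:port" as printed by lsof, e.g. "example.org:https"


def parse_lsof(output: str) -> list[Connection]:
    """Parse `lsof -i` text output into connections that have a remote end."""
    conns = []
    for line in output.splitlines()[1:]:  # skip the COMMAND PID USER ... header
        # NAME may contain spaces such as "(ESTABLISHED)", so split at most 8 times.
        fields = line.split(maxsplit=8)
        if len(fields) < 9:
            continue
        name = fields[8]
        if "->" not in name:  # listening/unconnected sockets have no remote end
            continue
        remote = name.split("->", 1)[1].split(" ", 1)[0]
        conns.append(Connection(fields[0], int(fields[1]), remote))
    return conns


def watch(pattern: str, output: str) -> list[Connection]:
    """The "grep" step: keep connections whose remote host contains `pattern`."""
    return [c for c in parse_lsof(output) if pattern in c.remote]


def run_lsof() -> str:
    # Assumes lsof is on PATH (it ships with OS X). Adding -n/-P would skip
    # host/port name lookups, which would hide names like *.amazonaws.com.
    return subprocess.run(["lsof", "-i"], capture_output=True, text=True).stdout
```

Calling `watch("amazonaws.com", run_lsof())` in a loop would flag processes talking to AWS endpoints; Little Snitch does the same job interactively.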


Indeed, I've read multiple articles about it, some of which seem to have been fed by the Apple PR machine:

http://www.zdnet.com/article/apple-cuts-aws-spending-signs-w...


They also use AT&T and Azure for storage.

You did hear that from me, or anyone else with a firewall running on OS X.


and Google Cloud Storage as well!


...AT&T for data storage?


https://www.business.att.com/enterprise/Portfolio/cloud/

They're a massive company. When you already own telecom networks and run your own datacenters, selling cloud infrastructure services isn't too far fetched.


They also use METRIC TONS of ultra high-end storage.

Actually that's not saying much, because a single cabinet is around a metric ton.

So it was unintentionally accurate.


Apple discusses who it uses for cloud storage when discussing iCloud in their security guide (valid as of May 2016):

> The encrypted chunks of the file are stored, without any user-identifying information, using third-party storage services, such as Amazon S3 and Windows Azure.

[0]: https://www.apple.com/business/docs/iOS_Security_Guide.pdf


BGP? Tier-1 issues?


And also Azure


AWS status page is all green.

http://status.aws.amazon.com/


I've heard that the Doppler effect https://en.wikipedia.org/wiki/Doppler_effect is responsible for AWS status page to show up green perennially.

On a serious note, why would AWS admit that its services were not quite working well and expose itself to potential liabilities?

AWS reacts shamelessly even when your EC2 instance goes down on its own. For a sample of how they react, see https://news.ycombinator.com/item?id=11822298


Just for the record, our company has found that Amazon occasionally omits incidents from their status page, though I doubt they could get away with omitting something this significant.


"Occasionally"? <laugh>


</laugh>

Holy shit, you were laughing for 28 minutes straight. Are you alright?


You can tell who grew up in the XHTML era rather than SGML or HTML5.


Cause you know <laugh /> is just a gulp shortcut for <chuckle>&nbsp;</chuckle> (which is fully IE6 compatible)


XHTML 1.0 Strict all day.


All of the cloud providers have an incentive to downplay incidents and only post them to the status page when they're really bad. The bar is pretty high to get something on there.


I mean, they outline them in their fine print. Amazon doesn't consider the unavailability of 1 EC2 availability zone an outage, because their position is that you should be architecting around that. If 2 or more AZs in the same region go down, it'll show up on their status page, IIRC.


The only Apple thing that I know was on AWS was the Beats 1 radio thing when it launched; not sure if it still is.


Photos uses AWS for storage.


Do these things ever go red?

I mean, I understand the disincentive to do this, but for something like AWS (where error icons are even more subtle than this) I care about accuracy way more than looking scary.


Red? Not bloody likely. We should be cheering Apple for even using yellow, most of the time when things are on fire we don't even get that. Instead we get "green, with the tiny blue 'i' in the corner". Status pages are a bad joke.


Yeah. Like it or not, the status page reflects on the brand identity and having a status page that reflects the real condition of the infrastructure is immensely harmful to sales. I've found that most status pages suppress information about important events more than they publish it.

I wonder if there's an "engineer's feed" out there somewhere that looks like mumbo-jumbo to the layperson but could actually be useful in allowing companies to disclose issues without the softer side of the company flipping their lids.


I really appreciate Heroku's Status page - https://status.heroku.com/

Separately, many of the status pages set up with the Statuspage.io service also incorporate Pingdom-like features, where heartbeats and/or checks are collected automatically and notifications and updates go out without manual intervention.
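As a rough illustration of a status derived from automated checks rather than manual updates, here is a Python sketch. It is not how Statuspage.io or Pingdom work internally; the `probe` and `derive_status` names and the thresholds (over 2 s counts as slow, 10% failed or slow checks is yellow, 50% failed checks is red) are made-up assumptions.

```python
import http.client
import time
import urllib.request
from dataclasses import dataclass


@dataclass
class Check:
    ok: bool          # did the endpoint answer with a 2xx status?
    latency_s: float  # wall-clock time the check took, in seconds


def probe(url: str, timeout_s: float = 5.0) -> Check:
    """One external check: an HTTP GET that must return 2xx within the timeout."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            ok = 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (4xx/5xx), DNS failures, resets and timeouts are
        # OSError subclasses; malformed responses raise HTTPException.
        ok = False
    return Check(ok, time.monotonic() - start)


def derive_status(checks: list[Check], slow_s: float = 2.0,
                  degraded_ratio: float = 0.1, outage_ratio: float = 0.5) -> str:
    """Map a window of recent checks to a color with no human in the loop.

    The thresholds are hypothetical defaults, not any vendor's policy.
    """
    if not checks:
        return "unknown"
    n = len(checks)
    failure_ratio = sum(not c.ok for c in checks) / n
    slow_ratio = sum(c.ok and c.latency_s > slow_s for c in checks) / n
    if failure_ratio >= outage_ratio:
        return "red"
    if failure_ratio >= degraded_ratio or slow_ratio >= degraded_ratio:
        return "yellow"
    return "green"
```

Because the color is computed from probe results, it changes even when the people who would normally edit the page are busy fixing the outage.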


Salesforce is one of the odd-balls that is 100% honest about issues with their service. One of their sandbox instances can have minor issues for 5 minutes (not often mission critical, just performance degradations) and I get an email - even for instances I do not use. I usually get a couple emails a day, and the trust.salesforce.com site shows the full history (including a colored timeline) of the availability of each individual instance.

It's nice to see they've carried on this same policy with Heroku, for all the flaws the company has they put their money where their mouth is.


Heroku's has always impressed me


I deployed a status board once. I was forced by the higher ups to take it down. Something about it sending the wrong message to our clients. Sending out emails once an hour with status updates was way more productive.


I have the "internal status dashboard" (read: "notifications of individual instance or app downtime") for us, and the "external status dashboard" (which is the app status from the front of the load balancer) for the higher ups to show to customers.


Our network monitoring software has a dashboard we use internally. Our external monitor was a Google Apps project that was manually updated. Never got far enough into it to automate it, yet.


GitHub is pretty transparent with theirs too: https://status.github.com/

All the numbers usually tell a story


For all their faults (and there are many), OVH's weathermap [1] and status page [2] are immensely good and seem surprisingly honest.

On the flip side, I've had a total inability to use EE in North Yorkshire for about 2 weeks and yet their status checker reports absolutely no problems in my area (whilst Level 2 support admits they won't have an engineer on-site until next Wednesday now).

Your suggestion for an engineer feed would be greatly appreciated; there's nothing worse than spending half a day trying to narrow down an issue only to find out it's a known and unreported fault... "to err is to be human, to lie about it is ruddy infuriating".

[1] http://weathermap.ovh.net/

[2] http://status.ovh.net/


Rackspace seems pretty honest in my experience.


No better than anyone else here really. The sad reality is that often these status pages are manually updated by the service teams themselves and when there's an issue you're busy trying to fix the problem, not update the status page.


At least they have the red icons ready ( <span class="serviceoutage"></span> to see what it looks like)


Very occasionally:

https://statusgator.com/services/apple

(Side project of mine)


Really cool idea / site.

Most startups / side projects these days seem to use a cloud hosting service... but your project is meant to work especially when the cloud hosting providers are down.

Where are you hosted? Or do you host on multiple services?


FYI: "An error occurred during a connection to statusgator.com. The OCSP server experienced an internal error. Error code: SEC_ERROR_OCSP_SERVER_ERROR"


Weird, thanks for the heads up. Will investigate.


Quis custodiet ipsos custodes?


That's cool, but it'd be cooler if it did its own tests on what can be seen from the outside instead of exclusively mirroring data from corporate-run status pages, which can't be trusted.


There are a few services that do that, I was more interested in getting alerted when services reported problems.

Some find it useful, some don't.


Could be due to the limited nature of the outage. They state only some users are affected, which I can verify - App Store works fine for me, for instance.


App Store (UK) was down for purchases for me about an hour ago, but seems to be back up (though it asked for a password after a restart).


iTunes Connect and TestFlight in particular have been an absolute shambles for at least the last 3 months. Nobody there seems to give a shit or take responsibility (from support up through the highest levels).


Total tangent but: I just had a hideously bad experience trying to get something done with my developer account. It's not the first time. The IRS is easier to deal with than the app store.

If iOS is their vision of the future of personal computing, then the future of personal computing is a walled garden presided over by a bureaucracy about as functional as a badly run DMV office.

There is absolutely no excuse for this kind of terrible customer support from a company with this kind of money. Their good UX near-monopoly and success seems to be making them lax toward their developer community... much like Microsoft became.


> There is absolutely no excuse for this kind of terrible customer support from a company with this kind of money.

Given the attention given to customer experience in other areas, one assumes Apple knows how to treat their customers.

One reasonable conclusion is that they don't see developers as customers.

"Sharecropper" is probably a much closer match to what they see.


Their developer portal is horrible. It sometimes feels like it's run by a 10-person backyard shop:

- doesn't allow the use of password managers, because their login forms are full of terrible JavaScript that makes sure nothing gets pasted

- using "security questions" for account recovery which could be easily "cracked" using social engineering

- SMS based 2FA

- max password length of 32 chars

- fixed key size of 2048-bit RSA for APNS certificates; 4096-bit keys will be rejected


In fairness, 10 people would probably do a great job. It's either 1 person, or far far too many people :-)


In fairness to the IRS, they're actually REALLY easy to deal with. Or they used to be, I'm not sure how the budget cuts have impacted them but I found their CS quite good.


Yeah, it's totally unfair to compare the IRS to the Apple developer program. That's low.


IME, recently, they are still really easy to deal with, but often glacially slow to act. (Which can be unnerving when there is an outstanding issue, since they periodically, automatically send out dire-sounding notices in the interim.)


I've still found them to be incredibly helpful when you get through to someone, but it can take an hour or two on hold. They're also helpful via mail correspondence, but letters seem to have a 30-45 day turnaround.


Timely. I've been trying to apply a software update and download GarageBand for the last hour. Should've checked their status page. Guess I'll wait it out.

Any relation between this outage and Amazon's from earlier?


Unlikely but a few of their datacenters are down the street from one another so could be related. We'll know when/if they tell us the cause.


> a few of their datacenters are down the street from one another

They have backup power systems and multiple backbone connections.


Apple uses Azure and AWS for a lot of their compute. It's not unlikely that an Amazon outage could affect their systems too.


But is in the process of moving to their own datacenters:

http://fortune.com/2016/02/02/apple-data-center-move/


Unlikely they have their own data centers.


"Unlikely, they have their own data centers" or "Unlikely [that] they have their own data centers" ?


It doesn't matter, neither is factually correct.


I'm also getting a bad outage with Facebook's services. www.messenger.com outputs an error and the Facebook website has half its components missing.


As am I.

There might be something larger going on ... amazon.com product search is down again, too.


Yep, my notifications were jacked.


Same here, but seems better now.


I am getting that as well.


I noticed my iTunes Cloud music wasn't working earlier. It reminded me how much better all the iCloud stuff has actually gotten over the past 2 years.


Some poor SREs have a shitty night ahead of them.



What if they are relying on Docker https://news.ycombinator.com/item?id=11822562?


Thank god Game Center is still up!


Could this be related? Today we got an email from Amazon that our EC2 instance server has degraded hardware. We have until June 16 to spin up a new server before they kill this one.


No. An email like that would be regarding a single box giving hardware warnings, like a stick of RAM sending out more ECC correction alerts than their threshold for acting. It wouldn't be because of activities much higher than that.


I've gotten emails from them about a degraded instance and having until date X to spin up a new server.

... only I had already spun up a new server. Because the existing instance wasn't working. Getting their system to terminate the old instance required several attempts and force termination before it succeeded.

Considering their scale, I would expect better handling of the edge cases.


We were hit by a few AWS DNS issues over the past few days. Almost certainly a freaky coincidence.


So this is why my Apple TV won't connect to Netflix / Hulu right now?

I want a refund.


You need a connection to Apple to use third party apps you already have installed? If true that's ridiculous.


I was watching a movie purchased via iTunes and it stopped playing due to the outage. I switched over to Netflix without any problems.


Okay, it seemed a bit outrageous.


It's not true


Might be an Akamai outage.



Migrated to FoundationDB? =)


For those wondering about context: Apple acquired FoundationDB last year. http://techcrunch.com/2015/03/24/apple-acquires-durable-data...



