I've been on Charter all day, VPN'd into work with no interruption. That said, I have a business connection at home, so it's possible I'm avoiding whatever middleboxes they jam residential traffic into.
My ISP, Cox, had a server failure lasting several hours about 2 weeks ago. It was a DNS issue as well. I don't want to take 2 unrelated things and call them similar, but if some of the reports from the posters above are substantiated, several huge players losing peering/connectivity and 2 very large ISPs having non-trivial, multi-hour, multi-region DNS outages is bad. Horrible coincidence.
I have been losing Google Docs intermittently all day (and I have been working nearly exclusively in Google Docs all day). Maybe it's just me, but I have been unable to upload photos to Google Docs (the same one as yesterday), and from the Drive/Docs dashboard I would lose connectivity and get severe lag loading a doc I had just worked in. I had to close the browser several times.
If they didn't peer with anyone they'd be a closed network and it would be illegal to advertise that as "internet access".
They're legally required to live up to the claims of their product. If they say X Mbit/s then there needs to be at least a plausible case made that they can achieve those results. Anything significantly less is to invite a class-action lawsuit.
This comment turned out to be surprisingly controversial for a pretty obvious joke. I've observed it at +4, down to +1, back up to +3, and now it's at 0! Heh. Didn't mean to rustle jimmies of any NSA contractors that may read HN. I'm sure in real life your rollouts are much smoother than this. :P
You just have to run Little Snitch and look at all the AWS servers that Apple applications hit. I don't think it's a secret that a ton of Apple applications are backed by AWS.
They're a massive company. When you already own telecom networks and run your own datacenters, selling cloud infrastructure services isn't too far fetched.
Apple discusses who it uses for cloud storage when discussing iCloud in their security guide (valid as of May 2016):
> The encrypted chunks of the file are stored, without any user-identifying information, using third-party storage services, such as Amazon S3 and Windows Azure.
Just for the record, our company has found that Amazon occasionally omits incidents from their status page, though I doubt they could get away with omitting something this significant.
All of the cloud providers have an incentive to downplay incidents and only post them to the status page when they're really bad. The bar is pretty high to get something on there.
I mean, they outline this in their fine print. Amazon doesn't consider the unavailability of 1 EC2 availability zone an outage, because their position is that you should be architecting around that. If 2 or more AZs in the same region go down, it'll show up on their status page, IIRC.
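As a rough illustration of the threshold described above (taken from the commenter's recollection, not from any documented AWS policy), here is a minimal Python sketch: a single impaired availability zone is ignored, and a region is only reported once 2 or more of its zones are impaired. The function name, constant, and input shape are hypothetical.

```python
from collections import defaultdict

# Hypothetical threshold, per the "IIRC" rule in the comment above:
# 1 impaired AZ is something customers should architect around,
# 2 or more impaired AZs in the same region get reported.
REPORT_THRESHOLD = 2


def regions_to_report(impaired_zones):
    """Return the sorted list of regions with at least REPORT_THRESHOLD impaired AZs.

    impaired_zones: iterable of (region, zone) pairs,
    e.g. ("us-east-1", "us-east-1a").
    """
    per_region = defaultdict(set)
    for region, zone in impaired_zones:
        # A set ignores duplicate reports for the same zone.
        per_region[region].add(zone)
    return sorted(r for r, zones in per_region.items() if len(zones) >= REPORT_THRESHOLD)
```

For example, two impaired zones in `us-east-1` plus one in `eu-west-1` would put only `us-east-1` on the page, which is exactly the kind of gap between "your instances are down" and "the status page is green" that the thread is complaining about.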
I mean, I understand the disincentive to do this, but for something like AWS (where error icons are even more subtle than this) I care about accuracy way more than looking scary.
Red? Not bloody likely. We should be cheering Apple for even using yellow, most of the time when things are on fire we don't even get that. Instead we get "green, with the tiny blue 'i' in the corner". Status pages are a bad joke.
Yeah. Like it or not, the status page reflects on the brand identity and having a status page that reflects the real condition of the infrastructure is immensely harmful to sales. I've found that most status pages suppress information about important events more than they publish it.
I wonder if there's an "engineer's feed" out there somewhere that looks like mumbo-jumbo to the layperson but could actually be useful in allowing companies to disclose issues without the softer side of the company flipping their lids.
Separately, many of the status pages out there that are set up with the Statuspage.io service also incorporate Pingdom-like features, where heartbeats and/or checks are collected automatically and notifications and updates are sent without intervention.
Salesforce is one of the oddballs that is 100% honest about issues with their service. One of their sandbox instances can have minor issues for 5 minutes (often not mission critical, just performance degradations) and I get an email, even for instances I do not use. I usually get a couple of emails a day, and the trust.salesforce.com site shows the full history (including a colored timeline) of the availability of each individual instance.
It's nice to see they've carried on this same policy with Heroku, for all the flaws the company has they put their money where their mouth is.
I deployed a status board once. I was forced by the higher ups to take it down. Something about it sending the wrong message to our clients. Sending out emails once an hour with status updates was way more productive.
I have the "internal status dashboard" (read: "notifications of individual instance or app downtime") for us, and the "external status dashboard" (which is the app status from the front of the load balancer) for the higher ups to show to customers.
Our network monitoring software has a dashboard we use internally. Our external monitor was a Google Apps project that was manually updated. We never got far enough into it to automate it yet.
For all their faults (and there are many), OVH's weathermap [1] and status page [2] are immensely good and seem surprisingly honest.
On the flip side, I've had a total inability to use EE in North Yorkshire for about 2 weeks and yet their status checker reports absolutely no problems in my area (whilst Level 2 support admits they won't have an engineer on-site until next Wednesday now).
Your suggestion for an engineer feed would be greatly appreciated; there's nothing worse than spending half a day trying to narrow down an issue only to find out it's a known and unreported fault... "to err is to be human, to lie about it is ruddy infuriating".
No better than anyone else here really. The sad reality is that often these status pages are manually updated by the service teams themselves and when there's an issue you're busy trying to fix the problem, not update the status page.
Most startups / side projects these days seem to use a cloud hosting service... But your project is meant to work especially when the cloud hosting providers are down.
Where are you hosted? Or do you host on multiple services?
FYI: "An error occurred during a connection to statusgator.com. The OCSP server experienced an internal error. Error code: SEC_ERROR_OCSP_SERVER_ERROR"
That's cool, but it'd be cooler if it did its own tests on what can be seen from the outside instead of exclusively mirroring data from corporate-run status pages, which can't be trusted.
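A minimal sketch of the kind of independent check suggested above, using only the Python standard library: probe an endpoint from outside the provider's network and derive a status label from what was actually observed, rather than mirroring a corporate status page. The function names, the use of a plain HTTP GET as the health signal, and the 90% "degraded" cutoff are all assumptions for illustration.

```python
import http.client
import urllib.request


def probe(url, timeout=5.0):
    """Return True if `url` answers with a non-error HTTP status within `timeout` seconds.

    urlopen raises HTTPError (a subclass of URLError, itself an OSError) for
    4xx/5xx responses, and timeouts/DNS failures also surface as OSError.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, http.client.HTTPException):
        return False


def summarize(results, degraded_at=0.9):
    """Map a list of probe outcomes (True = success) to a coarse status label.

    The degraded_at cutoff is an arbitrary assumption, not an industry standard.
    """
    if not results:
        return "unknown"
    ok_ratio = sum(results) / len(results)
    if ok_ratio == 1.0:
        return "operational"
    if ok_ratio >= degraded_at:
        return "degraded"
    return "outage"
```

In practice you would call `probe` on a schedule (say, once a minute, from a machine hosted somewhere other than the provider being watched) and feed the last N results into `summarize`. Keeping the classification separate from the network call makes the thresholds easy to tune and to test.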
Could be due to the limited nature of the outage. They state only some users are affected, which I can verify - App Store works fine for me, for instance.
iTunes Connect and TestFlight in particular have been an absolute shambles for at least the last 3 months. Nobody there seems to give a shit or take responsibility (ranging from support up through to the highest levels).
Total tangent but: I just had a hideously bad experience trying to get something done with my developer account. It's not the first time. The IRS is easier to deal with than the app store.
If iOS is their vision of the future of personal computing, then the future of personal computing is a walled garden presided over by a bureaucracy about as functional as a badly run DMV office.
There is absolutely no excuse for this kind of terrible customer support from a company with this kind of money. Their near-monopoly on good UX and their success seem to be making them lax toward their developer community... much like Microsoft became.
In fairness to the IRS, they're actually REALLY easy to deal with. Or they used to be, I'm not sure how the budget cuts have impacted them but I found their CS quite good.
IME, recently, they are still really easy to deal with, but often glacially slow to act. (Which can be unnerving when there is an outstanding issue, since they periodically, automatically send out dire-sounding notices in the interim.)
I've still found them to be incredibly helpful when you get through to someone, but it can take an hour or two on hold. They're also helpful via mail correspondence, but letters seem to have a 30-45 day turnaround.
Timely. I've been trying to apply a software update and download GarageBand for the last hour. Should've checked their status page. Guess I'll wait it out.
Any relation between this outage and Amazon's from earlier?
I'm also getting some bad outage with Facebook's services. www.messenger.com outputs an error and the facebook website has half the components missing.
Could this be related? Today we got an email from Amazon saying the hardware behind our EC2 instance is degraded. We have until June 16 to spin up a new server before they kill this one.
No. An email like that would be regarding a single box giving hardware warnings, like a stick of RAM sending out more ECC correction alerts than their threshold for acting. It wouldn't be because of activities much higher than that.
I've gotten emails from them about a degraded instance and having until date X to spin up a new server.
... only I had already spun up a new server. Because the existing instance wasn't working. Getting their system to terminate the old instance required several attempts and force termination before it succeeded.
Considering their scale, I would expect better handling of the edge cases.