This isn't a Cloudflare-specific issue. Level 3/CenturyLink is in trouble, and it's affecting other providers too (see, for example, Fastly's status page: https://status.fastly.com/).
"The IP NOC with the assistance of the Operations Engineering team confirmed a routing issue to be preventing BGP sessions from establishing correctly. A configuration adjustment was deployed at a high level, and sessions began to re-establish with stability. As the change propagates through the affected devices, service affecting alarms continue to clear"
It’s a point, but it’s repeated ad nauseam in every thread. It’s the nature of having CDNs or cloud services in the first place. If you want to outsource your uptime, you sacrifice certain freedoms you had in exchange for hypothetically lower costs for operations.
A lot of people feel that the outsourcing was unnecessary for a lot of low-traffic sites, and was mainly the result of marketing pushing Cloudflare to everyone. It doesn't make sense that some tiny blogs go down when there's a Cloudflare outage. And it creates side effects too: e.g. Cloudflare's anti-spam checks make it nearly impossible to create a functioning link fetcher/previewer unless you're a big enough site to ask for a manual exception.
Well, for most smaller websites, using something like Cloudflare makes them more decentralized, and more easily so, not less (e.g. compared with relying on their own server, which is a single point of failure).
>The problem is that in aggregate you end up with an internet with a single point of failure.
And the other problem, which I'm pointing at, is that without using something like Cloudflare, you end up giving yourself more points of failure (more pressure, DDoS, lack of easy load balancing, more costs and devops to implement those yourself, etc).
And each site doesn't care if 10000 others go down together -- if anything, that's good if their competitors go down for a while. They care about their own status...
>And even if as it is now isn't a big problem, who's to say that one day it is a huge problem?
I'd say periodic mass failures should inform our usage and dependence patterns of the internet so that we're not 100% dependent on it 24/7, in which case sites going down together never becomes "a huge problem".
In other words, one way to never have it be a huge problem is to make the internet perfectly decentralized (which is impossible anyway: first, because the sites people care about follow a power-law distribution, e.g. Google, MS, Amazon, stores, app stores, etc., so if Google goes down there's a disruption to billions of people even if millions of lesser sites that far fewer people care about are up; and second, because critical infrastructure is shared, e.g. undersea cables).
The other way to never have it be a huge problem is to learn and adapt to situations when sites might be down, and build resilient alternative ways of operation (analogue, if need be).
But if the issue is with transit (as it appears to be) and Cloudflare peer with multiple transit providers - how is it worse?
Of course in other circumstances a single dominant player may be an issue - but it doesn’t look to be the issue here - and if anything their level of peering would allow faster recovery than other smaller content hosts (used in the loosest sense) which may have less peering.