> - SLA guarantee (yes this a hidden cost, e.g. if your DC power is out, you owe...

devy · on Dec 16, 2019

Yes, everyone could and will go down at one point or another. Are you saying that your in-house team can manage infrastructure equally well or better than these big public Cloud vendors? They have thousands of site reliability engineers, don't they?

The crux of this is how do you compare in managing site reliability. Perhaps you have a world class team you could do that better than yes my point there is moot. But 99% of the time, it's not.

latch · on Dec 17, 2019

Amazon has done a fantastic job of making people think the choice is between them or managing your own infrastructure. Your concerns make no sense in the context that your parent presented: a managed DC.

And, frankly, your parent was being generous. Even if you only look at elastic workloads, the workload has to be a) extremely elastic and b) not fall into some pretty common patterns for cloud to make any kind of sense.

joshuamorton · on Dec 17, 2019

To match the reliability of something like an aws managed service, you usually need two or three managed DC's.

latch · on Dec 17, 2019

A number of managed providers have multiple DCs within a region as well as DCs in multiple geographic locations.

Also, many have been in operation since before AWS was a thing, and some are larger. So I can't imagine what AWS knows about running a datacenter that others don't.

Now maybe in theory if you can build something to be fully one with the cloud, considering all edge cases, and limiting yourself to only cloud-zen tools (or building your own, or doing vendor lock in). In theory, with enough money, I guess the cloud lets you maybe achieve higher reliability.

The fundamentals of EC2 (lack of dual NICs, dual power supplies and BBU RAID + virtualization and general complexity) means that a single instance is way less reliable (let alone, much worse value) than a single dedicated box. The complexity you need to throw on top of that building block (in the shape of lock-in, compromise, money, latency, application complexity or a combination of these) is pretty significant.

retrovm · on Dec 17, 2019

It's adorable that you think batteries and dual PSUs are something that makes a node more reliable, rather than less reliable.

zielmicha · on Dec 17, 2019

What's the point of dual PSU if not reliability?

retrovm · on Dec 17, 2019

I'm not sure what the point is, actually. The variety of things that can go wrong with them is astonishing. For one thing, among many others, BMC cards will power-cap the max clock speed of CPUs when a machine is running on only one PSU, which can cause a degradation that's worse than if the machine had just halted. There are a zillion other edge cases like that.

manigandham · on Dec 17, 2019

Managed service (like S3) is different than managed infrastructure. EC2 VMs only run in a single zone too and require cost and complexity for more redundancy.

Building out your application across multiple regions in AWS is not much different than using multiple DCs from a managed host. The clouds provide live migration, spot instances, and fast global VPC networks that can make it much easier, but you also pay the premium for it.