2020 Cloud Report: AWS vs. GCP vs. Azure (cockroachlabs.com)
266 points by dilloc on Dec 16, 2019 | hide | past | favorite | 122 comments



They calculated the costs based on 3 years running at the hourly rate.

That's kinda weird. How about including multi-year discounts? These are available to everyone.

[1] https://azure.microsoft.com/en-ca/pricing/reserved-vm-instan...

[2] https://aws.amazon.com/ec2/pricing/reserved-instances/

[3] https://cloud.google.com/compute/docs/instances/signing-up-c...
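To put rough numbers on the difference (the hourly rate and discount below are hypothetical, just to show the scale of what the report leaves out):

```python
# Hypothetical figures: a $0.10/hr instance and a ~40% three-year
# commitment discount, roughly the ballpark the linked reserved /
# committed-use pages advertise.
HOURS_3Y = 3 * 365 * 24              # 26,280 hours
on_demand_rate = 0.10                # $/hour, made-up instance price
reserved_discount = 0.40             # made-up 3-year discount

on_demand_total = HOURS_3Y * on_demand_rate
reserved_total = on_demand_total * (1 - reserved_discount)

print(f"on-demand 3yr: ${on_demand_total:,.2f}")
print(f"reserved 3yr:  ${reserved_total:,.2f}")
```

At that scale the ranking between providers can easily flip depending on whose discount is deeper, which is why leaving discounts out matters.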


Elastic rates are really what you should be comparing when using cloud IaaS services, though. That's where the price works out in favor of using cloud IaaS hosts in the first place, after all.

If you have a stable set of instances and a known lifetime for them, then, before trying to calculate whether AWS or GCP is cheaper, step back and plug those same numbers into a regular non-cloud DC managed-hardware-leasing pricing page.


> a regular non-cloud DC managed-hardware-leasing pricing

You have to factor in the total cost of ownership (TCO) to make a fair comparison; in almost ALL cases, you are more than likely to overspend running bare metal boxes in your own DC. Some of the TCO components are:

- DC staff salaries

- Electricity

- Networking bandwidth

- SLA guarantee (yes, this is a hidden cost; e.g. if your DC power is out, you owe your customers fees depending on your SLA).

- etc.


Note, I didn't say "your own DC", I said managed hosting. As in, leasing a physical 2U server from a DC provider (just not a cloud DC provider), that you "temporarily own" (sort of like you "temporarily own" a condo you're leasing), but where the DC staff still has BMC access to the box, and will handle hardware going bad, etc., so you never have to drive out to the DC.

You know, the primary offering of DC providers like Softlayer, Hetzner, etc.

With a managed service, "utilities" (salaries, electricity) are factored into the lease. And bandwidth, as it turns out, is cheap-enough that many DCs will give it to you unmetered, since you can't use enough through the limited links they give you to dent their uplink.


Bit of an aside, but a lot of people in AWS or Azure can't run their workloads in Hetzner, OVH or what-have-you for compliance / paperwork related reasons.

Now SoftLayer I'm not so sure about - interested to hear from anyone offering services to say, gov or health from managed hosting and how that compares cost and experience-wise to AWS, Azure, GCP.


I've never had to deal with this, but there are tens of thousands of managed providers out there, so I figured some of them must have this type of compliance.

The first two that I looked at, Hivelocity and ReliableSite both seem to have a number of certifications, as does our current provider, LeaseWeb.

Is there a specific certification that really sets AWS/Azure/GCP apart?


> - SLA guarantee (yes, this is a hidden cost; e.g. if your DC power is out, you owe your customers fees depending on your SLA).

But Amazon and GCP can go down too and their SLA does not necessarily fully insure against your SLA.


Yes, everyone could and will go down at one point or another. Are you saying that your in-house team can manage infrastructure equally well or better than these big public Cloud vendors? They have thousands of site reliability engineers, don't they?

The crux of this is how you compare at managing site reliability. Perhaps you have a world-class team that could do it better; then yes, my point there is moot. But 99% of the time, it's not.


Amazon has done a fantastic job of making people think the choice is between them or managing your own infrastructure. Your concerns make no sense in the context that your parent presented: a managed DC.

And, frankly, your parent was being generous. Even if you only look at elastic workloads, the workload has to be a) extremely elastic and b) not fall into some pretty common patterns for cloud to make any kind of sense.


To match the reliability of something like an AWS managed service, you usually need two or three managed DCs.


A number of managed providers have multiple DCs within a region as well as DCs in multiple geographic locations.

Also, many have been in operation since before AWS was a thing, and some are larger. So I can't imagine what AWS knows about running a datacenter that others don't.

Now, maybe, if you can build something to be fully one with the cloud, considering all edge cases, and limiting yourself to only cloud-zen tools (or building your own, or accepting vendor lock-in), then in theory, with enough money, I guess the cloud lets you achieve higher reliability.

The fundamentals of EC2 (lack of dual NICs, dual power supplies and BBU RAID + virtualization and general complexity) means that a single instance is way less reliable (let alone, much worse value) than a single dedicated box. The complexity you need to throw on top of that building block (in the shape of lock-in, compromise, money, latency, application complexity or a combination of these) is pretty significant.


It's adorable that you think batteries and dual PSUs are something that makes a node more reliable, rather than less reliable.


What's the point of dual PSU if not reliability?


I'm not sure what the point is, actually. The variety of things that can go wrong with them is astonishing. For one thing, among many others, BMC cards will power-cap the max clock speed of CPUs when a machine is running on only one PSU, which can cause a degradation that's worse than if the machine had just halted. There are a zillion other edge cases like that.


A managed service (like S3) is different from managed infrastructure. EC2 VMs only run in a single zone too, and require extra cost and complexity for more redundancy.

Building out your application across multiple regions in AWS is not much different than using multiple DCs from a managed host. The clouds provide live migration, spot instances, and fast global VPC networks that can make it much easier, but you also pay the premium for it.


well, kind of. I think it is pretty common to have a base amount of compute you need, but still need the flexibility to elastically scale up when required. So real workloads are generally a mix of elastic and reserved rates. But how much of each really depends on the workload, so cockroachdb did the easy thing, and just compared the on-demand prices.

Also, there are a lot of things that the big cloud providers manage for you. I haven't dealt with DC managed-hardware, but I imagine that you probably have to do a lot more to set up networking, provision VMs on raw hardware, etc.


> If you have a stable set of instances and a known lifetime for them

I do but I don't even know how to assemble my own computer, much less deal with bare metal for servers. I want to stay as far away from hardware as possible.


You can just rent servers. My problem is that I don't know the CPU time I need to run my app. If you say an app needs 100 or 100,000,000 CPU hours a month, I wouldn't be able to really verify that.

I don't know how cloud providers even measure the CPU-time. Probably from VMs. What about services like logging, health checks and load balancers? Is there a line item for those too?

At the end of the month I get an invoice from my provider that says I needed X processing power. I have to believe it and just accept the price if it seems worth it.

I am sure there is elaborate performance monitoring software out there, but I doubt many developers really verify the bills they get.

Providers could just randomly add a few dollars on my bills and I heavily doubt that I would notice. Not wanting to give them any ideas here...

So a rented server in the end gives you much more control over unknowns related to costs. That doesn't mean it has to be cheaper or is as easy to maintain.
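For what it's worth, the most verification I could realistically do is re-total a usage export myself; a sketch with a made-up CSV format and rates:

```python
import csv
import io

# Made-up usage export: one row per instance per day.
usage_csv = """instance,hours,rate
web-1,24,0.10
web-2,24,0.10
db-1,24,0.25
"""

total = 0.0
for row in csv.DictReader(io.StringIO(usage_csv)):
    total += float(row["hours"]) * float(row["rate"])

# Compare the recomputed figure against the invoice line item.
print(f"recomputed daily charge: ${total:.2f}")
```

Of course, this still trusts the provider's own hour counts; it catches arithmetic surprises, not mismeasurement.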


What's so difficult about clicking buttons on a web form?


What's an elastic rate?


The price without sustained use or commitment discounts. The raw cost of a compute unit for a random day/hour/minute.


Pay-as-you-go rate - either by the minutes or by the hour.


> That's kinda weird.

Not really.

On-demand capacity is (a) what the cloud is known for and (b) a reasonable common denominator.

Reservations get a lot more complicated with policies around usage accounting and transferring and selling them.

Obviously there is a lot more available than hourly VPS (reserved, interruptible), but that's a decent benchmark to start with.


Yeah, and without comparing spot/interruptible prices, these results are meaningless for me.


You wouldn't typically run your database VMs on spot/interruptible instances.


You wouldn’t typically run your databases in a cloud environment on your own VMs - you would use a managed service.


That depends on several factors. The database you might want to run may not be available for instance.


I'm still waiting on that managed PostgreSQL 12 service....


Almost....

PostgreSQL 12.0 Now Available in Amazon RDS Database Preview Environment

https://aws.amazon.com/about-aws/whats-new/2019/11/postgresq...


I would never use a managed database service again.


Managed databases are quite good. RDS is very mature, does a very good job with failover and backups, and is very easy to set up.

I don't want to manage anything that needs to be clustered. Technologies like that often need trained engineers with good knowledge of how the product behaves if we want to manage it ourselves.


Alas, it's the only way to get decent efficiency on certain cloud platforms. However, you often can't get custom plugins and modules. On Amazon, bare-metal ephemeral-disk instances are decent high-performance alternatives, but you can't beat Aurora for most pg use cases.


This sounds like the first line in a novel.


Unfortunately, "Nor would I use cloud-based document storage" is the next, and last, sentence.


Well, plenty of people do...


Why?


But you would run expensive machine learning jobs on them, and these (GPU instances) tend to offer more significant absolute savings when they're pre-emptible.


why not? Just kidding, but it's getting more common :)


In distributed databases with appropriate grace under churn... Sorry I wasn't clear in my earlier post


Yeah but my database costs are a tiny sliver of my overall expenses and the instances will be located in whatever cloud provider hosts the rest of my infrastructure.

How many people are running a cloud account with ONLY a cockroachdb in it?

Your point seems pretty obtuse tbh.



These reports behind an email wall frustrate me to no end. I don't want to receive your spam to read a single article!


It's rather expensive (and time-consuming) to produce reports like this. They are doing it to generate leads.


If there is a cost associated with producing content and that cost can’t be recouped by the presentation of that content (speaker fees at conferences, advertisements, sponsorships, etc) then the content should be sold for a fair market rate.

That’s basically the entire point of GDPR. Stop making people trade personal information for “free” content.


No that's not the point of GDPR, it's about choice and control over data.

You're free to not share your email if you don't want to read the report. The rest of us can make our own decision.


GDPR literally prohibits you from requiring personal information in exchange for something that doesn’t require that personal information in order to function. You’re right it’s about choice and control over data and “you can only have this thing if you give us unrelated personal data” is not choice nor control.


It's not that simple. GDPR is purposely vague and regulated "in principle" rather than by the letter. Otherwise anyone operating a digital storefront, email newsletter or even subscription paywall would be in violation.

GDPR doesn't have much standing against gated content and many top law firms agree. There's also a combination of legitimate interest, business vs personal emails, and contract in effect to sign up for communications as workarounds. And to be extra clear, GDPR has no enforcement outside of the EU so it can't do anything against a US-based database company anyway.


Narrator: No they aren’t.

They’re going to generate unsubs and spam reports.


Those can be filtered out quickly. Remaining are potentially valuable leads.


My understanding is that gated content is not compliant with GDPR which requires consent to be given freely and unconditionally. Can anyone comment?


That's a misunderstanding. You're not entitled to or reliant on this report, and are completely free to enter your email or leave the page.


You don't have to access the content; you're free to choose not to. Not an expert, but that's my take.


> ...in 2018, AWS outperformed GCP by 40%, which we attributed to AWS’s Nitro System present in c5 and m5 series... In 2020, we see a return to similar overall performance in each cloud.

How did GCP and Azure catch up to AWS Nitro in one year? IIRC Nitro is a coming together of minimalistic micro-VMs, hardware-accelerated network and IO cards, and hardware offload for encryption and other maintenance tasks, work that was 5 years in the making [0].

[0] https://www.youtube-nocookie.com/embed/rUY-00yFlE4?t=1m45s


Meltdown and Spectre did a real number on the c5 and m5 instance types.

Amazon's original fix reduced performance substantially, which lots of people noticed, but no one knew why till the vulnerabilities were announced.

Meanwhile Google built a clever workaround which minimized the pain. It was really a mic-drop moment from Google and I wish someone would dig in deep and tell the whole story.

At this point I'm sure AWS has merged Google's patches, but it showed how much Google was investing in their GCP offerings.


It is less to do with Google investing in GCP offerings than their investment in general in Operating Systems and their sheer quality of engineers.


Which I am sure will also work at Amazon or have worked at Amazon.


Maybe


Unlikely.


Interesting! Do you know of any resource where I can read more about this?


The tech is called Retpoline, and this blog has some pointers: https://www.blog.google/topics/google-cloud/protecting-our-g...


There were a few interesting bits in a post the other day about their new compute VM class, #4 in particular: https://cloud.google.com/blog/products/compute/understanding... It's pretty light on details but was news to me.


This is not related to Spectre/Meltdown, which was over a year ago.


> GCP’s network looks much better than either AWS or Azure’s networks.

> Not only do their top performing machines beat each network's top performing machines, but so too do their bottom performing machines. Even their least performant machine (n1-highcpu-16 in figure 10) is consistent with AWS' maximum network throughput as seen in our tests.

> This is especially impressive because last year, AWS outperformed GCP in our network tests. It is a credit to GCP that they have improved their network performance and we are left wondering exactly how they accomplished this improvement.


It's more likely that all of them started working on it at about the same time and AWS came out first.


The writeup states that their observation is that compute perf is the biggest factor in the TPC-C benchmark. Nitro, AFAIU, is mainly about networking.

I think all the clouds use the same basic physical CPU sets, so I'm more surprised about the perf being much different in previous years, than I am about the perf being similar this year. Maybe the hypervisor layer was much more efficient at AWS or something?


I remember that GCP claimed they would follow Moore's law and drop the price of the same instance type over time. It was just BS marketing; unfortunately I believed them.


AWS also stopped dropping instance prices periodically. Sadly, Moore's law stalled some time ago, affecting everyone.


As we're starting to see, it was mostly Intel not keeping up, actually. Moore's law is mostly holding up till today, and probably for a couple more years before we really have to switch paradigms.


The definition as doubling performance every few years stopped holding up a while ago. The formal definition of doubling the number of transistors somewhat holds up (see the huge increase in core counts in recent years), but that doesn't benefit most applications much. Incidentally, cloud providers bill by core count, so adding more cores doesn't help the bill.


Nope. It's just that GPUs carried the mantle. If you have a massively parallel number crunching application you should bite the bullet and port it to GPU, and you'll see a truly massive FLOP/$ increase. And unlike processors GPU performance is still rising 35-45% per generation, although that will slow if NVidia gets too far ahead.

Also, if your goal is to get optimal cost/FLOP and you are computing pretty much constantly, then you shouldn't be using the cloud, to be honest. If you are IO-limited or have burst usage then maybe, but for cost/FLOP the kings are still consumer GPUs and Threadrippers, by far.


Most applications that run on a server create a process or thread per request/session and scale reasonably well.


I’ve observed price cuts over time for Azure and AWS but very hard to get good published data.


Sad story


Well done Cockroach labs team. This data is very valuable and much appreciated.

A small nitpick if you're reading. Fig 19 (Storage: Azure Write Throughput): the y-axis has an incorrect label of "1,00" when it should be "1,000".


Cockroach here: Thank you! We'll get this fixed.


A most minor point (and not to detract from the great effort you to went to produce the report): it uses yellow for AWS (good, their logo is yellow/orange) but red for Azure (bad, Azure is ... azure.)


Looking at the Azure networking numbers I wonder if they got their configs correctly optimized. Azure networking was always tricky to get best performance when I used it heavily at my last co.


No, I contacted them directly after seeing this report and confirmed they did not even use accelerated networking (which is a free, but opt-in, service on Azure), much less things like proximity placement groups (which only recently went public). So machines were communicating across availability zones (separated physically by miles and logically by additional networking layers) and a bunch of networking was happening in CPU rather than offloaded to FPGA.

Disclosure: I work for Azure networking.


(I'm sending this feedback here because Azure feedback via official channels is printed out and then fed directly into a shredder.)

Can you answer why -- for the love of God -- why Azure IPv6 networking is doled out in microscopically small /124 blocks (16 addresses)!?

The standard is a /64 at a minimum for residential connections, and /48 is recommended for most premises, particularly business connections. Azure could easily obtain a /32 for each of their regions, providing a very roomy 4 billion /64 scopes per data centre.

Right now, if I want to "embrace IPv6" and all of its advantages, such as a flat address space and the elimination of NATs, I will have to either:

1) Juggle a bunch of /124 prefixes and carefully allocate services to them. This is a load of fiddly scripting or manual work.

2) Probably be forced to NAT anyway!

3) Pay for addresses that ought to be too cheap to meter.
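The arithmetic behind those numbers, using the documentation prefix 2001:db8:: as a stand-in:

```python
import ipaddress

# A /124 leaves just 2**(128-124) = 16 addresses.
tiny = ipaddress.ip_network("2001:db8::/124")
print(tiny.num_addresses)        # 16

# A /32 per region would hold 2**(64-32) /64 scopes, i.e. ~4.3 billion.
region = ipaddress.ip_network("2001:db8::/32")
scopes = 2 ** (64 - region.prefixlen)
print(scopes)                    # 4294967296
```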


Sorry, I'm pretty far removed from low-level networking, on CDN team and fairly new at that. I'd have no idea who to ask.


Why is accelerated networking opt in? It seems like making it default would improve these comparisons a lot and would help a lot of your customers who likely haven't found this setting either.


I don't know offhand, it's a different team that runs it. I'd guess it's something related to backwards compatibility and the general "roll big changes out slowly so it doesn't break" credo. And it may be tied to a certain kind of HW that supports it, which doesn't exist everywhere, since cloud providers can't just throw out all their HW each time a new feature comes in (as much as we wish we could). But that's just a guess.


Yeah was about to say - I'm sure I've seen numbers 5x that even on small azure instances.


I suspect they are bottlenecked on CPU: most of these instance types are capable of higher throughput than they've captured, but you'd need more than one iperf process to measure it on Azure.


I wish they had benchmarked the arm instances on AWS, I'd like to see a 3rd party validate AWS's claims about performance.



I saw this just now on Twitter (performance/cost building FreeBSD): https://twitter.com/cperciva/status/1206688489518985216


This is a good report, but I'd also be interested to see focus on GPU performance on both classic GPU intensive workloads and machine learning workloads.


> cloud

> compares virtual machines

I mean cloud is different for everyone, but for me a VM is far from cloud. It's the least differentiable item in each of their suite.


Did this use the latest 36% higher performance EBS systems on AWS?

https://aws.amazon.com/about-aws/whats-new/2019/12/amazon-ec...


I think they're using local storage not EBS.


Five out of 12 AWS models they tested were EBS based.


After clicking on this link, I now get ads about this report.


GCP seems to have gotten the "most improved" award, but is Azure winning anything?


Support and confidence of Big Enterprise....


Pentagon DoD contracts — Jedi.


Performance is only half the battle. Companies need high SLAs, compliance, easy integration, familiarity, and a whole lot more.


They should add Alicloud.


I wonder why they stuck to the "standard" machine types on GCP. Unlike with AWS, you get to vary the config there somewhat, so if your workload benefits from more CPU you can add just the CPU. Same for RAM. That will affect the per-dollar figures, because you can tailor your instance to your workload pretty exactly.


Sorry for the non-technical comment here, but why do I have to provide my email address to read the full report PDF? I suppose the marketing department at Cockroach Labs prescribed this workflow.

Is this a viable strategy? Might it be better to provide the report without annoying engineers with the email requirement? How many readers of the report are interested in Cockroach Labs, rather than the benchmarks?


>, but why do I have to provide my email address to read the full report PDF?

... because the people who made the PDF content want to make a trade: your email address for their PDF whitepaper.

The industry jargon for this technique is "inbound marketing".[1]

>Is this a viable strategy?

Yes, it often is. But I do get that it annoys a lot of people. That's why 99.9% of the time, I also don't bother giving my email address just to read pdf whitepapers.

It's a voluntary trade -- and like you, I usually choose not to do it.

[1] https://www.google.com/search?q=%22inbound+marketing%22


You can always use a disposable email address like 10 Minute Mail: https://10minutemail.com/


Notice how the responses are honest ones, where people would rather skip the report than supply their real email address? The target demographic are tech savvy, so 90% of the leads are going to be 10MM email addresses.

The popularity of this thread illustrates that there is a need/use for this content, so Cockroach nailed the bait, but they have massively overshot the type of trap required to convert these savvy mice.


Nah, be realistic: 99% of people will give their work email to read it and not care.


That 10minutemail workaround is easy to block: just require a business domain and use a verification link.
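A sketch of that first check; the domain sets here are illustrative stand-ins, not real blocklists:

```python
# Illustrative, non-exhaustive lists of disposable and free-mail domains.
DISPOSABLE = {"10minutemail.com", "mailinator.com"}
FREEMAIL = {"gmail.com", "yahoo.com", "outlook.com"}

def looks_like_business_email(address: str) -> bool:
    """Cheap pre-filter; the verification link would still follow."""
    domain = address.rsplit("@", 1)[-1].lower()
    return domain not in DISPOSABLE and domain not in FREEMAIL

print(looks_like_business_email("jane@acme-corp.example"))  # True
print(looks_like_business_email("x@10minutemail.com"))      # False
```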


It's even simpler to not care. Somebody using 10minutemail wouldn't react anyway, so they send out some automated mail after a while and, if there's no reaction, kick the address off their active leads. Tracking down invalid mail domains is effort; sending out a few more mails is no effort.


you can get email/verifications from many temp email services.


Like many similar problems, it's a cat and mouse game. For example, it's equally easy to detect adblockers, but they are still really effective.


Cockroach is a DB company. Customer acquisition cost is in the thousands if not tens of thousands or hundreds of thousands.

The steps I mentioned above are what, maybe a few hundred dollars. Trust me, their lead generation pipeline most likely already subscribes to this and utilizes it with minimal concern.

There are plenty of CRM tools that can flag out BS domains and/or low value not worth it domains.

If you are a person who gets by using 10minutemail, you are not their target market, nor their target demographic.

The extra hop is very necessary for prevalidation. And they will be just fine with it.

You on the other hand, being the cheapskate that you are, have to commit an extra 5 minutes every time you generate these BS emails. Obviously you don't value your time as much. And that is why, again, you are not the target market because you extract no business value from accessing the knowledge they spent time and money aggregating for you. Your getting the PDF is inconsequential to both them and yourself.

So go ahead, no one really cares.

EDIT: Lol, "you" was not referring to you specifically, just the folks who think they are being smart or outsmarting corps by using these services like 10minutemail or whatever it's called.


Your comment is particularly vitriolic.

I am squarely in their target demographic and yet refuse to provide my email. I like keeping noise low in my work inbox - if I will decide I'd like to buy something from cockroachDB, I'll approach them. It's purely negative value to have them approach me - I have never received any useful messages by being on a marketing list.

It takes way less than 5 minutes to generate a temporary email address.

Also, I fail to see how refusing to provide your email address correlates with frugality. If anything, people who are cheapskates would be happy to pay with anything but money.


Ad budgets are much bigger portion of acquisition cost than content marketing. Do you have an equally emotional reaction to people who decide to use ad blockers?


Right! 10mm and many others are on our blacklist when we collect emails (double-opt-in newsletter and sign-up).


I guess maybe they only care to deal with people who want this report badly enough to jump through some hoops?


They should also have included oracle which will soon surpass google (making it 5th most popular) cloud


> They should also have included oracle which will soon surpass google (making it 5th most popular) cloud

I seriously doubt it. Only companies already heavily invested in Oracle are taking their offering seriously. They are pretty late to the party.


Surely, if we were including other clouds, we'd include IBM before Oracle... :))


I would add Alibaba Cloud before them too :)


Oracle's cloud is an also-ran. Just follow the investment the cloud giants are making. Oracle is not even in the same league. From:

https://www.platformonomics.com/2019/02/follow-the-capex-clo...

"Google’s incremental CAPEX spend in 2018 is about what cloud pretender Oracle has spent on CAPEX throughout their entire history."


Sure, and I'll just patch together a few of my home PCs and we can call that a cloud too


What rankings are you referring to?


Revenue


google sponsored article.



