What's the Best Cloud? Probably GCP (quizlet.com)
193 points by pbbakkum on March 10, 2016 | 135 comments



I am not sure about the technical merit of this link. Best of show:

"Google probably has the best networking technology on the planet."

How do we quantify this?

"This is important for several reasons. On EC2, if a node has a hardware problem, it will likely mean that you'll need to restart your virtual machine."

I would much rather create a service that can tolerate single-node outages than rely on "live migrations". I am not sure what he meant by the SSD comparison; Amazon EBS can be SSD-backed, but it is still network-mounted storage.

"Most of GCP's technology was developed internally and has high standards of reliability and performance."

Guess what AWS was developed for.

I like hand-wavy articles as much as the next guy, but it seems to me they picked GCP, wrote an article to justify it, and cooked up some numbers with single-dimension comparisons to make it look scientific. I wish I were working on single-dimension problems in real life, but it is always more complex than that. I am more interested in worst-case scenarios and SLAs than micro-benchmark results when comparing cloud vendors. Discarding Azure was purely arbitrary; in fact, Azure is more than happy to run Linux or other non-Windows operating systems. I am not sure where he got the idea of a "Linux-second cloud".

https://azure.microsoft.com/en-us/blog/running-freebsd-in-az...


Disclaimer: I work for Google Cloud

> "Google probably has the best networking technology on the planet." How do we quantify this?

In the article they did a bunch of tests. Quote: GCP does roughly 7x better for the comparison of 4-core machines, but for the largest machine sizes networking performance is roughly equivalent.

There is also https://github.com/GoogleCloudPlatform/PerfKitBenchmarker if you want to benchmark things yourself.

Seriously, try it yourself. I think you will be pleasantly surprised.

> I would much rather create a service that can tolerate single node outages than relying on "live migrations".

Services should tolerate node failure even on GCP; live migration does not really help with that. It's more about reducing ops. With AWS, you have to manually reboot your machines when an infra upgrade happens. With GCP it is automatic.

> I am not sure what he meant by the SSD comparison, Amazon EBS that can be SSD but still it is a network mounted storage.

I'm not too sure what your question is?

> Discarding Azure was purely arbitrary

Agreed, would love to know more about why they didn't consider Azure


Disclaimer: I used to work for Amazon and don't own any AMZN anymore

1. AWS does explicitly tell you up front that smaller instance sizes come with smaller network throughput. This is well known and well communicated even when you browse the instance offerings. Doing 7x better for a 4-core instance is hardly relevant (depending on the actual CPU type, though): saturating your pipe would probably consume much of your CPU time, and you could hardly do anything else on the box. You can prove me wrong on this one. Synthetic benchmarks are not really relevant for production use cases.

A good read on the subject: http://www.brendangregg.com/activebenchmarking.html

2. On reducing ops. You are implying that these ops-y things are not automated. You should ask your SRE co-workers about this one. For running a website at this scale, you absolutely need to automate the case where a server is rebooted. Meaning, on shutdown it needs to remove itself from the load balancer or from the resource pool, and when it comes back it has to put itself back. Worst case, you can just terminate the instance and let auto-scaling do its job. All of these are completely hands-off operations in most cases, but I do understand that some smaller customers are not so advanced with automation, and GCP might be optimizing for those clients.
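
To make that concrete, here's a rough sketch of the "remove myself on shutdown, add myself back on boot" step using boto3 with a classic ELB (the load balancer name is made up; treat this as illustrative, not the only way to do it):

  import boto3
  import requests

  # Ask the EC2 instance metadata service which instance we are.
  instance_id = requests.get(
      "http://169.254.169.254/latest/meta-data/instance-id", timeout=2).text

  elb = boto3.client("elb", region_name="us-east-1")

  def on_shutdown():
      # Drain ourselves out of the pool before the reboot.
      elb.deregister_instances_from_load_balancer(
          LoadBalancerName="web-pool",  # hypothetical ELB name
          Instances=[{"InstanceId": instance_id}])

  def on_boot():
      # Rejoin the pool once we're healthy again.
      elb.register_instances_with_load_balancer(
          LoadBalancerName="web-pool",
          Instances=[{"InstanceId": instance_id}])

Hook those two into your init system (or just let the auto-scaling group replace the instance) and nobody gets paged.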

3. I do not have a question; I was pointing out that in the article the author is talking about EBS, while it might appear to the reader that he is talking about some sort of local SSD.

4. Great! I would like to know it too! We should petition together. :)


(Disclaimer: I work on the hypervisor that lives under Google Compute Engine)

1. PerfKitBenchmarker includes meaningful benchmarks for things like Redis, Aerospike, Memcache, etc. We expect GCE to score well on these when measured in terms of performance/$, and a chunk of why we expect that is superior network performance. Even small instance sizes tend to saturate their provisioned network long before they saturate provisioned CPU; GCE provisions more network (up to 2 Gbps/vCPU per our public docs).

This also applies to custom VM shapes. It allows workloads like memcache (which typically require very little CPU per request) to be provisioned on small instances that still have relatively beefy networks and oodles of RAM, with costs proportioned appropriately.

2. GCE handles instance failures differently from EC2. Certainly both platforms will have instance failures that cannot be solved with migration; this is absolutely something software stacks must work around. Live migration allows us to drive down the number of failure modes which cause a discontinuity in the instance lifecycle, but obviously they cannot be eliminated entirely.

That said, when an instance in GCE fails it is by default restarted as quickly as possible (possibly on another host). To the guest this appears as an unplanned reboot. My understanding is that you can accomplish the same on EC2 by 'recovering' an instance[0], and further that you can automate this recovery with CloudWatch, but none of that is required on GCE.
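
For reference, that CloudWatch automation looks roughly like this with boto3 (a sketch; the instance ID and thresholds are made up, and the recovery action ARN is the documented "automate" hook):

  import boto3

  cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
  cloudwatch.put_metric_alarm(
      AlarmName="auto-recover-i-0123456789abcdef0",  # hypothetical instance
      Namespace="AWS/EC2",
      MetricName="StatusCheckFailed_System",
      Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
      Statistic="Minimum",
      Period=60,
      EvaluationPeriods=2,
      Threshold=0,
      ComparisonOperator="GreaterThanThreshold",
      # Asks EC2 to recover (restart on healthy hardware) the instance
      # when the system status check fails.
      AlarmActions=["arn:aws:automate:us-east-1:ec2:recover"])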

I think we're in full agreement in terms of automating ops; I'm just of the (obviously strongly biased) opinion that GCP is ahead in terms of automating things on behalf of customers "out of the box".

[0]: I previously worked at Amazon, but in Retail at a time when the deployment tools for EC2 were... somewhat exotic. I lack experience with the general best practices recommended to external customers.


1. Thanks Jon, this is exactly the sort of comment I was looking for. Yes, I totally agree: if you have a memcache use case, you are going to hit network limitations before you hit CPU. I was just pointing out that HTML rendering is different from running memcache or a distributed, disk-persisted key-value store. Amazon figured out the need for different use cases and introduced R3 instance types with a few cores, a large amount of memory, and enhanced networking support. This is why I found it a bit unfortunate to make general statements like "a 4-core instance has better networking on GCP". It depends which instance type you are using.

https://aws.amazon.com/about-aws/whats-new/2014/04/10/r3-ann...

2. Agreed, making it easier for the customers is always better.

Heh, I was working there when Retail moved to EC2, much fun! :)


Google Cloud platform offers Custom Machine types specifically to help you configure the most optimal CPU/RAM combinations:

https://cloud.google.com/custom-machine-types/

Quizlet's post alludes to Google's attitude as well. With the exception of GPU instances, Google's VMs are generic. You are able to get incredibly fast SSDs, best-in-class networking, etc., on just typical instances. The benefits are that pricing is simpler, the spot-instance/preemptible-VM market is simpler, and you get much more architectural flexibility.

(Disclaimer - work on Big Data @ Google Cloud)


That should probably be emphasised a bit more in both the article & in general. It's fairly common to have wasted RAM or CPU or whatever because you had to pick a particular instance type in AWS ("I need better networking, so I'll have to pick a larger instance ... pity I don't need those extra cores").


1. For the micro instance and the 32-core instance, the difference between AWS and Google is not a big deal. For the rest of the instances, Google Cloud is 4-7 times faster. That's not a "synthetic" benchmark.

2. Yes, everyone should automate ops, but if the cloud provider takes away some of the pain, it's a win.


1. I am not arguing with that; I am arguing it is not relevant for most use cases, and you reached that conclusion based on a synthetic benchmark. Real-life example: running a service for rendering HTML, you use most of your CPU time for the actual rendering and some for the communication, so you are not network-bound even on a 4-7 times slower network. Again, you might find a use case that uses very little CPU and all of the network I/O. In that case it is relevant that GCP is 4-7x faster.

2. Sure, and this is hardly relevant to me because I automate most of my work. For small customers with less automation it is more relevant, as I pointed out.


If you're interested in datacenter networking, you can't miss ONS. Amin Vahdat, the guy in charge of networking at Google, has keynoted the past two years, covering hardware in 2015[1] and software in 2014[2]. Mark Russinovich also spoke about Azure's SDN[3], but it doesn't come close to Amin's presentation.

Every once in a while you'll hear some random spec from one of these companies, and it's always pretty surprising, but Amin's team has achieved 5 Petabit/s of bisection bandwidth. It's more than surprising.

[1] https://www.youtube.com/watch?v=FaAZAII2x0w [2] https://www.youtube.com/watch?v=n4gOZrUwWmc [3] https://www.youtube.com/watch?v=RffHFIhg5Sc


Thanks for the links! This is a pretty interesting topic and I am going to watch these videos.


We did a performance analysis of Google Cloud vs AWS. The results are in line with what is published in the post. The biggest thing that we cannot quantify is "ease of use". Google Cloud is a pleasure to work with; AWS feels so clunky by comparison. Don't take my word for it: create a VM and log into it on both AWS and Google Cloud, and you will change your opinion about what a good cloud is.


If you're using the GUI to manage your resources rather than going the Infrastructure as Code route, you're probably doing it wrong. You should be using a tool like Terraform, which lets you use multiple cloud providers (https://www.terraform.io/docs/providers/) and can tell you about immediate errors before attempting to launch a resource, which also makes it friendly with Jenkins or whatever other CI tool you prefer.


We don't use the GUI to manage our resources. We use CloudFormation for AWS and Deployment Manager for Google. Let me tell you a couple of things about those services. In AWS some resources are zonal, some regional, and some global. It's a mess to work with. For example, the same AMI image has different IDs in different regions, so you need to create maps and such to make your code work across regions. Come to Google Cloud and there's no more zonal/regional/global fuss: an image is a global resource, available by the same ID in all regions, so your infrastructure template looks much cleaner. Combined with the power of Jinja, you can create far more powerful templates and evaluate them on the fly. AWS has "three" queuing systems and "two" storage solutions with different APIs and different quirks; Google has just one of each, and it nails the use cases for queuing and storage. AWS micro-instances go poof without any notice, their NATs are known for being unreliable, and their load balancers can't scale. For every service that I looked into, Google is way better than AWS.
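
To make the AMI point concrete, here's a quick boto3 sketch showing that the "same" image resolves to a different ID in every region (the image name filter and Canonical owner ID are illustrative), which is exactly why cross-region CloudFormation templates end up full of region maps:

  import boto3

  name_filter = "ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-*"
  for region in ["us-east-1", "us-west-2", "eu-west-1"]:
      ec2 = boto3.client("ec2", region_name=region)
      images = ec2.describe_images(
          Owners=["099720109477"],  # Canonical's account (illustrative)
          Filters=[{"Name": "name", "Values": [name_filter]}])["Images"]
      # Each region prints a different ImageId for the same published image.
      latest = max(images, key=lambda img: img["CreationDate"])
      print(region, latest["ImageId"])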


AWS definitely feels like a product that has grown organically and needs some house cleaning. GC was able to look at AWS, take those lessons, and improve from the start.


Being late to the party has its perks. They also need to work more on convincing people to move over, and they do a good job with Spotify and other companies who talk openly about moving to GCP.


> AWS has "three" queuing systems, "two" storage solutions with different API's and different quirks. Google just has one and its nails the use cases for queuing and storage.

This is not currently true. Google has: Datastore, Cloud SQL, Bigtable, BigQuery and Cloud Storage [1]. Each is intended for a different use case, as are Amazon's offerings.

[1] https://cloud.google.com/datastore/docs/concepts/overview#da...


(disclaimer: I work on Google Compute Engine)

For queuing AWS has at least SQS and SNS, both of which solve roughly half of what's commonly desired from a queuing system. Google Cloud PubSub coalesces both of these behind a single API that provides clear support for common queuing patterns (1:1, 1:n, n:1, n:n).

In terms of storage, I think what the OP was referring to was S3 versus Glacier when compared against Cloud Storage (which offers competitors to both S3 and Glacier within the same API -- just mark a bucket as Nearline and pay less for cold-stored objects).
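
To illustrate the "same API" point, this is roughly what it looks like with the Python client (a sketch; the bucket and project names are made up, and the exact create call varies a bit between client-library versions):

  from google.cloud import storage

  client = storage.Client(project="my-project")

  # "Hot" and "cold" storage are the same product; only the bucket's
  # storage class differs.
  hot = storage.Bucket(client, name="my-serving-bucket")
  hot.storage_class = "STANDARD"

  cold = storage.Bucket(client, name="my-archive-bucket")
  cold.storage_class = "NEARLINE"

  for bucket in (hot, cold):
      client.create_bucket(bucket)

  # Reads and writes then look identical regardless of class.
  cold.blob("backups/2016-03-10.tar.gz").upload_from_filename("backup.tar.gz")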

If you count all of the additional AWS services that are logical equivalents to the Google ones mentioned you have SimpleDB, RDS, DynamoDB, and Redshift. So yes, many options for many different use cases, but Google coalesces things under a single API where the "verbs" are the same (as in the case of blob storage).


For me, GCP comes with unquantifiable existence risk. As in, how do I know that it won't get shut down in 5 years when some VP sees that it's not bringing in as much money as it should? I trust Amazon more in this regard, and their offering is not "so bad" that I feel a need to switch.


I understand your point, but GCP does address this in the terms of service (disclaimer: I work on Google Cloud):

7. Deprecation of Services

7.1 Discontinuance of Services. Subject to Section 7.2, Google may discontinue any Services or any portion or feature for any reason at any time without liability to Customer.

7.2 Deprecation Policy. Google will announce if it intends to discontinue or make backwards incompatible changes to the Services specified at the URL in the next sentence. Google will use commercially reasonable efforts to continue to operate those Services versions and features identified at https://cloud.google.com/terms/deprecation without these changes for at least one year after that announcement, unless (as Google determines in its reasonable good faith judgment):

(i) required by law or third party relationship (including if there is a change in applicable law or relationship), or

(ii) doing so could create a security risk or substantial economic or material technical burden.

The above policy is the "Deprecation Policy."


Not just the whole platform, but specific APIs / services being deprecated / shut down too. Amazon isn't untouched by this problem (see also: VPC migration from EC2 Classic), but I agree that given their reputation, I don't trust Google to keep even very useful, widely loved stuff around forever.


Existence risk here is HUGE. If GCP doesn't move the needle, Google will shut it down. AWS is a much more living organism and I can't see Amazon shutting it down before their drones take over ...


To shut down GCP, they would be shutting down the same services and infrastructure that power their own services. The dogfooding memo from years back is being taken to heart, and you're seeing more and more exposure of internal services and infrastructure.

The scary thing would be if it ends up reducing the rate of innovation because they worry about changing APIs or services too much; obsolete stuff accumulates quickly when you're serving large numbers of people, because most businesses want to write once and run it till it's dead. But this is true of any service that exposes anything but an extreme abstraction.


Exactly. If it wasn't making money they'd just shut down the public interface to this thing. It's not like it's orthogonal to their usual business, which is lots of computers and storage doing random workloads.


Right after they sealed the huge deal with Spotify? Seems like a near-term shutdown is unlikely.


Yes, unfortunately, GCE's reputation is tarnished by Google's approach to consumer-level services. I hope this changes over time - we need more competitors in this space.


Long-time user of AWS. I've never had a NAT service fail, nor a micro-instance disappear. I haven't had much exposure to GCE, but the removal of zones etc is interesting. How do you guarantee that your servers aren't sitting in the same data centre?


Terraform has worked incredibly well for us so far. Definitely deserves a look by anyone.


I have created more than a single VM on AWS; if I add together all of the companies I used to work for, it is close to 8000 instances (5000+3000). I am not sure what I am doing wrong in not running into clunky stuff, but I guess it is automation that makes the difference. With projects like Ansible, Terraform, or even the AWS CLI, creating and managing these large clusters is a breeze. I understand that you are using the UI and having trouble with the UX, but it does not mean that every user experiences that or has the same sentiments or conclusions.


> Guess what AWS was developed for.

EC2 being developed for internal use is more myth than fact. The original idea was for internal use, but it didn't exist much beyond a "short paper"[1] until it was green-lighted (by Bezos) as an external/sellable service.

[1] http://blog.b3k.us/2009/01/25/ec2-origins.html


Author provided a lot of information. I wouldn't call it a "hand wavy" article at all.


Well, providing lots of information vs. providing meaningful in-depth analysis are very different. I see your point though.


Best networking is a dubious claim. On the bigger AWS instances you can bypass the hypervisor with SR-IOV. AFAIK you still can't do that with GCP. So if you really need maximum network performance AWS will likely win, especially on latency.


On the contrary, this is one of the factors where GCP wins hands-down. In addition to the benchmark shown in the OP, check out one I posted on slide 14 of [1]. GCP achieves Gbit speeds on almost all instance types, and has higher speeds than AWS's biggest machines.

[1] https://docs.google.com/presentation/d/1B1jvWWh0ACaDv4ryEzLl...


I stand corrected. It also appears from those charts that the bigger GCP machines have more than a 10 Gbps connection. It looks like 2x10 Gbps. That would explain why instances of all sizes are able to push more network traffic.

Mind you, the benchmarks are for bulk transfer with a 9001 MTU. With more jittery workloads with lots of small packets, like a web server has to deal with, you see the benefit of SR-IOV. So AWS may still have the advantage on some workloads and some measures (maybe latency, maybe CPU usage per packet). However, it's clear that if Google can support SR-IOV in the future they will mop the floor with AWS on networking, because their network infrastructure is obviously superior.


Anyone know where one can go to get a more quantitative, price/performance comparison of the various cloud services?

I know, lots of dimensions for that comparison, but probably picking a few dimensions, or letting users select a few, could give a reasonable ranking of services and prices.


answering my own question, this looks pretty cool (no affiliation): https://www.cloudorado.com/cloud_server_comparison.jsp


They forgot to mention another nice feature of GCE - custom machine types. You can choose the number of vCPUs and the amount of memory, and also the amount of local (ephemeral, in AWS-speak) storage in 375 GB increments.

This is a huge advantage. For instance, some of our jobs are computationally intensive but relatively light on memory. In GCE I can run a 32-core machine with 28 GB RAM and it will cost me $887.68/month (without any sustained use discounts).

In AWS, the closest option I have is c4.8xlarge (36 cores / 60 GB RAM) which will cost $1,226.10/mo.
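
Back-of-the-envelope on those two quotes:

  gce_custom = 887.68    # 32 vCPU / 28 GB custom machine, USD/month
  aws_c4_8xl = 1226.10   # closest on-demand EC2 fit, USD/month

  savings = aws_c4_8xl - gce_custom
  print("$%.2f/month saved (%.0f%%)" % (savings, 100 * savings / aws_c4_8xl))
  # -> $338.42/month saved (28%)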

And if I need local (ephemeral) storage in AWS, I'm severely limited in instance types I can choose from, while in GCE you can attach local SSD to any instance type, including custom.

If you factor in per-minute billing in GCE and automatic sustained use discounts, we are talking about serious savings without any advance planning (required for using reserved instances).

EC2 still has some advantages - it supports GPU-equipped instances, for example, but for our computational pipelines GCE is a clear winner for now (and Cloud Dataproc is so much nicer than EMR!).


This is good to know, I was running into the problem of not having the right instance type for my workload before. We ended up changing up our stack to make it fit better on AWS.


That's pretty annoying ... happened to us as well though. We had a requirement for PostgreSQL with 512GB of RAM. Can't do that on AWS, so we had to shard the database.


To be fair, GCE tops out at 208 GB RAM currently, so you won't get that instance type there either.


Wow .. that's even lower than AWS (max is 244GB): http://www.ec2instances.info/


"Azure was eliminated since its a Linux-second cloud"

I have a feeling this person never really dove into Azure, and just wrote it off because it had Microsoft services built in; and of course various sysadmins still have a strong bias against Microsoft, especially if they are Open Source advocates. It seems like the entire article is mostly just comparing AWS to GCP instead of giving an actual overview of the cloud landscape; it just brushes off every other provider (that's not AWS or GCP) without diving into an actual reason why.


To be more explicit on this point - I think Azure is a good product and the growth that Microsoft is seeing speaks for itself. However, for our use case, which is at least somewhat representative of a rapidly growing Linux-based startup, we didn't see any compelling advantage in using the Azure compute product (we may use one of their ML products in the future). Hence, it made sense to narrow the focus somewhat to what we thought were the best options. We eliminated Azure from our list because our preliminary analysis didn't uncover any big advantages to using it over the other clouds, we wanted a cloud focused more on Linux, and we don't currently use any products in the Microsoft ecosystem.

Yes, I know Azure runs Linux; let me unpack that point: we had previously run on a cloud that didn't treat Linux as its flagship OS. The effect we observed was that Linux was a second-class citizen in terms of features and performance. Perhaps it's unfair to project that onto Azure, but I think it's true that AWS and GCP think about Linux first, and Azure doesn't. Running a company on the cloud means relying on the compute product (GCE/EC2) as the foundation for your infrastructure, so we think this makes a difference.

It would be valuable for a lot of people to see more comprehensive stats across all clouds - I would love to see this personally and I think it would help people make better decisions about cloud infrastructure.


Azure is also slower in block device speeds: http://www.cbronline.com/news/cloud/public/aws-vs-google-clo...


Honest question. If you are a 100% linux shop, what do you gain with Azure? Do they have better linux chops than GCE or AWS?


You gain the exact same things as you gain on other cloud platforms. You do not need to worry about the following:

- data center power

- data center networking

- hardware provisioning

I don't think they are better than GCE or AWS in terms of Linux support (maybe in minor ways), but they are not significantly worse either. What comes after that (pricing, machine types, etc.) is a different question. I see lots of companies using Azure because they got free credit for it.


To me, the more interesting aspect of any of the clouds is the PaaS offerings. I like the idea of not knowing or caring what OS my stuff is running on or how many VMs back it. Throw some Node.js up into a cloud and have it run and scale automatically without me having to harden, patch, and maintain machines. Same with data - flip a switch and have a geo-redundant well managed database service as opposed to configuring and monitoring such a beast myself.

I like Azure Web Apps, Sql, and Storage PaaS offerings, as well as hosted Mongo and similar 3rd party services. In general, my experience is that they are cheaper and better managed than most stuff my customers roll themselves.

I would suggest that any "100% anything" shop look at the PaaS offerings of the various clouds and see if the benefits outweigh the risks.


I don't understand your conclusion (Azure is best?). App Engine is far and away the most battle-hardened PaaS, going from tons of tiny toy apps to the scale of Snapchat.

I don't disagree that having a PaaS and "marketplace" is important, but I don't see how you seem to conclude that GCP is less relevant here.

Disclaimer: I work on Compute Engine.


If you want a PaaS to run on your IaaS, Cloud Foundry can run on AWS, OpenStack, vSphere, Azure and, pretty soon, it should be on GCE.

Disclaimer: I work for Pivotal, which donates the majority of engineering effort on Cloud Foundry.


I'm not sure, but that's why I would want to read a comparison to find out! For example, if their Linux VMs had a price/performance advantage in some category (networking, CPU, whatever), that would be interesting information to know. Or if the tools were better/worse.


I believe GCE has a price/performance advantage in basically all categories against both AWS and Azure (one wrinkle is we don't offer small slices of local SSD, instead pushing people to either our 375 GiB per unit local SSD or our Persistent Disk SSD product). For example:

https://cloudplatform.googleblog.com/2016/01/Happy-New-Year-...

As jsolson mentions elsewhere, we run PerfKit and compare ourselves all the time (both on raw performance and price/performance).

Disclosure: I work on Compute Engine.


That was our conclusion as well. I launched my baby startup on Digital Ocean and moved to GCE when I needed to do things like floating IPs (before DO had such a thing).

Overall GCE has worked well. I did have trouble fighting the beta logging agent (used 300+ MB of ram and 5% of my CPU when under no load), but logging is in beta so I guess I can't complain :)


I see a lot of these sort of articles, and I really have to bring this up, because I don't understand why people don't realize it when they decide where to host their infrastructure:

Bandwidth on GCP (and AWS and most of the other providers) is really, really, really expensive. $0.12 per gigabyte, upwards of $0.19 per gigabyte for Asia. Paying $0.12 every time you send an Ubuntu ISO is crazy. A bored script kiddie could run up your bandwidth costs to thousands of dollars just for the hell of it. A DDoS could make you declare bankruptcy.

I have a server with OVH that I can theoretically push 100+ TB per month through and only pay $100. I get DDoS protection included. It may not be perfect DDoS protection, but it's not the $6000/mo I'd need to pay for Cloudflare to get the same thing with GCP (I need wildcards), plus the $0.12 per GB for anything not cached by them.
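
To put numbers on that, a quick sketch using the list price above and the OVH cap for comparison:

  egress_gb = 100 * 1024   # ~100 TB of monthly egress
  cloud_rate = 0.12        # USD/GB, the GCP/AWS list price quoted above

  print("Cloud egress bill: $%.0f" % (egress_gb * cloud_rate))  # ~$12,288
  print("OVH-style flat fee: $100")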

I know from people in the industry that they pay less than a cent per GB. Google, if you want to differentiate your cloud services, start charging better prices for bandwidth and do something about DDoS (Project Shield should be baked into your offerings). $0.02 would be reasonable and you'd still make a profit. That goes for all the other "great value" cloud services that are actually very expensive for anybody doing work that actually needs bandwidth on the internet.


To put those numbers into standard transit pricing using US-East (AWS):

Up to 10 TB / month - $30/Mbps

Next 350 TB / month - $16.50/Mbps

Traffic within the same Region - $3.50/Mbps

Traffic to another region - $6.50/Mbps

The outbound traffic starts at $45/Mbps in AsiaPac and $85/Mbps in Latin America.

In the US and most of the EU, at >1 Gbps (~350 TB/mo) volume, transit pricing is well under $1/Mbps. Most of Asia should be under $10/Mbps, and South America is quite a bit higher, but not $70/Mbps.
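
For anyone wondering how a per-GB price turns into those "$/Mbps" figures: 1 Mbps sustained for a month moves a bit over 300 GB, so you just multiply. A rough sketch (assuming a 30-day month and AWS's $0.09/GB first-tier US rate):

  seconds = 30 * 24 * 3600
  gb_per_mbps_month = seconds / 8 / 1024.0  # Mbit -> MByte -> GByte
  print("GB moved per Mbps-month: %.0f" % gb_per_mbps_month)  # ~316

  for label, per_gb in [("AWS first 10 TB ($0.09/GB)", 0.09),
                        ("GCP/AWS base rate ($0.12/GB)", 0.12)]:
      print("%s -> $%.0f/Mbps" % (label, per_gb * gb_per_mbps_month))
  # -> roughly $28/Mbps and $38/Mbps respectively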

See: https://www.telegeography.com/press/press-releases/2015/09/0...

http://blog.telegeography.com/bandwidth-and-ip-pricing-trend...


$1/Mbps seems closer to what I would expect the price to be. 30x over market is an astonishing price markup. I don't understand how any startup I've built that used a lot of bandwidth could take that risk on infrastructure. I would actually be concerned about being successful.


I used to be with OVH, but now my servers are with online.net and I see what you are saying.

If I had a startup, I'd have a few of these servers as the baseline, then I'd scale up with AWS. I assume I'd be using Kubernetes for this, or something similar. Basically I'd be using the cloud for what it's supposed to be: taking the extra load off.

Does anybody do this?


"A bored script kiddie could just run up your bandwidth costs to thousands of dollars just for the hell of it"

This is really interesting and I wonder if it's true? Do you know of this happening? I don't. Is that just because no-one thought about it or is it maybe not as easy as it seems? Or is there another reason?

The bandwidth costs under normal circumstances should be trivial to calculate, right? I guess many services do not serve that much outgoing data, especially after caching. But, of course, use the right tool for the job etc :) If the job is serving ISOs, then maybe PaaS is not the right tool.


We're entering a new phase of the web, where almost every home internet connection is going to be 1 Gbps, upwards of 10 Gbps in some areas (US Internet has already started providing 10 Gbps to home customers in Minneapolis).

The idea that datacenter egress bandwidth can continue to be this expensive is ridiculous. A company using AWS or GCP is missing out on opportunities that are about to be created by very fast internet connections. It's an entire wave of "disruptive tech" innovation that these cloud services will be ineligible to compete in (16-30x markups!). I've run the numbers on switching to AWS and GCP numerous times, and the numbers never add up to something I could sustain for Neocities.

I might consider AWS if I'm just making internal apps for a giant company that thinks it's a great deal because their previous vendor was charging 10x more, but as a small startup doing something internet-facing, there's no way I could ever operate safely with that infrastructure risk. I would need success insurance or something. Short term I'd be fine, but long term AWS would be eating my profit margin and possibly even my company.

To say nothing of malicious bandwidth leeching attacks. It's just dangerous all around. I'm not even sure this has a name yet - Economic Service Attack? I remember reading a story of how GreatFire got DDoSed by China and got a $10-30k+ bill from Amazon because of it.

The rest of their offerings are more or less reasonable (their EC2 instances are a bit overpriced IMHO, but reasonable). But the bandwidth prices are just simply not. GCP could get massive switchover from AWS if they simply lowered their bandwidth egress prices.

It's fairly telling to me, lastly, that AWS/GCP/etc. charge nothing for incoming bandwidth and then charge a LOT for outgoing. Just making a backup of the sites on Neocities from S3 to another service would cost over $20 each time I did it (I can do it based on timestamps if I track all the files stored there in a database (double databases == yuck), but I'd much rather have access to something like integrated rsync support to make this process simpler and much more efficient).


I'm not arguing that cloud is always the best option, but clearly there are many examples where profit per client far exceeds the cost per client. And btw, bandwidth is probably the simplest thing to calculate :) There are pros and cons with cloud, no doubt, but you seem to be ignoring the pros.

Anyway, on to the much more interesting question of misuse. I found these links interesting:

http://serverfault.com/questions/231116/amazon-ec2-bandwidth...

https://forums.aws.amazon.com/thread.jspa?messageID=294632 (linked from the 1st)

Seems like the answer is that you must deal with it yourself, or get Cloudflare or similar to help you. In my limited experience, most other data centers / hosting providers also charge for traffic; AWS etc. are just more expensive.

Edit: This is not "Denial Of Service Attack" btw, it's a "Bankrupt by Cloud Costs Attack" :D


> Bandwidth on GCP (and AWS and most of the other providers) is really, really, really expensive.

It's cheaper if you use the CDNs they provide for this purpose.


Edit: I can't read or delete this post. Ignore me.



This should really be titled "A comparison of AWS and GCP."

It totally wrote off Azure (2nd in market size) because it's a "Linux-second" cloud (what does that even mean in a virtualized world?).

Also, you forgot to analyze support and SLAs around functionality. Good luck with GCP when something goes wrong or they decide to sunset a feature.


Support is much better on GCP than AWS. Reference: https://news.spotify.com/us/2016/02/23/announcing-spotify-in...

One of the reasons why Spotify went with Google Cloud is because of their superior support.


Spotify is one data point, and I hate to be cynical, but they were probably paid off (with free / drastically reduced-price cloud services, a la Netflix on AWS) by Google to write all of that.

Even if not, they are such a large well known name that they probably got special treatment. The real proof in the pudding is the support that the 99% get, not the special case 1%.


At the end of the article, Quizlet noted their experience with GCP support:

"Overall it was a smooth transition and we're very glad that we picked GCP as our provider - we've received excellent support and scaled up our deployment with few incidents."


Could you quote the line you're referencing? The key differentiator in that article sounds like data services (dataproc et al) and I don't see anything mentioning support.


ok but the point about the title still stands :)

   <<This should really be titled "A comparison of AWS and GCP.">>


This is the best post on this subject I've read in a while. If you're building your application in a modern manner, half the stuff I see in ridiculous comparison posts shouldn't matter. Disposable infrastructure is a thing. If you're making a choice of "the best cloud" (k...) with that fact in mind, you should mostly be considering cost over time and capability to innovate on core offerings. Given unit economics and the continual drop in price of commodity hardware, everything is going to become utility pricing, and services like Lambda will help you optimize your costs. Personally I'd put all my chips on AWS over GCP.

Also, nice to see someone finally identify DigitalOcean as a B2C provider.


Another great article previously written by Quizlet on their Google Cloud efforts:

https://quizlet.com/blog/287-million-events-per-day-and-1-en...

TL;DR: One engineer leveraged Google BigQuery's Streaming API to build a pipeline to analyze ~300 million events per day in realtime.


I don't agree with the OP; however, GCP's sub-hour billing is nice. I need to process a lot of tasks that take longer than 5 minutes, which makes them unsuitable for AWS Lambda. With GCP, I only end up paying for a maximum of 10-15 minutes, which is a nice cost saving. Dear AWS, if you are reading this: match this and I will never ever leave you, not even for 10 minutes.

edit : typo


Unless you rely on UDP!!! https://code.google.com/p/google-compute-engine/issues/detai...

We had a gold-level support ticket open about this for months and they recently responded that they are making it a "feature request". Yes, proper UDP packet reassembly is a "feature request".


Not all the clouds are equal. :) DC networking is much fun, even when you are on SDNs.


When we were using GCP, it would live-migrate our DB almost once a day, which caused us problems that were hard to figure out. I don't believe anything comparable happened anywhere near as often on AWS. I don't know if GCP still does that as often, since this was about 1-2 years ago.


The rate of host failure on AWS versus the rate of our live migrations isn't apples-to-apples. There are a lot of reasons we perform live migration, and host failure is just one of them (see https://cloudplatform.googleblog.com/2015/03/Google-Compute-...).

While I'm surprised by your "almost once a day" (seems high), we have also made a lot of improvements in the last year to make them even less impactful.

Disclosure: I work on Compute Engine.


I wonder why Quizlet didn't just stick with Joyent but switch from SmartOS to the newer Linux-based infrastructure containers and/or Docker containers. Joyent put a lot of effort into reviving LX-branded zones on Illumos precisely to address the concern that this article raises with using an OS other than Linux.

Also, why dismiss DigitalOcean as a niche provider for hobbyists? The simple pricing, with lots of data transfer included, should appeal to a lot of businesses too.


(I'm the CTO of Joyent.)

Sadly (and despite repeated pleading), Quizlet didn't bother to do anything -- at all -- with LX-branded zones. This was a bit dispiriting because they were part of the motivation for the work (namely, a customer of ours that was upfront with the "impossible" demand of the performance they saw in a SmartOS container but with their Linux stack). I think that even by the time the LX-branded zone work was clearly on a production trajectory (i.e., late 2014), they had already implicitly decided to move away from Joyent to a more established brand. That's fine, and I don't fault them for it (and I definitely appreciate their kind words for Joyent in general and our support and engineering teams in particular) -- but I do wish they'd been more upfront about their rationale.


It seems clear from the article that they want actual Linux, not "I can't believe it's not Linux" LX-branded zones, not Linux running in a VM on Solaris, and not Linux running in a VM on Hyper-V. This is somewhat irrational, but people are that way.


My main concern with picking GCP would be that Google has a history of shutting down projects. I feel like they are more likely to shutter GCP than Amazon is with AWS.



Nice, is there anything in the AWS side of the business that was shut down? I can't remember if there was.



"Google Cloud CTO Urs Hölzle has said publicly that 'One day, this could be bigger than ads. Certainly, in terms of market potential, it is.' Diane Greene now leads the division that includes GCP - from our perspective it's nice to know that the cloud division has a seat on the Google Board of Directors."

It seems that if Google believes GCP could be almost as big as AdWords, the likelihood of GCP Compute being shut down is about the same as that of Gmail for Business being shut down. Not saying that it couldn't happen, but with Spotify and Quizlet using GCP instances, I find it highly unlikely the compute platform would go away, especially with paying users. A free product, on the other hand, could die on a whim.


I think if they demonstrate that they dogfood it the way Amazon does it will go a long way. Based on talks I've seen I know some internal teams use it (or at least did when it was free for them).


We don't talk about our own usage a lot, because we mostly care about customers (like Quizlet!) talking about their usage in their own terms. However, I personally worked with Chrome's Clusterfuzz team when we launched Preemptible VMs (https://cloudplatform.googleblog.com/2015/05/Introducing-Pre...) and they're still there today.

Disclosure: I work on Compute Engine (and launched Preemptible VMs).


They also used Google Reader in Gmail (it was integrated at the top of the inbox). It didn't stop them from killing it.


This has been said a lot, but:

- Reader wasn't a product that was explicitly bringing in lots of revenue, Cloud is (and we just hired Diane to run the business).

- Lots of us regret that Reader was shut down ;).

Disclosure: I work on Compute Engine.


The point is not the entire product. Everyone is afraid that the product will still be around, but the feature that drove them to it will be killed.

And there is no way you can disprove that fear, given Google's track record with its products. You guys do that even with search.


My main concern with GCP is its unreliability: the BigQuery API randomly giving 40x/50x errors (that has been happening for over a year), the signed URL API returning series of 50x errors every few days, the CPU on instances going up to 100% for some unknown reason (they stay in that state until manually rebooted), and many other bigger or smaller issues. And they never respond to your questions.

The UI is also weird (at least for my taste); for example, it is not possible to search instances by their addresses, it is not possible to spin up more than one instance at once, and so on. AWS has an ugly console, but it feels more productive.


Sorry to hear you've had a bad time. All of our Generally Available services have an SLA, including BigQuery:

https://cloud.google.com/bigquery/sla

which has a 99.9% monthly uptime target. Are you seeing errors more often than that?

Additionally (and this isn't required), do you have a support package? I'm curious where you've been asking questions without response. If it's StackOverflow, that is best effort, but we do really try.

Disclosure: I work on Compute Engine.


I asked my co-worker who usually contacts support, and in most cases we actually got some response (mine never got any reply though), but it was always full of buck passing. Like saying that these routing issues you're writing about must be caused by your browser, etc., then ignoring further questions.


Google's cloud does look extremely promising, but the one thing blocking us from migrating (which we would ultimately like, I think) is the lack of PostgreSQL support in CloudSQL. AWS's RDS is mature and pretty great.



And some of Spotify's experiences with Google Cloud:

https://labs.spotify.com/2016/03/10/spotifys-event-delivery-...


Does this line strike anyone else as just not true based on their numbers?

  Quizlet is now the ~50th biggest website in the U.S.


Actually 150th in the US and 794th world-wide, according to Alexa: http://www.alexa.com/siteinfo/quizlet.com

Not to say 150th isn't impressive or likely a lot of traffic... but if you're going to post a number and a claim like that, it should be accurate.


> Not to say 150th isn't impressive or likely a lot of traffic..

Ballpark, I'd say that's between 700 and 1,000 hits/second on the main frontend. I sort of doubt either AWS or GCP is so much faster than the other for this kind of load.


They mentioned 200k/minute (just over 3k/second) in the post.



Don't trust Alexa results, they are mostly useless.

Quantcast is usually more accurate and Quizlet US Rank on Quantcast is 37.



oof, someone there needs to look into their mobile experience. The homepage is fine but the minute I search for something it goes to junk.

http://imgur.com/TcuxQ6t


The disparity in disk snapshot performance was surprising.


As others have said, the simplicity of GCP has made it a pleasure to use.

It has a great UI (material design), and the UX makes sense (the dashboard shows you a summary of your resources, resources are organized by project, notification/status icon animates when resources are changing, etc). Going back to the AWS dashboard feels clunky.

There aren't a million different image types for each region and zone - simple, autoupdated base images are available for Ubuntu, CoreOS, etc.

It has easy-to-understand base machine types, and custom machine types with tailored specs can be created if needed. Product/service naming is clear (e.g. Compute Engine vs EC2, Cloud Storage vs S3).

Addons like one-click secure web SSH sessions and Cloud Shell are amazing, no more key pairs to worry about.

Google Container Engine, with a hosted Kubernetes master, is a great concept and more transparent than closed source AWS ECS.

Their on-demand per-minute pricing with sustained usage discounts is almost always significantly cheaper than AWS on-demand instances, and your discounts are applied automatically. Try the two calculators for yourself: Google (https://cloud.google.com/products/calculator/), AWS (https://calculator.s3.amazonaws.com/index.html).
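
For anyone curious how the sustained-use discount blends out, here's a rough sketch (assuming the tiers published at the time: each successive quarter of the month billed at 100/80/60/40% of the list rate):

  TIERS = [1.00, 0.80, 0.60, 0.40]  # assumed incremental rates per quarter-month

  def effective_multiplier(fraction_of_month):
      # Blend the tier rates over however much of the month the VM ran.
      billed = 0.0
      for i, rate in enumerate(TIERS):
          lo, hi = i * 0.25, (i + 1) * 0.25
          if fraction_of_month > lo:
              billed += rate * (min(fraction_of_month, hi) - lo)
      return billed / fraction_of_month

  print(effective_multiplier(1.0))   # 0.70 -> an automatic ~30% discount
  print(effective_multiplier(0.5))   # 0.90 -> ~10% off for half a month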

Also, I have seen Google engineers all over HN (look at the comments on this post!) and other sites responding, commenting, and blogging - they seem actively engaged while I have seen very little from AWS.

That is not to say GCP is without problems. AWS IAM is still superior - it is easier to grant access to specific services for specific users, or have an account for a web server to upload to S3. Part of that is due to the fact that there is more plug-and-play tooling available for AWS today - boto comes to mind (boto GCP integration isn't as seamless as with AWS), as well as WAL-E. AWS's new certificate manager with free, auto-renewed SSL certs and installation on EC2 is awesome. S3 is cheaper than Google Cloud Storage. AWS has a longer free tier.

Luckily, tools like Terraform allow us to mix and match services from each cloud.


> It has a great UI (material design) ...

I disagree here. I find Google's UI to be the clunkier one. Sure, AWS is positively antique, but it's clean, readable, understandable and predictable in a homely Web 2.0 (or even 1.0) way.

Google's UI seems haphazardly put together by comparison, from the super tiny font to how common tasks are too often hidden away — the hamburger menu and the project selector being two examples. The progress of a task is also often hidden away and fairly inscrutable, such as when creating a container cluster.

When I started looking into the container support, I found that there's basically no web console for it. You can create clusters and see some summary of status about the cluster, but you can't see pods, replication controller settings, etc. — it turns out that the "Container Engine" is little more than a prebuilt Linux image with a startup shell script that starts up Kubernetes. AWS's ECS is the same way, but at least it has screens for creating jobs, adjusting resource settings and so on.

Google Cloud seems pretty great, but the web console definitely has a long way to go.


I believe Container Engine does a little more than you are giving credit for. Its main feature is that it hosts the Kubernetes master - you don't have to worry about setting up etcd, high availability, or anything else in regards to the master or connecting the nodes. Kubernetes also comes with a UI preinstalled on the master, allowing you to launch services and see info regarding pods, replication controllers, and more, as well as basic system resource usage: http://kubernetes.io/docs/user-guide/ui/


I didn't know about the UI, thanks. But why isn't it built into the web console?


Not sure. Perhaps that was a low priority because it is already running/can be deployed on Kubernetes fairly easily.


(Disclaimer: I work on Google Cloud)

Thanks for your great review! We really appreciate it.

I hate to nitpick one point in what you said, but are you sure that S3 is cheaper than Google Cloud Storage? Glacier is definitely cheaper than we are, but they offer ~4 hour object retrieval latency. If you consider only services with real-time retrieval, I think we stack up quite well in both performance and price :).


Hey! Thanks for the response - great example of the kind of employee interaction I was talking about.

Here are the calculators that show S3 is cheaper than GCS - hopefully I didn't type anything in wrong. I used 1 TB of storage, 10 million get operations and 10 million post operations, and 200 GB egress.

S3 monthly: $102.63

GCS monthly: $160.62 (about 56.5% more expensive)

GCS: https://cloud.google.com/products/calculator/#id=cddb4e9a-f2...

S3: https://calculator.s3.amazonaws.com/index.html#r=IAD&s=S3&ke...


Strange to do a whole long section on price-comparison without talking about AWS's spot instances, which are often much cheaper than the reserved instances.


Note that Google Cloud offers preemptible instances, up to 70% cheaper than the normal ones.

https://cloud.google.com/preemptible-vms/

It would be interesting to see Quizlet's thoughts on GCP preemptible VMs vs AWS spot instances and why they think they are better (or not?), but that could be the subject of a whole different post.

(Disclaimer: I work at Google - https://twitter.com/felipehoffa)


Interesting. I did a quick search for "GCP spot instances" and didn't find anything. I'd be interested in a comparison too!


True, but how do you quantify the work? Real world on AWS is

Buy X capacity reserved

Buy Y capacity spot instance

Buy Z capacity on demand to fill in the peaks

And it becomes a function of putting in more time = cheaper total bill.

GCE workflow is:

Buy the instances you need, bill will work out at the end of the month.


Interesting read. Can someone explain this point to me? "Software-defined networking means that google.com appears to be one hop away" One hop from...? Everywhere? Each server? How does that help? I understand 'traceroute' but not where that single hop to Google comes in and why that's great.


Yep, that's right. Google has built their own custom networking. When machine A sends a packet to machine B, the packet is handed over to Google's SDN (software-defined network), which carries it over their private network and then injects it into machine B. Google's network is an order of magnitude faster compared to AWS.


You mean if AWS has 50ms data center latency then GCP has 5ms?

https://en.wikipedia.org/wiki/Order_of_magnitude


Google’s network is so fast, however, that this kind of multi-cloud might just be possible. To illustrate the difference in speeds, we ran a bandwidth benchmark in which we copied a single, 500 Mb file between two regions. It took 242 seconds on AWS at an average speed of 15 Mbit/s, and 15 seconds on GCE with an average speed of 300Mbit/s. GCE came out 20x faster.

Reference: https://gigaom.com/2013/03/15/by-the-numbers-how-google-comp...
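
Quick sanity check on those numbers (they only work out if the file is 500 MB, i.e. megabytes, not megabits):

  file_mbit = 500 * 8  # 500 MB expressed in megabits
  print("AWS: %.1f Mbit/s" % (file_mbit / 242.0))  # ~16.5, quoted as ~15
  print("GCE: %.0f Mbit/s" % (file_mbit / 15.0))   # ~267, quoted as ~300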


What did you use to copy the file? If you used scp, your results are invalid and you must re-run the test again with a different protocol: you can't take advantage of large TCP window sizes with it because ssh uses a small, fixed send/receive buffer size. scp will penalize the performance of large transfers over high-latency links, even if the bandwidth between them is high.


Sure - this isn't really a dimension of comparison, just something that I found interesting / surprising. It seems like SDN is probably the future, and this is an illustration of how it's different.


He probably "measured" it within GCP. :)


From 2007 to 2015 Quizlet ran on Joyent, a cloud platform built on SmartOS, which is a Solaris fork (Joyent also offers Linux hosting).

I would like to know why they made that choice.


Security conscious users would be best served by staying with AWS EC2 right now. GCP lacks all of the IAM functionality in AWS.


https://cloud.google.com/iam

This is very recent though, and not considered Generally Available yet across the platform (meaning fully hardened, supported, and backed by an SLA).

Take another look!

Disclosure: I work on Compute Engine.


Interesting. It seems that Google IAM is very similar to AWS IAM yet very different in one aspect. In AWS, you can define exactly what subset of APIs/resources are accessible to a role, which is very flexible, but also can be very confusing. It seems Google has taken the approach of pre-defining sensible roles.


I've just got to say, this is probably the best article I've ever read, all thanks to Cloud-to-Butt.


Well, "Compute Engine networks do not support IPv6 at all."


"However, Google is a major advocate of IPv6 and it is an important future direction."

https://cloud.google.com/compute/docs/networks-and-firewalls


The best cloud is probably a combination of all of them, since that would give you the highest availability.


This really sounds like a "theoretical" best. Practically this sounds like a huge burden on developers.


Not trying to single you out (your statement is totally on point), but generally we need a word for this developer-centric view of cloud. My kids write JavaScript and might be considered developers, but they don't have the faintest idea about operating systems, containers, services, or anything non-superficial in the stack, let alone IAM, i18n, consensus, etc. Cloud is no less an operational model than it is a development platform.

Anyone else have to read "Madame Bovary" in high school? Maybe this focus on developers is a form of provincialism.

Come on: someone smarter than me must have coined this already.


I actually know of some companies that run part of their infrastructure on AWS and part on GCP.

There are companies that help you run "seamlessly" on any cloud provider you want, so in theory you can use them to balance your services between cloud providers for cost or performance.


So much drama over reserved instances. Offering different prices for different terms of service is a centuries-old business practice. If it's not for you, then don't buy it.


Pointing out that automatic discounts are strictly better than reserved instances sounds like actionable information, not drama IMO.





