I've been very impressed overall with the direction of Google Cloud over the last year. I feel like their container strategy is much better than Amazon's ECS in that the core is built on open-source technology (Kubernetes).
This can wipe away a lot of goodwill, though. A worldwide outage is catastrophic and embarrassing. AWS has had some pretty spectacular failures in us-east (which has a huge chunk of the web running within it), but I'm not sure that I can recall a global outage. To my understanding, these systems are built specifically not to let failures spill over to other regions.
Ah well. Godspeed to anyone affected by this, including the SREs over at Google!
I'm totally impressed with gcloud. Slick, smooth interface. Cheap pricing. The fact that the UI spits out API examples for whatever you're doing is really cool. And it's oh-so-fast. (From what I can tell, gcloud's SSDs are roughly 10x the performance of AWS's at the same price, or 1/10th the cost at the same performance.)
And this is coming from a guy who really dislikes Google overall. I was working on a project that might qualify for Azure's BizSpark Plus (they give you something like $5K a month in credit), and I'd still prefer to pay for gcloud than get Azure for free.
Same; I was considering GCP for the future, but this is bad. I'm not using them without some kind of redundancy with another provider. I hope they write a good post-mortem; these are always interesting at large scale.
How bad is it really? They started investigating at 18:51, confirmed a problem in asia-east1 at 19:00, the problem went global at 19:21, and was resolved at 19:26.
They posted that they will share results of their internal investigation.
That kind of rapid response and communication is admirable. There will be problems with cloud services - it's inevitable. It's how cloud providers respond to those problems that is important.
In this situation, I am thoroughly impressed with Google.
It's bad because it hit all their regions at the same time, while competing providers have mitigations in place against exactly this. AWS completely isolates its regions, for instance [1], so they can fail independently without affecting anything else. That Google let an issue (or even a cascade of problems) affect all its geographic points of presence really shows a lack of maturity in the platform. I don't want to make too many assumptions, and that specific problem could have affected AWS in the same way, so let's wait for more details on their part.
The response times are what's expected when you are running one of the biggest server fleets in the world.
Expecting that the problems that happen everywhere else won't happen with a cloud provider is a pipe dream. They might be better at it because of scale, but no cloud provider can always be up. It happened to Amazon, and now it's happened to Google. Eventually, finding a provider that has never gone down will be like finding an airline that has never crashed.
Operating across regions decreases the chances of downtime, it does not eliminate them.
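To put rough numbers on that (a back-of-the-envelope sketch; the failure rates are made up for illustration):

```python
# Back-of-the-envelope availability math; the 0.1% figure is made up.
# If each region is down 0.1% of the time and failures were independent,
# running in two regions would only be down when both fail at once:
p_region_down = 0.001

p_both_down = p_region_down ** 2   # 1e-06: roughly half a minute per year
print(f"independent failures: {p_both_down:.0e}")

# A control-plane or networking bug that hits every region at once is a
# correlated failure, so the multiplication doesn't apply at all:
p_all_regions_down = p_region_down  # still ~9 hours per year
print(f"correlated failure:  {p_all_regions_down:.0e}")
```

The multiplication only works as long as failures are independent, which is exactly the assumption a global outage violates.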
> The response times are what's expected when you are running one of the biggest server fleets in the world.
That may be true, but actually delivering on that expectation is a huge positive. And more than having the right processes in place, they had the right people in place to recognize and deal with the problem. That's not a very easy thing to make happen when your resources cross global borders and time zones.
Look at what happened with Sony and Microsoft - they were both down for days and while Microsoft was communicative, Sony certainly was not. Granted, those were private networks, but the scale was enormous and they were far from the only companies affected.
AWS has never had a worldwide outage of anything (feel free to correct me). It's not about finding "the airline that never crashed", it's finding the airline whose planes don't crash all at the same time. It's pretty surprising coming from Google because 15 years ago they already had a world-class infrastructure, while Amazon was only known for selling books on the Internet.
Regarding the response times, I recognize that Amazon could do better on communication during an outage. They tend to wait until there is a complete failure in an availability zone before putting the little "i" on their green availability checkmark, rather than signaling things like elevated error rates.
AWS had two regions in 2008 [1]. That was eight years ago, and I think you would agree that running a distributed object storage system across an ocean is a whole different beast from ensuring individual connectivity to servers in 2016.
Yeah... just don't look too closely under the covers. AWS has been working towards this goal, but they aren't there yet. If us-east-1 actually disappeared off the face of the earth, AWS would be pretty F-ed.
Our servers didn't go down; they just lost connectivity. The same has happened to even big providers like Level3. Someone leaks routes or something and boom, it's all gone.
I'd be surprised if AWS didn't have a similar way to fail, even if they haven't. This is obviously a negative for gcloud, no doubt, but it's hardly omg-super-concerning. I'm sure the post-mortem will be great.
Actually, according to the status report, they confirmed that the issue affected all regions at 19:21 and resolved it by 19:27. That's six minutes of global outage.
The outage took my site down (on us-central1-c) at 19:13, according to my logs, so it was already impacting multiple regions before the 19:21 confirmation. (I have been using GCP since 2012 and love it.)
Thank you, I missed that on my first reading - I saw that the status update was posted at 19:45 but missed the content within it stating the issue was resolved at 19:27. I updated my parent comment.
Switching from ECS to GKE (Google Container Engine) currently. Both seem overcomplex for the simpler cases of deploying apps (and provide a lot of flexibility in return), but I have found the performance of GKE (e.g. time for configuration changes to be applied, new containers to boot, etc.) to be vastly superior. The networking is also much better: GKE has overlay networking, so your containers can talk to each other and the outside world pretty smoothly.
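For example, from inside any pod you can just hit another service by name; kube-dns plus the overlay network does the rest (a minimal sketch; `backend` is a hypothetical Service exposing port 8080):

```python
# Minimal sketch of GKE/Kubernetes flat networking, run from inside a pod.
# "backend" is a hypothetical Service name; kube-dns resolves it and the
# overlay network routes the request pod-to-pod - no host port mapping.
import urllib.request

resp = urllib.request.urlopen("http://backend:8080/healthz", timeout=5)
print(resp.status, resp.read().decode())
```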
GKE has good command-line tools, but the web interface is even more limited than ECS's - I assume at some point they'll integrate the Kubernetes web UI into the GCP console.
GKE is still pretty immature though, more so than I realized when I started working with it. The deployments API (which is a huge improvement) has only just landed, and the integration with load balancing and SSL etc is still very green. ECS is also pretty immature though.
The problem is that GCP doesn't run an RDS-style service with PostgreSQL, and external vendors are mostly more costly than AWS RDS - especially for customer homepages where you want to run on managed infrastructure as cheaply as possible.
This is sad, for sure. The new Cloud SQL 2.0 (MySQL) is really good, and if you use a DB-agnostic ORM you can probably make MySQL work for quite a while. Sad to lose access to all the new PG features, though, and I would love it if Google expanded their Cloud SQL offerings.
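For what it's worth, with a DB-agnostic ORM like SQLAlchemy the engine choice mostly comes down to a connection string (a sketch; the model, credentials, and host are made up):

```python
# Sketch of keeping the MySQL/Postgres decision down to one line with
# SQLAlchemy; the table, credentials, and host below are made up.
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Customer(Base):
    __tablename__ = "customers"
    id = Column(Integer, primary_key=True)
    name = Column(String(100))

# Cloud SQL (MySQL) today...
engine = create_engine("mysql+pymysql://app:secret@10.0.0.5/homepage")
# ...and if a managed Postgres ever shows up, only this line changes:
# engine = create_engine("postgresql+psycopg2://app:secret@10.0.0.5/homepage")

Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add(Customer(name="example"))
    session.commit()
```

The catch is that in the meantime you have to stay away from PG-only features like JSONB, or the portability is gone.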
While not Docker Cloud specifically, when we eyeballed UCP we found it very underwhelming when pitted against Kubernetes.
To us it appeared to be yet another in a sea of orchestration tools that give you a very quick and impressive "Hello World" but then fail to adapt to real-world situations.
This is what Kubernetes really has going for it: every release adds more useful, composable blocks and tools targeting real-world use (and allows many of us crazies to deal with the oddball, quirky behavior our fleet of applications may have), not just a single path for how applications would ideally work.
This has generally been a trend with Docker's tooling outside of Docker itself, unfortunately.
Similarly, docker-compose is great for our development boxes but nowhere near useful for production.
And it doesn't help that Docker's enterprise offerings still steer you towards docker-compose and the like.
Not to bash, but the page you linked is classic Docker - it says literally nothing about what "Docker Cloud" is.
"BUILD SHIP & RUN, ANY APP, ANYWHERE" is the slogan they repeat everywhere, including here, and it means even less everytime they do it. What IS Docker Cloud? Is it like Swarm? Does it use Swarm? What kinds of customers is Docker Cloud especially good at helping? All these mysteries and more, resolved never.
So am I (I'm a YC alum), but RDS is too important for us to move away from it.
Let me put it this way - if you had an RDS equivalent in Docker Cloud, lots of people would switch. Docker is more popular than you know.
Heroku should be an interesting learning example for the tons of new-age cloud PaaS offerings I'm seeing. Heroku's database hosting has always been key to its adoption - to the extent that lots of people continue to use it even after they move their servers to bare metal. The consideration and price sensitivity around data is very different than for app servers.
I believe this is Tutum, which they bought some time ago. I tried Tutum before with Azure. After deleting the containers from the Tutum portal, it didn't clean everything up in Azure. To this day, the storage Tutum created is still sitting in my Azure account. LOL.
Seconded - I can tell the ECS documentation is trying to help, but the foreign task/service/cluster model + crude console UI keeps telling me to let my workload ride on EC2 and maybe come back later.
What I figured out much later was that ECS is a thin layer on top of a number of AWS services - they use an AMI that I can use myself, EC2 VMs that I can run myself, and Security Groups + IAM roles that I can create on my own.
But the way they have built the ECS layer is very, very, VERY bad... and I have an unusually high threshold for documentation pain.
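You can actually see how thin that layer is from the API; here's a sketch with boto3 (the cluster name is hypothetical) that maps each ECS "container instance" back to the plain EC2 VM underneath:

```python
# Sketch showing that ECS "container instances" are just EC2 VMs (boto3;
# the cluster name "my-cluster" is hypothetical).
import boto3

ecs = boto3.client("ecs")
ec2 = boto3.resource("ec2")

arns = ecs.list_container_instances(cluster="my-cluster")["containerInstanceArns"]
details = ecs.describe_container_instances(cluster="my-cluster",
                                           containerInstances=arns)

for ci in details["containerInstances"]:
    vm = ec2.Instance(ci["ec2InstanceId"])  # the same VM shows up in the EC2 console
    print(ci["containerInstanceArn"], "->", vm.instance_id, vm.image_id)
```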
I work on Convox, an open source PaaS. Currently it is AWS only. It sets up a cluster correctly in a few minutes. Then you have a simple API - apps, builds, releases, environment and processes - to work with. Under the hood we deploy to ECS but you don't have to worry about it.
So I do agree that ECS is hard to use, but with better tooling it doesn't have to be.