App Engine datastore: 1 year, 100,000 apps, 0% downtime

AndrewDucker · on Jan 6, 2012

This is what you pay for with GAE, and why you have to write to match its way of working. It's not going to fit everyone, but not having to worry about the resiliency and scalability of the underlying datastore is worth it for some people.

nasmorn · on Jan 6, 2012

Technically it is really impressive, but I would always worry about a future Google CEO coming in and saying, "We need to concentrate on our core businesses." That is really only a matter of time, or more correctly, it happens from time to time in almost every company. The amount of money Google earns with GAE is so small that it doesn't matter compared with ads.

At this point all you need to do is engineer all the technically sophisticated systems of GAE on your own, and you are good to port your app to your own stack. The good thing is, should you succeed, there are 100,000 desperate customers waiting.

Terretta · on Jan 6, 2012

Assuming this is dogfooding that is being sold at (very sizable) markup, it's hard to find a case where discontinuing the offering would be important for the future CEO.

If, however, Google switches to different technologies in house, then one could expect this to go away rather quickly, as its prior dogfooding-driven products have once they were abandoned in house.

latchkey · on Jan 6, 2012

Now that they are out of preview mode, there is a 3-year notification process in the terms of service. That should be enough time for most people to migrate to something else. If not, then GAE probably isn't the right solution for you.

AndrewDucker · on Jan 6, 2012

I believe that GAE is built on the same technologies that Google use in-house, so they're unlikely to transition any time soon.

But even if not, there's always AppScale: http://appscale.cs.ucsb.edu/

jfoster · on Jan 6, 2012

You're right. When there was a GAE outage last year, the Chrome Web Store also had an outage. I think this was the one: http://googleappengine.blogspot.com/2011/07/java-app-engine-...

fraserharris · on Jan 6, 2012

Perhaps having tens of thousands of developers trained on your infrastructure has intangible benefits?

nextparadigms · on Jan 6, 2012

I doubt they will do that anytime soon, but I agree some of the past decisions like that have really hurt them image-wise, so they had better be more careful about what they allow in the first place next time.

pors · on Jan 6, 2012

Pretty impressive, but what does "no system-wide downtime" mean? So there was local downtime?

benjaminwootton · on Jan 6, 2012

Well, it's a distributed data store, so I guess at times they took down certain nodes for upgrades or fixes etc., then brought them back up afterwards, at which point they became consistent again. This probably wouldn't have had any user impact.

These distributed data stores are so appealing from the perspective of uptime and stability.

Needing to upgrade the DB or take cold backups or run massive schema migrations is the reason for lots of scheduled downtime.

And removing the huge whopping single point of failure from your system that you would have with a traditional RDBMS can only do good things for your stability.

pors · on Jan 6, 2012

Yeah, good point, makes sense

jfoster · on Jan 6, 2012

There absolutely has been App Engine downtime in the past year. They seem to be claiming that the datastore specifically (an App Engine service) hasn't had any downtime. It's not really a good measurement of App Engine's reliability as a whole.

Example: http://googleappengine.blogspot.com/2011/07/java-app-engine-...

latchkey · on Jan 6, 2012

"no HR datastore apps were affected."

Sure, it is a small point, but it actually matters a lot. At this point everyone hosting a real service that cares about uptime should be on HR.

I had someone from Google tell me that GAE lost an entire datacenter and nobody noticed.

vosper · on Jan 6, 2012

Yes, there have been plenty of instances of downtime aside from the HR datastore - e.g. memcache and task queue have both gone down simultaneously and for more than an hour, which can be hugely disruptive but doesn't count as a datastore outage.

Also, there are lots of instances of localized downtime that don't make it into the global system status - you only have to hang out in the IRC room to see people coming in with "is anyone else seeing really slow startup times" or "is anyone else not seeing logs updating for the last hour".

jroseattle · on Jan 6, 2012

Most of the time, these statements are meant to get one to ask "what about their competitors?" In this case, this is most assuredly trying to get people to think about AWS, and their well-known outage last year.

However, on these statements, I always focus on the "apps". I'm not familiar with the GAE environment beyond what I read, but the last time I looked into it, I could deploy on GAE for free. I pushed a simple test application up there to see how it worked. It wasn't anything beyond a how-does-this-work experiment.

Which makes me ask: how many of those 100K apps aren't similar to mine, or at least of the unpaid variety? Not to refer to those applications as unimportant, but companies are paying for AWS (some substantially so) and it's known that an outage in AWS is a major consideration for companies that use that platform. I'm unaware of the context of usage of paid accounts on GAE.

If GAE had an outage, would it matter? Would people notice? I don't know how big or critical their community of paid apps is in the grand scheme of things.

its_so_on · on Jan 6, 2012

semantics:

when I read "0% downtime" I roughly read it as "less than 0.49% downtime", which is not THAT impressive. (it would mean 99.51% uptime)

In cases where 1% of something is as bad as arsenic, "0% arsenic" doesn't mean much. I'll take the water source with 0.00% arsenic! (if you'd like to brag about it :D - and why not, as Wikipedia notes, "Arsenic contamination of groundwater is a problem that affects millions of people across the world.")

actually Wikipedia says acceptable levels are 1 part in 100,000,000 (if I'm converting correctly), but 0.00 at least gives you an indication that you're not boasting about poison.
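
To put rough numbers on the rounding argument above, here is a minimal Python sketch. It assumes the headline figure is a percentage rounded to a fixed number of decimal places and that a year is 365 days; the function names are made up for illustration and nothing here comes from Google's actual reporting. It shows how much yearly downtime could hide behind a "0%" figure, and converts "1 part in N" to a percentage.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def max_hidden_downtime_hours(decimals: int) -> float:
    """Upper bound on yearly downtime (hours) that still rounds to zero
    when the downtime percentage is reported with `decimals` decimal places."""
    # Anything below half of the last reported digit rounds down to zero.
    half_unit_percent = 0.5 * 10 ** (-decimals)
    return HOURS_PER_YEAR * half_unit_percent / 100


def parts_to_percent(one_part_in: int) -> float:
    """Convert a '1 part in N' concentration to a percentage."""
    return 100 / one_part_in


if __name__ == "__main__":
    for decimals in (0, 2):
        hours = max_hidden_downtime_hours(decimals)
        print(f"rounded to {decimals} decimals: up to about {hours:.3f} hours/year")
    print(f"1 part in 100,000,000 = {parts_to_percent(100_000_000):.6f}%")
```

Under these assumptions, "0%" with no decimals could hide up to roughly 43.8 hours of downtime a year, while "0.00%" narrows that to under half an hour. One part in 100,000,000 works out to 0.000001%, so even "0.00%" is far too coarse to express the arsenic threshold itself.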