High Scalability is a serious and informative blog, and I wouldn't dismiss it. This isn't TechCrunch. Its focus is less on rigorous scientific analysis of a narrow field and more on giving a good, moderately deep overview of anything to do with data scaling. And it's doing a darn good job.
The second sentence is admittedly a cheap shot and a bit overgeneral. I thought about editing it out, but people had already responded to it and I hate when people ninja-edit the part of a post that I'm responding to.
His employer doesn't matter. It doesn't add anything productive to the discussion. If one of Bing's developers had laughed off articles about Microsoft's architecture as "look at the adorable little journalists running around throwing big words around", I would not expect it to rise to top comment.
The part that amuses me is their surprise that Google sees some advantage to transactions.
Journalists (of any kind) want to see things in terms of personality or philosophical conflict, e.g. SQL vs. NoSQL, because that's what their readers understand and, usually, all the journalists themselves understand. They don't seem to consider that people might be making rational choices and difficult tradeoffs.
If that site contains one journalist, I'll eat my diamond tiara. Let's call them blogophiles instead. Or reposting aggragists. Adveraggrapostsummarizers. "One who reposts content from other sites while attaching ads and providing a veneer of quote-unquote insight."
I can't really apologize for my brethren in the market, but yes, tech journalism tends to be terrible. The journalism market was already disrupted by Twitter, Facebook, and blogs. Blogs in particular are great because, instead of reading what some idiot from journalism school has to say about, oh, I dunno... exceptions versus errors in programming languages (as touched off by that Python vs. Go post), you can skip the wonk's interpretation and read a developer's opinion directly.
In the meantime I just wanted to point out that everyone and their mother is making "announcements" about Spanner, and usually trying to make the world think they're better. Mostly this is from NoSQL companies.
I don't think it's fair to lump all technology journalists into one category. There are two basic types: technologists who write about technology, and journalists who write about technology. The former tend to be more accurate, but the latter tend to be more savvy, and thus more successful, journalists, since most readers are not technologists.
Buzzword headline aside, the Spanner paper is great and worth your time. As is the BigTable paper, the Dremel paper, and the Paxos Made Live paper.
I read the Google whitepapers and wonder, is there anywhere else one can go to work on real solutions to distributed systems problems? At smaller scales you can cheat -- you don't need Paxos, you can get away with non-consensus-based master / slave failover. You can play the odds with failure modes. At Google's scale, you can't: behavior under normally unlikely failures matters, probability matters, CAP matters.
Look at Titan (http://thinkaurelius.github.com/titan), a new distributed OLTP graph database with a storage layer that adds distributed transactions to pluggable backends such as HBase and Cassandra. It's by the team that created Tinkerpop Blueprints and Gremlin, the graph traversal language.
"Calvin can run 500,000 transactions per second on 100 EC2 instances in Amazon’s US East (Virginia) data center, it can maintain strongly-consistent, up-to-date 100-node replicas in Amazon’s Europe (Ireland) and US West (California) data centers---at no cost to throughput."
"Calvin is designed to run alongside a non-transactional storage
system, transforming it into a shared-nothing (near-)linearly scalable
database system that provides high availability and full ACID
transactions. These transactions can potentially span multiple
partitions spread across the shared-nothing cluster. Calvin
accomplishes this by providing a layer above the storage system that
handles the scheduling of distributed transactions, as well as
replication and network communication in the system. The key technical
feature that allows for scalability in the face of distributed
transactions is a deterministic locking mechanism that enables the
elimination of distributed commit protocols."
You could come work at Sense (http://www.senseplatform.com) or email me tristan@senseplatform.com. Not Google scale, but the same technological challenges without the legacy baggage.
At lanl.gov, we've been developing a ton of stuff related to distributed problems. Recently we wrote a library that abstracts self-stabilization for single-mount petascale filesystem treewalking, extending Dijkstra's original design: https://github.com/hpc/libcircle
We have a detailed paper on it coming out at SC12.
We created Arakoon (http://arakoon.org) in-house to solve a real distributed systems problem. At small engineering scale (only a handful of contributors), using (Multi-)Paxos.
Lots of big sites roll their own solutions for this problem. Often there will even be multiple in-house technologies with different characteristics. Google, LinkedIn, and Facebook for sure.
I haven't read TFA, so this is out of context I'm sure. Your note on failover just caught my eye. It's not a counter to your thought-provoking comment.
"At smaller scales you can cheat -- you don't need Paxos,
you can get away with non-consensus-based master / slave
failover."
IME, not so much. GitHub's recent outages are a solid example. Automating slave promotion in master/slave systems based on a single point of data (IP failover, process failure, heartbeat, etc.) is inherently a very risky strategy.
More often than not, no matter the amount of hardening or testing I put into my scripts, it just doesn't work when you really need it to.
TravisCI on Heroku seems like another good example.
What does work incredibly well IME is systems designed with HA built in. HA-Proxy and CARP is a seriously solid combination. You can synchronize your configuration nightly to add some meager level of automation to the process. I'm also pretty excited about CouchDB (and its derivatives), since AFAIK it's the only free, *nix, peer-to-peer replicating database system I've found (I know of no RDBMS that qualifies, Riak is not free, and RavenDB runs only on Microsoft servers).
In my experience, the two pieces of the HA "last mile" that few systems overcome are:
* Server Affinity is bad, so bad
* Failover MUST be transparent to the clients
The great thing about a database over HTTP is that it's stateless. As long as the failover doesn't happen mid-request, you don't have to worry about connection errors, reconnecting, etc. So pair that up with something like HA-Proxy and you can increase both read scalability and availability. Then stick HA-Proxy on a box using CARP, and you've just automated the failover of your entire database stack, with as close to zero downtime as you can get, all with simple, reliable, free tools.
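To make the "stateless" point concrete, here's a toy sketch of the client side (the proxy address and document path are made up): because each request is self-contained, a dumb retry loop is all the failover handling a client ever needs.

```python
# Hypothetical client hitting a CouchDB-style HTTP database through a proxy.
import json
import time
import urllib.request

PROXY = "http://db-proxy.internal:5984"   # HA-Proxy/CARP VIP (hypothetical)

def get_doc(doc_id, retries=3, backoff=0.5):
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(f"{PROXY}/mydb/{doc_id}", timeout=5) as r:
                return json.load(r)
        except OSError:   # URLError and socket timeouts are both OSErrors
            # No connection state to rebuild: if failover happened between
            # requests, the retry transparently lands on the promoted node.
            time.sleep(backoff * (attempt + 1))
    raise RuntimeError(f"gave up fetching {doc_id}")
```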
Put several application servers behind HA-Proxy as well, use something like Infinispan for session storage and caching, with CouchDB for file storage, and now you have a system with no single point of failure that can support highly dynamic web sites, the sort that can't typically be cached, so availability can't be achieved just by serving stale responses.
Then use FreeBSD for your CouchDB servers: have at least two actively load-balanced systems for "production" and a third that's just a replication-consumer backup. Then have all three independently zfs-snapshot for your backups. Now your massive database files are safe as well, and it doesn't impact availability. In the case of critical corruption (i.e. an application bug that propagated bad data throughout the system), just a few minutes of manual effort cloning snapshots until you pinpoint the issue can have you back up and running.
No lengthy database restore commands to run. If you know when the bad data was persisted, you could have the correct snapshot identified, a clone created, and a server instance loaded on another port, then batch-update the records in the live system. If you document and run drills on the different application failure scenarios, you could recover from what would otherwise be catastrophic losses in just a few minutes.
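For the curious, the drill scripts down to something like this. The dataset name and the corruption check are placeholders for whatever your app needs; the zfs list/clone/destroy subcommands are the standard ones.

```python
# Sketch: walk snapshots oldest-first, clone each, find the last clean one.
import subprocess

DATASET = "tank/couchdb"   # hypothetical pool/dataset

def snapshots():
    out = subprocess.run(
        ["zfs", "list", "-t", "snapshot", "-H", "-o", "name",
         "-s", "creation", DATASET],
        check=True, capture_output=True, text=True).stdout
    return out.splitlines()   # oldest first

def is_corrupt(clone_name):
    # App-specific: point a throwaway server at the clone and look
    # for the bad records.
    raise NotImplementedError

def last_good_snapshot():
    good = None
    for snap in snapshots():
        clone = DATASET + "-inspect"
        subprocess.run(["zfs", "clone", snap, clone], check=True)
        try:
            if is_corrupt(clone):
                return good   # the previous snapshot was the last clean one
            good = snap
        finally:
            subprocess.run(["zfs", "destroy", clone], check=True)
    return good
```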
You could even have the backup system running on the last snapshot, so if you're really on the ball, you could be back online simply by disabling the health checks on the primary systems (run the health check through an nginx proxy, so you can just "sv stop healthcheck"), assuming you can come up with a reliable way to suspend the clone/server-process-restart cycle on the backup server.
As long as your monitoring is up to snuff, you should be able to sleep peacefully.
Got a bit carried away. Hopefully someone finds the rambling useful though. ;-)
This article comes across as really cynical and entirely lacking in the kind of rigor and detail I have previously found on highscalability. Spanner is really mind-blowingly cool tech. I thought this article was much more informative and worth the time to read: http://news.ycombinator.com/item?id=4562546
I think the notes about transactions are quite valid as the point of an HS post. However, the author didn't have anything else to add ("I look forward to more insightful commentary. There’s a lot to make sense of") yet felt the need to keep writing, and filled the article with style and attitude rather than substance.
Disclaimer: I'm a co-founder of a database company (FoundationDB) building a scalable, ACID database.
I couldn't agree more with the main quote they pulled from the paper, expressing the difficulty of (even great) programmers having to "code around the lack of transactions." Ease of development is one of the biggest benefits of transactions.
However, another huge benefit that didn't get much play in the article is the freedom transactions give you to build abstractions and other data models on top of whatever you are given. In our product's case, a low-level ordered K/V store is used as the storage layer, and several different data models are exposed on top (see http://foundationdb.com/#layers).
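For anyone who hasn't seen the layer idea in action, here's a toy illustration (the encoding is invented for this comment, not our actual layer code): a document model built out of nothing but ordered keys and range reads.

```python
# A document-ish data model on top of a plain ordered key/value store.
import bisect

class OrderedKV:
    """Stand-in for an ordered K/V store: sorted keys plus range reads."""
    def __init__(self):
        self._keys, self._vals = [], []

    def set(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = value
        else:
            self._keys.insert(i, key)
            self._vals.insert(i, value)

    def range(self, prefix):
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._vals[i]
            i += 1

class DocumentLayer:
    """Each document becomes one K/V pair per field, grouped by key prefix."""
    def __init__(self, kv):
        self.kv = kv

    def put(self, collection, doc_id, doc):
        for field, value in doc.items():
            self.kv.set(f"{collection}/{doc_id}/{field}", value)

    def get(self, collection, doc_id):
        prefix = f"{collection}/{doc_id}/"
        return {k[len(prefix):]: v for k, v in self.kv.range(prefix)}

docs = DocumentLayer(OrderedKV())
docs.put("users", "42", {"name": "Ada", "role": "admin"})
print(docs.get("users", "42"))   # {'name': 'Ada', 'role': 'admin'}
```

With real transactions underneath, a layer like this stays correct under concurrency without any extra work, which is exactly why the storage engine providing ACID matters so much.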
I think the future of databases holds a diversity of data models and query languages (SQL, document, K/V, columnar, etc.). I also think the future of databases is ACID. It seems like more and more of the NoSQL early adopters (and creators) are coming to the same conclusion.
I wonder how you see the publication of the Spanner paper affecting your product. Does it lend validity to your product, seeing as there is a production system with similar features already in use, or is there a risk Google might offer this system as a service to other companies, possibly competing with you?
Well, it doesn't seem like it's their ambition, but there's no doubt that if Google released all of their in-house data tools, including Spanner, that it would drastically change the market.
We love to see this stuff, though. When we started building our product over three years ago, the idea of a distributed, ACID database was sort of laughed at. (I think the CAP theorem scared a lot of people off from building really useful products.) FoundationDB isn't the same as Spanner, but they share some of the same goals. We see that as a huge validation.
I don't think what defines NoSQL is the lack of transactions. In fact, lack of transactions is not even a common trait among all things called NoSQL.
To be absolutely honest, I think we should get rid of the word NoSQL. As for NewSQL, we shouldn't have started using it: it's great for branding, but it confuses journalists.
I don't know, I think it's helpful. It changes the conversation from being about the access language to being focused on how a technology processes data at scale. Well, almost...
Notice that even with the shift from "NoSQL" to "NewSQL", mentions of joins continue to be conspicuously absent from many of these discussions. So it's worth noting that many of these NewSQL things are "almost but not quite SQL"; hence the value of a new word.
I don't think Spanner is one of these fake-relational databases; it's an ACID datastore used as the underlying storage for a real RDBMS, F1.
From the F1 paper (http://research.google.com/pubs/pub38125.html), it looks like it's mostly intended as an improvement on sharded MySQL, by putting the sharding where it belongs, down at the physical storage level, instead of up at the client access level.
Maybe a better buzzword would be NoMySQL (NOracle?)
I didn't really like the NewSQL thing either. I initially thought it must have meant transactional DBs with first-class schema-less features like Postgres has. Nope, it's another one of those phrases or ideas that ReadWriteWeb, the source of the linked NewSQL definition, is trying to turn into a "big idea". They tried that previously with the Internet of Things (still don't know what the hell that is), as well as with their own name, when they referred to the Web 2.0/API hype as the Read/Write Web.
I actually like RWW over TC or Mashable but sometimes I have to tune out some of their pompous prognostication.
I think there should be a term for these new DBMSs, but the problem with NewSQL is that what they have in common is not SQL, but consistency and transactions.
With some of the best programming minds in the world, it's interesting to see that Google finds it more efficient to let programmers focus on performance issues than to have them reimplement core DB functionality such as transactions.
I like the direction of moving closer to original database theory concepts and allowing the creative energy to focus on solutions to performance problems at high-scale.
They probably have a lot of great minds at the top of Google but there's still going to be a significant amount of average ones, just like in any company/structure.
"Maybe this time Open Source efforts should focus elsewhere, innovating rather than following Google?"
There's a ton of innovative projects in the open source community, but it's difficult to convince people to use them. Developing a clone of a Google tech has a built-in marketing advantage: "Google uses something like this."
So the correct and responsible thing for Google to do now would be to patent the shit out of this and then sue back into the stone age anybody that implements anything even vaguely similar, right?
I mean, the social fabric depends on companies protecting their innovations, right?
I am in favor of any label that replaces "NoSQL" with something else. NoSQL always rubbed me the wrong way. Besides my not being a huge fan of the "No", it really says nothing about what a system is, only what it isn't.
Transactions have always been cool; it's just that businesses that don't need them have always been cooler in the world of tech journalism, so transactions don't get much mention. If something goes wrong at Instagram, nobody really cares if the posting of your pastrami sandwich picture doesn't get rolled back just because the #lolcatz tag got applied by mistake.
I'd love to have actual distributed transactions that could scale indefinitely and not create availability issues. We actually get a steady stream of user complaints about inconsistencies between counter caches and what appears in results. Worse are the inconsistencies that can happen between graph edges that you want to partition in two different ways (e.g. following vs. followers).
Likewise, consistency's more important in facebook-y scenarios than you'd think. The canonical example I've heard is that "defriend my boss" MUST always be seen before "post 'I'm quitting!'"
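A toy version of that anomaly, with everything here made up: without atomicity, a concurrent reader can catch the system between the two writes.

```python
# Two writes that must appear atomic, or the boss sees the fatal post.
import threading

lock = threading.Lock()
friends = {("me", "boss")}
posts = []

def quit_loudly_unsafe():
    posts.append(("me", "I'm quitting!"))   # this write lands first...
    friends.discard(("me", "boss"))         # ...the defriend lands later

def quit_loudly_safe():
    with lock:                              # stand-in for a transaction
        friends.discard(("me", "boss"))
        posts.append(("me", "I'm quitting!"))

def boss_feed():
    with lock:
        # The boss sees posts only from current friends. With the unsafe
        # version, a read between the two writes shows the fatal post.
        return [text for (who, text) in posts if (who, "boss") in friends]
```

Spanner's external-consistency guarantee is essentially the cross-datacenter version of that lock.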
So is Spanner basically just a distributed, mostly-relational database? It seems like, to find the record you want, you just search the indices of nodes like a B-tree, and after a couple of queries you're left at the node with the records you want. The downside is all that lost write performance, but since it's synchronous you can probably mitigate some of the rebalancing by having many indices of indices, so only a couple of nodes have to actually rebalance anything on a write.
"A complicating factor for an Open Source effort is that Spanner includes the use of GPS and Atomic clock hardware." (!)
Can anyone explain why such an accurate clock is helpful? I can see that it's needed if you create a document on the East Coast at about the same time as you create a document on the West Coast, and you absolutely need to know which was created first, but for most applications can't you just go with whatever time the system that got the insert thinks it is??
Distributed state machines are all about deciding what order things happened in: that A happened before B. If you had a clock that everyone agreed on, many of the problems would go away; the trouble is that we don't have one. Protocols like Paxos let us guarantee that an order will be agreed on, but they can be slow. They seem to be using clocks with an error bound to perform a first-pass ordering, to see whether a collision is possible at all or whether they can skip a few steps. Caveat: I don't fully understand that part of the paper yet...
The key part is ordering: if you want to know what order to apply changes in, you need every system to have fairly close time synchronization. NTP will get you enough precision for many applications, but if you get enough updates to the same resources, you're certain to start seeing differences smaller than the reliable accuracy of your system clock.
You can use other protocols, e.g. a CAS-style "change old-value to new-value" conditional update, but those have performance implications and require app support. Given that a GPS receiver probably costs about an hour or two of engineer time, using one to eke a little more precision out of the system clock seems like a potentially cheap win.
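Here's a rough single-process sketch of the commit-wait idea as I understand it from the paper (the epsilon and names are mine): the clock API returns an uncertainty interval instead of a point, and commits wait out the uncertainty, so timestamp order matches real-time order.

```python
# TrueTime-flavored commit wait, shrunk to one process for illustration.
import time

EPSILON = 0.007   # assumed worst-case clock uncertainty, ~7 ms

def now_interval():
    # TrueTime-style API: "now" is an interval, not a point.
    t = time.time()
    return (t - EPSILON, t + EPSILON)   # (earliest, latest)

def commit(write):
    # Pick a timestamp guaranteed not to be in the past anywhere.
    ts = now_interval()[1]
    # Commit wait: hold off acknowledging until every correct clock
    # agrees ts has passed, i.e. our own "earliest" exceeds ts.
    while now_interval()[0] < ts:
        time.sleep(EPSILON / 4)
    write["ts"] = ts
    return write

# Two causally ordered commits get timestamps in the right order,
# even if they're read later on a machine whose clock is slightly off.
a = commit({"op": "defriend boss"})
b = commit({"op": "post: I'm quitting!"})
assert a["ts"] < b["ts"]
```

The smaller the uncertainty bound, the shorter the wait, which is presumably why it's worth putting GPS and atomic clocks in the datacenter.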
> few organizations need to support transactions on a global scale
That doesn't ring true to me. My previous job was in services for public administrations, nothing very uncommon, and we needed global transactions very badly. I suppose any bank, insurer, or even plane-ticket vendor would love to have global transactions.
Hell, it's usually really funny just to watch tech journalists try to write.