Spanner: Google's Globally-Distributed Database

ChuckMcM · on Sept 15, 2012

This was an interesting project at Google; it started while I was there, and it was breaking things when I left. It is too bad that Ken Thompson didn't at least get acknowledged for his role in making it happen.

I don't think it will be as influential as the original GFS was, but it's an important piece of work that folks should study.

Locke1689 · on Sept 15, 2012

No, I think it's critical. I worked on one of the first services to ever use Spanner when I was an intern. Lock-free read transactions are a game changer. Short answer: if your database system can't do lock-free reads, your database is broken. That one feature allows some incredible performance optimizations.

zwischenzug · on Sept 16, 2012

What _exactly_ is a lock-free read transaction? Is it different from reading in an MVCC system?

linuxhansl · on Sept 15, 2012

>if your database system can't do lock-free reads, your database is broken

Yep.

rdtsc · on Sept 16, 2012

I know CouchDB doesn't do read locking. What are other ones out there?

coolestuk · on Sept 16, 2012

Not doing read locking is not a game-changer.

Firebird doesn't do read locking. Neither does Lotus Notes. Both have been around about 20 years.

Locke1689 · on Sept 16, 2012

Not doing read locking alone. Combine it with a planet-scale data storage system...

pdog · on Sept 16, 2012

Postgres and Oracle?

mace · on Sept 16, 2012

PostgreSQL uses MVCC to ensure ACID compliance without read locks. Uncommitted concurrent transactions are isolated from each other.
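To make the idea concrete, here is a toy sketch of multi-version reads, assuming a single in-memory store with an integer commit counter (`MVCCStore`, `begin_read`, and `commit` are hypothetical names). This is not how PostgreSQL implements MVCC internally (it tags row versions with transaction IDs and uses snapshots of in-flight transactions), but it shows the core property: writers append new versions under a commit timestamp, and readers pick a snapshot and never take a lock.

```python
import threading


class MVCCStore:
    """Toy multi-version store: writers append versions, readers never lock."""

    def __init__(self):
        self._versions = {}                   # key -> list of (commit_ts, value), ascending
        self._commit_lock = threading.Lock()  # serializes writers only
        self._last_committed = 0

    def begin_read(self):
        # A read-only transaction just remembers the latest committed timestamp.
        return self._last_committed

    def read(self, key, snapshot_ts):
        # Newest version committed at or before the snapshot; no lock is taken.
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

    def commit(self, writes):
        # Uncommitted writes live only in the caller's dict, so other
        # transactions cannot see them until this point.
        with self._commit_lock:
            ts = self._last_committed + 1
            for key, value in writes.items():
                self._versions.setdefault(key, []).append((ts, value))
            self._last_committed = ts  # publish: new snapshots now include ts
            return ts


store = MVCCStore()
store.commit({"x": 1})
snap = store.begin_read()
store.commit({"x": 2})        # a later writer neither blocks nor disturbs the reader
print(store.read("x", snap))  # prints 1: the snapshot is unaffected
```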

jackowayed · on Sept 16, 2012

Datomic.

eluos · on Sept 16, 2012

"non-blocking reads in the past"

Sounds like Google's finally invented time travel.

vital_sol · on Sept 16, 2012

Actually, this is exactly how transactions in Oracle work. The difference is one DB server (Oracle) vs. a distributed system (Google).

pdog · on Sept 16, 2012

Oracle doesn't have to be one DB server. Check out Oracle RAC for instance.

bgentry · on Sept 15, 2012

I would also be interested in a longer, elaborated answer.

Locke1689 · on Sept 16, 2012

The lessons are pretty much the same as the ones functional programming has been trying to teach us for years: immutability and caching.

Beyond that I would rather not elaborate for reasons of confidentiality.

eternalban · on Sept 16, 2012

Value addressing. MVCC.

sedachv · on Sept 16, 2012

What is "value addressing"? That's the first time I've seen that term, and Google doesn't bring up anything relevant.

arnarbi · on Sept 16, 2012

This is a wild guess, but it might be a synonym for content addressing:

http://en.wikipedia.org/wiki/Content-addressable_storage#Con...
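If "value addressing" does mean content addressing, a minimal sketch looks like this: the address of a blob is a hash of its bytes, so stored data is immutable by construction and readers need no coordination. `ContentStore` is a hypothetical name, and SHA-256 is an arbitrary choice of hash for illustration.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: a blob's key is the hash of its bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the value, so identical data is stored
        # once and nothing can ever change what an address refers to.
        address = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]  # KeyError if the address is unknown
        # Any reader can verify integrity by re-hashing.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored blob does not match its address")
        return data


cs = ContentStore()
addr = cs.put(b"hello")
assert cs.put(b"hello") == addr  # same value, same address
```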

eternalban · on Sept 16, 2012

+1. Neologism of an autodidact. IMHO it is more correct, regardless.

theotherone · on Sept 15, 2012

Elaborate?

luriel · on Sept 16, 2012

> It is too bad that Ken Thompson didn't get at least acknowledged for his role in making it happen.

Interesting, can you say more about this?

Is he not mentioned because officially he is part of the Go team?

enneff · on Sept 16, 2012

Ken sat near Jeff, Sanjay, and co while they were designing Spanner, and he regularly consults in an informal capacity on people's projects. I wasn't there, but it wouldn't surprise me at all if Ken's unique insight contributed to Spanner's design.

heretohelp · on Sept 16, 2012

I'm familiar with Ken Thompson, so I'm more puzzled than someone who isn't familiar with his work might be.

What exactly is his unique insight? Do you know any specifics, or are you just going on the fact that he's a famous programmer?

I say this as someone whose Planetside2 character is named: "KenThompsonHackerExtraordinaire"

enneff · on Sept 16, 2012

Ken's mind just works in a different way to most people. You explain your problem to him and he'll respond with some question or statement that turns your entire perspective inside out.

luriel · on Sept 16, 2012

enneff works with Ken on the Go team at Google.

As for his particular insight, if you are familiar with his work, that should be enough.

For those not familiar with his work, this interview might be a good starting point:

http://genius.cat-v.org/ken-thompson/interviews/unix-and-bey...

gruseom · on Sept 16, 2012

This is golden:

"The aggressive use of a small number of abstractions is, I think, the direct result of a very small number of people who interact closely during the implementation."

conradfr · on Sept 16, 2012

It's from 1997, I guess? Seems like he was wrong about Linux and maybe Microsoft :)

ChuckMcM · on Sept 16, 2012

When I was in the platforms group looking at storage issues the Spanner requirements had a lot of commentary from Ken in them, so much so that I thought it was his idea/project until someone corrected me a bit later. That was why I was surprised he wasn't acknowledged, from where I sat it seemed like he was one of the architects of the effort. Apparently that wasn't the case.

linuxhansl · on Sept 15, 2012

I work on HBase (the Apache version of BigTable). It makes me sad to see how far ahead Google is compared to the rest of the world. :)

The notion of uncertain time is ingenious.
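In the paper, TrueTime exposes time as an interval: `TT.now()` returns `[earliest, latest]`, and `TT.after(t)` / `TT.before(t)` report whether `t` has definitely passed or definitely not yet arrived. Below is a simulated sketch, assuming a fixed error bound `epsilon` added to the local clock (the class names and the 4 ms default are hypothetical; the real bound comes from GPS and atomic-clock time masters). `commit_timestamp` illustrates "commit wait": pick `s = TT.now().latest`, then wait until `TT.after(s)` before making the write visible, so any transaction that starts afterwards gets a larger timestamp.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class TrueTimeSim:
    """Simulated TrueTime: local clock plus an assumed error bound (seconds)."""

    def __init__(self, epsilon: float = 0.004):
        self.epsilon = epsilon  # hypothetical bound on clock uncertainty

    def now(self) -> TTInterval:
        t = time.time()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t: float) -> bool:
        # True only once t has definitely passed.
        return self.now().earliest > t

    def before(self, t: float) -> bool:
        # True only if t has definitely not arrived yet.
        return self.now().latest < t


def commit_timestamp(tt: TrueTimeSim) -> float:
    # Choose a timestamp no earlier than the true time at which commit began.
    s = tt.now().latest
    # Commit wait: block until s is in the past on every correct clock.
    while not tt.after(s):
        time.sleep(tt.epsilon / 4)
    return s  # writes stamped with s may now be made visible
```

In this model each commit waits a bit more than 2 * epsilon, which is why keeping the uncertainty bound small matters so much.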

zaphar · on Sept 16, 2012

I think that's more a factor of Google's scaling needs vs the rest of the world. We needed to invent it first so we did.

state · on Sept 16, 2012

That's a nice way to put it. That's exactly why these inventions are so interesting: they seem to give insight into problems of another order of magnitude.

DonnyV · on Sept 19, 2012

At least they understand by sharing this information it moves the technology forward. You don't see a lot of other big companies doing that.

lsb · on Sept 15, 2012

Interestingly, the data storage seems similar to Rich Hickey's Datomic: "data is versioned, and each version is automatically timestamped with its commit time; old versions of data are subject to configurable garbage-collection policies; and applications can read data at old timestamps."
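The quoted behavior can be sketched as a single versioned cell: every write is kept under its commit timestamp, a read at time `t` returns the newest version at or before `t`, and a garbage-collection pass discards old versions. `VersionedCell`, `read_at`, and the "keep the newest N versions" policy are assumptions for illustration; an age-based cutoff would be another plausible policy.

```python
import bisect


class VersionedCell:
    """Toy versioned cell: each write is kept under its commit timestamp."""

    def __init__(self, max_versions: int = 3):
        self.max_versions = max_versions  # hypothetical GC policy: keep newest N
        self._ts = []                     # ascending commit timestamps
        self._vals = []                   # values, parallel to self._ts

    def write(self, ts: int, value):
        i = bisect.bisect_right(self._ts, ts)
        self._ts.insert(i, ts)
        self._vals.insert(i, value)

    def read_at(self, ts: int):
        # Newest version with commit timestamp <= ts, or None if none survives.
        i = bisect.bisect_right(self._ts, ts)
        return self._vals[i - 1] if i else None

    def gc(self):
        # Drop everything except the newest max_versions versions.
        excess = len(self._ts) - self.max_versions
        if excess > 0:
            del self._ts[:excess]
            del self._vals[:excess]


cell = VersionedCell(max_versions=2)
for ts, v in [(10, "a"), (20, "b"), (30, "c")]:
    cell.write(ts, v)
print(cell.read_at(25))  # prints b
cell.gc()
print(cell.read_at(15))  # prints None: version "a" was garbage-collected
```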

tdg · on Sept 15, 2012

That's exactly like BigTable[1]. It makes sense that they built on top of that.

[1] http://static.googleusercontent.com/external_content/untrust...

akkartik · on Sept 15, 2012

But you can mutate Bigtable cells. Datomic seems dramatically different in that respect.

Evbn · on Sept 15, 2012

Can you? Or do some apps just always ask for the latest timestamped version when they read?

akkartik · on Sept 15, 2012

You can, but it's not enforced. In practice, teams at Google seem to use the time axis in myriad ways, and seldom the way Datomic does.

Also, always reading the most recent timestamp doesn't use time the way Datomic does. You aren't querying by time and so on.

sedachv · on Sept 16, 2012

Timestamp versioning is one of the oldest (1978) ideas in distributed systems: http://patricklogan.blogspot.ca/2007/09/naming-and-synchroni...

nnythm · on Sept 15, 2012

MVCC has been around for at least thirty years, but it's interesting that we have seen more databases with this feature recently.

linuxhansl · on Sept 15, 2012

Almost all databases use it in one form or another.

PostgreSQL uses it, Oracle uses it, MySQL (innodb) uses it, Apache HBase uses it, the list goes on and on...

Nitramp · on Sept 16, 2012

I think the major contribution in this paper is how to do consistent snapshot reads in a distributed system without a common reference clock, i.e. the use of TrueTime.

Many databases use some sort of MVCC, but they operate on a single node or in a closely connected cluster. This paper shows how to achieve the same properties in a system spanning continents.
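A rough sketch of why this works: the paper has each replica track a "safe time" up to which it has seen all writes, and a replica may serve a read at timestamp `t` if `t` is at or below that safe time. Because every caught-up replica returns the same versions for `t`, a snapshot read can go to whichever nearby replica is current enough, with no locks and no round trip to the leader. `Replica`, `safe_time`, and `snapshot_read` are hypothetical names, and the safe-time values are simply given rather than derived from Paxos and transaction state as in the real system.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    safe_time: float  # hypothetical: all writes at or below this are applied here
    versions: dict = field(default_factory=dict)  # key -> [(ts, value)], ascending

    def can_serve(self, read_ts: float) -> bool:
        return read_ts <= self.safe_time

    def read(self, key, read_ts: float):
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= read_ts:
                return value
        return None


def snapshot_read(replicas, key, read_ts: float):
    # Try replicas in preference order (e.g. nearest first). Any caught-up
    # replica gives the same answer, so no coordination is needed.
    for r in replicas:
        if r.can_serve(read_ts):
            return r.name, r.read(key, read_ts)
    raise RuntimeError("no replica caught up to read_ts; a real system would wait")


us = Replica("us", 100.0, {"x": [(50.0, "old"), (90.0, "new")]})
eu = Replica("eu", 80.0, {"x": [(50.0, "old")]})  # lagging replica
print(snapshot_read([eu, us], "x", 70.0))  # prints ('eu', 'old')
print(snapshot_read([eu, us], "x", 95.0))  # prints ('us', 'new')
```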

linuxhansl · on Sept 16, 2012

Another observation that struck me when I read this (and after reading the Percolator and Megastore papers) is how the "traditional" relational DB world and the "new NoSQL" world are converging. Relational databases are becoming more scalable, partly through new technology and partly by shedding features in some scenarios. And the NoSQL stores are becoming less so (it was really about "NoSQL" anyway, but that's a different story). All of these stores have layers or features that bring them closer to the traditional SQL/relational model.

Spanner appears to strike a nice middle ground.

hellooo · on Sept 16, 2012

Is Spanner written in C++ or Java?

kaib · on Sept 16, 2012

moondowner · on Sept 15, 2012

Another research publication from Google that's more than worth reading.

These just pile up; I must find time to get my hands on them...

sudhirj · on Sept 16, 2012

This looks like the High Replication Datastore, which is now the default in App Engine: Paxos replication, a choice between strong and eventual consistency, and tablet sharding. Interesting that they've already built it and it's available for everyone to use.

tete · on Sept 16, 2012

Fun fact: Spanner means voyeur in German slang.

Anyway, looks like a very exciting project. One could come up with so many applications.

kleiba · on Sept 16, 2012

Interestingly, "Spanner" is German for "voyeur". Coming from Google it's almost kind of ironic.

dmayle · on Sept 16, 2012

Even more interesting, "Spanner" is English for "something that spans", as in a database spanning the world.

Maybe it's a bit snarky, but I really don't see how you can read anything into something like that. It reminds me of the following Jack Handey quote:

Maybe in order to understand mankind, we have to look at the word itself: "Mankind". Basically, it's made up of two separate words - "mank" and "ind". What do these words mean? It's a mystery, and that's why so is mankind. - Jack Handey

huxley · on Sept 16, 2012

A spanner is also British English for what North Americans call a wrench.

regularfry · on Sept 16, 2012

Also, colloquially, for an idiot.

pwpwp · on Sept 15, 2012

Transactions don't scale. They really need to use NoSQL.

Evbn · on Sept 15, 2012

Did you read the first page? BigTable has no transactions and scales, but it is a pain for apps that need consistency. Spanner adds transactions for apps that need them, at scale, charging a tax in the form of latency.

Using two different clock technologies per node (GPS and atomic!) and light-speed networking helps make this manageable.

Fault-tolerant time!

zaphar · on Sept 15, 2012

We read this paper in a reading group at Google a few months ago (perks of the job), and we spent almost the entire time talking about the timestamps. It is perhaps the most important piece of this paper. Fault-tolerant time is right.

tonyarkles · on Sept 16, 2012

Yeah, I've been doing research in distributed systems and the timestamp part of this paper is incredibly interesting to me. It's awesome that I might actually get to cite something more recent than Lamport.