
> clients can do globally consistent reads across the entire database without locking

How is this possible across data centres? Does it send data everywhere at once?

Seems too good to be true, of course, but if it works and scales, it might be worth it just to not have to worry about your database scaling. Still, I don't believe it ;-)

EDIT: further info...

> Spanner mitigates this by having each member be a Paxos group, thus ensuring each 2PC “member” is highly available even if some of its Paxos participants are down. Data is divided into groups that form the basic unit of placement and replication.

So it's SQL with Paxos that presumably never gets confused, but during a partition will presumably not be consistent.




> In terms of CAP, Spanner claims to be both consistent and highly available despite operating over a wide area, which many find surprising or even unlikely. The claim thus merits some discussion. Does this mean that Spanner is a CA system as defined by CAP? The short answer is “no” technically, but “yes” in effect and its users can and do assume CA. The purist answer is “no” because partitions can happen and in fact have happened at Google, and during some partitions, Spanner chooses C and forfeits A. It is technically a CP system.


I would expect more from Brewer.

"CA except when there are partitions" is CP. It's not "effectively CA".


No, he's saying it's effectively CAP because the A downtime is so small.

It's one thing to do that for a key-value store. Entirely another to support joins on a globally distributed database. This ain't just one availability zone. Spanner is amazing.

It took them a few years to make it a service, but when they announced its use internally a few years ago, it seemed like the nail in the coffin for in-house database hosting.


I understand what he's saying. It's marketing.

There's nothing wrong with saying "it's CP, but since we control everything, partitions are extremely rare." Then he can show availability numbers (which he kinda does).

Saying it's "effectively CA" defeats the point of the CAP theorem, which says you have to make tradeoffs. See: https://codahale.com/you-cant-sacrifice-partition-tolerance/


> It's marketing.

No, it's engineering. It's the recognition that if periods of unavailability are too small and too rare to be noticed, then the system behavior is indistinguishable from an "available" system in the sense of the CAP theorem.

It's like the "Retina" display you're probably reading from. There are pixels, you just can't see them.
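To put rough numbers on "too small and too rare to be noticed", here is a back-of-the-envelope sketch that converts an availability fraction into allowed downtime per year. The function name and the sample figures are illustrative assumptions, not numbers taken from the article.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_minutes_per_year(availability):
    """Allowed downtime in minutes per year for an availability fraction such as 0.999."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


# Illustrative inputs only (three, four and five nines).
for a in (0.999, 0.9999, 0.99999):
    print(f"{a}: {downtime_minutes_per_year(a):.2f} min/year")
```

At five nines the budget is about five minutes a year, which is the scale at which the "effectively CA" argument is being made.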


Another point is that since all records are globally timestamped, you can do a read that is consistent at a timestamp in the past (i.e. read data as the database was 1 second ago, or something like that).

If data from other places has synchronized to your zone, you may be able to do this globally-consistent read while only touching your local datacenter (because TrueTime guarantees that no other records anywhere in the system will be created at the time you are querying).
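A toy sketch of that idea, assuming a multi-version store where each replica tracks a "safe" timestamp up to which it has applied every write. `Replica`, `apply`, `read_at` and `safe_ts` are made-up names for illustration, not Spanner's API, and integer timestamps stand in for TrueTime.

```python
from bisect import bisect_right
from collections import defaultdict


class Replica:
    """Toy multi-version replica; every write carries a commit timestamp."""

    def __init__(self):
        self._ts = defaultdict(list)    # key -> commit timestamps, ascending
        self._vals = defaultdict(list)  # key -> values, parallel to _ts
        self.safe_ts = 0                # all writes with ts <= safe_ts are applied here

    def apply(self, key, commit_ts, value):
        # Assumption: replication delivers writes in commit-timestamp order.
        self._ts[key].append(commit_ts)
        self._vals[key].append(value)
        self.safe_ts = max(self.safe_ts, commit_ts)

    def read_at(self, key, ts):
        # A local read is only safe if this replica has seen everything up to ts.
        if ts > self.safe_ts:
            raise RuntimeError("replica not caught up to requested timestamp")
        i = bisect_right(self._ts[key], ts)  # number of versions with commit_ts <= ts
        return self._vals[key][i - 1] if i else None


local = Replica()
local.apply("x", 10, "a")
local.apply("x", 20, "b")
local.apply("y", 25, "c")
print(local.read_at("x", 15))  # returns the version committed at ts=10
```

A read at a timestamp the replica has not yet caught up to has to wait or go elsewhere; here it simply raises. Reads at or below `safe_ts` never need to contact another datacenter.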

Note: I work at Google, but I don't know more about Spanner than the Spanner paper.


Check out the papers. They revealed Spanner a few years ago. Other commenters have provided links.



