High Scalability is a serious and informative blog, and I wouldn't dismiss it. This isn't TechCrunch. Its focus is less on rigorous scientific analysis of a narrow field and more on giving a good, moderately deep overview of anything to do with data scaling. And it's doing a darn good job.
The second sentence is admittedly a cheap shot and a bit overgeneral. I thought about editing it out, but people had already responded to it and I hate when people ninja-edit the part of a post that I'm responding to.
His employer doesn't matter. It doesn't add anything productive to the discussion. If one of Bing's developers had laughed off articles about Microsoft's architecture as "look at the adorable little journalists running around throwing big words around", I would not expect it to rise to top comment.
The part that amuses me is their surprise that Google sees some advantage to transactions.
Journalists (of any kind) want to see things in terms of personality or philosophical conflict, e.g. SQL vs. NoSQL, because that's what their readers understand and, usually, all the journalists themselves understand. They don't seem to consider that people might be making rational choices and difficult tradeoffs.
If that site contains one journalist, I'll eat my diamond tiara. Let's call them blogophiles instead. Or reposting aggragists. Adveraggrapostsummarizers. "One who reposts content from other sites while attaching ads and providing a veneer of quote-unquote insight."
I can't really apologize for my brethren in the market, but yes, tech journalism tends to be terrible. The journalism market was already disrupted by Twitter, Facebook, and blogs. Blogs in particular are great because, instead of reading what some idiot from journalism school has to say about, oh, I dunno... exceptions versus errors in programming languages (as touched off by that Python vs. Go post), you can skip the wonk's interpretation and read a developer's opinion directly.
In the meantime I just wanted to point out that everyone and their mother is making "announcements" about Spanner, and usually trying to make the world think they're better. Mostly this is from NoSQL companies.
I don't think it's fair to lump all technology journalists into one category. There are two basic types: technologists who write about technology, and journalists who write about technology. The former tend to be more accurate, but the latter tend to be more savvy, and thus more successful, journalists, since most readers are not technologists.
Buzzword headline aside, the Spanner paper is great and worth your time. As is the BigTable paper, the Dremel paper, and the Paxos Made Live paper.
I read the Google whitepapers and wonder, is there anywhere else one can go to work on real solutions to distributed systems problems? At smaller scales you can cheat -- you don't need Paxos, you can get away with non-consensus-based master / slave failover. You can play the odds with failure modes. At Google's scale, you can't: behavior under normally unlikely failures matters, probability matters, CAP matters.
Look at Titan (http://thinkaurelius.github.com/titan), a new distributed OLTP graph database with a storage layer that adds distributed transactions to pluggable backends such as HBase and Cassandra. It's by the team that created Tinkerpop Blueprints and Gremlin, the graph traversal language.
"Calvin can run 500,000 transactions per second on 100 EC2 instances in Amazon’s US East (Virginia) data center, it can maintain strongly-consistent, up-to-date 100-node replicas in Amazon’s Europe (Ireland) and US West (California) data centers---at no cost to throughput."
"Calvin is designed to run alongside a non-transactional storage
system, transforming it into a shared-nothing (near-)linearly scalable
database system that provides high availability and full ACID
transactions. These transactions can potentially span multiple
partitions spread across the shared-nothing cluster. Calvin
accomplishes this by providing a layer above the storage system that
handles the scheduling of distributed transactions, as well as
replication and network communication in the system. The key technical
feature that allows for scalability in the face of distributed
transactions is a deterministic locking mechanism that enables the
elimination of distributed commit protocols."
You could come work at Sense (http://www.senseplatform.com) or email me tristan@senseplatform.com. Not Google scale, but the same technological challenges without the legacy baggage.
At lanl.gov, we've been developing a ton of stuff related to distributed problems. Recently we wrote a library that abstracts self-stabilization for single-mount petascale filesystem treewalking, extending Dijkstra's original design: https://github.com/hpc/libcircle
We have a detailed paper on it coming out at SC12.
We created Arakoon (http://arakoon.org) in-house to solve a real distributed systems problem. At small engineering scale (only a handful of contributors), using (Multi-)Paxos.
Lots of big sites roll their own solutions for this problem. Often there will even be multiple in-house technologies with different characteristics. Google, LinkedIn, and Facebook for sure.
I haven't read TFA, so this is out of context I'm sure. Your note on failover just caught my eye. It's not a counter to your thought-provoking comment.
"At smaller scales you can cheat -- you don't need Paxos,
you can get away with non-consensus-based master / slave
failover."
IME, not so much. GitHub's recent outages are a solid example. Automating slave promotion in master/slave systems based on a single point of data (IP failover, process failure, heartbeat, etc.) is inherently a very risky strategy.
More often than not, no matter the amount of hardening or testing I put into my scripts, it just doesn't work when you really need it to.
TravisCI on Heroku seems like another good example.
What does work incredibly well IME is systems designed with HA built in. HA-Proxy and CARP is a seriously solid combination. You can synchronize your configuration nightly to add some meager level of automation to the process. I'm also pretty excited about CouchDB (and its derivatives), since AFAIK it's the only free, *nix, peer-to-peer replicating database system I've found (I know of no RDBMS that qualifies, Riak is not free, and RavenDB runs only on Microsoft servers).
In my experience, the two pieces of the HA "last mile" that few systems overcome are:
* Server Affinity is bad, so bad
* Failover MUST be transparent to the clients
The great thing about a database over HTTP is that it's stateless. As long as the failover doesn't happen mid-request, you don't have to worry about connection errors, reconnecting, etc. So pair that up with something like HA-Proxy and you can increase both read scalability and availability. Then stick HA-Proxy on a box using CARP, and you've just automated the failover of your entire database stack, with as close to zero downtime as you can get, all with simple, reliable, free tools.
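To make the "stateless" point concrete, here's a toy sketch of the client side (the proxy address and document path are made up): because each request is self-contained, a dumb retry loop is all the failover handling a client ever needs.

```python
# Hypothetical client hitting a CouchDB-style HTTP database through a proxy.
import json
import time
import urllib.request

PROXY = "http://db-proxy.internal:5984"   # HA-Proxy/CARP VIP (hypothetical)

def get_doc(doc_id, retries=3, backoff=0.5):
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(f"{PROXY}/mydb/{doc_id}", timeout=5) as r:
                return json.load(r)
        except OSError:   # URLError and socket timeouts are both OSErrors
            # No connection state to rebuild: if failover happened between
            # requests, the retry transparently lands on the promoted node.
            time.sleep(backoff * (attempt + 1))
    raise RuntimeError(f"gave up fetching {doc_id}")
```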
Put several application servers behind HA-Proxy as well, use something like Infinispan for session storage and caching, with CouchDB for file storage, and now you have a system with no single point of failure that can support highly dynamic web sites, the sort that can't typically be cached, so availability can't be achieved just by serving stale responses.
Then use FreeBSD for your CouchDB servers: have at least two actively load-balanced systems for "production" and a third that's just a replication-consumer backup. Then have all three independently zfs-snapshot for your backups. Now your massive database files are safe as well, and it doesn't impact availability. In the case of critical corruption (i.e. an application bug that propagated bad data throughout the system), just a few minutes of manual effort cloning snapshots until you pinpoint the issue can have you back up and running.
No lengthy database restore commands to run. If you know when the bad data was persisted, you could have the correct snapshot identified, a clone created, and a server instance loaded on another port, then batch-update the records in the live system. If you document and run drills on the different application failure scenarios, you could recover from what would otherwise be catastrophic losses in just a few minutes.
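For the curious, the drill scripts down to something like this. The dataset name and the corruption check are placeholders for whatever your app needs; the zfs list/clone/destroy subcommands are the standard ones.

```python
# Sketch: walk snapshots oldest-first, clone each, find the last clean one.
import subprocess

DATASET = "tank/couchdb"   # hypothetical pool/dataset

def snapshots():
    out = subprocess.run(
        ["zfs", "list", "-t", "snapshot", "-H", "-o", "name",
         "-s", "creation", DATASET],
        check=True, capture_output=True, text=True).stdout
    return out.splitlines()   # oldest first

def is_corrupt(clone_name):
    # App-specific: point a throwaway server at the clone and look
    # for the bad records.
    raise NotImplementedError

def last_good_snapshot():
    good = None
    for snap in snapshots():
        clone = DATASET + "-inspect"
        subprocess.run(["zfs", "clone", snap, clone], check=True)
        try:
            if is_corrupt(clone):
                return good   # the previous snapshot was the last clean one
            good = snap
        finally:
            subprocess.run(["zfs", "destroy", clone], check=True)
    return good
```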
You could even have the backup system running on the last snapshot, so if you're really on the ball, you could be back online simply by disabling the health checks on the primary systems (run the health check through an nginx proxy, so you can just "sv stop healthcheck"), assuming you can come up with a reliable way to suspend the clone/server-process-restart cycle on the backup server.
As long as your monitoring is up to snuff, you should be able to sleep peacefully.
Got a bit carried away. Hopefully someone finds the rambling useful though. ;-)
This article comes across as really cynical and entirely lacking in the kind of rigor and detail I have previously found on highscalability. Spanner is really mind-blowingly cool tech. I thought this article was much more informative and worth the time to read: http://news.ycombinator.com/item?id=4562546
I think the notes about transactions are quite valid as the point of an HS post. However, the author didn't have anything else to add ("I look forward to more insightful commentary. There’s a lot to make sense of") yet felt the need to keep writing, and filled the article with style and attitude rather than substance.
Disclaimer: I'm a co-founder of a database company (FoundationDB) building a scalable, ACID database.
I couldn't agree more with the main quote they pulled from the paper, expressing the difficulty of (even great) programmers having to "code around the lack of transactions." Ease of development is one of the biggest benefits of transactions.
However, another huge benefit that didn't get much play in the article is the freedom transactions give you to build abstractions and other data models on top of whatever you are given. In our product's case, a low-level ordered K/V store is used as the storage layer, and several different data models are exposed on top (see http://foundationdb.com/#layers).
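For anyone who hasn't seen the layer idea in action, here's a toy illustration (the encoding is invented for this comment, not our actual layer code): a document model built out of nothing but ordered keys and range reads.

```python
# A document-ish data model on top of a plain ordered key/value store.
import bisect

class OrderedKV:
    """Stand-in for an ordered K/V store: sorted keys plus range reads."""
    def __init__(self):
        self._keys, self._vals = [], []

    def set(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = value
        else:
            self._keys.insert(i, key)
            self._vals.insert(i, value)

    def range(self, prefix):
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._vals[i]
            i += 1

class DocumentLayer:
    """Each document becomes one K/V pair per field, grouped by key prefix."""
    def __init__(self, kv):
        self.kv = kv

    def put(self, collection, doc_id, doc):
        for field, value in doc.items():
            self.kv.set(f"{collection}/{doc_id}/{field}", value)

    def get(self, collection, doc_id):
        prefix = f"{collection}/{doc_id}/"
        return {k[len(prefix):]: v for k, v in self.kv.range(prefix)}

docs = DocumentLayer(OrderedKV())
docs.put("users", "42", {"name": "Ada", "role": "admin"})
print(docs.get("users", "42"))   # {'name': 'Ada', 'role': 'admin'}
```

With real transactions underneath, a layer like this stays correct under concurrency without any extra work, which is exactly why the storage engine providing ACID matters so much.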
I think the future of databases holds a diversity of data models and query languages (SQL, document, K/V, columnar, etc.). I also think the future of databases is ACID. It seems like more and more of the NoSQL early adopters (and creators) are coming to the same conclusion.
I wonder how you see the publication of the Spanner paper affecting your product. Does it lend validity to your product, seeing as there is a production system with similar features already in use, or is there a risk Google might offer this system as a service to other companies, possibly competing with you?
Well, it doesn't seem like it's their ambition, but there's no doubt that if Google released all of their in-house data tools, including Spanner, that it would drastically change the market.
We love to see this stuff, though. When we started building our product over three years ago, the idea of a distributed, ACID database was sort of laughed at. (I think the CAP theorem scared a lot of people off from building really useful products.) FoundationDB isn't the same as Spanner, but they share some of the same goals. We see that as a huge validation.
I don't think what defines NoSQL is the lack of transactions. In fact, lack of transactions is not even a common trait among all things called NoSQL.
To be absolutely honest, I think we should get rid of the word NoSQL. As for NewSQL, we shouldn't have started using it: it's great for branding, but it confuses journalists.
I don't know, I think it's helpful. It changes the conversation from being about the access language to being focused on how a technology processes data at scale. Well, almost...
Notice that even with the shift from "NoSQL" to "NewSQL", mentions of joins continue to be conspicuously absent from many of these discussions. So it's worth noting that many of these NewSQL things are "almost but not quite SQL"; hence the value of a new word.
I don't think Spanner is one of these fake-relational databases; it's an ACID datastore used as the underlying storage for a real RDBMS, F1.
From the F1 paper (http://research.google.com/pubs/pub38125.html), it looks like it's mostly intended as an improvement on sharded MySQL, by putting the sharding where it belongs, down at the physical storage level, instead of up at the client access level.
Maybe a better buzzword would be NoMySQL (NOracle?)
I didn't really like the NewSQL thing either. I initially thought it must have meant transactional DBs with first-class schema-less features like Postgres has. Nope, it's another one of those phrases or ideas that ReadWriteWeb, the source of the linked NewSQL definition, is trying to turn into a "big idea". They tried that previously with the Internet of Things (still don't know what the hell that is), as well as with their own name, when they referred to the Web 2.0/API hype as the Read/Write Web.
I actually like RWW over TC or Mashable but sometimes I have to tune out some of their pompous prognostication.
I think there should be a term for these new DBMSs, but the problem with NewSQL is that what they have in common is not SQL, but consistency and transactions.
With some of the best programming minds in the world, it's interesting to see that Google finds it more efficient to let programmers focus on performance issues than to have them reimplement core DB functionality such as transactions.
I like the direction of moving closer to original database theory concepts and allowing the creative energy to focus on solutions to performance problems at high-scale.
They probably have a lot of great minds at the top of Google but there's still going to be a significant amount of average ones, just like in any company/structure.
"Maybe this time Open Source efforts should focus elsewhere, innovating rather than following Google?"
There's a ton of innovative projects in the open source community, but it's difficult to convince people to use them. Developing a clone of a Google tech has a built-in marketing advantage: "Google uses something like this."
So the correct and responsible thing for Google to do now would be to patent the shit out of this and then sue back into the stone age anybody that implements anything even vaguely similar, right?
I mean, the social fabric depends on companies protecting their innovations, right?
I am in favor of any label that replaces "NoSQL" with something else. NoSQL always rubbed me the wrong way. Besides my not being a huge fan of the "No", it really says nothing about what a system is, only what it isn't.
Transactions have always been cool; it's just that businesses that don't need them have always been cooler in the world of tech journalism, so transactions don't get much mention. If something goes wrong at Instagram, nobody really cares if the posting of your pastrami sandwich picture doesn't get rolled back just because the #lolcatz tag got applied by mistake.
I'd love to have actual distributed transactions that could scale indefinitely and not create availability issues. We actually get a steady stream of user complaints about inconsistencies between counter caches and what appears in results. Worse are the inconsistencies that can happen between graph edges that you want to partition in two different ways (e.g. following vs. followers).
Likewise, consistency's more important in facebook-y scenarios than you'd think. The canonical example I've heard is that "defriend my boss" MUST always be seen before "post 'I'm quitting!'"
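A toy version of that anomaly, with everything here made up: without atomicity, a concurrent reader can catch the system between the two writes.

```python
# Two writes that must appear atomic, or the boss sees the fatal post.
import threading

lock = threading.Lock()
friends = {("me", "boss")}
posts = []

def quit_loudly_unsafe():
    posts.append(("me", "I'm quitting!"))   # this write lands first...
    friends.discard(("me", "boss"))         # ...the defriend lands later

def quit_loudly_safe():
    with lock:                              # stand-in for a transaction
        friends.discard(("me", "boss"))
        posts.append(("me", "I'm quitting!"))

def boss_feed():
    with lock:
        # The boss sees posts only from current friends. With the unsafe
        # version, a read between the two writes shows the fatal post.
        return [text for (who, text) in posts if (who, "boss") in friends]
```

Spanner's external-consistency guarantee is essentially the cross-datacenter version of that lock.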
So is Spanner basically just a distributed, mostly-relational database? It seems like, to find the record you want, you just search the indices of nodes like a B-tree, and after a couple of queries you're left at the node with the records you want. The downside is all that lost write performance, but since it's synchronous you can probably mitigate some of the rebalancing by having many indices of indices, so only a couple of nodes have to actually rebalance anything on a write.
"A complicating factor for an Open Source effort is that Spanner includes the use of GPS and Atomic clock hardware." (!)
Can anyone explain why such an accurate clock is helpful? I can see that it's needed if you create a document on the East Coast at about the same time as you create a document on the West Coast, and you absolutely need to know which was created first, but for most applications can't you just go with whatever time the system that got the insert thinks it is??
Distributed state machines are all about deciding what order things happened in: that A happened before B. If you had a clock that everyone agreed on, many of the problems would go away; the trouble is that we don't have one. Protocols like Paxos let us guarantee that an order will be agreed on, but they can be slow. They seem to be using clocks with an error bound to perform a first-pass ordering, to see whether a collision is possible at all or whether they can skip a few steps. Caveat: I don't fully understand that part of the paper yet...
The key part is ordering: if you want to know what order to apply changes in, you need every system to have fairly close time synchronization. NTP will get you enough precision for many applications, but if you get enough updates to the same resources, you're certain to start seeing differences smaller than the reliable accuracy of your system clock.
You can use other protocols, e.g. a CAS-style "change old-value to new-value" conditional update, but those have performance implications and require app support. Given that a GPS receiver probably costs about an hour or two of engineer time, using one to eke a little more precision out of the system clock seems like a potentially cheap win.
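Here's a rough single-process sketch of the commit-wait idea as I understand it from the paper (the epsilon and names are mine): the clock API returns an uncertainty interval instead of a point, and commits wait out the uncertainty, so timestamp order matches real-time order.

```python
# TrueTime-flavored commit wait, shrunk to one process for illustration.
import time

EPSILON = 0.007   # assumed worst-case clock uncertainty, ~7 ms

def now_interval():
    # TrueTime-style API: "now" is an interval, not a point.
    t = time.time()
    return (t - EPSILON, t + EPSILON)   # (earliest, latest)

def commit(write):
    # Pick a timestamp guaranteed not to be in the past anywhere.
    ts = now_interval()[1]
    # Commit wait: hold off acknowledging until every correct clock
    # agrees ts has passed, i.e. our own "earliest" exceeds ts.
    while now_interval()[0] < ts:
        time.sleep(EPSILON / 4)
    write["ts"] = ts
    return write

# Two causally ordered commits get timestamps in the right order,
# even if they're read later on a machine whose clock is slightly off.
a = commit({"op": "defriend boss"})
b = commit({"op": "post: I'm quitting!"})
assert a["ts"] < b["ts"]
```

The smaller the uncertainty bound, the shorter the wait, which is presumably why it's worth putting GPS and atomic clocks in the datacenter.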
> few organizations need to support transactions on a global scale
That doesn't ring true to me. My previous job was in services for public administrations, nothing very uncommon, and we needed global transactions very badly. I suppose any bank, insurer, or even plane-ticket vendor would love to have global transactions.
Hell, it's usually really funny just to watch tech journalists try to write.