In general, datastores like ZooKeeper are designed to be strongly consistent (cf. Consul and etcd too)... But Postgres was not tested as a distributed database: no sharding, no replication, and no form of failover.
The difference is: kill a ZooKeeper node and you will not notice; kill Postgres and your app is dead.
Postgres is a good DB, but since it's not distributed, it's not very useful to compare it to distributed databases. Yes, it's consistent, but it's only as reliable as the single node it's installed on.
This is a common misconception about the CAP theorem. A significant number of people don't realize that a distributed system also includes the clients; it's not just communication between servers.
I suspect he did not go over replication because failover in Postgres is technically still DIY, although he should have. There are two replication methods, though, which I would like to see tested:
- asynchronous - this one is fast, but it would most likely have issues similar to the other databases
- synchronous - the master makes sure data is replicated before returning to the user; this should in theory always be consistent
You would typically have two nodes in the same location replicating synchronously and use asynchronous replication to other data centers. On a failure, you simply fail over to the other synchronously replicating server.
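Roughly, on the primary that would look something like this (a sketch only, not from the article; the standby name is invented, and it assumes each standby's primary_conninfo sets a matching application_name; ALTER SYSTEM needs 9.4+, older versions put the same settings in postgresql.conf):

    -- run on the primary; 'standby_local' is a hypothetical standby name
    ALTER SYSTEM SET synchronous_standby_names = 'standby_local';
    ALTER SYSTEM SET synchronous_commit = 'on';  -- commit returns only after the sync standby confirms the WAL flush
    SELECT pg_reload_conf();
    -- any standby NOT listed in synchronous_standby_names (e.g. the one in the
    -- remote data center) keeps replicating asynchronously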
Regarding Consul/etcd: those technologies actually did not do well in his tests, but the authors appear motivated to fix the issues.
I agree with your point about the client, but there's always a client... What's missing in the PostgreSQL test is high availability, redundancy and partition tolerance... Similarly, any in-process db would beat the competition :)
That's why I said it's unfair to say "postgresql did well".
Call Me Maybe is supposed to test all the difficult problems of CAP, which have not been tested at all with PostgreSQL.
That's a real problem. There's little need for a NoSQL database (except Redis maybe, because it's so fast) if you're not trying to overcome partitions and ensure HA...
MongoDB doesn't even scale well horizontally[1]. I would normally link to the paper where they benchmarked Cassandra & HBase against MongoDB 2, but it looks like they redid their tests with MongoDB 3.0 and included Couchbase as well.
I've mostly used MongoDB for read-heavy workloads, and in a replica set... That said, if I needed to support pure scale, I'd be more inclined to reach for Cassandra. If I only needed mid-range replication, I'm more inclined to look at RethinkDB or ElasticSearch at this point. In fact, the project I'm working on now is using ElasticSearch.
All of that said, you have to take a research paper funded by a database company (DataStax is backing Cassandra) with a grain of salt. Not to mention that most people reach for MongoDB because it has some flexibility and is a natural fit for many programming models. Beyond this, setting up a replica set with MongoDB was far easier than with any other database I've had to do the same with... Though I'd say getting set up with RethinkDB is nicer, but there's no automated failover option yet.
The results are so far apart that I don't think there's enough salt you can add to make MongoDB look good here.
They were also quite generous in comparing load using non-durable writes for CouchDB, HBase and MongoDB against Cassandra's durable writes.
In my personal experience, many of the scaling problems you have with MongoDB become laughable once you switch even to a relational database that can't scale out.
PostgreSQL doesn't even try to scale horizontally.
Regarding MongoDB, all I'll say is that I switched from MySQL to MongoDB 2 years ago, and I've never looked back. YMMV.
I'm also a user of ElasticSearch and Redis, and am looking to add Couchbase to the lot. One size doesn't fit all; MySQL and Postgres certainly don't fit all either.
Postgres does one thing and does it well, which is keeping your data safe. The whole point of the NoSQL movement was to sacrifice ACID in exchange for speed and scalability, and MongoDB offers neither. You effectively get a database that not only performs slower on a single instance[1], but also can't even scale horizontally.
Also, saying that Postgres can't scale horizontally is not entirely true; it in fact can[2], it's just currently more complicated. I also learned something when I was investigating how our applications would behave with a Postgres backend: it turns out that for every instance where we ran Mongo, we could run Postgres on a much smaller instance. In one case the data set was so laughably small that you could just run Postgres on the same node that was running the app.
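To make "it in fact can" a bit more concrete (what [2] actually describes may differ), one stock way to spread data across several Postgres nodes is postgres_fdw; a minimal sketch, with invented host/table names:

    -- on the coordinating node (postgres_fdw ships as a contrib extension since 9.3)
    CREATE EXTENSION postgres_fdw;

    CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'shard1.internal', dbname 'app');
    CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
        OPTIONS (user 'app', password 'secret');

    -- the remote 'events' table now shows up locally; queries against it are
    -- sent to shard1
    CREATE FOREIGN TABLE events_shard1 (
        id   bigint,
        body text
    ) SERVER shard1 OPTIONS (table_name 'events');

That's the "more complicated" part: you do the routing and schema plumbing yourself instead of getting automatic sharding.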
The point is that even if you think you need to scale out, unless you're Google, Facebook, or a similar company, chances are you don't.
Granted, you can only prove that the system is vulnerable and not the reverse, but if there is a vulnerability it is much harder to trigger.