What are good off-the-shelf distributed databases? We looked at MongoDB but it wasn't worth giving up SQL. To reiterate the no free lunch point, no one has figured out how to outsmart the CAP theorem yet, so all you can do is design around it.
I work for them so take with a pinch of salt, but Oracle DB. It gives you a fully multi-master horizontally scalable database with ACID transactions (not sharding), full SQL, elastic scalability, built in queues, JavaScript stored procs, automatic REST API construction, many other features. Its pricing is competitive with a cloud hosted Postgres, believe it or not (the clouds are making a lot of money off customers who are wedded to Postgres). I work through some of the numbers for an extreme case here [1].
Behind the scenes, the way it works is by combining software tricks with special hardware. You rent a (part of a) database cluster. The cluster is running on high end hardware running customized kernels, with a private Infiniband RDMA-capable interconnect between the nodes separate from the front-side network that clients connect with. A lock manager coordinates ownership of data blocks, which can be read either from disk nodes or directly out of the RAM of other database nodes. So if one node reads a block then writes to it, the only thing written to disk immediately is the transaction log. If another node then needs to write to that block, it's transferred directly over the interconnect using RDMA to avoid waiting on the remote CPU, the disk is never touched. Dirty blocks are written back to disk asynchronously. The current transaction counter is also poked directly into remote nodes via RDMA.
In the latest versions the storage nodes can also do some parts of query processing using predicate push-down, so the amount of data to be transferred over the interconnect is also lowered. The client drivers understand all the horizontal scalability stuff and can failover between nodes transparently, so the whole setup is HA. A node can die and the cluster will continue, including open transactions.
If you need to accelerate performance further you can add read-through coherent cache nodes. These act as proxies and integrate with the block ownership system to do processing locally.
Other than financial reasons (I own some stock), I've started making this argument here on HN because it's unintuitive but correct, which is just enjoyable. A lot of people in the startup world don't realize any of the above, thinking that horizontally scalable fully coherent SQL databases either don't exist or have severe caveats. E.g. one reply to you suggests FoundationDB which is great, but it's a KV store and not a SQL database.
By the end of the day it’s not black or white there are trade offs. So special hardware is simply a no go zone for me. What happens if you want to leave the cloud and host on premises its the activation of the lock-in mechanism. Thanks I can manage one solution or another using one of the open source technologies. It’s all trade offs.
You don't have to use the special hardware, it's just faster and easier if you do. And you the customized hardware deal pre-dates the cloud: you can buy the pre-built racks and have them wheeled into a self-hosted datacenter if you want. That's called ExaData.
But if you want to run the DB on your own VMs or bare metal, you can do that. It doesn't have any DRM so from time to time you'll be expected to run some scripts that check your usage and reports back, to ensure you've been paying for what you use. But otherwise it's operationally no different to an open source DB.
The open source aspect makes a difference in terms of who you pay for support (if anyone), what quality of support you get, things like that.
Well, no moreso than for any other RDBMS. It implements standard SQL, exporting your data is easy. The only lockin comes from using features that other databases don't have, but that tradeoff exists with any kind of software including open source.