I open every "New PostgreSQL features" article in the hope that it will contain "Simple cluster setup: you just start nodes and they figure out the rest." Unfortunately, it hasn't happened so far...
Is it such a hard problem that no one is able to solve it, or is it simply not a concern or interest of the parties developing PostgreSQL or the tools around it?
(And by hard problem, I don't mean surviving aphyr-level diagnostics without any issues, though it would certainly be nice to see that too.)
Edit: I'd settle for a multi-master replicated cluster where node failures and startups (and the migration of the data) are handled transparently by the cluster. I know that there are many other aspects, but even this basic case is painfully hard to achieve (much harder than with MySQL).
It probably will never happen. Postgres follows an ideology of being explicit and safe: it never tries to assume that you meant to do one thing if there is a possibility that you meant another.
With clusters, there are far too many variables for it to be plug-and-play. For example, do you want replication, or distributed partitions? If replication, do you want master-slave or master-master? If distributed partitions, what are you going to trade off: consistency or availability? (That being said, Postgres will likely never change from its CA stance; you'll have to use an alternative distribution like PostgresXL.)
Furthermore, clustering isn't likely to be much more than a Pareto-tail problem for the next decade or so. Postgres is useful for millions of use cases, and a feature that only the Googles and Twitters of the world can benefit from is not one of them. If you have the data to justify extensive clustering requirements, I would hope you also have the resources to contribute your patches to Postgres. So far, they all seem content to solve their own niche problems with niche solutions that slowly bleed down to us mere mortals (e.g. Cassandra), or to just buy a solution like EnterpriseDB or Oracle.
Can't agree. You don't need a ton of data to run into performance issues if you are running on slow commodity hardware, for example in Azure. Nor does the lack of synchronous replication with automatic failover make your job easy. Considering the lack of features in PG in this area, running it in Azure seems like a no-go to me. We ended up having to pick SQL Server, which has failover and so on; it works great but costs a lot.
It's a hard problem, and the Postgres team's careful approach to engineering means that they are solving it slowly, in comparison to the current fashion for ship first, patch later.
They bring stuff in piece by piece, e.g. in 9.4 they added logical changeset extraction / logical log streaming replication, which is another step down the road to clustering. On top of this they are building bi-directional replication. And so it continues.
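To make "logical changeset extraction" concrete: since 9.4 you can create a logical replication slot and read row-level changes back as a stream. A minimal sketch in Python with psycopg2, assuming a server already configured with wal_level = logical and max_replication_slots > 0 (the slot name and connection string are just placeholders):

    # Peek at 9.4's logical decoding via the built-in test_decoding plugin.
    # Assumes wal_level = logical and max_replication_slots > 0 on the server.
    import psycopg2

    conn = psycopg2.connect("dbname=test host=localhost")  # placeholder DSN
    conn.autocommit = True
    cur = conn.cursor()

    # Create a logical replication slot that decodes changes to plain text.
    cur.execute("SELECT * FROM pg_create_logical_replication_slot(%s, %s)",
                ("demo_slot", "test_decoding"))

    # ...do some writes in another session, then peek at the change stream...
    cur.execute("SELECT * FROM pg_logical_slot_peek_changes(%s, NULL, NULL)",
                ("demo_slot",))
    for lsn, xid, data in cur.fetchall():
        print(lsn, xid, data)  # human-readable INSERT/UPDATE/DELETE lines

    # Drop the slot so it stops holding WAL on the server.
    cur.execute("SELECT pg_drop_replication_slot(%s)", ("demo_slot",))

This is just the decoding plumbing, not replication itself, but it is the foundation that bi-directional replication is being built on.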
This is the same reason they haven't added upsert or merge yet.
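For what it's worth, the workaround is well known; it is just on the application to handle the race. A rough sketch of the usual retry loop in Python with psycopg2 (table and column names invented):

    # Emulated upsert for pre-9.5 Postgres: try UPDATE, fall back to INSERT,
    # and retry if a concurrent insert wins the race. Table is invented.
    import psycopg2

    def upsert_counter(conn, name, value):
        with conn.cursor() as cur:
            while True:
                cur.execute("UPDATE counters SET value = %s WHERE name = %s",
                            (value, name))
                if cur.rowcount > 0:        # the row existed; we're done
                    conn.commit()
                    return
                try:
                    cur.execute("INSERT INTO counters (name, value) VALUES (%s, %s)",
                                (name, value))
                    conn.commit()
                    return
                except psycopg2.IntegrityError:
                    # Someone inserted the same key between our UPDATE and
                    # INSERT; roll back and go around the loop again.
                    conn.rollback()

Getting this right inside the server, for every isolation level and under replication, is exactly the kind of thing they refuse to half-do.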
It will happen eventually. There are use cases, and Postgres can do more to address them. The existence of third-party solutions is evidence of that.
What makes it really hard is solving the problem in a way that doesn't lead people into a trap. MongoDB offers solutions for some users, but a lot of users walk away very disappointed when the "solution" doesn't do what they thought it would (often after running into problems in production).
But it still needs to be done. A general solution is impossible, so we need specific solutions. Each of those will need to be designed in a way that the user knows as early as possible whether it will work for them or whether they need to use one of the other approaches.
What's happening right now is work on very powerful infrastructure for logical replication. Postgres invested in binary replication before, and the results have been great. But binary replication is not a good foundation for multi-master, so now there's investment in logical replication. There is also early-stage investment in parallel query, which I believe can pay off with multi-machine parallelism, which is another form of scale-out.
The "trap" of MongoDB was more due to the exaggerated marketing around its features and data reliability out-of-the-box. Since Postgres already has plenty of mindshare, it wouldn't need to resort to such tactics, so it could describe its features with "giant red exclamation points" describing the exact pitfalls and extent of the use cases. So I'd be very much in favor of its developers prioritizing specific clustering solutions.
Documenting is one part, but that doesn't really answer the "simple setup" requirement if people have to read a lot.
Ideally, even someone who doesn't read the documents would be gently and intuitively guided toward the right solution and away from costly mistakes. Easier said than done, but let's try to get as close as we can.
You never want node failover to be transparent or automagic. Doing that is begging to end up in a "split-brain" scenario, and have fun cleaning up from that.
EDIT: Yes, I'm simplifying; saying "never" overstates somewhat. If your cluster management tool can account for STONITH, you're probably safe, or at least safer. If you don't know what STONITH is without googling, you shouldn't be playing with multi-master, highly available databases and failover betwixt them.
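To make the split-brain point concrete, here is the kind of deliberately naive failover check (Python, made-up host names) that gets people into trouble: during a network partition the standby can't reach the primary, concludes it is dead, and promotes itself, while the primary is still alive and accepting writes on the other side of the partition. Fencing (STONITH) or a quorum is what breaks that symmetry.

    # Deliberately naive failover logic, shown only to illustrate the trap.
    import socket

    def primary_looks_dead(host="pg-primary.example", port=5432, timeout=2.0):
        """True if we cannot open a TCP connection to the (made-up) primary."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return False
        except OSError:
            return True

    def promote_local_standby():
        # Placeholder for the real promotion step (e.g. pg_ctl promote).
        print("promoting local standby; hope nobody else decided the same thing")

    # The flawed leap: "unreachable" is treated as "dead". In a partition the
    # primary may still be alive and taking writes, so promoting here creates
    # a second writable node: split brain.
    if primary_looks_dead():
        promote_local_standby()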
There are robust ways to do this. The fact that some choose to implement their own buggy protocols for it does not mean that I do not want it. I use automatic failover with SQL Server today (and have been for years). I don't like the idea of getting up in the middle of the night to acknowledge some hardware error just to initiate a failover.
The fact that a commercial software empire with tens of billions of dollars in revenue can, in conjunction with an OS over which they have complete control, implement working resource fencing does not imply that a mostly-volunteer Open Source project, running on at least a dozen-odd different hardware and software platforms, can too — and certainly not on the same time scale.
Clustering /is/ hard. There are tradeoffs to the various ways to do it, and explicit choices you have to make that have the ability to bite you in the ass later. As one of the stated goals of Postgres is reliability and correctness, they don't want to half-ass it. That being said, there is active work on getting multi-master working well out of the box, which unfortunately didn't land in 9.4. It may make it in 9.5, but that remains to be seen.
I'm currently building this sort of thing and it's a glorious feeling. Clustering is hard, like "search engines before the publication of the PageRank algorithm" hard. Sure, there are lots of options, some very good (I have fond memories of HotBot's advanced search; it was my go-to search engine for a few years), but tractability changed drastically after that paper. Now search engine theory and building a basic search engine from scratch is suitable for students instead of postgraduates. It's fun to work on the bleeding edge.
Don't expect it anytime soon. A large number of Postgres' key developers work for EnterpriseDB, and that is a feature they charge for as part of Postgres Plus Advanced Server. From their marketing copy:
>> clustered read scalable deployments in minutes with no special skills
>> Full integration with high availability solutions such as Red Hat's Cluster Suite
>> With Postgres Plus Advanced Server, you can build sharded systems or other replication architectures with a variety of bundled solutions.
>> Master-to-Master, and similar styled applications are fully supported
Then their marketing copy is full of shit, because they promise what sounds rather like a clustered setup with very little effort: maybe not zero-config, but doable with no special skills.
Their marketing copy lists many things that are just "postgresql". Postgresql already gives you "clustered read scalable deployments in minutes with no special skills". Setting up a read-only replication slave is trivial.
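"Trivial" here really is a couple of commands. A sketch driven from Python purely for illustration; the host, replication role, and paths are invented, and the primary must already allow replication connections in pg_hba.conf and have wal_level = hot_standby and max_wal_senders > 0 (with hot_standby = on in the standby's postgresql.conf it will serve read-only queries):

    # Clone a streaming standby with pg_basebackup; names and paths invented.
    import subprocess

    DATA_DIR = "/var/lib/postgresql/9.4/standby"   # hypothetical target dir

    # Take a base backup from the primary; -R writes a recovery.conf that
    # points the new node at the primary, -X stream ships the needed WAL.
    subprocess.check_call([
        "pg_basebackup",
        "-h", "pg-primary.example",   # invented primary host
        "-U", "replicator",           # invented replication role
        "-D", DATA_DIR,
        "-R", "-X", "stream", "-P",
    ])

    # Start the standby; it connects to the primary and begins streaming.
    subprocess.check_call(["pg_ctl", "-D", DATA_DIR, "start"])

That gets you read scaling; the failover story the rest of this thread is arguing about is the part that still takes real work.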
No, it is not a hard problem; it is an impossible, unsolvable problem. Software can not alter its behavior to match your desires if you do not tell it what you desire. Clustering is not some simple monolithic thing where you just "enable clustering" and that's that. There are billions of possible clustering setups. Setting it up is as simple as it can get: you have to pick which of the billions of setups you want.
>> Software can not alter its behavior to match your desires if you do not tell it what you desire
Why not have special types of transactions that explicitly define consistency expectations across a cluster? Make the normal default mirror all data across all nodes and require a lock across the entire cluster when inserting/updating/deleting. You would then be free to relax those expectations and introduce sharding to improve performance as needed.
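Interestingly, Postgres already exposes a narrow version of this idea for replication durability: synchronous_commit can be set per transaction, so each transaction chooses how long it waits for a synchronous standby. It's not the full cross-cluster consistency control being proposed, but it shows the shape such knobs could take. A sketch in Python with psycopg2 (connection string and table invented; assumes synchronous_standby_names is configured on the primary):

    # Per-transaction durability choices with synchronous replication.
    # Assumes a primary with synchronous_standby_names configured; the DSN
    # and the "events" table are invented.
    import psycopg2

    conn = psycopg2.connect("dbname=app host=pg-primary.example")
    cur = conn.cursor()

    # Important write: wait until the synchronous standby has flushed the WAL.
    cur.execute("SET LOCAL synchronous_commit TO on")
    cur.execute("INSERT INTO events (kind) VALUES ('payment')")
    conn.commit()

    # Low-value write: don't wait for the standby (or even the local flush).
    cur.execute("SET LOCAL synchronous_commit TO off")
    cur.execute("INSERT INTO events (kind) VALUES ('page_view')")
    conn.commit()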
Yes, there are billions of different setups, but there are a few basic ones, and you could start by solving one or two before tackling the generic case. For example, one could provide just a cluster setup for data replication. No fancy distributed data models: just copy the data around in the cluster; if a new node joins, copy the data to it; if something fails, handle the failure. PostgreSQL claims to have multi-master setups, so this is really about node handling and copying the data around.
After that, you could introduce locally distributed clusters. Later on, you could introduce geographically distributed cluster setups. But just because the latter is very complex does not mean that you can't start with the basic setup.