
Well, corosync/pacemaker is definitely not the same as zk/etcd/consul. STONITH is mostly a bad idea. Two-node clusters are always a bad idea. Using a VIP is a bad idea, too. That's what I learned at small scale, and at big scale it's even worse.

The problem in this case was that they didn't understand corosync/pacemaker correctly. The syntax is awkward and it's hard to configure. With consul + patroni they would have had a much better architecture that is far easier to understand, and they would not need a VIP (it would work over DNS). They used archive_command to get a WAL file from the primary onto a sync replica. This should NEVER be done if archive_command does not return a sane status code (which in fact it probably did not). They did not read https://www.postgresql.org/docs/10/static/continuous-archivi... at all. Last but not least, you should never run restore_command on a sync node when it doesn't need to (always check whether the master is alive/healthy before doing it, and maybe even check how far behind you are).
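
For reference, the whole archive_command contract is basically "return exit status 0 only if the segment is safely archived, non-zero otherwise". A minimal sketch of a wrapper that honours that, with made-up paths, could look like this:

    #!/usr/bin/env python3
    # Minimal sketch of an archive_command wrapper whose only job is to
    # return a sane exit status: 0 iff the WAL segment really is archived.
    # Paths are made up; postgresql.conf would call it roughly as
    #   archive_command = '/usr/local/bin/archive_wal.py %p %f'
    import filecmp
    import os
    import shutil
    import sys

    ARCHIVE_DIR = "/var/lib/postgresql/wal_archive"   # assumption

    def main():
        if len(sys.argv) != 3:
            return 1
        src, name = sys.argv[1], sys.argv[2]
        dst = os.path.join(ARCHIVE_DIR, name)

        # Never overwrite an existing, different segment with the same name.
        if os.path.exists(dst):
            return 0 if filecmp.cmp(src, dst, shallow=False) else 1

        tmp = dst + ".part"
        try:
            shutil.copy2(src, tmp)
            with open(tmp, "rb") as f:
                os.fsync(f.fileno())
            os.rename(tmp, dst)        # atomic within one filesystem
            return 0
        except OSError:
            return 1                   # non-zero -> postgres keeps the WAL and retries

    if __name__ == "__main__":
        sys.exit(main())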

Patroni would've worked in their case. Patroni would've made it easy to restart the failed primary. Patroni stays in control of the postgres process itself, which is way better than using pacemaker/corosync (especially combined with a watchdog/softdog).

What would also have helped is two sync nodes with failover to either of them (this is harder, since sync nodes need to be detached when they become unhealthy).

And the best thing is that with etcd/consul/zk you can run the consensus cluster on three nodes separate from your three database servers (this helps a lot).




It's a little lost in another comment thread (https://news.ycombinator.com/item?id=15862584), but I'm definitely excited about solutions like Patroni and Stolon that have come along more recently.


Well, you should definitely look into them. In the past we used corosync/pacemaker a lot (even for things other than database HA), but trust me... it was never a sane system. As long as nothing broke, it worked; when something did break, it was horrible to get back to any sane state at all.

We migrated to patroni (yeah, stolon is cool as well, but since it's a bit bigger than we need, we went with patroni). The hardest part with patroni is actually writing a script that creates the service definitions for consul (consul is a little weird when it comes to services) or otherwise updates DNS/haproxy/whatever to point at the new master (this is not a problem with stolon).

But since then we have tried all sorts of failures and never had a problem. We pulled plugs (hard drive, network, power cord) and nothing bad happened, no matter what we did. The watchdog worked better than expected in some cases where we tried to throw bad stuff at patroni or overload it. And since it's in Python, its characteristics (memory/CPU usage) are well understood, and the code is easy to reason about, at least easier than corosync/pacemaker. etcd/zk/consul are battle tested and kept working even though we see far more network partitions than a typical network (that was bad for galera... :( ). We never autostart a failed node after a restart/clean start; we always look at the node first and start patroni manually. We also use the role_change/etc hooks to create/delete service definitions in consul and to ping us whenever anything happens on the cluster.
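
To give an idea of the hook part, here is a rough sketch of an on_role_change callback that registers/deregisters a "pg-master" service against the local Consul agent. The argument order Patroni passes to callbacks, the service name and the agent address are assumptions here, so check the docs for your version:

    #!/usr/bin/env python3
    # Rough sketch of a Patroni callback: register a "pg-master" service in
    # Consul when this node becomes master, deregister it otherwise.
    # Assumes Patroni invokes callbacks as: <script> <action> <role> <cluster>.
    import json
    import socket
    import sys
    import urllib.request

    CONSUL = "http://127.0.0.1:8500"                  # local Consul agent (assumption)
    SERVICE_ID = "pg-master-" + socket.gethostname()

    def put(path, body=None):
        data = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(CONSUL + path, data=data, method="PUT")
        urllib.request.urlopen(req, timeout=5)

    def main():
        action, role = sys.argv[1], sys.argv[2]       # cluster name would be argv[3]
        if action != "on_role_change":
            return 0
        if role == "master":
            put("/v1/agent/service/register",
                {"ID": SERVICE_ID, "Name": "pg-master", "Port": 5432})
        else:
            put("/v1/agent/service/deregister/" + SERVICE_ID)
        return 0

    if __name__ == "__main__":
        sys.exit(main())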


I am currently using Stolon with synchronous replication for a setup, and overall it's great.

It gives me automated failover, and -- perhaps more importantly -- the opportunity to exercise it a lot: I can reboot single servers willy-nilly, and do so regularly (for security updates every couple of days).

I picked the Stolon/Patroni approach over Corosync/Pacemaker because it seems simpler and more integrated; it fully "owns" the postgres processes and controls what they do, so I suspect there is less chance of accidental misconfiguration of the kind the article describes.

I currently prefer Stolon over Patroni because statically typed languages make it easier to have fewer bugs (Stolon is Go, Patroni is Python), and because the proxy it brings out of the box is convenient: on any machine I connect to localhost:5432 to get to postgres, and if Postgres fails over, the proxy makes sure to disconnect me so that I'm not accidentally connected to a replica.
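
As a trivial example of what the proxy buys you: something like this, run on any box, should always land on the current primary (psycopg2 and the credentials are just assumptions):

    # Sanity check against the local stolon-proxy: it should always hand you
    # the current primary, so pg_is_in_recovery() ought to come back false.
    import psycopg2

    conn = psycopg2.connect(host="127.0.0.1", port=5432,
                            dbname="postgres", user="postgres")
    with conn, conn.cursor() as cur:
        cur.execute("SELECT pg_is_in_recovery()")
        in_recovery, = cur.fetchone()
        print("on a replica!" if in_recovery else "on the primary")
    conn.close()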

In general, the Stolon/Patroni approach feels like the "right way" (in the absence of failover being built directly into the DB, which would be great to have in upstream postgres).

Cons:

Bugs. While Stolon works great most of the time, every couple of months I get some weird failure. In one case a stolon-keeper would refuse to come back up with an error message; in another a failover didn't happen; in a third Consul stopped working (I suspect a Consul bug; the create-session endpoint hung even when used via plain curl), and as a result stale Stolon state accumulated in the Consul KV store, with entries that should not have been there, so Stolon refused to start correctly.

I suspect that, as with other distributed systems that are intrinsically hard to get right, the best way to get rid of these bugs is if more people use Stolon.


> I currently prefer Stolon over Patroni because statically typed languages make it easier to have fewer bugs (Stolon is Go, Patroni is Python)

Sounds like a holy-war topic :) But let's be serious: how does a statically typed language help you avoid bugs in the algorithms you implement? The rest is about proper testing.

> and because the proxy it brings out of the box is convenient: on any machine I connect to localhost:5432 to get to postgres

It seems like you are running a single database cluster. When you have to run and support hundreds of them, you will change your mind.

> if Postgres fails over, the proxy makes sure to disconnect me so that I'm not accidentally connected to a replica.

HAProxy will do absolutely the same.

> Bugs. While Stolon works great most of the time, every couple of months I get some weird failure. In one case a stolon-keeper would refuse to come back up with an error message; in another a failover didn't happen; in a third Consul stopped working (I suspect a Consul bug; the create-session endpoint hung even when used via plain curl), and as a result stale Stolon state accumulated in the Consul KV store, with entries that should not have been there, so Stolon refused to start correctly.

Yeah, it proves one more time:

* don't reinvent the wheel: HAProxy vs stolon-proxy

* using a statically typed language doesn't really help you have fewer bugs
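
For context on how the HAProxy variant knows where the master is: as far as I know it health-checks Patroni's REST API (port 8008 by default), which answers 200 on /master only on the current leader. A rough illustration of that same check, with made-up hostnames:

    # Rough illustration of the health check HAProxy performs: ask each node's
    # Patroni REST API whether it is the master and send writes there.
    import urllib.error
    import urllib.request

    NODES = ["pg1.example.com", "pg2.example.com", "pg3.example.com"]  # made up

    def current_master():
        for host in NODES:
            try:
                with urllib.request.urlopen("http://%s:8008/master" % host, timeout=2) as r:
                    if r.status == 200:
                        return host
            except (urllib.error.URLError, OSError):
                continue
        return None

    print("write traffic should go to:", current_master())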

> I suspect that, as with other distributed systems that are intrinsically hard to get right, the best way to get rid of these bugs is if more people use Stolon.

As I've said before: we are running a few hundred Patroni clusters with etcd and a few dozen with ZooKeeper. We've never had such strange problems.


> > if Postgres fails over, the proxy makes sure to disconnect me so that I'm not accidentally connected to a replica.

> HAProxy will do absolutely the same.

Well, I think that is not quite the same as what stolon-proxy actually provides (I actually use patroni). If your network gets split and you end up with two masters, there is a problem if one application is still connected to, and writing to, the old split-off master.

However, I don't quite see the problem, because etcd/consul would not let the split-off node keep the master role, which means it would either die (because it cannot connect to the new master) or become a read-only replica, and the application would then probably throw errors for users still connected to it. It highly depends on how big your etcd/consul cluster is and how well your application detects failures. (Since we depend heavily on our database, we actually kill hikaricp (Java) after too many master write failures and just restart it after a set amount of time. We're also looking into building a small, lightweight async driver based on akka that does this in a slightly more automated fashion.)
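
The gist of that kill-and-restart logic, as a rough sketch (Python instead of Java/hikaricp here; the thresholds, pool object and rebuild() callable are made up for illustration):

    # If too many writes fail in a row, drop every pooled connection and only
    # rebuild the pool after a cool-down, so connections to a demoted master
    # don't linger around.
    import time

    MAX_FAILURES = 5          # made-up threshold
    COOLDOWN_SECONDS = 30     # made-up cool-down

    class WriteGuard:
        def __init__(self, pool, rebuild):
            self.pool = pool          # assumed to expose closeall()
            self.rebuild = rebuild    # callable that returns a fresh pool
            self.failures = 0

        def record(self, ok):
            self.failures = 0 if ok else self.failures + 1
            if self.failures >= MAX_FAILURES:
                self.pool.closeall()
                time.sleep(COOLDOWN_SECONDS)   # give the failover time to settle
                self.pool = self.rebuild()
                self.failures = 0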


> Well, I think that is not quite the same as what stolon-proxy actually provides (I actually use patroni). If your network gets split and you end up with two masters, there is a problem if one application is still connected to, and writing to, the old split-off master.

On a network partition, Patroni will not be able to update the leader key in Etcd and will therefore restart postgres in read-only mode (create recovery.conf and restart). No writes will be possible.


It would be interesting to know how stolon/patroni deal with the failover edge cases and how this impacts availability. For example, if you are accessing the DB but can't contact etcd/consul, then you should stop accessing the DB because you might start doing unsafe writes. But this means that consul/etcd is now a point of failure (though it usually runs on multiple nodes, so that shouldn't happen!). You can also end up in a situation where bugs/issues with the HA system cause you more downtime than manual failover would.

You also have to be careful to leave sufficient time gaps when failing over, to cover the case where the master is not really down and connections are still writing to it. For example, the default patroni haproxy config doesn't even seem to kill live connections, which seems kind of risky.


> if you are accessing the DB but can't contact etcd/consul, then you should stop accessing the DB because you might start doing unsafe writes.

If patroni can't update the leader lock in Etcd, it will restart postgres in read-only mode. No writes will happen.

> the default patroni haproxy config doesn't even seem to kill live connections, which seems kind of risky.

That's not true: https://github.com/zalando/patroni/blob/master/haproxy.cfg#L...


Ah, thanks. I was looking at the template files, but I guess that one is not used, or is used for something else.


Thanks for the extra info, and the insight into how you're using Patroni. Always helpful to hear about someone using it for real, especially someone who's come from Pacemaker. :)



