Mostly unrelated, but a fun fact about quorums that I enjoy noting whenever I can because it still seems under-explored: A quorum != a majority. Currently most (all?) production implementations I've seen of RAFT and the various Paxoses use "majority" as the quorum algorithm, so the two get mostly conflated.
In my layman understanding: Given a set, a quorum is some method to choosing a sub set, such that any two such sub sets will always have at least one overlapping member.
Majority is one quorum algorithm - given a set [A,B,C], the majorities are: [A,B,C], [A,B], [A,C] and [B,C]. Any two of those sets will have at least one member overlapping.
However, majority is somewhat wasteful, because the latency of these quorum-based algorithms are almost always bound by the slowest member of the quorum - the more machines you need to wait for, the more likely one of them will be outlier-slow.
You'd potentially be better off choosing a quorum algorithm that requires less than a majority - because that'd mean, in the best case, fewer responses to wait for, lowering the probability that one of those members will be very slow. There are drawbacks to this - it makes fault tolerance and provisioning harder to calculate - but it's got some cool potential benefits.
I would argue that by the time you've chosen Paxos or some other majority quorum commit protocol, you're already well aware that you're building a CP system, and that availability and latency aren't your main concern. A majority quorum is basically the most obvious (and somewhat brute force) way of providing serializable consistency in the system.
The one non-majority quorum commit protocol that most people are probably already familiar with is the "sloppy quorum" replication in Dynamo systems[1] (e.g. Cassandra, Riak, Voldemort, etc.). Basically, since the quorum is configurable on a per-cluster basis instead of being inherent to the protocol, and usually isn't a majority of the cluster, the system can still make progress when half of the nodes are unreachable. (But of course, as the paper notes, this means that you need to resolve conflicts some other way, which adds a whole bunch of complexity.)
> you're already well aware that you're building a CP system, and that availability and latency aren't your main concern
Assuming you've chosen correctly between CP and AP approaches, this tells us that availability and latency aren't as important as consistency. But there's nothing that says they aren't arbitrarily close...
Yeah, definitely -- I agree that the decision doesn't mean to just to blindly throw away availability optimizations once you've decided that consistency is important.
Actually, invoking CAP probably didn't add to my message. What I meant to say is that people don't talk about non-majority quorum commits that much because the interesting part is that the serializability comes with majority/overlapping quorums.
As I read it, the comment you were replying to was still restricting its discussion to overlapping quorums, and merely pointing out that that's not actually synonymous with majority.
Oh man, that is a really cool paper, thanks a bunch for sharing that. I've got next week off and lots of itch to try Go for network code.. might try this out!
That's pretty interesting. Is there a concrete example of a quorum definition where the probability of an outlier is improved vs majority quorum? I'm struggling to come up with one. I've always assumed majority is optimal, since you can tolerate outliers in (n-1) / 2 voters without seeing an outlier for the commit overall. Eg: for 3 node Raft - you'd need both followers to have an outlier before a client notices a slow commit.
In my layman understanding: Given a set, a quorum is some method to choosing a sub set, such that any two such sub sets will always have at least one overlapping member.
Majority is one quorum algorithm - given a set [A,B,C], the majorities are: [A,B,C], [A,B], [A,C] and [B,C]. Any two of those sets will have at least one member overlapping.
However, majority is somewhat wasteful, because the latency of these quorum-based algorithms are almost always bound by the slowest member of the quorum - the more machines you need to wait for, the more likely one of them will be outlier-slow.
You'd potentially be better off choosing a quorum algorithm that requires less than a majority - because that'd mean, in the best case, fewer responses to wait for, lowering the probability that one of those members will be very slow. There are drawbacks to this - it makes fault tolerance and provisioning harder to calculate - but it's got some cool potential benefits.
Some cool ones to explore here: https://pdfs.semanticscholar.org/a243/7f18205414f6398b29c4f8...