I prefer the walkthrough of the CAP theorem in “Designing Data-Intensive Applications,” which argues that the CAP “theorem” is an imprecisely defined and generally unhelpful concept in distributed systems. “Available,” for instance, has no formal definition.
And it’s confusing that it’s three letters, because it’s not “choose two”. It’s not that a system is “partition tolerant”; it’s that if there’s a network partition, you choose availability or consistency. And obviously you choose availability for the distributed systems you most commonly encounter.
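To make that trade-off concrete, here’s a toy sketch (all names hypothetical, not from any real system) of the choice a single replica faces when it can’t reach its peers: serve what it has locally, or refuse to answer until the partition heals.

```python
class Replica:
    """Toy replica illustrating the AP-vs-CP choice during a partition."""

    def __init__(self, prefer_availability: bool):
        self.prefer_availability = prefer_availability
        self.local_store = {}     # possibly-stale local copy of the data
        self.partitioned = False  # set when peers are unreachable

    def read(self, key):
        if not self.partitioned:
            return self.local_store.get(key)  # normal path: no trade-off needed
        if self.prefer_availability:
            # AP: answer from local state, accepting that it may be stale
            return self.local_store.get(key)
        # CP: refuse to answer rather than risk returning inconsistent data
        raise RuntimeError("partition detected; cannot guarantee consistency")
```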
> And obviously you choose availability for the distributed systems you most commonly encounter.
Failing open (rather than failing safe) is generally regarded as the most acceptable thing we can do as CRUD programmers; that’s true, but it’s categorically untrue that this is a foregone conclusion.
There are many cases (especially in financial services) where failing safe is the much preferable option, and having retry logic in the application is much preferred.
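A rough sketch of what that application-side retry might look like, assuming a backend that rejects requests while it’s failing safe (the `submit_transaction` callable and `TransientError` are hypothetical stand-ins, not any real API):

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error raised while the backend is failing safe."""


def submit_with_retry(submit_transaction, txn, max_attempts=5, base_delay=0.2):
    """Retry a rejected transaction with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return submit_transaction(txn)
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # Back off exponentially, adding jitter to avoid thundering herds.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```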
In distributed banking back-ends, if one of your datacenters goes down, you don't take down the other one for safety. You don't force global consistency/linearizability of all transactions before allowing UI updates. There are delays in financial reconciliation all the time; the important thing is that they are eventually consistent with a ledger (sketched below), not that you stop the train the moment one thing fails. And the reality of distributed systems is that things fail constantly: hard drives, networks, bugs, clock drift...
This is in contrast to something like a supercomputer, or a distributed map-reduce job, where a single node failing mid-process can corrupt your data, and you have the luxury of stopping the whole thing, fixing the issue, and restarting the process.
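As a toy illustration of “eventually consistent with a ledger” (hypothetical names, nothing like a real banking backend): each site keeps accepting entries locally and drains its backlog into the central ledger whenever it can reach it, instead of halting when a peer is unreachable.

```python
from collections import deque


class LocalBranch:
    """Toy model of one datacenter that stays available during a partition."""

    def __init__(self):
        self.pending = deque()  # entries not yet confirmed by the central ledger

    def record(self, entry):
        # Stay available: accept the entry locally even if the ledger is unreachable.
        self.pending.append(entry)

    def reconcile(self, ledger_append):
        # Drain the backlog whenever the ledger is reachable again.
        # A real system would also need ordering and idempotency guarantees.
        while self.pending:
            ledger_append(self.pending[0])  # may raise if the ledger is still down
            self.pending.popleft()
```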
"No formal definition" includes where we are today, not the original, outdated idea. By the original (not useful) definition, an "available" distributed system can return a response 10 years later, which is not helpful nor relevant when thinking about distributed systems.
What has changed that makes the idea outdated? You can argue that it was always unhelpful, but I can't see how it could be outdated.
You're correct that the theorem doesn't address latency requirements. There are all kinds of things it doesn't address. The point of it is simply, as you say: you must give up consistency to have availability in the face of a partition, or vice versa. Some vendors of distributed systems would have us believe otherwise. The theorem gives us a framework for understanding, at a very basic level, the trade-offs that must be made in distributed system implementations. That isn't really very much for as much airtime as it has gotten, which I suspect is the source of your contention with it.