
So this might not be the right place for this, but I'm curious.

How do people deal with "eventual consistency"?

In my head, once a transaction is done, it's everywhere and everyone has access to it.

What happens if 2 nodes try to modify the same data at the same time? Or what happens if you insert on one node, then query on another before it propagates? And if the answers to those questions are what I think they are (that bad stuff happens), how do you set up your application to avoid doing it?




This is a really big question that quickly gets into distributed systems theory in general, but in part, the idea is to recognize that a lot of the time the "bad stuff" is not so bad. For instance, if someone updates their profile picture on Facebook, and I make a request that should include their profile picture, but their update hasn't propagated to the node I'm reading from, I just get the old profile picture, and that's a-ok. There are definitely nasty things (like simultaneous updates) that you have to be aware of, but for a lot of applications there are plenty of cases where out-of-date data makes little difference.


I guess the part I have trouble with is how do you separate what can't be delayed from what can?

Is it just a matter of keeping the information on 2 separate systems, or are there tools that let you, for instance, mark one query as "take your time and make sure everything is 100% up to date before committing this" and another as "return whatever you got, it's not important"?


Cassandra, MongoDB and Redis can do that to some degree.

http://cassandra.apache.org/doc/latest/architecture/dynamo.h...

http://hector-client.github.io/hector/build/html/content/con...

Tunable durability, and some minimal isolation (atomicity) guarantees for Mongo: https://docs.mongodb.com/manual/reference/write-concern/ and https://docs.mongodb.com/manual/core/read-isolation-consiste...

http://redis.io/commands/WAIT

I guess something like this can and will eventually find its way into any practical and effective distributed data store.
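To make that concrete: here is a minimal pymongo sketch of dialing consistency up or down per operation, assuming a MongoDB replica set; the URI, database, and collection names are made up for illustration.

    from pymongo import MongoClient, WriteConcern
    from pymongo.read_concern import ReadConcern

    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
    db = client.shop

    # "Take your time, make sure a majority of nodes have this."
    orders_safe = db.get_collection(
        "orders",
        write_concern=WriteConcern(w="majority", j=True),
        read_concern=ReadConcern("majority"),
    )
    orders_safe.insert_one({"user": "alice", "total": 42})

    # "Return whatever you've got, it's not important."
    orders_fast = db.get_collection(
        "orders",
        write_concern=WriteConcern(w=1),
        read_concern=ReadConcern("local"),
    )
    print(orders_fast.find_one({"user": "alice"}))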


One solution is to use a cookie to pin the user to the primary site, with a TTL long enough to cover replication lag. Facebook does something similar.
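Something like this hand-wavy sketch; the request/response objects and the 5-second window are invented for illustration.

    import time

    REPLICATION_LAG_TTL = 5.0  # assume replicas catch up within ~5s

    def on_write(response):
        # After any write, stamp the session so we remember it happened.
        # (response is a hypothetical framework object.)
        response.set_cookie("last_write_at", str(time.time()),
                            max_age=int(REPLICATION_LAG_TTL))

    def choose_backend(request):
        # Inside the window after the user's last write, read from the
        # primary so they see their own data; otherwise any replica is fine.
        stamp = request.cookies.get("last_write_at")
        if stamp and time.time() - float(stamp) < REPLICATION_LAG_TTL:
            return "primary"
        return "replica"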


Depends on the database, but Cassandra, for example, has a quorum mode for writes, which requires that a majority of the replicas ack the write before it succeeds. This can be enabled on a per-query basis, for reads as well as writes.
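With the DataStax Python driver that looks roughly like this; the keyspace and table are assumed to already exist.

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(["127.0.0.1"]).connect("app")  # keyspace assumed

    # This write must be acked by a majority of replicas to succeed.
    write = SimpleStatement(
        "UPDATE users SET email = %s WHERE id = %s",
        consistency_level=ConsistencyLevel.QUORUM,
    )
    session.execute(write, ("a@example.com", 42))

    # This read is happy with whatever a single replica has.
    read = SimpleStatement(
        "SELECT email FROM users WHERE id = %s",
        consistency_level=ConsistencyLevel.ONE,
    )
    print(session.execute(read, (42,)).one())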

The other way of doing it is things like CRDTs (https://en.wikipedia.org/wiki/Conflict-free_replicated_data_...) which have a join operation for any two data values.

You have to keep it in the back of your mind that it's a thing, but working without consistency can be done.


Quorum and CRDTs deal with completely different problems.

CRDTs do one thing: they mitigate the issue of "lost updates". All acknowledged writes will be represented in the results of a query, and no "winner" strategy is involved that could cause some acknowledged writes to be incorrectly dominated by others and thus lost.

Quorum (strict) just provides a very, very, very weak form of consistency in the case of concurrent readers/writers (read-your-writes, RYW) and just very, very weak consistency in the case of serialized readers/writers (repeatable reads, RR).

My personal opinion is that any eventually consistent distributed database that doesn't have built-in CRDTs, or the necessary facilities to build application-level CRDTs, is fundamentally dangerous and broken: unless all you ever write to it is immutable or idempotent data, it's going to silently drop data during operation.
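For intuition, here's a toy grow-only counter (about the simplest CRDT there is), showing how the merge preserves every acknowledged update no matter what order replicas sync in:

    class GCounter:
        """Grow-only counter CRDT: one slot per node, merge = max."""

        def __init__(self, node_id):
            self.node_id = node_id
            self.counts = {}

        def increment(self, n=1):
            self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

        def value(self):
            return sum(self.counts.values())

        def merge(self, other):
            # The join: element-wise max, so no node's acknowledged
            # increments can ever be dominated and lost.
            for node, count in other.counts.items():
                self.counts[node] = max(self.counts.get(node, 0), count)

    # Two replicas accept writes concurrently...
    a, b = GCounter("a"), GCounter("b")
    a.increment(3)
    b.increment(2)
    # ...and converge regardless of merge order or repetition.
    a.merge(b)
    b.merge(a)
    assert a.value() == b.value() == 5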


... and the problem is that the new kids think all of this is acceptable in worlds where it really isn't.


"Eventual consistency" usually means a few seconds rather than a few hours. If your users can tolerate hitting refresh when they see the stale data, or if your application is such that users don't have the option to use the newly-inputted or updated information for a short while, it may not be a problem.

It's probably not the type of thing you can just drop in. It will fit some use cases and not others. You'll want to design your application around an eventually-consistent system to prevent and/or handle conflicts in a sane manner.

Some systems like Riak have a built-in conflict resolution scheme based on "vector clocks", which record which versions of the data each node has seen and can tell whether one entry is newer than another or whether the two were written concurrently. [0]

[0] http://docs.basho.com/riak/kv/2.1.4/developing/usage/conflic...
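The comparison itself is simple; a bare-bones sketch of the mechanism (the representation is simplified for illustration):

    def compare(vc_a, vc_b):
        """vc_a and vc_b map node ids to per-node update counters."""
        nodes = set(vc_a) | set(vc_b)
        a_ahead = any(vc_a.get(n, 0) > vc_b.get(n, 0) for n in nodes)
        b_ahead = any(vc_b.get(n, 0) > vc_a.get(n, 0) for n in nodes)
        if a_ahead and b_ahead:
            return "concurrent"  # siblings: the application must resolve
        if a_ahead:
            return "a is newer"
        if b_ahead:
            return "b is newer"
        return "equal"

    print(compare({"n1": 2, "n2": 1}, {"n1": 1, "n2": 1}))  # a is newer
    print(compare({"n1": 2}, {"n2": 1}))                    # concurrent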


Always wondered this myself.

There seems to be a general fondness for writing to the database and then immediately reading back from it to make a decision. Some variant of that shows up in the requirements quite often, usually in the form of confirmation pages or overview screens right after an update, so the user knows that what they did took effect.

And almost as often you'll find that some member or members of your team have made the assumption that the whole interaction will occur on one machine talking to one database instance. With session affinity you might be able to create a convincing illusion that this is working, but now with microservices you're intentionally hitting many servers. If checkout and order history are on two different servers, how does eventual consistency work?


>There seems to be a general fondness for writing to the database and then immediately reading back from it to make a decision

One way I saw banking software do it was to keep deltas of the changes through a wizard-like process and then commit them all at once in a single transaction; if there was a conflict at that point, the wizard restarted with up-to-date data. Conflict detection was done with a sequential ID keyed on the accounts touched by the operation, so any operation that modified an account before your delta was applied would knock your commit off.
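That commit step is classic optimistic concurrency control; a minimal sqlite3 sketch of the idea (the table, columns, and delta format are invented for illustration):

    import sqlite3

    def commit_deltas(conn, account_id, seen_seq, deltas):
        """Apply buffered deltas only if nothing touched the account since
        we read it at sequence `seen_seq`; False means restart the wizard."""
        cur = conn.cursor()
        for delta in deltas:
            cur.execute(
                "UPDATE accounts SET balance = balance + ?, seq = seq + 1 "
                "WHERE id = ? AND seq = ?",
                (delta, account_id, seen_seq),
            )
            if cur.rowcount == 0:  # someone else bumped seq: conflict
                conn.rollback()
                return False
            seen_seq += 1
        conn.commit()
        return True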


Banking and the credit industry are a good example of eventually consistent systems.

But you can see the inconsistencies, and you're expected to just ignore them. I guess what I'm asking is: how do you push back on requirements and inattentive coworkers that make these compromises unworkable? Because it seems so pervasive to me.


It may interest you to study this a bit to understand MVCC (Multiversion Concurrency Control).

https://www.postgresql.org/docs/9.6/static/mvcc.html


> How do people deal with "eventual consistency"? In my head once a transaction is done, it's everywhere and everyone has access to it.

Different strokes for different folks.

Synchronous replication: the whole cluster is a single unit. COMMIT only returns when it's been truly committed to all nodes. OTOH, it's slow, and may or may not tolerate partitions.

Asynchronous replication: COMMIT returns when it's gone into the current node. It is then the cluster's job to propagate it to the rest. This is a lot faster, and can deal with network partitions. OTOH there is usually a state lag between distant nodes.

Different applications may need either sync or async, depending on requirements.
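PostgreSQL even lets you pick per transaction, given a cluster that already has synchronous standbys configured; a psycopg2 sketch (connection parameters and table names are illustrative):

    import psycopg2

    conn = psycopg2.connect("dbname=app")
    cur = conn.cursor()

    # Critical write: COMMIT waits until the standbys have applied it.
    cur.execute("SET LOCAL synchronous_commit = 'remote_apply'")
    cur.execute("INSERT INTO payments (amount) VALUES (%s)", (100,))
    conn.commit()

    # Unimportant write: return immediately, replication catches up later.
    cur.execute("SET LOCAL synchronous_commit = 'off'")
    cur.execute("INSERT INTO page_views (path) VALUES (%s)", ("/home",))
    conn.commit()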



