
This is a really big question that quickly gets into distributed systems theory in general, but part of the idea is recognizing that a lot of the time the "bad stuff" is not so bad. For instance, if someone updates their profile picture on Facebook, and I make a request that should include it before the update has propagated to the node I'm reading from, I just get the old picture, and that's a-ok. There are definitely nasty cases (like simultaneous updates) that you have to be aware of, but for a lot of applications, slightly out-of-date data makes little difference.



I guess the part I have trouble with is how do you separate what can't be delayed from what can?

Is it just a matter of keeping the information on 2 separate systems, or are there tools that let you for instance mark one query as "take your time and make sure everything is 100% up to date before committing this" and another as "return whatever you got, it's not important"?


Cassandra, MongoDB and Redis can do that to some degree.

http://cassandra.apache.org/doc/latest/architecture/dynamo.h...

http://hector-client.github.io/hector/build/html/content/con...

Tunable durability, and some minimal atomicity and isolation guarantees, for Mongo: https://docs.mongodb.com/manual/reference/write-concern/ and https://docs.mongodb.com/manual/core/read-isolation-consiste...

http://redis.io/commands/WAIT

I guess something like this can and will eventually find its way into any practical and effective distributed data store.
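To make the "mark this query as careful vs. fast" idea concrete, here's a toy sketch in plain Python. All the names are invented for illustration (this is not any real driver's API); it just models the per-operation consistency knob that Cassandra's ONE/QUORUM/ALL levels and Mongo's write concerns expose:

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

class TinyCluster:
    """Toy replicated store with a per-operation consistency level,
    loosely modeled on Cassandra's tunable consistency. Writes at a
    given level are applied synchronously to that many replicas; the
    rest are left behind to simulate replication lag."""

    def __init__(self, n=3):
        self.replicas = [Replica() for _ in range(n)]
        self.version = 0

    def _need(self, consistency):
        n = len(self.replicas)
        return {"one": 1, "quorum": n // 2 + 1, "all": n}[consistency]

    def write(self, key, value, consistency="one"):
        self.version += 1
        # Only the first `need` replicas see the write immediately.
        for r in self.replicas[: self._need(consistency)]:
            r.data[key] = (self.version, value)

    def read(self, key, consistency="one", replica=0):
        # Ask `need` replicas starting at `replica`; newest version wins.
        answers = []
        for i in range(self._need(consistency)):
            r = self.replicas[(replica + i) % len(self.replicas)]
            answers.append(r.data.get(key, (0, None)))
        return max(answers, key=lambda t: t[0])[1]
```

With this, a fresh write at "quorum" can still read stale at "one" from a lagging replica, but a "quorum" read is guaranteed to see it, because any read quorum overlaps any write quorum.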


One solution is to pin the user to the primary via a cookie, with a TTL long enough to cover replication lag. Facebook does something similar here.
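A minimal sketch of that routing trick (function and cookie names are made up; the TTL value is an assumption you'd tune to your observed replication lag):

```python
import time

REPLICATION_LAG_TTL = 5.0  # seconds; assumed upper bound on replication lag

def record_write(cookies, now=None):
    """Stamp the user's cookie jar (a plain dict here) after any write."""
    cookies["last_write_at"] = str(time.time() if now is None else now)

def choose_backend(cookies, now=None):
    """Route to the primary while the user's recent write may not yet
    have replicated, so they always read their own writes; otherwise
    let cheaper replicas serve the read."""
    now = time.time() if now is None else now
    last_write = float(cookies.get("last_write_at", 0))
    return "primary" if now - last_write < REPLICATION_LAG_TTL else "replica"
```

The same idea works with a signed cookie or a session field in a real app; the only requirement is that the "I just wrote" marker travels with the user's subsequent requests.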


Depends on the database, but Cassandra, for example, has a quorum mode for writes, which requires that a majority of the replicas acknowledge the write before it succeeds. This can be enabled on a per-query basis, and for reads as well.

The other way of doing it is things like CRDTs (https://en.wikipedia.org/wiki/Conflict-free_replicated_data_...) which have a join operation for any two data values.

You have to keep it in the back of your mind that it's a thing, but working without consistency can be done.


Quorum and CRDTs deal with completely different problems.

CRDTs do one thing: they mitigate the problem of "lost updates". All acknowledged writes are represented in the results of a query, and no "winner" strategy is involved that could cause some acknowledged writes to be incorrectly dominated by others and silently lost.
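The classic small example of this is a grow-only counter. Here's a sketch (plain Python, standard G-Counter construction) showing two replicas that accept concurrent increments while partitioned and then converge on merge, whereas a last-writer-wins rule would have kept only one of the two writes:

```python
class GCounter:
    """Grow-only counter CRDT: each replica increments only its own
    slot, and merge (the join) takes the per-replica maximum, so no
    acknowledged increment can be dominated away by a 'winner' rule."""

    def __init__(self, n):
        self.slots = [0] * n  # one slot per replica

    def increment(self, replica_id, amount=1):
        self.slots[replica_id] += amount

    def merge(self, other):
        # Commutative, associative, idempotent: merging in any order,
        # any number of times, converges to the same state.
        self.slots = [max(a, b) for a, b in zip(self.slots, other.slots)]

    def value(self):
        return sum(self.slots)

# Two replicas accept concurrent increments while partitioned...
a, b = GCounter(2), GCounter(2)
a.increment(0, 3)  # +3 acknowledged at replica 0
b.increment(1, 4)  # +4 acknowledged at replica 1

# ...then heal: both sides converge, and both increments survive.
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 7
```

A last-writer-wins register in the same scenario would resolve the conflict by picking one whole value, silently dropping the other acknowledged write.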

Quorum (strict) just provides a very, very, very weak form of consistency in the case of concurrent readers/writers (read your writes, RYW) and a merely very, very weak form in the case of serialized readers/writers (repeatable reads, RR).
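The whole guarantee strict quorum rests on is a counting argument: with N replicas, write quorum W and read quorum R, if R + W > N then every read quorum intersects every write quorum, so some replica in the read set holds the latest acknowledged write. A quick sketch that checks this exhaustively for the usual N=3, W=2, R=2 case:

```python
from itertools import combinations

N, W, R = 3, 2, 2  # strict quorum: R + W > N

# Every possible write quorum must intersect every possible read quorum.
for write_set in combinations(range(N), W):
    for read_set in combinations(range(N), R):
        assert set(write_set) & set(read_set), "quorums must overlap"
```

Note how little this proves: it only says the reader *contacts* a replica with the latest write, not that concurrent writers agree on an order, which is why it's such a weak form of consistency.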

My personal opinion is that any eventually consistent distributed database that doesn't have built-in CRDTs, or the facilities to build application-level CRDTs, is fundamentally dangerous and broken. Unless everything you ever write to it is immutable or idempotent, it will silently drop data during normal operation.


... and the problem is that the new kids think all of this is acceptable in domains where it really isn't.



