In fact, convergence is a very easy property to preserve in all distributed syst...

josephg · on Jan 31, 2023

> From my perspective, CRDTs are useful for a lot of kinds of data.

Yep I 100% agree.

I think the highest-value uses for technology like this are in creative applications. I think about wikis, blogs, shared whiteboards, music production and video editing. In all of these cases, "referential integrity" (database constraints) doesn't really matter that much, and the working set is usually pretty small.

Sketch was outcompeted by Figma because Figma used a CRDT as its backend, which enabled it to be collaborative. Sketch had an arguably better product, and was first to market. But it was stuck in the single-editor model because they didn't have a tool like automerge.

As for conflicts, increasingly my favorite CRDT for "general purpose" data (JSON trees) is MVRegisters. In the case of a conflict, an MV (multi-value) register stores all of the conflicting values. But the application doesn't have to care - we can still treat it like a "single writer wins" register.

To make this work, the CRDT provides two APIs: a simple API and a complex API:

- The simple API just gives the application "the current value". In the case of concurrent edits, the system quietly chooses a winner. This is enough for most software most of the time. It's certainly enough to get started.

- The complex API returns all current values when a conflict has happened. Applications further along in their development lifecycle can use this API to present conflicts to the user and ask the user what should happen. (Or the application can resolve the conflict itself using application-specific logic).

The nice thing about this approach is that the data itself doesn't have to change. It's just an application / UI change to show conflicts. So collaborative applications can be written without caring about conflicts (at first). And later, when conflicts between multiple users cause problems, the applications can move to a richer API if they want to. (And remember, it all works like git under the hood anyway. We can store the full history so even when conflicts are resolved in a weird way, you still haven't lost the users' original edits.)
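
To make the two-API idea concrete, here is a minimal Python sketch of a multi-value register. It is not automerge's API: the class `MVRegister`, its methods, and the tie-break rule (largest clock sum, then writer name) are all assumptions made up for illustration. Each write carries a version vector, and a merge keeps every write that no other write has already seen.

```python
def _dominates(a, b):
    """True if clock `a` has seen everything in clock `b` (and more)."""
    return a != b and all(a.get(node, 0) >= n for node, n in b.items())


def _rank(entry):
    # Hypothetical deterministic tie-break: clock sum, then writer name.
    _, clock, writer = entry
    return (sum(clock.values()), writer)


class MVRegister:
    """Toy multi-value register; each entry is (value, clock, writer)."""

    def __init__(self, replica):
        self.replica = replica
        self.entries = []

    def set(self, value):
        # A new write supersedes every entry this replica currently holds.
        clock = {}
        for _, seen, _ in self.entries:
            for node, n in seen.items():
                clock[node] = max(clock.get(node, 0), n)
        clock[self.replica] = clock.get(self.replica, 0) + 1
        self.entries = [(value, clock, self.replica)]

    def merge(self, other):
        # Union both sides, then drop entries another entry has already seen.
        combined = self.entries + [e for e in other.entries if e not in self.entries]
        self.entries = [
            e for e in combined
            if not any(_dominates(o[1], e[1]) for o in combined)
        ]

    def value(self):
        """Simple API: one value, with the winner chosen quietly."""
        return max(self.entries, key=_rank)[0] if self.entries else None

    def values(self):
        """Complex API: every concurrent value, in a stable order."""
        return [v for v, _, _ in sorted(self.entries, key=_rank)]


alice, bob = MVRegister("alice"), MVRegister("bob")
alice.set("red")
bob.merge(alice)
alice.set("blue")    # concurrent with bob's write below
bob.set("green")
alice.merge(bob)
bob.merge(alice)
print(alice.value(), alice.values())
```

After the two merges both replicas hold the same pair of conflicting values, and `value()` returns the same winner on each. Writing a new value on top (for example `alice.set("teal")`) resolves the conflict, because that write's clock has seen both earlier ones. The stored data is identical whichever method the application calls, which is the point made above.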

pharmakom · on Jan 31, 2023

Your mention of Git reminds me of CI and makes me think of a general strategy:

1. Allow the user (of the CRDT library) to define a fitness function that should be minimised

2. When multiple valid merges are possible, pick the result according to the fitness function
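
A rough Python sketch of those two steps, under some assumptions: the candidates are JSON-like values, the function name `pick_merge` and the example fitness function are hypothetical, and no existing CRDT library is implied to work this way. For replicas to converge, the fitness function has to be deterministic and ties have to be broken the same way everywhere, so the sketch falls back to a canonical JSON encoding.

```python
import json


def pick_merge(candidates, fitness):
    """Return the candidate with the lowest fitness score.

    Ties are broken by canonical JSON text so that every replica,
    given the same set of candidates, picks the same result.
    """
    return min(
        candidates,
        key=lambda c: (fitness(c), json.dumps(c, sort_keys=True)),
    )


def missing_fields(doc):
    # Example fitness: count required fields that are empty or absent.
    return sum(1 for key in ("title", "body") if not doc.get(key))


candidates = [
    {"title": "Draft", "body": ""},
    {"title": "Draft", "body": "Hello"},
]
print(pick_merge(candidates, missing_fields))
```

The order in which candidates arrive does not affect the result, which matters because different replicas may see the conflicting values in different orders.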

crabmusket · on Jan 31, 2023

> Most CRDTs aim to preserve causality: if I see your change, and then make my change, my new value will win. If we both make changes without knowing about each other, that's a conflict.

I haven't kept track of CRDTs since I worked with them in ~2015 and read the paper by Shapiro et al, but I thought a casual description would be more along the lines of "once we both receive each other's changes, we will agree on the final state"? Does that no longer reflect the current state of the art, or was I just mistaken at the time?

lll-o-lll · on Jan 31, 2023

Would you say that automerge is useful for applications that don’t involve a human? I’m imagining a cluster of “service registry” services that use automerge as a way to manage shared state between them. There wouldn’t be a human to fix a merge conflict, so all possible merge outcomes would need to be well defined.

The CRDT examples I see are all oriented around human collaboration. Are they a bad choice for something more akin to a distributed database?

pharmakom · on Jan 31, 2023

There is a talk about using a CRDT across a server cluster to maintain a social media “like” counter.
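
I don't know which design that talk uses, but the textbook CRDT for a counter like this is a grow-only counter (G-Counter): each server only ever increments its own slot, and a merge takes the per-server maximum. A minimal Python sketch, with the class and node names invented for illustration:

```python
class GCounter:
    """Grow-only counter: one slot per node, merged by per-slot max."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}

    def increment(self, n=1):
        # A node only ever writes to its own slot.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def merge(self, other):
        # Max is commutative, associative and idempotent, so merges can
        # arrive in any order, any number of times, with no human involved.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        return sum(self.counts.values())


server_a, server_b = GCounter("a"), GCounter("b")
server_a.increment()
server_a.increment()
server_b.increment()
server_a.merge(server_b)
server_b.merge(server_a)
print(server_a.value(), server_b.value())
```

Every merge outcome is well defined here, so nothing is left for a person to resolve. Supporting "unlike" would need a second counter for decrements (a PN-Counter), which this sketch leaves out.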