
Using timestamps and last-write-wins is well-known, well-documented behavior, so you must assume that updates can sometimes be applied in a different order than they were submitted. However, reordering updates is not the same as rolling back writes, as was the case with Mongo. Cassandra does not promise linearizability in this mode of operation (that is, when not using LWTs); this is a tradeoff to get better availability. There is no better way than last-write-wins if you want high availability and partition tolerance and don't want to pay the performance and availability price of a consensus algorithm like Paxos or Raft. And no, vector clocks do not solve this problem in practice at all.
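
To make the reordering point concrete, here is a minimal sketch of a last-write-wins cell in plain Python. It is not Cassandra code: the `Cell` type, the `lww_merge` function, the timestamps, and the tie-break on value are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # writer-supplied timestamp (hypothetical microsecond values)


def lww_merge(a: Cell, b: Cell) -> Cell:
    """Keep the cell with the higher timestamp.

    Ties are broken on the value so the result does not depend on
    argument order (an assumption of this sketch).
    """
    return max(a, b, key=lambda c: (c.timestamp, c.value))


# Client B submits after client A, but B's clock runs behind A's.
write_a = Cell("from A", timestamp=1_000_200)
write_b = Cell("from B", timestamp=1_000_100)

# Replicas may receive the two writes in either order and still converge.
replica_1 = lww_merge(write_a, write_b)
replica_2 = lww_merge(write_b, write_a)
```

Both replicas end up with the same cell, but it is A's write that survives even though B submitted later: the writes were reordered by timestamp, not rolled back.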

As for the post on HN from a few months ago, I remember only one guy who mixed LWT and non-LWT updates and was surprised by the lack of linearizability. It's not Cassandra's fault if somebody doesn't know what he's doing.




> assume updates can be sometimes applied in a different order than they were submitted

I suggest you re-read the analysis. Some databases can offer safe, generalized commutative updates; e.g. Riak. Cassandra can't: updates, in general, can be lost through reordering.
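
A rough sketch of the difference, in plain Python rather than against either database: a read-modify-write on a last-write-wins cell loses one of two concurrent increments, while a grow-only counter (a standard commutative datatype) keeps both. All names here (`lww_apply`, `increment`, `merge`, `value`, the client ids) are hypothetical and are not the API of Riak or Cassandra.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]  # (timestamp, value)


def lww_apply(cell: Cell, write: Cell) -> Cell:
    """Last write wins: keep whichever (timestamp, value) pair is larger."""
    return max(cell, write)


# Both clients read the counter at 10 and each writes back 11.
stored: Cell = (100, 10)
write_a: Cell = (201, stored[1] + 1)
write_b: Cell = (202, stored[1] + 1)
lww_result = lww_apply(lww_apply(stored, write_b), write_a)[1]  # one increment is lost

GCounter = Dict[str, int]  # client id -> number of increments made by that client


def increment(counter: GCounter, client: str) -> GCounter:
    """Return a copy of the counter with the client's own entry bumped by one."""
    updated = dict(counter)
    updated[client] = updated.get(client, 0) + 1
    return updated


def merge(a: GCounter, b: GCounter) -> GCounter:
    """Entry-wise max: commutative, associative, and idempotent."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def value(counter: GCounter) -> int:
    """The counter's value is the sum of all per-client entries."""
    return sum(counter.values())


base: GCounter = {"seed": 10}
replica_a = increment(base, "a")
replica_b = increment(base, "b")
crdt_result = value(merge(replica_b, replica_a))  # both increments survive
```

Here `lww_result` is 11 and `crdt_result` is 12, whichever order the replicas are merged in.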

> There is no better way than last write wins if you want high availability and partition tolerance and don't want to pay the performance and availability price for a consensus algorithm like Paxos or Raft.

There is. There's a whole field of research devoted to this problem. http://hal.upmc.fr/inria-00555588/document


You can have safe updates through clustering columns or, if you really insist on destructive updates, through LWTs. With clustering columns you can easily achieve whatever is possible with vector clocks.

http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-v...

As for the research you posted, there is no free lunch. Each of these strategies comes with its own set of drawbacks. That's why Cassandra offers choice at a query level.


Clustering columns do not make Cassandra updates any safer: they only reduce the scope of conflicts. It's still last-write-wins, the approach that Cassandra's own blog recognizes as having "a high potential for data loss".

> That's why Cassandra offers choice at a query level.

It doesn't give you a choice: you get last-write-wins, or some limited merge functions with CQL types. You can't get generalized CRDTs in Cassandra, because they won't expose conflicts to the user layer. Cassandra gives up on a whole class of safe AP datatypes as a consequence of this restriction.


Now you are arguing about the lack of a particular built-in feature. Riak is last-write-wins by default, with optional vector clocks and built-in CRDTs. Cassandra is last-write-wins by default, with clustering columns that let you implement an equivalent of Riak's vector clocks and CRDTs in the application. If you include a client id in the primary key and use client-side timestamps, you are essentially doing vector clocks, and conflicts are guaranteed not to occur. Users of Cassandra have been doing this for years.


> they won't expose conflicts to the user layer.

They will, you'll see them as separate rows of the same partition, and then you can merge them as you wish in the application.
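
A minimal sketch of that pattern, modeled in plain Python rather than with a real driver: the partition is a dict keyed by a hypothetical `client_id` clustering column, each row is still last-write-wins on its own client-side timestamp, and the application merges the rows on read. The set-union merge and every name below are assumptions for illustration.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Row = Tuple[int, FrozenSet[str]]  # (client-side timestamp, items written by that client)
Partition = Dict[str, Row]        # client_id -> that client's row


def upsert(partition: Partition, client_id: str, timestamp: int, items: Iterable[str]) -> None:
    """Write one client's row; within a row the higher timestamp still wins."""
    current = partition.get(client_id)
    if current is None or timestamp > current[0]:
        partition[client_id] = (timestamp, frozenset(items))


def read_merged(partition: Partition) -> Set[str]:
    """Application-level merge: union the items from every client's row."""
    merged: Set[str] = set()
    for _, items in partition.values():
        merged |= items
    return merged


cart: Partition = {}
# Two clients write concurrently with the same timestamp; each lands in its own row.
upsert(cart, "a", 500, {"book"})
upsert(cart, "b", 500, {"pen"})
merged = read_merged(cart)
```

Both writes show up as separate rows and the merged cart contains both items. Two writes from the same client are still resolved by timestamp, so the scheme relies on each client ordering its own updates correctly.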



