Learn Riak, A Fully Distributed, Erlang-Based Document Store

mey · on Nov 10, 2009

What does Riak do over CouchDB or MongoDB?

Clarification: even if it doesn't, that does not mean it's not a worthwhile project. Competition is always good, imo :)

roder · on Nov 10, 2009

Riak is actually a distributed datastore, whereas CouchDB & MongoDB are replication-based.

For example, there are no special nodes in Riak and no one node has all the data. When a node joins a Riak cluster, it begins to share and participate in the cluster.

CouchDB replicates the entire dataset from one node to another; in MongoDB you need to do sharding (which will be implemented in the 1.1 release, from what I understand).

Both Couch & Mongo are pretty awesome kv stores, but neither implements a distributed datastore like Riak (or Cassandra).

mey · on Nov 10, 2009

How is cross system locking of nodes handled? To write or read, do all nodes need to be in sync?

roder · on Nov 10, 2009

Are you familiar with the CAP theorem? You can't have all three (consistency, availability, partition tolerance), but you can have two. Most distributed datastores pick for you which two of the three you'll get.

One of Riak's goals is to make the choice transparent to the user and let the user select, per document, which they want. There's some more information about that on their website: http://riak.basho.com/cap.html

EDIT: this is good info about it too http://riak.basho.com/nyc-nosql/
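
To make the per-document choice above a bit more concrete, here is a minimal sketch of the Dynamo-style N/R/W trade-off. The class and method names are made up for illustration and are not Riak's API; it only shows the arithmetic behind trading consistency for availability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumSettings:
    """Hypothetical container for Dynamo-style tuning knobs (not Riak's API)."""
    n: int  # replicas stored for each key
    r: int  # replicas that must answer a read
    w: int  # replicas that must acknowledge a write

    def quorums_overlap(self) -> bool:
        # If r + w > n, every read set shares at least one replica
        # with every write set.
        return self.r + self.w > self.n

    def read_failures_tolerated(self) -> int:
        # Replicas that can be down while reads still succeed.
        return self.n - self.r

    def write_failures_tolerated(self) -> int:
        # Replicas that can be down while writes still succeed.
        return self.n - self.w


# Two illustrative settings for the same replica count.
leaning_consistent = QuorumSettings(n=3, r=2, w=2)
leaning_available = QuorumSettings(n=3, r=1, w=1)

for name, s in [("consistent", leaning_consistent),
                ("available", leaning_available)]:
    print(name, s.quorums_overlap(),
          s.read_failures_tolerated(), s.write_failures_tolerated())
```

With n=3, asking for r=2 and w=2 makes read and write sets overlap but tolerates only one unavailable replica; r=1 and w=1 tolerates two, at the cost of possibly reading a stale value.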

mey · on Nov 10, 2009

The reason I asked is that you brought up that CouchDB isn't continuous where yours is, but you approach CAP in the same way CouchDB does. So the difference between continuous and scheduled replication is mostly syntactic sugar. CouchDB can be cron-scheduled to replicate to, or pull from, other nodes continuously for similar functionality.

roder · on Nov 10, 2009

Hmm, it's not really the difference between continuous and scheduled... Basically, Couch is a shared-nothing system, meaning it retains all the data on one node. You can then easily replicate the data from node to node (say, via a scheduled task with cron).

Riak is a distributed system, where each node that joins becomes part of a cluster of nodes and shares everything (data, work, etc).

mey · on Nov 10, 2009

Ok, thanks for clarifying :)

oomkiller · on Nov 10, 2009

So this isn't a shared-nothing system? How do they handle failing nodes, and prevent data loss?

roder · on Nov 10, 2009

Riak creates vnodes, or virtual nodes, for each node. These vnodes make up what's called a ring server. So you may have 1024 vnodes spread across, say, 4 bare-metal nodes.
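
A rough sketch of that idea, using the 1024 vnodes and 4 nodes from the example above. The node names, the round-robin ownership, the modulo mapping, and the replica count of 3 are all assumptions made for illustration; this is not Riak's actual code.

```python
import hashlib
from collections import Counter

RING_SIZE = 1024                                # vnodes in the ring
NODES = ["node1", "node2", "node3", "node4"]    # hypothetical physical nodes

# Assumption: vnodes are handed out round-robin, so neighbouring
# vnodes always live on different physical nodes.
owners = [NODES[i % len(NODES)] for i in range(RING_SIZE)]


def vnode_for(key):
    """Hash a key onto one of the RING_SIZE vnodes (simplified with modulo)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def preference_list(key, n=3):
    """Physical nodes holding the n replicas: the key's vnode and its successors."""
    start = vnode_for(key)
    return [owners[(start + i) % RING_SIZE] for i in range(n)]


print(Counter(owners))               # 256 vnodes per physical node
print(preference_list("some-key"))   # three distinct physical nodes
```

Because each key's replicas sit on consecutive vnodes, and consecutive vnodes belong to different machines, losing one bare-metal node still leaves the other copies reachable.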

I pointed this presentation out below, but I'll point it out again, because it's a really good presentation for understanding Riak: http://riak.basho.com/nyc-nosql/

oomkiller · on Nov 12, 2009

Thanks for the link, bonus points for the video being on Vimeo!

va_coder · on Nov 10, 2009

It seems MongoDB and CouchDB have better documentation.

This article talks about configuring things like the gossip protocol. Does anyone know of a good article on Riak that shows how to store and retrieve data for something like a blog?

roder · on Nov 10, 2009

Short answer is: no, not at the moment.

MongoDB & CouchDB have a lot more history as open source projects. Riak, being recently released into the wild, will have its various how-tos and documentation grow in time.

But let's say you wanted to do a blog in Python: the client libs they provide are pretty well documented inline. Check it out: http://bitbucket.org/justin/riak/src/tip/client_lib/jiak.py
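
As a very rough idea of how a blog could be laid out in a bucket/key store, here is a sketch using an in-memory stand-in. The `InMemoryStore` class, its `put`/`get` methods, and the bucket names are hypothetical and are not the jiak.py API; check the client's inline docs for the real calls.

```python
import json


class InMemoryStore:
    """Hypothetical stand-in for a key-value client; NOT the jiak.py API."""

    def __init__(self):
        self._data = {}

    def put(self, bucket, key, value):
        # Store documents as JSON strings, keyed by (bucket, key).
        self._data[(bucket, key)] = json.dumps(value)

    def get(self, bucket, key):
        raw = self._data.get((bucket, key))
        return None if raw is None else json.loads(raw)


def load_post_with_comments(store, slug):
    """Fetch a post, then follow its comment keys to fetch each comment."""
    post = store.get("posts", slug)
    if post is None:
        return None
    comments = [store.get("comments", cid) for cid in post["comments"]]
    return {**post, "comments": comments}


store = InMemoryStore()
store.put("comments", "c1", {"author": "alice", "text": "Nice post."})
store.put("posts", "hello-riak", {
    "title": "Hello Riak",
    "body": "First post.",
    "comments": ["c1"],   # keys into the "comments" bucket
})

print(load_post_with_comments(store, "hello-riak"))
```

The point is just the shape of the data: one document per post, one per comment, and the post carrying the keys of its comments so they can be looked up directly.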