Learn Riak, A Fully Distributed, Erlang-Based Document Store (dreverri.github.com)
23 points by roder on Nov 10, 2009 | 12 comments


What does Riak do over CouchDB or MongoDB?

Clarification: even if it doesn't, that does not mean it's not a worthwhile project. Competition is always good, imo :)


Riak is actually a distributed datastore, whereas CouchDB & MongoDB are replication-based.

For example, there are no special nodes in Riak, and no single node has all the data. When a node joins a Riak cluster, it begins sharing data and participating in the cluster.

CouchDB replicates the entire dataset from one node to another; in MongoDB you need to do sharding (which will be implemented in the 1.1 release, from what I understand).

Both Couch & Mongo are pretty awesome kv stores, but neither implements a distributed datastore like Riak (or Cassandra).
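To make "no special nodes, no one node has all the data" a bit more concrete, here is a toy model of a cluster that splits its keyspace into a fixed number of partitions and lets each joining node claim a share of them. This is a minimal sketch under my own assumptions: the ring size of 64 and the "steal from the busiest node" claim rule are made up for illustration and are not Riak's actual claim algorithm.

```python
from collections import Counter

# Hypothetical ring size; real clusters pick their own partition count.
RING_SIZE = 64


def join(owners: list, new_node: str) -> list:
    """Return the partition -> node ownership list after `new_node` joins.

    Simplified claim rule (an assumption, not Riak's real algorithm): the
    newcomer takes one partition at a time from whichever node currently
    owns the most, until it holds its fair share.
    """
    owners = list(owners)
    if not owners:
        # The very first node owns every partition.
        return [new_node] * RING_SIZE
    fair_share = RING_SIZE // (len(set(owners)) + 1)
    for _ in range(fair_share):
        busiest, _count = Counter(owners).most_common(1)[0]
        # Hand one of the busiest node's partitions to the newcomer.
        owners[owners.index(busiest)] = new_node
    return owners


if __name__ == "__main__":
    owners = []
    for node in ["a", "b", "c", "d"]:
        owners = join(owners, node)
        print(node, "joined:", dict(Counter(owners)))
```

After four joins each node owns 16 of the 64 partitions, so every node does a share of the work and none of them holds the whole dataset, which is the contrast with copying the entire dataset from node to node.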


How is cross-system locking of nodes handled? To write or read, do all nodes need to be in sync?


Are you familiar with the CAP theorem? You can't have all three (consistency, availability and partition tolerance), but you can have two. Most distributed datastores pick for you which two of the three you'll get.

One of Riak's goals is to make this choice transparent and let the user select, per document, which two they want. There's some more information about that on their website: http://riak.basho.com/cap.html

EDIT: this is good info about it too http://riak.basho.com/nyc-nosql/
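One common way to express that per-request choice is with three numbers: N (how many replicas hold each value), R (how many replicas must answer a read) and W (how many must acknowledge a write). If R + W > N, every read set shares at least one replica with every successful write set; smaller R and W favour availability and latency instead. The sketch below only checks that arithmetic by brute force. It doesn't talk to Riak, and the function names are hypothetical.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Brute force: does every R-replica read set intersect every W-replica write set?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)  # non-empty set means they overlap
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


def overlap_rule(n: int, r: int, w: int) -> bool:
    """Closed-form rule: overlap is guaranteed exactly when R + W > N."""
    return r + w > n


if __name__ == "__main__":
    for r, w in [(1, 1), (2, 2), (1, 3), (3, 1)]:
        print(f"N=3 R={r} W={w}: overlap guaranteed = {overlap_rule(3, r, w)}")
```

With N=3, R=W=2 gives overlap (a "consistent" setting), while R=W=1 does not, trading that guarantee for fewer round trips and tolerance of more unreachable replicas.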


The reason I asked is that you brought up that CouchDB replication isn't continuous while Riak's is, but you approach CAP the same way CouchDB does. So the difference between continuous and scheduled replication is mostly syntactic sugar. CouchDB can be scheduled with cron to replicate to, or pull from, other nodes continuously, which gives similar functionality.


Hmm, it's not really the difference between continuous and scheduled... Basically, Couch is a shared-nothing system, meaning it retains all the data on one node. Then you can easily replicate the data from node to node (say, via a scheduled cron task).

Riak is a distributed system, where each node that joins becomes part of a cluster of nodes and shares everything (data, work, etc).


Ok, thanks for clarifying :)


So this isn't a shared-nothing system? How does it handle failing nodes and prevent data loss?


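Riak creates vnodes, or virtual nodes, on each physical node. These vnodes make up what's called a ring. So you may have 1024 vnodes spread over, say, 4 bare-metal nodes. Each value is replicated to several vnodes, ideally on different machines, so losing one node doesn't mean losing the data.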

I pointed this presentation out elsewhere in the thread, but I'll point it out again, because it's a really good presentation for understanding Riak: http://riak.basho.com/nyc-nosql/
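Here is a rough sketch of how a ring of vnodes can answer "which machines store this key, and what happens when one is down?". As I understand it, Riak hashes keys onto a 160-bit SHA-1 ring cut into equal partitions; beyond that, this is simplified: it hashes the raw key only (not bucket plus key), assigns the 1024 vnodes round-robin to 4 machines, and treats "skip the down node and keep walking clockwise" as the fallback. `partition_for` and `preference_list` are hypothetical names.

```python
import hashlib

RING_TOP = 2 ** 160  # size of the SHA-1 output space


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to one of `num_partitions` equal slices of the SHA-1 ring."""
    if num_partitions <= 0 or num_partitions & (num_partitions - 1):
        raise ValueError("num_partitions must be a power of two")
    h = int.from_bytes(hashlib.sha1(key).digest(), "big")
    return h // (RING_TOP // num_partitions)


def preference_list(key: bytes, owners: list, n: int, down=frozenset()) -> list:
    """Walk clockwise from the key's partition and collect `n` distinct live nodes.

    Skipping down nodes stands in for fallback replicas; a real system would
    also hand the data back once the failed node returns.
    """
    size = len(owners)
    start = partition_for(key, size)
    targets = []
    for step in range(size):
        node = owners[(start + step) % size]
        if node in down or node in targets:
            continue
        targets.append(node)
        if len(targets) == n:
            break
    return targets


if __name__ == "__main__":
    # 1024 vnodes spread round-robin over 4 physical machines.
    owners = [f"node{i % 4}" for i in range(1024)]
    print(preference_list(b"posts/hello-world", owners, n=3))
    print(preference_list(b"posts/hello-world", owners, n=3, down={"node1"}))
```

Because each key lives on three different machines, one machine failing just shifts that key's writes to the next live node on the ring instead of losing data.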


Thanks for the link, bonus points for the video being on vimeo!


It seems MongoDB and CouchDB have better documentation.

This article talks about configuring things like the gossip protocol. Does anyone know of a good article on Riak that shows how to store and retrieve data for something like a blog?
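For anyone wondering what the gossip protocol is doing there: it's how nodes spread cluster state (such as who owns which partition) without a master. A toy version: each round, every node swaps state with one random peer and both keep whichever copy has the higher version number. Assumptions: a single integer version per ring state and at least two nodes; real implementations also have to detect and merge concurrent changes (for example with vector clocks), which this ignores. `RingState` and `gossip_until_converged` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RingState:
    version: int   # bumped whenever ownership changes
    owners: tuple  # partition -> node assignment


def gossip_until_converged(states: dict, rng: random.Random, max_rounds: int = 1000) -> int:
    """Run push-pull gossip rounds until every node holds the same version.

    Returns the number of rounds it took.
    """
    names = sorted(states)
    for round_no in range(1, max_rounds + 1):
        for name in names:
            peer = rng.choice([other for other in names if other != name])
            # Both sides keep the newer of the two states.
            newest = max(states[name], states[peer], key=lambda s: s.version)
            states[name] = states[peer] = newest
        if len({s.version for s in states.values()}) == 1:
            return round_no
    raise RuntimeError("gossip did not converge")


if __name__ == "__main__":
    cluster = {f"node{i}": RingState(0, ()) for i in range(8)}
    cluster["node0"] = RingState(1, ("node0",))  # node0 learned about a change
    rounds = gossip_until_converged(cluster, random.Random(42))
    print(f"all 8 nodes agree on version 1 after {rounds} round(s)")
```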


Short answer is: no, not at the moment.

MongoDB & CouchDB have a lot more history as open source projects. Riak, having only recently been released into the wild, will see its how-tos and documentation grow in time.

But let's say you wanted to build a blog in Python: the client libs they provide are pretty well documented inline. Check it out: http://bitbucket.org/justin/riak/src/tip/client_lib/jiak.py
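Before picking client calls, it can help to sketch the blog's data model as bucket/key documents: one bucket of posts keyed by slug, one bucket of comments, and each post keeping the keys of its comments. The sketch below uses an in-memory stand-in; `InMemoryStore`, `publish_post`, `add_comment` and `load_post` are made-up names, and none of this is jiak.py's actual API, so check the inline docs in the client lib linked above for the real calls.

```python
import json
from typing import Optional


class InMemoryStore:
    """Hypothetical stand-in for a bucket/key store (NOT the jiak.py API).

    Values are kept as JSON strings to mimic storing documents over the wire.
    """

    def __init__(self) -> None:
        self._data = {}

    def put(self, bucket: str, key: str, doc: dict) -> None:
        self._data[(bucket, key)] = json.dumps(doc)

    def get(self, bucket: str, key: str) -> Optional[dict]:
        raw = self._data.get((bucket, key))
        return None if raw is None else json.loads(raw)


def publish_post(store, slug: str, title: str, body: str) -> None:
    store.put("posts", slug, {"title": title, "body": body, "comment_keys": []})


def add_comment(store, slug: str, comment_id: str, author: str, text: str) -> None:
    post = store.get("posts", slug)
    if post is None:
        raise KeyError(f"no such post: {slug}")
    store.put("comments", comment_id, {"post": slug, "author": author, "text": text})
    # Read-modify-write on the post document; concurrent writers could clash here.
    post["comment_keys"].append(comment_id)
    store.put("posts", slug, post)


def load_post(store, slug: str) -> Optional[dict]:
    post = store.get("posts", slug)
    if post is None:
        return None
    post["comments"] = [store.get("comments", k) for k in post["comment_keys"]]
    return post


if __name__ == "__main__":
    store = InMemoryStore()
    publish_post(store, "hello-riak", "Hello, Riak", "First post!")
    add_comment(store, "hello-riak", "c1", "roder", "Nice.")
    print(load_post(store, "hello-riak"))
```

Keeping comment keys on the post makes reading a post one get plus one get per comment. The catch is the read-modify-write on the post document, which a distributed store has to reconcile when two people comment at the same time.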



