Massive CouchDB Brain Dump

duck · on April 9, 2010

I've never thought about posting one of the many brain dumps I put in txt files because I figured it would only make sense to me in such a rough draft... but after reading this I will have to start. Very helpful overview of all aspects of CouchDB / document based storage.

po · on April 9, 2010

"if you need a change in your schema, it's dead simple to do--just start using the new schema if you don't care that the old documents don't have the new field, you don't have to worry about them"

For me the scary part is that I usually do care. How do you stop caring about this? Doesn't this mean your application code starts to get populated with if/then/else? Is there a best-practice for backfilling data when the (effective) schema changes?

papaf · on April 9, 2010

I've seen many systems that use relational databases be patched up with if (null)/then/else in the application where fields have been added and the change was not important enough to justify a data migration. I wouldn't say this practice is OK just that it happens quite a bit in my experience.

If I was using couchdb I would assume that all but the most fundamental fields were optional and have the application act accordingly.
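A minimal sketch of that approach in Python, treating documents as plain dicts: optional fields get a default at read time, so the rest of the application does not have to branch on missing keys. The field names and the `DEFAULTS` table are made up for illustration and are not part of CouchDB.

```python
import copy

# Hypothetical optional fields and their defaults (illustration only).
DEFAULTS = {"tags": [], "status": "draft"}


def with_defaults(doc, defaults=DEFAULTS):
    """Return a copy of doc with any missing optional fields filled in."""
    merged = copy.deepcopy(defaults)  # deep copy so default lists are never shared
    merged.update(doc)                # values stored in the document always win
    return merged


old_doc = {"_id": "1", "title": "written before 'tags' existed"}
print(with_defaults(old_doc))
```

The stored document is left untouched; whether to write the filled-in version back (a lazy backfill) is a separate decision.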

po · on April 9, 2010

Sure... I've seen that a ton. I've had to resort to writing a bunch of code like that as well.

As an example of what I mean, there are some places where you can simplify by knowing that a relationship is 1:1 instead of 1:n because based on the current schema it's impossible for it not to be 1:1. You don't need an if/then or any logic in your application when you access that field.

This means if you want to make it 1:n you have to change the schema and deal with all of the code that will break due to that change. Once that is done, then your application can input lists of items into the storage.

In the document oriented approach, it seems like you (or any code touching the document type) just start inputting lists. This makes changing the schema a non-issue. It is up to the code pulling data to always assume the data could be null, single or a list of items.
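To make the "null, single or a list" case concrete, here is a rough Python sketch of a reader-side helper that normalizes all three shapes into a list. The `as_list` name and the `author` field are hypothetical, chosen only for the example.

```python
def as_list(value):
    """Normalize a field that may be missing, a single item, or a list."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]  # a lone value, e.g. written back when the field was 1:1


docs = [
    {"_id": "a", "author": "duck"},          # old 1:1 shape
    {"_id": "b", "author": ["duck", "po"]},  # new 1:n shape
    {"_id": "c"},                            # field never set
]

for doc in docs:
    print(doc["_id"], as_list(doc.get("author")))
```

With something like this at the read boundary, the if/then/else lives in one place instead of being scattered through the application.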

In the couchDB model it seems the schema is implicitly defined by the behavior of the application. I don't really know what that means for the application. I'm sure the tradeoff is worth it in a lot of cases, I'm just saying that this is one of the parts where my imagination is failing me and the unknown-unknowns are great.

It seems like here the skills of writing a great REST API are more relevant than traditional data modeling skills.

alexpopescu · on April 9, 2010

I totally agree. It sounds like everyone is thinking that just because you can put anything inside your store, that's also a good idea. And it definitely is not. I am trying to underline this quite often: http://nosql.mypopescu.com/post/507962068/the-role-of-data-m...

jmah · on April 9, 2010

I found Damien Katz's talk about quitting and risking it on CouchDB to be really great (from resources): http://www.infoq.com/presentations/katz-couchdb-and-me

andrewvc · on April 9, 2010

I'd be interested to hear a little bit about choosing between CouchDB, MongoDB, and others. Couch and Mongo are the two I'm most interested in, but I haven't had the time to really look at either closely.

nicpottier · on April 9, 2010

I'd definitely look at Mongo first. Couch is ridiculously simple, too much so. It doesn't do locking because it versions every single record, so if you do updates very frequently you are going to have to clean your DB really often. Their queries are less flexible than Mongo's as well.
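The versioning-instead-of-locking point can be illustrated with a toy in-memory model in Python. This is not CouchDB's API: real CouchDB talks HTTP, uses string revision ids, and reports a stale write as a 409 conflict. The `ToyStore` class and its integer revisions are assumptions made purely for the sketch.

```python
class Conflict(Exception):
    """Raised when a write is based on a stale revision."""


class ToyStore:
    def __init__(self):
        self._revs = {}  # doc id -> list of (rev, body), oldest first

    def get(self, doc_id):
        rev, body = self._revs[doc_id][-1]
        return dict(body, _id=doc_id, _rev=rev)

    def put(self, doc):
        doc_id = doc["_id"]
        history = self._revs.get(doc_id, [])
        current = history[-1][0] if history else None
        if doc.get("_rev") != current:  # writer did not see the latest revision
            raise Conflict(f"{doc_id}: expected rev {current}, got {doc.get('_rev')}")
        body = {k: v for k, v in doc.items() if k not in ("_id", "_rev")}
        history.append(((current or 0) + 1, body))  # old revisions are kept
        self._revs[doc_id] = history
        return history[-1][0]

    def compact(self):
        """Drop every revision except the latest one of each document."""
        for history in self._revs.values():
            del history[:-1]

    def stored_revisions(self):
        return sum(len(h) for h in self._revs.values())


store = ToyStore()
store.put({"_id": "counter", "n": 0})
a, b = store.get("counter"), store.get("counter")  # two writers read rev 1

a["n"] += 1
store.put(a)                  # first writer wins
b["n"] += 1
try:
    store.put(b)              # second writer holds a stale _rev
except Conflict:
    b = store.get("counter")  # re-read, re-apply, retry
    b["n"] += 1
    store.put(b)
```

No lock is ever taken; the losing writer just retries. Every successful write leaves the previous revision behind until `compact()` runs, which is the cleanup cost for update-heavy workloads mentioned above.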

Couch is interesting in a way, but I feel like the set of problems that fit it well is far smaller than Mongo's. (and that that set is far smaller than SQL still, but that's another matter)

Mongo is also way way faster, though the API isn't as 'cute' by far. I feel like the great web UI for couch, along with their heavy javascript/json usage is the main reason it gets so much traction.

andrewvc · on April 9, 2010

Thanks for the helpful info, I do still have one reservation about mongo however.

There is one thing that couch does better than mongo, and that's single server durability. You can pretty much kill -9 couch and it won't care (that's pretty much how it stops itself).

Now, I understand mongodb's stance on why you should be running multiple servers for your high performance app, but for a lot of apps that don't need HA that's just overkill and a pain to configure (and extra $$$ for more hardware). In fact, I'd go so far as to say that's the case for most apps. If I recall they're working on it at the moment though.

lurkerperpetual · on April 9, 2010

They have a syncdelay parameter which you can tweak to flush to disk as often as you want http://www.mongodb.org/display/DOCS/Durability+and+Repair

andrewvc · on April 9, 2010

that's all I needed to hear, I'll likely go with mongodb for my next project now.

jchrisa · on April 9, 2010

One big thing that only Couch does is offline replication. You might shrug and think "so what" but once you've had a taste of how relaxing it is to be able to shovel data around whenever and wherever you want, it's very hard to go back.

lurkerperpetual · on April 9, 2010

Here's a comparison http://www.mongodb.org/display/DOCS/Comparing+Mongo+DB+and+C...

smharris65 · on April 9, 2010

While this comparison does point out the major differences between the two, it is written by the MongoDB team and is definitely phrased in a way to favor MongoDB.