I'm sure CouchDB is great, but this book, at least the first part, is really bad, full of half-truths and sometimes just plain wrong. I'm amazed O'Reilly is publishing this. Is there no editor involved to catch these mistakes?
In the section "Self-Contained Data" the author makes the argument that some data is better stored as a document than as a bunch of relations, and uses INVOICES as an example: "Accountants appreciate the simplicity of having everything in one place. And given the choice, programmers appreciate that, too." Invoices (and other business data) are the classic example of something that belongs in an RDBMS and not in something like CouchDB. It's not that hard to come up with good examples of "relaxed data", like... say... blog posts, or blog comments, or twitter messages, or search results, or log entries?
Later, it gives incorrect/meaningless definitions for Consistency, Availability and Partition Tolerance. It then completely confuses the CAP theorem, incorrectly placing Paxos (it puts it under CA, should be CP). It also places RDBMSs on the figure, which is silly: "Relational Database Management System" in itself says nothing about how the system is distributed, if at all.
Invoices seem like a pretty good candidate for a document DB to me. It's much, much easier to store a list of "line items" in a single document than it is to extract those out into a properly normalized structure. What advantage does having that data across multiple tables give you?
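To make the "single document" idea concrete, here is a minimal sketch of an invoice stored as one JSON-style document with embedded line items. The field names (`buyer`, `line_items`, `unit_price_cents`) and values are made up for illustration; they are not taken from the book or from any CouchDB schema.

```python
import json

# Hypothetical invoice document; all field names and values are illustrative.
invoice = {
    "_id": "invoice-1001",
    "buyer": "Acme Ltd",
    "line_items": [
        {"sku": "A-1", "qty": 2, "unit_price_cents": 1500},
        {"sku": "B-7", "qty": 1, "unit_price_cents": 4000},
    ],
}


def invoice_total_cents(doc):
    """Sum the line items that live inside the document itself."""
    return sum(item["qty"] * item["unit_price_cents"] for item in doc["line_items"])


total = invoice_total_cents(invoice)

# The whole invoice round-trips as one JSON value, with no joins needed to read it.
serialized = json.dumps(invoice)
```

Everything needed to render the invoice comes back in one read. The trade-off is that nothing here stops the buyer's name from being copied, and drifting, across many invoices.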
Blog comments, on the other hand, might benefit from more normalization. Cramming a bunch of comments into a post document could result in contention problems, depending on the engine and how you're handling them. In CouchDB, you'd end up having to update the entire post document for each new/changed comment. Mongo handles it better, since you can atomically push new comments into your doc. Still, I think I'd rather have them in their own documents on a busy site.
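The contention point can be sketched with a toy in-memory store that mimics revision-checked writes. This is not CouchDB's API: `ToyStore`, `Conflict`, and the integer revisions are assumptions made for the example. The only behaviour borrowed from CouchDB is that an update must name the revision it was based on and is rejected if that revision is stale.

```python
class Conflict(Exception):
    """Raised when a write is based on a stale revision."""


class ToyStore:
    """Hypothetical stand-in for a document store with revision checks."""

    def __init__(self):
        self._docs = {}  # doc_id -> (rev, body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        # Return a copy so callers cannot mutate the stored document in place.
        return rev, dict(body, comments=list(body["comments"]))

    def put(self, doc_id, body, expected_rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if current_rev != expected_rev:
            raise Conflict(doc_id)
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, body)
        return new_rev


store = ToyStore()
store.put("post-1", {"title": "Hello", "comments": []})

# Two commenters read the same revision of the post.
rev_a, doc_a = store.get("post-1")
rev_b, doc_b = store.get("post-1")

# The first one rewrites the whole post document to add a comment.
doc_a["comments"].append("first")
store.put("post-1", doc_a, expected_rev=rev_a)

# The second one is now working from a stale revision and must retry.
doc_b["comments"].append("second")
try:
    store.put("post-1", doc_b, expected_rev=rev_b)
    second_writer_conflicted = False
except Conflict:
    second_writer_conflicted = True
```

With one document per comment, the two writers would touch different documents and neither would have to retry.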
The key benefit of relational databases is maintaining integrity. They do this primarily via two concepts: (1) normalization, i.e. storing each fact in the DB once, and (2) referential integrity.
An invoice has an invoice header, which should refer to multiple entities: buyer, seller, shipment address, etc. It also has invoice details that refer to the header and represent a one-to-many relation between the data in the invoice header and the items in the invoice. These may also carry additional relationship attributes, like price, discount, or any details specific to the item; for example, maybe your DB supports a shipment address per item!
Anyway, in a relational DB an invoice refers to many entities (separate facts) and many relations (also separate facts), each worthy of its own table, and these tables refer to each other.
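A minimal sketch of that header/detail layout, using Python's built-in `sqlite3` module. The table and column names are assumptions for illustration. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(
    """
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE invoice (
        id       INTEGER PRIMARY KEY,
        buyer_id INTEGER NOT NULL REFERENCES customer(id)
    );
    CREATE TABLE invoice_line (
        invoice_id       INTEGER NOT NULL REFERENCES invoice(id),
        sku              TEXT    NOT NULL,
        qty              INTEGER NOT NULL,
        unit_price_cents INTEGER NOT NULL,
        PRIMARY KEY (invoice_id, sku)
    );
    """
)

# Each fact is stored once: the buyer's name lives only in the customer table.
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Acme Ltd')")
conn.execute("INSERT INTO invoice (id, buyer_id) VALUES (1001, 1)")
conn.executemany(
    "INSERT INTO invoice_line VALUES (?, ?, ?, ?)",
    [(1001, "A-1", 2, 1500), (1001, "B-7", 1, 4000)],
)

# Referential integrity: a line item for a non-existent invoice is rejected.
try:
    conn.execute("INSERT INTO invoice_line VALUES (9999, 'X-0', 1, 100)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

total = conn.execute(
    "SELECT SUM(qty * unit_price_cents) FROM invoice_line WHERE invoice_id = 1001"
).fetchone()[0]
```

The database itself refuses the orphaned line item. In a schema-less store, that check would have to live in application code.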
If you believe that Invoices are a good candidate for a document DB, you probably don't believe that the Relational Model is valid in general. Or that the 2 concepts I mentioned at first really help integrity!
The main flaw I see is that the relational model makes it hard to create dynamic models. Good relational modelling practice says that an entity in a relational DB should represent an entity from your Universe of Discourse (UoD). That is to say, a model is better when its tables represent real entities of the problem you are modelling! This is sometimes impossible when you want to store dynamic structures. Some argue that this is not the fault of relational model theory, but rather of its implementations.
No offense, but putting critical financial data like invoices into a schema-less datastore strikes me as short-sighted and risky in the extreme. Best case, you're making a bunch of extra work for your accountant; worst case, you could completely fail to impose the structure needed to do basic reporting and billing. Financial data is highly structured already; I'm not sure why you'd throw away all that nice regularity and structure in order to save a few inner queries.
Also, in the case of a simple blog app, I don't think that contention is going to be your biggest worry. The on-disk structures for CouchDB are append-only, so your biggest worry isn't going to be locking, it's going to be the stale document revisions taking up disk space between compaction runs, and the replication overhead for all the intermediate versions.
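The disk-space concern can be illustrated with a toy append-only log. This is only a sketch of the general idea, not CouchDB's actual on-disk format; `append_revision` and `compact` are hypothetical helpers.

```python
# Toy append-only log: every update appends a new revision, nothing is overwritten.
log = []


def append_revision(doc_id, body):
    log.append((doc_id, body))


def compact(entries):
    """Keep only the latest revision of each document."""
    latest = {}
    for doc_id, body in entries:
        latest[doc_id] = body  # later entries replace earlier ones
    return list(latest.items())


# One post document rewritten for each of five comments, plus one quiet post.
comments = []
for i in range(5):
    comments.append(f"comment {i}")
    append_revision("post-1", {"comments": list(comments)})
append_revision("post-2", {"comments": []})

size_before = len(log)
compacted = compact(log)
size_after = len(compacted)
```

Six revisions sit in the log until cleanup, although only two documents are live. Each stale revision of `post-1` also carries a full copy of every earlier comment.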
You might have missed it, but the authors encourage you to contact them and give them feedback. It is probably too late for the printed book published by O'Reilly, but perhaps your feedback will be used to alter the online version. You might want to give it a try and let them know?
Excuse me? The book costs GBP40 or 30 for the PDF version. Don't tell me to relax. Relax is for free crappy blog posts, not technical trade literature.
Judging by the online version, calling this book a definitive guide is somewhat misleading. There are still many crucial details about CouchDB that you cannot find in this book, and you have to search for them in mailing lists/blogs/wikis. I think a title like "Starting CouchDB with CouchApp" would fit this book better.
Yes, but MongoDB is so easy to use, at least with Ruby and the mongo and mongo_record gems, that it would be a very short book. BTW, I just wrote a 3 part article for developer.com that has CouchDB and MongoDB examples.
It looks like they put it in the Definitive Guide series which then has some strong requirements around the brand and most likely was the cause of the name change. It's probably a good thing for the authors as it is a respected series and will probably result in more sales.
Part I/1+2 needs major rewriting.