I'm not sure how that follows. Reading the linked chapter and the slideshow suggests that the target is anyone using a data storage system whose core data is mutable. A significant portion of the problems discussed applies equally to NoSQL, SQL, and NewSQL databases.
Basically, the 'lambda architecture' he refers to is event sourcing, or write-ahead logging, but with scalability in mind and some cool hooks for maintaining correctness.
You use your HDFS store as your event log, plus a couple of layers that turn the batch processing (map-reduce jobs) into real-time queryable databases, along with disposable caches that are updated in real time to cover the data that arrives between batch jobs.
The goal is to never lose the raw actions - so even changes to the various layers (including the batch processing!) don't result in data corruption, just some time spent reprocessing all the raw inputs.
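To make that concrete, here is a minimal single-process sketch of the idea in Python. The names (record_page_view, run_batch, query_page_views) are invented for illustration and this is not Marz's code: an in-memory list stands in for the HDFS event log, a full recomputation stands in for the map-reduce batch jobs, and a small dict stands in for the disposable real-time cache.

    from collections import defaultdict

    event_log = []                    # immutable, append-only record of raw actions
    batch_view = {}                   # periodically recomputed from the full log
    realtime_view = defaultdict(int)  # disposable cache for events since the last batch

    def record_page_view(url):
        """Raw data is only ever appended, never updated in place."""
        event_log.append(url)
        realtime_view[url] += 1       # keep the real-time layer current

    def run_batch():
        """Recompute the batch view from scratch over the whole log.

        Because the raw events are never lost, a bug or a change in this
        function just means rerunning it, not corrupted data.
        """
        global batch_view, realtime_view
        counts = defaultdict(int)
        for url in event_log:
            counts[url] += 1
        batch_view = dict(counts)
        realtime_view = defaultdict(int)   # the real-time cache is disposable

    def query_page_views(url):
        """A query merges the batch view with the real-time increments."""
        return batch_view.get(url, 0) + realtime_view[url]

The key property is the last function: a query merges the authoritative (but stale) batch view with the cheap, throwaway real-time view, so correctness always comes from the immutable log rather than from any derived state.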
I think it is a useful reference (the book); it seemed to go over a lot of things that folks who are dealing with large data sets today are already familiar with. If, however, you were a DBA on a typical DBMS or RDBMS system and were told to develop a "Big Data" system, I could see how many of the things that Marz points out would trip you up. So I was asking if that was the target market for the book.
As to the content, see the papers on data flow architectures from the '70s and '80s [1]. They are very cool. We've done something similar at Blekko, where we store raw data in a table structure and build in pre-computed results with combinators [2]. The Map/Reduce paper [3] is an excellent introduction to a number of these concepts. This is all good stuff and something that is helpful for people to have in their toolboxes. The title of the post gave me the impression that there was something new here (I'm always on the prowl for new stuff on these problems), and I didn't see what the new stuff was; it seemed like the stuff we already know, just presented more coherently rather than as a collection of links. Perhaps that is more clear, perhaps not.
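For anyone who hasn't run into the combinator idea, here is a rough, generic illustration in Python (the class and method names are invented for this example and are not Blekko's actual API). The essential property is an associative merge, so partial results pre-computed over different slices of the raw table can be combined in any order:

    class CountCombinator:
        """Keeps per-key counts that can be merged associatively."""

        def __init__(self, counts=None):
            self.counts = dict(counts or {})

        def add(self, key, n=1):
            self.counts[key] = self.counts.get(key, 0) + n

        def merge(self, other):
            """Combine two partial results; order does not matter."""
            merged = dict(self.counts)
            for key, n in other.counts.items():
                merged[key] = merged.get(key, 0) + n
            return CountCombinator(merged)

    # Two workers process different shards of the raw table...
    shard_a = CountCombinator()
    shard_a.add("example.com/page1")
    shard_a.add("example.com/page2")

    shard_b = CountCombinator()
    shard_b.add("example.com/page1")

    # ...and their pre-computed partial results roll up into one stored value.
    total = shard_a.merge(shard_b)
    print(total.counts)  # {'example.com/page1': 2, 'example.com/page2': 1}

Because the merge is associative (and here commutative), pre-computed values can be rolled up incrementally as new raw data arrives or as shards are reprocessed.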