You realize he works at Twitter, right? And that he's responsible for Storm and Cascalog?
I'm not going to say anything better than it was said in the slides / book draft, so I'll just encourage you to take these techniques seriously... they're born out of necessity, not because they sound like fun, and real people are using them to solve problems that are hell to solve any other way.
That said, these are not problems that everyone has. If you're not nodding your head along with the mutability / sharding / whatever complaints at the beginning of the deck, you can probably still get by with a more traditional architecture.
(Also, rereading... I should probably note that not everything needs to be kept forever; only the source data does, since the views can be recomputed from it at any time. That makes things a bit cheaper.)
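To make that concrete, here's a minimal sketch in Python of the idea (the names and event schema are mine, not from the slides or the book): the master dataset is an append-only log of raw events, and a view is just a pure function over that log, so views are disposable.

    import json
    from collections import defaultdict

    class MasterDataset:
        """Append-only log of immutable source events -- the only thing kept forever."""
        def __init__(self, path):
            self.path = path

        def append(self, event):
            # Never update or delete; only ever add new facts.
            with open(self.path, "a") as f:
                f.write(json.dumps(event) + "\n")

        def events(self):
            with open(self.path) as f:
                for line in f:
                    yield json.loads(line)

    def pageview_counts(master):
        # A "view": a pure function of the master dataset, so it can be
        # thrown away and recomputed whenever the logic changes.
        counts = defaultdict(int)
        for event in master.events():
            if event["type"] == "pageview":
                counts[event["url"]] += 1
        return dict(counts)

    master = MasterDataset("events.log")
    master.append({"type": "pageview", "url": "/home"})
    master.append({"type": "pageview", "url": "/about"})
    master.append({"type": "pageview", "url": "/home"})
    print(pageview_counts(master))  # {'/home': 2, '/about': 1}

That's the whole trick: the batch layer in the deck is this same shape, just with the recompute done at scale on Hadoop instead of in a loop.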
Twitter isn't a typical company. I assume they have both a budget and competent management that will let them get away with something like the Lambda architecture.
I reckon it's a lot harder to scale to even a terabyte under the constraints of a grubby setting like the data warehouse at some instrument monitoring company.
Those guys will allow MS SQL for storage at best, and won't mind putting their developers through hell.
> They will not always work for big data, which often means "max out our storage with crap, don't worry disk is cheap" eg web metrics
Having worked on this kind of stuff myself, I'd have to argue the exact opposite. I've always ended up building something precisely like what is described when trying to tackle those kinds of problems at large scale.
They will not always work for big data, which often means "max out our storage with crap, don't worry disk is cheap" eg web metrics
When a decision maker goes "don't worry, disk is unlimited", the resulting application is prone to maxing out storage.
And once your application has maxed out your storage, you have no space left for previous versions of the data.
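To put the same point in toy code (hypothetical numbers, same Python as above): overwriting in place saves space but destroys history, while the append-only style this architecture depends on needs exactly the disk headroom that "max out our storage with crap" policies eat.

    # Mutable style: the write saves space, but the previous version is gone.
    balances = {"alice": 100}
    balances["alice"] = 40  # why did it drop? a bug? a purchase? unknowable now

    # Immutable style: every change is a new fact, so history survives --
    # but only if there is disk headroom left to append to.
    ledger = [
        ("alice", "deposit", 100),
        ("alice", "withdraw", 60),
    ]
    balance = sum(amt if op == "deposit" else -amt
                  for who, op, amt in ledger if who == "alice")
    print(balance)  # 40, and how it got there is still on disk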