
"Conflating the storage of data with how it is queried" is a misinformed criticism of the relational model. Codd's Relational Algebra (Turing Award material) was in large part a move towards data independence, relaxing what was previously a tight coupling of storage format and application access patterns. Take a look at Rules 8 and 9, or just read the original paper: http://en.wikipedia.org/wiki/Codds_12_rules

> "Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed." http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf

It's not clear from the slides how RDBMSs manage to conflate these two concerns in practice.

It's also unclear to me how the "Lambda Architecture" differs from what we've been calling [soft real-time] materialized view maintenance for decades.



It's not a criticism of the relational model; it's a criticism of relational databases as they exist in the world. The normalization vs. denormalization slides explain how these concerns are conflated: the fact that you have to denormalize your schema to optimize queries shows that the two concerns are deeply complected.
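To make the point concrete, here's a toy sketch (hypothetical schema, using SQLite for brevity) of how a query pattern leaks into the logical schema once you denormalize:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# Normalized: each fact lives in exactly one place.
c.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
c.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)")
c.execute("INSERT INTO users VALUES (1, 'alice')")
c.execute("INSERT INTO tweets VALUES (10, 1, 'hello')")

# Reading a tweet with its author's name requires a join:
row = c.execute("""SELECT t.body, u.name FROM tweets t
                   JOIN users u ON u.id = t.user_id""").fetchone()

# Denormalized for read speed: copy the name into every tweet row.
c.execute("CREATE TABLE tweets_fast (id INTEGER PRIMARY KEY, user_name TEXT, body TEXT)")
c.execute("INSERT INTO tweets_fast VALUES (10, 'alice', 'hello')")

# The read is now a single-table lookup, but a rename must touch every
# tweet row -- the logical schema has been bent to fit the query pattern.
c.execute("UPDATE tweets_fast SET user_name = 'alicia' WHERE user_name = 'alice'")
fast = c.execute("SELECT user_name, body FROM tweets_fast").fetchone()
```

That `UPDATE` touching N rows instead of one is exactly the "how it's stored" concern bleeding into "how it's queried".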


Your ideas will work very well for small data with big storage, e.g. daily stock price histories.

They will not always work for big data, which often means "max out our storage with crap, don't worry, disk is cheap", e.g. web metrics.

When a decision maker goes "don't worry disk is unlimited", the resulting application is prone to maxing out storage.

Whenever your application maxes out your storage, you have no space for previous versions of the data.


You realize he works at Twitter, right? And that he's responsible for Storm, and Cascalog?

I'm not going to say anything better than it was said in the slides / book draft, so I'll just encourage you to take these techniques seriously... they're born out of necessity, not because they sound like fun, and real people are using them to solve problems that are hell to solve any other way.

That said, these are not problems that everyone has. If you're not nodding your head along with the mutability / sharding / whatever complaints at the beginning of the deck, you can probably still get by with a more traditional architecture.

(Also, rereading... I should probably note that not everything needs to be kept forever; only the source data, since the views can be recomputed from them at any time. That makes things a bit cheaper.)
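A toy sketch of that idea (hypothetical data, nothing like the scale Storm or Cascalog targets): keep only immutable source events, and treat any aggregate as a view that is a pure function of those events, so it can be discarded and rebuilt at any time.

```python
from collections import Counter

# Immutable source data: append-only events, never updated in place.
events = [
    {"user": "alice", "action": "pageview"},
    {"user": "bob",   "action": "pageview"},
    {"user": "alice", "action": "pageview"},
]

def recompute_view(events):
    # The "view" (pageviews per user) is a pure function of the source
    # events, so it never has to be stored durably -- throw it away and
    # rebuild it whenever the view logic changes.
    return Counter(e["user"] for e in events if e["action"] == "pageview")

view = recompute_view(events)
```

Only `events` needs to be kept forever; `view` is disposable.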


> problems that are hell to solve any other way.

'Bigness' of data != data size

'Bigness' of data == data size / budget

Twitter isn't a typical company. I assume they have both a budget and competent management that will let them get away with something like the Lambda architecture.

I reckon it's a lot harder to scale to even a terabyte under the constraints of a grubby setting like a data warehouse for some instrument-monitoring company.

Those guys will allow at best MS SQL for storage, and won't mind putting their developers through hell.


> They will not always work for big data, which often means "max out our storage with crap, don't worry disk is cheap" eg web metrics

Having worked on this kind of stuff myself, I'd have to argue the exact opposite. I've always ended up building something precisely like what is described when trying to tackle those kinds of problems at large scale.


> complected

This word, I don't think it means what you think it means.


Perhaps my DBA-fu is limited, but what databases are doing soft real-time materialized view maintenance? The big boys seem to do full recomputation for materialized views rather than updating the materialized view on insert or update. I tried to Google for this, but found only research papers rather than implementations.

I know that in the places I've worked, materialized views saw only limited use in applications, because too many of them over enough data brought the DB to its knees.


Oracle can refresh materialized views incrementally on update/commit (fast refresh). Postgres allows you to write a trigger to do the same.
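The trigger approach looks something like this sketch (using SQLite's trigger syntax as a stand-in for Postgres, with a made-up orders schema): the summary table is updated incrementally on every insert instead of being recomputed in full.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")

# Hand-maintained "materialized view": a one-row running total.
c.execute("CREATE TABLE order_totals (total REAL)")
c.execute("INSERT INTO order_totals VALUES (0)")

# The trigger keeps the view consistent on every insert, so reads
# never pay for a full recomputation.
c.execute("""
    CREATE TRIGGER maintain_total AFTER INSERT ON orders
    BEGIN
        UPDATE order_totals SET total = total + NEW.amount;
    END
""")

c.execute("INSERT INTO orders (amount) VALUES (10.0)")
c.execute("INSERT INTO orders (amount) VALUES (2.5)")
total = c.execute("SELECT total FROM order_totals").fetchone()[0]
```

The cost, as the parent notes, is paid on every write, which is why too many such views over enough data can bring the DB to its knees.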


Has anyone come up with a good explanation for why, given the recent proliferation of databases, nobody seems to be making anything more relational than the SQL DBMSes?

It should be easier than ever, particularly if you cheat and declare rotational media an unsupported legacy format.


more relational -> neo4j perhaps? not sure what you mean by rotational media


Rotational media = spinning disks


What is more relational?


As in Codd's 12 rules above.



