
"Conflating the storage of data with how it is queried" is a misinformed criticism of the relational model. Codd's Relational Algebra (Turing Award material) was in large part a move towards data independence, relaxing what was previously a tight coupling of storage format and application access patterns. Take a look at Rules 8 and 9, or just read the original paper: http://en.wikipedia.org/wiki/Codds_12_rules

> "Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed." http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf

It's not clear from the slides how RDBMSs manage to conflate these two concerns in practice.

It's also unclear to me how the "Lambda Architecture" differs from what we've been calling [soft real-time] materialized view maintenance for decades.



It's not a criticism of the relational model; it's a criticism of relational databases as they exist in the world. The normalization vs. denormalization slides explain how these concerns are conflated: the fact that you have to denormalize your schema to optimize queries shows that the two concerns are deeply complected.
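To make the point concrete, here's a toy sketch (hypothetical schema, using SQLite for brevity) of how a query pattern leaks into the logical schema once you denormalize:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# Normalized: each fact lives in exactly one place.
c.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
c.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)")
c.execute("INSERT INTO users VALUES (1, 'alice')")
c.execute("INSERT INTO tweets VALUES (10, 1, 'hello')")

# Reading a tweet with its author's name requires a join:
row = c.execute("""SELECT t.body, u.name FROM tweets t
                   JOIN users u ON u.id = t.user_id""").fetchone()

# Denormalized for read speed: copy the name into every tweet row.
c.execute("CREATE TABLE tweets_fast (id INTEGER PRIMARY KEY, user_name TEXT, body TEXT)")
c.execute("INSERT INTO tweets_fast VALUES (10, 'alice', 'hello')")

# The read is now a single-table lookup, but a rename must touch every
# tweet row -- the logical schema has been bent to fit the query pattern.
c.execute("UPDATE tweets_fast SET user_name = 'alicia' WHERE user_name = 'alice'")
fast = c.execute("SELECT user_name, body FROM tweets_fast").fetchone()
```

That `UPDATE` touching N rows instead of one is exactly the "how it's stored" concern bleeding into "how it's queried".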


Your ideas will work very well for small data with big storage, e.g. daily stock price histories.

They will not always work for big data, which often means "max out our storage with crap, don't worry, disk is cheap", e.g. web metrics.

When a decision maker goes "don't worry disk is unlimited", the resulting application is prone to maxing out storage.

Whenever your application maxes out your storage, you have no space for previous versions of the data.


You realize he works at Twitter, right? And that he's responsible for Storm, and Cascalog?

I'm not going to say anything better than it was said in the slides / book draft, so I'll just encourage you to take these techniques seriously... they're born out of necessity, not because they sound like fun, and real people are using them to solve problems that are hell to solve any other way.

That said, these are not problems that everyone has. If you're not nodding your head along with the mutability / sharding / whatever complaints at the beginning of the deck, you can probably still get by with a more traditional architecture.

(Also, rereading... I should probably note that not everything needs to be kept forever; only the source data, since the views can be recomputed from them at any time. That makes things a bit cheaper.)
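A toy sketch of that idea (hypothetical data, nothing like the scale Storm or Cascalog targets): keep only immutable source events, and treat any aggregate as a view that is a pure function of those events, so it can be discarded and rebuilt at any time.

```python
from collections import Counter

# Immutable source data: append-only events, never updated in place.
events = [
    {"user": "alice", "action": "pageview"},
    {"user": "bob",   "action": "pageview"},
    {"user": "alice", "action": "pageview"},
]

def recompute_view(events):
    # The "view" (pageviews per user) is a pure function of the source
    # events, so it never has to be stored durably -- throw it away and
    # rebuild it whenever the view logic changes.
    return Counter(e["user"] for e in events if e["action"] == "pageview")

view = recompute_view(events)
```

Only `events` needs to be kept forever; `view` is disposable.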


> problems that are hell to solve any other way.

'Bigness' of data != data size

'Bigness' of data == data size / budget

Twitter isn't a typical company. I assume they have both a budget and competent management that will let them get away with something like the Lambda architecture.

I reckon it's a lot harder to scale to even a terabyte under the constraints of a grubby setting like a data warehouse for some instrument-monitoring company.

Those guys will allow at best MS SQL for storage, and won't mind putting their developers through hell.


> They will not always work for big data, which often means "max out our storage with crap, don't worry disk is cheap" eg web metrics

Having worked on this kind of stuff myself, I'd have to argue the exact opposite. I've always ended up building something precisely like what is described when trying to tackle those kinds of problems at large scale.


> complected

This word, I don't think it means what you think it means.


Perhaps my DBA-fu is limited, but what databases are doing soft real-time materialized view maintenance? The big boys seem to do full recomputation for materialized views rather than updating the materialized view on insert or update. I tried to Google for this, but found only research papers rather than implementations.

I know that in the places I've worked, materialized views saw only limited use in applications, because too many of them over enough data brought the DB to its knees.


Oracle can refresh materialized views incrementally on update/commit (fast refresh). Postgres allows you to write a trigger to do the same.
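The trigger approach looks something like this sketch (using SQLite's trigger syntax as a stand-in for Postgres, with a made-up orders schema): the summary table is updated incrementally on every insert instead of being recomputed in full.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")

# Hand-maintained "materialized view": a one-row running total.
c.execute("CREATE TABLE order_totals (total REAL)")
c.execute("INSERT INTO order_totals VALUES (0)")

# The trigger keeps the view consistent on every insert, so reads
# never pay for a full recomputation.
c.execute("""
    CREATE TRIGGER maintain_total AFTER INSERT ON orders
    BEGIN
        UPDATE order_totals SET total = total + NEW.amount;
    END
""")

c.execute("INSERT INTO orders (amount) VALUES (10.0)")
c.execute("INSERT INTO orders (amount) VALUES (2.5)")
total = c.execute("SELECT total FROM order_totals").fetchone()[0]
```

The cost, as the parent notes, is paid on every write, which is why too many such views over enough data can bring the DB to its knees.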


Has anyone come up with a good explanation for why, given the recent proliferation of databases, nobody seems to be making anything more relational than the SQL DBMSes?

It should be easier than ever, particularly if you cheat and declare rotational media an unsupported legacy format.


more relational -> neo4j perhaps? not sure what you mean by rotational media


Rotational media = spinning disks


What is more relational?


As in Codd's 12 rules above.



