
We invented relational databases (RDBMSs) to store large amounts of important data safely on extremely constrained storage hardware (kilobytes were expensive in the 1970s), where you could not economically store redundant copies because of the astronomical cost.

Append-only immutable DBs are inherently better just because they essentially add a 4th dimension to your data (all history, all data, all the time), and they play to the fact that von Neumann machines love splitting things up into constrained sub-streams/problems and crunching through the entire data set in memory across clusters using discrete memory frames/units of work (think Google search index/GPU framebuffers/integer arithmetic).
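
To make the "4th dimension" idea concrete, here is a minimal sketch (plain Python, all names mine and hypothetical, not from any particular product): an append-only event log is never updated in place, so any past state can be rebuilt by replaying a prefix of it.

    # Append-only event log: writes only ever append, so full history is kept.
    log = []

    def append(event):
        log.append(event)              # the only write operation

    def balance_at(version):
        # Fold over the first `version` events to rebuild state as of that point.
        total = 0
        for kind, amount in log[:version]:
            total += amount if kind == "deposit" else -amount
        return total

    append(("deposit", 100))
    append(("withdraw", 30))
    append(("deposit", 50))
    print(balance_at(2))   # 70  -> state as of the second event
    print(balance_at(3))   # 120 -> current state, history still intact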

Storage is now effectively unlimited and random seeks are very expensive. The more you serialize your processing and split it into in-memory units that exploit cache locality, the faster you'll perform. You'll turn performance problems into throughput problems - all you have to do is keep the data flowing.
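
As a toy illustration of that access-pattern point (my own sketch, not a benchmark): summing a contiguous array in order is the streaming, prefetch-friendly pattern being described, while visiting the same elements in random order does the same work but defeats cache locality.

    import random

    data = list(range(1_000_000))          # contiguous, scan-friendly

    def sequential_sum(xs):
        # Streams through memory in order; limited by throughput, not seek latency.
        return sum(xs)

    def random_sum(xs, order):
        # Same amount of work, but every access jumps somewhere new in memory.
        return sum(xs[i] for i in order)

    order = random.sample(range(len(data)), len(data))
    assert sequential_sum(data) == random_sum(data, order)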

RDBMSs are dead for the same reason I don't manage my YouTube bandwidth use or bother managing files or RAM - I have cable now, with terabyte hard disks and 32-64 GB of RAM.

The future is immutable, with periodic/continuous compaction (pre-computation) of the historical stream, queries running across hundreds of clusters, every search reduced to essentially linear map-reduce (sometimes with B-trees or other speedups), plus a dumb/inaccurate real-time stream layer.
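
That is essentially the batch-view-plus-speed-layer split now usually called a lambda architecture. A hand-wavy sketch (made-up keys and numbers, merging logic only) of how a query would combine the two:

    # Batch view: precomputed from the immutable history, accurate but hours stale.
    batch_view = {"page_views:/home": 1_204_331}

    # Real-time layer: cheap, possibly inaccurate counts since the last compaction.
    realtime_view = {"page_views:/home": 97}

    def query(key):
        # Serve the precomputed answer plus the real-time delta.
        return batch_view.get(key, 0) + realtime_view.get(key, 0)

    print(query("page_views:/home"))   # 1204428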

One machine cannot store the Internet - a million machines can. One machine cannot process the Internet - but a million machines, each working on one 64GB slice of it held in RAM with cache-friendly, memory-local, discrete operations, can run through the Internet millions of times a second.
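
In code, the "one slice per machine" idea is just partitioning plus map-reduce. A single-process stand-in (my own sketch, where Python's multiprocessing pool plays the role of the cluster and a trivial substring match plays the role of the query):

    from multiprocessing import Pool

    documents = [f"doc {i} about databases" for i in range(100_000)]

    def count_matches(doc_slice):
        # Each "machine" scans only its own in-memory slice, sequentially.
        return sum("databases" in d for d in doc_slice)

    def partition(xs, n):
        k = (len(xs) + n - 1) // n
        return [xs[i:i + k] for i in range(0, len(xs), k)]

    if __name__ == "__main__":
        with Pool(8) as pool:                      # pretend these are separate machines
            partials = pool.map(count_matches, partition(documents, 8))
        print(sum(partials))                       # the reduce step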




Just a few facts that I have to deal with:

1) I can't afford a million machines or hundreds of clusters, or to keep petabytes of data around in a way that doesn't make the data completely useless. Yes, mutable state causes complexity, but I can't afford to rid myself of that problem by using brute force.

2) Performance is not replaceable by throughput if you have users waiting for answers to questions they dreamt up a second ago and new data coming in all the time.

3) Cache locality and immutability don't go well together. Many indexes will always have to be updated, not just replaced wholesale with an entirely new version of the index.


1) Even with hard drive prices inflated today, we're at over 9GB/$1 for storage. That means for $1 you can store over a billion 64-bit integers. From that perspective, there is very little data (at least in terms of traditional RDBMS data) that is actually cost effective to throw out. To the extent that you do need to throw it out, you can wait until all transactions involving the data have long since completed.
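
The arithmetic behind that figure, assuming decimal gigabytes:

    bytes_per_dollar = 9 * 10**9              # "over 9GB/$1"
    ints_per_dollar = bytes_per_dollar // 8   # 8 bytes per 64-bit integer
    print(ints_per_dollar)                    # 1125000000 -> a bit over a billion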

2) I think you misunderstood what was meant here. His point was that your limiting factor really becomes throughput if you have the right architecture.

3) Indexes and cache locality also tend not to play well together. ;-)


1) Obviously it's about tradeoffs - if you don't have the capital then operating with last gen environments is a perfectly reasonable decision. For others who do - it is not.

2) What kind of queries do you have that take that long? If it's scientific computing, there's no way around it. If it's just DB slicing/aggregating, then using last gen structures gets you last gen performance.

3) Cache locality and immutability do go together when you run repeatable units of computation over local in-memory data - continuous batched background updates to indices, with a real-time layer like the OP suggested, will become the new standard.


if you don't have the capital then operating with last gen environments is a perfectly reasonable decision.

I am indeed operating in an environment of limited resources. It's called "The Real World". Reading marketing language like "last gen" makes me lose interest in a debate very quickly.


I also operate in this "Real World", and I'm not constrained by your limitations - i.e. I have large clusters, lots of data, etc.

Just because you happen to be constrained by your resources, it does not follow that your world is any more "Real" than mine - it's just different, which is what I said.

What part of "last gen" is marketing speak? Original Xbox is "last gen", the iPod is "last gen" - quite literally the last generation.


It is marketing speak because marketing people use these kinds of phrases to stop readers thinking about the merits of different choices they have, and instead focus their minds on the idea that old technologies are superseded by new ones. Other popular phrases are "legacy" or "conventional".

The original Xbox is no longer available in the stores. That's because it has been superseded by the new model, which does everything the old one did, only better! Poor me, who just doesn't have "the capital" to get myself the shiny new one just yet.

Apparently, that's the way you want me to think about immutability versus mutability in data structures. Makes no sense.


You said that everyone else's use case is "dead".



