
We invented relational databases (RDBMSs) to store large amounts of important data safely on extremely constrained storage hardware (kilobytes were expensive in the 1970s), where you could not economically store redundant copies because of the astronomical cost.

Append-only immutable DBs are inherently better just because they essentially add a 4th dimension to your data (all history, all data, all the time), and they play to the fact that von Neumann machines love splitting things up into constrained sub-streams/problems and crunching through the entire data set in memory across clusters using discrete memory frames/units of work (think Google search index/GPU framebuffers/integer arithmetic).
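
To make the "4th dimension" idea concrete, here is a minimal sketch (plain Python, all names mine and hypothetical, not from any particular product): an append-only event log is never updated in place, so any past state can be rebuilt by replaying a prefix of it.

    # Append-only event log: writes only ever append, so full history is kept.
    log = []

    def append(event):
        log.append(event)              # the only write operation

    def balance_at(version):
        # Fold over the first `version` events to rebuild state as of that point.
        total = 0
        for kind, amount in log[:version]:
            total += amount if kind == "deposit" else -amount
        return total

    append(("deposit", 100))
    append(("withdraw", 30))
    append(("deposit", 50))
    print(balance_at(2))   # 70  -> state as of the second event
    print(balance_at(3))   # 120 -> current state, history still intact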

Storage is now effectively unlimited and random seeks are very expensive. The more you serialize your processing and split it into in-memory units that exploit cache locality, the faster you'll perform. You'll turn performance problems into throughput problems - all you have to do is keep the data flowing.
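
As a toy illustration of that access-pattern point (my own sketch, not a benchmark): summing a contiguous array in order is the streaming, prefetch-friendly pattern being described, while visiting the same elements in random order does the same work but defeats cache locality.

    import random

    data = list(range(1_000_000))          # contiguous, scan-friendly

    def sequential_sum(xs):
        # Streams through memory in order; limited by throughput, not seek latency.
        return sum(xs)

    def random_sum(xs, order):
        # Same amount of work, but every access jumps somewhere new in memory.
        return sum(xs[i] for i in order)

    order = random.sample(range(len(data)), len(data))
    assert sequential_sum(data) == random_sum(data, order)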

RDBMSs are dead for the same reason I don't manage my YouTube bandwidth use or bother managing files or RAM - I have cable now, with terabyte hard disks and 32-64 GB of RAM.

The future is immutable, with periodic/continuous compaction (pre-computation) of the historical stream, queries running across hundreds of clusters, every search reduced to essentially linear map-reduce (sometimes with B-trees or other speedups), plus a dumb/inaccurate real-time stream layer.
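
That is essentially the batch-view-plus-speed-layer split now usually called a lambda architecture. A hand-wavy sketch (made-up keys and numbers, merging logic only) of how a query would combine the two:

    # Batch view: precomputed from the immutable history, accurate but hours stale.
    batch_view = {"page_views:/home": 1_204_331}

    # Real-time layer: cheap, possibly inaccurate counts since the last compaction.
    realtime_view = {"page_views:/home": 97}

    def query(key):
        # Serve the precomputed answer plus the real-time delta.
        return batch_view.get(key, 0) + realtime_view.get(key, 0)

    print(query("page_views:/home"))   # 1204428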

One machine cannot store the Internet - a million machines can. One machine cannot process the Internet - but a million machines, each working on one 64GB slice of it held in RAM with cache-friendly, memory-local, discrete operations, can run through the Internet millions of times a second.
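
In code, the "one slice per machine" idea is just partitioning plus map-reduce. A single-process stand-in (my own sketch, where Python's multiprocessing pool plays the role of the cluster and a trivial substring match plays the role of the query):

    from multiprocessing import Pool

    documents = [f"doc {i} about databases" for i in range(100_000)]

    def count_matches(doc_slice):
        # Each "machine" scans only its own in-memory slice, sequentially.
        return sum("databases" in d for d in doc_slice)

    def partition(xs, n):
        k = (len(xs) + n - 1) // n
        return [xs[i:i + k] for i in range(0, len(xs), k)]

    if __name__ == "__main__":
        with Pool(8) as pool:                      # pretend these are separate machines
            partials = pool.map(count_matches, partition(documents, 8))
        print(sum(partials))                       # the reduce step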




Just a few facts that I have to deal with:

1) I can't afford a million machines or hundreds of clusters, or to keep petabytes of data around in a way that doesn't make the data completely useless. Yes, mutable state causes complexity, but I can't afford to rid myself of that problem by using brute force.

2) Performance is not replaceable by throughput if you have users waiting for answers to questions they dreamt up a second ago and new data coming in all the time.

3) Cache locality and immutability don't go well together. Many indexes will always have to be updated, not just replaced wholesale with an entirely new version of the index.


1) Even with hard drive prices inflated today, we're at over 9GB/$1 for storage. That means for $1 you can store over a billion 64-bit integers. From that perspective, there is very little data (at least in terms of traditional RDBMS data) that is actually cost effective to throw out. To the extent that you do need to throw it out, you can wait until all transactions involving the data have long since completed.
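
The arithmetic behind that figure, assuming decimal gigabytes:

    bytes_per_dollar = 9 * 10**9              # "over 9GB/$1"
    ints_per_dollar = bytes_per_dollar // 8   # 8 bytes per 64-bit integer
    print(ints_per_dollar)                    # 1125000000 -> a bit over a billion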

2) I think you misunderstood what was meant here. His point was that your limiting factor really becomes throughput if you have the right architecture.

3) Indexes and cache locality also tend not to play well together. ;-)


1) Obviously it's about tradeoffs - if you don't have the capital then operating with last gen environments is a perfectly reasonable decision. For others who do - it is not.

2) What kind of queries do you have that take that long? If it's scientific computing, there's no way around it. If it's just DB slicing/aggregating, then using last gen structures gets you last gen performance.

3) Cache locality and immutability do go together when you run repeatable units of computation over local in-memory data - continuous batched background updates to indices, with a real-time layer like the OP suggested, will become the new standard.


if you don't have the capital then operating with last gen environments is a perfectly reasonable decision.

I am indeed operating in an environment of limited resources. It's called "The Real World". Reading marketing language like "last gen" makes me lose interest in a debate very quickly.


I also operate in this "Real World", and I'm not constrained by your limitations - i.e. I have large clusters, lots of data, etc.

Just because you happen to be constrained by your resources, it does not follow that your world is any more "Real" than mine - it's just different, which is what I said.

What part of "last gen" is marketing speak? Original Xbox is "last gen", the iPod is "last gen" - quite literally the last generation.


It is marketing speak because marketing people use these kinds of phrases to stop readers thinking about the merits of different choices they have, and instead focus their minds on the idea that old technologies are superseded by new ones. Other popular phrases are "legacy" or "conventional".

The original Xbox is no longer available in the stores. That's because it has been superseded by the new model, which does everything the old one did, only better! Poor me, who just doesn't have "the capital" to get myself the shiny new one just yet.

Apparently, that's the way you want me to think about immutability versus mutability in data structures. Makes no sense.


You said that everyone else's use case is "dead".



