B-tree vs. Binary Search Tree (attractivechaos.wordpress.com)
17 points by soundsop on Sept 25, 2008 | hide | past | favorite | 9 comments



"I studied this source codes and think I should be able to further optimize it. In the end, I got my kbtree.h macro library. As you can see in my hash table benchmark, the modified version beats STL set while using even smaller memory than the original version."

I don't see any such benchmarks in the article, which makes the performance claims hard to evaluate.


A nice explanation of what this is all about is the one Rod Stephens wrote at http://www.devx.com/dotnet/Article/38097


I'm rather unimpressed by a linear reduction in memory consumption. Mostly because it's linear, and it's "easy" to scale linearly (it's called going and buying more RAM!)


But you can't go out and buy more cache. If you have an algorithm where the working set spills out of L2 cache (let's just assume that anything needing a log time search facility won't fit in L1) you can easily get a factor of 5 or 10 or so by packing your data.

Main memory latency is a huge factor in modern performance analysis, and I was sad to see that this wasn't even covered in the article. The biggest win for an in-memory B-tree, IMHO, isn't actually the size advantage; it's the fact that you can prefetch each node and not have to pay the full round-trip latency for log2(N) lookups.


The author assumed a 64-bit system, so I guess he's not thinking about this, but better memory usage would be important on embedded systems like phones and tiny surveillance robots that have banded together to steal underwear, &c.


> Mostly because it's linear, and it's "easy" to scale linearly (it's called going and buying more RAM!)

It's only easy if your system isn't maxed or if your data segment isn't already at its limits. (64 bit addresses cause yet another round of bloat.)


My point here is that linear improvements don't really improve your scaling or performance in any fundamental sense. Linear improvements are equivalent to buying hardware (or sharding), whereas the kind of improvement you really want is, say, a change in asymptotic complexity. That's the kind of improvement you can't match by buying hardware.

That being said, for a large enough server farm, yes, linear improvements are totally beneficial. (And for a small enough one, too.) But it's important to keep in perspective that not all "betters" are created equal ;)


Well said.

http://news.ycombinator.com/item?id=305937

4 gigs of DRAM costs less than $100, today. Who cares if software is bloated?


I think the various criticisms listed here are all valid (lack of benchmarks renders the performance claims dubious, a linear reduction in RAM is not that big a deal), but I still think it's worth the reminder that B-trees (and other alternatives to Binary Search Trees) are around. It's easy to get into familiar ruts with data structures....



