The core of my protobuf-decoding library upb (https://github.com/haberman/upb/wi...

silentbicycle · on Feb 4, 2011

It really says something about C that so many useful-but-small systems have a good chunk of code devoted to hash table implementations, atoms ("a reference-counted string type"), etc.

When working with C, it sometimes makes sense to have bespoke data structures, but it always makes me think of Hanson's _C Interfaces and Implementations_, Greenspun's tenth rule, etc.

My 1500loc C project also has a hash table implementation (and, soon, dynamic arrays (gasp!)). That and serializing custom data structures to disk accounts for more than half of its code. It's fast as hell, but I originally prototyped it in like 100 lines of Lua.

beagle3 · on Feb 4, 2011

> It really says something about C that so many useful-but-small systems have a good chunk of code devoted to hash table implementations, atoms ("a reference-counted string type"), etc.

Well, the only reason for that is that C never had a hash table in its standard library. The same assertion is true about any other programming language I've seen that doesn't.

Python's and Lua's dicts are really hard to top (at least in the general case, and definitely inside the language), so there aren't even any attempts. Java's is slow and horrible, so there are hundreds of replacements, but they're all interface compatible.

> That and serializing custom data structures to disk accounts for more than half of its code.

That's one of the the things K gets unbelievably right - there is one routine (which with Arthur's style is probably 10 lines of C) for serialize, and another for deserialize. They work for any K structure, and they do that with blinding speed thanks to being totally memory mapped. My C code has switched to that approach as well - and it works really, really well.

silentbicycle · on Feb 4, 2011

My text indexer is using a mmap'd hash table too, and that's exactly why. :)

barrkel · on Feb 4, 2011

Sadly, I've written many hash-table implementations that are only ever used once, in a function. The hash table is just a local array, and the structures I work with generally already have a next pointer available for simplistic chaining.

Minimal hashtable implementation: an array declaration, a memset call, and a template search loop which either uses pointers or locations (pointers to 'next' pointers), depending on whether you need to look up the item, or potentially remove it. Roughly 2+n lines for n hash operations.

haberman · on Feb 4, 2011

> It really says something about C that so many useful-but-small systems have a good chunk of code devoted to hash table implementations

For this speed-obsessed project, I wouldn't have it any other way:

  Benchmarking hash lookups in an integer-keyed hash table.

  Table size: 8, keys: 1-8 ====
  upb_inttable(seq): 410 M/s
  upb_inttable(rand): 334 M/s
  std::map<int32_t, int32_t>(seq): 149 M/s
  std::map<int32_t, int32_t>(rand): 170 M/s
  __gnu_cxx::hash_map<uint32_t, uint32_t>(seq): 69.9 M/s
  __gnu_cxx::hash_map<uint32_t, uint32_t>(rand): 68.0 M/s

  Table size: 64, keys: 1-64 ====
  upb_inttable(seq): 410 M/s
  upb_inttable(rand): 333 M/s
  std::map<int32_t, int32_t>(seq): 32.4 M/s
  std::map<int32_t, int32_t>(rand): 33.4 M/s
  __gnu_cxx::hash_map<uint32_t, uint32_t>(seq): 67.9 M/s
  __gnu_cxx::hash_map<uint32_t, uint32_t>(rand): 71.4 M/s

  Table size: 512, keys: 1-512 ====
  upb_inttable(seq): 407 M/s
  upb_inttable(rand): 330 M/s
  std::map<int32_t, int32_t>(seq): 20.8 M/s
  std::map<int32_t, int32_t>(rand): 17.2 M/s
  __gnu_cxx::hash_map<uint32_t, uint32_t>(seq): 70.8 M/s
  __gnu_cxx::hash_map<uint32_t, uint32_t>(rand): 64.6 M/s

  Table size: 64, keys: 1-32 and 10133-10164 ====
  upb_inttable(seq): 394 M/s
  upb_inttable(rand): 334 M/s
  std::map<int32_t, int32_t>(seq): 32.0 M/s
  std::map<int32_t, int32_t>(rand): 30.7 M/s
  __gnu_cxx::hash_map<uint32_t, uint32_t>(seq): 71.3 M/s
  __gnu_cxx::hash_map<uint32_t, uint32_t>(rand): 70.6 M/s

kragen · on Feb 4, 2011

Summarizing your results, your hash table implementation is between five and ten times faster than STL and G++ generic maps, because G++, like most C++ compilers, imposes a substantial abstraction penalty on the use of templates — even though in theory that's not necessary. Is that right?

haberman · on Feb 4, 2011

Yes, 5-10x faster. As to the reason, I can only speculate. I doubt it's template-imposed abstraction penalty per se, though it could be a result of having to write the algorithm in a generic way. In other words, if you did a sort of "manual instantiation" of the hash_map template, where you made the types specific but didn't change a thing besides that, I think you'd get the same performance as the templated algorithm.

I can't say for sure the reason for the difference. Previously I thought that hash_map might have been wasting time calculating a hash of the integer key (whereas I just use the key as the hash), but I just added another test that uses hash_map with an identity hash function and the performance was unchanged.

I bet that hash_map is using external chaining, whereas I'm using internal chaining which has better caching behavior. Besides that it's hard to say for sure why mine is so much faster.

abecedarius · on Feb 4, 2011

Could be. I used internal chaining also in the program I mentioned in the other comment. And probably a power-of-2 size and another tweak or two -- I don't remember details.

groby_b · on Feb 5, 2011

See OP for hash_map. And std::map is dog slow because somebody thought a Red-Black-Tree is a good idea for a map.

So at least part of the penalty there is simply choosing the wrong data structure.

Also, IIRC, both hash_map and std::map don't support a reserve call. Which means you pay for quite a few allocations.

haberman · on Feb 5, 2011

> And std::map is dog slow because somebody thought a Red-Black-Tree is a good idea for a map.

What's wrong with red-black trees for an ordered map?

> IRC, both hash_map and std::map don't support a reserve call. Which means you pay for quite a few allocations.

This is absolutely an issue for insert performance, but I was only benchmarking lookup performance, which should not perform any allocations.

kragen · on Feb 7, 2011

> What's wrong with red-black trees for an ordered map?

I was wondering that too, except without the "n ordered" part. The claim that trees are competitive with hash tables in the average case rests on the claim that comparison is much faster than hashing, because you only have to hash once in the average case, whereas you have to compare something like 1.5 lg N times in the average case, which might be between 4 and 22 in common cases. I just ran this microbenchmark to compare hashing integers with comparing them, and hashing seems to be actually slightly faster than comparing on my CPU; this implies that hash tables should be dramatically faster than red-black trees.

https://gist.github.com/814746

In theory, this forces an unpleasant dilemma on code that aspires to be real-time but wants to do a lookup in a finite map: use hash tables with their awful worst-case performance (typically O(N), although you can improve this by hanging trees off your hash buckets, but that would be stupid), or use trees with their awful average-case performance?

In practice, I've never written real-time code, so I don't know if this dilemma is real.

groby_b · on Feb 9, 2011

It won't perform allocations, but since the structure has been built from multiple disjoint allocations, you're likely to get worse spatial coherence. (I.e. more cache misses)

And yes, for an ordered map RBL is not a bad choice. "I object to std::map being ordered" would probably have been the better wording.

abecedarius · on Feb 4, 2011

In my last C++ program, 5 years ago, I did find it worthwhile to code a custom hashtable. Was a bit surprised. This was g++, probably an older version even then.