
It's not really fair to compare a custom implementation against the standard unordered_map implementations, which need to be fully general. See https://news.ycombinator.com/item?id=9675608

Still, I'm shocked that GCC, LLVM, and Boost all assign buckets using modulus, which is slow. I would love to know the reasoning. I had assumed they masked off the high bits (i.e. ANDed the hash with table size minus one).

Fast hashmaps use xor to mix information from low bits (examples below). Fibonacci hashing amounts to running a second multiplicative hash over your input, which is only worthwhile if you're paranoid about your input distribution.

Java SDK: http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/s...

Android (C++): https://android.googlesource.com/platform/system/core/+/mast...



Yes, but it's also important to understand what you lose when using these better-performing hashtables. The first thing that comes to mind is that after a successful lookup on a std::unordered_map, the reference is guaranteed to be valid until either (a) you erase that element or (b) the table itself is destroyed. Almost all faster hashtables around give up that property for better lookup performance. Any insert operation might invalidate all references.

So, in a sense, it's a bit unfair because std::unordered_map, in a real-world scenario, may require fewer lookups, because you can cache the references. With the others, you can't.


Yes, I agree with you that unordered_map should be slower. For context, the author shows off a graph where his custom Fibonacci implementation is the fastest, which proves nothing about Fibonacci hashing.


I had an application several years ago in which the executable was 10x as fast when using khash instead of std::unordered_map.


Hash table design in libraries is an exercise in minimizing the costs of the worst case, and it's made harder by typically only being able to control one side of the equation - the table and not the hash function.

Thus general purpose hash table implementations need to choose whether they will help naive programmers by doing modulus with a prime number of buckets, or if they will leave a potential performance bomb for people who e.g. hash integers to themselves. The article argues for a third way which is somewhere in the middle: faster than prime modulus, but not as dangerous as bit masking.

It's usually easy to beat any general-purpose hash table for these reasons: as long as you control the hash function, you can tweak your implementation to suit.


std::unordered_map is a bit of a red-headed stepchild in the C++ world. I'm not surprised that it hasn't had nearly the amount of tuning that hashmaps have had in other languages.

When I asked some C++ers about it, they warned me off using it, for reasons I didn't fully understand, but I got the impression that there are structural reasons why it can never be really fast, so anyone who needs a really fast hashmap uses some non-standard one anyway.


> It's not really fair to compare a custom implementation against the standard unordered_map implementations, which

But he is comparing the standard unordered_map implementation to his own implementation of the standard unordered_map (same API).


It's the same API, but I highly doubt that his implementation is fully compliant with the C++ standard.


It is, and so is boost::unordered_map; that's the whole point of the comparison. Just check the implementation out.


Source on that claim? Here's a comment chain where the author admits that his library is noncompliant: https://probablydance.com/2017/02/26/i-wrote-the-fastest-has.... Also note the complete lack of a standards test suite.

This library uses Robin Hood hashing for speed. As my original link explains, standard implementations use chained buckets so that users can cache lookups and to have better worst-case performance: https://news.ycombinator.com/item?id=9675964



