Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

A good secondary hash would be the fmix methods from Murmur, or multiply-byteswap-multiply.

However, having seen a lot of terrible user-defined hashes (I do a bit of consulting on an internal Google mailing list), I strongly advise against rolling your own.



The secondary hash must primarily be good at mapping the hashed value into a slot: that is, be as cheap as possible (ideally 0 cycles, fib hashing is ~5 cycles) if the primary hash was good while preventing catastrophic performance if the primary hash was bad. All of this without knowing anything about the primary hash.


It'd be interesting seeing comparisons there given that the issue here was that none of the widespread implementations he surveyed, including a Google one, even tries to address this, and several of them use methods that are both worse and slower.

Seems like it's something that has gotten little attention.


As a secondary (supplemental) hash, I found the Murmur finalizer methods are ok, but you can get better if you use different constants. See my result in this answer: https://stackoverflow.com/questions/664014/what-integer-hash...

I measured the avalanche effect (the number of output bits that change if a single input bit is changed; should be nearly 16 on average for a 32 bit hash), independence of output bit changes (output bits should not depend on each other), and the probability of a change in each output bit if any input bit is changed.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: