
It seems to me this would only be true if the keys that collide are related to each other, or you have a vastly oversized table that you collide a lot in, at which point you're just accidentally synthesizing a smaller table.

What am I missing here?

[edit] I guess the other situation would be if the keys are largely sequential, but then a hash table seems like an odd choice of data structure.



Linear probing works by first hashing the key, call it h(x), then if there is a collision, it checks h(x)+1, h(x)+2, ..., h(x)+k (wrapping around modulo the table size) until it finds an open slot. Lookup works the same way, and deletion is a bit more complicated (you need tombstones or backward-shifting so lookups don't stop early at a freed slot).

This model plays nicely with the cache, although its downside is that contiguous runs of filled slots tend to form and grow in the table (primary clustering). Even so, this method provably takes expected O(1) insert/lookup time with a 5-wise independent hash function and a load factor smaller than 1.
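The insert/lookup procedure described above can be sketched roughly as follows. This is a minimal illustration, not a production implementation: the capacity is fixed (no resizing), deletion is omitted, and the class and method names are made up for the example.

```python
class LinearProbingTable:
    """Minimal open-addressing hash table using linear probing (sketch)."""

    _EMPTY = object()  # sentinel marking an unused slot

    def __init__(self, capacity=16):
        self.capacity = capacity
        self.slots = [self._EMPTY] * capacity

    def _probe(self, key):
        # Yields h(x), h(x)+1, h(x)+2, ... wrapping around the table.
        i = hash(key) % self.capacity
        for _ in range(self.capacity):
            yield i
            i = (i + 1) % self.capacity

    def insert(self, key, value):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is self._EMPTY or slot[0] == key:
                self.slots[i] = (key, value)
                return
        raise RuntimeError("table full")  # a real table would resize here

    def lookup(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is self._EMPTY:
                return None  # hit an empty slot: key was never inserted
            if slot[0] == key:
                return slot[1]
        return None
```

Note that lookup can stop as soon as it hits an empty slot, which is exactly why deletion needs care: naively emptying a slot would break the probe chain for keys stored past it.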


I do know how linear probing works, thanks. ;)

But now I see where the confusion lies. I was taking your post to be replying more to the hash function part of the GP, but you were talking specifically about the skip distance. Yes, now I see what you mean, and I'm not actually sure how I misinterpreted so badly in the first place.


Let's say you get a collision. Would you rather:

a) Find your correct key within the same cache line, but you have to check 4 more values to get there; or

b) Find your correct key on the next try, but you have to jump to another part of the array.

Once you've indexed into the array, you want to read forward from there. You don't want to jump around.

http://preshing.com/20130107/this-hash-table-is-faster-than-...
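The arithmetic behind why option (a) tends to win can be sketched quickly. Assuming a typical 64-byte cache line and 8-byte table entries (both assumptions, not from the thread):

```python
CACHE_LINE_BYTES = 64  # typical on modern x86/ARM cores (assumption)
ENTRY_BYTES = 8        # e.g. one 64-bit key or index per slot (assumption)

# How many table slots share a single cache line.
slots_per_line = CACHE_LINE_BYTES // ENTRY_BYTES

# A linear probe over h(x)..h(x)+4 usually stays inside the line the
# initial lookup already pulled into cache, so the extra comparisons are
# nearly free; jumping to an unrelated index is almost always a new miss.
print(slots_per_line)  # → 8
```

So checking a handful of adjacent slots costs a few in-cache comparisons, while each random jump risks a full memory stall, which is the tradeoff the linked Preshing post benchmarks.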



