Hacker News new | past | comments | ask | show | jobs | submit login

The reason hash computation is traditionally called constant time is because it doesn't change while the hash table grows. Big O describes how run time or memory consumption grows as input grows, and with a hash table, the hash computation typically does not grow.

Calling a hash table O(log2(N)) gets misleading when you measure a hash function's performance, and it doesn't change as the table grows.

So you're right about your technicality, but you're using big-O slightly differently than how everyone else is using it.

The implications are interesting, in that if the hash computation did grow as O(log2(N)), then it would (theoretically) outperform the current typical O(1) hash table implementation, and they would meet only when the table is full.

Another way to see O(1) is that hash computation is always implemented not as a log2(N) function, but is log2(MAX_N), where MAX_N is the size of a full table, and since MAX_N is a constant, the complexity is O(1).

A side note is that not all hash tables are implemented using an array that can hold N items. Some hash tables use linked lists instead of re-hashing for collisions, meaning that your array can be arbitrarily less than N in size, meaning that it is definitely possible to have a hash function that takes less than log2(MAX_N) time. In that case, insertion time becomes explicitly greater than O(1).




I'm not following your logic here:

"The implications are interesting, in that if the hash computation did grow as O(log2(N)), then it would (theoretically) outperform the current typical O(1) hash table implementation, and they would meet only when the table is full."

How would O(log n) ever out perform constant time, O(1)? Could you elaborate?


This is and always has been an important part of understanding what big-O notation actually means. O(1) means constant time, it does not tell you what the constant is. The constant can be large.

In this case, if your hash always takes 64 bit values and always produces 64 bit values, it is O(1). If your hash were O(log2(N)), then your hash function with an empty table would do 1 bit worth of work, and would do only one operation. But when the table grew to contain 1,000 elements, it would do 10 bits worth of work.

The O(log2(N)) algorithm with N < 2^64 is always faster (in theory) than the O(1) algorithm that does a constant 64 bits of work.

The O(1) algorithm pays off in spades for N >> 2^64, it will become much faster than the logarithmic one, but for small N, log(N) is smaller than 64.


Right, a constant i.e unchanging amount of work.

I usually think of hashtables though as being O(1) in referring to the look up(amortized) and not the hash code itself.


Yep, and a hash table's get is O(1) amortized even when you include the hash function itself, simply because the hash function doesn't change as the hash table grows. @rand_r is right that the hash function is doing C * log2(MAX_N) operations, but that doesn't actually affect the big-O metric, it only affects the run time. :P


Indeed, thanks for the clarification.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: