
This technique is not new. E.g. I remember it being described in the ARM System Developer's Guide, published back in 2004.

It shows how to use an unsigned 64-bit multiplication to restrict the range of a 32-bit random number generator to 0..N. No shifts are needed, as the result is already in the upper 32-bit register.
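
In C it looks roughly like this (the function name is mine, not from the book); the shift is explicit in C, but on ARM the compiler can simply read the high register of the UMULL result:

  #include <stdint.h>

  /* Reduce a uniform 32-bit random value r to the range [0, n):
     take the upper 32 bits of the 64-bit product r * n. */
  uint32_t reduce(uint32_t r, uint32_t n) {
      return (uint32_t)(((uint64_t)r * n) >> 32);
  }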




Yes, it is not new; I read that the multiplication trick was described as early as 1986, in "The METAFONTbook" (Donald E. Knuth). (I don't have the book, however.)

But it does look like the technique was forgotten, or at least not used very often in the past.


Kenneth A. Ross mentions it in passing in his 2006 IBM research report (a great paper btw):

"We can interpret a 32-bit hash value v as a binary fraction, i.e., the representation of a number between zero and one whose digits after the (binary) decimal point are given by v. Given a hash table size s, we can obtain a slot number by multiplying s and v and shifting right 32-bits. This way, arbitrary table sizes are possible without using expensive division or modulus operations. Computing s∗v can be performed with one 32-bit multiplication (generating a 64- bit result). If s is a 16-bit number, as might be expected for a small hash table, then two 16-bit multiplications, a shift, and a 32-bit addition suffice."

Kenneth A. Ross, Efficient Hash Probes on Modern Processors, IBM Research Report RC24100 (W0611-039) November 8, 2006

https://dominoweb.draco.res.ibm.com/reports/rc24100.pdf
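
For what it's worth, a rough sketch of the 16-bit variant Ross describes (function and variable names are mine, not from the paper):

  #include <stdint.h>

  /* slot = (s * v) >> 32, computed from two 16-bit multiplies,
     shifts and a 32-bit addition, for a 16-bit table size s. */
  uint32_t slot(uint32_t v, uint16_t s) {
      uint32_t hi = (uint32_t)s * (v >> 16);     /* s * upper half of v */
      uint32_t lo = (uint32_t)s * (v & 0xFFFF);  /* s * lower half of v */
      return (hi + (lo >> 16)) >> 16;            /* == (uint32_t)(((uint64_t)s * v) >> 32) */
  }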


It is much older than that.

If you interpret a random n-bit integer as a random fraction >= 0 and < 1, it is obvious that if you multiply it by K you get a random number >= 0 and < K.

If you take the integer part and discard the fractional part, i.e. you keep just the upper word of the multiplication, you get a random integer >= 0 and <= (K-1).

This has been well known since pseudo-random numbers first began to be used, i.e. immediately after WW2. At that time, before the introduction of floating-point numbers, interpreting binary numbers as fractions instead of integers was much more common.

Using modulo became popular only in higher-level languages, where it was more awkward to write a double-word multiplication, but it is clearly an inferior solution compared to using a multiplication for the same job.

Moreover, the multiplicative method ensures that all the possible result values have the same probability, while with the modulo-K method it is possible that the last value (K-1) has a lower probability (corresponding to fewer possible values of the input number), which is not what is normally desired.

For ancient congruential PRNGs, the modulo method was even worse, because those PRNGs had lower bits that were less random than the upper bits.


> the multiplicative method ensures that all the integer values possible for the result have the same probabilities

Uh, you can't do that for any non-power-of-two K. The multiplicative method doesn't remove the bias, it only moves those lower-probability outputs around when used without rejection.


You are right in general, but in the common case, when you want random numbers whose maximum value is much less than the maximum integer that can be represented in a word, e.g. dice numbers or lottery numbers, the probabilities of the integers produced by multiplication are much more uniform than those produced by modulo.

If you wanted a very wide range for the output random numbers, which is seldom the case, the comparison between the methods could be reversed.


No. The two methods are qualitatively equally good. In both cases, some outputs have probability floor(2^n/K)/2^n while others have probability ceil(2^n/K)/2^n. The numbers of outputs with each probability are such that they average out to 1/K. The only difference is the distribution of which output numbers have which of these two probabilities.


Exactly, both the modulo and the multiplicative method are equally close to uniform. In fact, under the constraint of starting from a uniform input with 2^n possibilities, both produce an output that is as close to uniform as possible. A necessary and sufficient condition for that is that the maximum and minimum number of inputs mapping to any given output differ by at most one, which is the case for both methods.
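
A quick brute-force illustration of that (my own, with n = 8 and K = 6): counting how many of the 2^n inputs land on each output shows both methods give per-output counts that differ by at most one.

  #include <stdio.h>

  int main(void) {
      enum { N_BITS = 8, K = 6 };
      unsigned mod_counts[K] = {0}, mul_counts[K] = {0};
      for (unsigned x = 0; x < (1u << N_BITS); x++) {
          mod_counts[x % K]++;                 /* modulo reduction      */
          mul_counts[(x * K) >> N_BITS]++;     /* fixed-point reduction */
      }
      for (unsigned k = 0; k < K; k++)         /* both columns show only 42s and 43s */
          printf("%u: mod=%u mul=%u\n", k, mod_counts[k], mul_counts[k]);
      return 0;
  }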


> immediately after WW2. At that time, before the introduction of the floating-point numbers

Well, technically floating point was invented around 1900 and then invented again independently by Zuse in the 1930s (all of his computers were floating-point machines).


For random numbers, that won't be uniform. Suppose we have a good-quality, uniformly distributed 32-bit unsigned random number that we want to reduce to, say, 0..13, which should also have a uniform distribution.

What we can do is take enough bits from the number to cover the range, in this case 4:

  out = r % 16;
Then we test whether the value is in the range 0 to 13. If not, we throw that value away and try again with another four bits.
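
A minimal sketch of that, assuming a rand32() that returns uniform 32-bit values (the name is a placeholder):

  #include <stdint.h>

  extern uint32_t rand32(void);   /* placeholder: uniform 32-bit source */

  uint32_t uniform_0_to_13(void) {
      for (;;) {
          uint32_t r = rand32();
          for (int i = 0; i < 8; i++) {            /* eight 4-bit chunks */
              uint32_t out = (r >> (4 * i)) & 0xF;
              if (out <= 13)
                  return out;                      /* accept */
          }                                        /* all chunks rejected: draw again */
      }
  }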


For many use cases biased is good enough.

The topic of unbiased bounded generation is pretty interesting in itself, and both Lemire and M.E. O'Neill (author of PCG) wrote blog posts and an academic paper comparing different approaches (some using the fixed-point trick as a building block):

https://www.pcg-random.org/posts/bounded-rands.html https://lemire.me/blog/2019/06/06/nearly-divisionless-random... https://arxiv.org/pdf/1805.10941.pdf
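
For reference, the "nearly divisionless" variant from the Lemire links looks roughly like this (rand32() is a placeholder for a uniform 32-bit generator); it combines the fixed-point multiply with a rejection step so the result is exactly uniform:

  #include <stdint.h>

  extern uint32_t rand32(void);   /* placeholder: uniform 32-bit source */

  /* Uniform value in [0, range), unbiased; range must be nonzero. */
  uint32_t bounded(uint32_t range) {
      uint64_t m = (uint64_t)rand32() * range;
      uint32_t l = (uint32_t)m;                /* low half = fractional part */
      if (l < range) {                         /* bias possible: compare to threshold */
          uint32_t t = (0u - range) % range;   /* == 2^32 mod range */
          while (l < t) {                      /* reject and redraw */
              m = (uint64_t)rand32() * range;
              l = (uint32_t)m;
          }
      }
      return (uint32_t)(m >> 32);              /* high half = bounded result */
  }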


Yes, that's called rejection sampling, and it's the only way to produce truly uniform output if the range doesn't evenly divide the input range.

However, in a setting such as finding buckets in a hash table whose size isn't a power of two, this could amount to computing multiple hashes. In certain scenarios, that may be unacceptable for computational reasons. If strict uniformity isn't required, the approach presented here may be good enough. It won't be exactly uniform if the range isn't a power of two, but it'll be just as close to uniform as modulo reduction is.


> hashtable which isn't a power-of-two size

I think I'd never want to do such a thing, but I will remember this multiplication trick for legitimate situations where a non-power-of-two residue is being calculated but the mathematical residue per se is not required.


If your non-power-of-two hash table size is a compile-time constant, the modulus will get compiled into similarly fast code. The trick is only needed when the size isn't a constant.

Particularly for large open hash tables, having to double the size of your table to reach the next power of 2 is a really significant memory overhead... and if the size is variable, the multiply is a lot faster. It's a good technique.


A power-of-two hash table doubling in size just has to adjust a mask. E.g. when the size is 16 we have mask == 0xF; when the size goes to 32, we have mask == 0x1F. Reducing the hash value to the table size is just hash & mask.
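
In code (a trivial sketch):

  #include <stddef.h>
  #include <stdint.h>

  /* table_size must be a power of two; growing the table only changes the mask. */
  size_t slot_for(uint64_t hash, size_t table_size) {
      return hash & (table_size - 1);    /* e.g. size 16 -> mask 0xF */
  }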

There is simply no modern use case for non-power-of-two hash tables. Even in open addressing.

Table sizes that are prime numbers are theoretically important because they allow techniques like quadratic probing to guarantee visiting every possible table offset, thanks to Euler's theorem. This one:

https://en.wikipedia.org/wiki/Euler%27s_theorem

(Only if n is prime is the phi(n) totient function equal to n - 1, meaning that all values of the congruence are traversed by the successive powers of a.)

This will go out the window if you start doubling the table, so that n is no longer prime; you instead need a predetermined sequence of primes to serve as table sizes.

Quadratic probing is silly because it's been found that linear probing, in spite of its issues, wins in terms of caching performance. The only reason to go to open addressing over chained hashing would be for better caching, due to not chasing chain pointers, and then you might as well reap the full improvement from better caching by using linear probing.


> There is simply no modern use case for non-power-of-two hash tables. Even in open addressing.

I think I'd much rather a hash table that needs 32.1 GB actually take 32.1 GB and not 64 GB especially when I need several of them, thank you very much!

With the extra memory saved and the TLB/cache footprint reduced I can get a lot of performance to offset the minuscule latency difference between a widening multiply and a logical and!

In a recent data structure I worked on, using an algebraic code that requires a couple of bisection searches in a couple of tiny tables to decode, just to shave 4 bits off each table entry, measurably improved performance, simply from reducing the cache/TLB footprint by a few percent. As computers have become faster, multiplies and adds have become increasingly free relative to memory latency/bandwidth.


What is new here is the method to remove bias.


I don't believe it removes bias. By pigeonhole-principle reasoning, you cannot map, say, 65536 values to 17 values without bias.

The method redistributes the bias compared to straight modulo.

You won't get a uniform distribution with this, just a differently shaped one compared to modulo.


No, it's not. The method described by ARM is functionally identical, so it is just as biased as the method described in the blog post here.


I see, this is one of several posts on this topic. There is a link near the end to the unbiased method, which is more original, but as you note fixed-point multiplication is not new.



