A fast alternative to the modulo reduction (2016) (lemire.me)
114 points by bra-ket on Oct 6, 2019 | 35 comments



An additional argument for multiply-then-shift instead of modulo is that, for most random number generators, the high bits are more random than the low bits.
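
For concreteness, the reduction being discussed is roughly this (a sketch of the article's multiply-then-shift for a 32-bit generator):

  #include <stdint.h>

  /* Map a 32-bit value x into [0, N): take the high 32 bits of the
     64-bit product x * N instead of computing x % N. */
  static inline uint32_t fast_range32(uint32_t x, uint32_t N) {
      return (uint32_t)(((uint64_t)x * (uint64_t)N) >> 32);
  }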

Another approach, if you really want to use modulo, is to perform the division as a multiplication by the reciprocal followed by a shift. This makes sense when the divisor is odd and used multiple times. See e.g. [0] for how compilers generate code to divide by a constant.

[0] https://en.wikipedia.org/wiki/Division_algorithm#Division_by...
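
A minimal sketch of the reciprocal idea (not the exact sequence a compiler would emit; assumes a compiler with __uint128_t and a divisor d >= 2 that is reused many times):

  #include <stdint.h>

  /* Precompute c = floor(2^64 / d) + 1 once; then for any 32-bit x,
     x / d is the high 64 bits of the product c * x. */
  typedef struct { uint64_t c; } divider32;

  static inline divider32 prepare_divider(uint32_t d) {   /* requires d >= 2 */
      divider32 dv = { UINT64_MAX / d + 1 };
      return dv;
  }

  static inline uint32_t divide32(divider32 dv, uint32_t x) {
      return (uint32_t)(((__uint128_t)dv.c * x) >> 64);
  }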


Be careful using this with integer keys in small hash tables. Applied naively, this method can easily flatten all small integer keys into bucket 0.
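
A quick illustration of the failure mode, assuming the identity hash and a 16-bucket table (numbers are only an example):

  uint32_t key    = 12345;                                   /* any "small" integer key */
  uint32_t bucket = (uint32_t)(((uint64_t)key * 16) >> 32);  /* 0 for every key < 1 << 28 */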


I heard a story recently about that being exploited as an attack.

Attackers knew some site used a DB with a specific hash function, where items that hash to the same value end up in a linked list. So they submitted a ton of user data that all hashed to the same value, then DoS'd it with requests that walked the whole linked list every time.


It's called an algorithmic complexity attack, and yeah, it's a known issue with hashmaps. One solution is to use a parametrized hash function with a secret key. Another is to use a different data structure entirely.


It’s actually called a hashmap collision attack, fixed by randomizing your hash function at program start. For example, generating a key at program start that will be used with SipHash.



Also called a hash flooding attack. I like hashmap collision though.


Not fixed by a random seed and not fixed by SipHash. Only fixable by not allowing a linear search through the collisions (counting, or promoting the bucket to a tree).

The random seed can be extracted, exposed, or calculated, and then SipHash doesn't help you at all; it just makes everything 2x slower.


How do you extract/calculate the random seed?

This is what all (non-vulnerable) programming languages do btw.

PS: saw your other post, interesting take.

https://news.ycombinator.com/item?id=12401920

I’ll venture a guess: a lot of attacks that are practical in the lab become completely impractical over a network.


I like how you’ve been saying for 3+ years that it’s possible to recover a SipHash key based on its output, something it’s specifically designed not to allow, but never given any evidence for it.


It doesn’t even have to be that complicated. Query parameters for HTTP requests are almost always stuffed into a hash table. You can get up to mischief a couple of packets at a time.

If you’ve been watching, there have been a series of articles over the last five or so years as each language upgrades their hash table implementation to thwart this class of attack.


That was the first thing that came to my mind as well. There are two definitions of what a hash function is, and they are almost always underspecified.

(1) Map key to an integer
(2) Map key to a uniformly distributed integer

In case (1), the hash map implementation needs to do the mixing step; (2) is much better for benchmarking, as in the article.
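
For case (1), the mixing step can be a multiply-xorshift scrambler applied before the reduction; here is one possible sketch using the splitmix64 finalizer constants:

  #include <stdint.h>

  /* Scramble a "type (1)" hash so its high bits look uniform before
     the multiply-shift reduction (splitmix64-style finalizer). */
  static inline uint64_t mix64(uint64_t h) {
      h ^= h >> 30; h *= 0xbf58476d1ce4e5b9ULL;
      h ^= h >> 27; h *= 0x94d049bb133111ebULL;
      h ^= h >> 31;
      return h;
  }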


By naively, you mean using a hash function that does not “scatter” the values properly, something like defining hash(n) = n, right?


Yes. hash(n) = n is fine in lots of cases, though it's still vulnerable to the algorithmic complexity attacks others mentioned in this thread.


As far as I can tell, this is basically the 32.32 fixed-point equivalent of Math.floor(N * Math.random()).

It reminds me of the sort of tricks I used as a teenager back in the 90s to speed up my MS-DOS graphics code.

Another trick: You can perform rounding by adding fixed-point 1/2 before the bit shift.
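
In 32.32 terms that looks like the following sketch, where fx is any UQ32.32 value (for instance the x * N product), ignoring the wrap at the very top of the range:

  #include <stdint.h>

  static inline uint32_t fx_trunc(uint64_t fx) {   /* truncate toward zero */
      return (uint32_t)(fx >> 32);
  }

  static inline uint32_t fx_round(uint64_t fx) {   /* add 0.5 = 1 << 31, then shift */
      return (uint32_t)((fx + (1ULL << 31)) >> 32);
  }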


> As far as I can tell, this is basically the 32.32 fixed-point

That's exactly how I described this trick in code-comments.


Let's say you want a random number from 0 to 5. You can generate a random integer, take the three low order bits (giving you a random number from 0 to 7), and if it's more than 5, try again. I just checked and it seems to work well ("bitmask with rejection"): http://www.pcg-random.org/posts/bounded-rands.html


That results in many rejections when the inclusive limit is a power of two (or slightly larger). For instance, for [0, 4], one has to reject 38% of the time, and this gets closer to 50% as the limit increases. This requires doing more samples from the (P)RNG.

The "Faster Threshold-Based Discarding" section of that article is the technique that Lemire actually suggests, to avoid the %, and the benchmarks indicate Lemire's method is equal or faster in all but the "Large Shuffle" case (and the "Optimizing Modulo" section reduces that difference significantly).


One way to pick an integer in [0, 4] given a source of uniform random integers in [0, 7] (which I shall call "the d8") is to divide the interval [0, 1] into 5 equal parts, by splitting it at 1/5, 2/5, 3/5, and 4/5. Number the 5 intervals 0 through 4.

Write those four split positions in base 8:

  1/5 = 0.1463(1463)
  2/5 = 0.3146(3146)
  3/5 = 0.4631(4631)
  4/5 = 0.6314(6314)
The parentheses mark the group of digits that repeats infinitely.

Use the d8 to generate an octal fraction in [0, 1]. Find which of the 5 intervals it falls in, and output that interval's number.

For example, if the d8 comes up 0, 2, 5, or 7, you would output 0, 1, 3, or 4, respectively. If you get anything else, you will need at least one more roll. For example, suppose the first roll was 1. Then you know that the output is going to be 0 or 1. Roll again. On 0, 1, 2, or 3, output 0. On 5, 6, or 7, output a 1. On 4, you need to roll a third time. On the third roll, 0-5 => 0, 7 => 1, and 6 means you'll need a 4th roll. The 4th roll goes 0-2 => 0, 4-7 => 1, and 3 means roll again. The pattern then repeats for potential rolls 5, 6, 7, and 8, and so on.

Another way to work out the same thing, without explicit fractions, is to work out a state diagram, naming each state with a string that lists each reachable output, with some outputs repeated so that they appear in the name in proportion to their probability from that state. That's probably confusing, so here is an example, where we will use a d2 (i.e., a random bit stream) to generate integers in [0, 2]. In other words, we are using a d2 to simulate a d3.

The first state should be able to lead to 0, 1, or 2, and they should be equally probable, so we name the first state 012. There should be two next states from that, one for rolling 0 and one for rolling 1.

The way we get the names for those states is by splitting 012 in two. But we need a name with an even number of characters for that, so let's double all the characters: 001122. Now we can split it in two to get the two next states: 001 and 122.

Same thing to get the next states for 001 (make it even--000011--and then split it: 000 and 011) and 122 (=> 112222 => 112 and 222). So far we have this (drawn with the root at the bottom):

  000 011  112 222
    \  /     \  /
     001     122
        \    /
          012
That says if we roll 00 we output 0, and 11 outputs 2, and 01 and 10 need at least another roll. If you work out the next states from 011 you get 001 and 111. 001 was 011's parent, so we've got a loop there. Same on the other side. Putting those in gives:

     A 111 111 B
      \ /   \ /
  000 011  112 222
    \  /     \  /
    A:001    B:122
        \    /
          012
To roll a d3 with a d2, start at the root, use the d2 to traverse the tree, and when you reach a node whose name is 000, 111, or 222, output 0, 1, or 2, respectively.

This all easily generalizes to using a dN to simulate a dM.

The base-M fraction approach can be done without actually building up the table of partition points, and the digits of the one partition point you need for each simulated dM roll can be worked out on demand as you need them, using just integers.

For example, if you need a/b in base M, set r = a. To get successive digits, iterate this:

  r = r * M
  d = floor(r/b)
  r = r % b
The sequence of d values that generates is the digits of the base-M expansion of a/b.
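
A runnable version of that loop (a sketch; it just prints the digits rather than feeding them to the sampler):

  #include <stdio.h>

  /* Print the first `count` base-M digits of a/b using only integers.
     digits_of(1, 5, 8, 8) prints 1 4 6 3 1 4 6 3, matching the table above. */
  void digits_of(unsigned a, unsigned b, unsigned M, int count) {
      unsigned r = a;               /* invariant: the remaining fraction is r/b */
      for (int i = 0; i < count; i++) {
          r *= M;                   /* assumes r * M fits in unsigned */
          printf("%u ", r / b);     /* next digit */
          r %= b;
      }
      printf("\n");
  }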


I'm not entirely sure how that relates to my comment...

However, I think that approach is conceptually very similar to "FP Multiply" (using floating-point arithmetic) or "Integer Multiplication" (using fixed-point arithmetic) in the GP's linked blog post. Those approaches package the whole "generate digits" idea into a single step, by handling the "split positions" directly as native fractions, rather than having to think of them as sequences in a particular base. These are likely to be significantly more efficient too, since they do not have to do (multiple) division/modulos, which are expensive as the main article points out (let alone computing/storing the fraction representation).

Your approach has the benefit that it is not biased (if the entropy in `r` is tracked appropriately, so that it can be resampled), but Lemire's "Debiased Integer Multiplication" version (as well as the optimised versions) are likely a faster way to remove the bias.


> Suppose you want to pick an integer at random in a set of N elements.

Furthermore, suppose the distribution has to be even. Then, you do not want the modulo N in any shape or form, unless the modulus is a factor of the range of the random number source.

What you do is find the smallest power of two P such that N <= P. Then you reduce the input numbers using mod P, which is a fast bitwise AND against (P - 1).

Then, you reject values which are >= N; you get another random number R and try again, until R & (P - 1) lands in the [0, N) range.
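
Put together, that looks roughly like this (a sketch, assuming N >= 1; next_u32() stands in for whatever random number source you use):

  #include <stdint.h>

  uint32_t next_u32(void);            /* hypothetical stand-in for your RNG */

  /* Uniform value in [0, N): mask down to the next power of two P,
     then reject anything >= N and try again. No modulo, no bias. */
  uint32_t random_below(uint32_t N) {
      uint32_t mask = N - 1;          /* build P - 1 by smearing the bits of N - 1 */
      mask |= mask >> 1;  mask |= mask >> 2;  mask |= mask >> 4;
      mask |= mask >> 8;  mask |= mask >> 16;
      uint32_t r;
      do {
          r = next_u32() & mask;      /* cheap "mod P" via bitwise AND */
      } while (r >= N);
      return r;
  }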

> Suppose you have a hash table with a capacity N. Again, you need to transform your hash values (typically 32-bit or 64-bit integers) down to an index no larger than N.

I wouldn't have such a thing where N isn't a power of two.

Non-power-of-two moduli in hash tables are only useful in the open addressing technique, where all entries are stored in the table itself, rather than in chains emanating from the table. Open addressing resolves collisions by probing for alternate locations in the hash table. Under open addressing, if the modulus is a prime number and quadratic probing is used:

https://en.wikipedia.org/wiki/Quadratic_probing

it will have the property of visiting all the table locations. That is to say, the quadratic probes modulo a prime N generate a sequence of unique table entries before repeating. This guarantees that a place will be found for any new key even if the table has just one slot left.


> leaving the values 0, 1, …, (2^32 – 1) mod N with ceil(2^32/N) occurrences, and the remaining values with floor(2^32/N) occurrences

> You can also generate random numbers in a range faster, which matters if you have a very fast random number generator.

That seems too biased for many RNG uses.


The author addresses that: “You can correct the bias with a rejection, see my post on fast shuffle functions.” https://lemire.me/blog/2016/06/30/fast-random-shuffling/


In some of the code snippets, the following is used:

  threshold = -bound % bound
I don't understand how that isn't always zero. What am I missing?


In N-bit unsigned arithmetic as we have here, -bound = 2^N - bound (and similarly for 32-bit), and the unsigned modulo operator "knows" that. In essence, that is a tricky way to compute 2^N % bound, working around not being able to represent 2^N directly in an N-bit integer.


bound is unsigned and "the negative of an unsigned quantity is computed by subtracting its value from 2^n, where n is the number of bits in the promoted operand".

So it's doing (2^n - bound) % bound = 2^n % bound.
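
For context, that threshold is used roughly like this in the debiased version (a 32-bit sketch, assuming bound >= 1; next_u32() stands in for the generator):

  #include <stdint.h>

  uint32_t next_u32(void);                      /* hypothetical stand-in for your RNG */

  /* Multiply-shift reduction with rejection, so that every value in
     [0, bound) is exactly equally likely. */
  uint32_t random_bounded(uint32_t bound) {
      uint64_t m   = (uint64_t)next_u32() * bound;
      uint32_t low = (uint32_t)m;
      if (low < bound) {                        /* bias can only occur in this branch */
          uint32_t threshold = -bound % bound;  /* = 2^32 mod bound, as explained above */
          while (low < threshold) {
              m   = (uint64_t)next_u32() * bound;
              low = (uint32_t)m;
          }
      }
      return (uint32_t)(m >> 32);               /* the high bits are the result */
  }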


I’m a bit confused. I thought it was pretty common to generate a random number between 0 and N by getting a random float between 0 and 1 and multiplying it by N.


No: floats are not uniformly distributed. That way you get (IIRC) “many” more numbers near 0 than near 1.


Generating a random float always takes that into account: for instance, the samples will be less than 0.01 only 1% of the time. A method that just interpreted a uniformly random bit pattern (that is, integer) as a float would only be useful in very specialised situations.


So my concept for how it works was you're putting the random bits into the mantissa while holding the exponent constant, then normalizing the result.

Is that accurate? Is there any concern there for never getting the smallest possible values? Or is that acceptable because you can't generate values very close to 1?


That's close to one approach (generate a float in [1, 2) by setting the mantissa with a constant exponent, and subtract 1); another is to generate a 32- or 64-bit integer and multiply by 2^-32 or 2^-64. I think both of these can generate all floats close to 1; it's the floats close to 0 that they're less good with.

There is some concern with the bias due to missing small values (and missing low bits, even for moderate values), but this is often ignored, and is usually good-enough.
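
Two common recipes, for the curious (a sketch; next_u64() stands in for the underlying 64-bit generator):

  #include <stdint.h>
  #include <string.h>

  uint64_t next_u64(void);   /* hypothetical stand-in for your RNG */

  /* (a) Multiply: keep 53 random bits and scale by 2^-53. */
  double uniform_mul(void) {
      return (next_u64() >> 11) * (1.0 / 9007199254740992.0);   /* 2^53 */
  }

  /* (b) Bit trick: fix the exponent so the value lies in [1, 2),
     fill the 52 mantissa bits, then subtract 1. */
  double uniform_bits(void) {
      uint64_t bits = 0x3FF0000000000000ULL | (next_u64() >> 12);
      double d;
      memcpy(&d, &bits, sizeof d);
      return d - 1.0;
  }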


"An intuitive way to think about "fast range" technique is scaling. Number x ranges [0,2^32−1], multiplying it by N then divide by 2^32, the range becomes [0,N−1].

From "Writing a Damn Fast Hash Table With Tiny Memory Footprints": http://www.idryman.org/blog/2017/05/03/writing-a-damn-fast-h... , previous discussion https://news.ycombinator.com/item?id=14290055


This approach comes naturally to people working with fixed-point math: your random value is a UQ0.32 fixed-point fraction, you multiply it by a UQ32.0 size, resulting in a UQ32.32 value that you shift back down to a UQ32.0 integer. [ https://en.wikipedia.org/wiki/Q_(number_format) ]

I wonder if the introduction of calculators into the school curriculum has made people less prone to carry around rational values as intermediates. Division is a pain in the butt in mental arithmetic and it remains slow on computers (except when compilers are able to strength-reduce it away). There are a lot of nice optimizations in numerical algorithms that come from keeping denominators separate (implicitly or explicitly).


I was thinking the same thing; being older, I grew up on 8-bit machines and this sort of thing comes more naturally to me. It's a nice article, but does it deserve to be called the 'Lemire technique'? :)


Can dang please add a (2016) to the title? This is a very popular post, and it gets rehashed every once in a while.



