Darts, Dice, and Coins: A beautiful article on a beautiful algorithm

jchonphoenix · on Dec 28, 2011

This is an extremely beautiful problem.

It was also a great interview question because most candidates have never seen this before, and it requires problem solving skills and a very basic amount of algorithmic knowledge that every CS major should have.

Looks like I'll have to find a new interview question.

Again, this is such a gorgeous problem. It's one of the rare gems that used to show up on HN all the time, and now have become quite rare.

loeg · on Dec 29, 2011

All this method does is break the continuous [0,1) range into enough discrete buckets that each bucket contains at most two choices. Because the buckets are the same "probability-size" you can pick a bucket with O(1) and then do an O(1) comparison.

Everything between "Loaded Die with a Fair Die" and "Vose's Alias Method" seems superfluous.

Someone · on Dec 29, 2011

I agree with you that some parts distract more from the main line than that they add, but not about what those parts are. I would probably remove everything from the header "Simulating a Fair Die" up to the header "Throwing Darts".

I would keep the "Throwing Darts" parts because it provides a smoother ride towards the main result. I guess many readers will need that to understand the algorithm (If the intended audience is hard-core, it could just have been a reference to TAOCP and an URL to Vose‘s paper. For me, that probably would have been enough, as I knew the algorithm from reading TAOCP, and have already implemented the O(nlogn) initialization method variant)

Finally, I have one minor nitpick: the header "Simulating a Loaded Die with a Biased Coin" is incorrect. It should read "Simulating a Loaded Die with Biased Coins".

mcherm · on Dec 28, 2011

Thank you for posting this. Important as SOPA is, I'd love to see a few more like this mixed in with the other stories.

mcherm · on Dec 29, 2011

I take it back. Now that I read this through to the end I no longer want to ask more posts like this on HN. Few posts like this exist anywhere; asking for them regularly is too much. This may be one of the best mathematical essays I have ever read. I see others commenting about one section or another which could have been left out, leaving a straighter path to the conclusion. But the amazing, intuitive leaps that allow to lay out the answer and then prove it are great for making professors look brilliant, but they don't teach anyone how math is actually DONE. In contrast, this essay motivated each tiny baby step along the way and illustrated how one might work out a new algorithm. That is even more valuable than learning one algorithm.

mturmon · on Dec 29, 2011

I'm a little skeptical about the domain of usefulness of this algorithm. The simple method that computes the CDF and uses binary search for sampling is called "Roulette wheel selection" in this writeup.

The only difference at the big-O level between the "beautiful algorithm" and the above simple algorithm is, the latter takes log(n) steps for sampling versus O(1) (setup and memory usage are the same).

How many problems have n large enough so const1*log(n)+const2 is significantly more than the constant in O(1) algorithm? Not many, I'll bet, especially since const1 and const2 are small due to the simplicity of the simple CDF-based algorithm.

If it was O(sqrt(n)) vs. O(1), this would be a different thing, but log is different.

This is not to diminish the inherent interest in the sampling arguments used, and the existence of a big-O faster algorithm.

talaketu · on Dec 29, 2011

Imagine a scenario in which you need to generate a large number of samples based on a discrete random variable with a large number of outcomes n. If you had a library that implemented both algorithms, which should you use?

Are you skeptical that a domain for that scenario exists?

mturmon · on Dec 29, 2011

I'm just saying that n has to be very large before the O(1) asymptotics kick in, that is, for:

  c1 log(n) + c2 > c3

where, from what I can see, c3 >> c2 and c1 is low. And the setup time for the other algorithm is also greater (but, the same according to the O() measure).

In answer to your rhetorical question: I'd benchmark both. But, I already implemented the simpler algorithm in existing code, and bottlenecks for problems of interest to me are elsewhere.

Also, the more complex algorithm is more likely to have a bug -- which might be elusive, as they sometimes are with complex RNGs. I've found bugs in the published algorithms in DeVroye's book, for instance, and that book is considered authoritative. It's hard to convey just how subversive a parameter-dependent bug in a low-level RNG can be.

Splines · on Dec 28, 2011

Very interesting! I'm not a very math-minded person but that was fairly easy to follow along with. The nice pictures helped :)

donut · on Dec 29, 2011

Nice writeup. After reading the first paragraph, I challenged myself to come up with something and here it is:

https://gist.github.com/a982f1fee5116d289323

Example run:

  expected:
  a 0.166666666667
  b 0.333333333333
  c 0.5
  
  observed:
  a 0.168
  b 0.291
  c 0.541

For a list of n items to choose from, it appears that pick() will terminate on the first iteration of the scary-looking while 1 loop with probability ~1/n. For n = 3, and I've run this many times, I've not seen the number of iterations exceed 30. For n = 26 it easily goes into the ~200 range at the extreme.

bermanoid · on Dec 29, 2011

Loved the writeup - honestly, I have no problem with the extra detail between the first super-naive approach and the ultimate best solution, it strikes me as a great way to teach problems like this. Especially when compared to the upper-level math "strip the scaffolding" approach, which always leaves you wondering "How the hell did anyone ever think of that?"

One small nitpick. [tl;dr - please pick a proper license when you publish code on the net] I went to the code snippets page because I was considering using this code in one of my projects (it's a bit cleaner than the code I've been using), and found this statement:

If you're interested in using any of this code in your applications, feel free to do so! You don't need to cite me or this website as a source, though I would appreciate it if you did. However, please don't plagiarize the code here by claiming authorship - that would just be dishonest. I also caution you that while I'm fairly confident that the code on this site is correct, I haven't mercilessly tested every line, and so there may be a lurking bug or two here.

Please, if you're releasing code on the Internet that you hope for people to use, just pick a standard license, and ideally, put the license specification in the comments of the code itself (this one sounds like MIT, BSD, or zlib would probably be fine with the author). I know, you're being nice, you're being casual, you want people to use your stuff, and you're not going to sue anyone.

But a self-penned informal license like this means that when Joe Java wants to pull in AliasMethod.java to help out a math library that he's working on at his bank job, he can't just go ahead and do it, because it's not on the pre-cleared list of code licenses that he's allowed to import external code under. He has to get explicit approval from legal first, which means that he has to convince someone there that spending 15 minutes looking over the license text and deciding whether it's solid enough to trust is a worthwhile use of time, all over a couple dozen lines of code that he could implement a naive version of in 20 minutes plus testing. At a lot of companies, that just means "no." It also presents difficulties for open source projects that might otherwise like to incorporate your code because of license compatibility concerns, in many cases requiring them to e-mail you for specific permission to include your code inside a differently-licensed project.

More concretely, I'd be nervous relying on this license statement for a couple reasons: first, it says "using any of this code in your applications". Would a separately distributed library file be considered an "application", or are we only allowed to use the code inside full running programs? Second, what about redistribution of the source code, and derivative works? It's sort of implied that this is allowed (how would someone claim authorship of this bit of code if they weren't redistributing it?), but a proper license should make source redistribution and modification rights extremely explicit, without that it's typically assumed that there are none.

vilya · on Dec 29, 2011

This sort of response, I'm afraid, is why we can't have nice things.

The author has taken the time to make their code available and make it as free as possible to reuse. Why should they have to spare a second thought for the roadblocks your organisation chooses to put in the way of you using it? There are better uses for a developer's time and personally I'd much rather see another great write-up like this than a carefully chosen open source license on the sample code.

mcherm · on Jan 10, 2012

Sure, but it doesn't hurt to mention it because I'd even MORE rather have both the writeup (98% of the value) AND the well-licensed code (<2% of the value).