Let's say we draw n times. What's the probability that the sum is at least 1? We want to exclude the cases where the sum is less than 1, so we want the volume of the unit cube (n-tuples of numbers from 0 to 1) minus the volume of the region below the standard simplex (the standard simplex is the stuff that sums to exactly 1; we're excluding the stuff that sums to less than 1).
Now the volume of the region below the standard simplex in n dimensions is 1/n!. (Brief proof, if you're not familiar: True in 1 dimension. Assume it's true in n dimensions. Then for n+1 dimensions, integrate the n-dimensional volume as the "sum of the first n variables" parameter varies from 0 to 1. But volume scales as length^n in n dimensions, so this is the integral from 0 to 1 of x^n/n!, i.e. 1/(n+1)!.)
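Spelled out, the induction step is just the one-line integral (writing V_n(x) = x^n/n! for the volume of the n-dimensional region whose coordinates sum to at most x, which is 1/n! at x = 1 by hypothesis and scales as x^n):

V_{n+1}(1) = \int_0^1 V_n(x) dx = \int_0^1 x^n/n! dx = 1/(n+1)!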
So after n draws, the probability that the sum has reached at least 1 is 1 - 1/n!. So the probability that it requires exactly n draws is 1/(n-1)! - 1/n! (for n > 1; the probability for n = 1 is 0, since a single draw from [0,1] can't put the sum past 1).
So the expected number of draws, then, is the sum over n >= 1 of n·[1/(n-1)! - n/n!·(n-1)!/(n-1)!], i.e. of n/(n-1)! - n/n!. For n >= 2, each term equals n/(n-1)! - 1/(n-1)! = (n-1)/(n-1)! = 1/(n-2)!; for n = 1 the term is 0.
I.e. it's the sum over n>=2 of 1/(n-2)!, i.e. the sum over n>=0 of 1/n!, or in other words it's e.
This is a pretty solution, and it gives you the probability generating function for free: by some power series manipulation, the pgf of the number of draws needed is g(z) = 1 + (z-1)*exp(z). So you can show fairly easily that Var(draws needed) = 3e - e^2, for example ...
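To spell out that power series manipulation: with p(n) = 1/(n-1)! - 1/n! for n >= 2 and p(n) = 0 otherwise,

g(z) = \sum_{n=2}^\infty z^n [1/(n-1)! - 1/n!] = z(e^z - 1) - (e^z - 1 - z) = 1 + (z-1)e^z.

Then g'(z) = z e^z, so E[N] = g'(1) = e, and Var(N) = g''(1) + g'(1) - g'(1)^2 = 2e + e - e^2 = 3e - e^2.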
This makes no sense to me. Why are we using the Poisson distribution to talk about uniform [0,1] random variables? Why do we do this complicated derivation just to show that Poisson(1;1) = 1/e, and why does this show the answer to our question?
If someone could explain, that would be very helpful.
The author is not solving the same problem he states in the title. I'm not even sure what problem he is solving. For one thing, the pdf of the uniform distribution on [0, 1] with respect to Lebesgue measure on R is the function that's 1 on [0, 1] and 0 elsewhere. The pdf he gave is actually the pmf of a Poisson distribution with respect to counting measure.
Using simple methods from martingale theory, we can show that the expected value we're looking for is no smaller than 2 and no larger than 4, but I'm in no mood to delve into more detailed calculations.
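For what it's worth, those bounds presumably come from something like Wald's identity (an optional stopping argument): E[S_N] = E[N] E[U_1] = E[N]/2, and the stopped sum S_N always lands in [1, 2) (it was below 1 before the final draw, and the final draw adds less than 1), so 2 <= E[N] < 4.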
You could probably pull out the exact probability mass function with characteristic functions.
To review why these would be useful: let χ[T](ω) = ⟨ exp(i ω T) ⟩. Then χ[A + B] = χ[A] χ[B] for any independent A and B, so since the characteristic function of a uniform [0,1] random variable is χ(ω) = (exp(i ω) − 1)/(i ω), the sum of n independent uniform random numbers has characteristic function χ(ω)ⁿ. Meanwhile the PDF of a random variable is just the inverse Fourier transform of its characteristic function, so we have
fₙ(t) = ∫ dω/(2π) exp(-i ω t) [χ(ω)]ⁿ
Pr(sum of n samples > 1) = ∫{1 → ∞} dt fₙ(t).
You could then interchange those two integrals, solving the convergence problem of exp(-i ω ∞) by giving ω an infinitesimal negative-imaginary component. This translates to moving the ω contour a little below the real line, and then the normal tricks of complex analysis should give you definite values for all of the integrals as residues about the simple pole at 0.
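For anyone who wants to check this picture numerically without doing the contour integral, here's a minimal Python sketch (my own, not from the post): it builds fₙ by repeated discrete convolution of the U[0,1] density, the grid-level counterpart of raising χ(ω) to the n-th power, and compares Pr(S_n > 1) against the closed form 1 − 1/n! derived elsewhere in the thread.

import numpy as np
from math import factorial

h = 1e-3
f = np.ones(int(1 / h))           # density of U[0,1] on a grid with step h
g = f.copy()                      # density of S_1
for n in range(2, 6):
    g = np.convolve(g, f) * h     # density of S_n, supported on [0, n]
    p_below = g[:int(1 / h)].sum() * h   # approximates Pr(S_n < 1)
    print(n, 1 - p_below, 1 - 1 / factorial(n))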
Yeah, I tried to work out the solution before looking at the article, but realized that the problem was not even well posed. You could ask for the number of uniforms needed so that the probability of their sum being greater than one exceeds a certain value, but that's not what he is asking. And yes, he doesn't work with uniform distributions, but Poissons. Weird article.
Agree. The relevant concepts from the theory of stochastic processes would be "stopping time" or "expected hitting time".
There are lots of relationships between the jump times of the Poisson process and subdivisions of the real line, so it's possible that the post is doing some version of a valid calculation, but without the right reasoning/understanding.
Isn't the pdf of a uniform distribution just f(x) = 1 on [0, 1]? What he gave was the pdf of an exponential distribution with rate parameter lambda. The rest of the math seems correct, but then the appearance of e is not surprising, since it's baked into the initial pdf. Am I missing something?
import random

N_TRIALS = 100000000  # number of simulated runs
totalsteps = 0
for _ in range(N_TRIALS):
    total = 0.0
    steps = 0
    while total < 1.0:        # draw until the running sum reaches 1
        total += random.random()
        steps += 1
    totalsteps += steps
print('AVG STEPS: ' + str(totalsteps / N_TRIALS))
It's funny. I read the problem, estimated that it was "probably 2 or 3" and then wasn't all that surprised to find that the solution was e. The derivation of it is pretty cool, too.
You're right. I think the article's reasoning is fallacious. At least, I don't see what the Poisson distribution has to do with this at all.
One way to arrive at the answer correctly is to write the number of samples needed for the sum to first exceed 1 as N and observe that P[N > k] = 1/k!. You can get this from a k-dimensional integral. A general formula is E[N] = Sum_{k=0}^infty P[N > k], thus E[N] = e.
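Substituting the tail probabilities into that formula makes the last step explicit:

E[N] = \sum_{k=0}^\infty P[N > k] = \sum_{k=0}^\infty 1/k! = e.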
Some motivation for the non-math folks out here: Twitter shows an ad on your timeline. Assume the chance of Twitter getting a callback on that ad is uniform (very unlikely in reality, but play along). That means roughly 50% of folks will ignore or dismiss the ad, and the rest will actually engage with the impression. Let's say Twitter makes a dollar per engagement. How many ads need to be shown to make the first dollar? Two dollars? How about five, ten, a hundred, or a million?
That's what this problem gets at. And in case you are curious, here's the solution for all those amounts:
----
scala> def uniform = util.Random.nextDouble

def sumsTo(n: Int) = {
  var (times, sum) = (0.0d, 0.0d)
  while (sum < n) { sum += uniform; times += 1.0d }
  times
}

def meanSumsTo(n: Int) =
  (1 to 1000000).map(_ => sumsTo(n)).sum / 1000000.0d

List(1, 2, 5, 10, 100, 500, 1000).map(x => (x, meanSumsTo(x)))
scala> res1: List[(Int, Double)] = List((1,2.718594), (2,4.672438), (5,10.665755), (10,20.668308), (100,200.65537), (500,1000.669558), (1000,2000.669246))
----
So on average you need to show about 3 ads to make a buck, about 5 ads to make $2, and so on.
function sumToOne() {
  var acum = 0,
      count = 0;
  do {
    count++;
    acum += Math.random();
  } while (acum <= 1);  // keep drawing until the running sum exceeds 1
  return count;
}

var avg = 0,
    nTimes = 10000;
for (var x = 0; x < nTimes; x++) {
  avg += sumToOne();
}
console.log(avg / nTimes);  // should print something close to e ≈ 2.718
The simplest answer is: you have a 50% chance of the first number being > 0.5, and then a 50% chance that num1 + num2 >= 1, so the answer is 2 draws. People are trying to show these grand equations to figure it out, but it's really a simple solution; most of the time it takes 2 draws.
Use your mind, which is enormously better at common-sense solutions than a computer, and figure it out with common sense, then test it with an algorithm.
Everyone beat me to the punch, but I made something a little different than you guys: I stored the count of tries to reach 1 and the frequency of each count in a multidimensional array, to see which number of tries hit the sum most frequently. I ran 100,000 tries at a time, about 10 times.
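In case anyone wants to reproduce it, here's a minimal Python sketch of that experiment (a Counter standing in for the multidimensional array), comparing the observed frequencies against the pmf P(N = n) = 1/(n-1)! - 1/n! derived elsewhere in the thread:

import random
from collections import Counter
from math import factorial

trials = 100000
counts = Counter()
for _ in range(trials):
    total, n = 0.0, 0
    while total < 1.0:        # draw until the running sum reaches 1
        total += random.random()
        n += 1
    counts[n] += 1            # histogram of how many draws were needed

for n in sorted(counts):
    theory = 1 / factorial(n - 1) - 1 / factorial(n)
    print(n, counts[n] / trials, round(theory, 5))

(The most frequent count comes out at n = 2, which has probability exactly 1/2.)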
Heh, I wrote about this (on my blog) a long, long time ago. It was a fun numerical experiment to break the boringness of my own numerical experiments with fractals.
Thanks, but I just wrote up, for clarity, a solution I found elsewhere. At the time I was teaching at a university, and rewriting things for clarity was ingrained :)
The calculation given by the OP does not solve the stated problem, as remarked by xyzzyz.
Sniffnoy already gave the correct solution in elementary terms, but since I spent the last two hours learning about the convolution formula, I want to give my solution here.
Define U[0,1] to be the uniform distribution on [0,1], and S_n to be the sum of n independent copies of it.
We can compute the probability density function f_{S_n}(x) using the convolution formula; the result (the Irwin-Hall density) is
f_{S_n}(x) = (1/(n-1)!) \sum_{j=0}^{\lfloor x \rfloor} (-1)^j \binom{n}{j} (x - j)^{n-1}.
In particular we are interested in the event that the sum is still below 1 after n draws:
Pr{ S_n < 1 } = \int_0^1 f_{S_n}(x) dx.
In the range [0,1] the pdf f_{S_n}(x) takes the simple form x^{n-1}/(n-1)!, because j = 0 is the only term we keep in the summation, so we can evaluate the integral above:
Pr{ S_n < 1 } = \int_0^1 x^{n-1}/(n-1)! dx = 1/n!.
OK, now for the problem statement ;)
Let N be the random variable which describes the number of draws from U[0,1] we will make to reach a sum of 1.
We need to find the probability mass function of N (call it p(n)) and then calculate its expected value
E{N} = \sum_{n=0}^\infty n p(n).
As pointed out by Sniffnoy, p(n), the probability that it will take //exactly// n draws to go over 1, is given by:
p(n) = 1/(n-1)! - 1/n! (for n >= 2, with p(n) = 0 for n < 2).
Note that 1/(n-1)! is the probability of not going over 1 in the first n-1 draws, from which we subtract 1/n!, the probability of still not being over 1 after the n-th draw.
The final steps are:
E{N} = \sum_{n=2}^\infty n p(n)
= \sum_{n=2}^\infty n [1/(n-1)! - 1/n!]
= \sum_{n=2}^\infty [n/(n-1)! - n/n!]
= \sum_{n=2}^\infty [n/(n-1)! - 1/(n-1)!]
= \sum_{n=2}^\infty [(n-1)/(n-1)!]
= \sum_{n=2}^\infty 1/(n-2)!
= \sum_{m=0}^\infty 1/m! (change of variable m = n-2)
= e