Nice article. It reminds me of my year living in London, and taking the bus every day to Imperial College from West End Lane in West Hampstead. There was a stop on both sides of the road - one for the outbound bus, and one for the inbound (the bus went from central London to a terminus and then returned mostly on the same route). Now we did not use schedules - way too inaccurate at rush hour, and the buses there were pretty frequent anyway. But we did expect an even chance of the inbound bus arriving before an outbound one did. My daughter and I became convinced after a while that this was not happening, so we invented a game (which we called "The Game of Life"). When our bus (inbound) arrived first, we added 1 to our score. We subtracted 1 for every outbound bus that passed before ours arrived (there was often more than one). We realized that the result would be slightly skewed to the negative, but we expected the outcome to be close to 0 over time. Of course it was not. Anyway, we extended the game to many statistical situations. For example, you go to the checkout line at the supermarket, and there are N people in front of you. When you get to the front of the line, you count the people behind you - call that M. If M is bigger than N, you scored life points. If it is smaller, you lost some. So you add M-N to your running score, and you get an idea of how lucky you are in life. However, I never followed up with any real analysis, so I enjoyed this article.
Imagine if (in the future) some item like a phone could detect this information around you, and automatically record it. Forming games on top of this life data would be weird, neat, fun and sad all at the same time. Imagine seeing a real example of where someone else is just luckier than you are in stupid but impactful (on your morale) ways.
If it didn't seem so tedious to track, I'd love to implement an app to record this info. Unfortunately no one I know would care, and I'm sure I'd get too lazy to keep it accurate. Neat nonetheless, thanks for the cool thoughts :)
Similar to the recruiters that throw away the top half of the application stack because they don't want unlucky people in their company, I could see such data becoming valuable to some people.
Your use of "of course" seems to imply that there's some statistical reason that the probability of the next bus being inbound vs outbound wouldn't be equal. Is there? If so, it seems like it must be a different reason than the one in the article. What am I missing...?
Because we might score -1, -2 or worse if 2 or 3 buses went in the other direction before ours came, but if ours came first, we score 1. Once ours arrives we get on it, so we never find out whether one or more further buses would have come on our side before the next outbound one.
This reminds me of a mathematical paradox that makes me doubt your conclusion: "In this country, every couple wants to have one daughter. They keep having children until they have a daughter, and then they stop. What gender balance should we expect?"
Couples can have any number of sons, and every couple has exactly one daughter. Still, the accepted mathematical solution is an equal gender ratio for the couples' children.
I think the "paradox" comes from how people implicitly assume "any number of sons" is somehow distributed or weighted in a way that favors numbers of 1 or above.
In contrast, "0 sons" is going to describe a full half of all marriages.
Not really. In the son/daughter case, the calculations are:
expected daughters: 1
expected sons: 1/2 * 0 + 1/4 * 1 + 1/8 * 2 + 1/16 * 3 + 1/32 * 4 + …
So number of expected daughters = 1, number of expected sons = 1. In practice, since women can't have an infinite number of children, this wouldn't be an infinite series, so the real number of expected boys would be lower than one, but there you go…
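For anyone who wants to check the stopping rule numerically, here is a minimal simulation sketch. It assumes every birth is independently a daughter with probability 1/2; the function name `simulate_families` and the `max_children` cap are hypothetical, made up for illustration.

```python
import random

def simulate_families(n_families, max_children=None, seed=0):
    """Each family has children until its first daughter (or until max_children)."""
    rng = random.Random(seed)
    sons = daughters = 0
    for _ in range(n_families):
        children = 0
        while max_children is None or children < max_children:
            children += 1
            if rng.random() < 0.5:  # assumption: each birth is a daughter with p = 1/2
                daughters += 1
                break               # the family stops at its first daughter
            sons += 1
    # Average number of sons and of daughters per family.
    return sons / n_families, daughters / n_families

mean_sons, mean_daughters = simulate_families(200_000)
capped_sons, capped_daughters = simulate_families(200_000, max_children=4)
print(mean_sons, mean_daughters)
print(capped_sons, capped_daughters)
```

With a cap on family size, some families never get their daughter, so both averages drop below 1 together; the two stay roughly equal because each individual birth is still 50/50.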
Now, for the bus case, you get +1 if your bus turns up first, and -1 for every other bus that turns up first. Assume that it is completely random, then:
expected + score is: 1/2 * 1
expected - score is: 1/4 * -1 + 1/8 * -2 + 1/16 * -3 + …
The expected number of sons is 1, and the expected number of daughters is 1 (by the framing of the problem, in every possible scenario, there is exactly one daughter), but the expected value of the ratio is not 1:1. E[X]/E[Y] = E[X/Y] is not a valid identity.
I read that and it seems wrong. The question asked "what fraction of the pop is female" but his argument is that 3 families of 4 girls and 1 family of 12 boys make the fraction of girls in the average family 75% (the average of 100% x3 and 0% x1), which is nonsensical to me.
>E[X]/E[Y] = E[X/Y] is not a valid identity.
is completely irrelevant here because it is being used to point out that a non-answer is wrong.
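The two numbers being argued over can be computed side by side. The sketch below assumes the stop-at-first-daughter rule with 50/50 births, and truncates the series at a hypothetical `max_sons` of 60, which leaves a negligible tail. It compares the average per-family fraction of daughters with the population-wide fraction.

```python
import math
from fractions import Fraction

def per_family_daughter_fraction(max_sons=60):
    """E[daughters / children] for one family, as a truncated exact series."""
    total = Fraction(0)
    for k in range(max_sons + 1):        # k sons, then exactly one daughter
        p = Fraction(1, 2 ** (k + 1))    # probability of exactly k sons
        total += p * Fraction(1, k + 1)  # that family's daughter fraction
    return total

expected_fraction = float(per_family_daughter_fraction())
# Population view: E[daughters] / (E[daughters] + E[sons]) with both equal to 1.
population_fraction = 1 / (1 + 1)
print(expected_fraction, math.log(2), population_fraction)
```

The per-family series sums to ln 2 (about 0.693), while the population fraction is 0.5. Both are correct; they just answer different questions.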
It is impossible to catch more than one inbound bus on any given occasion, whereas any number of outbound buses might pass.
BTW, on a slightly unrelated point, if there's no timetable, but the interval between buses is maintained reliably, the waiting time is uniformly distributed over that interval.
If you have to get a second bus, you need to convolve two of those uniform distributions to find out the distribution of overall waiting times. This is a trapezoidal distribution (triangular if the two intervals are equal), which is just about analytically manageable.
But a journey with two transfers (3 buses in total) results in a likely overall time distributed according to a uniform distribution convolved with a trapezoidal distribution, which is a very weird non-smooth shape. You can see why people choose to model distributions with Gaussians, which are well-behaved (convolve two Gaussians, get another Gaussian). The Gaussian just lends itself ideally to recursive applications, hence recursive filtering (e.g. Kalman filters).
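A rough way to see these shapes is to discretize the waits into one-minute bins and convolve them directly. The 10- and 20-minute headways below are made-up numbers for illustration, and `uniform_pmf` and `convolve` are hypothetical helper names.

```python
def uniform_pmf(width):
    """Discretized uniform wait over `width` one-minute bins."""
    return [1 / width] * width

def convolve(p, q):
    """Distribution of the sum of two independent discretized waits."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

first = uniform_pmf(10)    # assumed 10-minute headway
second = uniform_pmf(20)   # assumed 20-minute headway
two_buses = convolve(first, second)                   # ramp up, flat top, ramp down
three_buses = convolve(two_buses, uniform_pmf(10))    # uniform convolved with trapezoid

for minute, prob in enumerate(two_buses):
    print(minute, "#" * round(prob * 400))
```

The crude text histogram shows the trapezoid for two buses; swapping in `three_buses` shows the rounder shape from the second transfer.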
Also, gaussians are great approximations for large n, too, since the convolution of any finite-variance distribution with itself n times (for n "large enough") is close to gaussian, by the CLT. More generally, there are very nice error estimates for many distributions.
I suspect this analysis can be carried out and yield quite good results in the gaussian case (a careful analysis might even yield error bounds on the result).
Yes. If you spend your whole life on one long multi-transfer bus journey, you'll end up with a gaussian.
It's a bit less clear that gaussians should be used when e.g. fitting a coordinate to an astronomical feature, which might not actually be symmetrical.
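The other useful property that the gaussian has is its separability in the 2D case: an isotropic 2D gaussian factors into a product of two 1D gaussians. Among rotationally symmetric functions, that is unique to the gaussian and counts for a lot.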
Eh, I don't think that many are required. Convergence to a Gaussian is pretty fast (you should check out page 299 of [0]); at four or five, a Gaussian is already quite a good approximation.
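How fast is "pretty fast" can be checked for the bus case. This is a sketch only: it assumes every leg has the same headway, so each wait is Uniform(0, 1) in units of that headway, and it uses the standard Irwin-Hall formula for the CDF of a sum of n such waits. The helper names are made up.

```python
import math
from statistics import NormalDist

def irwin_hall_cdf(x, n):
    """CDF of the sum of n independent Uniform(0, 1) waits."""
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    total = sum((-1) ** k * math.comb(n, k) * (x - k) ** n
                for k in range(int(math.floor(x)) + 1))
    return total / math.factorial(n)

def max_cdf_gap(n, grid=2000):
    """Largest CDF difference (on a grid) from the moment-matched gaussian."""
    normal = NormalDist(mu=n / 2, sigma=math.sqrt(n / 12))
    return max(abs(irwin_hall_cdf(n * i / grid, n) - normal.cdf(n * i / grid))
               for i in range(grid + 1))

gaps = {n: max_cdf_gap(n) for n in (1, 2, 3, 4, 5)}
for n, gap in gaps.items():
    print(n, gap)
```

The gap shrinks steadily as buses are added; whether the value at four or five counts as "good enough" depends on what you need the approximation for.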
It's not the next bus, it's the number of buses going the other way, which will on average be greater than 0.5 by the same reasoning given in the article.
It's also possible for the result to be biased because of scheduling. If inbound buses pass every 10 minutes at 16.00, 16.10, 16.20,... and outbound buses at 16.01, 16.11, 16.21, ... you'll usually see an inbound bus first. Though I expect this was not the case here.
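The timetable effect is easy to put a number on. A minimal sketch, assuming perfectly regular 10-minute headways in both directions and a passenger who turns up at a uniformly random moment; `inbound_first_share` and its parameters are hypothetical names.

```python
import random

def inbound_first_share(inbound_offset, outbound_offset, headway=10.0,
                        trials=100_000, seed=0):
    """Share of random arrivals that see an inbound bus before an outbound one."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        t = rng.uniform(0, headway)                   # arrival time within one cycle
        wait_in = (inbound_offset - t) % headway      # minutes until next inbound bus
        wait_out = (outbound_offset - t) % headway    # minutes until next outbound bus
        if wait_in < wait_out:
            wins += 1
    return wins / trials

staggered = inbound_first_share(inbound_offset=0.0, outbound_offset=1.0)
evenly_spaced = inbound_first_share(inbound_offset=0.0, outbound_offset=5.0)
print(staggered, evenly_spaced)
```

With the outbound bus one minute behind the inbound one, the share should come out near 0.9, since only arrivals in that one-minute window see the outbound bus first. With the two directions offset by half a headway it goes back to about 0.5.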
This is pretty counterintuitive. In the game described you should expect to see one 'wrong way' bus per play on average, not half as you might expect. On the other hand, you have an exactly even chance of catching your own bus before seeing a wrong one, so if your scoring system had been +1 for your bus and -1 for one or more wrong ones, then you would indeed score 0 over time. But with your point per bus scoring system your expected score turns out to be -0.5 per play.
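These figures are easy to check with a quick simulation. The sketch assumes each bus that shows up is independently inbound or outbound with probability 1/2 (no timetable effects); the function names and result keys are invented for illustration.

```python
import random

def outbound_before_inbound(rng):
    """Count outbound buses seen before the first inbound bus arrives."""
    count = 0
    while rng.random() < 0.5:  # assumption: each bus is outbound with p = 1/2
        count += 1
    return count

def simulate_game(plays=200_000, seed=0):
    rng = random.Random(seed)
    wrong_way = caught_first = per_bus = one_or_more = 0
    for _ in range(plays):
        k = outbound_before_inbound(rng)
        wrong_way += k
        caught_first += 1 if k == 0 else 0
        per_bus += 1 if k == 0 else -k       # -1 for every outbound bus
        one_or_more += 1 if k == 0 else -1   # -1 for one or more outbound buses
    return {
        "wrong_way_per_play": wrong_way / plays,
        "caught_first_share": caught_first / plays,
        "per_bus_score": per_bus / plays,
        "one_or_more_score": one_or_more / plays,
    }

results = simulate_game()
print(results)
```

Under that assumption the averages should land close to 1 wrong-way bus per play, an even chance of catching your own bus first, about -0.5 per play for the per-bus scoring, and about 0 for the one-or-more scoring.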