Hacker News
Ask HN: How to Calculate?
4 points by mech422 on Oct 18, 2021 | 20 comments
A friend and I were talking about holiday plans and realized we have no idea how to go about solving this:

All other things being equal, which would be 'better'/'safer' statistically - 6 hours on a bus with 50 people or 2 hours on a plane with 200?

(forgive the covid reference - it's just how it came up. Only interested in how one goes about calculating things like this..)

P.S. Is there a name for this sort of problem/math? Something I could google for more info?

Edit: Hmm - another way of saying this might be - what gives you the best chance of having your lottery ticket drawn: 2 drawings from a pool of 50, or 6 drawings from a pool of 200?

Thanks!



> whats gives you the best chance of having your lottery ticket drawn - 2 drawings from a pool of 50, or 6 drawings from a pool of 200?

The standard trick for this type of probability problem is to reverse the question and ask for the probability of not having your ticket drawn:

P(unlucky_first_time AND unlucky_second_time AND ...)

= P(unlucky_first_time) * P(unlucky_second_time | unlucky_first_time) * ...

For two drawings from 50:

P(unlucky) = 49/50 * 48/49 = 0.96

so P(lucky) = 0.04

For 6 drawings from 200:

P(unlucky) = 199/200 * 198/199 * 197/198 * 196/197 * 195/196 * 194/195 = 0.97

so P(lucky) = 0.03

In other words, your best chance of having your lottery ticket drawn is in 2 draws from 50.

Edit: Note that I assume that the ticket chosen at the first draw is taken aside and not put back in the pool. Hence the 49/50 becoming 48/49 the second time, since the pool now contains only 49 tickets.
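A minimal Python sketch of the reversed-question trick above, assuming (as in the edit) that each drawn ticket is set aside and not put back. The function name `p_ticket_drawn` is just illustrative. Exact fractions keep the arithmetic transparent:

```python
from fractions import Fraction


def p_ticket_drawn(pool_size: int, draws: int) -> Fraction:
    """Chance that one specific ticket is drawn at least once in `draws`
    draws without replacement from `pool_size` tickets (draws <= pool_size)."""
    p_unlucky = Fraction(1)
    remaining = pool_size
    for _ in range(draws):
        # Miss our ticket on this draw, given every earlier draw also missed it
        p_unlucky *= Fraction(remaining - 1, remaining)
        remaining -= 1  # the drawn ticket is set aside
    return 1 - p_unlucky


print(p_ticket_drawn(50, 2))   # 1/25  (= 0.04)
print(p_ticket_drawn(200, 6))  # 3/100 (= 0.03)
```

The product telescopes to (pool_size - draws) / pool_size, so without replacement the answer is exactly draws / pool_size.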

> 6 hours on a bus with 50 people or 2 hours on a plane with 200

I believe that the original problem is very different actually.

Let's specify the question a bit more. Assume that, for each hour and for each fellow passenger, you independently have a probability U of getting covid from that passenger. That's a strong assumption (it actually implies a specific model of contagion), but let's go with that.

(Edit: You could think of it the following way. Imagine U=1/6; then you could roll a die, and a "6" would give you covid. The model I propose would then be: once an hour, you go to each one of your fellow passengers and roll the die. If you get a "6" even once, you get covid. Obviously U=1/6 is way larger than what would be realistic, but I hope you get the picture.)

Then the probability of not getting covid is, in case 1 (the bus):

P(no_covid) = ((1-U)^50)^6 = (1-U)^300

and in case 2 (the plane):

P(no_covid) = ((1-U)^200)^2 = (1-U)^400 < (1-U)^300   (for any 0 < U < 1)

So the probability of getting covid is higher for the 2 hours on a plane with 200.
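To put numbers on the toy model above, here is a small Python sketch. The value U = 0.001 is a purely hypothetical placeholder, not an estimate of real transmission risk; the point is only that the exposure count (passengers × hours) ends up in the exponent.

```python
def p_covid(passengers: int, hours: int, u: float) -> float:
    """P(infected at least once) under the toy model: every (passenger, hour)
    pair is an independent chance u of infection."""
    exposures = passengers * hours  # 50 * 6 = 300 for the bus, 200 * 2 = 400 for the plane
    return 1 - (1 - u) ** exposures


U = 0.001  # hypothetical per-passenger, per-hour risk
bus = p_covid(50, 6, U)
plane = p_covid(200, 2, U)
print(f"bus: {bus:.3f}, plane: {plane:.3f}")  # bus: 0.259, plane: 0.330
```

Any other U strictly between 0 and 1 changes the numbers but not the ordering, since 400 exposures always beat 300.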

Again, it's an extremely naive assumption, and it supposes a very specific contagion model.


Thanks for the detailed response! At a quick look, for the second (bus/plane) example, it appears you didn't use the binomial distribution formulas? Is there a name I could google for the formulas you used?

edit: I'm puzzled by the fact that the probability term was RAISED to the number of hours rather than multiplied by it. So the chance isn't linear in the amount of time?

Thanks!


> it appears you didn't use the binomial distribution formulas for this

You would use binomial distribution formulas if the problem was more complex. Say, if you cared about how many times you get covid infected, or how many times your winning ticket is drawn. The problem here is a special, simpler case.
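A short Python sketch of how the two relate: the binomial formula with k = 0 is exactly the (1-U)^n product used above, so "at least one infection" is 1 - P(k = 0). The values n = 300 and p = 0.001 are hypothetical, and `binomial_pmf` is a hand-written helper, not a library function.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k 'hits' in n independent trials, each a hit with probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 300, 0.001  # hypothetical: 300 passenger-hours at U = 0.001

# "At least one infection" only needs the k = 0 term...
p_at_least_one = 1 - binomial_pmf(0, n, p)
print(p_at_least_one, 1 - (1 - p) ** n)  # the same number

# ...while "how many times" questions need the other terms too
print(binomial_pmf(2, n, p))  # exactly two hits
```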

> Is there a name I could google for the formulas you used ?

Any introductory probability book will start with such problems.

If you really want to google something, the basic assumption here is that the events are "independent". You can find examples here: https://en.wikipedia.org/wiki/Independence_(probability_theo...


Thanks! I appreciate the links/google fodder - it'll let me RTFM more on this stuff...


> I'm puzzled by the fact that the probability term was RAISED to the number of hours, rather then multiplied by it? so the chance isn't linear with the amount of time ?

Yes, it is not linear. Imagine you are repeatedly throwing a die. The probability that you get a "6" at some point is absolutely not linear in the number of throws:

P(not a single "6" in N throws) = (5/6)^N

so

P(at least one "6" in N throws) = 1 - (5/6)^N

Edit: Also, raw probabilities are constrained to the interval [0, 1], so they are rarely linear in a parameter; if they were, they could easily escape the [0, 1] interval. Imagine that we had P(covid) = some_constant * some_parameter. There would be values of the parameter such that P(covid) > 1, which does not make sense.
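A quick Python comparison of the exact die formula against the naive "linear" guess N × 1/6 (the helper name is illustrative):

```python
from fractions import Fraction


def p_at_least_one_six(n_throws: int) -> Fraction:
    """Exact P(at least one "6" in n_throws throws of a fair die)."""
    return 1 - Fraction(5, 6) ** n_throws


for n in (1, 2, 6, 12):
    naive = Fraction(n, 6)  # the "linear" guess: n * (1/6)
    print(n, round(float(p_at_least_one_six(n)), 3), float(naive))
```

The two agree at one throw, but by 6 throws the linear guess already claims certainty (1.0) while the true value is about 0.665, and at 12 throws it gives 2.0, which is not a probability at all.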


Thanks! This will probably make more sense once I can RTFM the probability stuff..


Are you worried about climate change, crash risk or covid, or all of them?


not really any of the above per se, I just wanted to know how to figure the probabilities..


2/50 > 6/200, so 2 draws from a pool of 50 give a better chance.


I _think_ that's for a single drawing? From the document my friend linked, I think for multiple drawings you use the binomial thing?


It is a good approximation when the probabilities are very small, like here. For larger probabilities you can do the same calculation, and their relative order still holds even though the values themselves don't: both are approximately exponential, with a rate roughly equal to their fraction, and the exponents add linearly.
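When tickets are not put back, draws/pool is exact rather than approximate. If instead each drawn ticket is put back before the next draw (an assumption the thread doesn't specify), it becomes an approximation. A small Python comparison, with an illustrative helper name:

```python
from fractions import Fraction


def p_drawn_with_replacement(pool_size: int, draws: int) -> Fraction:
    """P(a given ticket comes up at least once) if every drawn ticket is put back."""
    return 1 - Fraction(pool_size - 1, pool_size) ** draws


for pool, draws in ((50, 2), (200, 6)):
    exact = p_drawn_with_replacement(pool, draws)
    print(pool, draws, float(exact), draws / pool)  # exact value vs draws/pool shortcut
```

Both exact values sit just below the shortcut (0.0396 vs 0.04, and about 0.0296 vs 0.03), and the ordering between the two options is unchanged.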


thanks very much!


Ahh - a friend pointed me to this(1) - a 'binomial distribution'. I think that's what I want?

1: https://www.statisticshowto.com/probability-and-statistics/b...


Hmm - looking at the lottery version of this...

Is it really as simple as a 2-in-50 chance vs a 6-in-200 chance?


Try this calculator. I don’t know if they have updated it with options to account for increased infectiveness of the delta variant or for the protective effects of masks and vaccines since I found and submitted it here though. https://www.mpg.de/16015780/corona-covid-19-aerosol-infectio...


Thanks - but I'm actually looking for how to solve problems like this myself and google fodder to look for more info.

I think this is some sort of probability, but I'm not sure how the multi-hour stuff plays in...


You start with a model. Depending on how closely you want to match real-world data, that may involve answering questions such as

- if somebody sneezes, how many virus particles get out? (may depend on person’s age, mask-wearing, etc)

- how far do they travel? (may depend on person’s height, lung volume, etc)

- how long do they survive? (may depend on temperature, humidity, etc)

- how likely are they to get to uninfected people? (may depend on what people are there. Kids in a kindergarten, for example, tend to get closer together than workers in an office)

As you can see, that can get complex fairly fast.

If you want it simpler, you can start with such things as

- If you’re in a room with N persons who are spreading virus for an hour (or a minute, a second, etc), how many virus particles will you receive?

- What’s the probability you’ll become ill if you receive P virus particles in a given time period?

- if you become ill, how serious will it be?

- if you receive P particles in a time period and don’t get ill, and repeat that immediately, is the probability of getting ill in that second period the same as in the first, higher, or lower?

You can make each as simple or complex as you want.
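As an illustration of the simpler list above, here is a hedged Python sketch of one possible model. Everything in it is an assumption: the dose is taken to be linear in the number of spreaders and in time, every passenger is treated as a spreader, and the chance of getting ill follows an exponential dose-response curve, P(ill) = 1 - exp(-r * dose), i.e. each particle independently makes you ill with a tiny probability r. The rate and r values are made-up placeholders, not real virology.

```python
import math


def dose(n_spreaders: int, hours: float, particles_per_spreader_hour: float) -> float:
    # Assumption: dose grows linearly with the number of spreaders and with time
    return n_spreaders * hours * particles_per_spreader_hour


def p_ill(received: float, r: float) -> float:
    # Assumption: exponential dose-response, each particle independently
    # makes you ill with a small probability r
    return 1 - math.exp(-r * received)


RATE, R = 10.0, 1e-4  # hypothetical placeholder parameters
for label, n, h in (("bus", 50, 6), ("plane", 200, 2)):
    print(label, round(p_ill(dose(n, h, RATE), R), 3))
```

Under these particular assumptions the result depends only on people × hours; a model where the rate or the dose-response changes over time would not have that property.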

Once you have a model, calculations can be easy or very complex.

That’s what the spreadsheet given in that link does for one model.

Even once you’ve done the calculations, you’re not done: you have to validate your model to know whether the numbers it spits out actually make sense.


>>if you receive P particles in a time period and don’t get ill, and repeat that immediately, is the probability of getting ill in that second period the same as in the first, higher, or lower?

This is the part I was trying to figure out how to compute. I think the binomial distribution stuff deals with that, e.g. how to calculate an 'overall' probability from n trials with probability p each.

But I'll have to read up more on it


That depends on how you model infection. If you treat it as gunshots, that is: if you didn’t get hit by the infection in the first time period, you’re 100% healthy and just as likely to get hit in the second time period, then the probability of surviving n trials, each with probability p of getting infected, is (1-p)^n, so the probability of getting infected at least once is 1 - (1-p)^n.

That’s a simple model, though. Another model may have your resistance decrease after being ‘attacked’ by the virus in the first period (like a boxer who gets knocked out in round 10 by a blow that wouldn’t have hurt him in round 1), and/or increase (I would guess over longer time periods) because your body develops some resistance.

Such models are likely better in the sense that they can better predict what happens, but also more complex, and their parameters may be harder to estimate.
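A minimal Python sketch contrasting the gunshot model with the two alternatives described above. The per-period risks are treated as "probability of infection in this period, given you are still healthy", so survival is a product of (1 - p_i) terms. The growth factors and base risk are made-up illustrative numbers, and `risk_schedule` is a hypothetical helper, not an established model.

```python
def p_infected(per_period_p: list[float]) -> float:
    """P(infected at least once) when period i infects a still-healthy person
    with probability per_period_p[i]."""
    p_healthy = 1.0
    for p in per_period_p:
        p_healthy *= 1 - p  # stay healthy through this period too
    return 1 - p_healthy


def risk_schedule(base_p: float, n: int, factor: float) -> list[float]:
    """Hypothetical schedule: each survived period multiplies the next period's
    risk by `factor` (> 1: weakening, the 'boxer'; < 1: building resistance)."""
    return [min(1.0, base_p * factor**i) for i in range(n)]


base_p, n = 0.01, 10  # made-up numbers for illustration
for label, factor in (("gunshot", 1.0), ("boxer", 1.3), ("resistance", 0.8)):
    print(label, round(p_infected(risk_schedule(base_p, n, factor)), 3))
```

With factor = 1.0 this reduces to the gunshot formula 1 - (1-p)^n; the other two show how a changing per-period risk pushes the total up or down.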


thanks - those were very understandable examples!



