
I've been thinking about this for an hour, and I'm now convinced that the author is wrong. The fact that we found out about one of the children from the father means that not all probabilities are equal, even though they're treated here as if they were.

The difference is between the information being offered and the information being determined independently. I'll illustrate with the boy/girl problem, for simplicity's sake.

If we ask a man if he has at least one boy, and he says yes, we can work out the chance the other child is a boy like so:

Assume all four possibilities are equally likely:

   BB
   BG
   GB
   GG
If we ask him if he has at least one boy, and he says yes, we effectively filter off GG, which brings the list down to:

   BB
   BG
   GB
Therefore, the chance of the other child being a boy is 1/3. Pretty straightforward.
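A minimal Python sketch of the enumeration above, assuming the four (older, younger) families are equally likely; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All four equally likely (older, younger) families: BB, BG, GB, GG.
families = ["".join(f) for f in product("BG", repeat=2)]

# "Do you have at least one boy?" -> "yes" filters out GG.
at_least_one_boy = [f for f in families if "B" in f]

# Probability the other child is also a boy = share of BB among the survivors.
p_other_boy = Fraction(at_least_one_boy.count("BB"), len(at_least_one_boy))
print(at_least_one_boy, p_other_boy)  # ['BB', 'BG', 'GB'] 1/3
```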

However! Because the father offered the information on his own, it effectively turns this into the author's other problem: the older child is a boy, so what is the gender of the younger child?

The trick is that he's equally likely to give information about either of his children, therefore there are eight possibilities:

He gives information about child A:

   BB - B
   BG - B
   GB - G
   GG - G
He gives information about child B:

   BB - B
   BG - G
   GB - B
   GG - G
There are still 3 possibilities, but BB is twice as likely as each of the others: if both children are boys, he's definitely going to reveal the gender of one of them as a boy, whereas if one is a girl and one is a boy, there's only a 50% chance he will.

   BB - 50%
   GB - 25%
   BG - 25%
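The eight-case table can be checked the same way. This sketch assumes, as the comment does, that all four families are equally likely (1/4 each) and that the father picks either child with probability 1/2; the names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

families = ["".join(f) for f in product("BG", repeat=2)]

# Assumption: P(family) = 1/4, and the father describes child A (index 0)
# or child B (index 1) with probability 1/2 each.
weight_given_says_boy = defaultdict(Fraction)
for family in families:
    for picked in (0, 1):
        if family[picked] == "B":  # this choice leads him to reveal "boy"
            weight_given_says_boy[family] += Fraction(1, 4) * Fraction(1, 2)

# Normalize over the cases in which he actually says "boy".
total = sum(weight_given_says_boy.values())
posterior = {f: w / total for f, w in weight_given_says_boy.items()}
print(posterior)  # BB: 1/2, BG: 1/4, GB: 1/4
```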
So, the answer to this:

You meet a man on the street and he says, “I have two children and one is a son born on a Tuesday.” What is the probability that the other child is also a son?

Is 1/2

The answer to this:

A man has two children, and one is a son born on a Tuesday. What is the probability that the other child is also a son?

Is 13/27

It's nitpicky, but I think the author should be very exact about this kind of thing, since he's trying to clear things up.
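A quick Monte Carlo check of the two readings, as a sketch under the commenter's assumptions: sexes are 50/50, birth days are uniform over seven days (day 1 stands in for Tuesday, an arbitrary label), and in the "volunteered" reading the father describes one child chosen at random. The function name `simulate` and the seed are hypothetical.

```python
import random


def simulate(n_families: int = 400_000, seed: int = 1) -> tuple[float, float]:
    """Estimate P(two sons) under the 'volunteered' and 'filtered' readings."""
    rng = random.Random(seed)
    volunteered = [0, 0]  # [families kept, of which have two sons]
    filtered = [0, 0]
    for _ in range(n_families):
        # Each child is a (sex, day) pair; day 1 plays the role of Tuesday.
        kids = [(rng.choice("BG"), rng.randrange(7)) for _ in range(2)]
        two_sons = kids[0][0] == kids[1][0] == "B"
        # Reading 1: the father describes one child picked at random.
        if kids[rng.randrange(2)] == ("B", 1):
            volunteered[0] += 1
            volunteered[1] += two_sons
        # Reading 2: we keep every family with at least one Tuesday son.
        if ("B", 1) in kids:
            filtered[0] += 1
            filtered[1] += two_sons
    return volunteered[1] / volunteered[0], filtered[1] / filtered[0]


vol, filt = simulate()
print(f"volunteered: {vol:.3f}, filtered: {filt:.3f}")
```

The first estimate should land near 1/2 and the second near 13/27 ≈ 0.481. The two exact values differ by less than 0.02, so a large sample is needed to tell them apart.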




Yeah, what matters is the contents of the initial set of families over which we determine probability.

"A man has two children, and one is a son born on a Tuesday. What is the probability that the other child is also a son?"

If the man is randomly chosen from the set of all families the answer is 1/2.

If the man is randomly chosen from the set of all families with a son born on a Tuesday then the answer is 13/27.

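The reason for the difference is that a boy/girl family has a 1/7 (7/49) chance that the boy was born on a Tuesday, whereas a boy/boy family has a 13/49 chance: slightly less than double, because the case where both boys were born on a Tuesday is counted only once.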

B G (7/49 probability of a Tuesday boy)

G B (7/49 probability of a Tuesday boy)

B B (13/49 probability of a Tuesday boy)

13 / (7 + 7 + 13) = 13/27
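The 13/27 count can be reproduced by brute force over all 14 × 14 = 196 (sex, day) combinations. A sketch, assuming every combination is equally likely; `TUESDAY = 1` is an arbitrary label, not anything from the thread.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # arbitrary label for Tuesday among days 0..6
children = list(product("BG", range(7)))      # 14 (sex, day) outcomes per child
families = list(product(children, repeat=2))  # 14**2 = 196 (older, younger) families

# Keep every family with at least one son born on Tuesday.
kept = [f for f in families if ("B", TUESDAY) in f]

# Tally the survivors by the sexes of (older, younger).
counts = {}
for older, younger in kept:
    key = older[0] + younger[0]
    counts[key] = counts.get(key, 0) + 1

p_two_sons = Fraction(counts["BB"], len(kept))
print(counts, p_two_sons)  # {'BB': 13, 'BG': 7, 'GB': 7} 13/27
```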


> If the man is randomly chosen from the set of all families the answer is 1/2.

By assumption, the man has a son born on Tuesday, so this is hardly relevant.

If B is a subset of A, then choosing x uniformly at random from A, conditioned on x being in B, is the same as choosing x uniformly at random from B.


I agree, but set B is not a uniformly chosen subset of A in this case. That is the core of the trick. The rule for choosing B is intuitively uniform but actually slightly favours families with a girl and a boy over those with two boys.


You're making the exact mistake the author is cautioning against, which is assuming the day doesn't matter. Write out all the possibilities (see my other comment in this thread), eliminate the dupe, and you get 13/27.


Did you read my post carefully? I do get 13/27 when the information comes from a neutral third party, i.e. filter for all two-child families with at least one son born on a Tuesday and ask what the chance is that the other child is also a son.

But the fact that the father voluntarily offered up the information changes the probability distribution. We can assume he's selecting one of his children at random, and revealing their birthday and gender.

If only one of his children is a male/Tuesday, there's a 50% chance he'll say male/Tuesday.

If both are, there's a 100% chance.

So I'm not counting the possibility twice; I'm saying that, given that the father reveals male/Tuesday, it's disproportionately likely that this is because he has two sons born on a Tuesday, compared to any other possibility.
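Under this reading (all 196 families equally likely, and the father describes one of his two children picked at random) the posterior can be computed exactly with Bayes' rule. A minimal sketch; the day labels and helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

TUESDAY_SON = ("B", 1)  # day 1 is an arbitrary label for Tuesday
children = list(product("BG", range(7)))
families = list(product(children, repeat=2))  # 196 equally likely families


def p_statement(family):
    """P(he says 'a son born on a Tuesday' | family): he picks a child at random."""
    return Fraction(sum(child == TUESDAY_SON for child in family), 2)


prior = Fraction(1, len(families))
p_s = sum(prior * p_statement(f) for f in families)


def posterior(event):
    """P(event | statement) by Bayes' rule over all 196 families."""
    return sum(prior * p_statement(f) for f in families if event(f)) / p_s


both_tuesday_sons = posterior(lambda f: f == (TUESDAY_SON, TUESDAY_SON))
two_sons = posterior(lambda f: f[0][0] == f[1][0] == "B")
print(p_s, both_tuesday_sons, two_sons)  # 1/14 1/14 1/2
```

For comparison, simply filtering the 196 families for "at least one Tuesday son" leaves 27, of which exactly one has two Tuesday sons (1/27). Under the random-child model that family's share roughly doubles to 1/14, which is the effect described above.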


> We can assume he's selecting one of his children at random, and revealing their birthday and gender.

This is the entire point of the article, IMO: we have to make some assumption about how we selected this guy to talk to, and how he chose what to tell us. It's not pinned down by the statement of the problem, and what you might consider a natural assumption is not necessarily what other people might assume.

Which is why these problems tend to suck...


I definitely agree. I hate problems like this, because the difficulty is caused by the ambiguity of English, not by the problem itself.

But even under other assumptions about why the father selects the child he does (sorting by date, males first, etc.), the author's answer of 13/27 is still almost certainly wrong; he should have just taken the father out of the equation completely.


I don't think this has anything to do with the ambiguity of language or English. It does, however, like the Monty Hall problem, require you to make assumptions about how/why a speaker presents certain information.

Think of it like this. What would the father say if he did in fact have two boys who were both born on Tuesday. Would he really say, "I have two children and one is a son born on a Tuesday"? Wouldn't he instead say, "I have two children and both are sons born on a Tuesday."?

I mean he could say it the first way, but such a comment would be borderline misleading. To say you have one son born on Tuesday when in fact you have two is technically correct, but I think the problem assumes that the man is speaking somewhat plainly.

So I do agree with you that communication intent is ambiguous, but I agree with some others that this is sort of the whole point of the problem, and it's not always immediately obvious that statistical information is hiding in seemingly irrelevant data.


I agree; this problem should not be discussed in English. Python seems better: http://news.ycombinator.com/item?id=3290313


The probability that he would make that statement given what his children are is a different question than the probability that the other child is also a boy given that he made that statement. Your conditional probabilities are correct, but your conclusion about what they mean for the original question is flawed.

    N = Total number of ways to have 2 children over 7 days = 14^2 = 196
    B = Two sons born on Tuesday.
    O = Exactly one son born on Tuesday.
    A = At least one son born on Tuesday. 
    T = Two sons.
    S = The statement.
Priors:

    P(B) = 1/N = 1/196 = 0.005
    P(O) = 26/N = 26/196 = 0.133
    P(A) = P(B) + P(O) = 27/N = 27/196 = 0.138 
Your conditional probabilities:

    P(S|O) = .5  
    P(S|B) = 1  
An interesting number we can infer from your conditionals is the probability that a father selected at random would make the statement:

    P(S) = P(S|B)P(B) + P(S|O)P(O) = 1.0 * 0.005 + 0.5 * 0.133 = 0.0715 
But the question we asked about the other child already takes into account the fact that he did make that statement, meaning we're back to only caring about those 27 cases:

    P(B|S) = 1/27 = 0.037
    P(O|S) = 26/27 = 0.963
    P(A|S) = 27/27 = 1.0
    P(T|S) = 13/27 = .481 
Another interesting number we can infer from your conditionals is the probability that a father would make the statement given that at least one of his children was a boy born on Tuesday:

    P(S|A) = P(S|B)P(B|A) + P(S|O)P(O|A) = 1.0 * 0.037 + 0.5 * 0.963 = 0.519
If you still think this is incorrect, can you point to exactly which number is wrong and explain why?
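The arithmetic in the comment above can be redone with exact fractions. This sketch only reproduces the stated priors, the stated conditionals, and the two derived quantities P(S) and P(S|A); the variable names mirror the comment's symbols.

```python
from fractions import Fraction

N = 14 ** 2                   # 196 (sex, day) combinations for two children
P_B = Fraction(1, N)          # two sons, both born on Tuesday
P_O = Fraction(26, N)         # exactly one son born on Tuesday
P_A = P_B + P_O               # at least one son born on Tuesday

# Conditionals from the parent comment (father picks a child at random).
P_S_given_B = Fraction(1)
P_S_given_O = Fraction(1, 2)

# Total probability that a randomly chosen father makes the statement.
P_S = P_S_given_B * P_B + P_S_given_O * P_O
# Probability of the statement given at least one Tuesday son.
P_S_given_A = P_S_given_B * (P_B / P_A) + P_S_given_O * (P_O / P_A)

print(P_S, round(float(P_S), 4))                  # 1/14 0.0714
print(P_S_given_A, round(float(P_S_given_A), 3))  # 14/27 0.519
```

Exactly, P(S) = 14/196 = 1/14 ≈ 0.0714; the 0.0715 above comes from rounding the priors before multiplying.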


The problem I have is that you're assuming that each of the 27 outcomes has equal probability, but the chance that we received the information in the way we did makes that a flawed assumption.

The best analogous problem is the German Tank Problem:

http://en.wikipedia.org/wiki/German_tank_problem

If we have destroyed a single German tank with serial number 100, we can at least begin to estimate the size of the German force by asking questions like:

"If they have 200 tanks, what was the chance that the one we randomly destroyed had this serial number? 500? 1000?"

And then combine the answers for every fleet size from 100 up to infinity to form a probability distribution. You can then say that there is an x% chance that Germany has 500 tanks, and a y% chance that it has 10,000 tanks.

However, if instead we asked 'does there exist a German tank with serial number 100?' and the answer is yes, this does NOT tell us anything beyond the fact that they have at least 100 tanks.

We have the exact same information, but how it was determined changes the outcome drastically.

Does that make sense?
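The contrast can be made concrete with a small Bayesian sketch. It assumes tanks are numbered 1..N and a flat prior on N; because a flat prior over all N combined with a 1/N likelihood cannot be normalized, it also assumes a hypothetical cap `N_MAX = 10_000`. Both constants are illustrative, not from the comment.

```python
SERIAL = 100    # serial number observed on the destroyed tank
N_MAX = 10_000  # hypothetical cap on fleet size so the flat prior is proper


def posterior(likelihood):
    """Posterior over fleet size N (the list index) under a flat prior up to N_MAX."""
    weights = [likelihood(n) for n in range(N_MAX + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Reading 1: we destroyed a random tank and it happened to carry serial 100.
# With tanks numbered 1..N, that has probability 1/N whenever N >= 100.
sampled = posterior(lambda n: 1 / n if n >= SERIAL else 0.0)

# Reading 2: we asked "does tank 100 exist?" and got "yes".
# That answer is certain for every N >= 100, so it only rules out N < 100.
existence = posterior(lambda n: 1.0 if n >= SERIAL else 0.0)

for label, post in (("sampled", sampled), ("existence", existence)):
    print(label, "P(N <= 500) =", round(sum(post[:501]), 3))
```

Under these assumptions the sampled observation puts roughly 35% of the posterior on fleets of 500 or fewer, versus roughly 4% for the existence answer: the same serial number, learned through a different process.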


No, it doesn't make sense, because it doesn't apply here. We aren't estimating the count of anything. We know he has two kids, we know there are two genders, and we know there are seven days. All other relevant counts can be calculated directly from these, no estimation required.

I already showed the probability that we would receive the message the way we did, assuming we're sampling fathers with two children, and it's pretty low. If we drop the sampling assumption, it would go even lower. But that's irrelevant to the actual question, because we've already won that lottery. I've also shown the probability that we would get the statement we got given that the father had at least one son born on a Tuesday, but again, we already won that lottery.

If you still insist, can you please stop talking in hand-wavy fake math and show some actual concrete numbers? To start, if the 27 possibilities are not equally likely, what are the actual probabilities and why?


The issue here is with assumptions - you have made a different set of assumptions from the author, and hence are getting a different result.

A lot of people here are having similar issues, by misreading exactly what the initial proposition means.

Your reasoning above relies on the 'likeliness' of a man giving you the information, which is something that is not meant to be a part of the problem. Although it is phrased as a man 'telling' you something, that statement is really a metaphor for 'you determine the following piece of information, 100% truthfully'.

In particular, your explanation assigns agency to the man - that if he does in fact have a son, he may or may not choose to reveal the truth 'I have a son'. However it makes no allowance for the man lying - so you are assuming if he answers it will be truthfully, but you are allowing him the lie of omission.

Whilst there is nothing specific that rules out your interpretation, it is not what is intended. Read it instead as:

----

There exists a man, A.

A has exactly two children.

The statement 'A has at least one Son, B' is true

What is the chance that the statement 'The non-B child of A is a son' is true?

----


I agree that that is what the author intended to communicate. But I actually think that's a bigger stretch than my own interpretation - it changes how the information was determined, which has a definite impact on the outcome.

The reason why I posted was to suggest that the author should have worded it the second way, i.e.

A man has two children, and one is a son born on a Tuesday. What is the probability that the other child is also a son?

Which leaves no doubt. I guess I didn't really make that clear enough with my original post.


I'd also observe that, read carefully, this article is really about how important assumptions are, not about the problem per se. The Peter Winkler quote is key.


The distinction you're making is that in your first example both the man and the day of the week are arbitrary.

In the second example, only the man is arbitrary.

With those assumptions what you say is correct.

However, the question is "You meet a man on the street and he says, “I have two children and one is a son born on a Tuesday.”" and not "You meet a man on the street and he tells you he has two children, that one is a son and he tells you what day of the week he was born". So the day is not arbitrary, it's specifically Tuesday.


0.48 ~ 0.5 Close enough for statistical purposes!


That last comparison is false. In both cases p=13/27 .



