In the correct version of this story, the mathematician says "I have two children", and you ask, "Is at least one a boy?", and she answers "Yes". Then the probability is 1/3 that they are both boys.
If I told someone that in real life and they replied by asking, "Is at least one a boy?", I would probably back away very, very slowly.
The main point is that, given the way Atwood asked the question, the correct answer is 50%, which wasn't the answer he intended to discuss and indicates he himself didn't fully understand the subject he was writing about. It wouldn't be the first time, either.
Atwood doesn't seem wrong or particularly unclear to me. If you "met someone who told you they had two children, and one of them is a girl", then presumably we should imagine this person saying: "I have two children and one of them is a girl," and not "I have two children, X and Y, and X is a girl." Obviously in the first case, we don't know if it's X or Y that's the girl, so the set of possible worlds is [(X-G & Y-G), (X-G & Y-B), (X-B & Y-G)], so we get the 2/3 answer. But maybe other people don't share my linguistic intuitions...
If I met someone who told me "I have two children, and one of them is a girl," I would be pretty sure they have one girl and one boy. If they had two girls, they would have said "I have two children, and both of them are girls." So the English statement has some implicit information. On the other hand, if I were given a riddle with that statement, I would know the implicit information may be intentionally misleading. In this case, as Paul Buchheit says, the statement does not have enough information.
Yes, the question must be very carefully worded (which is kind of my point, if I have one).
In order to get the "unintuitive" outcome, there must be some element of selection (much like the Monty Hall problem has). In your formulation of the question, the GG possibility has already been eliminated because the mathematician answers "yes".
What do you think of this story? I get a frantic call from my friend at the airport. He's been captured by the TSA for bringing a nose hair trimmer onto the plane. He was taking his brother's two kids (whom I know nothing about) back home, but now needs me to pick them up at the airport. The line goes dead. I show up at the airport and there are 4 pairs of children waiting behind the glass to be claimed, conveniently arranged in pairs of GG, GB, BG, and BB. Each pair is equally likely to be the one I am supposed to pick up.
If my friend had told me "their names are Sarah and--" before the line went dead, I would mentally eliminate BB. Sarah could be any of the four girls. The three remaining options are not equally likely to contain Sarah, because the first one has two girls, and either one could be Sarah. So, the probability of Sarah being the girl in BG is 1/4 and of being the girl in GB is 1/4, which leaves 1/2 for GG: the probability of the other child being a girl is 1/2. It's not so much that BG or GB is eliminated in this case, but that the probability of her being in the GG group is higher. But wait a second, I am not trying to find Sarah, I am trying to find the pair of kids that is his pair of kids. Is the probability that a group contains Sarah different from the probability that a group of kids is his? Maybe that's my bad assumption... maybe "at least one is a girl" is some precise formulation I don't understand that means the GG group isn't twice as likely to contain that girl. I'll get back to that. (Possibly her name is "at least one"?)
But what if he instead told me something over the phone that would only eliminate the possibility of it being two boys? What if he had looked out over the groups of children waiting to be picked up before they blindfolded him, panicked, and said "it's not the two boys--"? Is knowing that one of the children is a girl different from knowing they aren't both boys? I think it is, because the latter is a statement about the set of events, and the former is a statement about a single event. When we talk about the pairs of children, the set of events, we are not conveying direct information about the individual events. In this second case he is not conveying information about the individual events, and the three remaining choices remain equally likely, 1/3 each.
Looking back to "at least one is a boy": if we interpret that as a statement about the set of events, I would more precisely restate Eliezer's question as "Does the set of two children contain at least one boy?". This is why the important part of the story is the word "mathematician". The mathematician is talking about eliminating sets of events when she says "Yes". Ordinary people would just talk about the gender of a child.
You know what's really scary? I saw this out of context, on news.yc/newcomments, and I didn't realize till I came to "pairs of GG, GB, BG, and BB" that it was a fictional setup.
The author apparently doesn't understand where the confusion in the original question (or variations) comes from.
One way of phrasing it is asking,
"If you have an unrelated man and woman and they both have two children (one of which is a boy), where the oldest child of the man is a boy - what are the odds that they both have two boys?"
(There are any number of variations on this - what are the odds of the man having two boys? what are the odds of the woman having two boys? what are the odds of the man but not the woman having two boys? .. all the same problem, but stated differently).
This 'loads' the question by implying (which seems incorrect on the surface) that there might be a difference between the chances of a man and a woman having two boys.
Next up is a bit of probability theory. In the case of the woman, no order is stated, so the chances of her two children have no connection - the events are unrelated. The man, however, has as a first child a boy (which eliminates the possibility of this being a girl).
And as for the overall birthrate of men vs women or the possibilities of having twins/triplets/etc (and their male/female ratio) ... well, that's really out of the scope of a fairly trivial question statement such as this.
No, you're wrong. The thing is that there are two kinds of confusion that arise from Atwood's problem. The first kind of confusion comes from not understanding how probabilities work, which you discuss. The second kind -- which is what Paul Buchheit is talking about -- comes from noticing that the statement "I have two children, and one of them is a girl" can be parsed in two different ways. It can be parsed as, "I have two children, and at least one of them is a girl," or as "I have two children, and the gender of one of them is #{my_first_child.gender}." Despite what many commenters in this thread are saying, these are not the same.
That's the real problem here: the English language is inexact. The words that Atwood used to describe the scenario actually describe at least two mathematically distinct scenarios.
The Monty Hall problem suffers from this fact, too, but not as badly -- because both interpretations yield the same conclusion, namely, that you should switch doors. It's just that under one interpretation, you get a car 1/2 of the time, and under the other you get it 2/3 of the time. Also, under the interpretation that yields a car 1/2 the time, it's logically implied that the host is willing to open a door and reveal a car -- which most people use to rule out that interpretation, if only subconsciously.
Maybe this is because English is my first language - but I cannot imagine how "I have two children, and one of them is a girl" could mean: "I have two children, and the gender of one of them is #{my_first_child.gender}." - if someone wanted to convey that meaning I would expect him to say: "I have two children, and the gender of my first child is girl". But frankly, as someone already pointed out, normally the sentence "I have two children, and one of them is a girl" would implicitly mean that the other child is a boy.
It's the difference between the situation where a child is chosen and then the gender announced, and the situation where a girl is found, and then the gender announced. If you started out saying "Is one of them a girl?", and only continuing if the answer is yes, then the probability changes.
Actually, even under the "correct" interpretation of the host's actions in the Monty Hall problem (where he knows where the car is and will never open that door and the participant knows this), there's still an analogy, an even closer one, to this ambiguity.
Just as here, where if you ask the question in a certain way the choice the person makes, when they have a choice, becomes important (which gender to announce in case of GB/BG), in Monty Hall, if you ask the question in one common way, the choice of the host becomes important. The host can choose which door to open if they're both goats. If, in that case, the host picks randomly, the probability of a win by switching remains 2/3. But suppose you picked door 1, and the host will always prefer door 3 when he can, for whatever reason, and you know this. Then given the information that he opened door 2 or door 3, the probability of winning by switching is 100% or 50%, respectively. So if you ask the Monty Hall question this way: "I picked a door and the host opened another, what's my probability of winning now?", then to get 2/3 the question should include the information that the host selects randomly between doors 2 and 3 when the car is behind door 1. Admittedly that's a bit pedantic, but there you go.
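For anyone who wants to check the 100%/50% figures, here is a minimal sketch that works them out exactly with Bayes' rule. It assumes the standard setup (car equally likely behind each door, you picked door 1, the host never opens your door or the car's door). The function name `p_win_by_switching` and the parameter `prefer_door_3` are made up for this illustration; `prefer_door_3` is the chance the host opens door 3 when the car is behind door 1 and he is free to open either door.

```python
from fractions import Fraction


def p_win_by_switching(opened, prefer_door_3):
    """P(switching wins | you picked door 1 and the host opened `opened`).

    `prefer_door_3` (hypothetical parameter) is the probability that the host
    opens door 3 when the car is behind door 1, i.e. when he has a free choice.
    """
    prior = Fraction(1, 3)  # car is equally likely behind doors 1, 2, 3
    # P(host opens `opened` | car position); he never opens door 1 or the car.
    host_opens = {
        1: prefer_door_3 if opened == 3 else 1 - prefer_door_3,
        2: Fraction(1 if opened == 3 else 0),  # car behind 2: he must open 3
        3: Fraction(1 if opened == 2 else 0),  # car behind 3: he must open 2
    }
    joint = {car: prior * host_opens[car] for car in (1, 2, 3)}
    # Switching wins exactly when the car is not behind door 1.
    return (joint[2] + joint[3]) / sum(joint.values())


for label, pref in [("random host", Fraction(1, 2)), ("always door 3", Fraction(1))]:
    for opened in (2, 3):
        print(label, "opened door", opened, "->", p_win_by_switching(opened, pref))
```

With a host who picks at random, both cases come out at 2/3. With a host who always prefers door 3, seeing door 2 opened gives 1 and seeing door 3 opened gives 1/2, which averages back to 2/3 over many games.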
"I have two children, and the gender of one of them is #{my_first_child.gender}." Are they not the same because of FIRST_child (which would be true), or is the FIRST just an accident here? I for one don't see how the original statement could be interpreted as describing the second algorithm. We clearly get the information that there is a girl among the children, and nothing is said about it being the first child.
Basically we have a situation where we see a person and we know he has two children, at least one of them being a girl. If it is impossible to derive correct probabilities from that information, then probability theory is useless. There is no algorithm, it is just a situation.
In retrospect using "#{my_first_child.gender}" was a pretty big mistake. I should have put "#{one_of_my_children.gender}". For some reason I thought the former would actually be less confusing than the latter.
Isn't the probability still 2/3 even in the proposed alternative? The fact that we "arbitrarily announce the gender of one of the children" provides new information for calculating the probability, which is ignored if you assume that the original 50/50 distribution is unchanged by the announcement.
In an extreme example, consider this algorithm, analogous to the one in the article:
1. Choose a random parent that has exactly two children
2. Announce the gender of _both_ children
3. Ask about the odds that the parent has both a boy and a girl
In this situation, it would be ridiculous to say that the odds calculated at step 3 are still 50/50. Because of step 2, it is either certainly true (100%) or not true (0%) that the parent had both a boy and a girl. Granted, on average with repeated trials, half the time the step-3 value will be 100% and the other half 0%. But still, the step 2 information changes the value at step 3.
You've gained no useful information. The person who announces the gender of one child can say one of two things: "this child is a girl" or "this child is a boy." However, these two statements are symmetric in their effect on P(one child is a girl and one is a boy).
There's a huge difference between announcing the gender of one child and announcing the gender of both children. If the announcer announces both children's genders, he can say that "both are boys," that "both are girls," or that "one is a boy and one is a girl." But the last statement is not symmetric to the first two in how it affects P(one child is a boy and one is a girl).
Manolis's comment on Paul's post summed it up pretty well for me.
What feels counterintuitive is that announcing the gender of one child seems to increase the chances of the other child being of a different gender, but it's actually the opposite.
"Family with 2 children" ->
chance of having at least one boy: 75%
"but I have at least one girl" ->
chance of having at least one boy: 66%
"but the first one is a girl" ->
chance of having at least one boy: 50%
"but they're both girls" ->
chance of having at least one boy: 0%
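The four figures above can be reproduced by brute-force counting. This is only a sketch: it assumes each child is independently a boy or a girl with probability 1/2, and that each statement simply filters the list of families (the "selection" reading discussed elsewhere in this thread). The helper name `p_at_least_one_boy` is invented for the example.

```python
from fractions import Fraction
from itertools import product

# The four equally likely (first, second) families, under the assumption that
# each child is independently a boy ("B") or a girl ("G") with probability 1/2.
FAMILIES = list(product("BG", repeat=2))


def p_at_least_one_boy(condition):
    """P(at least one boy | condition), by counting equally likely families."""
    kept = [family for family in FAMILIES if condition(family)]
    with_boy = [family for family in kept if "B" in family]
    return Fraction(len(with_boy), len(kept))


results = {
    "family with 2 children": p_at_least_one_boy(lambda f: True),
    "at least one girl": p_at_least_one_boy(lambda f: "G" in f),
    "the first one is a girl": p_at_least_one_boy(lambda f: f[0] == "G"),
    "they're both girls": p_at_least_one_boy(lambda f: f == ("G", "G")),
}

for statement, p in results.items():
    print(f"{statement}: {p}")
```

The exact values are 3/4, 2/3, 1/2 and 0, so the "66%" in the list is 2/3 rounded down.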
It's true, as a commenter on this article mentioned, that successive children are (more-or-less) independent events, but that doesn't mean what that commenter (or our author) apparently thinks. If you specify that the older child is a girl, then the chances of the younger child being a girl are 1/2. If you don't specify which child is a girl, only that one is, then it's 1/3. This and the Monty Hall problem are just examples of the same limited-information situation.
Nope. Announcing the gender of one child does not magically alter the gender of the other child. It's like a print statement in code. If you don't believe me, try writing some code to simulate this. I will bet you an arbitrary amount of money that I'm right :)
Here's another way of looking at it: By your logic, if I announce that one of the children is a girl, then the other child only has a 1/3 chance of also being a girl. Likewise, if I announce that one of the children is a boy, then the other child only has a 1/3 chance of being a boy. Therefore, by your logic, the act of my arbitrarily announcing the gender of one of the children increases the probability that the other child is of the opposite gender from 1/2 (what it was before I spoke) to 2/3, regardless of whether I said it was a girl or boy. Hopefully you can see why this is not correct.
"Announcing the gender of one child does not magically alter the gender of the other child."
Nor is that what I said. :)
Assuming you meant "chance of being a boy" where you said "chance of being a girl", I agree that announcing the gender of one of the two children increases the probability of the other child being the opposite gender from 1/2 to 2/3.
The reason this is so is that some of the original probability was sunk in a case you've now eliminated: the case where there were two of the gender you didn't announce.
[Edit: Jeff already wrote some code, and while I haven't bothered to review it, I assume the lack of outcry about it indicates its correctness. Care to share where he got that wrong?]
So if I ask all of my friends who have two children to "tell me the gender of one of their children", then you think that after they answer the question, there is a 2/3 chance they have both a boy and a girl? (but before answering the question the probability was 1/2) Doesn't that seem a little absurd to you?
If Jeff wrote code that yielded 2/3, then he was implementing my "algorithm 1" (which has selection), not the second algorithm (which does not do any selection).
I just updated my post with an explanation that may clear things up for you.
I thought it might be instructive to implement both algorithms side-by-side in the same loop:
    import random

    jeff_stats = {'boy_and_girl': 0, 'two_girls': 0}
    paul_stats = {'boy_and_girl': 0, 'other': 0}

    for i in range(100000):
        # 1 = girl, 0 = boy
        children = [random.randint(0, 1), random.randint(0, 1)]

        # Jeff's algorithm: only count parents who can say "I have a girl"
        if children[0] or children[1]:
            print('I have a girl')
            if children[0] and children[1]:
                jeff_stats['two_girls'] += 1
            else:
                jeff_stats['boy_and_girl'] += 1
        else:
            print("Oops, I'm a liar - I have no girls, so don't count me")

        # Paul's algorithm: announce the gender of one child, count everyone
        print('I have a ' + ('girl' if children[0] else 'boy'))
        if children[0] != children[1]:
            paul_stats['boy_and_girl'] += 1
        else:
            paul_stats['other'] += 1

    print("Jeff stats: " + str(jeff_stats))
    print("Paul stats: " + str(paul_stats))
Output (ignoring the actual printouts for each person):
Jeff stats: {'boy_and_girl': 50027, 'two_girls': 24949}
Paul stats: {'boy_and_girl': 50027, 'other': 49973}
I liked tromino's explanation best: it's really an ambiguity in the English language. Under your 2nd algorithm, 1/4 of parents can't truthfully announce "I have a girl" - but in ordinary conversation, this doesn't matter, because they'd just truthfully say "I have a boy" and we don't make a distinction. It's only when we explicitly filter out parents who don't have a girl that we change the probabilities.
The answer to the question "what is the gender of one of your children" gives you some information about the gender of both of their children. It's enough information to eliminate the case where both children are the gender opposite the one they mentioned.
It can still be the case that the chances of there being a boy and a girl are 1/2, while the chances of there being a boy after you find out about a girl are 1/3, purely because you don't know whether you found out about an older or younger girl. Probability is just a function of our ignorance (at least in this case; I'm not qualified to have an opinion about whether it always is).
[Edit: Hm. Looking again, it seems Jeff Atwood's code is for a different problem he was also discussing, so never mind. After considering your update, I think you might be right.]
[Further edit follows:]
Here's some python that essentially shows you're correct, unless I flubbed it:
I'm willing to bet $5 that the probability (according to the current statistical science corpus) is still 2/3 (even if you're right that announcing the child doesn't alter the sex of the other). See my comment in the main thread.
Ok this thread is confusing. Actually, your second algorithm wasn't the point of my argument. You're right that according to your second formulation, the probability is 1/2, of course.
My point is that there is nothing wrong with Jeff's question, and I believed that you made a distinction with the Monty Hall version. But your second algorithm isn't the same as Jeff's question, my mistake.
It is very clear that my argument specifically addresses the question the way Jeff has formulated it, so you can even forget the part about "your second formulation". What I meant was Jeff's question and why there is nothing wrong with it.
In statistics, it is commonly accepted that we don't know what we don't state (for an obvious practical purpose). Therefore, when we say "One of them is a girl", it is implied that we don't know which one, because we didn't specify it.
Therefore there is nothing wrong with Jeff's question (and the probability is 2/3), at least for anyone familiar with basic statistics and probability conventions (and I suspect you are); anything else is just quibbling. I assumed your argument wasn't about nitpicking this kind of convention, but it turns out I was wrong. I guess nobody wins this one :)
This hinges on whether we decided ahead of time that we would only consider cases in which there is a girl, something which I didn't see earlier. :)
As mentioned in another thread started by Eliezer, it's the difference between "What is the sex of one of your children?" and "Is (at least) one of your children a girl?". For the second question, the results skew to 1/3 and 2/3 because we're discarding the cases where the answer is 'no'.
The light went on for me when Paul pointed out in his update that getting a random answer for "What is the sex of one of your children?" eliminates (an unknown) one of BG and GB.
Basically you're right, and Paul is right too. This is just a matter of convention; it depends on what you want to hear in the question. If Jeff had said "one of the children is a girl", but added "but we don't know which one of the two it is", this post would never have existed and we wouldn't have argued so much over nothing.
In the world of mathematical conventions you learn in school, the question is understood and you're right. In the "normal" world where you can quibble with language because there is no specific agreement, Paul is right and the question should be more precise. In both cases the conclusion remains the same: what a waste of time.
You are right about the question ("Announcing the gender of one child does not magically alter the gender of the other child"), but wrong about the statistics: even in your second formulation the probability is 2/3, not 1/2. This problem, the way Jeff has formulated it, has nothing to do with the Monty Hall problem. But still, the end probability is the same: 2/3 (that's the only common point). Tell me if there is a mistake:
The question is: What are the odds that the person has a boy and a girl, if we already know that one of the children is a girl? (If we agree on the question, then we must agree on the following probabilities.)
Possibilities are:
1/ boy / boy
2/ boy / girl
3/ girl / boy
4/ girl / girl
Since one of them is a girl, we must remove possibility number 1. That leaves us with 3 possibilities, and 2 of them have a boy. Probability: 2/3
(edited for correction, we search the probability of having a boy, not a girl :p)
This is precisely the first subtlety you encounter when you learn probability: they are not identical. Paul is right that the formulation of the question doesn't refer to the Monty Hall problem at all; this isn't the same algorithm. But in this case, the probability turns out to be the same. That's the real confusion in Jeff's article.
I mean, it's not even my own deduction, it's what you are taught when you learn probability. It's a basic and core paradigm, and I'll dig it up from Wikipedia if somebody still doubts it :)
Again, ordering is irrelevant in this problem. We want to know only the probability that there will be a boy/girl pair, not the probability that the boy/girl pair was born in a particular order.
But the same result can be obtained when taking ordering into account -- the key observation is that if you subdivide options 2 and 3 to account for sibling ordering, then you also must subdivide options 1 and 4 to account for sibling order, resulting in 6 total possibilities, of which 2 are M/F sibling pairs, 2 are M/M pairs, and 2 are F/F pairs:
1) M/m
2) m/M
3) M/F
4) F/M
5) F/f
6) f/F
Given the knowledge that one sibling is female, you then exclude 2 of the 6 possibilities (the m/M and M/m pairs), to obtain 2/4 = 50% probability that the pair of siblings is of mixed sex.
The mistake you're making is that you're including ordering on the mixed-sex pairs, but not including ordering on the same-sex pairs.
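One way to see how the plain four-row table and the 50% figure can coexist: keep the four equally likely families, but also record which child gets announced. The sketch below assumes the announced child is picked uniformly at random, which is one reading of "arbitrarily announce the gender of one of the children"; the variable names are illustrative only. F/F then shows up twice among the "announced F" outcomes, once per daughter, which has the same effect as the doubled same-sex rows in the six-row list.

```python
from fractions import Fraction
from itertools import product

families = list(product("MF", repeat=2))  # MM, MF, FM, FF: equally likely

# Assumption: the announced child (index 0 or 1) is chosen uniformly at random,
# so all 4 x 2 (family, announced index) outcomes are equally likely.
outcomes = [(family, index) for family in families for index in (0, 1)]

# Reading 1: a random child is announced, and that child happens to be female.
announced_f = [(family, index) for family, index in outcomes if family[index] == "F"]
mixed_announced = [family for family, _ in announced_f if family[0] != family[1]]
p_mixed_random_announcement = Fraction(len(mixed_announced), len(announced_f))

# Reading 2: families without any female child are filtered out beforehand.
has_f = [family for family in families if "F" in family]
mixed_filtered = [family for family in has_f if family[0] != family[1]]
p_mixed_filtered = Fraction(len(mixed_filtered), len(has_f))

print("random announcement:", p_mixed_random_announcement)  # 1/2
print("filtered on 'at least one F':", p_mixed_filtered)    # 2/3
```

Both numbers come from the same four families; what differs is the assumed procedure that produced the statement "one sibling is female".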
Don't be silly -- do you think I invented a couple of new sexes by adding lower-case letters? You're just getting thrown by the notation. I could have written the options as:
1) M/M
2) M/M
3) M/F
4) F/M
5) F/F
6) F/F
but I thought that was confusing, so I introduced a symbol to more clearly illustrate the differences between the ordering of the same-sex options.
"Somebody is wrong on the internet". I'm wasting my time and this is my last answer. If you haven't noticed, I only use strict mathematical arguments and I invite you to do the same if you intend to answer. Pure and clear maths please, no literal arguments about the sexes or God knows what; this is the only field where we can verify it.
Lacking elementary knowledge of probability isn't as dramatic as refusing to learn it, so please teach yourself now, since nobody will look at this thread again.
Here is my last point, and you can ask any teacher of formal logic, probability or maths to verify it:
For universe [M,F], the table of possibilities is:
MM
MF
FM
FF
And that's it. I'm sorry but the table you just made up doesn't exist at all; please go ask one of your teachers about it. If you still believe you are right, and you can prove it, you just discovered a new field in mathematics and probability, congratulations.
Rather than insulting me, take a moment to think about the consequences of what you're saying: you're arguing that by announcing the sex of one child in a pair, the probability of the other child's sex being a particular value changes to 2/3. Does that make sense to you? Really?
Again, this has nothing to do with symbols or notation. There are two sexes, two symbols: M, F. The undergrad probability 101 mistake you're making is that options:
MF, FM
take order into account, while options:
MM, FF
do not. This is incorrect. If you take order into account for the mixed-sex case, you must take order into account for the same-sex cases. MM and FF encompass four options with ordering, not two.
I'm sorry if I sounded offensive, and indeed I was; I was tired when I wrote my last comment and my words didn't reflect my thoughts. I'd like to settle this problem once and for all, and I really do believe we can both agree on a conclusion.
I'd like you to notice that your new table of probabilities implies that I have a 1/6 chance of guessing the gender setup of a family with 2 children. I don't think that makes sense to you either.
Let's make the experiment a bit clearer:
- We gather a number of families who have 2 children.
- For each family, we announce the gender of one of the children, but we don't know which one.
- We are then asked to guess the sex of the other child.
At this point, you believe that the probability of guessing right is 1/2, and that the 2/3 probability doesn't make any sense. My claim is that you are falling into the Monty Hall problem trap, which is very counterintuitive and doesn't seem to make sense at first.
But here is some clarification of the problem:
- What we are really asked is to guess the _gender_ setup of the family. So we need to establish the universe of possible family setups before answering. What are they?
Even if we don't care about the order, we must acknowledge that there are 2 children in the family, so there must be a first child and a second child.
Setup 1: both children are boys: M/M
Setup 2: both children are girls: F/F
Setup 3: the first child is a boy, and the second child is a girl: M/F
Setup 4: the first child is a girl, and the second child is a boy: F/M
Why does order matter in setups 3 and 4? Because M does not equal F, while M=M and F=F. We investigate not the individual itself, but the property of the individual (in this case, the gender). Therefore, M/F is not equal to F/M, and in the real world there must be a first and a second child.
If you were asked to write down all the possible setups of a family of two children in the real world, you would write the same table. You'd say that:
Some families have 2 boys = 1 setup
Some families have 2 girls = 1 setup
Some families have one girl and one boy = 2 possible setups (the 1st one is a girl OR a boy).
Your argument of F/M = M/F implies that all families have _either_ one of the 2 setups: every first child is a boy, or every first child is a girl. But it doesn't work like this in real life. That is why order matters.
Conclusion: if we agree that there are 4 possible setups in a family with 2 children, then we have a probability of 1/4 of guessing the correct setup of the family given NO information. But if we are informed of the gender of one of the children, then one solution of the setup is removed (F/F or M/M), and we have a 2/3 chance of guessing right IF we choose the opposite gender (see the Monty Hall problem).
And if we are informed of the gender of one specific child (1st one or 2nd one), then it leaves us with only 2 solutions! And here, the probability becomes 1/2.
It's all a question of defining the preconditions correctly. I think people who have heard about conditional probability know that the correct answer is "2/3". Most people will just imagine the first precondition, "they have two children", and forget the second one.
What about going in the opposite direction and simply asking "What is the probability that a mother has a boy and a girl?" I think many people would answer "1/2", too, simply adding the precondition "she has 2 children" ;)
The origin for this is - I think - an implicit context people are thinking of without real awareness of it. You think quicker with implicit context.
If Jeff Atwood's argument were right, couldn't it then be applied to roulette as well?
If you play two games for example, there are 4 different orders of black/red possible (neglecting 0/00):
b,b
b,r
r,b
r,r
Each has the same probability of 1/4. If you place $1 on black in the first round, your chance to win is 1/2. If it is black (or red), only 3 of the original 4 options are left and two will lead to red...
If the FIRST round is black, only two of the four possibilities are left, one of which leads to red.
Similarly, if Atwood's problem stated that the FIRST child is a girl then the probability that the second is a boy is 1/2.
The problem, however, states that one of the children is a girl -- it could be the first or the second. In this case the probability that the other child is a boy is 2/3.
randallsquared is correct. You are misunderstanding sequences of related events.
Running the roulette wheel twice is a sequence of unrelated events. The (ideal) roulette wheel is totally random. The results of run #1 are unrelated to the results of run #2.
However, in this case of the man and woman both having two children, one of which is a boy, but with the man having a boy as his oldest child, we are talking about two different scenarios. The events are suddenly related. If you want it spelled out, here are the matrices.
The childbirth options are these:
BB
BG
GB
GG
However, for the man it's been stated that his oldest child is a boy, so it becomes,
BB
BG
XX
XX
For the woman (who has as a condition 'at least one boy, but not necessarily the eldest') it is,
BB
BG
GB
XX
(Where 'B' is Boy, 'G' is Girl and 'X' is a non-option).
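For what it's worth, the two reduced tables above can be turned into numbers by counting. This is a sketch under stated assumptions: the four birth orders are equally likely, the woman's condition acts as a plain filter ("at least one boy"), and the two families are independent of each other. `p_two_boys` is a name invented here.

```python
from fractions import Fraction
from itertools import product

# (older, younger) for each family; all four rows assumed equally likely.
FAMILIES = list(product("BG", repeat=2))  # BB, BG, GB, GG


def p_two_boys(condition):
    """P(both children are boys | condition), counting the rows not crossed out."""
    kept = [family for family in FAMILIES if condition(family)]
    two_boys = [family for family in kept if family == ("B", "B")]
    return Fraction(len(two_boys), len(kept))


p_man = p_two_boys(lambda f: f[0] == "B")  # oldest child is a boy: BB, BG remain
p_woman = p_two_boys(lambda f: "B" in f)   # at least one boy: BB, BG, GB remain
p_both = p_man * p_woman                   # assumes the two families are independent

print("man:", p_man, "woman:", p_woman, "both:", p_both)
```

Under those assumptions the man comes out at 1/2, the woman at 1/3, and the answer to the combined question at 1/6.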
In Jeff's post the question was posed to show how bad humans are at probabilities, and it got the point across very well. I know that I fell right into the trap...
This post takes what was meant to be an enlightening and refreshing take on how bad we as humans are at probabilities and turns it into a numbers-duel.
No, the original question doesn't clearly state the definition of who is asked, it doesn't take into account that slightly more boys than girls are born, and it doesn't answer the question about whether this all happened in a cult where all boys are killed at birth. It was a simple metaphor. And it worked...
Really, a (not the, but a) problem is that almost everybody is right. We are certainly fooled by probability. But also, if someone volunteers that one out of their two children is a girl, then it's effectively-100% that the other is a boy, in real life.
Slightly amending a quote in a similar context from "Principles of Economics, Explained" (http://www.youtube.com/watch?v=VVp8UGjECt4), nobody says "I have a girl. I have another girl. I have another girl."
One way of interpreting this perennial debate is as proof of just how fuzzy English can be. People aren't just arguing about the answer, they argue about the question too.
"Really, a problem is that almost everybody is right."
Yes - this is why I pointed out that in Jeff's post this riddle was used as a metaphor. Otherwise we could spend all our time bickering about the correct spelling of "colour" or other trivialities instead of addressing the interesting questions.
The interesting question in Jeff's post was the poor human understanding of probabilities.
See here for why I think this problem confuses people: http://www.overcomingbias.com/2008/03/mind-probabilit.html