Talk:Boy or girl paradox/Archive 1

This page is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Dice Example

Perhaps this example will help make things more clear for the doubters:

Suppose that I have two dice, and each has three of its sides marked with a “1” and three of its sides marked with a “2”. If I throw both dice and sum the result, I could end up with either 2 (if both dice rolled 1), 3 (if one die rolled 1 and the other 2) or 4 (if both dice rolled 2). If I roll the dice a large number of times and keep track of the sums, I will quickly begin to see that the frequency of the sum being 2, 3, or 4 follows a 1:2:1 distribution.

If (Die A, Die B) is the result of each roll, then the following sample space results:

(1,1) (1,2) (2,1) (2,2)

As you can see, there are four possible results of rolling the dice, but two of the possible results will give a sum of 3 while only one possible result could give a sum of 2 or 4. Thus a sum of 3 is twice as likely as a sum of 2 or 4. If you don’t believe me about the 1:2:1 distribution, try it yourself with real dice!

Now, suppose I have rolled my two dice and I tell you “One of my dice rolled a 1. What are the odds that the sum of my two dice will be 3?” You should answer 2/3, because you know that a sum of 3 is twice as likely as a sum of 2. We can eliminate (2,2) from the sample space and are left with only (1,1) (1,2) and (2,1). On the other hand, if I say “Die A rolled a 1,” then the probability of my sum being 3 is only 50% because now we have eliminated both (2,2) and (2,1), leaving us with only (1,1) and (1,2). -10-21-06
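The two answers in the dice example can be checked by listing the sample space mechanically. The following is a minimal sketch, not part of the original comment; the class and method names are hypothetical, and it assumes the four outcomes (1,1) (1,2) (2,1) (2,2) are equally likely, as described above.

```java
// Illustrative sketch (hypothetical names): enumerate the four equally
// likely outcomes of two dice whose faces show only 1 or 2.
final class DiceSampleSpace {

    // Returns {favourable, total} for "sum is 3" given "at least one die shows 1".
    static int[] givenSomeDieIsOne() {
        int favourable = 0, total = 0;
        for (int a = 1; a <= 2; a++) {
            for (int b = 1; b <= 2; b++) {
                if (a == 1 || b == 1) {      // eliminates only (2,2)
                    total++;
                    if (a + b == 3) favourable++;
                }
            }
        }
        return new int[] {favourable, total};
    }

    // Returns {favourable, total} for "sum is 3" given "die A shows 1".
    static int[] givenDieAIsOne() {
        int favourable = 0, total = 0;
        for (int a = 1; a <= 2; a++) {
            for (int b = 1; b <= 2; b++) {
                if (a == 1) {                // eliminates (2,1) and (2,2)
                    total++;
                    if (a + b == 3) favourable++;
                }
            }
        }
        return new int[] {favourable, total};
    }

    public static void main(String[] args) {
        int[] some = givenSomeDieIsOne();
        int[] dieA = givenDieAIsOne();
        System.out.println("One of the dice is 1: " + some[0] + "/" + some[1]);
        System.out.println("Die A is 1:           " + dieA[0] + "/" + dieA[1]);
    }
}
```

The only difference between the two methods is the condition used to keep an outcome, which is exactly the difference between the two statements in the example.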

As this is linked from MontyHallProblem we need to be very careful (as described there) as to how the question is phrased:

In a two-child family, one child is a boy. What is the probability that the other child is a girl?

The 'one child is a boy' is ambiguous because it doesn't explicitly explain that the other case 'one child is a girl' is being excluded.

For instance if a parent of a two-child family walks into a room accompanied by a boy (one of their children) is the probability that the other child is a boy or girl anything but 50/50? The answer is NO.

If I have a room full of parents of 2 children families (randomly selected) and I ask all those with a boy (implying 1 or more) to step forward - THEN and only then have I skewed the odds for the parents who have stepped forward to 1/3 (two boys) and 2/3 (a boy and a girl).

Again this is a very subtle point, and worth making explicitly. The fact that you are making a decision is very important in this problem.

This example from rec.puzzles.faq makes the 'question step' explicit http://www.faqs.org/ftp/faqs/puzzles/faq

2.3. ==> oldest.girl <== [probability] If a person has two children, and truthfully answers yes to the question "Is at least one of your children a girl?", what is the probability that both children are girls?

The answer is 1/3, assuming that it is equally likely that a child will be a boy or a girl. Assume that the children are named Pat and Chris: the three cases are that Pat is a girl and Chris is a boy, Chris is a girl and Pat is a boy, or both are girls. Since one of those three equally likely possibilities have two girls, the probability is 1/3.
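The difference between the two situations described above (a parent who walks in with one child, versus parents who step forward when asked) can be sketched as a simulation. This is only an illustration under stated assumptions: each child is independently a boy or girl with probability 1/2, and in the walk-in case the accompanying child is chosen at random. The class and method names are hypothetical.

```java
import java.util.Random;

// Illustrative sketch (hypothetical names): two ways of learning
// "there is a boy" in a two-child family.
final class QuestionStep {

    // Walk-in case: one child, chosen at random, is seen. Keep the families
    // where the seen child is a boy; return the fraction whose other child is a girl.
    static double seenChildIsBoy(Random rng, int families) {
        int kept = 0, otherIsGirl = 0;
        for (int i = 0; i < families; i++) {
            boolean firstIsBoy = rng.nextBoolean();
            boolean secondIsBoy = rng.nextBoolean();
            boolean seeFirst = rng.nextBoolean();
            boolean seenIsBoy = seeFirst ? firstIsBoy : secondIsBoy;
            boolean otherIsBoy = seeFirst ? secondIsBoy : firstIsBoy;
            if (seenIsBoy) {
                kept++;
                if (!otherIsBoy) otherIsGirl++;
            }
        }
        return (double) otherIsGirl / kept;
    }

    // Step-forward case: keep the families with at least one boy;
    // return the fraction of those that also have a girl.
    static double atLeastOneBoy(Random rng, int families) {
        int kept = 0, hasGirl = 0;
        for (int i = 0; i < families; i++) {
            boolean firstIsBoy = rng.nextBoolean();
            boolean secondIsBoy = rng.nextBoolean();
            if (firstIsBoy || secondIsBoy) {
                kept++;
                if (!(firstIsBoy && secondIsBoy)) hasGirl++;
            }
        }
        return (double) hasGirl / kept;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        int families = 1_000_000;
        System.out.println("Walk-in, other child is a girl:   " + seenChildIsBoy(rng, families));
        System.out.println("Step forward, family has a girl:  " + atLeastOneBoy(rng, families));
    }
}
```

With a large number of families the first figure should settle near 1/2 and the second near 2/3, in line with the distinction drawn in the comment.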

You're welcome to clarify the article, but please keep in mind that the "ambiguity" here is just part of a very general phenomenon affecting all probability problems: giving yourself too much information skews the answer from the textbook result. For example, if I see a two-child family with two boys, then they certainly have at least one boy; nonetheless the probability that they have a girl is 0, not 1/2 or 1/3. I don't get 1/3 because I have more information than is in the statement of the question.

Likewise, in your example, if the parent walks into the room accompanied by a boy, I know that "the child here is a boy", which gives me strictly more information than "one of their children is a boy". With the extra information, it is no surprise that I arrive at a correct probability different from the textbook result.

My point is that the abstract question "In a two-child family, one child is a boy. What is the probability that the other child is a girl?" is unambiguous, even if it can be confusing. When translating it into a real-world situation, we must be careful not to introduce the wrong information, and the "question step" you identify is a good way of doing that. But strictly speaking, it is not necessary to analyze the problem. Melchoir 19:09, 5 April 2006 (UTC)

Each child has a 50% chance of being a boy, 50% chance of being a girl. That will never theoretically change as long as there are 2 naturally born genders (most PC term I can find at the time). The problem that led people to say 50% in regards to question #2, that is, "At least one of them is a boy. What is the probability that both children are boys?" is that people sometimes use naive heuristics that fail to PROPERLY DEFINE THE NUMBER OF POSSIBLE OUTCOMES. There are 4 outcomes total when it comes to 2 kids and their genders; BoyBoy, BoyGirl, GirlBoy, and GirlGirl, NOT BB, BG, and GG. The reason is that categorizing all couples with 1 boy and 1 girl together is far too broad a brush, and fails to properly define the number of possible outcomes. It would be like me saying "there's a 50% chance of me winning the lottery, either I will win or I will lose in 2 outcomes" when in actuality, there are many more outcomes where I will lose than where I will win. Rock8591 (talk) 06:18, 9 July 2009 (UTC)

Actually there *are* two possible outcomes at winning the lottery (win and lose), they're just not equiprobable. ;-) Diego (talk) 08:42, 9 July 2009 (UTC)

"Winning" the lottery is not an outcome in any sense, save for the social-construct sense. Is getting 1 number right out of 6 winning? Or is getting all 6? See what I'm getting at? There are not 2 possible outcomes of a lottery, mathematically; there are MANY combinations of numbers you can pick; simply because you can socially construct and CATEGORIZE "2 outcomes" as "winning" and "losing" MEANS NOTHING. Rock8591 (talk) 03:25, 29 July 2009 (UTC)

Independence

both questions are the same.

two independent elements can have one of two values each.

in the first case, one element is "the older child" being "boy", with the question of the probability of the independent, second element "younger child" being "girl".

the second case, one element is "the identified child" being "boy" with the question of the probability of the independent, second element "other child" being "girl"

in both cases the form of the question is identical. in the second question there is no reason to separate 1 boy, 1 girl into 1 boy 1 older girl and 1 boy 1 younger girl -- it is only the unrelated first question that makes this separation seem justified.

Also, the sample space is not correct. let character 1 be elder/younger, and let uppercase signify "identified".

Set = {Bb, bB, Bg, bG, Gb, gB, Gg, gG} each with equal probability.

in problem 1, we are left with: {Bb, Bg}, and asked for P(Bg)

in problem 2, we are really left with: {Bb, bB, Bg, gB}, and asked for P(Bg) + P(gB)

these are just 2 disproofs of this "paradox".

You are making the very common mistake of not realizing that many of the elements in your sample space are degenerate. For any couple that has had two children, there are four equally likely possibilities - they could have a boy and then another boy (probability 1/4), a boy and then a girl (again 1/4), a girl and then a boy (again 1/4) or a girl and then another girl (again 1/4). That is the only sample space that you need to concern yourself with. Your elements Bb and bB are really just degenerate cases of the element "they had one boy and then another boy", and Gg/gG are degenerate representations of "they had a girl and then another girl". Your elements bG and Gb are both really just cases of "they had a girl and then a boy."

Let’s assume that the parents will name their first boy Bob and their second boy Tom, and will name their first girl Jane and their second girl Jill. This should make it more clear that there are only four possible scenarios, each of which has a possibility of 1/4:

-Bob has a younger brother Tom

-Bob has a younger sister Jane

-Jane has a younger brother Bob

-Jane has a younger sister Jill

If you try to add any more elements you will only succeed in rephrasing one of the already-existing elements, and will say something like “Tom has an older brother Bob.” However, that element is already accounted for in the original set.

Questioning

Well, this page is busy today. The recent addition is mistaken; the intent of the question is not to identify any boy at all. I can get a reference on that, but for now I'll revert and clean up the language. Melchoir 19:46, 5 April 2006 (UTC)

Small Change

I made a small change to the article to try and clear up a common problem. (One I had myself once.)

This doesn't seem to work

Given two variables, U and K, each of these being a child, one (U) whose gender is unknown to us, and one (K) whose gender is known to us. This is the only difference between U and K; U may be older than or younger than K without any relevance to the case.

U can be either a boy, or a girl. K can be either a boy, or a girl.

The following combinations are possible.

U is a boy, K is a boy. U is a boy, K is a girl. U is a girl, K is a boy. U is a girl, K is a girl.

Since we know the gender of K (in this case, let's assume K is a girl), two possibilities are eliminated: the two possibilities where K is a boy. That leaves us two possibilities, one of which has U as a girl, and one of which has U as a boy. Note again that age is of absolutely no relevance here: U can be either older than or younger than K without changing a thing.

Essentially the problem appears to be the mistaken notion that, because you don't know the ages of the children, both G-B and B-G are possible. However, this is false: if both G-B and B-G are possible, then G-G needs to appear twice: once for G-G where the girl whose gender is known to us is the first G, and once for G-G where the girl whose gender is known to us is the second G.

Just my two cents.--Damian Silverblade 17:44, 13 April 2006 (UTC)

False. G-B and B-G are different, equally likely events that lead to a couple having one boy and one girl.

A couple could have a boy and then a boy, a girl and then a boy, a boy and then a girl, or a girl and then a girl. G-B and B-G are both valid because they both represent valid possibilities. If you had double elements for B-B, you are implying that there's more than one way for a couple to end up with two boys. In fact, there isn't - the only way for a couple to end up with two boys is to have a boy and then another boy. However, there is more than one way for a couple to end up with a girl and a boy - they could have a girl and then a boy, or they could have a boy and then a girl. That is why you have separate elements for B-G and G-B, but only one element for B-B and G-G.

Agreed - as I pointed out earlier how the question is phrased makes all the difference, there is a very strong argument that as the question is *currently* phrased the correct answer is 50/50.

Also agreed (or phrased alternately, the three possibilities listed in the sample space are not equally likely). This article is ridiculous. Can we make a motion for deletion? Topher0128 02:58, 12 July 2006 (UTC)

The article may be phrased badly, but it is indeed a well-known example, and I've seen it discussed in a book describing probability problems from a cognitive perspective. For now, I'll simply add the {{unreferenced}} tag. But it's not a joke. Melchoir 03:04, 12 July 2006 (UTC)

The phrasing of the first formulation of the question says nothing about the set of Boy/Girl being ordered, and yet the sample evaluation assumes that it is important. The maths are correct just for a different formulation. The second formalization does explicitly order the children(older/younger). The first should read something like

In a two-child family, one child is a boy. What is the probability that the oldest child is a girl?

and the samples would be oldest then youngest: {BB, BG, GB, GG}. GG is not possible, since one is a boy, so three possibilities remain: {BB, BG, GB}. The girl is the oldest in only one of them, so the probability is 1/3.

In a two-child family, the older child is a boy. What is the probability that the younger child is a girl?

This one is described correctly. Domhail 04:03, 12 May 2006 (UTC)

Rules of Conditional Probability?

Precisely. For example in a tree diagram: the way the question is phrased, "In a two-child family, one child is a boy. What is the probability that the other child is a girl?", we are surely not only excluding the G-G branch from the conditional tree, we are also assuming that the G-B and B-G trees are identical, as the order in this question is irrelevant; therefore these branches should be merged and the answer remains 1/2. —The preceding unsigned comment was added by 89.145.196.3 (talk • contribs) 15:34, 5 August 2006.

That is not a valid line of reasoning. You might as well say that the probability that a two-child family has two boys is 1/3, since there are three possibilities and the question does not care about order. In probability, there are no rules that deal with such notions as irrelevance and "merging branches". Melchoir 16:59, 5 August 2006 (UTC)
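The point about "merging branches" can be illustrated by counting. The sketch below is a hypothetical illustration (names are made up) that assumes the four birth orders are equally likely; it groups them by the number of boys, which is what merging G-B with B-G amounts to.

```java
// Illustrative sketch (hypothetical names): merge the four equally likely
// birth orders into three groups and see how many orders land in each.
final class MergedBranches {

    // counts[k] = number of the four birth orders that contain exactly k boys.
    static int[] countsByNumberOfBoys() {
        int[] counts = new int[3];
        boolean[] sexes = {true, false};   // true = boy, false = girl
        for (boolean elderIsBoy : sexes) {
            for (boolean youngerIsBoy : sexes) {
                int boys = (elderIsBoy ? 1 : 0) + (youngerIsBoy ? 1 : 0);
                counts[boys]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countsByNumberOfBoys();
        System.out.println("two girls: " + counts[0] + "/4");
        System.out.println("one each:  " + counts[1] + "/4");
        System.out.println("two boys:  " + counts[2] + "/4");
    }
}
```

The merged group "one of each" still carries the weight of the two orders it came from, so the three groups are not equally likely and two boys has probability 1/4 rather than 1/3.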

"In a two-child family, one child is a boy. What is the probability that the other child is a girl?" Technically, the answer to this question depends on how the information "one child is a boy" was obtained. A precise, unambiguous question would be:

Suppose that, from all families with exactly two children who are not twins, you select one parent at random, and you ask the parent: "Is at least one child a boy?" If the parent answers, "Yes," what is the probability that both children are boys?

I posted this version in the article on 4 Dec 2007, and it was deleted the same day by Dorftrottel. Italus (talk) 23:34, 4 December 2007 (UTC)

Coin Examples

(A) If I throw 2 coins and let you see one, have I given you any information about the 2nd (hidden) coin? - Obviously not, its probability of heads or tails remains .5/.5

(B) I throw 2 coins and look at them; you ask me whether there is at least one head, and I truthfully answer 'Yes' and show you that coin.

We've now arrived at the subtle (and counter-intuitive) case where the probability that the other coin is a tail is now 2/3. The reasoning is explained in the article page and can be verified using a simple computer program (or indeed by throwing coins yourself).

But again the actual 'questioning step' is critical in differentiating (A) & (B) and is missing from the article page.

--Pajh 21:33, 21 April 2006 (UTC)

Ok, I just ran through several thousand simulations of B: two random tosses.
check if either of the two is a head.
in cases where one is a head, check if there is a tail present.
the probability comes out as 0.50

If you disagree with my method, can you correct me?

--Wes , 26 September 2006

What happens if you increase the number of coins in your program to ten? If you still get 0.5 I think it's something wrong with your program. If you get something else please make a diagram of all your found probabilities with two coins up to ten coins. Does the diagram make sense? INic 20:53, 26 September 2006 (UTC)
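The experiment suggested here, going from two coins up to ten, can also be done by exact counting instead of random tosses. The following is a sketch under the assumption of fair, independent coins; the class and method names are hypothetical. It counts, among all sequences of n tosses that contain at least one head, how many also contain a tail.

```java
// Illustrative sketch (hypothetical names): exact count for n fair coins.
final class ManyCoins {

    // Returns {withTail, withHead}: withHead sequences contain at least one
    // head, and withTail of those also contain at least one tail.
    static long[] tailGivenHead(int n) {
        long withHead = 0, withTail = 0;
        long allHeads = (1L << n) - 1;          // bit i set = coin i shows heads
        for (long mask = 0; mask <= allHeads; mask++) {
            if (mask != 0) {                    // at least one head
                withHead++;
                if (mask != allHeads) {         // not all heads, so a tail is present
                    withTail++;
                }
            }
        }
        return new long[] {withTail, withHead};
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 10; n++) {
            long[] r = tailGivenHead(n);
            System.out.println(n + " coins: " + r[0] + "/" + r[1]
                    + " = " + (double) r[0] / r[1]);
        }
    }
}
```

If a simulation of the two-coin case reports 0.5, comparing its output for larger n against these counts is one way to locate the discrepancy.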

You are committing the same fallacy as the people who believe that there is a 2/3 possibility that the other child is a girl. You are seeing three distinct categories (heads and tails, heads and heads or tails and tails) where there actually are four categories. So of course nothing is wrong with the man's program; it shows a tail in half the cases where the other emblem is a head because that makes sense.

The four possible sequences are:

1. First head, then head.

2. First head, then tail.

3. First tail, then head.

4. First tail, then tail.

Using the logic for supporting the claim that "it's two chances out of three for a girl", then there would be a 2/3 chance that both a head and a tail are present. But if you look at the list, you can see that this isn't true; there is a 1/2 chance for both a head and a tail and a 1/4 chance each for two heads and two tails. That's because "head, then tail" and "tail, then head" are two different sequences although the result, when looking at the coins, is the same. Note that the man said: "head present", not "head first" or "tail first". Those are separate categories which together make up exactly 50% of the results. Oh, and by the way, I tossed coins 100 times, and the result was 54 for head and tail or tail and head versus 46 for two of the same. If I had continued to 200, I have no doubt the result would have moved even further to 50-50. Ojevindlang (talk) 19:16, 27 January 2009 (UTC)

No, he's not, there is probably something wrong with the program. Using your own diagram, the program should select only the first three sequences. Of those three equally likely ones, the last two have a tail as the other flip, the first has a head. Baccyak4H (Yak!) 19:40, 27 January 2009 (UTC)

You say the program should select only the first three sequences because otherwise it doesn't give the result you desire. But the fact remains that "first head, then tail" and "first tail, then head" are two distinct sequences. Omitting a sequence to get the desired result is tantamount to fiddling with the analysis. Incidentally, if one only chooses the first three combinations, then one gets only two categories with a 50% chance each: head and head versus head and tail. If, however, we speak of sequences instead of categories, then all four sequences must be included. And that also gives a 50% probability for the second coin toss giving a head and a 50% probability for a tail. Ojevindlang (talk) 19:49, 27 January 2009 (UTC)

No again, otherwise it doesn't give the right answer. The statement of the problem is that we asked if there was a head, and the answer was "yes". The tail/tail possibility has been ruled out, once we know there is a head. I am not sure why you mention head/tail or tail/head, as neither of those is ruled out by either answer. Only the tail/tail sequence is ruled out. Baccyak4H (Yak!) 19:58, 27 January 2009 (UTC)

Exactly, it doesn't give "the right answer". That's because the probabilities for the four sequences are constant, and the fact that one sequence is half done doesn't suddenly change the probability for what the second half of the sequence will be. One last time: there is no way a coin has a 2/3 probability of coming up head instead of tail - ever. This is a false paradox, like the one about Achilles and the tortoise. And computer programs (or my personal experiment tossing coins 100 times) are not invalidated by your refusal to believe in the results they give. Ojevindlang (talk) 20:09, 27 January 2009 (UTC)

Ah, there is the source of your confusion. In actuality, you aren't given any information about having the first or second half of the sequence. If you were, then you'd be right. But read the question above: "is there at least one head". That is exactly why it's paradoxical: intuition is used to one-then-the-other type reasoning, but here it doesn't apply, so intuition is wrong. Baccyak4H (Yak!) 20:30, 27 January 2009 (UTC)

No, I am not confused. Coins have no memories of recent events. If you flip a coin and it comes up head and then flip it again, the likelihood for a head or a tail are both exactly 0.5. Intuition has nothing to do with it. Ojevindlang (talk) 04:10, 15 February 2009 (UTC)

Java Code solution

public class Coins {
    
    public static final boolean HEADS = true;
    public static final boolean TAILS = false;
    
    public Coins() {
    }
    
    public static final void simulate() {
        
        java.util.Random generator = new java.util.Random();
        int pairheads = 0;
        int tailpresent = 0;
        for (int count=0;count < 10000; count++) {
            
            boolean coin1 = generator.nextBoolean();
            boolean coin2 = generator.nextBoolean();
            
            if (coin1 == HEADS || coin2 == HEADS) { // At least one head
                if ( coin1 == HEADS && coin2 == HEADS) // Both heads
                    pairheads++;
                else
                    tailpresent++;
            }
            
        }
        int total = pairheads + tailpresent;
        System.out.println("Pairs of heads = " + pairheads );
        System.out.println("Tail present = " + tailpresent );
        System.out.println("Total = " + total);
    }
    
    public static void main(String[] args) {
        simulate();
    }    
}

Output

Pairs of heads = 2492
Tail present = 5034
Total = 7526

--Pajh 10:13, 28 September 2006 (UTC)

YES BUT NO

The 2-child family may be either : 2 boys (p = 1/4), 2 girls (p = 1/4), 1 boy and 1 girl (p = 1/2).

The rule is :

p (A / B) = p (A and B) / p (B)

Probability that A is true if B is true = Probability that A and B are true at the same time / Probability that B is true

The statement for A is clear :

A = "One child is a girl"

There are 2 different statements for B :

Case 1. B = "(I know one of them,) he is a boy" -> p = 1/2 (probability for him to be a boy)

Case 2. B = "(I know that) at least one of the two of them is a boy" -> p = 3/4 (probability that a 2-child family has at least one boy)

Thus, 2 different statements for (A and B) :

Case 1. (A and B) = "The one I know is a boy, the one I don't know is a girl" -> p = (1/2).(1/2) = 1/4

Case 2. (A and B) = "One is a boy, one is girl" -> p = 1/2

And 2 different results :

Case 1. p (A / B) = (1/4) / (1/2) = 1/2 (= p (A) actually, B has no influence)

Case 2. p (A / B) = (1/2) / (3/4) = 2/3

Case 1 sounds ok to me. In case 1, we don't need to know that it's a 2-child family. I still find Case 2 very disturbing. The thing is : a simple, almost automatic, deduction leads from a "Case 1" B to a "Case 2" one. I wonder, is it good to know too much about something ?

--[Strahd] 5:35, 12 August 2006 (Orléans, France)

I guess the simple answer is that you can't use deduction in these problems. Melchoir 17:29, 12 August 2006 (UTC)

I agree --[Strahd] 19:57, 12 August 2006 (France)

YES BUT NO (II)

Trees !

Case 1 :

First, the child we know, then the other child : BB, BG, GB, GG.

The child we know is a boy : BB, BG.

The other one is a girl : BG.

Probability : 1/2

Case 2 :

First, the elder, then the younger : BB, BG, GB, GG.

One child is a boy : BB, BG, GB.

The other one is a girl : BG, GB.

Probability : 2/3.

In Case 2, I'd use the word "frequency" rather than "probability".

--[Strahd] 6:17, 13 August 2006 (Orléans, France)

Rebuttal to Solution

Looking at the original problems (and original solutions), I have no difficulty understanding the first scenario. In the first problem, time is introduced as an initial condition. It says 'the older child is a boy'. Therefore it is acceptable to list the scenarios as:

(BB, BG, GB, GG)

This is because (using time only) the possibilities are: 1. a boy was born and then a boy was born 2. a boy was born and then a girl was born 3. a girl was born and then a boy was born 4. a girl was born and then a girl was born. Since you are acknowledging that the first child is a boy, you select the scenarios with B as the first letter, and that leaves you with a 50/50 chance of the second child being a girl.

For the second scenario, you have eliminated time from the initial condition (no one knows whether the boy is older or younger). You have two choices here, either to introduce time, or keep everything timeless. If you introduce time, you have the following scenarios (as previously used, the first letter will be the older child, and the letter that is upper case is the child that has been randomly mentioned in the statement 'has at least one...'):

(Bb, bB, Bg, bG, Gb, gB, Gg, gG)

The reason there are 8 scenarios here instead of 4 is the words 'has at least one...'. Since in problem 2 we are randomly pointing to one child, we have to include all possibilities of pointing as well as timing, which increases our scenarios from 4 to 8. The reason this was not done in problem 1 is that we already knew what we were pointing at (the first child is a boy). We could use these same 8 scenarios in problem one, but the information from the problem simply reduces it in the same way to the solution 50/50.

Back to problem 2: After selecting the scenarios where there is an uppercase B (as mentioned in the problem), we are left with:

(Bb, bB, Bg, gB)

This leaves the chances of having a girl at 50/50, which is counter to the original solution to problem 2. If you wish to keep time removed from the situation, then the order of birth does not matter. Therefore the only possible scenarios are that the parents will have 2 boys, 2 girls, or one of each.

(BB, BG, GG)

Again, since problem 2 says 'has at least one boy', we must remove the scenarios without a B, and so we are left with:

(BB, BG)

This again shows that the solution to problem 2 is 50/50, but is reached by keeping the problem independent of time. In conclusion it is important to note that the difference between the two problems is the inclusion of time and the inclusion of acknowledging a random child, which then dictates how you approach the solution. Either always include time, or never include it, but mixing it up will give you skewed answers, such as the 2/3. For example in the original solution to problem 2, only one selection was given for BB. This is an error because the problem says 'has at least one boy', which could be addressing either the first boy, or the second boy. Therefore you must include those possibilities in your scenarios, giving us the 8 listed above. This is why the original solution to problem 2 is flawed. AFpilot157 12:34, 31 October 2006 (UTC)
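The eight-scenario notation used in this comment can be enumerated directly, which makes it easy to compare two different ways of filtering the list. The sketch below is a hypothetical illustration (all names are made up) and assumes the eight scenarios are equally likely, i.e. that the child pointed at is chosen at random. It compares "the child pointed at is a boy" (an uppercase B) with "the family has at least one boy" (a B or a b anywhere).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrative sketch (hypothetical names): the scenarios Bb, bB, Bg, bG, Gb, gB, Gg, gG.
// First letter = elder child; upper case = the child that was pointed at.
final class EightScenarios {

    static List<String> all() {
        List<String> out = new ArrayList<>();
        char[] sexes = {'b', 'g'};
        for (char elder : sexes) {
            for (char younger : sexes) {
                out.add("" + Character.toUpperCase(elder) + younger);   // elder pointed at
                out.add("" + elder + Character.toUpperCase(younger));   // younger pointed at
            }
        }
        return out;
    }

    static boolean pointedAtBoy(String s) { return s.contains("B"); }
    static boolean hasBoy(String s)       { return s.toLowerCase().contains("b"); }
    static boolean hasGirl(String s)      { return s.toLowerCase().contains("g"); }

    // Returns {scenarios with a girl, scenarios kept} for the given condition.
    static int[] girlGiven(Predicate<String> condition) {
        int kept = 0, girl = 0;
        for (String s : all()) {
            if (condition.test(s)) {
                kept++;
                if (hasGirl(s)) girl++;
            }
        }
        return new int[] {girl, kept};
    }

    public static void main(String[] args) {
        int[] pointed = girlGiven(EightScenarios::pointedAtBoy);
        int[] atLeast = girlGiven(EightScenarios::hasBoy);
        System.out.println("Pointed-at child is a boy: " + pointed[0] + "/" + pointed[1]);
        System.out.println("At least one boy:          " + atLeast[0] + "/" + atLeast[1]);
    }
}
```

Under these assumptions the first filter keeps Bb, bB, Bg, gB and gives 2/4, while the second also keeps bG and Gb and gives 4/6. Which filter applies depends on how the statement "has at least one boy" is taken to have been obtained.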

Hi AFpilot157. I'm glad you are thinking about these problems for yourself. Working on them is exceptionally helpful, and will increase your understanding of probability. However from the point of view of the encyclopedia I can assure you that the solution given in the article is the correct one. There are many sites that discuss mathematical probability and I would strongly suggest that you post your comments on one of those. They will be more than willing to discuss them. DJ Clayworth 21:10, 31 October 2006 (UTC)

Thank you for your response. Yes, I have studied probability at times, and I just thought I would throw my two cents in on this interesting problem. In all honesty, I would have no problem accepting the fact that I am wrong, but I truly would like to see in what way I am wrong. I know it is a long shot, but is it possible for the encyclopedia to be wrong? All I ask is for a counter argument, one that takes it all into perspective (which is interesting, because probability at times can change based on perspective). To further illustrate my point for problem 2 of the Boy or Girl scenario, I will go into more detail into what I mean by my solution. If we say that 'a family has 2 children, and has at least one boy' then we have to either introduce age into the scenario (giving us 8 possibilities), or remove age from the scenario (giving us 3 possibilities). In either case, the end result becomes 50/50. My question is about the following statement from the original solution:

"The main reason is that the second question does not assume anything about the age of the boy, he might be the older and he might be the younger sibling. Therefore the loose thought that there are only 3 possibilities (2 boys {BB}, 2 girls {GG} or a mix) does not take into account that the latter is twice as likely than the formers, because it can be either {GB} or {BG}."

Why is the latter (one of each) more likely than BB or GG? By accepting both orientations (GB and BG) is that not the same as putting them in order of birth? Since the children had to come one after the other (assuming no twins), you are saying that either the first one is a boy and the second one is a girl, or the first one is a girl and the second one is a boy. By doing that, are you not introducing age (time) into a problem that does not have age or time associated with it? I can see how many people would think that 'one of each' is twice as likely as 'two of a kind', but I believe there is a flaw in that thinking. The flaw is that we cannot look at this scenario like we look at genetics of plants and animals. Going back to an old high school lesson about tall plants and short plants, they always talked about TT, TS, ST, SS, when it comes to genetics, and the probability of getting a particular combination. The thing here is that when it comes to genetics, the tall and short traits joined all at the same time, with no particular order in which one came first. TS and ST were twice as likely as TT and SS, because everything happened all at once, not in a linear sequence like the birth of children (also, both parents of plants and animals contributed a T or an S to the offspring, unlike the B or G of children). Since time is naturally ingrained in this problem of boy and girl (because they occur one after the other), if we want to truly remove time from the situation, there really are only 3 choices: 2 boys, 2 girls, or one of each. If you take the set listed in the original solution (BB, GB, BG), the words come out as: (one boy and then one boy, one boy and then one girl, one girl and then one boy). No matter what, you cannot truly remove time from the situation (without reducing yourself to the 3 sets I listed in my solution), because the birth of children (which is sequential in this problem) is a fixed trait of the problem.
AFpilot157 21:07, 31 October 2006 (UTC)

It doesn't really matter what variable you use to distinguish the two genders (time or otherwise); the time of birth is simply a tool used to apply the calculus of conditional probability to find the correct answer. It turns out that to enumerate the sample space you will have to make some distinction anyway, somehow, in the sense that writing {BG} states that B comes after "{" and G comes before "}". Seems pedantic (maybe it is; I could do better with more time maybe), but try simulating the experiment: flip two coins 50 (say) times. You will get about twice as many flips where the two coins differ than when they are both heads (or tails). Here note that the "times" the two coins become readable is arbitrary, and can be assumed to be irrelevant. DJ Clayworth was right; the explanations here are, as counterintuitive as they may seem, correct.

For an example which operates under the same principle, consider a deal in the game of bridge. The game typically has some suit distribution among the four players where each has between 1 and 5 cards of any suit in their hand. No one would a priori be suspicious of that, yet the same logic that says that two boys is one outcome as is one of each gender, and no probabilistic difference can be inferred, would have you believe that a perfect deal (each player gets 13 cards of the same suit) is just as likely as whatever distribution of cards one typically sees. But do we ever see a perfect hand? Baccyak4H 03:37, 1 November 2006 (UTC)
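To put a number on the bridge comparison, the count of perfect deals can be set against the count of all deals. This is a sketch only, with hypothetical names, assuming a well-shuffled 52-card deck dealt 13 cards each to four distinguishable players, so that every deal is equally likely.

```java
import java.math.BigInteger;

// Illustrative sketch (hypothetical names): how rare is a perfect bridge deal?
final class PerfectDeal {

    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    // Ways to split 52 distinct cards into four labelled hands of 13: 52! / (13!)^4.
    static BigInteger totalDeals() {
        return factorial(52).divide(factorial(13).pow(4));
    }

    // In a perfect deal each player holds one whole suit; the only freedom
    // is which player gets which suit, giving 4! deals.
    static BigInteger perfectDeals() {
        return factorial(4);
    }

    public static void main(String[] args) {
        BigInteger total = totalDeals();
        BigInteger perfect = perfectDeals();
        System.out.println("All deals:     " + total);
        System.out.println("Perfect deals: " + perfect);
        System.out.println("Probability:   " + perfect.doubleValue() / total.doubleValue());
    }
}
```

Treating "a perfect deal" and "an ordinary-looking deal" as two equally likely outcomes ignores how many individual deals fall into each category, which is the same mistake as treating "two boys" and "one of each" as equally likely.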

I now understand exactly what you mean with your coin flip example. It is true that the solution seems counterintuitive, but it becomes easily acceptable once it is understood. It is only counterintuitive because we at first glance do not see a difference in what problem 1 and problem 2 are asking us. The fact is that there is a difference, however slight, in the wording that changes the whole meaning of the problem, and doing the mathematics/experiment can aid us in understanding that. AFpilot157 07:00, 1 November 2006 (UTC)

In your notation why do you have bG and Gb at the same time as separate combinations? If bG is the scenario where the girl is the eldest how is that different from Gb? —Preceding unsigned comment added by 124.170.46.79 (talk) 12:47, 7 September 2008 (UTC)

  Capital letter means the child you know of, order is birth order. so bG means
  you know of the girl, and she is the second born. Gb means you know of the
  girl, and she is the first born. Omegathejack (talk) 15:36, 9 January 2009 (UTC)

Two-stage game

There is nothing odd about this. It is basically just a two-stage game. In the first example the outcome at stage 1 is still to be determined. Hence the outcome is 1/2. In the second example the outcome at stage 1 has been determined hence reducing the set of possible outcomes. What happens at stage 2 is independent (or assumed to be in the paradox) of what happened at stage 1. The confusion arises from the fact that the puzzle does not mention in which order the boy and girl is born. Assuming that the boy had been born first the possible outcomes at stage 2 would be {BB} or {BG} which would result in a 1/2 probability just as most people would assume. It is a trick question, the secret lies in the fact that the order has not been determined. It is as easy as that. No need for any long explanation. This is not original research, it is a well-known fact from simple game theory. It is a classic example of the difference between a 1 stage game and a 2 stage game. I am removing the original research tag. MartinDK 10:18, 10 November 2006 (UTC)

What's actually going on

The first situation: In a family with 2 children, the older one is a boy. So, you set up the list using the variable B first, to represent that he is older. Your possibilities are then {BB BG}. Therefore, there is only a 1/2 probability that the other child will be a girl.

The second situation: In a family with 2 children, one is a boy. Now, you have to set this list up the same as in the previous situation, or there is no way to compare them. Let's first look at all the possibilities: {BB BB BG GB GG GG} Why are there two {BB}'s and two {GG}'s? Because you know the gender of the child but not the age. You can have two boys, but there are two possibilities when you have that. The boy that is known can be older, or he can be younger. And since you don't know, you have to consider both of the options. Since you know that one of the children is a boy, you can eliminate the two {GG}'s. This leaves you with {BB BB BG GB}. The probability that the other child is a girl is 2/4, which is the same as 1/2. —The preceding unsigned comment was added by 168.103.88.39 (talk) 00:15, 9 December 2006 (UTC).

In analyzing the first situation, you have defined the first letter to be the oldest child. But you have violated that definition in the second situation by allowing one of the two "BB"s to have the youngest first. So your analysis is invalid (and in fact wrong): there should only be one "BB" in the second scenario, but we do not know whether the first (oldest) or second (youngest) was referred to. Thus the answer to the second situation is 1/3 and not 1/2. Baccyak4H (Yak!) 02:34, 9 December 2006 (UTC)

Actually, no (in response to the immediately preceding response from Baccyak4H). Consider tagging the second situation X_s to denote which specific individual was seen. This gives the set of possible combinations: {BB_s B_sB B_sG GB_s GG GG}. Note that the only two options that have no _s must be removed; they are inadmissible because the one thing we really do know is that we've seen something. This leaves {BB_s B_sB B_sG GB_s}. And exactly half of these have girls. P(G)=1/2. Al —The preceding unsigned comment was added by 82.41.204.120 (talk) 15:16, 13 December 2006 (UTC). UberPuppy 15:24, 13 December 2006 (UTC)

Amendment of the above, for completeness: technically {BB_s B_sB B_sG GB_s GG GG} should be {BB_s B_sB B_sG GB_s GG GG GB BG} where you DID NOT see the boy in the last two, but they are necessary for the identical question asked from the perspective of seeing a girl. In other words, before you know which gender you saw, there are actually 8 candidate combinations, and then the 4 inadmissible ones are removed once you know which gender you saw. UberPuppy 15:24, 13 December 2006 (UTC)

(changing your notation for convenience) {BB_s B_sB B_sG GB_s} is a correct enumeration of the sampling space. But in doing it that way, the four elements no longer have equal probability; the first two have exactly half the probability of either of the last two (remember {BB BG GB} have equal probs; you just split BB into equal halves). If you add up the relevant probs, you will get 1/3. Baccyak4H (Yak!) 15:45, 13 December 2006 (UTC)
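
To make the weighting argument concrete, here is a small sketch using exact fractions. The dictionary names and the helper `prob_girl` are invented for illustration; the weights simply encode the assumption stated above, that BB, BG and GB are equally likely and that BB is split into two equal halves.

```python
from fractions import Fraction

# Conditional on "at least one boy", BB, BG and GB each carry weight 1/3.
# Splitting BB by which boy is tagged halves its weight rather than doubling it.
split_weights = {
    "BB_s": Fraction(1, 6),
    "B_sB": Fraction(1, 6),
    "B_sG": Fraction(1, 3),
    "GB_s": Fraction(1, 3),
}

# The competing reading: treat all four tagged outcomes as equally likely.
equal_weights = {label: Fraction(1, 4) for label in split_weights}

def prob_girl(weights):
    """Probability that the family contains a girl under the given weights."""
    total = sum(weights.values())
    with_girl = sum(p for label, p in weights.items() if "G" in label)
    return with_girl / total

if __name__ == "__main__":
    print(prob_girl(split_weights))  # weights from splitting BB in half
    print(prob_girl(equal_weights))  # weights if all four are taken as equal
```

The two answers differ only because of the weights, which is the point of disagreement in this thread: the enumeration is the same in both cases.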

A correction to the "Mistakes" section

The following is a quote from the mistakes section:

The error here is that the first two statements are counted double. We do not know which brother is the older, as that was not stated in the question. Call the brothers Tom and Harry.

1. Harry has an elder brother Tom

2. Tom has a younger brother Harry

The second statement repeats the first and therefore should be removed.

The logic is flawed here. It assumes that whoever was making the argument knew the ages when making the argument, which makes no sense. I have a better version of that argument:

In this situation, let's call the known-male child Jeff and call the unknown child Pat. There are four possibilities:

Jeff has an older brother named Pat

Jeff has a younger brother named Pat

Jeff has an older sister named Pat

Jeff has a younger sister named Pat

Each is equally likely to happen, and the "statement that repeats the first and needs to be removed" is nowhere to be found. Therefore, it can only be half. E946 12:12, 4 January 2007 (UTC)

Fundamental flaw

Although I understand both parties' reasoning, the following fundamental flaw exists in the problem:

Let us work from the assumption that there are four possible permutations for the siblings' gender: BB BG GB GG

According to the most simple interpretation of the problem, if you go to a park and ask boys if they are from a two-child family, those answering 'yes' have a 2/3 chance of having a sister, because only the GG probability is eliminated.

Now, by asking the boy whether he is the oldest or youngest child, his chances of having a sister change to 1/2, regardless of his answer, because either BG or GB will be eliminated along with GG.

While both reasonings are mathematically correct, only the 1/2 solution has any bearing on reality. —Preceding unsigned comment added by 196.216.16.10 (talk • contribs) 14:52, February 22, 2007

Nope, the 1/2 solution has no basis in reality. The arguments presented are clear. — Arthur Rubin | (talk) 15:23, 22 February 2007 (UTC)

In the park version, the correct answer is 1/2, even before you ask about age. When we list the four possibilities BB BG GB GG, we assume there is a way to define which child we mention first, and which one we mention last. In the original problem, it's natural to do this based on age. In the park problem, it is more natural to mention (say) the boy you ask first, and his sibling last. Then, of the four possibilities BB BG GB GG, two can be discarded as soon as we observe the gender of the boy, leaving us with BB and BG, and a 50% chance the sibling is a boy. The important thing is symmetry arguments, and there is no symmetry argument making 196.216.16.10's three possibilities BB BG GB equally likely in the park version.

If someone can express this more clearly, I'll be very happy to read it here!--Niels Ø (noe) 19:50, 22 February 2007 (UTC)

I think the fundamental problem is that the 1/2 people are looking at that individual family, while the 2/3 people are trying to look at two-child families as a whole. That other child has a 1/2 chance of being a girl no matter how many other families there are being tested. E946 03:53, 7 March 2007 (UTC)

Simple Explanation of Flaw in Logic

There is a 50% chance that the child who is known to be a boy is either older or younger. This represents itself as [B,x] or [x,B]. There is a 50% chance that the unknown child is a girl. This represents itself as ([B,g] or [B,b]) or ([g,B] or [b,B]). There are two equal chances for there to be two boys. They do not equal one another and one should not be thrown out. The Harry statement should be as follows:

1. Harry has an elder brother Tom

2. Harry has a younger brother Tom

These two statements are not the same and should not be thrown out.

Here is a Python script that correctly calculates the percentage that will be girls.

import random

def boyorgirl():
	# Boy == 1
	# Girl == 0
	boys = 0
	girls = 0
	pairsOfChildren = 10000
	for k in range(pairsOfChildren):
		children = [-1, -1]
		# One child is designated as the known boy, in a random birth position
		boyPosition = random.randrange(0, 2)
		otherPosition = 1 - boyPosition
		children[boyPosition] = 1
		# The other child's gender is an independent 50/50 draw
		children[otherPosition] = random.randrange(0, 2)
		if children[otherPosition] == 0:
			girls += 1
		else:
			boys += 1
	print("If one child is a boy, what is the likelihood that the other child is a girl?")
	print("Testing a sample of " + str(pairsOfChildren) + " pairs of children reveals that " + str(girls) + " will be girls and " + str(boys) + " will be boys.")
	print("The percentage of girls is " + str(100.0 * girls / (girls + boys)) + "%.")

AlexEagar 02:57, 21 March 2007 (UTC)

I agree the explanation in the section "Mistakes" in the article is rather weak, but I cannot see what you want to replace it by, or whether you want to change the conclusion too. I do not read Python. You may have misunderstood the logic in that section: We cannot give "the boy" a name - we're not told about a particular boy in this problem; we're just told they have at least one boy. I will not vote below; I think you should clarify the change you have in mind.--Niels Ø (noe) 09:48, 21 March 2007 (UTC)

Either you take into account the position of the other child in which case you have to take into account the known child {Bg, Bb, gB, bB} or you don't take into account the position of the other child {Bg, Bb} but you can't just choose to take into account the position of the other child if it is a girl, but not if it is a boy {Bg, gB, Bb}. If you are going to allow for {Bg, gB}, you have to also allow for {Bb, bB} which are not the same. Really the age doesn't matter. There is one position that is already filled with a boy and there is one position open for either a boy or a girl and that second position has a 50% chance of being either a boy or a girl regardless of who is older or younger. And that's not mentioning the fact that the worldwide ratio of boys/girls is 1.05 at birth and for children under 15 and 1.03 for people 15-64 which further throws off any estimates of actual children. And if you are not comparing children, but simply the statistics, then there is no question about age, or which occurred in which order, all you know is that given two possibilities {x, x}, one is known, be it {B, x} or {x, B}, which is really the same, and one is not known. Given one variable at 50%, you only have 50%. All the possibilities of children if you don't take age into account are {BB, BG, GG}. This is the same as {BB, GB, GG} because age is not a factor, thus which came first is not a factor. Take out the GG and you've got {BB, GB} or {BB, BG} which is the same thing because age is not a factor. Only existence is a factor. The only way for age to become a factor is to state the age relative to another child. So long as no age has been stated, it is not a factor, only existence is a factor. The problem states that there exist two children and one is a boy. Thus there are two possibilities with no relation relative to position. One of the possibilities is taken, thus there is only one possibility available. -- AlexEagar 17:56, 23 March 2007 (UTC)

Wrong. In the original statement of the problem, we don't have a known child for that analysis to work. We know that one of the children is a boy, or equivalently, we eliminate GG from {BB, BG, GB, GG}. — Arthur Rubin | (talk) 18:53, 23 March 2007 (UTC)

Vote to Change Logic in this Article

I propose that the logic of this page be changed. Voting will be held from 03/20/2007 to 03/27/2007. The change, if approved, will be made on 03/28/2007. Vote by entering:
~~~~ Your Title or Position

Clarification of the change to be made as requested by Niels Ø: When I first posted my explanation, my opinion was that the 2/3 answer should be completely removed. But although I still don't agree with the answer, perhaps it would be best to say that there are two answers to the second question. One answer is that there is a 2/3 chance that the other child is a girl because the choices are {BG, GB, BB} and the other answer is that there is a 1/2 chance that the other child is a girl because the choices are {Bg, Bb, gB, bB} where the capital B is the boy who is known to be a boy. The 'Conclusion' and 'Mistakes' sections should be combined into a 'Opposing Views' section where both the 2/3 and 1/2 perspectives are compared. Let the proponents of each view state their arguments. The reader can decide whether the question is truly paradoxical such as the 2/3 answer suggests or whether it is not a paradox such as the 1/2 answer suggests. AlexEagar 16:43, 22 March 2007 (UTC)

All Those In Favor of the Change

~~AlexEagar 02:57, 21 March 2007 (UTC) Software Engineer~~

All Those Who Oppose the Change

Niels Ø (noe) 21:28, 22 March 2007 (UTC) -- there is only one correct solution, as presently given in the article.
— Arthur Rubin | (talk) 21:34, 22 March 2007 (UTC). (Use of "titles" or "positions" is frowned upon here in Wikipedia, but I have a Ph.D. in mathematics and have recently been working on some complicated statistical problems.) There is only one correct solution to the problem as presented. If the elder child is known to be a boy, the answer clearly becomes 1/2.
AlexEagar 19:26, 23 March 2007 (UTC) Software Engineer -- Ok, I change my vote. I just changed how I'm calculating the percentage and sure enough, it is 2/3 chance assuming a 1:1 ratio of boys to girls. And about 65.5% chance given the 1.06:1 boy to girl ratio of children as stated on the CIA Factbook [1]. I still don't like how you present the argument, I'll offer another explanation when I get a chance.

I Can Admit I Was Wrong

Here's some python code to calculate the 2/3 correctly.

import random

def boyorgirl(isOneToOneRatio=True, pairsOfSiblings=100000):
	# Boy == 1
	# Girl == 0
	# Chance of a boy: 1 in 2, or 106 in 206 for a 1.06:1 boy to girl ratio
	boyChances, totalChances = (1, 2) if isOneToOneRatio else (106, 206)
	twoGirls = 0
	twoBoys = 0
	olderBoyYoungerGirl = 0
	olderGirlYoungerBoy = 0
	for k in range(pairsOfSiblings):
		# pair[0] is the older child, pair[1] the younger
		pair = [1 if random.randrange(0, totalChances) < boyChances else 0 for i in range(2)]
		if pair == [0, 0]:
			twoGirls += 1
		elif pair == [1, 0]:
			olderBoyYoungerGirl += 1
		elif pair == [0, 1]:
			olderGirlYoungerBoy += 1
		else:
			twoBoys += 1
	# Pairs with one boy and one girl, in either birth order
	mixed = olderBoyYoungerGirl + olderGirlYoungerBoy
	print("If one child is a boy, what is the likelihood that the other child is a girl?")
	print("For every " + str(pairsOfSiblings) + " pairs of siblings, " + str(twoGirls) + " will be both girls, " + str(twoBoys) + " will be both boys, " + str(olderBoyYoungerGirl) + " will be split with an older brother and a younger sister, and " + str(olderGirlYoungerBoy) + " will be split with an older sister and younger boy.")
	print("The percentage of pairs that have girls out of those that also have boys is " + str(100.0 * mixed / (mixed + twoBoys)) + "%.")

AlexEagar 19:38, 23 March 2007 (UTC)

About the 2nd question and its vagueness

Assuming simple 50/50 G/B probabilities...

A boy living in a two-child family has a 50% chance of having a brother, or in other words if one child is a boy there's a 50% chance of having a second boy depending on the selection method.

In an even {BB BG GB GG} division it's easy to notice the simple fact that 50% of the boys live in BB families, while BG and GB families do indeed outnumber the BB families 2:1, i.e. there are twice as many such (GB, BG) families, but those families have 50% fewer boys per family. In other words, out of, say, 600 two-child families with at least one boy you'd get (approx.) 400 BG/GB families and 200 BB families, with 400 boys living in the BG/GB families and 400 boys living in the BB families.

Back to my original point...If the selection is done by:
A) Choosing a random boy living in a two child family (with at least one boy) there's a 50% chance he has a brother

Older child is a boy: BB, BG (but not GB or GG)

Younger child is a boy: BB, GB (but not BG or GG)

B) Choosing a random two-child family with at least one boy there's a 33% chance of a boy having a brother.

Older or younger child is a boy: BB, BG, GB

The article's question - A two-child family has at least one boy. What is the probability that it has a girl? - is a bit too vague to give it a specific answer, as it depends entirely on how the selection is made. For example, if you see one child of a two-child family and it happens to be a boy, there's a 50% chance he has a brother (50% of boys live in BB families, case A: you 'choose' a random boy); on the other hand, if you, for example, meet two-child families you'll notice that 66% of those families with a boy also have a girl (BB vs BG & GB, case B: you 'choose' a random family).

- G3, 01:21, 27 April 2007 (UTC)

Here's an illustration of the problem, from a top-down view, let's suppose there are 4 two child families living in some small village with 4 boys distributed among them in a 'perfect' configuration.

Families with two children:

-Jones (BB)

-Parker (BG)

-Smith (GB)

-Walker (GG)

Boys living in two-child families:

-Jack Jones (BB)

-James Jones (BB)

-Jules Parker (BG)

-John Smith (GB)

Now, depending on which list you choose your two-child family with at least one boy from you get two different probabilities...

A) Choosing a random boy first results in 4 boys - Jack, James, Jules & John - of which 50%, Jack & James, have a brother.

B) Choosing the family first you get three families - the Smiths, Parkers and Jones - of which 66% have a boy and a girl configuration.

-G3, 11:44, 27 April 2007 (UTC)
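
The village example is small enough to count exactly. The sketch below is only illustrative (the function names `by_family` and `by_boy` are made up here); it encodes each family as a two-letter string with the older child first, as in the lists above.

```python
from fractions import Fraction

# The four families from the village example, older child listed first.
families = {"Jones": "BB", "Parker": "BG", "Smith": "GB", "Walker": "GG"}

def by_family(families):
    """Case B: pick uniformly among families with at least one boy."""
    with_boy = [kids for kids in families.values() if "B" in kids]
    mixed = [kids for kids in with_boy if "G" in kids]
    return Fraction(len(mixed), len(with_boy))

def by_boy(families):
    """Case A: pick uniformly among boys and look at each boy's sibling."""
    siblings = []
    for kids in families.values():
        for i, child in enumerate(kids):
            if child == "B":
                siblings.append(kids[1 - i])  # the other child in the family
    return Fraction(siblings.count("G"), len(siblings))

if __name__ == "__main__":
    print("chance of a girl, choosing a family first:", by_family(families))
    print("chance of a sister, choosing a boy first:", by_boy(families))
```

Both functions count over the same four families; only the list being sampled from (families versus boys) changes.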

Sample space makes sense

So, I am not a mathematician, nor have I studied logic or probability. The reason I understand this is because of how the first question explains the sample space. Our sample space consists of 4 possibilities (given certain assumptions of course, i.e. no twins, or intersexuals), BB, BG, GB, and GG.

Question #1: Since the older child is a boy we can reduce our sample space. This makes sense to most people, and of course this may be that the paradox is how people interpret a priori knowledge. I think that for the most part everyone is getting that part.

Question #2: Age no longer matters, only that one of the children is a boy. Our new sample space is reduced to those sets (maybe the wrong term) that contain a boy: BB, BG, and GB.

What I am seeing a lot here is that people are not accepting the same sample space for the second question. I think that has to do with the way the article is written, because it immediately goes into people's wrong assumptions, and to be honest that was a little confusing to me. I had it, and then I began questioning it because I saw that people often got it wrong. As I was reading this discussion page Baccyak4H gave an example of tossing two coins in response to the Rebuttal to Solution above. I think that is a good way to expand on this in both questions.

We have two coins, both have a heads and a tails. They were minted in different years, and they always land flat.

Question #1: I flip two coins. The older coin is heads. What is the probability that the other coin is tails?

The possible outcomes of the two coins are (HH, HT, TH, TT). Even without the age of the coins taken into account, the possible outcomes are the same. Applying the age is also arbitrary, and this is important. If we decide that the older coin is the "first" coin in a set, then we get HH and HT. If we decide the older coin is the "second" coin we get HH and TH. It becomes apparent that TH and HT are not the same (I hope).

The first question is answered, but again, we mostly agree with that.

Question #2: Two coins are flipped, and at least one of them is heads. What is the probability that the pair contains a tails?

The possible outcomes for this are again (HH, HT, TH, TT). We don't worry about the age of the coins this time, so we just look at which pairs have a heads. We get HH, HT, and TH. TT does not have a heads. Of the three we have left (HH, HT, and TH), two of them have a tails.

I think a reason this is confusing can also be chalked up to the first question preceding it without examining why TH and HT are not the same (we use age as a convenient way to skim over that). The first question, in the format given, appears to impress prior knowledge upon the viewer, and really just skews things.

Question #3: I flipped a coin, it is heads. If I flip another coin, what is the chance that it will be tails? Our sample set here has only two possibilities (HH and HT [assuming we denote the first H as being the heads, of course]).

I am not sure (meaning I can't talk with authority) about first stage and second stage games, but when INic says he is a man, he is eliminating the possibilities, since we are not determining the probability at the same time. It is very similar to the first question in that it gives us extra info to work with.

Hmmm, I think that is all in order. I am looking forward to corrections and such. I am going to think about how to make the article easier for people to understand, since I personally don't know a lot of what is being said here (I don't know Bayesian math). Also, what does everyone think of presenting the two questions in a different way, by means of formatting? Maybe give more explanations to the first question before jumping into the second one. Maybe even not assume that the majority of the viewers are wrong, which tends to rub people the wrong way. maiki 11:55, 1 May 2007 (UTC)
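
The first two coin questions above can be checked by brute-force enumeration. This is just a sketch: the helper `prob` and the names `q1` and `q2` are invented for illustration, and the four outcomes are assumed to be equally likely.

```python
from fractions import Fraction
from itertools import product

# First letter is the older coin, second the newer one (an arbitrary but fixed convention).
SPACE = ["".join(p) for p in product("HT", repeat=2)]  # HH, HT, TH, TT

def prob(event, given):
    """P(event | given), counting equally likely outcomes in SPACE."""
    kept = [o for o in SPACE if given(o)]
    hits = [o for o in kept if event(o)]
    return Fraction(len(hits), len(kept))

# Question #1: the older coin is heads; is the other coin tails?
q1 = prob(lambda o: o[1] == "T", lambda o: o[0] == "H")

# Question #2: at least one coin is heads; does the pair contain a tails?
q2 = prob(lambda o: "T" in o, lambda o: "H" in o)

if __name__ == "__main__":
    print("Question #1:", q1)
    print("Question #2:", q2)
```

Question #3 conditions on the first coin being heads, so under this convention it reduces to the same count as Question #1.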

About your question #2: That depends on how the roll with a head is selected. Let's say you do 4 rolls with two coins: HH, HT, TH & TT

If you catalogue all rolls like this..

Heads - 1, 1, 2, 3

Tails - 2, 3, 4, 4

...and you choose a random heads flip there's a 1/2 chance it's in a HH roll

If you catalogue rolls like this..

Roll 1 - HH

Roll 2 - HT

Roll 3 - TH

Roll 4 - TT

...and you choose a flip with a heads you get that 2/3rds of such rolls also have tails.

The primary issue here is selection method:

-In case of a chance encounter - you flip a coin, you see a two child family with one child being gender identifiable (eg. the other child is at home) - the chance of the other party being of the same 'type' is 1/2: H* -> HT, HH or *H -> TH, HH.

-In case of selecting the favoured outcome first - at least one tails - and choosing the result from a list of results of wanted outcome - list of tails rolled - the chance of the other roll being the same is again 1/2: TT rolls have equal amount of tails to HT & TH rolls combined

-In case of selecting the favoured outcome first - at least one tails - and choosing the result from a list of results with wanted outcome - list of results with tails rolled - the chance of other roll being the same is 1/3: HT & TH rolls outnumber the TT rolls by 2:1.

More accurately, the question "A two-child family has at least one boy. What is the probability that it has a girl?" can only be accurately answered if we also know the answer to question "How do we know a two child family has at least one boy?" - While it is *absolutely* true that 2/3s of two-child families with boy also have a girl (assuming simple division) it is also equally true that 1/2 of boys in two-child families live with a brother (ie. another boy). - G3, 01:27, 3 May 2007 (UTC)

I disagree with the solution

I am not going to say that the answer "2/3" is incorrect - because it isn't. However, it isn't the best answer to the problem, either. It makes a hidden assumption that cannot be made in general, and that leads to a logical contradiction if applied unilaterally.

The problem statement I will work with is: "Mrs. Jones has two children, at least one of which is a boy. What is the probability that she also has a girl?" The solution presented in the encyclopedia article is that there are four possible 2-child families, based on gender and birth order: {BB}, {BG}, {GB}, and {GG}. We'll assume each is equally likely. The additional fact that at least one of Mrs. Jones' children is a boy means that one of those families, {GG}, is not possible for Mrs. Jones. Since the other three family types exist in equal numbers, and two of them include a girl, the probability is 2/3 that Mrs. Jones also has a girl.

As I said, that answer is not incorrect. But I'm purposely using a double negative, because not being incorrect in one case is not the same as being correct in general. Suppose another question is asked: "Mrs. Smith has two children, at least one of which is a girl. What is the probability that she also has a boy?" If you answer 2/3, following the same logic, then there is something incorrect in your solutions. The only answer that is consistent with the first solution is 0; that there is no chance that Mrs. Smith has a boy.

To see that it is wrong (or at least that there is something wrong if you assume it is right), ask a third question: "Mrs. Grey has two children. What is the probability that her two children have different gender?" This has a trivial solution: 1/2. But, if 2/3 is the correct answer for both the Mrs. Jones problem and the Mrs. Smith problem, and if I write the gender of one of Mrs. Grey's children on a piece of paper WITHOUT showing it to you, then you have to answer 2/3 for the Mrs. Grey problem. Regardless of what I actually wrote, the above logic - if applied unilaterally - says that probability that the other child has a different gender from what I wrote is 2/3.

The hidden assumption being made in the solution concerns how you decide which child to tell about when they have different genders. While the family types {BB} and {BG} may EXIST in equal numbers, that does not mean that for every {BG} family you will say "Mrs. Jones has at least one boy." But in order to get the 2/3 answer, you have to assume that in every {BG} or {GB} case you will say "Mrs. Jones has at least one boy." Which means that if you say "Mrs. Smith has at least one girl," I can be 100% certain that Mrs. Smith actually has two girls.

Oh, and using the answer 0 for the Mrs. Smith problem means that the answer for the Mrs. Grey problem is: P({BG} or {GB}) = P({BG} or {GB}|described a boy)*P(described a boy) + P({BG} or {GB}|described a girl)*P(described a girl) = (2/3)*(3/4)+(0)*(1/4) = 1/2. The right answer.

The best solution to the Mrs. Jones problem can be derived by including, in the solution, a factor for how you selected which gender to describe. There is a 1/4 chance that a random Mrs. X is from any of the four possible families: {BB}, {BG}, {GB}, and {GG}. If she is from a {BB} family, there is a 100% chance that she will be described as having at least one boy. Similarly, if she is from a {GG} family, there is a 100% chance that she will be described as having at least one girl. But for the remaining two family types, unless you make the unfounded assumption that one gender is favored over the other, there is a 50% chance that either description will be used. A {BG} family may exist as often as any of the others, but if our Mrs. Jones is from a {BG} family, only half of the time will she be described as having at least one boy. The other half, she will be described as having at least one girl. If she is described as having at least one boy, the probability that she also has a girl is [P({BG})*1/2 + P({GB})*1/2] / [P({BB})*1 + P({BG})*1/2 + P({GB})*1/2] = (1/8 + 1/8) / (1/4 + 1/8 + 1/8) = 1/2.

This is the best answer for the problem. It isn't the only answer - because the question, as it is posed, is ambiguous. You can solve that ambiguity by making an assumption about how to choose whether to tell about boys or girls. This answer is best, in the Bayesian Probability sense, because it doesn't assume you are biased toward boys or girls. - JeffJor 17:44, 8 June 2007 (UTC)
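
The dependence on the description rule can be written as a one-parameter calculation. In the sketch below, `q` is a hypothetical parameter (not part of the original problem) standing for the probability that a mixed family is described as "at least one boy"; the function name is also made up, and the four family types are assumed equally likely.

```python
from fractions import Fraction

def prob_girl_given_boy_described(q):
    """P(family has a girl | described as 'at least one boy').

    q is the assumed probability that a {BG} or {GB} family is described
    by its boy rather than its girl; {BB} is always described by a boy.
    """
    quarter = Fraction(1, 4)
    # Joint probability of each family type AND the "at least one boy" description
    bb = quarter * 1
    bg = quarter * q
    gb = quarter * q
    return (bg + gb) / (bb + bg + gb)

if __name__ == "__main__":
    for q in (Fraction(1), Fraction(1, 2), Fraction(0)):
        print("q =", q, "->", prob_girl_given_boy_described(q))
```

Setting q to 1 corresponds to always mentioning the boy in a mixed family, and q to 1/2 corresponds to the unbiased description assumed in this comment, so the two answers discussed here come out of the same formula.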

Do you suggest that for the Bayesian probabilist the correct answer is 1/2, but for the rest of us the correct answer is 2/3? If this is correct this would be a great test to determine if a person is a Bayesian or not. Very interesting. I'm not a Bayesian and I think 2/3 is the correct answer, so this test works for me at least. ;-) iNic 21:04, 8 June 2007 (UTC)

No. I'm saying that for the frequentist, the problem statement is ambiguous and there is no valid answer. That 2/3 can't be the answer to both the Mrs. Jones problem and the Mrs. Smith problem because, if it is, the probability of any two-child family having a boy and a girl must be 2/3. The only legitimate way to answer the question is to take a Bayesian approach, and that answer is 1/2. - JeffJor 00:27, 9 June 2007 (UTC)

Very interesting. I make pretty much the opposite analysis. In a frequentist setting everything works fine, as it should. The correct answer is 2/3 for both Mrs. Jones and Mrs. Smith, of course. No contradiction arises because it's two different experiments that we can't mix. However, in a Bayesian setting it's easy to derive a contradiction because we don't know what the experiment is. Bayesians typically claim they don't care about experiments anyway. In an attempt to avoid the paradoxes that naturally arise when you don't know what the experiment is, the typical Bayesian starts to analyze the psyche of the experimenter instead. This is what you do above. As long as you know how the psyche of the experimenter works in different situations you are all game. But as the experimenter obviously can have infinitely many states of mind, potentially, this problem isn't unambiguously defined unless the probabilistic preferences of the experimenter ("prior") are included as part of the problem statement. But in the way the problem is currently stated it's ambiguous and there is no valid answer—for the Bayesian. iNic 02:40, 9 June 2007 (UTC)

No. This issue does not involve "mixing" experiments, it involves how you interpret an ambiguous statement in the problem in each individual experiment. It revolves around the conditional probability that a mother of two has a boy and a girl, given that she has at least one boy. Write this P(B&G|B).

By your arguments, a frequentist will say P(B&G|B)=2/3. That same frequentist will say P(B&G|G)=2/3. It is a well known, provable result that P(B&G)=P(B&G|B)*P(B) + P(B&G|G)*P(G). If both conditional probabilities are 2/3, this reduces to P(B&G)=2/3*[P(B) + P(G)]. The contradiction does not depend on how you "mix" the probabilities P(B) and P(G), because no matter how you do it, they have to add up to 1.

The problem statement is ambiguous. Different experiments that all end up with the description "Mrs. Jones has at least one boy" can have different answers. If you met a Mrs. Jones at a social function for parents at a boys' prep school, then 2/3 is the right answer. If it is a girls' prep school and Mrs. Jones mentions she has a son, the answer is 100%. If another Mrs. Jones simply tells you "My oldest child is a boy," the correct answer is 1/2. There are lots of different experiments. A frequentist CANNOT GIVE AN ANSWER without knowing how it was determined that Mrs. Jones has at least one son. A Bayesian can, and that answer is 1/2. - JeffJor 13:09, 9 June 2007 (UTC)

No, the problem statement isn't ambiguous at all. At least not for a frequentist. The issue does not involve conditional probabilities as you think. It's much simpler than that. It is all about finding the sample space $\Omega$ for the problem and we're done. I never said that P(B&G|B)=2/3 as you claim I did. I simply said that P(B&G)=2/3 on $\Omega$ = {{B,G}, {G,B}, {B,B}} which is a completely different thing. And I also said that P(B&G)=2/3 when $\Omega$ = {{B,G}, {G,B}, {G,G}}. If you read the Kolmogorov axioms for probability theory you will notice that they are only valid when you have a probability space defined, and one of the key components in a probability space is the sample space $\Omega$. This means that probability theorems are valid only within a specified probability space. To mix sample spaces (and thus probability spaces) in a formula the way you do above isn't allowed, and it's no wonder you get into absurdities when you do. iNic 02:30, 12 June 2007 (UTC)

I don't know how you can claim it isn't ambiguous, and at the same time claim the statement of the problem is not "What is P(B&G|B)?" That is how the problem statement "What is the probability that a two-child family has a boy and a girl, given that it has at least one boy?" is expressed in a formula. By denying the two are the same, you are admitting the ambiguity. Also, this problem has been called ambiguous by many people, including Martin Gardner when he wrote about it in his Mathematical Games column of Scientific American in May and October of 1959.

Yes, I know that probability theorems are only valid when you have a probability space defined for them. That's the ambiguous part here. Is the space described by the problem {{B,B}, {B,G}, {G,B}}, or {{B,B}, some subset of {B,G}, some subset of {G,B}}? The statement "A two-child family has at least one boy" is a necessary condition for that space to be {{B,B}, {B,G}, {G,B}}; but it is not a sufficient condition. It also describes {{B,B}, some subset of {B,G}, some subset of {G,B}}. The exact makeup of the space you need to answer the problem cannot be determined from the problem statement. You have to assume something additional - in this case, that the entirety of {G,B} and {B,G} are included.

Finally, although I briefly described the single experiment that leads to the contradiction before, I'll write it out more formally now. The probability that a two-child family has a boy and a girl, given no other information about it, is 1/2. Say I write down a gender that "at least one child" has, but don't show it to you. You have no additional information, and can't change your answer. It is still 1/2. But following the logic you have subscribed to, you could assume that if W is the gender I wrote down (you don't know what it is, but you know it is a discrete value), and N is the gender I did not write down, that the family is from the space $\Omega$ = {{W,W}, {W,N}, {N,W}}. If this assumption were true, the probability of a boy and a girl would be 2/3. But we know that probability is 1/2. Therefore, the assumption is not true.
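The written-down-gender experiment above can be checked by exact enumeration. This is only a sketch under one added assumption that the comment does not state: for a mixed family, the writer picks which gender to write down with probability 1/2 each. The function and variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

def written_gender_table():
    """Joint distribution of (family, written gender W).

    Assumption (not stated in the thread): for a mixed family the writer
    chooses which gender to write down with probability 1/2 each.
    """
    joint = {}
    for family in product("BG", repeat=2):      # (older, younger), each 1/4
        present = set(family)                   # genders the writer may pick
        for w in present:
            joint[(family, w)] = Fraction(1, 4) / len(present)
    return joint

def p_mixed_given_written(joint, w):
    """P(family has a boy and a girl | written gender is w)."""
    total = sum(p for (fam, wr), p in joint.items() if wr == w)
    mixed = sum(p for (fam, wr), p in joint.items()
                if wr == w and len(set(fam)) == 2)
    return mixed / total

joint = written_gender_table()
p_b = p_mixed_given_written(joint, "B")
p_g = p_mixed_given_written(joint, "G")
p_written_b = sum(p for (_, wr), p in joint.items() if wr == "B")

print(p_b, p_g)                                     # 1/2 1/2
print(p_b * p_written_b + p_g * (1 - p_written_b))  # 1/2
```

Under that assumption both conditional probabilities are 1/2, so the law of total probability returns the unconditional 1/2. A different rule for what the writer writes down would give different conditionals, which is the ambiguity being argued about.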

I propose that this encyclopedia article be rewritten to reflect the fact that the problem is ambiguous, and cannot be answered without assuming information not presented. It can include references to cases where people have made such assumptions, but those are not true solutions. - JeffJor 12:26, 16 June 2007 (UTC)

No, a conditional probability isn't what you think it is. P(A|B) simply means the probability we pick A from a sample space given that we picked B from the same space. Both A and B have to be events defined on the sample space. You claim that B = 'has at least one boy' is an event, but it clearly isn't. It's a boundary condition. We get two pieces of information that combined gives us our sample space ('at least one boy' & 'a two-child family') and none of these pieces can be treated as an event. You pick one of them as being a condition for your sample space ('a two-child family') and you treat the other one as an event. But why didn't you pick the last one as a condition and treated the first one as an event instead? Your sample space would have been all families with at least one boy, and you would condition on the "event" 'has exactly two children.' To use conditional probabilities in this way is of course nonsense but unfortunately a common mistake among Bayesians. If Martin Gardner at one time did the same mistake as you do here (as your comment implies) I feel sorry for him too. iNic 17:41, 17 June 2007 (UTC)

I honestly don't understand your talk about the subsets of {B, G} in your second paragraph. The subsets of {B, G} are {B, G}, {B}, {G} and Ø. They are of course all ruled out except {B, G} itself. iNic 17:41, 17 June 2007 (UTC)

In your third paragraph you make it very clear where your mistake is. You evidently do not discriminate between a boundary condition and an event (as explained above). If this is a general disease among Bayesians you might in fact have invented yet another paradox within Bayesian probability philosophy. But as this is WP:OR I'm afraid you can't add this discovery under the Bayesian solution section. iNic 17:41, 17 June 2007 (UTC)

I'm trying not to let this degenerate into name calling; but I know exactly what a conditional probability is. From the link you created, it is "the probability of some event A, given the occurrence of some other event B." The two events do have to be from a joint sample space; but B is not limited to being from the same subset of that space that A is from, which is what you are trying to do.

A "boundary condition," as you put it, is not as well defined a term as the others you have used here. If it were, I'm sure you would have linked to its definition like you did the others. But I can provide a definition I'm sure will satisfy you. It is some condition that defines a subset of one sample space to be used itself as a sample space. And since an event is a SET of outcomes, not just a single outcome - read the link you created - that subset can be called an event on the larger space. You are trying to work on the smaller space, and I am working the larger one that includes yours as a subset. In fact, the space I use is {{B,B}, {B,G}, {G,B}, and {G,G}}. It is called "the original sample space" in the article. If we can establish what set is meant by "a two child family with at least one boy," it clearly - by the definition you linked to - is an event in that space.

But, I agree you can call it a boundary condition that defines a smaller sample space. It is the exact content of that smaller sample space that is ambiguous, since it depends on how you applied that condition. If "at least one boy" is a priori knowledge, then the probability we seek is 2/3. But if it is a posteriori knowledge, then we have to know how it was obtained. And the point of my example is that assuming it is a priori knowledge is what causes the paradox. Because to be a priori knowledge, you have to know that you are treating "boy" differently from "girl," and that is not clear in the wording. - JeffJor

"But why didn't you pick the last one as a condition and treated the first one as an event instead?" Because that doesn't help toward identifying a solution. It could be done, but isn't very interesting here.

Please understand this: I am not a Bayesian. I am not a frequentist, either. Both schools of thought can be useful for different kinds of problems. And because this problem does not identify the condition "at least one boy" as a priori or a posteriori knowledge, the frequentist approach cannot work. That is the point of my paradox: assuming it is a priori knowledge means that a similarly-worded problem for girls would have to make the same assumption. That leads to the invalid conclusion that there is a 2/3 probability of a boy+girl family on the space {{B,B}, {B,G}, {G,B}, {G,G}}. It would also help if you would read what you criticize. I said Martin Gardner said the problem statement was ambiguous. To my knowledge, he never applied conditional probability in this fashion, and nothing in what I wrote implied it. - JeffJor 21:36, 17 June 2007 (UTC)

"I honestly don't understand your talk about the subsets of {B, G} in your second paragraph." Then you clearly don't understand the definition of an event, which is itself a set and can have subsets based on other conditions. I was trying to not be too wordy, and omitted those other conditions; but it was clearly implied. But, if you want, define a more specific set of events in the form BG1, where the "B" in the first position means the older child is a boy, the "G" in the second means the younger is a girl, and the "1" means the older child's gender is reported.

The EVENT that you are assuming is your smaller sample space is {{BB1}, {BB2}, {BG1}, {GB2}}. But it isn't clear that both {BG1} and {GB2} MUST be included. They CAN be, but we need to know how "at least one boy" is determined to say it MUST be. In other words, the statement is ambiguous.

The key word in the definition of an event in this case is occurrence. You constantly assume that one of the boundary conditions is equivalent to a reporting of the gender of one of the children as an event in space and time, i.e., something that occurs. It isn't. None of the boundary conditions are random events. The only randomness involved is biological in nature, and the outcomes of these random events determine whether a family has at least one boy or not.

You might not know you are a Bayesian but you surely reason like a Bayesian. A Bayesian attaches a probability measure to any statement whatsoever, whether the statement describes a random event (that occurs) or not. This approach easily leads to paradoxes, and there are different Bayesian schools just because there are different ideas on how to cope with this. One of the ideas the Bayesians have invented is the notion of a priori distributions and a posteriori distributions. These concepts are exclusively Bayesian and have no meaning for a frequentist. That you claim the frequentist approach cannot work because we don't know what knowledge is "a priori" or not shows that you confuse frequentism with Bayesianism completely. iNic 21:06, 21 June 2007 (UTC)

I would like to point out some nonsense in the previous discussion...

P(B&G)=P(B&G|B)*P(B) + P(B&G|G)*P(G). is clearly wrong, as there is no assertion that G = ~B.
There is no difference in results, either for a frequentist or a Bayesian, between calculating P(A|B) as ${\frac {P(A\cap B)}{P(B)}}$ or as taking "B" as a "boundary condition".

But I agree there's an ambiguity as to whether the statement is "at least one child is a boy" (P=2/3) or "you are told at least one is a boy" (P is undetermined, but probably between 0 and 2/3, as you don't know the conditional probabilities (Bayesian) or frequencies (frequentist) of how the teller selects what to say, even given the underlying assumption that he/she is telling the truth.) — Arthur Rubin | (talk) 22:47, 17 June 2007 (UTC)

Let's say we have $\Omega$ = {{B, B}, {B, G}, {G, B}, {G, G}} and we define the event B as being the observation of a boy, and event A as the observation of a family with a boy and a girl. Then P(B) = ½ and P(A&B) = ¼ which makes P(A|B) = ½. This is clearly different from when we treat all the information we get as boundary conditions on the sample space (which is the correct thing to do), as we then have $\Omega$ = {{B, B}, {B, G}, {G, B}} where P(A) = ⅔. I do not understand the difference between "at least one child is a boy" and "you are told at least one is a boy." And I think we assume that if you're not a boy, you are a girl. iNic 00:21, 18 June 2007 (UTC)

Ah, yet another interpretation. You observe one child to be a boy, which gives the corresponding probability of 1/2 as you note. — Arthur Rubin | (talk) 13:46, 18 June 2007 (UTC)

Exactly. And failing to clearly distinguish between being told that at least one is a boy and observing that at least one is a boy is the cause of so much confusion here. (In the first case it is just a piece of information on the same level as the information that the family has exactly two children. It is not an event, no one picks anything from any sample space, no randomness is involved, nothing happens in space or time. It is simply a boundary condition for the sample space. In the second case, however, someone observes that one of the children in a family is a boy. This is an event, defined on a sample space, an event in space and time, with randomness involved. After all, we could as well have observed a girl instead. In this case the boy/girl thing is not a boundary condition for the sample space. It is not on the same level as the information given that the family has two children.) In a frequentist setting this distinction is very clear. However, in a Bayesian setting this distinction is very hard to maintain, if it is possible at all. iNic 15:52, 18 June 2007 (UTC)

Added a query

Ms. Smith has two children; I'll call them Whitney and Leslie. One of them is a boy. What is the probability that the other is a boy?
Ms. Smith has two children; I'll call them Whitney and Leslie. Whitney is a boy. What is the probability that Leslie is a boy?
Is the first version ambiguous? Does it give you any additional information? I'm just asking. Jackaroodave 16:23, 16 June 2007 (UTC)

For the second version, the probability is 1/2. The problem statement is unambiguous, and the probability that Leslie is a boy is independent of Whitney's gender. The first version is ambiguous, and the answer depends on how you obtained the information that "one of them is a boy." If you only know the gender of one specific child, and know nothing of "the other," the answer is 1/2. Even if you don't tell me the name of that specific child.

Some people even claim that the fact you asked about "the other child" rather than "does the family have a girl" is important. The encyclopedia article says this is a "confusing" form of the question. The only thing confusing about it is that it illustrates the ambiguity by implying you might have the information about only one specific child. Are you asking about a specific "other child," in which case the probability must be 1/2; or are you asking about where the family fits in the space {{B,B}, {B,G}, {G,B}}, in which case you need to know how you obtained the information that "one of them is a boy." - JeffJor 17:25, 17 June 2007 (UTC)

Thanks for your clarifying answer. It reminds me of the Monty Hall problem, where the answer hinges on what Monty knows and chooses to reveal. It seems reasonable to me to assume that someone who torments people with probability questions knows the whole score, but perhaps not. How's this? "I'm a pediatrician. Ms. Smith brought in her two children for routine physicals. I'll call them Whitney and Leslie. One of them is a boy. What is the probability that the other is a boy?" Jackaroodave 13:45, 18 June 2007 (UTC)

Edited for clarity

I edited this so that the original question is no longer unclear. (Albeit at the risk of being less "paradoxical...") I also removed the assumption that "the majority of people will be confused," which tends to, as somebody else said, "rub people the wrong way."

At first I added this but then I was worried about its accuracy: (in the mistakes section)

The answer to this question is very sensitive to how it is asked. The first mistake is as follows.

When posing the second question, one might often state, "You see a family out for a walk with one of their two children. The child walking is a boy. What is the probability that the other child is a girl?" The poser of this question may expect an answer of 2/3. However, this is not technically the correct answer.

The question posed in these terms seems to ask, "A random family goes for a walk with a random one of their two children. The child randomly chosen is a boy. What is the probability that the family's other child is a girl?" A random family among the four possible family types is indeed chosen. However, a random child is also chosen, which means the family with two boys is much more likely to be walking with a boy than either individual family with a boy and a girl.

Whenever the family with two girls is walking with a random child, it will not be walking with a boy. Thus, this family never meets the criteria for the question and may be discarded. The remaining families are

{BG, GB, BB}

However, only 1/2 of the time, when chosen randomly, will family BG or GB be walking with their boy. Thus, among random families walking with a random one of their two children, the BB family will be the family the observer sees 1/2 of the time, and the GB and BG family only 1/4 of the time each. In the BB family, the boy seen walking will, of course, have a brother. Thus 1/2 of the time the other child will be a boy. And the remaining 1/2 of the time, the boy seen walking will belong to a family with a girl. The answer to the question posed, then, is 1/2.

There are of course many possible mistakes relating to this question. ..and then on with mistakes section.. Milkshakeiii 20:30, 29 July 2007 (UTC)
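The family-out-for-a-walk reading can be tried with a quick Monte Carlo run. This is a rough sketch, assuming each child is independently a boy or girl with probability 1/2 and that the walking child is chosen uniformly at random; the function name, trial count and seed are arbitrary choices for illustration.

```python
import random

def simulate_walk(trials=200_000, seed=0):
    """Estimate P(other child is a girl | the child seen walking is a boy)."""
    rng = random.Random(seed)
    saw_boy = 0
    other_is_girl = 0
    for _ in range(trials):
        family = [rng.choice("BG"), rng.choice("BG")]  # (older, younger)
        walker = rng.randrange(2)                      # random child goes for the walk
        if family[walker] == "B":                      # keep only walks where a boy is seen
            saw_boy += 1
            if family[1 - walker] == "G":
                other_is_girl += 1
    return other_is_girl / saw_boy

estimate = simulate_walk()
print(round(estimate, 3))
```

The estimate should land close to 1/2 rather than 2/3, because the BB family shows a boy on every walk while BG and GB families show one only half the time.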

A Simplified Answer?

This contribution offers a contradiction to the answer for question 2. For question 1 order is important and allows for only 2 possibilities, either BB or BG so 1/2 of the time a girl will be part of the family. With question 2 order does not matter which leaves only three possible combinations, not four as implied in the answer for question 1; either both are girls (GG) or both are boys (BB) or one of each (GB/BG). Therefore with the GG possibility removed, it leaves only the BB or GB/BG resulting in 1/2 as well. Seriously, am I missing something here? —Preceding unsigned comment added by Retepris (talk • contribs) 03:23, 23 November 2007 (UTC)

Yes, you are missing something: The GB/BG-case is twice as probable as the BB case.--Niels Ø (noe) (talk) 06:56, 23 November 2007 (UTC)

Boy or Girl paradox - capitalization

Why is the word girl capitalized in the title of this article? — Carl (CBM · talk) 23:26, 4 December 2007 (UTC)

An easier way of looking at it

1st one
Two child family of unknown genders (gender of child 1, gender of child 2) = {BB, BG, GB, GG}
They are going to reveal one of the two children (child revealed, genders) = {1BB, 1BG, 1GB, 1GG, 2BB, 2BG, 2GB, 2GG}
The child revealed is a boy (sample space reduced) = {1BB, 1BG, 2BB, 2GB}
The chances of the family having a girl = {1BG, 2GB} / {1BB, 1BG, 2BB, 2GB} = 2/4 = 1/2

2nd one
Two child family of unknown genders (gender of child 1, gender of child 2) = {BB, BG, GB, GG}
They reveal they don't have 2 girls (sample space reduced) = {BB, BG, GB}
The chances of the family having a girl = {BG, GB} / {BB, BG, GB} = 2/3

Note that my version of the 1st doesn't matter which child is revealed, or why they were revealed (age,name,size,etc), only thing that matters is that the child was revealed and that the children do not suddenly change genders, {BG} to {GB} or vice versa. It also takes into account the possibility that the child revealed could have been a girl.

My version of the second is the equivalent of what it means, that there is no possibility of {GG}.

All the same, I find these to be helpful ways of presenting the problems. I've seen, so far, two times that someone has phrased the problem as the 1st one but said the answer is the 2nd, and did so as a way of expressing that the first is a flaw of intuition when in fact it was a flaw in the way they phrased it.

Deleted References

On 4 Dec. 2007, I posted the following references that were deleted the same day by Dorftrottel.

"One of the earliest discussions of this problem, by Martin Gardner, appeared in Scientific American [October 1959, p. 180]. On July 27, 1997, this problem appeared for the sixth time in Marilyn vos Savant's Parade magazine column [p. 6]. Previously, it had been discussed in her columns of March 30, 1997 [p. 16], December 1, 1996 [p. 19], and May 26, 1996 [p. 17]. This problem, involving two baby beagles instead of two children, had appeared originally in her columns of October 13, 1991 [p. 24], with a follow-up on January 5, 1992 [p. 22].

"Following the controversy of the Monty Hall problem, Ed Barbeau prepared two lengthy lists of references to the handful of paradoxes that are used to teach the concept of conditional probability. These references were published in The College Mathematics Journal [March 1993, pp.149-154; March 1995, pp. 132-134]. The fact that Marilyn recycled this paradox and toyed with her readers, without providing any references, is a glaring example of her unethical conduct."

The way in which Marilyn toyed with her readers is discussed at the following links:

http://www.geocities.com/SiliconValley/Circuit/1308/question.html

http://www.wiskit.com/marilyn/boys.html Italus (talk) 23:11, 4 December 2007 (UTC)

I would like to add that in the above articles, Ed Barbeau referred to this problem as the Second Sibling Paradox. Italus (talk) 21:11, 5 December 2007 (UTC)

Title query

Is this concept known by another name? I've looked for references via Google and, eliminating Wikipedia from the search, the main site discussing a "boy or girl paradox" is h2g2, which isn't exactly where I'd look for mathematics info. --Tom Tresser (talk) 16:13, 22 December 2007 (UTC)

Take a look at http://mathforum.org/dr.math/faq/faq.boy.girl.html Snielsen (talk) 19:22, 19 July 2008 (UTC)

Bold Boys

I don't understand the significance of some occurrences of Boy having been made bold in the tables in sections First & Second question. --Lambiam 21:05, 7 March 2008 (UTC)

Third Question + Mistakes Major correction

I sat here and stared and stared and read and worked out problems, and I immediately ran into a huge pitfall with the second question. I assumed I was doing exactly what the Mistakes section said people did, but it turns out the Mistakes section is itself a mistake.

First, like someone stated above:

  1. Harry has an elder brother Tom
  2. Tom has a younger brother Harry

Is way wrong. It almost needs to be

  1. Harry has an older brother 
  2. Harry has a younger brother (in addition to:)
  3. Harry has a younger sister
  4. Harry has an older sister

But now as you can see there is a problem, all four hold the same weight bringing the odds down to 1/2, which most would say is incorrect, however in this case it is correct. Turns out simply naming the known gender changes a whole lot, so in addition to just completely wiping that section, I'd like to introduce the third question:

"A random two-child family with at least one boy named Harry is chosen. What is the probability that it has a girl?"

Now here is where things get really sketchy. The first obvious problem is what if someone else is named Harry in the family? Adding the clause "Given that no one else is named Harry" simplifies things and at least should be an example taken care of before tackling the question sans-clause. Turns out the answer to this particular question is 1/2, as briefly demonstrated in my correction of the mistakes.

On the flip side, adding the clause "Given that if another brother is present, his name is also Harry" nullifies the uniqueness, bringing it back to the same as question 2, which is 2/3.

Now someone will probably have to check my math but I worked this out:

Suppose the probability that a boy is named Harry is b.

A family has:
Two sons named Harry with probability x=(.5b)^2
A son named Harry and a son not named Harry with probability y=2(.5b)(.5(1-b))
A son named Harry and a daughter with probability z=2(.5b)(.5)
No son named Harry with probability 1 minus the sum of the above three numbers.

Thus the probability that the family has a daughter given that they have a son named Harry is z/(x+y+z)=1/(2-(b/2)).

This matches our qualitative prediction above. If b=1 (i.e. all sons are named Harry so we are in the situation of Question 2) then the answer is 2/3. If b is close to 0, then the answer is close to 1/2.
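For anyone who wants to check the algebra, here is a small sketch that evaluates x, y and z exactly as defined above and compares z/(x+y+z) with the closed form 1/(2-(b/2)). It assumes, as the derivation does, that each child is a boy with probability 1/2 and that each boy is independently named Harry with probability b; the function name is made up for the illustration.

```python
from fractions import Fraction

def p_daughter_given_harry(b):
    """P(family has a daughter | family has a son named Harry), for naming probability b."""
    half = Fraction(1, 2)
    x = (half * b) ** 2                     # two sons, both named Harry
    y = 2 * (half * b) * (half * (1 - b))   # one son named Harry, one son not
    z = 2 * (half * b) * half               # one son named Harry and a daughter
    return z / (x + y + z)

for b in (Fraction(1), Fraction(1, 2), Fraction(1, 100), Fraction(1, 1_000_000)):
    exact = p_daughter_given_harry(b)
    assert exact == 1 / (2 - b / 2)         # closed form from the derivation above
    print(b, exact, float(exact))
```

With b=1 the result is exactly 2/3, and as b shrinks the value approaches 1/2 from above, in line with the qualitative prediction.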

Finally, a more detailed link regarding all this:

http://members.cox.net/srice1/random/child4answer.html

The site has a 4th question, but I find it completely unnecessary, personally. Hopefully someone can look all this over and we can look at getting it on the page soon. 70.63.193.180 (talk) 09:09, 6 April 2008 (UTC)

I've deleted every Tom, Dick and Harry from the text of the article. May they rest in peace. --Lambiam 21:32, 6 April 2008 (UTC)

"Girl Named Florida" Variation

Leonard Mlodinow’s book, The Drunkard's Walk, contains a variation of the girl boy paradox. I have read the answers, but can't quite figure out why the two situations are different. Maybe someone could explain it in language I can understand and post in in the main article. Here are the two scenarios, as stated in a quiz posted in the Wall Street Journal [2]

Scenario 1: You know that a certain family has two children, and that at least one is a girl. But you can’t recall whether both are girls. What is the probability that the family has two girls — to the nearest percentage point?

Answer: 33%

Scenario 2: You know that a certain family has two children, and you remember that at least one is a girl with a very unusual name (that, say, one in a million females share), but you can’t recall whether both children are girls. What is the probability that the family has two girls — to the nearest percentage point?

Answer: 50%

This seems a little different from the boy girl paradox in that you don't know whether the girl with the unusual name is older or younger, just that she has an unusual name ("Florida" in the book) —Preceding unsigned comment added by Luschen (talk • contribs) 02:26, 11 July 2008 (UTC)

Frankly, I think both these situations are too ambiguous to allow the calculation of exact percentages. If what you know is that the elder is a girl (say), but you don't remember the sex of the younger, the sentences given as "Scenario 1" are true, but the probability is 50%. If you know that at least one is a girl, and think that both are, but you are not quite sure, again the sentences are true, but the probability may exceed 50%. - Similarly, in scenario 2, the percentage depends on how you have obtained the information given.--Noe (talk) 13:56, 11 July 2008 (UTC)

In the first case the information that the family has a girl is on the same level as the information that they have two children. The sample space is here the three possible combinations {B, G}, {G, B}, {G, G}. All equally likely by assumption. That is why the answer is 1/3 and not 1/2 in this case. In the second case the only information that is certain is that the family has two children. Then you remember the unusual name of one of the children which happened to be a girl. This is an event. It could as well have been some other name you recalled, including an unusual name of a boy. Thus, this information is not on the same level as the information that the family has two children. The sample space here is therefore all four combinations of boy/girl. A simple calculation using conditional probabilities shows that the probability in this case is 1/2. (See some paragraphs above for the calculation.) iNic (talk) 18:12, 17 July 2008 (UTC)

INic, this is not correct. An event is a possible outcome. There is no temporal order to the statements. They are both constraints on the space of possible events. The fact that the family has two children eliminates all events in which they have more or less. The fact that they have a child named Florida eliminates all events in which they don't have a child named Florida. You are correct that there are four possible events, but they are now {Florida Girl} {Girl Florida} {Florida Boy} {Boy Florida}. Without resorting to logical formulation, a simple simulation of this problem shows empirically that these events occur with equal frequency and are thus equally probable. That makes the answer 1/2. --Thesoxlost (talk) 17:08, 20 November 2008 (UTC)

Thesoxlost, your alternative interpretation is a bit odd but does indeed give the correct result in the "Florida" case. However, you get into trouble interpreting the memory of a girl's name as a sample space constraint in the originally given example above, where the name of the girl is not given. All you remember in that case is that it was some unusual female name. How would you turn this incomplete memory into a sample space constraint? It's simply impossible to do. So while your interpretation easily breaks down, the more natural interpretation (treating a diffuse memory as an event in space and time) always works and gives the correct answer. iNic (talk) 01:33, 23 November 2008 (UTC)

If we are worried about the order in which the children are born, then it would be wrong to assume only 4 possible outcomes: BB, BG, GB, GG. We should really count GG and BB twice. (Or count BG and GB as one if the order doesn't concern us.) I'll explain why:

In the first problem, where AT LEAST one of the children is a boy, we say the possibility of the other one being a girl is 2/3. However, let's think of it this way. The boy could have: an older brother OR a younger brother OR an older sister OR a younger sister. Here we clearly see the possibility of 2 boys vs 2 girls, hence 1/2 chance - NOT 2/3. The problem with saying we're left with BB, BG, and GB so the prob. is 2/3 is that we only take into account ONE of the boys in the BB option.

Similarly, if we know the older child is a boy, then we have the following options. The boy could have: a younger sister OR a younger brother. There are no other choices. Therefore, again, the probability is 1 girl vs 1 boy, so 1/2. —Preceding unsigned comment added by 198.96.180.245 (talk) 20:44, 27 August 2008 (UTC)

This problem has nothing to do with the order in which the kids are born. It is mentioned in the article only as a (perhaps bad?) pedagogical heuristic. If you toss two coins repeatedly (please do!) and note how often you get two heads, two tails or one head and one tail you will notice that the last combination is twice as common as each of the first two, even though you don't care in what order you toss the coins or even if you toss them simultaneously. To understand why it must be like this is not entirely trivial. In fact, in the dawn of probability theory d'Alembert argued exactly as you do now. iNic (talk) 01:23, 26 January 2009 (UTC)
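If no coins are at hand, the two-coin experiment can be imitated in a few lines. This is only a sketch with fair, independent coins assumed; the function name, trial count and seed are arbitrary.

```python
import random
from collections import Counter

def toss_pairs(trials=100_000, seed=2):
    """Toss two fair coins `trials` times and count how many heads each toss shows."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        first = rng.random() < 0.5
        second = rng.random() < 0.5
        counts[first + second] += 1   # 0, 1 or 2 heads; order is never recorded
    return counts

trials = 100_000
counts = toss_pairs(trials)
for heads in (0, 1, 2):
    print(heads, "heads:", round(counts[heads] / trials, 3))
```

Only the number of heads is tallied, never the order, yet the one-head-one-tail outcome should still turn up about twice as often as either of the others.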

Really now. Stop this foolishness.

The main problem in the second question is mixing the variables. If the question is:

A random two-child family with at least one boy is chosen. What is the probability that it has a girl? (Or: choose a random two-child family assuring that at least one is a boy. What is the probability that the other one is a girl?)

Then we are not asked anything about the relative ages, are we? So to base our calculations of probability on the premise that:

in the second case, there are three equally probable ways in which at least one child can be a boy: only the older one, only the younger one, or both.

Would be foolish, would it not?

Because the question only asks about the probability of one child being male or female, we can safely discount the information about the other child, and any relationship between the two. Therefore, unless we take into account unnecessary variables, such as the real life balance of male to female children in typical families in the same circumstances as our random family, the answer is left perfectly clear. Two possible states (male / female) with no other pertinent variables means a straight 50/50 split.

Complicating this question with equations and tables is probably the greatest waste of human effort I have witnessed this year. So.. thanks for all the laughs... AND THE FISH. —Preceding unsigned comment added by 195.92.41.134 (talk) 15:15, 17 November 2008 (UTC)

The answer is clearly 2/3 under most formulations. If you can't understand the article, it's not our job to teach you basic probability theorem. — Arthur Rubin (talk) 16:07, 17 November 2008 (UTC)

Smart people and good formulations of the problem can be proven wrong. But not only is the answer clearly 2/3 for Question 2, but simulations of the problem that randomly generate pairs of siblings clearly demonstrate, empirically, that the answer is 2/3. Also, for the person who reverted Question #3, it can be demonstrated empirically that the answer to Question #3 is 1/2. Knowing the name of the child changes the odds. And the fact that this is all so counter intuitive is exactly why this article is interesting. :) --Thesoxlost (talk) 16:57, 20 November 2008 (UTC)
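A rough sketch of the kind of simulation mentioned above, for Question 2 only. It assumes each child is independently a boy or a girl with probability 1/2; the function name `simulate_q2`, the trial count, and the seed are arbitrary choices made for illustration, not anything taken from the article.

```python
import random

def simulate_q2(trials=100_000, seed=0):
    """Estimate P(family has a girl | at least one boy) by random generation."""
    rng = random.Random(seed)  # fixed seed only so that runs are repeatable
    kept = with_girl = 0
    for _ in range(trials):
        family = (rng.choice("BG"), rng.choice("BG"))
        if "B" not in family:
            continue  # Question 2 only selects families with at least one boy
        kept += 1
        with_girl += "G" in family
    return with_girl / kept

estimate = simulate_q2()
print(round(estimate, 3))
```

With this many trials the estimate should land close to 2/3, not 1/2.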

Hey, Rube. You convinced me simply by using bold print on the word clearly! WELL DONE! But seriously, you are correct to state it is not your job to teach me basic probability theorem. Nor is it your job to insinuate that I possess insufficient understanding of same. I am aware of the principles governing this type of paradox problem, but as per my previous post, the addition of an UNNECESSARY variable seems IMO to be the cause of all this wasted time. If age were a pertinent variable in the question AS ASKED, it would indeed result in an answer of 2/3. Please assess your arrogance levels, and if necessary, adjust to an inoffensive value.

what are we actually looking for?

If the question is about the probability of the sex of one child out of two being female, in what way does any other factor related to the phrasing influence this? No information about me would change the probability of my sibling being one sex or the other.

Can anyone explain how knowing that one child is older is in fact relevant to sex determination of a younger sibling? And if not, why not?

It looks a bit like people are trying to answer the question:

          "what is the probability of the other sibling being both younger and female"

If the question is about the probability of the sex of one child out of two being female, and if you assume they are independent, knowledge about the probability of one child being female does not tell us anything about the other. You are right. This is why if you said "the younger child is a girl" or "the older child is a girl", the chance is one half.

The reason the information changes the odds is because the information tells us something about both children: "If the other is not a girl, then this child has to be." Our question can be rephrased: "A couple has two children. If the older is not a girl, then the younger is. If the younger is not a girl, then the older is. What is the probability of two girls?" In this way, we've made clear that information is learned about both children. This is what changes the odds.

--Thesoxlost (talk) 22:04, 2 December 2008 (UTC)

Thanks. I appreciate the clarification, and the speedy response. 8)

An explanation of why it's 50%, not 2/3

For the 2nd question, all of the people arguing that the probability of the other child being a girl is 2/3 are incorrectly assuming that the following 3 situations are equally as likely to be the actual current situation:

Situation 1: BB
Situation 2: BG
Situation 3: GB

In fact, these 3 situations are NOT equally as likely to be the correct current situation. BB (situation 1) is twice as likely to be the current situation because the boy we already know about could be either of 2 boys in the pair. In other words, the likelihood of currently finding ourselves in each of the following situations is:

Situation 1: BB (50%)
Situation 2: BG (25%)
Situation 3: GB (25%)

Therefore, the probability that the other child is a girl is 50% (25% + 25%).

For a more detailed explanation (which explains it using the tickets-in-a-box approach), check out this page written by a knowledgeable statistics professor: http://www.quantdec.com/envstats/notes/class_04/prob_sim.htm

Eink21 (talk) 11:24, 15 December 2008 (UTC)eink21

Eink21,

You're incorrect. The site you linked is (I believe) correct. But that analysis is not applicable in this case. The link you provided argues that the answer to the question (which I'll call Question 4), "You meet a boy on the street. What is the probability that he has a sister?" is 50%. Here, we don't "meet a boy," we are presented with a family. The frequency of BB, GG, GB and BG families in the world is equal. In your Question 4, you're not asking about the frequency of existing families. You're asking about the frequency of boys met on the street. And yes, you're twice as likely to meet a boy on the street from a BB family as from a BG family. That does not mean that there are more BB families in our population. The probability is 2/3 for Question 2.

--Thesoxlost (talk) 16:34, 15 December 2008 (UTC)

Thesoxlost - I think I see what you're saying and I think I'm starting to agree with you. Basically, my analysis above was incorrect because I was taking a random boy and finding the probability that he was FROM a BB, BG, or GB family. (In this case, it is 50% likely that he is from a BB family). This is not the same, however, as taking a random family and finding the probability that it CONTAINS a BB, BG, or GB. (In this case, all 3 are equally likely).

Eink21 (talk) 00:10, 16 December 2008 (UTC)Eink21

Yup! Sorry for being less than clear. That's exactly right. --Thesoxlost (talk) 02:50, 16 December 2008 (UTC)
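The distinction agreed on above (picking a random boy versus picking a random family that contains a boy) can be checked by direct counting. This is only a sketch, assuming the four family types are equally common; the variable names are made up for the example.

```python
from fractions import Fraction

families = ["BB", "BG", "GB", "GG"]  # assumed equally common

# Sampling a family: keep those containing a boy, each weighted equally.
with_boy = [f for f in families if "B" in f]
p_bb_family = Fraction(sum(f == "BB" for f in with_boy), len(with_boy))

# Sampling a boy: list every boy together with the family he comes from.
boys = [f for f in families for child in f if child == "B"]
p_bb_boy = Fraction(sum(f == "BB" for f in boys), len(boys))

print(p_bb_family, p_bb_boy)  # 1/3 1/2
```

A random family with a boy is BB one time in three, so the other child is a girl with probability 2/3; a random boy comes from a BB family half the time, so his sibling is a girl with probability 1/2.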

Responses to rebuttals for 1/2 argument

OK, I am new to this paradox and just read the article and most of this discussion on it. After reading the article, I was convinced the answer to Q2 was 1/2. After reading the discussion, I remain convinced. I'll briefly review the 1/2 argument in my own words, then address the rebuttals to this logic.

The CORRECT possibilities of the second question using the age format from the original post are as follows:

1) older boy (known), younger boy (unknown)
2) older boy (unknown), younger boy (known)
3) older boy (known), younger girl (unknown)
4) older girl (unknown), younger boy (known)

(NOTE: There are 8 total possibilities if we saw one child, without specifying whether it is a boy or a girl. 4 of them are eliminated by specifying that the known child is a boy.)

The boy we KNOW about (let's say you see the family in a restaurant with only a boy, because the other kid is sick) is marked known. This statement is key: SINCE YOU DO NOT KNOW WHETHER THE BOY IS THE ELDER, HE HAS JUST AS MUCH CHANCE OF HAVING AN OLDER BROTHER OR YOUNGER BROTHER AS HE DOES OF HAVING AN OLDER SISTER OR YOUNGER SISTER. (not trying to shout, just emphasize) There cannot possibly be any greater or lesser chance for any of these situations listed. There are four possibilities, two of which involve a girl. Therefore the correct answer is 50%.

REBUTTALS

The Coin Example (this also goes for the random generation of pairs sampling guy Arthur Rubin): This does not invalidate the 50% answer to the question at hand. In the coin example, you don't have any knowns. So yes, absolutely, if you flip two coins, it will come up split twice as often as it comes up both heads (or both tails). The problem with this is that we're not flipping two coins, we're flipping one. I'm glad you bring this up, because it makes the point even clearer now that I think about it. Using your example, if I put one coin on the table as heads, flip the other coin and ask you to give me the probability that it will come up the same as the one I placed on the table, what would your answer be? Invariably 50%. That is what we have here: one coin (child) is on the table (whose gender we know). We know nothing about the other child, thus it is a 50/50 chance the other child will be the same as the one we know of.

Eink21 vs Thesoxlost: Eink21's example was correct. There is no difference between meeting a "family with 2 children, one of which is a boy" and meeting a "boy who has one sibling". Thesoxlost is speaking of the general population in his rebuttal, but that has zero bearing on the question at hand. We're not talking about all 2-child families. We're talking about 2-child families where at least one of the children is a boy.

Paradoxes do exist where intuition isn't correct, but this is not one of them. The obvious answer, that any given kid has a 50/50 chance of being a boy or a girl, is the correct one.

Omegathejack (talk) 21:07, 8 January 2009 (UTC) 1/8/2009

The problem with your logic is that instead of considering Q2, you are restating Q1. For example, "seeing a boy in a restaurant" is a restatement of Q1, where instead of choosing a family where the older child is a boy, you are choosing a family where the child you see in the restaurant is a boy. This completely specifies the gender of one child and completely unspecifies the gender of the other. Hence, the probability that the other child is a boy is 50%. Indeed, it is hard to come up with a real-world situation where Q2 applies, so it's understandable that many people accidentally apply the Q1 logic to Q2, hence the paradox.

I assume you are coming from Jeff Atwood's blog, where the problem was poorly presented. There, Jeff merely tells you that someone "told you they had two children, and one of them is a girl." What he neglects to consider is that it matters immensely how this information is revealed.

If it was revealed in response to the question "Do you have kids?", then the probability of them having a boy and a girl is 50%, since they might just as easily have commented about the number of boys they have. To see how this is equivalent to Q1, notice that the answer completely specifies the gender of one child, and this child is chosen by the parent. Instead of choosing a family where the older child is a girl, we're choosing a family where the child the parent chose to comment on is a girl.

But if it was revealed in response to the question "Do you have at least one girl?" then the probability of them having a boy and a girl is 66% -- because the situation is equivalent to Q2. To see this, consider that only families with two boys -- 25% of the population -- would answer "no". Of the remaining families, only a third have two girls. This is distinct from the previous case because the questioner gets to choose the gender, so the parent is effectively forced to comment on the combined genders of both children.

So, in the second case, the questioner learns more information even though the response was the same in both cases. If you still have any doubt that these cases are different, note that a parent of a boy and girl could give any response they want in the first case, but they are forced to say "I have at least one girl" in the second case.

As a numeric example, suppose we have a gathering of 400 two-child families, 100 each of BB, BG, GB, and GG. You devise two tests for them, where each family is instructed to circle exactly one true statement on each test (circling one statement chosen at random if both statements happen to be true).

Test 1:

(A) I have one or more girls.

(B) I do not have one or more girls.

Test 2:

(A) I have one or more girls.

(B) I have one or more boys.

For test 1, you will get 300 "A" responses and 100 "B" responses. For test 2, you will get approximately 200 "A" responses and 200 "B" responses.

Now, suppose you pick a test 1 answer sheet at random, and it happens to have an "A" response. Likewise, you pick a test 2 answer sheet, and it happens to have an "A" response. Now, we know that both of the corresponding families have "told us that they have one or more girls". But the probability of the test 1 family having two girls is 33%, while the probability of the test 2 family having two girls is 50%. To see this, note that the population of families answering "A" on test 1 includes all of GG, BG, and GB -- whereas the population of families answering "A" on test 2 is all of GG plus half of the combined BG/GB population.
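The counts for the two tests can be reproduced with exact fractions. The sketch below assumes exactly the setup described (100 families of each type, and a mixed family picking between two true statements with probability 1/2); the helper names `p_circle_a` and `p_two_girls_given_a` are invented for the example.

```python
from fractions import Fraction

families = {"BB": 100, "BG": 100, "GB": 100, "GG": 100}

def p_circle_a(family, test):
    """Probability that a family circles (A) 'I have one or more girls'."""
    has_girl, has_boy = "G" in family, "B" in family
    if test == 1:             # on test 1, (B) is simply the negation of (A)
        return Fraction(int(has_girl))
    if has_girl and has_boy:  # on test 2 both statements are true: pick at random
        return Fraction(1, 2)
    return Fraction(int(has_girl))

def p_two_girls_given_a(test):
    """Return (expected number of 'A' sheets, P(two girls | sheet says 'A'))."""
    a_total = sum(n * p_circle_a(f, test) for f, n in families.items())
    a_gg = families["GG"] * p_circle_a("GG", test)
    return a_total, a_gg / a_total

for test in (1, 2):
    print(test, *p_two_girls_given_a(test))
```

Test 1 gives 300 "A" sheets with a 1/3 chance of two girls; test 2 gives an expected 200 "A" sheets with a 1/2 chance, even though the circled statement is worded identically.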

Thus we can see that it is important to set up the paradox correctly -- as the Wikipedia article does. The distinction is subtle, which is why the paradox is so effective.

So what is the answer in Jeff's case? I believe the probability of the couple having a boy and a girl is 50% -- not because the problem is equivalent to Q1 but because we simply don't know enough to say whether it is equivalent to Q1 or Q2; we know nothing of the interview process. In the absence of information, we must assume the worst, which is 50%.

—Leejc (talk) 00:34, 9 January 2009 (UTC)

The exact formulation of the problem is crucial, and this article in its present state gets it right. Most presentations don't. I have added Jeff Atwood's statement of the problem, with a reference, as an example of an ambiguous presentation; hope this helps. Rp (talk) 11:30, 9 January 2009 (UTC)

I've read the article and this discussion about fifteen times now and confess to being very impressed with the contributors and very confused on my own part with respect to Question 2. If, as we are told, age (birth order) is not a factor, then why is the chart with "Older/Younger" even being used? Why don't we just have a BB choice plus a BG (or GB) (but not both) choice = 1/2? This means I have some questions about the Conclusion section, as well. Specifically, I see GB + BG as doubling up on, and therefore giving undue weight to, what should be only one choice. If birth order *does* matter, then GB and BG are OK, but we'd need a second BB to allow for the child we've met to switch places = still 1/2. Right?

Agnutmajig (talk) 04:45, 25 January 2009 (UTC)

Leejc is absolutely right. He did a better job expressing what I tried to say back in June, 2007. But while I agree with his final conclusion, it isn't quite the right argument to use, that "we must assume the worst," to get it. So I thought I would try once again to explain how the different wordings of this problem, that seem to say the same thing, can lead to different solutions.

The ultimate cause of confusion is the math-specific definition of the term "event." When a random experiment produces a result, that result is not called an "event," it is an "occurrence." "Event" is defined to be the subset of all possible occurrences that have some defining common property. So if I flip a coin a hundred times, there are only two possible events: "heads" and "tails." There will most likely be about fifty occurrences of each event; and each occurrence will fit into only one event.

If I draw a card from a standard deck of playing cards, there are lots of possible events and/or occurrences. Some of them overlap. They include "black card," "spade," "honor card (TJQKA)," "Ace," and "Ace of Spades." The occurrence "black card" can be in the event "spade" or "club" as well. Note that an occurrence can fit into other events, but not vice versa. An occurrence of "spade" is also an occurrence of "black card," but the event "spade" is most definitely not the event "black card." Since determining probabilities involves counting the number of elements in events, the difference can be crucial when less specific observations are applied.

There is a famous problem that isolates this difference between an occurrence and an event. It is called Bertrand's Box Paradox. Suppose you have three indistinguishable boxes, each with two indistinguishable drawers. Inside each drawer is a coin. Box #1 has two gold coins, Box #2 has two bronze coins (I can't resist the temptation to use the initials B and G), and Box #3 has one of each. You pick a box at random, and open a drawer at random. If the coin is gold, what are the odds that the coin in the box's other drawer is also gold (i.e., that it is Box #1)?

The first issue here, common to story problems, is that this problem describes an occurrence, not an event. We must extrapolate what the possible outcomes of the actions are, to define the event (the set of possible occurrences) that defines the conditional probability.

The incorrect approach to Bertrand's Box Paradox is to say that you narrowed the identity of the box to two possibilities, Box #1 and Box #3. That the "event" matches the observation "the box has at least one gold coin." This is wrong because there are other ways to describe occurrences in that event. If you pick box #3, then no matter which drawer you open, the occurrence is in the event "the box has at least one gold coin."

The correct approach is to define the event as the set of all possible occurrences where you open a drawer with a gold coin. Every possible occurrence that is consistent with the problem description is in this event (remember, it's a set), and every possible element of this event is consistent with the problem description. Both of those conditions have to be true for the event set we extrapolate as the condition. This set has three elements, not two. In two of those three possible occurrences, the box's other coin is gold. The correct answer is 2/3.
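As a check on the count of three, here is a small sketch that lists every (box, drawer) occurrence explicitly. It assumes the box and the drawer are each chosen uniformly at random, as the problem states; the labels "G" and "B" for gold and bronze and the variable names are just for illustration.

```python
from fractions import Fraction

boxes = {1: ("G", "G"), 2: ("B", "B"), 3: ("G", "B")}

# One occurrence per (box, drawer) pair; all six are equally likely.
occurrences = [(box, drawer) for box in boxes for drawer in (0, 1)]

# The conditioning event: the opened drawer shows a gold coin.
gold_seen = [(b, d) for b, d in occurrences if boxes[b][d] == "G"]

# Among those, count occurrences whose other drawer also holds gold.
other_gold = [(b, d) for b, d in gold_seen if boxes[b][1 - d] == "G"]

print(Fraction(len(other_gold), len(gold_seen)))  # 2/3
```

Box #1 contributes two occurrences to the event and Box #3 only one, which is why counting boxes instead of drawers gives the wrong answer.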

I have seen a variation of Bertrand's Box Paradox used to represent the Boy or Girl Paradox. You are the organizer of a neighborhood swim meet. During the meet, you get a phone call from Mr. Smith, who is delayed and can't pick up his children after the meet. He says "Tell my two kids to go home with a neighbor. Their names are Sarah and..." and the phone connection goes dead. There are four families of two children in the meet, and they conveniently include the four possibilities BB, BG, GB, and GG. What are the odds that Mr. Smith has two daughters?

This occurrence can be described as "a two-child family includes at least one girl." But that is not the event you need to use as the condition. Mr. Smith could have mentioned a boy's name first; and if that boy had a sister, that occurrence would have been an element of the set "a two-child family includes at least one girl." So that event fails one of the two criteria. You get the answer 1/3 by erroneously using this event.

The correct event is just that Mr. Smith mentioned that he had a daughter, and compares to "you opened a drawer with a gold coin" in the Box Paradox. You can count four possible occurrences of that event. In two of those four, representing a single family, Sarah has a sister. The correct answer is 2/4, or 1/2.
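The two candidate events for the swim-meet story can be counted side by side. This sketch rests on an assumption the story leaves implicit: Mr. Smith is equally likely to be any of the four fathers and equally likely to name either child first. The variable names are hypothetical.

```python
from fractions import Fraction

families = ["BB", "BG", "GB", "GG"]

# Assumption: each father is equally likely to be the caller, and he is
# equally likely to name either of his two children first.
occurrences = [(f, i) for f in families for i in (0, 1)]

# The event actually observed: the first child named is a girl.
girl_named = [(f, i) for f, i in occurrences if f[i] == "G"]
p_two_girls = Fraction(sum(f == "GG" for f, _ in girl_named), len(girl_named))

# The tempting but different event: any family with at least one girl.
has_girl = [f for f in families if "G" in f]
p_naive = Fraction(has_girl.count("GG"), len(has_girl))

print(p_two_girls, p_naive)  # 1/2 1/3
```

If Mr. Smith were known always to name a daughter first whenever he has one, the second count would be the right one instead, so the answer depends on that assumption.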

If a conditional probability question describes an occurrence, and not an event, care needs to be taken when extrapolating the "event" that fits the math definition. You need to avoid making the assumption that because a fact is "given" for the occurrence, that it defines the "given" event set. The former is only a statement of the form "If FACT is true, then OCCURRENCE is in EVENT," while the math-definition you need to use is "OCCURRENCE is in EVENT if and only if FACT is true."

The reason Leejc's conclusion is right is that Jeff Atwood's problem statement is an observation about the occurrence, not a definition of the event. While isomorphic in this case, the correct solution is not to remove half of the BG cases, as Leejc did, but to count the GG cases twice - once for each girl that could be the one mentioned. And BTW, I modified that Sarah Smith example from someone defending Jeff Atwood's answer. They answered their own example incorrectly.

For Agnutmajig: In a correct solution, the fact that one child is older is not being used except to account for the fact that there are twice as many boy&girl families as two-girl families. In my "Sarah Smith" variation, it doesn't matter what the ages are as long as there are two BG families to every GG family, mimicking the real life distribution.

IMO, the current article is weak because it doesn't explain how the definition of "event" is important, and instead concentrates on incidental points. But changing it probably is not encyclopedic (not that the article currently is). But I would like to see it say that so-called experts (like Marilyn vos Savant) and actual experts who should know better (like Leonard Mlodinow) have given the 1/3 answer to cases that are at best ambiguous, and quite possibly should be 1/2. JeffJor (talk) 22:13, 27 January 2009 (UTC)

Thank you, JeffJor, for such a detailed contribution. I can follow the examples and understand the reasoning (so I tell myself), with the following exceptions or additional questions:

1) In the interest of improving the article, I do not support the use of a chart with "Older" and "Younger" column headings for Question 2, where we're told age (birth order) is not important. I feel this would be confusing to a reader and may also be contributing to the "2/3 vs. 1/2" disagreement.

2) In a related way, I'm still having a severe attack of confusion as to why we have four possible outcomes (events?) for our one family: BB, BG, GB, GG. I think they should be BB, BG (alpha order) or GG, when age is not an issue. Can you re-explain why we double up on the boy-girl combination? You say such doubling mimics "the real life distribution." Do you mean biologically? Agnutmajig (talk) 18:23, 28 January 2009 (UTC)

Agnutmajig, your questions show that you might not understand the difference between an occurrence and an event. It isn't really your fault, because sometimes in very controlled circumstances, we can treat sets of occurrences as single occurrences when we count them, to simplify the counting. And that is what is confusing you.

Suppose you have a bag with 20 red jelly beans, and 10 black jelly beans. The experiment is that you pick one jelly bean from the bag. There are thirty different occurrences that can happen, one for each jelly bean. But there are only two different events, getting a red jelly bean and getting a black jelly bean. The odds of getting a black jelly bean are found by dividing the number of occurrences in the event "black jelly bean," which is 10, by the number of occurrences of any kind of jelly bean, which is 30. Thus, (10 black jelly beans)/(10 black jelly beans + 20 red jelly beans) = 1/3.

Now suppose the red jelly beans come in two flavors, cherry and raspberry; but the black ones are all licorice. There are ten of each flavor in the bag. You don’t like licorice, but either red jelly bean is fine. You could calculate the odds of getting one you don’t like the same way I did above, or you could just say that there are the same number of each flavor. The odds are (1 licorice jelly bean)/(1 licorice jelly bean + 1 cherry jelly bean + 1 raspberry jelly bean) = 1/3. I counted events instead of actual occurrences because I knew there were the same number of occurrences each. And in fact, I don’t even need to know that there were 10 of each, only that it is the same number. I can treat equal-sized sets of occurrences as though they are individual occurrences.

This is the same thing that is done with the ages of the children in this problem. Technically, to get the probabilities we want, we would need to take a census of all two-child families in the world. But we know (well, we assume) that there are equal numbers of families in the four groups BB, BG, GB, and GG. We treat those groups as though they are the occurrences. And we don’t have to use age - we could use any ordering that can be applied to two children in one family that is independent of gender. Age is just the easiest.

If we were to ignore the difference between BG and GB, as you suggest, it would be the same as ignoring the difference between cherry and raspberry jelly beans, and saying the chance of getting one you don’t like is 1/2. JeffJor (talk) 15:08, 29 January 2009 (UTC)

I'm not saying we should ignore the difference, provided we're told that an "order" or other comparison is called for--and in Question 2, we were told that "neither order nor age is important." I am saying we should treat all combinations equally. When we're considering gender alone, the choices are BB, BG and GG, which is three choices, not four. When we're considering something else (comparative age, the order in which we saw the children, who's taller), then there are six choices, not four. By using four choices, in both cases we're unfairly creating an option for when the siblings' genders differ that we're not allowing for when they don't.

In other words, including both BG and GB should certainly be done when there is another difference to consider along with gender--age, for example. However, if we allow for such "age" place-swapping only for the boy-girl combo, then our ordering is not "independent of gender," as stipulated in your fourth paragraph. Instead, the BB and GG choices depend totally upon gender and are ambiguous or presumptive as to age; while for the boy-girl combo, we've explicitly and properly allowed each child to be either older or younger.

In summary, I agree with the answer to Question 1 (½) but not with the combinations shown on its chart. I disagree with the answer to Question 2, as well as the combinations and column headings on its chart. I further confess to knowing nothing about "probability." Perhaps it's the way these particular questions are worded, but I'm not seeing a paradox at all. Thanks for all your patience with this. 24.31.129.66 (talk) 20:03, 3 February 2009 (UTC)Agnutmajig

Agnutmajig, your concern has some validity, but it doesn't affect anything. The article could list only three possibilities, (Two boys, One child of each gender, Two Girls); but then we'd have to devise a way to count how many families are in each. Because we don't care about the number of possibilities, we care about the number of families. Just like in my first jelly bean example, where we don't care about the number of colors, we care about the number of jelly beans of each color.

To solve it without ordering the families somehow, we'd have to reword it to make it more complicated. Say that in a particular town there are 100 two-child families: 25 with two boys, 50 with one child of each gender, and 25 with two girls. Question 2 would be phrased so that you picked a family at random from the first two groups. There are 75 families you picked from, and 50 of them include a girl. The odds are 50/75 = 2/3 that the family includes a girl. That is the answer given in the article, and it is correct. It was just arrived at from a different direction, one that is made simpler by considering order.

You aren't seeing the paradox, because you are falling into its trap. The paradox is that you think the answer to Q2 is 1/2, when it really is 2/3. There is also a paradox if the second question is worded differently - call it Q3, where you know through some unspecified means that one child is a boy - then the answer changes to 1/2. Contrary to what Skeetin says below. You might know this by having seen just one child (see Bar-Hillel & Falk, Section 4.3 in Grinstead and Snell's probability course, or a similar problem published in The American Statistician), by Leejc's second "test" above, or by having been told by someone else that there was (at least) one boy. The paradox is that some people insist that the answer to Q3 is the same as Q2, just like you are insisting the answer to Q2 is the same as Q1. They feel that the family was chosen from all families with at least one boy, when really it was chosen from all families where the observation 'at least one boy' would have been made the same way. Since it is possible you would have met a girl from a family that had a girl, Q3 is more like Bertrand's Box Paradox than it is like Q2. It is Bertrand's Box Paradox with four boxes, and the correct answer is 1/2.

Anyway, this page is not supposed to be for explaining the correct answer, it is for discussion that improves the article. I am working on a pretty big re-write, where I hope to present (as impartially as I can) the differences between Q1, Q2, and my Q3. But the fact that there are always two camps for both Q2 (you say 1/2; while the article, Skeetin, and I all say it is 2/3) as well as my Q3 (Skeetin says 2/3; I say 1/2) means that it likely will get put back, despite my references that agree with me. The problem here is that nobody really examines why somebody else's answer might be right, they only reiterate why they think their own is, ignoring the points others make. Often, they grasp at some totally unrelated and unimportant issue, like "the probability of a boy isn't 50%" or "the father is a boy, too." I'll try to remove those ambiguities, but I doubt everybody will believe me. Arthur Rubin, AndyBloch, and anybody else who changed this article recently, I'd be interested in hearing your thoughts before I go ahead. JeffJor (talk) 17:49, 4 February 2009 (UTC)

Sample space of second question cannot be limited to children

The second question as currently phrased "A random two-child family with at least one boy is chosen. What is the probability that it has a girl?" does not explicitly state that at least one child is a boy, but only that at least one family member is a boy. We are not asked to determine the probability that one of the possible sets of children in the family has a girl in it, but instead to determine the probability that there is a girl in the family. There is no constraint in the formulation of the question that requires that the terms boy and girl refer exclusively to children. Further, the number of family members has not been limited by a specific amount (even if we were to assume as fact that the phrase "two-child family" limits the number of children in the family to two and only two children, the number of family members who are not children has not been limited in any way). As a result, the sample space should include combinations of two children (or more, if a random two-child family is allowed to have more than two children) and combinations of an arbitrary number of family members who are not children. Skeetin (talk) 04:12, 31 January 2009 (UTC)

While there are many ways that subtle points can be overlooked in this problem, that is not one of them. And arguing with this pedantic point does not help to further the understanding of the issue. The primary definition of "boy" is "a male child," and it is obvious that the intent of the question is that one of the two children is a boy. JeffJor (talk) 03:34, 2 February 2009 (UTC)

JeffJor, if it's incorrect to interpret a stated question literally, then the question is ambiguous. I don't think there's any question that an ambiguous phrasing has been used, one which can have multiple correct interpretations, and thus multiple correct answers which depend on the sample space. Having said that, though, I would never personally attempt to enumerate multiple sample spaces including all possible combinations of family members, as I'm fairly certain that the universe will end before I'm done. But one has to argue pedantic points in questions of probability, since even the most minute detail can yield useful information that might alter the size of the sample space. If a particular detail reduces the size of the sample space, then all the better. I agree that the question is intended to be interpreted as you have described, but that interpretation is not required (impossibly difficult to answer, but not required).

In the spirit of furthering the understanding of the issue, it seems clear that those who think the answer is 1/2 are either misinterpreting the question as it's commonly accepted, or not properly enumerating the sample space(s). That's not necessarily because of the ambiguity, but that's a distinct possibility. Regardless, only once it is realized that we are simply determining the probability that one of the possible permutations {BG, GB, BB} of two children is the correct one can we reveal that there is no paradox. We are not calculating the probability that a particular pregnancy resulted in a girl, but the probability that there is a girl in (the correct) one of the possible combinations of two children which contains a boy.

Those who still disagree with 2/3 based on the belief that "birth order" shouldn't matter should realize that "oldest/youngest" just seems to be the most natural way to illustrate that the permutations BG and GB are not the same, but "oldest/youngest" labels are not explicitly required to properly enumerate the sample space. Also regarding labeling, some of the confusion may stem from the belief that somehow having "seen" a boy reduces the sample space to two events. It doesn't. If you label the columns "seen/not seen" and then reverse them, you'll see that "not seen/seen" is a different but equally valid sample space, and cannot be eliminated. You are then required to calculate the probability that one of those two valid sample spaces is the correct one, and the answer to that question is 2/3. Otherwise, one could create the labels "child imagined/child not imagined" and magically reduce the sample space without having seen anything. Clearly, having seen a boy does not change the probability from 2/3 to 1/2.

"What is the probability that it has a girl?" and "What is the probability that the correct permutation has a girl?" are equivalent questions. While the latter version of the question trivializes the issue and would be no fun to ask in practice, if it helps someone to realize what's being calculated then it's worth noting. Skeetin (talk) 10:31, 2 February 2009 (UTC)

Discussion moved to Skeetin's talk page JeffJor (talk) 14:04, 3 February 2009 (UTC)

important proposal

Why is this problem so interesting? Because slight changes in wording completely change the answer. The current version, "A random two-child family whose older child is a boy is chosen. What is the probability that the younger child is a girl? (Or: choose a random two-child family assuring that the older one is a boy. What is the probability that the other one is a girl?)", is not interesting enough. I propose "A random two-child family is chosen. One of the children is randomly chosen and is found to be a boy. What is the probability that the other child is a girl?", which reduces the difference between the 2 questions. Stating "older" and "younger" distracts novices.--133.9.4.11 (talk) 07:35, 6 February 2009 (UTC)

The current version is not a "paradox", simply a problem. The reason it is a paradox is that almost the same question leads to a different answer. Also see the Mathforum page linked from the article. For this reason, the 2 problems should be:

Suppose a 2-child family. You randomly choose one of the children, who is found to be a boy. What is the probability the other is a girl?
Suppose a 2-child family. You learn that at least one of them is a boy. What is the probability the other is a girl?

In the current version the sentences differ widely and one states "older" and "younger". This is trivial, and does not constitute a paradox.--211.5.13.42 (talk) 12:27, 10 February 2009 (UTC)

Major Rewrite

The Boy or Girl Paradox is a set of related problems in probability theory, usually seen in recreational mathematics, that are all built around the basic question "If one of the children in a two-child family is a boy, what is the probability the other child is a girl?"

But the basic question does not establish enough information to calculate the probability. The method by which some observer determined "one is a boy" must be included in the problem statement, or assumed by the solver. Depending on what that method is, there are at least three variations of the problem that have different solution methods, leading to two distinct answers. If different readers make different assumptions to fill in the missing information, they can get different answers to the same problem. The paradox is not only that different answers may be given for the same problem, but also how the specific information given (or assumed) can change the answer.

Common assumptions for all questions

There are four possible ordered combinations of children in a two-child family. Labeling boys B and girls G, and using the first letter to represent the older child, the possible combinations are:

{BB, BG, GB, GG}.

These four possibilities are taken to be equally likely a priori. This follows from three assumptions:

That the determination of the sex of each child is an independent event.
That each child is either male or female.
That each child has the same chance of being male as of being female.

An ordering is necessary for the solution to any version of this problem. It is used to represent the proper proportions of two-child families in the general population. Twice as many families have a boy and a girl as have two boys, or as have two girls. This can be accounted for by putting the children into some order, since there are two possible orderings of the boy and the girl.

Any factor that is independent of gender could have been used to order the children. Age is used because it is the most intuitive ordering, not because of how it might affect any of the solution methods. It does not affect any of the answers.

Additional information required for a solution

A probability is essentially the frequency that one particular kind of outcome will happen, relative to all possible outcomes, under a given set of circumstances. Those circumstances define the sample space for the probability. But the sample space needs to be defined not only by what condition(s) are required to include an occurrence in it, but also by what condition(s) exclude one.

The basic question for the Boy or Girl Paradox is ambiguous because it does not describe which two-child families should be excluded this way. That is, what the observer might have told you about the family under other circumstances. In particular, the statement in the basic question, "one is a boy," does not mean that the observer has to say that in all cases where one of the children is a boy. That condition can be a statement about one child only, that ignores his sibling; or it can be a statement about the family, that considers both children.

In the first case, where the observer sees only one child, the only possible alternate observation is "one of the children is a girl" (or something equivalent, like "one of the children is not a boy"). Since only one child is considered, for any given child seen this way, only one observation is possible.

In the second case, there are several possible alternate observations, including (but not limited to) "one of the children is a girl," "neither of the children is a boy," and "both of the children are boys." More than one of these observations may apply to any given family seen. The method used by the observer needs to be stated or assumed in order to get a solution. Further, some of these methods could be biased. A method that allows the observations "one is a boy" and "one is a girl" for the same family, but not with equal likelihood, is biased. Biased methods cannot be assumed when solving a probability problem.

The following questions represent three possible cases where an unbiased sample set is created. The first is the only case that includes additional information about the child.

Question 1

A family is chosen at random from all two-child families whose older child is a boy. What is the probability that the younger child is a girl?

When the older child is a boy, then the elements GG and GB of the original sample space cannot be true, and must be deleted. The reduced sample space is the set {BG, BB}. All of the possibilities in this sample space are equally likely; but only one of the two, {BG}, includes a girl. So, the probability that the younger child is a girl is 1/2.
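
The reduction described above can be checked by brute-force enumeration. The following is only a minimal sketch under the common assumptions listed earlier (four equally likely ordered families); the variable names are illustrative and not part of any source.

```python
from fractions import Fraction
from itertools import product

# Ordered sample space: first letter is the older child, second the younger.
families = ["".join(p) for p in product("BG", repeat=2)]  # BB, BG, GB, GG

# Condition for Question 1: the older child is a boy.
older_is_boy = [f for f in families if f[0] == "B"]

# Event of interest: the younger child is a girl.
younger_is_girl = [f for f in older_is_boy if f[1] == "G"]

# All remaining families are equally likely, so the probability is a ratio of counts.
p_q1 = Fraction(len(younger_is_girl), len(older_is_boy))
print(older_is_boy, p_q1)
```

Counting works here only because every element of the reduced sample space is assumed to be equally likely.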

Question 2

A family is chosen at random from all two-child families where at least one child is a boy. What is the probability one child is a girl?

An equivalent and perhaps more enlightening way of stating this problem is "Excluding from the sample space any case that doesn't include a boy, what is the probability that two random children have different genders?" This form requires two conditions: the observer must look at both children to exclude a family, and the observer can only exclude families without a boy.

Only one of the possible sets of families, GG, fails to meet these criteria. The reduced sample space is the set {GB, BG, BB}. All of the possibilities are equally likely; but now two of the three, {BG, GB}, include a girl. The probability that a random two-child family with a boy also includes a girl is 2/3.
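
The same kind of enumeration can illustrate Question 2. This sketch assumes, as the question states, that a family is excluded if and only if it has no boy; the names used are illustrative.

```python
from fractions import Fraction
from itertools import product

# Ordered sample space: first letter is the older child, second the younger.
families = ["".join(p) for p in product("BG", repeat=2)]  # BB, BG, GB, GG

# Condition for Question 2: keep every family with at least one boy,
# and exclude only the family with no boy (GG).
at_least_one_boy = [f for f in families if "B" in f]

# Event of interest: the family also includes a girl.
with_girl = [f for f in at_least_one_boy if "G" in f]

p_q2 = Fraction(len(with_girl), len(at_least_one_boy))
print(at_least_one_boy, p_q2)
```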

Question 3

A random child is chosen from a random two-child family. That child is a boy. What is the probability that the other child in the family is a girl?

An equivalent and perhaps more enlightening way of stating this problem is "Excluding the cases where you don't select a boy, what is the probability that a random child from a random two-child family has a sister?" Note that this is subtly different than the alternate expression for Question 2. The observer only looks at the gender of one child, not two. Some of the families that were included in Question 2 are excluded in Question 3.

This variation is more similar to Bertrand's Box Paradox, with four boxes instead of three, than it is to Question 2. The random event is that the observer found out that a boy came from a two-child family, not that a two-child family was selected from all families with a boy. Just like for Bertrand's Box Paradox, where the random event is that you opened a drawer with a gold coin, not that you picked a box that had a gold coin.

The original sample space must be expanded to account for which child was selected. That will be indicated by adding an "O" or "Y" to the family's designation, indicating whether the older child or the younger child is selected. There are eight possible children you could have selected this way:

{GGO, GGY, BGO, BGY, GBO, GBY, BBO, BBY}.

These elements properly represent children, not families as the sample spaces for Questions 1 and 2 did. And in fact, each possible family is counted exactly twice, once for each child. Four elements can be eliminated because the selected child is a girl. The reduced sample space is the set {GBY, BGO, BBO, BBY}. All four of these possibilities are equally likely. Two of them, {GBY, BGO}, include a girl. The probability that a random boy (child) from a two-child family has a sister is thus 2/4, or 1/2.
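
The expanded, eight-element sample space can be enumerated in the same way. This is a sketch only; it assumes the older and younger child are equally likely to be the one selected, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Each outcome is (family, pick): family is older+younger, pick is "O" or "Y"
# depending on whether the older or the younger child was selected.
outcomes = [(older + younger, pick)
            for older, younger, pick in product("BG", "BG", "OY")]

def selected_child(outcome):
    """Gender of the child that was selected."""
    family, pick = outcome
    return family[0] if pick == "O" else family[1]

def other_child(outcome):
    """Gender of the sibling that was not selected."""
    family, pick = outcome
    return family[1] if pick == "O" else family[0]

# Condition for Question 3: the selected child is a boy.
boy_selected = [o for o in outcomes if selected_child(o) == "B"]

# Event of interest: the unselected sibling is a girl.
has_sister = [o for o in boy_selected if other_child(o) == "G"]

p_q3 = Fraction(len(has_sister), len(boy_selected))
print(len(outcomes), boy_selected, p_q3)
```

Note that the elements counted are children, not families, which is why the family BB contributes two outcomes.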

Perception of Paradox

The Boy or Girl Paradox is often used in recreational mathematics publications to illustrate how sample spaces can affect probability. Examples can be found from Marilyn vos Savant, Martin Gardner, and more recently in the book "The Drunkard's Walk: How Randomness Rules Our Lives" by Leonard Mlodinow. But most expressions of it will fail to specify what to exclude from the sample space. They will describe one family with the observation "one of the children in this family is a boy" as a means to include families in the sample space, but will not mention how to exclude families. Often, the publication will contrast this ambiguous problem to Question 1.

Whether passively or intentionally, most readers will assume either that one child was used to make the observation, or that both children were used AND that the only possible alternate observation is "neither child is a boy." The first reader will apply the solution to Question 3, and get the answer 1/2. The second reader will apply the solution to Question 2, and get the answer 2/3. This is the first paradox associated with the problem: getting two answers to the same problem. The two readers may not even realize they are making assumptions, so each of the two answers will seem correct to the reader who reached it.

If the answer to the published problem follows the Question 2 solution, and especially if it is contrasted to Question 1, a second paradox can be perceived by the first reader above. The added information, that the one child observed is also the oldest child, cannot possibly affect the gender of the unobserved child. Which is correct, for that reader's assumptions. But the published answer assumed both children were observed.

Real Life Scenarios

Sometimes a puzzle will be described as a real-life scenario that attempts to describe how the information "one is a boy" was obtained. But it is unusual to encounter one that corresponds to Question 2.

One such example, that has been attributed to a reader of Marilyn vos Savant, is that you found out about some friends who are giving away two puppies. You only want to get a boy puppy. So you call them on the phone and ask if one is a boy, and they say one is. What are the chances they also have a girl puppy? This satisfies the two conditions for Question 2: you want to know only about boy puppies, and the person observing the litter sees the gender of both puppies.

In a more realistic scenario, one or both of those conditions will not be met. You might obtain some information about only one child in a family, like if you meet one of the children (see Bar-Hillel and Falk), or only recall one. Or, you might get information about the children in a family and happen to get the information "boy," when "girl" was equally possible. That is, that "one of the children is a girl" was a possible alternate answer. Whether or not it is justified, people who object to the 2/3 answer for the basic question feel this last example is the case described.

John E. Freund has explained another seeming paradox, with similar aspects, that is based on a card game. If you are told partial information about a player's hand, the probabilities for the whole hand's makeup depend on how that information was determined. Freund imagined a spy that looked at the hand, which corresponds to the observer in this article. The spy will either always give a specific clue if that clue applies (corresponding to Question 2), or choose randomly between two possible clues when both applied (corresponding to Question 3).
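
The two behaviours of the spy can be translated to the two-child setting with a single parameter. The sketch below is an illustration, not Freund's own formulation: q is a hypothetical parameter giving the probability that the observer reports "one is a boy" when the family has one boy and one girl, and the function name is made up for this example.

```python
from fractions import Fraction

def p_girl_given_boy_reported(q):
    """Probability the family includes a girl, given the report "one is a boy".

    q is the assumed chance that the observer reports "boy" (rather than
    "girl") for a mixed family. For BB the report is certain; for GG it is
    impossible.
    """
    prior = {f: Fraction(1, 4) for f in ("BB", "BG", "GB", "GG")}
    says_boy = {"BB": Fraction(1), "BG": q, "GB": q, "GG": Fraction(0)}

    # Joint probability of each family together with the report "one is a boy".
    joint = {f: prior[f] * says_boy[f] for f in prior}
    total = sum(joint.values())
    return (joint["BG"] + joint["GB"]) / total

# Observer always says "boy" when it applies (the Question 2 reading).
print(p_girl_given_boy_reported(Fraction(1)))
# Observer picks at random between the two true statements (the Question 3 reading).
print(p_girl_given_boy_reported(Fraction(1, 2)))
```

Under these assumptions the result simplifies to 2q/(1 + 2q), so the answer depends entirely on how the observer is assumed to behave for mixed families.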

A Possible Fourth Question

(The following is based on another example from "The Drunkard's Walk: How Randomness Rules Our Lives" by Leonard Mlodinow, but is slightly changed to simplify it.)

Some children use a double name, like "Christopher Robin" or "Peggy Sue." Assume that in a town of 10,000 two-child families, one out of every fifty children do so. Assume also that it happens independently in siblings.

A family from this town is chosen at random, from all two-child families where at least one child is a boy who uses a double name. What is the probability the other child is a girl?

This question is in the same form as Question 2, with an additional condition. Does that additional information about the boy's unusual name change anything? Applying the same solution as in Question #2, but using a Frequentist approach, the following table lists the expected numbers of all unique outcomes for this town, with the impossible cases crossed out:

Older child	Younger child	Frequency
~~Girl~~	~~Girl~~	~~2500~~
~~Girl~~	~~Single-named Boy~~	~~2450~~
Girl	Double-named Boy	50
~~Single-named Boy~~	~~Girl~~	~~2450~~
~~Single-named Boy~~	~~Single-named Boy~~	~~2401~~
Single-named Boy	Double-named Boy	49
Double-named Boy	Girl	50
Double-named Boy	Single-named Boy	49
Double-named Boy	Double-named Boy	1

If we eliminate all instances that do not meet our given criteria of at least one boy with a double name, then we eliminate 9801 of these families, leaving 199. Of those, there are 100 families that also have a girl. So if the probability of a boy using a double name is 1 in 50, then the probability that a two-child family with such a boy also has a girl is 100/199, or just over 1/2. This is different than the answer to Question 2, which was 2/3.
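
The expected frequencies in the table can be reproduced from the stated assumptions. This is a sketch: it lumps all girls together as the table does, and the constant and dictionary names are illustrative.

```python
from fractions import Fraction
from itertools import product

N_FAMILIES = 10_000
P_DOUBLE = Fraction(1, 50)  # chance that a child uses a double name

# Probability of each kind of child, following the table's three categories.
child_types = {
    "Girl": Fraction(1, 2),
    "Single-named Boy": Fraction(1, 2) * (1 - P_DOUBLE),
    "Double-named Boy": Fraction(1, 2) * P_DOUBLE,
}

# Expected number of families for each (older, younger) combination,
# assuming the two children are independent.
expected = {
    (older, younger): N_FAMILIES * child_types[older] * child_types[younger]
    for older, younger in product(child_types, repeat=2)
}

# Keep only families with at least one double-named boy.
kept = {k: v for k, v in expected.items() if "Double-named Boy" in k}
total_kept = sum(kept.values())
with_girl = sum(v for k, v in kept.items() if "Girl" in k)

p_q4 = with_girl / total_kept
print(total_kept, with_girl, p_q4)
```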

This does not mean that knowledge of a boy's name can somehow change the probability that he has a sister, from 2/3 to nearly 1/2. As in Question 2, this result assumes we selected families because they included a child that meets certain criteria. We did not select from all 10,000 families in this town; we separated out 199 of them first, and then selected one of those 199 at random. The odds changed from Question 2, because we eliminated a greater proportion of the families with a girl before the selection. It is not a paradox, but it does happen for the same reasons the second paradox discussed above was perceived.

Suppose instead, as in Question 3, that we had met a boy who used a double name from this town, and knew that he came from a two-child family. Then we would count boys with double names, not families with boys with double names, similar to what we did for Question 3 (and also in Bertrand's Box Paradox). The odds that such a child has a sister are 100/200, or exactly 1/2. Which are the same odds we get if we observe that he uses a single name, or if we ignore his name entirely.
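
Counting boys instead of families can be sketched the same way. The code below uses the same assumed town of 10,000 families and counts each double-named boy once, so a family with two such boys contributes twice; the names are illustrative.

```python
from fractions import Fraction
from itertools import product

N_FAMILIES = 10_000
P_DOUBLE = Fraction(1, 50)  # chance that a child uses a double name

child_types = {
    "Girl": Fraction(1, 2),
    "Single-named Boy": Fraction(1, 2) * (1 - P_DOUBLE),
    "Double-named Boy": Fraction(1, 2) * P_DOUBLE,
}

double_boys = 0              # expected number of double-named boys in town
double_boys_with_sister = 0  # those whose sibling is a girl

for older, younger in product(child_types, repeat=2):
    n_families = N_FAMILIES * child_types[older] * child_types[younger]
    # Look at each child of the family in turn, paired with its sibling.
    for child, sibling in ((older, younger), (younger, older)):
        if child == "Double-named Boy":
            double_boys += n_families
            if sibling == "Girl":
                double_boys_with_sister += n_families

p_sister = double_boys_with_sister / double_boys
print(double_boys, double_boys_with_sister, p_sister)
```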

Finding out that a child has an unusual name does not change any probabilities; but changing the sample space to require we only consider a family with such a child does.

References

Martin Gardner. The Second Scientific American Book of Mathematical Puzzles and Diversions. Simon & Schuster, 1961. Republished by University Of Chicago Press, 1987, ISBN 978-0226282534.
"Mathematical Games" by Martin Gardner. Scientific American; October, 1959
Bar-Hillel, M. A., and R. Falk "Some teasers concerning conditional probabilities." Cognition, 11, 109–122.
Grinstead and Snell's Introduction to Probability, The CHANCE Project, July 2006
John E. Freund, "Puzzle or Paradox." The American Statistician. October, 1965.

External links

Boy or Girl: Two Interpretations
A Problem With Two Bear Cubs
Lewis Carroll's Pillow Problem
Marilyn is Wrong! criticism of Marilyn's answers

I just posted my extensive re-write of the article. I originally tried to keep many of the features it already had; but the more I wrote, the more I needed to change portions that were chasing red herrings. I also tried to include all points of view as fairly as I could. But obviously, I believe some points of view that are commonly held are wrong. The following is an outline justifying what I wrote. If you feel you need to change it, please address the justifications for it here instead of just changing it back. We can work on getting any valid point I missed in.

I also feel that all of the above discussion can be archived, as suggested at the top of the "Edit" page. Too much of it is about the problem, not the article. But I feel it should be a consensus opinion to archive it, not mine alone. My vote is "do it," and I might soon unless I hear otherwise.

1) The old article never really explained what the paradox was. That should be paramount, so I had to change the whole emphasis of the introduction. At first I added Q3 to the intro, but listing the variations was not as important as stating that the variations come from a common source while producing different answers. The list of the problem differences now appears only with the solutions.

2) It is not important whether or not you feel that Q3 applies to any actual questions. The people who get the answer 1/2 do feel Q3 applies, whether or not they could express it as I did. The paradox they see is because of that belief, so Q3 belongs in the article.

3) I kept the "common assumptions" mostly intact, but I think it used to mean "commonly applied" and I used it to mean "common to all variations." I added an explanation about the use of ordering to enumerate the sample space, hopefully to derail objections about using age when age isn't part of the problem statement.

4) Then I split out the assumptions not common between variations, since they are usually necessary, and do determine which solution applies. That is, how to assume families are excluded. Here is where my slant will start to show, since the assumptions for Q3 are much simpler, and so are easier to make via Occam's Razor.

5) I next listed the three variations, and a simplified (from the redundant ones before) solution for each. I tried to keep each one separate from the "basic question" so that people who objected to any one of them would associate their objection with the variation, not the basic problem. If you don't think any one solution is correct for its variation, you need to go back and see what its explicit sample space is. Specifically, for Q3 look at the Bertrand's Box Paradox article, but use four boxes instead of three. That is exactly what Q3 is. Your objection is that the sample space doesn't apply, not that the solution is wrong for that sample space.

To Arthur Rubin and AndyBloch: your alternate forms of the statement using the word "assuring" are wrong because they only apply the condition one way. "Assuring" that "at least one is a boy" does not mean you can't eliminate a family with a boy. Essentially, membership in a sample space needs to be an "if and only if" condition, and your statement only expressed half of that. This is one of the fundamental points leading to the perceived paradox.

In fact, I think it is the reason some highly capable mathematicians think only Q2 is possible. An event space is usually defined something like "the event X is all occurrences x given that fact F is true." Here, the word "given" establishes that F is true for every occurrence in the event space, and the word "all" establishes that it is false for every occurrence excluded from the event space. But the highly capable mathematician will associate both the "if" and the "only if" with the word "given" alone, just as you two did with "assuring."

Since most problem statements describe just one occurrence, they can't mean the "all" part. What they mean is only that F is true for every occurrence in the event space, not that it necessitates being in the event space.

6) Then I explained how the real paradox is not the math as it applies to the sample space, but what sample space to use for a word problem. Since this article is about a paradox, this is a critical point. And the problem is not with what families to include, but what families to exclude. I did include a link to a rec-math list of problems derived from Leonard Mlodinow's book; I wasn't sure about the policy for linking to the book author's own page (since I'm essentially saying he goofed) or to an ad to buy it; but you can get those from where I linked.

I suspect it was the release of that book that prompted renewed activity for this article, so I did want to get it in.

7) The examples of real-life word problems include two sources. I probably didn't set up the references well; I was interested in content first. The important point is that it is possible to get either "1/2" or "2/3" as a correct answer in some versions of a Boy or Girl problem that don't mention age. And that examples exist in the literature.

8) Leonard Mlodinow's "Florida" example raised a lot of stink when his book came out. It is based on the inverse of the article's problem, so it was changed to "Jacob" by whoever brought it here. I changed it one step further, to eliminate the worry that both children in a family could have the "unusual" name. That part of the issue was a red herring, and this paradox doesn't need any more of them.

It required a more complex solution than Q2, so I kept the old article's table, which I edited but didn't re-write.

9) I avoided saying "you can only get the X answer to this problem" anywhere, although I did describe how hard it is to justify the Q2 answer, and how it requires unrealistic scenarios and/or assumptions. I can't help it, if that is true.

Because assumptions must be made in most cases, it isn't conclusive that 1/2 is the better answer; but you can't really justify 2/3 unless you make bad assumptions (like: you know which of the alternative observations are possible or impossible, when no information about it is provided). It is just hard to prove how bad they are to someone who won't see that the assumption is being made in the first place. But hopefully, I got the message across that there are assumptions.

10) About the references and links. Martin Gardner is on record as saying a word problem in this family is likely to be ambiguous, that the answer is not unilaterally 2/3, and that good mathematicians can, and have, gotten it wrong by saying 2/3 was the answer. I have seen the article, but couldn't find an official citation on the web. You can read the excerpt by following the "Marilyn is Wrong!" link on the Marilyn vos Savant page, to "She's wrong about boys" and then a "letter defending Eldon." But I won't be more direct than that, to an unofficial source.

Bar-Hillel and Falk show how meeting one child in the family makes the problem Q3. Grinstead and Snell resist the temptation to describe the problem as a word problem, and only state the sample space for Q2 as any mathematician would. But they go on to say that real-world ways to create the problem have different answers. John E. Freund's article is a good read, since it is about a different problem you may not be familiar with. Familiarity breeds lazy problem solving.

I left the others, with no endorsement or disagreement intended. JeffJor (talk)

JeffJor's complete rewrite

I reverted this on a number of bases:

First is that there were abrupt, large changes to the substance of the article that had not been discussed and cannot be considered consensus. The article has been pretty stable. Such a large change requires justification. Second, this article isn't a textbook; your edits read like you were trying to teach content, which is not appropriate here. Third, you introduced a ton of statements that were not referenced (and deleted the no-references template).

I don't think anyone thinks that the article is a particularly good one; it needs work, and I'm sure some of your edits improved upon it. I see you've posted your reasons for the edits; let's go through them. --Thesoxlost (talk) 23:55, 8 February 2009 (UTC)

Other opinions

Hi all,

JeffJor and an IP address who may be JeffJor have made a substantial, controversial change to this article. He, like many, many people before him, disagrees with the content of the page, and has decided to overhaul the page with his own view, which substantially differs from the content that has been stable for quite a while. He has done so without WP:RS, clearly violating WP:OR. I don't feel like getting into an edit war; could someone else please put in an opinion? --Thesoxlost (talk) 03:32, 9 February 2009 (UTC)

I agree that this version looks very much like WP:OR. It reads like a personal essay on the problems inherent in the paradox, rather than a description of the paradox. At the very least, I think such a radical change should be discussed here before being implemented. I fail to see that the new version is any better than the old, stable, consensus version. SNALWIBMA ( talk - contribs ) 08:44, 9 February 2009 (UTC)

Defense of my Changes

The changes I made are no more a "personal essay" than what it was changed back to. The so-called paradox does not involve the answers to the questions (Q1) "A random two-child family whose older child is a boy is chosen. What is the probability that the younger child is a girl?" and (Q2) "A random two-child family with at least one boy is chosen. What is the probability that it has a girl?". Those are well understood; and when phrased that way, uncontroversial.

The paradox comes from the question (Q0) "You are told that a two-child family includes at least one boy. What is the probability that it has a girl?" This question is ambiguous; one possible meaning is Q2, but another is (Q3) "You know that a randomly selected child from a two-child family is a boy. What is the probability that the family has a girl?"

What are called "mistakes" or "confused answers to the second question" are neither mistakes, nor confused answers. They are inaccurate descriptions of correct answers to Q3. And my change was not a personal essay at all; I provided references for what I said, that contradict several points in the current article. I put some facts together in a way you may not have seen before (and yes, I did provide an original alteration for the "Jacob" question, but it was only a cosmetic change); but they still are verifiable facts, for which I provided references.

The article is lacking in several ways, and needs to be updated. I am willing to discuss it, but we have to look at all the facts to do so, and this question has a way of making people only consider the facts as they apply to their preferred side of the ambiguity. A terse list of the missing facts includes: what the paradox is, what the controversial question is, how it is ambiguous, how the definition of the sample space requires a resolution of that ambiguity, and how both of the answers, 1/2 and 2/3, can be valid, depending on that resolution. These all must be included, or else the title needs to be changed to "Some people's opinion about what should be the answer to the Boy or Girl Problem." Because that is all the current article is. JeffJor (talk) 19:10, 12 February 2009 (UTC)

Yes, the English language is very often ambiguous. I don't know who changed the first question from Q0 to Q1, but that should be changed back. The problem is always posed in common English. But Q0 cannot be interpreted as Q3; there is no reason to think that the sample space for Q0 is "all boys;" clearly it is "all families," which specifies Q2 and excludes Q3.

Of course there are mistakes. People are not always thinking of Q3, and simply answering the wrong question. They often think "knowing that one is a boy in no way affects the gender of the other." This would be a true statement if we were asking Q3, so by asking Q3, we make their answer correct. But that does not mean that they were thinking of Q3.

Lastly, and perhaps more importantly on a meta level, have you been using numerous IP addresses to make repeated reverts over the last week? --Thesoxlost (talk) 21:45, 12 February 2009 (UTC)

Whether or not you believe that Q0 cannot be interpreted as Q3, many people who read Q0 do. Your belief is irrelevant. That is why there is a paradox. Whether or not it can be so interpreted is also irrelevant to the requirements of the article, because many people believe it can. The point needs to be addressed. But in fact, it can be interpreted that way. Example: If you met just one child in the family (it doesn't matter whether it is the older or the younger), "At least one child is a boy" is a perfectly valid statement to make. And this is not WP:OR, Martin Gardner and the other sources I cited said it. Please don't use your own opinions as justification for the article's content.

Nothing about the question indicates "all families." The mistake you are making is that you are taking "at least one boy" to be a necessary and sufficient condition for an outcome to be in the sample space. Any statement specifically about one occurrence can only be a sufficient condition. JeffJor (talk) 23:51, 12 February 2009 (UTC)

But it is true that "knowing the gender of only one child in no way affects the gender of the other." It doesn't matter how you picked the child you know about, as long as the method is independent of gender. Since "knowing the gender of only one child" is a valid way to arrive at the condition in Q0, these answers are not mistakes. They are operating from a different assumption that resolves the ambiguity than you are. And you are assuming, because Q0 is ambiguous. Your point is that if "they" were thinking of Q2, 1/2 is the wrong answer. Mine is that if "they" were thinking of Q3, 2/3 is just as wrong. So the article needs some other reason, besides your assumption that "they" were thinking of Q2, to state one answer is correct. And yes, I have posted from two addresses. JeffJor (talk) 23:51, 12 February 2009 (UTC)

"A family has two children, and at least one is a boy." clearly defines a set of objects (i.e., families) that meet a certain set of criteria (i.e., "P has two children" and "P has at least one boy"). It's not so much an interpretation as a conversion into predicate logic.

Even if that were true (and it isn't, more on that next), "A family has two children, and at least one is a boy" is not Q0. Q0 is "You know that a family has two children, and that at least one is a boy." That is clearly defining only two facts - that you know there are two children, and that you can attest to the gender of only one. It is clear, because those are the only two facts the real Q0 says you know. My Q0 is the form of the question that makes some people answer "1/2."

You have not supported your assertion in any way, except to say "it is clear." How do you think it clear? Because I don't see it. All I see is a misapplication of logic. What your Q0 statement says is "If a two-child family is in the appropriate sample space for this problem, then it has at least one boy." What you are saying is "clear" is the statement "If a two-child family has at least one boy, then it is in the appropriate sample space for this problem." These are not equivalent statements. And in fact, I have demonstrated with an example a way to make the statement "A family has two children, and at least one is a boy" that is of the form Q3, not Q2. You met one child, at random, and the other's gender is unknown to you.

I think the consensus of other editors (those that reverted your posts, and the ones that have supported the current relatively stable version) is that this is a clear interpretation of Q0 that involves no assumptions. In order for Q0 to be "interpreted" as Q3, you had to stipulate that "If you met just one child in the family..." That is an additional bit of information that is not provided in the original question. Your additional requirement makes it a different question than Q0. There is no additional bit of information that is necessary to "interpret" Q0 as Q2. --Thesoxlost (talk) 00:56, 13 February 2009 (UTC)

So, you are saying that it is indeed the WP:OR of these editors that there is no paradox. I'm sorry, but there is, and the literature concerning the topic agrees with me. Look it up in the references. The statement of Q0, as I expressed it, is ambiguous. Whether or not you agree, that ambiguity is what causes the paradox, and it deserves to be addressed in an encyclopedia article that claims to be about the paradox. JeffJor (talk) 03:04, 13 February 2009 (UTC)

Jeff, this conversation cannot continue while random IP addresses with no editing history repeatedly revert the consensus view. Currently, the only identified user who holds your view is you; there are at least 3 people who disagree with you. You do not have consensus. You ignored my earlier question: "Have you been using multiple IP addresses to edit the content of this page?" If so, then you are breaking wiki guidelines on edit warring and using IP sockpuppets. It is relatively easy to confirm; we would just need to ask a CheckUser. If you are involved in this behavior, please admit to it and stop. --Thesoxlost (talk) 03:14, 13 February 2009 (UTC)

Try Again

I admitted above that I posted on this discussion page from two IP addresses. But I edited the article just once for this change, from my main IP address - if that wasn't clear before, I'm sorry. I misunderstood what you were asking. I try to always log on to "JeffJor," and sign what I add. I may have slipped and left a signature out, or assumed that disconnected paragraphs posted at the same time would be recognized as such, but I do use the account, and I am the only one who does. As far as I know, that is one of the purposes of creating an account, and not "sock puppeting." I also admit I am a Wikipedia novice. I couldn't follow exactly who was changing the article after I edited it, or what the changes were; but it wasn't me.

I am not the only one here who recognizes that the question is ambiguous; there are many who agree and who have posted on this discussion page. They just don't seem to be stubborn enough to risk getting into edit wars over the article. I also have cited known experts in the field who have published the same views I have stated. That is, as far as I know, what consensus is supposed to be built upon. Not our own opinions.

And you do continue to base your interpretations on opinion, not required interpretations that are universally held. There is added information needed to interpret Q0 as either Q2 or Q3; you are just assuming one is implied (I think by the fact that "family" is stated first). You need to know if the information "two-child" and "includes a boy" is describing a family and is determined by looking at both children (Q2); if it is describing a family and is determined by looking at only one child (Q3) (this is entirely possible since the statement "at least one is a boy" is an incomplete determination); or if the information "from a two-child family" is describing a boy (Q3) (this is easier to create an example for, and is a variation of the previous option). Here is yet another example of a word problem that can be used to illustrate how Q3 can be a valid interpretation:

You are supervising the door prizes at the annual family picnic for Equiprobability City Elementary School. True to its name, of the 48 families in attendance, all have two children and they are evenly distributed among the four possible orderings BB, BG, GB, and GG. Each child was supposed to get a blue or pink door-prize ticket, allowing them to choose from the prizes for boys or girls, as they entered. However, the tickets were late in arriving. So when they finally came, you asked each family to send one of their children to the prize booth in order to get their family's tickets. Again true to the name, 24 Equiprobability City boys and 24 Equiprobability City girls lined up at your booth. What are the odds that a boy in this line needs both a blue ticket and a pink ticket? What are the odds that a girl in this line needs a blue ticket and a pink ticket?

The complete set of information that you know about a boy in this line, is that he is there representing a "family (that) has two children, and at least one is a boy." That is your statement of Q0, and it applies to every boy in line. You said that this statement "clearly defines a set of (families) that meet (a set of two) criteria (i.e., 'P has two children' and 'P has at least one boy')." This is incorrect. English can be flipped around, and that statement can be about the boy OR the family. In this example, the boy represents only a subset of the set you describe. There are some families in the set you describe, who sent up a daughter. A boy does not represent them.

The tone of the article seems to say that both probabilities I asked for are 2/3. That means 32 families need both kinds of tickets, 8 need two blue tickets, and 8 need two pink ones. We know this is incorrect; it must be that 24 families need both kinds, 12 need two blue tickets, and 12 need two pink ones. That required result can come only from using "1/2" for the two probabilities I asked for, however you achieve them. In other words, it is provable by working backwards from the even distribution of families, that the probabilities I asked for are both 1/2.
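For anyone who wants to check the picnic arithmetic, here is a minimal sketch of the count. It assumes what the example states (12 families of each ordering) plus one thing the example only implies: that each family sends either child with equal chance. The names `FAMILIES_PER_TYPE` and `line_counts` are made up for this illustration.

```python
from fractions import Fraction

# Assumption: 48 families split evenly over the four orderings,
# and each family sends one of its two children with probability 1/2.
FAMILIES_PER_TYPE = 12
FAMILY_TYPES = ["BB", "BG", "GB", "GG"]

def line_counts():
    """Expected number of boys/girls in line, keyed by (sex, family_is_mixed)."""
    counts = {(sex, mixed): Fraction(0) for sex in "BG" for mixed in (True, False)}
    for family in FAMILY_TYPES:
        mixed = family[0] != family[1]
        for child in family:
            # Each child of the family is the one sent half of the time.
            counts[(child, mixed)] += Fraction(FAMILIES_PER_TYPE, 2)
    return counts

counts = line_counts()
boys_in_line = counts[("B", True)] + counts[("B", False)]
girls_in_line = counts[("G", True)] + counts[("G", False)]

# A child "needs both tickets" exactly when the family is mixed.
p_boy_needs_both = counts[("B", True)] / boys_in_line
p_girl_needs_both = counts[("G", True)] / girls_in_line

print(boys_in_line, girls_in_line, p_boy_needs_both, p_girl_needs_both)
```

Under those assumptions the line holds 24 boys and 24 girls, and half of each group comes from a mixed family, which matches the 24/12/12 split of families described above.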

The paradox in the Boy or Girl Paradox is that some people interpret "A family has two children, and at least one is a boy" differently, and so set up the problem differently. If the two criteria that you mentioned are applied to the family; that is, if all families of two that include at least one boy comprise the sample space, then Q2 is the actual question and 2/3 is the correct answer. But if the two criteria are applied to the child; that is, if a random child is known to be a boy from a two-child family, then Q3 applies and the answer is 1/2.

Since this is the paradox, the article needs to address it. I understand that your position is "While Q0 may appear ambiguous, Q2 is the only valid interpretation of it without further information." Not everybody who has posted here, and not all experts who have published Original Research on the subject, agree with that. I assume you think they are wrong - that is irrelevant. But yes, there are others in both groups who do agree with you; I think they are wrong - that is also irrelevant. Neither of our opinions define what should be in the article; it is published experts, and experts in both camps exist. That's the paradox. Now, I have no wish to start an edit war, which is why I have not edited the article a second time, YET. But I am trying to work out a way to include the mention of the actual paradox in an article that is ostensibly about a paradox. Not to have it say that one side is always right, or always wrong; because neither is true. But that means the article can't claim only one correct answer, as it currently does.

Now that I have your attention (I did try before), I am going to try, again, to create an acceptable rewrite. I'll start from what I posted before, but some will change. I'll try to include both points of view equally, but both will be there. The outline is (1) A statement of Q0, (maybe Q1), Q2, and Q3; (2) Say that Q0 is ambiguous, since it doesn't say whether the information applies to the boy or the family; (3) List common assumptions (to avoid red herrings); (4) Discussion of the conditions under which Q2 and Q3 apply (Q2 means both children's gender is used to include a family in the sample space, and Q3 means that one child's gender is used and the other must be ignored); (5) Solutions to (maybe Q1), Q2, and Q3; (6) Maybe move an explanation of Q1 as a version of Q3 here; and (7) Only a statement of the paradox, that different people see Q2 and Q3 in the same problem.

What I will try to exclude are the red herrings: we don't need to mention "intersex" children, unequal ratios of boys and girls, the differences between Bayesian and Frequentist approaches, mistakes anybody thinks are made, or "incomplete problem statements." Most of those are red herrings, and are explained away in that outline. They draw attention away from the issue at hand. (It may be that your sock puppeteer is who is putting those back. They don't help, whoever you are.) I intend to leave out any commentary about which interpretation is more realistic. I will post what I write either here on the discussion page first, or replace the article (the guidelines say to "be bold" when editing, which is why I changed the article before). I will look forward to your comments, suggestions, and any changes EXCEPT the deletion of Q3 from the article. Without it, there is no paradox. We can leave the two options there, and let our readers decide which, if either, interpretation they think is proper.

What I'd like to know right now is (A) Where you think I should post it first; (B) Whether you think Q1 could be moved as I suggest; (C) Should word-problem examples like my Equiprobability City one be included (and from where - I can copy from literature instead of making one up); (D) Should the "Jacob" version be included - it seems to inherently be a word problem - and is the "Jacob" or the "Christopher Robin" version preferable (even if the article states that we allow two Jacobs in a family, people will chase that red herring); and (E) Any justification you feel is needed for defending Q2 as the only possible interpretation of Q0. If you give me one, assuming it says something other than "it clearly follows" or "most editors here agree with me," I will include it; but then I will provide a counterargument for why Q3 is possible. That isn't hard. It is possible to know that a boy comes from a two-child family, without knowing the gender of his sibling. Q0 can be stated for such a child, and Q2 cannot. Your "clearly" made "conversion into predicate logic" has a trivial counterexample. JeffJor (talk) 18:16, 13 February 2009 (UTC)

Not Me

I'm writing this to state, as emphatically as I can as a novice on Wikipedia, that I have not edited this article between 8 Feb 2009 and today, 18 Feb 2009; and that all accusations that seem to say I am using sockpuppets are incorrect and based on the assumption of bad faith. I want to improve this article, but only to get the point across that many expressions of the problem are ambiguous, but not universally seen as such. That nobody is "wrong" except in the way they don't recognize how others are interpreting the ambiguous parts in a different way.

To all who continue to change it: it does not help to continuously change the ambiguous formulations. We have to state how they are ambiguous (whether you, as an individual, see the ambiguity or not), state that not everybody agrees about the possible interpretations, and finalize the article with each possible solution associated with as clear a statement, of the problem that goes with it, as is possible.

Stating "one of the children is a boy" in any part of the description is potentially ambiguous, depending on how the information was determined. If determined from looking at both children, and if it was pre-determined to answer "one is a boy" whenever possible, the answer is 2/3. If determined from one child only, or if it was possible to state "one is a girl" when it was also possible to say "one is a boy," then the answer is 1/2.

The unambiguous formulations are "A family is selected from all two-child families that include at least one boy," and "A family is selected from all two-child families, where all that is known is that one random child in the family is a boy." The answers are 2/3 and 1/2 for these formulations, respectively. You may not agree about which conditions I described in the preceding paragraph go with either problem statement, but that is the "Paradox," and is what this article needs to address. JeffJor (talk) 22:26, 18 February 2009 (UTC)
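The two unambiguous formulations can be checked by plain enumeration. This is only a sketch under the usual assumptions (the four ordered outcomes BB, BG, GB, GG are equally likely, and in the second formulation the observed child is picked with probability 1/2 independent of gender); the function names are invented for the illustration.

```python
from fractions import Fraction
from itertools import product

# Assumption: the four ordered outcomes are equally likely.
FAMILIES = list(product("BG", repeat=2))

def p_girl_given_family_has_boy():
    """Family selected from all two-child families with at least one boy."""
    eligible = [f for f in FAMILIES if "B" in f]
    with_girl = [f for f in eligible if "G" in f]
    return Fraction(len(with_girl), len(eligible))

def p_girl_given_random_child_is_boy():
    """Random family, one random child observed, and that child is a boy."""
    observed_boy = Fraction(0)
    observed_boy_with_sister = Fraction(0)
    for family in FAMILIES:
        for i in (0, 1):
            # Probability of this family AND of observing child i.
            weight = Fraction(1, len(FAMILIES)) * Fraction(1, 2)
            if family[i] == "B":
                observed_boy += weight
                if family[1 - i] == "G":
                    observed_boy_with_sister += weight
    return observed_boy_with_sister / observed_boy

print(p_girl_given_family_has_boy())       # first formulation
print(p_girl_given_random_child_is_boy())  # second formulation
```

The first function conditions on a property of the family, the second on an observation of one child; that difference in the conditioning event is the whole difference between 2/3 and 1/2.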

I agree that this page is unnecessarily complex for such an easy problem. A total rewrite would be a good thing. Accepting only sourced edits, and deferring general discussion of the problem per se to a new talk page devoted to that (an "Arguments page"), might do the trick. This main talk page should only contain purely editorial discussions. Please see the talk page of Two envelopes problem for an example of the structure I mean. iNic (talk) 11:52, 19 February 2009 (UTC)

JeffJor, there is no assumption of bad faith, there is a fact of disruptive editing. There is an IP editor who is reverting changes repeatedly in clear opposition to the consensus established by numerous other editors. The pattern and contents of the edits indicated that the editor was you. It does not violate WP:AGF to try to stop a disruptive editor, and if the evidence is that you are that editor, it does not violate WP:AGF to accuse you of it. The edits made by the IP user are precisely what you have argued in the talk page--even this post! A CheckUser has said that your IP does not match those of this editor. Of course, this is not conclusive, but it is sufficient for me to give you the benefit of the doubt. But the problem still exists: there is still a disruptive editor who needs to be stopped. Once this user stops the disruptive and counterproductive behavior, then we can meaningfully engage in a discussion of the content that you and this IP user are interested in adding. --Thesoxlost (talk) 14:37, 20 February 2009 (UTC)

No evidence connected it to me. You asked if it was me, I told you clearly that it was not, and then you again said it was me. You said my answers were "evasive" when they came directly to the point of saying it was not me. I asked the disruptive editor to stop. And again you said it was me.

The disruptive editor does not share my views, as far as I can tell. My views are that the problem statement I call Q0 is ambiguous: every solution presented is correct for some interpretation of it. So anybody who reverts the article to say that any solution is wrong, to connect the solution S2 to Q3, or to connect the solution S3 to Q2, does not share my views in total.

But what you don't seem to realize is that there are others who do share some of my views, that they are valid, and that a consensus might be reached that doesn't agree with your point of view on this issue. This discussion page is littered with comments about how the question is ambiguous, which you deny. It is ambiguous. That's the paradox, that different people can't see the different interpretations.

I'd like to work with all editors, including the disruptive one if he will, to work it out. As I said before many of the edits you think I made. That can't happen until we all agree to discuss it WITH AN OPEN MIND. That includes you, and you haven't done that so far. You have lumped all views that don't equate Q0 with Q2 together, which is why you think this disruptive editor shares my views.

I suggest we revert the article to what it was before I edited it, and leave it there for a while. That includes the disruptive editor, PLEASE. We will accomplish nothing by this war. Let's start a dialog about how to include the actual problem, ambiguous questions, in the article, and stop accusing anybody of being wrong about it. JeffJor (talk) 19:31, 21 February 2009 (UTC)

Jeff, thou dost protest quite a bit. I don't care if it's you. I believed, and still do, that it was, but that only affected how I addressed the problem. The "war" is over; the IP contributors have not engaged in discussion, thus the edits are clearly disruptive, and the page has been protected. If that person signs in, they can make edits. Now that edits can be stable, we can actually discuss content. As I have said in the past, I share some of your views, and I agree that the article can be improved.

No one, when figuring out the problem, says "Ooh, I thought they were sampling the boy at random, not the family! Of course, if the family is randomly sampled, the probability would change! This is just a semantic trick!"

The statement of Q1 is "A family has two children, and the older is a boy. What is the probability that they have a girl?" The most straightforward translation of this statement into predicate logic is:

(1) There exists a family, X, such that contains_two_children(X), older_is_boy(X)

That defines the space of possible events; the space of "successful" events is:

(2) There exists a family, X, such that contains_two_children(X), older_is_boy(X), has_a_girl(X)

If you take all occurrences and assign truth values to each of these statements, the probability equals the ratio of the frequency of (2) being true divided by the frequency of (1) being true.

Your interpretation would be something more like:

(3) There exists a boy, X, such that is_member_of_family(X,Y), contains_two_children(Y), older_is_boy(Y), has_a_girl(Y)

This would be a reasonable interpretation of a separate question: "A boy is a member of a family with two children. What is the probability that the family that this boy is a member of has a girl?"

But interpretation isn't the key issue here; there are many people who do not interpret the problem as you have laid out, and yet they still get it wrong. They still see Q1 and Q2 as equivalent. And for these individuals, the paradox exists and not because of ambiguity. Yes, changing the question to mean (3) instead of (2) will eliminate the paradox. Many changes would eliminate the paradox. That is not the issue. The issue is whether the paradox exists when (2) is intended, and the question is interpreted as such. In these cases, the paradox still exists.

Many formulations of this problem remove the ambiguity explicitly. They say "You meet an old man on the street. You ask him if he has any children, and he says he has two. You ask if he has a boy, and he says, 'Yes, ...' but is cut off by a bus passing by. What is the probability that he has a girl?" This is the paradox. --Thesoxlost (talk) 14:43, 23 February 2009 (UTC)

Thesoxlost, I "wast [sic] protesting quite a bit" because you were labeling me as the sockpuppet in public. I don't care what you personally think (you are wrong there for reasons similar to why you are wrong about this probability question) if you will talk about the problem, but you were propagating the assumption of my responsibility in public, based on your personal opinion.

But yes, people do think the sample is about one child. But I think you are over-emphasizing what I said about "sampling the child." Yes, the sample provides information about a family; but the information is specific to only one child in that family. The information conveyed by Q0 does not have to be "I know about both children; at least one of the two is a boy but I'm withholding the gender of the other." It can be "I have determined the gender of one of the two children, and that child is a boy." Many people think it means this. This isn't an empty assertion, like your claim is. You can see it happening on this talk page, by people who signed their names. Leejc was the last one to do so. There were many others. You can look it up in my references (Grinstead and Snell, Bar-Hillel and Falk, Martin Gardner) and find it happening. I'm not mentioning these links just to fill space; look it up. IT HAPPENS.

One of the worst examples of misapplying that difference is from Leonard Mlodinow's book. He worded it differently in different parts of his book, I guess not realizing the difference, and so created a case where he explicitly said Q0 should be interpreted as I am describing. I read the book in the library; it isn't in front of me, so I have to paraphrase. The first was something like "You recall that a certain family has two children, and that one of them is a boy (he actually used 'girl'), but you DON'T KNOW IF BOTH are boys." He specifically said that the information was about only one child. The other version, which he claimed was equivalent, was more along the lines of "at least one is a boy."

Your use of predicate logic is misapplied. Predicate logic is for when you define a set by "specifying a property that the elements of the set have in common." That is not what Q0 does. It gives an example of an outcome in a set of outcomes; but that doesn't have to define the set. The condition "at least one is a boy" is a necessary condition for an outcome to be in the sample space, but it is not a sufficient condition. No part of Q0 says that every family with at least one boy must be included in the sample space. The sufficient condition is "whatever method was used to determine that one child is a boy in the known occurrence, will determine that one child is a boy in every element of the sample space." And there is the ambiguity - we don't know that method. If it looks at both children and withholds information about one, it is Q2. If it doesn't look at both children, then Q3 is the correct approach.

Now, let's get the issues straight: I don't care about any problem statement that mentions age. That is, Q1. I'd like to drop it. But the variations we are discussing, slightly re-worded to emphasize the differences I see, are:

Q0: You observe that a two-child family includes (at least) one boy.
Q1: A random two-child family is selected, and you observe that the older child is a boy.
Q2: A family is selected at random from all two-child families with at least one boy.
Q3: A random two-child family is selected, and you observe that one of its children is a boy without observing the other.

Note 1: Q1 is a special case of Q3. I think we need to avoid Q1 because of the connotations it raises, about how you picked the child to observe. It is a red herring.
Note 2: These forms emphasize how some are describing an outcome (an observation of a family after it is selected), not defining the sample space of families that could be observed.
Note 3: Observing a fact about one outcome does not make that the predicate P that defines the sample space.
Note 4: The observation described in Q0 can be made by the method described in Q3.

You said, and I'm editing it to drop Q1, "there are many people who do not interpret the problem as you have laid out, and yet they still get it wrong. They still see Q3 and Q2 as equivalent." I don't think that is true. Those people see Q0 and Q3 as equivalent, where you see Q0 and Q2 as equivalent. When they say "Q0 is Q3," you hear "Q2 is Q3." That's what you are observing here. That's the paradox you are seeing, as I'll get to below.

The critical requirement for ANY version of a question to be Q2, is that somebody, somewhere, has to know the genders of both children; but some circumstances prevent you from getting that complete information. This isn't an assertion, it is part of the criteria you are applying to define the sample space for Q2. A family can't be excluded unless both children are known to be girls. Only a two-boy family will always be included by looking at just one child. In your "passerby" version of Q2, the "bus cuts the speaker off" part is the circumstance that prevents you from getting the complete information. I have never seen anybody misinterpret it for Q3, because they can see that information is withheld. In the "two puppies" version of Q2, the circumstance comes from you asking if one puppy is a boy. I have never seen anybody misinterpret that one as Q3, for the same reason. But these versions of Q2 are brought in at the end of the discussion, to try to prove that 2/3 is the answer to Q0 by giving an example of Q2.

Q0 does not contain those circumstances; and some versions (Mlodinow's) specifically deny they exist. If a reader won't assume those circumstances exist, or is specifically told they don't exist, then that reader must answer 1/2 because the question he sees is Q3. He sees that some families that do include one boy can be excluded from the sample space. That's why people dispute Mlodinow's Q0, and Jeff Atwood's Q0 (the "passerby without the cutoff by bus" version). SEE ABOVE, or the blogs about those problems, for many examples of what you say does not happen. Not as many dispute the BBC's Q0, the "If a family has two children and at least one of them is a boy" version. Some see it as talking about both children. But some do dispute it, because the information they see conveyed in it is that "one is a boy and one is uncertain." They are not mistaking Q2 for Q3, they are legitimately seeing Q0 as Q3.

The article can include your examples of ways for Q2 to happen when Q0 is described - but then it also needs to include Q3 examples like "You meet a man taking a walk with his son, and the man tells you that he has two children. What are the odds his other child is a girl?" Because this fits Q0 as well as, or better than, your passerby version or the puppies version. Because Q0 does not include the circumstances that make it Q2. Because there are papers in the published literature about this problem that say the answer to this version of the problem is 1/2. JeffJor (talk) 22:01, 23 February 2009 (UTC)

Re the sockpuppet, this is the last I will say on the subject: When you hear hoof-beats, don't assume it's a zebra. Or perhaps more applicable, when you hear hoof-beats, and look up to see a lone horse, it's reasonable to assume that the hoof-beats came from that horse; it's unreasonable to consider the possibility of invisible zebras. I have nothing against you; I just heard hoof-beats and saw a horse. I won't discuss the issue anymore.

First off, one of the reasons that few people have engaged you is that you write 3 pages every time you post. Please be concise; if you have a point, get to it, and let it stand without additional verbiage. If you can remove a paragraph without substantively weakening your central point, please do so.

Now, I disagreed with a ton of what you just said, but here is the central point: saying "I don't like it" or "He misunderstands it" or "it is ambiguous" does not change what something is. I've produced many references that discuss this problem. Each one uses slightly different language, but they are all far closer to the current formulation than yours. Ideally, we would have a reference to the first academic discussion of this problem. Then we would use that as the proper, or at least initial, formulation of the question. In the absence of that, we have to go for consensus: is there a consensus, across publications, as to what the "Boy or Girl paradox" is? Is there consensus as to how it is phrased? If there is, then that is what goes in this article. It doesn't matter if we like it, or if we think it's ambiguous, or whether we think an alternative formulation is superior. We are not imposing our own views here, we are abstracting information from the existing, published literature. The article currently does not meet that standard, but we need to move in that direction. Anything relevant that you produce that can be directly attributed to a WP:RS can be incorporated. --Thesoxlost (talk) 23:01, 23 February 2009 (UTC)

The assertion "it works this way" cannot address whether a situation is ambiguous unless you show why it can only work this way, or why an example of it working another way is invalid. You have only provided assertions for why Q0 must be interpreted one way; I have provided reasons for, and examples of, it happening another. You have provided references that discuss it happening one way that do not address the plausibility of it happening another; I have provided references that address both possibilities and discuss how and why they are different. The best is probably Freund's, even though it is about a different problem that shares the issue about "at least one has value X." Read it. Read Bar-Hillel and Falk. Look at what Martin Gardner, a very respected name because he does not approach issues with an agenda, says about the problem being ambiguous. Look at my examples, and tell me why Q0 does not describe them, or why Q2 does.

The biggest roadblock here is that there are two approaches to Q0: the logical, formulaic one and the intuitive, conceptual one. Both need to make assumptions to solve Q0. The formulaic assumption is the condition "both children are observed to determine 'at least one is a boy.'" But the logical, formulaic approach uses circular reasoning to justify it. The problem must have a formulaic solution, so that condition must be implied in the problem statement. Because it is implied by the problem statement, it can't be an assumption. So the problem can be solved by formula, completing the circle. In other words, the assumption is passive; it frequently goes unrecognized. That happens in all of your references. The intuitive, conceptual approach does not read more into the problem statement than what is stated: One child is known, and one is uncertain. One has to actively assume how the uncertainty applies to solve it, and so the solution comes with a caveat that the logical, formulaic solvers see as a "mistake." And that is the paradox, because it isn't a mistake.

To whoever wrote the next part: Consensus is valuable because it prevents one person from inflicting their own views on the article - you have to convince others. It doesn't help if you don't sign on with a name; people will disregard your arguments whether or not they are valid. You do more harm to the cause you want to support than good. Create an account, and log in to post. JeffJor (talk) 14:07, 24 February 2009 (UTC)

Consensus? What consensus?

Some insist on a "consensus version." But a consensus of how many people? And how many of them have substantial mathematical knowledge? If you want to edit, read academic articles, not just popular books or something. E.g., see

Some teasers concerning conditional probabilities Cognition, Volume 11, Issue 2, March 1982, Pages 109-122 Maya Bar-Hillel, Ruma Falk

which also cites Martin Gardner's 1959 book. --133.9.4.12 (talk) 07:56, 24 February 2009 (UTC)

Well, what do you think?

Bar-Hillel & Falk (1978, 1982): "Mr. Smith is the father of two. We meet him along the street with a young boy whom he proudly introduces as his son. What is the probability that Mr. Smith's other child is also a boy?"
Gardner (1959): "Mr. Smith says, 'I have two children and at least one of them is a boy.' What is the probability that the other child is a boy?"

Gardner's formulation is similar to the one in the popular press, and both Gardner and Bar-Hillel say the correct answer is 1/3. Bar-Hillel says that Gardner is asking a different question, but does not consider it to be ambiguous. What would you like to do, 133.9.4.12?

--Thesoxlost (talk) 14:55, 24 February 2009 (UTC)

Again, you are reading selectively to see the answer you want. Bar-Hillel and Falk list possible approaches to the problem in the first section of the paper. They clarify it in section 3, where they say:

"Contrary to the second approach to Problem 1, which viewed the three remaining family types as equiprobable, they are seen not to be. Realizing that a father of two boys is more likely to pick a boy for a companion (in fact, it is a certainty) than is a father of a boy and a girl (in which case it is a toss-up), it becomes clear that the observation Bm renders the event BB more probable than either BG or GB. It is essential to notice that the conditioning event should be phrased not as 'Mr. Smith has at least one son', but rather as 'a randomly encountered child of Mr. Smith is a son'. Under the usual assumptions, the former has a probability of 3/4, the latter of 1/2."

And they go on to say the correct answer is 1/2. Gardner addessed the issue several times, trying to be careful about the ambiguity. In the October 1959 issue of Scientific American, he said:

Another example of ambiguity arising from a failure to specify the randomizing procedure appeared in this department last May. Readers were told that Mr. Smith had two children, at least one of whom was a boy, and were asked to calculate the probability that both were boys. Many readers correctly pointed out that the answer depends on the procedure by which the information "at least one is a boy" is obtained. If from all families with two children, at least one of whom is a boy, a family is chosen at random, then the answer is 1/3. But there is another procedure that leads to exactly the same statement of the problem. From families with two children, one family is selected at random. If both children are boys, the informant says "at least one is a boy." If both are girls, he says "at least one is a girl." And if both sexes are represented, he picks a child at random and says "at least one is a ..." naming the child picked. When this procedure is followed, the probability that both children are of the same sex is clearly 1/2. (This is easy to see because the informant makes a statement in each of the four cases -- BB, BG, GB, GG -- and in half of these cases both children are of the same sex.) That the best of mathematicians can overlook such ambiguities is indicated by the fact that this problem, in unanswerable form, appeared in one of the best of recent college textbooks on modern mathematics.

JeffJor (talk) 17:04, 24 February 2009 (UTC)
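
For anyone who wants to check the two readings quoted above, here is a minimal sketch in Python. It assumes the usual model (four equally likely two-child families, boys and girls equally likely and independent); the function names are made up for illustration and come from neither paper. It compares conditioning on "at least one boy" with conditioning on "a randomly picked child is a boy", which is the event Bar-Hillel and Falk describe and also the boy branch of Gardner's second procedure.

```python
from fractions import Fraction
from itertools import product

# Assumed model: four equally likely families, written (older, younger).
FAMILIES = list(product("BG", repeat=2))
PRIOR = Fraction(1, 4)


def p_at_least_one_boy():
    """P(the family has at least one boy)."""
    return sum(PRIOR for family in FAMILIES if "B" in family)


def p_random_child_is_boy():
    """P(a child picked uniformly at random from the family is a boy)."""
    return sum(PRIOR * Fraction(family.count("B"), 2) for family in FAMILIES)


def p_two_boys_given_at_least_one_boy():
    """First reading: select among all families that have at least one boy."""
    # BB is the only two-boy family, and it always satisfies the condition.
    return PRIOR / p_at_least_one_boy()


def p_two_boys_given_random_child_is_boy():
    """Second reading: one child is picked at random and turns out to be a boy."""
    # In a BB family the picked child is certainly a boy.
    return PRIOR / p_random_child_is_boy()


if __name__ == "__main__":
    print("P(at least one boy)             =", p_at_least_one_boy())
    print("P(random child is a boy)        =", p_random_child_is_boy())
    print("P(BB | at least one boy)        =", p_two_boys_given_at_least_one_boy())
    print("P(BB | random child is a boy)   =", p_two_boys_given_random_child_is_boy())
```

Under these assumptions the two conditioning events have probabilities 3/4 and 1/2, matching the figures in the Bar-Hillel and Falk quote, and the two conditional answers come out as 1/3 and 1/2 respectively.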

Jeff, if you think I'm engaged in casuistry, keep it to yourself; it isn't constructive to make personal attacks.

Please continue with your quote from the 1959 issue of Scientific American. I don't have access to it. I want to know what Gardner goes on to say, and why he is saying the question is ambiguous when he himself (according to Bar-Hillel) asked it.

Freund's work requires too much WP:OR to link to the present debate, but I like the Gardner Scientific American bit. I would oppose your re-writing the problem to be "unambiguous," but I would not oppose citing the original formulation of the problem, variants of the problem, and a discussion of ambiguity (to the extent that can be supported). --Thesoxlost (talk) 21:41, 24 February 2009 (UTC)

I'm not personally attacking anything; I'm pointing out to you that it appears you misread (whether deliberately or not) the first section of that article. It very clearly outlined two solutions in Section 1, with both answers 1/2 and 1/3 listed. It stated they were in contrast to each other, and deferred comment on which was correct. If I recall the article correctly, you can't have seen the one without the other, since the one you quoted was stated second. Now, I'm willing to put this bad footing we seem to have started on aside. Are you?

I don't intend Freund's work to support the article, but to support the effort to arrive at a consensus about the article. I don't have access to the full Gardner quote. I haven't found it online, only in an internet quote that somebody else made from a photocopy, which I quoted verbatim. Follow the references on the Marilyn vos Savant entry to "Marilyn is Wrong," and from there to where she was wrong on this subject. Look for the "Letter defending Eldon." Eldon is a little overzealous, but does correctly point out incorrect answers by others. I can tell you I once verified the quote at the library, but I don't know what else it says. My recollection is that Gardner posed the problem in May, with the "1/3" answer, but agreed later that the problem as stated was ambiguous.

As to your suggestions about the rewrite, I thought that is (mostly) what I had done. The outline I suggested above is closer, and I believe I have demonstrated extremely good faith in not editing it in while waiting to get some confirmation like this. But I did pass along all those references when I edited it before. JeffJor (talk) 22:29, 24 February 2009 (UTC)

Second question, Bayesian approach

There is a (poorly formatted) suggested Bayesian approach which incorrectly assigns the chance of the younger child being a girl to be 2/3. This does not take into account the specific requirement of the younger child being a girl. Because it is misleading (and badly formatted anyway) I have removed this approach. Padillah (talk) 21:12, 24 February 2009 (UTC)

Because of the edit war, which I had no part in except to lob the first stone, the problem changed from asking "Is there a girl" to "is the younger child a girl." That needs to go away, but I'm hoping for a large rewrite, so I ignored it. JeffJor (talk) 22:29, 24 February 2009 (UTC)

Possible New references

In looking for the Gardner reference, I found two other references for a concept I think is critical, and being overlooked by the recent literature that has been cited. One I read online, and one I saw only described. From Leviatan came this version of the problem:

Mrs. Cohen has a son: Mrs. Cohen gave birth to (non-identical) twins. You know she has a son because you remember that there was a brith-party (circumcision) soon after birth. You meet Mrs. Cohen with one baby (unfortunately dressed in yellow). What is the probability that it is a boy? (Many students will say 1/2 because they don’t know how to make use of the extra information!)

This is about the best realization of a Q2 question, with a description of why it is Q2, that I have seen. The "extra information" is twofold: (1) You know that at least one child is a boy, and (2) You know that this fact was determined by an external mechanism that considered both children and asked "is at least one child a boy?" That mechanism was the "birth-party" or briss; a religious event held only for a boy child, eight days after birth. It is the second part that is missing from most story problems that are trying to ask Q2, and that absence makes them ambiguous. It is the fact that most of the references currently in the article (BBC, Atwood, Fukamachi, Ingram, Mlodinow) are assuming, without justification. One of those (Mlodinow) partially contradicts it by implying you might not have considered both children.

The other reference applies to the same issue, and apparently goes further by saying the answer is 1/2 without part (2). A description of Martin Gardner's book "Aha! Gotcha: paradoxes to puzzle and delight" included an isomorphic coin problem that I think matches the one far up in this discussion page (flip two coins, tell someone "at least one heads"). The reviewer said "... two coins flip four equally likely ways. To flip two coins three equally likely ways, the null [sic] must be selected, prior to the flip. As Martin Gardner said, 'the flipper must agree in advance...'" ... to ask about heads, I am assuming the reviewer meant. I will continue to try to find that book. JeffJor (talk) 14:02, 25 February 2009 (UTC)

Jeff, you've construed my statements as support of my position when they have not been intended as such, so I'll be explicit: this post neither supports my position nor yours.

It seems to me that if we consider ambiguity, (1) she gets Mrs. Cohen wrong, and (2) Mrs. Cohen is very different than her two aces puzzle. I still believe that the "ambiguity" depends on sneaking in new information through "assumptions." But that doesn't matter. What matters is that the paradox has been discussed as ambiguous in certain versions. To the extent that certain versions are viewed as ambiguous, the Mrs. Cohen version is ambiguous; if you selected Mrs. Cohen by attending her son's brith-party, then the probability that Mrs. Cohen has two sons is increased--it's clearly twice as likely that Mrs. Cohen, and not Mrs. Smith, was selected if Mrs. Cohen has 2 boys and Mrs. Smith has only one. Or you could have selected Mrs. Cohen at random from the population of two-child families.

(2) In the aces problem, there is no question about sampling. The space of all possible events is clearly defined. If we changed the Boy or Girl paradox to the Ace or Spades paradox, there would certainly be no ambiguity.

The fact that Leviatan clearly equates the Mrs. Cohen problem and the two aces problem suggests that she does not consider the issue to be ambiguous, despite citing Gardner. I'll track down the Gardner references and read them in context. --Thesoxlost (talk) 18:12, 25 February 2009 (UTC)

But the "Mrs. Cohen" problem is not ambiguous. We know that (1) every mother of twins who has at least one boy between them will hold a briss; and (2) every briss for twins will be for a pair that includes at least one boy. Thus, "at least one of the twins is a boy" is both a necessary condition and a sufficient condition to have the briss. And that is always the source of any ambiguity, in all variants of the problem: the problem statement might not definitively describe how the information was obtained, so that the condition can be interpreted either as both necessary and sufficient or (usually) as just necessary.

The two-aces problem is similar in that you need to know how the information is obtained, not just what the obtained information is. It is a good example, because Freund's idea of the "spy" completely captures the details that distinguish the two cases. Those cases are "Information about one member of the pair was reported, and the other is uncertain" and "The information was pre-decided, and reported whenever it applied to either of the pair." JeffJor (talk) 20:46, 25 February 2009 (UTC)
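
As a rough check of the Mrs. Cohen problem under the reading argued for above, here is a small Python sketch. The assumptions are mine and are labeled in the comments: the twins are each a boy or a girl with probability 1/2 independently, a briss is held exactly when at least one twin is a boy, and the baby you meet is either twin with equal chance. The function name is hypothetical. This deliberately does not model the alternative raised earlier, where Mrs. Cohen is selected by attending a party.

```python
from fractions import Fraction
from itertools import product

# Assumption: non-identical twins, each independently B or G with probability 1/2,
# so the four ordered pairs are equally likely.
TWINS = list(product("BG", repeat=2))


def p_baby_is_boy_given_briss():
    """P(the baby we meet is a boy | a briss was held for the twins)."""
    # Assumption: the briss is held if and only if at least one twin is a boy,
    # so the remaining pairs BB, BG, GB stay equally likely.
    eligible = [pair for pair in TWINS if "B" in pair]
    # Assumption: the baby we meet is either twin with equal chance.
    return sum(Fraction(pair.count("B"), 2) for pair in eligible) / len(eligible)


if __name__ == "__main__":
    print("P(baby is a boy | briss) =", p_baby_is_boy_given_briss())
```

With the condition treated as both necessary and sufficient, the result is 2/3 rather than the 1/2 that the quoted problem says many students give. A different selection mechanism would need a different model.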

stop sloppy editing

Thesoxlost, stop sloppy editing. The current article contains a ref error. Don't approve an edit before verifying it. We are fed up.--61.200.130.56 (talk) 11:19, 28 February 2009 (UTC)

I can't sneak anything by you, can I? Feel free to edit the article to fix my sloppy edits. :) Thanks! --Thesoxlost (talk) 04:03, 1 March 2009 (UTC)

Fox and Levav's Ambiguity.

In the "study of MBA students" that is incorrectly linked with other publications in the introduction to the article, the problem statement is still ambiguous. To see that, start with their claim:

Roughly half the participants (n = 60) received the following problem: “Mr. Smith says: "I have two children and at least one of them is a boy." Given this information, what is the probability that the other child is a boy?”

Now, change it just a little so that it is slightly more flexible. Sixty randomly-selected fathers-of-two are each introduced to a single MBA student, and told to tell the student "I have two children and at least one of them is <insert a gender here>." Each student then sees what is essentially the same problem that Fox and Levav posed. (The option to say "girl" must exist, since we have to handle the possibility of BB and GG; but the question is still the same from the student's viewpoint.) The student is then asked "What is the probability the father you met has two <insert the same gender here>?"

Let the number of fathers who say "boy" be X, so that (60-X) say "girl." If 1/3 is the correct answer to the question, then we should expect X/3 two-boy families, (60-X)/3 two-girl families, and (2*X/3 + 2*(60-X)/3) = 40 boy&girl families. That is, 2/3 of these random fathers have both a boy and a girl, independent of what they actually said. That can't be right. It has to be 1/2 of families, and that is possible only if 1/2 is the correct answer to the question that was asked of the students.

The ambiguity is that, if Mr. Smith is the father of a boy and a girl, he has to choose between making either of two correct statements: "I have at least one boy" and "I have at least one girl." It is only if he always chooses to say "at least one boy" that the answer is 1/3; but then, given that he is constrained to say "at least one girl" only if he has two, the odds are 100% that he has two girls when this same man says "at least one girl."

The Fox & Levav question didn't say why Mr. Smith said "at least one boy," which is what makes it ambiguous. We can't really say for sure whether 1/3, 1/2, or 100% is the correct answer (see Nickerson, Cognition and Chance, p. 157); but any answer requires an assumption. And the assumption that leads to 1/3 also leads to 100%. The better assumption is that he chooses randomly, and the answer is 1/2. The 85% of these MBA students made the better choice for the answer, although their reasons might not have been the correct ones. JeffJor (talk) 12:04, 3 March 2009 (UTC)
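
The dependence on Mr. Smith's choice can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in Fox and Levav's paper: four equally likely family types, and a hypothetical parameter `p_say_boy` for the chance that a father of one boy and one girl chooses to say "at least one boy". All function names are made up for the sketch.

```python
from fractions import Fraction

QUARTER = Fraction(1, 4)  # assumed prior of each of BB, BG, GB, GG


def p_two_boys_given_says_boy(p_say_boy):
    """P(BB | father says 'at least one boy')."""
    # BB fathers always say "boy"; the two mixed types say it with p_say_boy.
    p_says_boy = QUARTER + 2 * QUARTER * p_say_boy
    return QUARTER / p_says_boy


def p_two_girls_given_says_girl(p_say_boy):
    """P(GG | father says 'at least one girl')."""
    # GG fathers always say "girl"; the mixed types say it with 1 - p_say_boy.
    p_says_girl = QUARTER + 2 * QUARTER * (1 - p_say_boy)
    return QUARTER / p_says_girl


def implied_mixed_share(answer_boy, answer_girl, share_saying_boy):
    """Share of boy-and-girl families implied by a pair of claimed answers."""
    return (share_saying_boy * (1 - answer_boy)
            + (1 - share_saying_boy) * (1 - answer_girl))


if __name__ == "__main__":
    for p in (Fraction(1), Fraction(1, 2), Fraction(0)):
        print("p_say_boy =", p,
              "| P(BB | boy) =", p_two_boys_given_says_boy(p),
              "| P(GG | girl) =", p_two_girls_given_says_girl(p))
    third = Fraction(1, 3)
    print("mixed share if 1/3 were right for both statements:",
          implied_mixed_share(third, third, Fraction(1, 2)))
```

With `p_say_boy = 1` the "boy" answer is 1/3 and the "girl" answer is 1; with `p_say_boy = 1/2` both are 1/2; with `p_say_boy = 0` the "boy" answer is 1. The last line reproduces the counting argument above: taking 1/3 as the answer to both statements implies that 2/3 of the fathers have a boy and a girl, whatever share of them said "boy".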

Example removed

An example I inserted (a reference to a discussion on Jeff Atwood's "Coding Horror" blog) has been removed as "original research". Having a reference to that discussion seems useful enough, and I really don't think a comment on a 1000+ thread on the subject can possibly be "original research". Thoughts? Rp (talk) 17:15, 1 April 2009 (UTC)

I don't know for sure who removed that, or when. But whether or not it was correct (and it was a valid point), it is a comment that disagrees with the conclusion of the cited source. That makes it OR, by my understanding. In my opinion, Jeff Atwood has no qualifications to be a cited source here, and his conclusion is clearly wrong. (As you point out, the correct answer to his question hinges on the intention of the person who told you "One of my two children is a girl." Atwood assumes that the intention was predetermined to mention girls and, as you pointed out, to not give complete information.) He is merely repeating what he heard elsewhere, but for a differently-worded problem. But I didn't want an edit war over its inclusion. However, the entire section is gone, and didn't really fit in anyway. Not that there isn't a lot that doesn't fit, but other editors will not let the article come out and say what they disagree with, even if the cited sources do. --Preceding unsigned comment added by JeffJor (talk • contribs) 13:50, 15 April 2009 (UTC)