Talk:Bayes' theorem

/Archive 1, ending October 2005, created 18:32, 5 December 2005 (UTC)

The Statement of Bayes' Theorem

The Statement of Bayes' Theorem section is correct but confusing. I had to re-read this section several times before I remembered from graduate school that "likelihood" has a counter-intuitive technical definition. To the average math-oriented reader, you can't just pop P(A|B) = L(B|A) and not explain that likelihood is an unfortunate technical phrase. Most non-statisticians would not equate "Probability of A | B" with "Likelihood of B | A". If someone doesn't already know Bayes' theorem (reason they're reading the article), they probably don't know what statisticians mean when they say "likelihood function" either. I'd suggest eliminating everything about likelihood functions entirely from this section and just stick with probabilities-oriented terms.--Toms2866 13:06, 28 March 2006 (UTC)[reply]

"Nontechnical explanation" and cookies example

Hello. I've cut the "nontechnical explanation" and the cookies example for the following reasons. (1) "Nontechnical explanation" is mistaken. Bayes' theorem isn't limited to observable physical events, as suggested by the repeated use of the word "occurring". The author has been misled by the suggestive term "event". (2) The verbiage about the term likelihood is void of meaning: "This measure is sometimes called the likelihood, since it is the likelihood of A occurring given that B occurred. It is important not to confuse the likelihood of A given B and the probability of A given B. Even though both notions may seem similar and are related, they are quite different." Uh huh. (3) Descriptions of each term P(A), P(B), etc are covered elsewhere in the article. (4) P(A), P(B), etc are called "measures" in the "nontechnical explanation" but they're not; I suppose the author intended "quantities". (5) The description of P(B) is mistaken: "This measure is sometimes called the normalising constant, since it will always be the same, regardless of which event A one is studying." No, it is not called a normalizing constant because it is always the same. (6) The cookies example doesn't illustrate anything interesting. (7) The cookies example already appears on the Bayesian inference page. -- The article needs work, and it can be improved, but not by pasting random stuff into it. Wile E. Heresiarch 07:17, 28 November 2005 (UTC)

I agree with some of the points that you raise, but I also believe that there was some good information in the "non-technical" section that you removed. Furthermore, I believe that many math-related articles on Wikipedia, this one included, tend to start immediately with highly technical explanations that only Ph.D. mathematicians can understand. Yes, the articles do need to include the formal mathematical definitions, but I believe that it would be helpful to begin each article with a simple, non-technical explanation that is accessible to the more general reader. Most of these math-related articles have important applications well beyond mathematics -- including physics, chemistry, biology, engineering, economics, finance, accounting, manufacturing, forensics, medicine, etc. You need to consider your audience when you write articles for Wikipedia. The audience is far broader than the population of Ph.D. mathematicians. -- Metacomet 14:37, 28 November 2005 (UTC)

One other point: in my opinion, it is not a good idea in general for articles to point out that they are starting with a non-technical explanation, and that the full technical discussion will come later, as this article originally did. It is better simply to start with simple, non-technical descriptions and then smoothly to transition to the more formal, technical discussion. Sophisticated readers will know immediately that they can skim over the non-technical parts, and read the more advanced section in greater detail. Non-sophisticated readers will appreciate that you have tried to take them by the hand and bring them to a deeper level of understanding. -- Metacomet 14:50, 28 November 2005 (UTC)[reply]

Hi, I wrote the non-technical explanation, so I'll chip in with my thoughts. First, the reason I wrote it is that this article is too technical. If you check back the history before I first added the section, you'll see there was a "too technical, please simplify" warning on the page. Hell, I'm a computer engineer, I use Bayes' theorem every day, and even I couldn't figure out what the page was talking about. People who don't have a strong (grad level) mathematical background will be completely lost on this page. There is a definite, undeniable need for a simpler, non-technical explanation of Bayes' Theorem.

That said, the vision I had for the non-technical explanation was for it to be a stand-alone text. The technical explanation seemed complete and coherent, if too advanced for regular readers, so I did not want to mess around with it. I thought it would be both simpler and better to instead begin the page with a complete non-technical text, which regular readers could limit themselves to while advanced readers could skip completely to get to the more technical stuff. That is why, as Heresiarch pointed out, the definitions of Pr(A), Pr(B) etc. are there twice.

So I vote that we restore the non-technical explanation. Heresiarch, if you have a problem with some terms used, such as "occur" or "measure", you should correct those terms, not delete the entire section. But keep in mind when doing those corrections that the people who'll be reading it will have little to no formal background in mathematics – keep it sweet and simple! -- Ritchy 15:11, 28 November 2005 (UTC)

I think there is room for a compromise solution that will make everyone happy and improve the article substantially. Basically, I think Ritchy is correct, the non-technical explanation needs to go back in at the beginning, but it needs to be cleaned up a bit and the transitions need to be a bit smoother. The truth is, the so-called non-technical discussion is not even all that simplified -- it happens to be pretty well written and provides a very good introduction to the topic. Again, I think it just needs a bit of cleaning-up, and it needs to be woven into the article more smoothly. -- Metacomet 15:54, 28 November 2005 (UTC)[reply]

As a first step, I have added the simple "cookies" example back, but this time I grouped it with the other example in a single section entitled "Examples." Each example has its own sub-section with its own header. I think it improves the flow of articles when you put all of the examples together in a single section, and begin with simple examples before proceeding to more complicated ones. -- Metacomet 16:11, 28 November 2005 (UTC)[reply]

The next step is to figure out a way to weave the non-technical explanation back in near the beginning of the article without sounding too repetitious and with smooth transitions. -- Metacomet 16:11, 28 November 2005 (UTC)

I am not opposed to some remarks that are less technical. I am opposed to restoring the section "Non-technical explanation", as it was seriously flawed. If you want to write something else, go ahead, but please don't just restore the previous "Non-technical explanation". Please bear in mind that just making the article longer doesn't necessarily make it any clearer. Wile E. Heresiarch 02:22, 29 November 2005 (UTC)[reply]

Actually, I think it is pretty good as written. You say that it is "seriously flawed." I am confused: what are your specific objections or concerns? -- Metacomet 03:36, 29 November 2005 (UTC)

See items (1) through (5) above under "Nontechnical explanation" and cookies example. Wile E. Heresiarch 07:04, 29 November 2005 (UTC)

I have pasted a copy of the text below for reference. -- Metacomet 04:03, 29 November 2005 (UTC)

I have edited the "Nontechnical explanation" according to criticisms (1) and (4). (2) and (3) are meaningless – it seems Heresiarch just doesn't like things explained too clearly to people who don't know math. (5) seems to be a misunderstanding. Pr(B) is the probability of B, regardless of A. Meaning, if we're computing Pr(A|B), or Pr(C|B), or Pr(D|B), the term Pr(B) will always be the same. That's what I meant by "it will always be the same, regardless of which event A one is studying." If the statement isn't clear enough, I'm open to ideas on how to improve it. -- Ritchy 20:10, 29 November 2005 (UTC)

Non-technical explanation

Simply put, Bayes’ theorem gives the probability of a random event A given that we know the probability of a related event B occurred. This probability is noted Pr(A|B), and is read "probability of A given B". This quantity is sometimes called the "posterior", since it is computed after all other information on A and B is known.

According to Bayes’ theorem, the probability of A given B will be dependent on three things:

The probability of A on its own, regardless of B. This is noted Pr(A) and read "probability of A". This quantity is sometimes called the "prior", meaning it precedes any other information – as opposed to the posterior, defined above, which is computed after all other information is known.
The probability of B on its own, regardless of A. This is noted Pr(B) and read "probability of B". This quantity is sometimes called the normalising constant, since it will always be the same, regardless of which event A one is studying.
The probability of B given the probability of A. This is noted Pr(B|A) and is read "probability of B given A". This quantity is sometimes called the likelihood, since it is the likelihood of A given B. It is important not to confuse the likelihood of A given B and the probability of A given B. Even though both notions may seem similar and are related, they are quite different.

Given these three quantities, the probability of A given B can be computed as

\Pr(A|B)={\frac {\Pr(B|A)\Pr(A)}{\Pr(B)}}.

Cookies example revisited

The continued expansion of the cookies example isn't improving it. The medical test example, presently in Bayesian inference, is no more complicated, and much more compelling. The medical test, incidentally, is a standard example of the application of Bayes' theorem. I'm going to cut the cookies and copy the medical test unless someone can talk me out of it. Wile E. Heresiarch 15:59, 29 November 2005 (UTC)[reply]

I totally disagree. The cookies example, although rather simple, provides a tangible example of the relationship between conditional probabilities and Bayes' theorem. Actually, one of its virtues is the fact that it is such a simple example. If you don't find it interesting, you don't have to read it. If you are so advanced in your understanding of Bayes' theorem that this example is trivial for you, then you don't have to read it. Not all readers of Wikipedia are as smart as you are. What is the harm in leaving it in the article? -- Metacomet 16:36, 29 November 2005 (UTC)

The medical test example is essentially the same as the cookies: bowl 1 = people with disease, bowl 2 = people without, plain = negative test, chocolate chip = positive; Fred has a plain cookie, which bowl is it from = Fred tests negative, does he have the disease. If the cookies example is simple, then so is the medical test, and the latter has the advantage that people (even ordinary readers) truly care about such problems. Wile E. Heresiarch 23:21, 29 November 2005 (UTC)[reply]

Furthermore, although the medical example is interesting, it is confusing and too advanced for a first example meant to introduce basic concepts. Again, the audience that we are writing for is not Ph.D. mathematicians; the audience is a general audience that includes people who do not have the same background that you do. The goal is to explain the concepts, not to show how smart you are by throwing around a lot of techno mumbo-jumbo that no one understands except the elite few. -- Metacomet 16:42, 29 November 2005 (UTC)[reply]

That's a nice strawman you have there. If you bothered to check the discussions above, you would see that I've argued against including measure-theoretic stuff (which, I believe, counts as "technical mumbo-jumbo"). More recently, I revised the introduction to remove the technical stuff and make it entirely verbal. Wile E. Heresiarch 23:21, 29 November 2005 (UTC)[reply]

Good. I am glad you agree. BTW, I think the revisions that you made recently to the introduction are excellent. I realize that most people above the age of 12 don't care much about bowls of cookies. Nevertheless, I think it illustrates the concepts very well and in a very straightforward way. Finally, I think the term I used was "techno mumbo-jumbo," not "technical mumbo-jumbo". ;-) -- Metacomet 04:32, 30 November 2005 (UTC)[reply]

One more thing: if any of the examples in this article should be removed, it is Example #2 on Bayesian inference and not the cookies example. I have a pretty strong background in math, and I don't have the first clue what this example is all about. What benefit does it provide other than to confuse the reader? -- Metacomet 16:49, 29 November 2005 (UTC)[reply]

Agreed. In fact I've argued the same point (item 3 in my edit of July 11, 2004, at the top of the page). Wile E. Heresiarch 23:21, 29 November 2005 (UTC)

I agree with Metacomet. The cookie example is clear and relates to a simple tangible situation. Everyone can easily imagine drawing cookies from a bowl. This makes it an excellent medium to explain Bayes' Theorem. The example is complete, and clearly and accurately illustrates the Theorem. It is explained in plain and simple terms, so that anyone can understand it. Furthermore, it does not require any background knowledge from the reader in any other domain, and does not needlessly take on another topic like medicine or polling, something that only serves to confuse readers. I see no reasons to cut it; quite the opposite, it is the perfect example for the page and should definitely be kept. -- Ritchy 20:01, 29 November 2005 (UTC)

Cutting the cookies

Why did the cookie example get cut? There was only one person who didn't like it, and the discussion here clearly highlighted why it was necessary to keep it. You can't possibly think that this medical example is simpler!

I didn't say that it is simpler; I said the medical test example is no more complex than the cookies example. Wile E. Heresiarch 00:11, 3 December 2005 (UTC)

The cookie example explained Bayes' Theorem much more clearly, and using a situation everyone is familiar with. Unless someone comes up with a good reason why it should be cut today, I'll restore it tomorrow. -- Ritchy 15:23, 2 December 2005 (UTC)[reply]

I'd like to know in what sense the cookies example is clearer. Try to steer away from repeated assertions of the conclusion this time. Wile E. Heresiarch 00:11, 3 December 2005 (UTC)

I agree 100 percent with Ritchy. The cookies example is far better than the medical example as a simple way to illustrate the basic concepts. I think it should be restored. I invite others to voice their opinions on this issue. -- Metacomet 21:32, 2 December 2005 (UTC)[reply]

The main point of the Medical Example

Read the medical example again. Do you know what the main point of this example is? It is not meant as an example to illustrate the fundamental concepts of Bayes' Theorem. The main purpose of the medical example as written is to illustrate an important and common fallacy in probability theory. As it turns out, Bayes' theorem is particularly useful as a way of uncovering this fallacy and demonstrating the correct inference. So if you accept my premise that we should use Example #1 as a means of illustrating the fundamental concepts, then you would conclude that the medical example is not the appropriate vehicle for that purpose. On the other hand, if you want to use the medical example, then it needs to be completely re-written so that it illustrates the basic concepts, and not as a device for discussing an incorrect logical inference that people commonly make.

I am not interested in re-writing the medical example, because the cookies example is perfectly fine as written, and it serves the desired purpose more than adequately.

Also, I am tired of this discussion. I have more important things to worry about. So I am done. Mr. Wily, please do whatever you want. -- Metacomet 06:36, 3 December 2005 (UTC)[reply]

You know, it is truly remarkable. I just compared the medical example that you added to this article with the original medical example as it appears in the Bayesian inference article. You ripped the guts right out of the example! So we are left with the Reader's Digest abridged version, or if you prefer, the Medical Example Lite. No wonder it's so difficult to understand this example. There is no there there. -- Metacomet 06:55, 3 December 2005 (UTC)

There is some discussion in the medical test example as it appears in Bayesian inference which is related to issues that aren't relevant in Bayes' theorem, so I omitted that discussion. I carried over just what's needed to illustrate the machinery of Bayes' theorem. Wile E. Heresiarch 18:44, 3 December 2005 (UTC)[reply]

Cookies are contrived

The question "from which bowl is the cookie" is entirely contrived, and that is the major difficulty with the cookie example. While cookies are familiar, the question posed is not, and that obscures the point of the example. On the other hand, the question "Does Fred or doesn't he have such and such a disease" is posed in real life in the same way as in the example; it doesn't take some kind of cognitive readjustment to comprehend it. You & Ritchy may wish to consider why the medical test is a standard example of Bayes' theorem, while cookies are not. Wile E. Heresiarch 00:11, 3 December 2005 (UTC)[reply]

Of course the cookies example is contrived. It is meant as a bit of a tongue-in-cheek, overly-simplified, and slightly whimsical example. The idea is to make it simple enough to demonstrate the basic concepts of the theorem and the related definitions, but not deadly dull and boring. Lighten up! -- Metacomet 06:16, 3 December 2005 (UTC)[reply]

The medical test example is serious, but far from boring. Wile E. Heresiarch 18:40, 3 December 2005 (UTC)

Like I said, lighten up. -- Metacomet 20:44, 3 December 2005 (UTC)

Arguments

The arguments for the cookie example are all in the previous text, but as requested by Heresiarch, I will once again list them all for some reason.

The example is complete, in the sense that it illustrates all the necessary steps to apply Bayes' Theorem.
The example is written in clear, non-technical English.
The example is limited to Bayes' Theorem, and doesn't try to address other issues such as medical testing or polling.
The example is simple, in that it doesn't require the user to have background knowledge of another field to be understood. Everyone knows what a cookie in a bowl is. Not everyone knows what a false-positive medical diagnostic is, or what binomial distributions are.
Wikipedia should be accessible to everyone, regardless of instruction level and academic background. Thus, simple examples using common household items such as cookies are very useful to explain advanced mathematical concepts such as Bayes' Theorem.
There have been so far no compelling reasons given to delete the cookie example. Making the page more complicated for the sake of making it more complicated doesn't count. The fact you don't like part of the phrase "from which bowl is the cookie" is a reason to fix that phrase, not to delete the entire example. The fact it's not a standard textbook example doesn't count, because Wikipedia is anything but a standard math textbook.

And after all that, Metacomet beat me to the fun of restoring the example. Dammit. -- Ritchy 18:48, 4 December 2005 (UTC)[reply]

(1) and (2) apply as well to the medical test. (3) and (4) are true, but that's because the cookies example has zero motivation, while the medical test is strongly motivated. (5) is false; there are plenty of articles which are not accessible to everyone. That said, the medical test is just as comprehensible as the cookies. About (6), I've already spelled it out. In summary, the medical test example is no more complicated, and much more compelling. Incidentally, the fact that the medical test is a standard example of Bayes' theorem shows that many people (not just me) consider it a useful illustration. Wile E. Heresiarch 03:34, 5 December 2005 (UTC)[reply]

(1) may apply to the medical test, but (2) does not. Even if it applied to both examples, it wouldn't on its own constitute a reason to delete one of them, much less a reason to delete the cookie example and keep the medical example. I'm glad you agree that (3) and (4) are true, because they're also very important. In regard to the cookie thing having "zero motivation", well I'll grant you that in real life, people don't care what bowl they took a cookie from, but that's beside the point. I think it's safe to assume that the motivation of someone reading the Wikipedia entry on Bayes' Theorem is to learn about Bayes' Theorem, and the cookie example fulfills this perfectly. And since you brought up the topic of motivation, what makes you think people reading about Bayes' theorem are also motivated to learn about medical diagnostics and polling? (5) is true. Wikipedia is meant to be accessible to everyone. It's one of the guidelines. Allow me to quote: "Articles in Wikipedia should be accessible to the widest possible audience. For most articles, this means accessible to a general audience. Every attempt should be made to ensure that material is presented in the most widely accessible manner possible. If an article is written in a highly technical manner, but the material permits a more accessible explanation, then editors are strongly encouraged to rewrite it." The bolding and italics are from the Wikipedia guideline by the way, not from me. As for (6), I'll reiterate that I haven't seen a single good reason to remove the example. I'm sorry if you feel you've made the point clearly, but you haven't. Since we've gone through the trouble of giving you a clear numbered list of reasons to keep it, perhaps you'd care to return the favour? --Ritchy 03:55, 5 December 2005 (UTC)

And another thing. You (Heresiarch) keep going on and on about how the medical example is a standard textbook example of Bayes' Theorem. Do you actually mean you read it in a math textbook and copied it here? Because you're not allowed to do that. Books are protected by copyright laws (as someone of your unparalleled intelligence probably knows). You don’t have the right to just copy a page from it and post it on a free website for the world to see, unless you have a written legal authorisation to do so. If you just copied the example from a math textbook, we’ll have to delete it, no matter how great you think it is. --Ritchy 16:18, 5 December 2005 (UTC)[reply]

More Arguments

Ritchy -- I apologize for stealing your thunder. On the other hand, as they say, great minds think alike. -- Metacomet 18:54, 4 December 2005 (UTC)[reply]

Wily -- I agree 100 percent with the arguments made by Ritchy above. I would also reiterate some of the arguments that I have already mentioned in prior discussions:

The cookies example is somewhat fun and whimsical. A little bit of humor every now and then is useful and entertaining.
The medical example is not a good example for illustrating the basic concepts related to Bayes' theorem because the main point of the medical example is to discuss a common logical fallacy in probability theory.

-- Metacomet 18:55, 4 December 2005 (UTC)

The reason the cookies example is unhelpful is that each bowl has the same probability of being chosen. This means that somebody who thinks Bayes' theorem in general might be

\Pr(A|B)={\frac {\Pr(B|A)}{\Pr(B|A)+\Pr(B|A^{C})}}

will get

{\frac {0.75}{1.25}}=0.6

which is the right answer for the wrong reason. They may then think they understand when they do not. So it is a poor example of the use of prior probabilities in Bayes's theorem. --Henrygb 02:55, 5 December 2005 (UTC)
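A minimal Python sketch, not part of the original exchange, for anyone who wants to check this point numerically. The function names and the unequal prior of 0.2 are illustrative assumptions; the 0.75 and 0.5 figures are the plain-cookie proportions from the cookies example.

```python
import math


def bayes_posterior(p_b_given_a, p_b_given_not_a, p_a):
    """Pr(A|B) by Bayes' theorem, with Pr(B) expanded by total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b


def prior_free_shortcut(p_b_given_a, p_b_given_not_a):
    """The mistaken formula quoted above: it ignores the priors entirely."""
    return p_b_given_a / (p_b_given_a + p_b_given_not_a)


# Cookies example: both bowls equally likely, so the shortcut happens to agree.
equal_prior = bayes_posterior(0.75, 0.5, p_a=0.5)
shortcut = prior_free_shortcut(0.75, 0.5)

# Hypothetical variation (an assumption, not from the article): bowl #1 is
# chosen only 20% of the time. The shortcut does not change, but the correct
# posterior does.
unequal_prior = bayes_posterior(0.75, 0.5, p_a=0.2)

print(f"equal priors, Bayes:    {equal_prior:.4f}")
print(f"equal priors, shortcut: {shortcut:.4f}")
print(f"prior 0.2, Bayes:       {unequal_prior:.4f}")
print(f"agree with equal priors? {math.isclose(equal_prior, shortcut)}")
```

With equal priors the two formulas coincide, which is the coincidence being pointed out; once the prior moves away from 0.5 they diverge.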

Good point. --MarkSweep (call me collect) 04:04, 5 December 2005 (UTC)

I think the only people who might possibly be confused by this red herring you've cooked up are those who have not read this article. I think the article spells out the theorem, and the link to conditional probability provides a very clear definition and explanation of the related concepts. Furthermore, the cookies example takes the reader by the hand and walks through the calculation.

Your argument is equivalent to saying that we should not mention to people that two times two is four, because they might be confused by the fact that two to the second power is also four. -- Metacomet 04:53, 5 December 2005 (UTC)[reply]

Actually it is the same as saying that 16/64 is a bad example for illustrating simplifying fractions because some people might cancel the 6s to get 16/64 = 1/4. But some uninformed people do this. Another bad example would be calculating the derivative of e^x at x = e, as some people would get x e^(x−1) = e^e. --Henrygb 10:25, 5 December 2005 (UTC)

I'm with Metacomet here. Sure, there are other ways of mixing the numbers of the cookie example and getting the right answer. If we just provided the problem statement and the correct answer it would lead to confusion. But we don't. We give every step of the reasoning, and lay out the correct equation as

\Pr(A|B)={\frac {\Pr(B|A)\Pr(A)}{\Pr(B)}}={\frac {0.75\times 0.5}{0.625}}=0.6.

I just don't see how anyone who reads the example could mess up the equation to the extent you described in your post. --Ritchy 16:26, 5 December 2005 (UTC)

Agreed w/ Henrygb on this point. The medical test has interesting and relevant prior information; the cookies example doesn't. Wile E. Heresiarch 06:41, 5 December 2005 (UTC)

I'm sure I could mix and match the numbers of the medical example in a way to get the right answer in a completely wrong way, like MarkSweep did for the cookie example. The medical example isn't superior on that point. [snide remarks deleted] --Ritchy 16:26, 5 December 2005 (UTC)[reply]

The equal prior probabilities of the two bowls make the cookies example susceptible to the error mentioned by Henrygb. The prior probabilities in the medical test example aren't equal. Wile E. Heresiarch 02:35, 6 December 2005 (UTC)

For what it's worth, I've taught Bayesian inference to incoming Freshmen in the University of Texas' Plan II honors program five times. I find the students much more receptive to real examples (like the medical example) than to the artificial ones (like the cookie example, but I use chocolates). The "hook" is that people do get medical exams, and many of my students have personal experience, if not in their own life, then in the lives of close relatives. So my experience leads me to prefer the medical example to the cookie example.

I do not agree that "the main point of the medical example is to discuss a common logical fallacy in probability theory." That's not how I use it. Bill Jefferys 18:16, 5 December 2005 (UTC)[reply]

For the record, I was talking about the medical example as it is written here in Wikipedia (refer to the article), and not the medical example as a general class of examples. As I have said, in order to use the medical example as a simple illustration of the basic concepts of conditional probability and Bayes' Theorem, I believe that it would need to be re-written with that purpose in mind. I still maintain, that in its current form, it is useful in illustrating the fallacy related to false positives, because that is what the writer has chosen to emphasize. Unfortunately, it does not, in my opinion, emphasize the basic calculations and definitions related to Bayes' Theorem. -- Metacomet 02:03, 6 December 2005 (UTC)[reply]

For the record, I don't object to keeping both the cookie example and the medical example. While I personally prefer the cookie one, I recognise that the medical one also has merit, and both can be useful in helping people understand Bayes' Theorem. In fact, I'd consider it preferable to keep both examples; in my experience, there's no such thing as giving too many examples to illustrate a mathematical theorem. Most of the current debate stems from Heresiarch's fanatical devotion to deleting one of the examples. Given the choice (or rather, being forced to choose), I'll go with the cookie example. But my first preference would be to keep both. --Ritchy 18:37, 5 December 2005 (UTC)[reply]

I have no objection to including both. Bill Jefferys

To set the record straight, if you look back at the history, you will see that I did not delete your signature on your posting. You forgot to include your identity when you originally made the posting. But I am glad that you have now identified it as yours. -- Metacomet 19:56, 7 December 2005 (UTC)[reply]

Actually, what happened was that I had a two paragraph entry, and you split it when you added the new section. Purely unintentionally I am sure. I don't blame you, but it left my first paragraph orphaned without the signature that I had put at the end of the second paragraph. Go check the history, you'll see that this is what happened. No problem, I just went back and put my sig on the first paragraph when I realized what had happened, so people would know that I wrote it. It's a lesson to all of us to be careful when we edit. Bill Jefferys 22:52, 7 December 2005 (UTC)[reply]

Using actual data in the medical example

As a matter of pedagogy, I would prefer that the numbers in the medical example correctly correspond to an actual disease. For example, for colorectal cancer, the prior is that 0.3% of individuals have undiagnosed colorectal cancer. The hemoccult test will come up positive 50% of the time for patients that have the cancer and 3% of the time for patients that do not have the cancer. (Data from Gerd Gigerenzer, Calculated Risks.) Other examples could be found easily. Bill Jefferys 19:09, 5 December 2005 (UTC)[reply]

I think this is an excellent idea, worthy of further consideration. Putting your data into the same notation as the current example:

P(D)\ =\ 0.003

P(D^{C})\ =\ 0.997

P(T|D)\ =\ 0.500

P(T|D^{C})\ =\ 0.03

where event D is having the disease and event T is testing positive for the disease.

So, using Bayes' Theorem, we have

P(D|T)\ =\ {\frac {P(T|D)\,P(D)}{P(T|D)\,P(D)+P(T|D^{C})\,P(D^{C})}}

=\ 0.04776

or about 4.8 percent (if I did the math correctly). So approximately 95.2 percent of positive results are false positives.

On the other hand, what is also really scary is that the rate of false negatives is 50 percent!

-- Metacomet 02:20, 6 December 2005 (UTC)

Yes, but the probability that one has the condition, given that one tests negative, is 0.2%. The test is still useful in that it will detect about half the cancers in the general population. And (my medical spies tell me) the next year when you go in to take the test, the data will be nearly independent of what you had the year before. So about half the cancers that were missed the first time around will be detected the next year. Ditto for the next year. The fortunate thing is that these cancers are generally slow-growing, so regular testing will turn up a significant fraction of them. I have been told that at my age (I am on Medicare) I need have the "gold standard" colonoscopy only once in ten years, and take the hemoccult test once a year. This gives a very significant margin of safety with little risk (the risk of colonoscopy is of the order of a percent or so...perforated bowel, bleeding, other complications). As with all invasive medical tests, one has to balance various different risks as well as costs. All of this makes deciding whether to take a test a matter of decision theory, not just of statistics. Bill Jefferys 03:12, 6 December 2005 (UTC)
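The two figures discussed in this thread can be checked with a short Python sketch. It is only an illustration under the numbers quoted above from Gigerenzer (prior 0.3%, positive 50% of the time with the cancer, 3% without); the function and parameter names are assumptions made up for this sketch.

```python
def screening_posteriors(prior, p_pos_given_disease, p_pos_given_healthy):
    """Return (P(D|T), P(D|not T)) for a yes/no test, using Bayes' theorem."""
    # P(T): overall probability of a positive result (law of total probability).
    p_pos = p_pos_given_disease * prior + p_pos_given_healthy * (1 - prior)
    # P(D|T): probability of disease given a positive result.
    p_d_given_pos = p_pos_given_disease * prior / p_pos
    # P(D|not T): probability of disease given a negative result.
    p_d_given_neg = (1 - p_pos_given_disease) * prior / (1 - p_pos)
    return p_d_given_pos, p_d_given_neg


# Numbers quoted in the thread for the hemoccult test.
p_d_given_pos, p_d_given_neg = screening_posteriors(
    prior=0.003, p_pos_given_disease=0.5, p_pos_given_healthy=0.03
)

print(f"P(D|T)     = {p_d_given_pos:.5f}")
print(f"P(D|not T) = {p_d_given_neg:.5f}")
```

Under these inputs the first value comes out near 0.048 and the second near 0.0015, consistent with the roughly 4.8 percent and 0.2 percent figures mentioned above.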

Text of the Cookies Example

Example #1: Conditional probabilities

To illustrate, suppose there are two bowls full of cookies. Bowl #1 has 10 chocolate chip cookies and 30 plain cookies, while bowl #2 has 20 of each. Our friend Fred picks a bowl at random, and then picks a cookie at random. We may assume there is no reason to believe Fred treats one bowl differently from another, likewise for the cookies. The cookie turns out to be a plain one. How probable is it that Fred picked it out of bowl #1?

Intuitively, it seems clear that the answer should be more than a half, since there are more plain cookies in bowl #1. The precise answer is given by Bayes' theorem. But first, we can clarify the situation by rephrasing the question to "what's the probability that Fred picked bowl #1, given that he has a plain cookie?" Thus, to relate to our previous explanation, the event A is that Fred picked bowl #1, and the event B is that Fred picked a plain cookie. To compute Pr(A|B), we first need to know:

Pr(A), or the probability that Fred picked bowl #1 regardless of any other information. Since Fred is treating both bowls equally, it is 0.5.
Pr(B), or the probability of getting a plain cookie regardless of any information on the bowls. In other words, this is the probability of getting a plain cookie from either bowl. It is computed as the sum, over both bowls, of the probability of getting a plain cookie from a bowl multiplied by the probability of selecting that bowl. We know from the problem statement that the probability of getting a plain cookie from bowl #1 is 0.75, and the probability of getting one from bowl #2 is 0.5, and since Fred is treating both bowls equally the probability of selecting either one of them is 0.5. Thus, the probability of getting a plain cookie overall is 0.75×0.5 + 0.5×0.5 = 0.625.
Pr(B|A), or the probability of getting a plain cookie given that Fred has selected bowl #1. From the problem statement, we know this is 0.75, since 30 out of 40 cookies in bowl #1 are plain.

Given all this information, we can compute the probability of Fred having selected bowl #1 given that he got a plain cookie, as such:

\Pr(A|B)={\frac {\Pr(B|A)\Pr(A)}{\Pr(B)}}={\frac {0.75\times 0.5}{0.625}}=0.6.

As we expected, it is more than half.
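
The calculation above can be reproduced with exact fractions. This is only a sketch of the arithmetic in the example; the variable names are chosen here for illustration.

```python
from fractions import Fraction

# Numbers from the example: bowl #1 has 30 plain cookies out of 40,
# bowl #2 has 20 plain out of 40, and each bowl is picked with probability 1/2.
p_bowl1 = Fraction(1, 2)                  # Pr(A)
p_plain_given_bowl1 = Fraction(30, 40)    # Pr(B|A)
p_plain_given_bowl2 = Fraction(20, 40)

# Law of total probability for Pr(B).
p_plain = p_plain_given_bowl1 * p_bowl1 + p_plain_given_bowl2 * (1 - p_bowl1)

# Bayes' theorem: Pr(A|B) = Pr(B|A) Pr(A) / Pr(B).
p_bowl1_given_plain = p_plain_given_bowl1 * p_bowl1 / p_plain

print(p_plain, p_bowl1_given_plain)   # prints 5/8 3/5
```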

Tables of occurrences and relative frequencies

It is often helpful when calculating conditional probabilities to create a simple table containing the number of occurrences of each outcome, or the relative frequencies of each outcome, for each of the independent variables. The tables below illustrate the use of this method for the cookies:

Number of cookies in each bowl by type of cookie

	Bowl #1	Bowl #2	Totals
Chocolate Chip	10	20	30
Plain	30	20	50
Totals	40	40	80

Relative frequency of cookies in each bowl by type of cookie

	Bowl #1	Bowl #2	Totals
Chocolate Chip	0.125	0.250	0.375
Plain	0.375	0.250	0.625
Totals	0.500	0.500	1.000

The second table is derived from the first by dividing each entry by the total number of cookies under consideration, or 80 cookies.

Another poor teaching example

What happens to the tables if Bowl #2 has 30 Chocolate Chip cookies and 30 plain cookies, so there are 100 cookies in total?

Number of cookies in each bowl by type of cookie

	Bowl #1	Bowl #2	Totals
Chocolate Chip	10	30	40
Plain	30	30	60
Totals	40	60	100

Relative frequency of cookies in each bowl by type of cookie

	Bowl #1	Bowl #2	Totals
Chocolate Chip	0.1	0.3	0.4
Plain	0.3	0.3	0.6
Totals	0.4	0.6	1.0

So this suggests the answer of 0.3/0.6 = 0.5, which is wrong for this question (though it might work if the cookies were chosen at random without the bowls being chosen first). So greater evidence of a bad example. --Henrygb 20:00, 5 December 2005 (UTC)[reply]
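
To make the discrepancy concrete, here is a small sketch comparing the two readings with the modified counts. The function names `pooled_fraction` and `two_stage_posterior` are invented for this illustration. The first treats all 100 cookies as one pool, which is what the relative-frequency table does; the second follows the problem as stated, with a bowl picked uniformly first and then a cookie from that bowl.

```python
from fractions import Fraction

# Counts from the modified example: bowl #2 now holds 30 of each kind.
bowls = {
    "bowl1": {"chocolate": 10, "plain": 30},
    "bowl2": {"chocolate": 30, "plain": 30},
}


def pooled_fraction(bowls, bowl, kind):
    """Share of all cookies of this kind that sit in the given bowl (table method)."""
    total_of_kind = sum(counts[kind] for counts in bowls.values())
    return Fraction(bowls[bowl][kind], total_of_kind)


def two_stage_posterior(bowls, bowl, kind):
    """Pr(bowl | kind) when a bowl is picked uniformly first, then a cookie."""
    prior = Fraction(1, len(bowls))
    weighted = {
        name: prior * Fraction(counts[kind], sum(counts.values()))
        for name, counts in bowls.items()
    }
    return weighted[bowl] / sum(weighted.values())


print(pooled_fraction(bowls, "bowl1", "plain"))       # prints 1/2
print(two_stage_posterior(bowls, "bowl1", "plain"))   # prints 3/5
```

The two methods agree only when the bowls hold the same number of cookies, as in the original 40/40 version, because only then does picking a bowl uniformly give every cookie the same chance of being drawn.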

As Ritchy already pointed out, anyone can generate numbers for the cookies example so that it becomes ambiguous and not appropriate as a teaching example. That's easy. The challenge is to come up with numbers so that it is a good example for teaching and illustrating. Remember, the whole idea here is to try to help people understand this stuff. Oh yeah, I almost forgot. Any idiot could do the same thing for the medical example as for the cookies example. It has nothing to do with whether you use cookies or medical testing. It has to do with whether you want to make it work or you want to make it fail. -- Metacomet 04:33, 11 December 2005 (UTC)[reply]

What I don't get is why you are spending so much time and energy trying to devise inappropriate examples for illustrating this theorem instead of investing your time and energy into something positive and useful, like trying to improve the medical example for instance, or trying to improve the cookies example instead of attacking it. -- Metacomet 04:35, 11 December 2005 (UTC)[reply]

Muggles and witches

As I stated during this discussion of the cookie example, I think that the medical example is more compelling, but do not object to having both. But I do object to eliminating the medical example in favor of the cookie example.

While reading the New York Times today, my attention was drawn to an article on The Genetic Theory of Harry Potter, which discusses questions about (according to the article) a recessive gene w for wizardry that is an allele that is normally represented by the dominant gene M for a muggle. That is, to be a wizard, one has to have two of the w allele, one from the father and one from the mother; if one has one w and one M, one is a muggle, but may possibly have children that are wizards, depending upon the alleles that one's spouse may have.

This made me think of a possibly more compelling way of presenting the ideas that were present in the cookie example. One can ask the question, for example, given that in the general population, a certain percentage of individuals are witches (that is, ww, homozygous for the w allele), assuming random mating (Hardy-Weinberg equilibrium), how does one calculate the percentage of individuals that are heterozygous (that is, Mw or wM, where the first letter indicates the allele inherited from the father and the second the allele inherited from the mother), and the percentage that are homozygous for M? I pass on the question about whether random mating is a reasonable assumption for people who are witches or who know that they are descended from witches; this is a reasonable question, but beyond the scope of my comments.

The paragraph above allows us to compute the prior on the three (four if one distinguishes Mw and wM) cases. Since the frequency of ww is under our control (as pedagogues), we can manufacture any example we wish.

Now one can pose questions like: Suppose a couple has three children, all muggles. What is the probability that neither parent has the w allele? What is the probability that both parents have genotype Mw? What is the probability that one parent has genotype Mw and the other is MM? What is the probability that both have genotype ww? (Zero, but one can calculate this formally from Bayes' theorem.)

Or, same questions, except that the couple has three children, one wizard and two muggles? Or if they have three children, all wizards?

As you can see, the questions one may ask are quite varied and all illustrate both the idea of setting a prior, and the idea of how to turn a prior into a posterior, given data.

Now, I am not wedded to the muggles-wizards thing here, although I think that many young people just getting to college may have grown up with Harry Potter and may find this an interesting and compelling example. It could just as well be an example with (say) the sickle cell trait, or some other actual biological example. But I am thinking of using it in my own teaching. So, I put it out here for your consideration. Bill Jefferys 00:14, 12 December 2005 (UTC)[reply]
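
A rough sketch of how the muggle/wizard questions above could be worked out. Everything numeric here is an assumption for illustration: the 1% frequency of ww is hypothetical, "Mw" lumps together Mw and wM, mating is taken to be random (Hardy-Weinberg), and children are treated as independent given the parents' genotypes. The function names are likewise invented.

```python
from itertools import product
from math import sqrt


def genotype_priors(freq_ww):
    """Hardy-Weinberg genotype frequencies given the frequency of ww."""
    q = sqrt(freq_ww)   # frequency of the w allele
    p = 1.0 - q         # frequency of the M allele
    return {"MM": p * p, "Mw": 2 * p * q, "ww": q * q}


# Probability that a parent of each genotype passes on the w allele.
TRANSMIT_W = {"MM": 0.0, "Mw": 0.5, "ww": 1.0}


def parent_posterior(freq_ww, n_muggles, n_wizards):
    """Posterior over (father, mother) genotypes given the children observed."""
    prior = genotype_priors(freq_ww)
    unnormalised = {}
    for father, mother in product(prior, repeat=2):
        # A child is a wizard only if both parents pass on w.
        p_wizard = TRANSMIT_W[father] * TRANSMIT_W[mother]
        likelihood = (1.0 - p_wizard) ** n_muggles * p_wizard ** n_wizards
        unnormalised[(father, mother)] = prior[father] * prior[mother] * likelihood
    evidence = sum(unnormalised.values())
    return {pair: value / evidence for pair, value in unnormalised.items()}


# Hypothetical population in which 1% of individuals are ww.
post = parent_posterior(0.01, n_muggles=3, n_wizards=0)
for pair in [("MM", "MM"), ("Mw", "Mw"), ("ww", "ww")]:
    print(pair, round(post[pair], 4))
```

Changing `n_muggles` and `n_wizards` covers the other questions (one wizard and two muggles, or three wizards), and changing `freq_ww` changes the prior.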

Sounds interesting. Maybe you could write a rough draft on your own User page, and then when you think you have something that is ready for prime time, you could present it as a proposed example for the Bayes' Theorem page and then we could discuss it here on this page if you wanted some outside feedback. It's entirely up to you of course. -- Metacomet 01:47, 12 December 2005 (UTC)[reply]

OK, give me a few days; semester is ending and I have things that have priority. Bill Jefferys 02:49, 12 December 2005 (UTC)[reply]

A surprising example

A friend sent me this surprising example. Suppose there is a test for detecting whether an unborn child is a boy or a girl. If the child is a boy, the test is "perfect": P(Test B|B)=1; if the child is a girl, it is not so good: P(Test G|G)=0.7. Bayes' theorem readily gives the result that P(B|Test B)=10/13, whereas P(G|Test G)=1. The surprising thing is that the "perfection" of the test is transferred from boys to girls when the conditioning is reversed. My friend notes that this can be considered a version of the prosecutor's fallacy, AKA the Harvard Medical School fallacy.

I don't know if this could be used as an example in this article or whether it should appear in the prosecutor's fallacy article. In any case, it is rather counterintuitive and deserves mention somewhere.

I'll mention this on the talk page of the prosecutor's fallacy article. Bill Jefferys 21:32, 16 December 2005 (UTC)[reply]

I can never figure these examples out without setting up my trusty 3 x 3 table of joint probability. I worked it out for this example using the numbers you provided and the definition of conditional probability:

	Test = B	Test = G	Totals
Actual = B	0.50	0.00	0.50
Actual = G	0.15	0.35	0.50
Totals	0.65	0.35	1.00

It was not obvious to me, maybe it was to some others, but the key to the surprising result is the zero joint probability for the test finding a girl when the actual is a boy, and the non-zero joint probability for the test finding a boy when the actual is a girl (the dual case). Anyway, it is an intriguing example. Bill, thanks for posting it here. -- Metacomet 22:36, 16 December 2005 (UTC)[reply]
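
The joint table above can be rebuilt in a few lines. This sketch assumes, as the table does, that boys and girls are equally likely a priori; the names used are illustrative only.

```python
from fractions import Fraction

# Assumption (as in the table above): P(B) = P(G) = 1/2.
prior = {"B": Fraction(1, 2), "G": Fraction(1, 2)}

# P(test result | actual sex), from the problem statement.
p_test_given_actual = {
    "B": {"B": Fraction(1), "G": Fraction(0)},
    "G": {"B": Fraction(3, 10), "G": Fraction(7, 10)},
}

# Joint probabilities P(actual, test) = P(test | actual) P(actual).
joint = {
    (actual, test): prior[actual] * p_test_given_actual[actual][test]
    for actual in prior
    for test in ("B", "G")
}


def p_actual_given_test(actual, test):
    """Condition the joint table on the test result."""
    marginal = sum(joint[(a, test)] for a in prior)
    return joint[(actual, test)] / marginal


print(p_actual_given_test("B", "B"))   # prints 10/13
print(p_actual_given_test("G", "G"))   # prints 1
```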

This actually caused confusion at the RP article. An RP machine has the property that:

If the answer is NO, it always returns NO.
If it ever returns YES, the answer is YES.

We ended up having to explain both carefully to avoid any further confusion. Deco 04:07, 17 December 2005 (UTC)[reply]

This is only "surprising" because of the misleading phraseology used when describing the problem. If the problem is stated in a less colloquial manner, the result is not at all surprising:

There is a single test, denoted by the random variable T, which is used to determine the sex of an unborn child. The test can produce two results: either T=boy or T=girl. When the child is a boy, it is known that P(T=boy)=1. When the child is a girl, it is known that P(T=girl)=0.7. This means that the test can only be incorrect when it reports that T=boy. This obviously makes the test less reliable when this result is observed and immediately leads to P(boy|T=boy) < P(girl|T=girl).

No use of Bayes' theorem is necessary. One could speculate that the reason why this was thought to be surprising is the complete absence of logic as a subject in its own right from the modern school curriculum. However, times and fashions change. Lukestuts 14:46, 22 December 2005 (GMT)

Unborn children do not generally have sex. Usually they wait until after they are born, and then for at least 12 years or so. They do, however, have a gender. -- 24.218.218.28 16:00, 22 December 2005 (UTC)[reply]

I agree that the confusion comes from imprecise language. Epidemiologists and statisticians who work with screening applications like the one above refer to four test characteristics that characterize the performance of a test. These are the four conditional probabilities (with apologies for apparent gender bias), taking "boy" as the positive result:

1. Predictive value positive ==> P(B|Test B)

2. Predictive value negative ==> P(G|Test G)

3. Sensitivity ==> P(Test B|B)

4. Specificity ==> P(Test G|G)

So in the example, the sensitivity is 1, the specificity is 0.7, the PVP is 10/13 and the PVN is 1. I think the confusion comes from thinking about the test as being 'perfect'. - Ken K 21:21, 1 March 2006 (UTC)[reply]

No, you don't need non-zero probabilities to have well-defined conditionals

[snide remarks deleted]

It is commonplace to say, for example, that the conditional distribution of Y given X is normal with expectation X and variance 1, and X itself is normal with expectation 0 and variance 1. In that case, one is obviously conditioning on an event of probability zero. There's nothing wrong with that. It does mean, however, that the identity Pr(A|B) = Pr(A & B)/Pr(B) would not apply. Michael Hardy 23:41, 4 December 2005 (UTC)[reply]

I am a bit confused by your explanation. You have said that X is normal with expectation 0 and variance 1, and then you say that we are conditioning on an event of probability zero. Are you talking about event X ? If so, is it true that the probability is zero? The pdf (scratch that, make it the CDF) of X, call it f(X), is greater than zero and monotonically non-decreasing for all non-infinite X. So then, P( a < X < b) = f(b) – f(a) > 0 (in general, although could be = 0 in special cases) for all a and b where b > a. So in what sense is the probability of X equal to zero?

X is not an event; X is a random variable. The event on which one is conditioning here is the event that X has a particular value. That is an event of probability 0, because X is a continuous random variable. The pdf of X is certainly NOT non-decreasing; it's the "bell curve" that increases and then decreases (maybe you meant the cdf rather than the pdf?). OK, I've looked closely at your next sentence. Apparently you did mean the cdf. There's no such thing as the probability of X, since X is not an event. The event that X has a particular value is an event of probability 0. Michael Hardy 00:04, 11 December 2005 (UTC)[reply]

Sorry, you are right, I did mean the CDF and not the pdf. -- Metacomet 00:30, 11 December 2005 (UTC)[reply]

So if we define Event A as the event where the continuous random variable X is equal to a specific value, for instance X = a, then of course P(A) = 0 as you have said. But, for a different event, say Event B such that b < X < c, then the probability of this event is not zero, P(B) > 0. But that means that even though X is a continuous random variable, we are now talking about discrete events A and B defined in terms of X. -- Metacomet 00:30, 11 December 2005 (UTC)[reply]

Please note, I am asking these questions purely in good faith. I have no agenda other than that I am confused and I would like to understand. I am not trying to bust anyone's chops or to forward any particular point of view. I would greatly appreciate your help in understanding this example. Thanks. -- Metacomet 18:16, 10 December 2005 (UTC)[reply]

Michael Hardy's point is that in the particular case he considered, the probability of observing a particular value of a continuously distributed quantity (such as a quantity that is normally distributed) is zero. This is not a problem for Bayes' theorem, because in the case of continuously distributed quantities the correct approach is to go over to the probability density for x, which is not zero for any given x. The probability density in his example is given by the standard normal distribution. Bill Jefferys 22:18, 5 December 2005 (UTC)[reply]
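
As a numerical illustration of the density version, here is a sketch for the example discussed above (X standard normal, Y given X = x normal with mean x and variance 1). It applies f(x|y) = f(y|x) f(x) / ∫ f(y|x') f(x') dx' with a simple midpoint rule; the grid limits and step count are arbitrary choices made for this sketch. For this model the standard closed form is that X given Y = y is normal with mean y/2 and variance 1/2, which the code uses as a check.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return exp(-((x - mean) ** 2) / (2.0 * var)) / sqrt(2.0 * pi * var)


def posterior_density(x, y, lo=-10.0, hi=10.0, steps=4000):
    """f(x|y) for X ~ N(0, 1) and Y | X = x ~ N(x, 1), via Bayes' theorem."""
    width = (hi - lo) / steps
    # Midpoint-rule approximation of the normalising integral over x'.
    evidence = 0.0
    for i in range(steps):
        x_mid = lo + (i + 0.5) * width
        evidence += normal_pdf(y, x_mid, 1.0) * normal_pdf(x_mid, 0.0, 1.0)
    evidence *= width
    return normal_pdf(y, x, 1.0) * normal_pdf(x, 0.0, 1.0) / evidence


y_observed = 1.0
numeric = posterior_density(0.3, y_observed)
closed_form = normal_pdf(0.3, y_observed / 2.0, 0.5)
print(numeric, closed_form)
```

The conditioning event Y = 1 has probability zero, yet the conditional density is perfectly well defined; only the ratio of probabilities Pr(A & B)/Pr(B) is unavailable.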

Isn't this what the "Bayes' theorem for probability densities" section says? --Henrygb 23:03, 5 December 2005 (UTC)[reply]

I will repeat my question again. For two discrete random variables A and B, if the probability of B is zero, then what meaning is there in trying to determine the conditional probability of A given B ? In fact, is it not the case that if P(B) = 0, then in fact P(A|B) is completely indeterminate, and can take on any value whatsoever? If P(B) = 0, then B cannot occur, so how can we talk about the probability of A contingent on B, an event that never happens?

Mathematically, if P(B) = 0, then it follows that P(A&B) = 0. Since P(A|B) is the ratio of P(A&B) divided by P(B), it then follows that P(A|B) is zero divided by zero, which can take on any value, including values less than zero and greater than one. Of course, it would be absurd for a probability to take on these values, but nevertheless, there it is. So my conclusion is, that in order to have a meaningful value for P(A|B), then P(B) cannot equal zero. Could someone please tell me if that is correct or incorrect, and if not, why not. -- Metacomet 00:38, 6 December 2005 (UTC)[reply]

For discrete random variables, your point is valid, but you'll notice I spoke of normally distributed random variables, so they're not discrete. Michael Hardy 01:25, 6 December 2005 (UTC)[reply]

Thank you. In most cases where this issue came up, I was in general talking about discrete random variables, although I did not always make that assumption explicit. I understand that Bayes' theorem can be applied to continuous random variables (as the article points out), and I understand the difference between probability and probability density functions. I appreciate your willingness to help me understand the issue involving marginal probabilities that are equal to zero. I need to spend some more time trying to understand the continuous case, and how it differs from the discrete case. Thanks again. Regards, -- Metacomet 01:41, 6 December 2005 (UTC)[reply]

Bottom-line

So what's the bottom-line? It seems to me that my original point was in fact correct. If we have two discrete random events, A and B, then in order to have meaningful conditional probabilities, P(A|B) and P(B|A), we must have non-zero marginal probabilities:

P(A)\neq 0

and

P(B)\neq 0

Otherwise, the conditional probabilities become zero divided by zero, which is completely indeterminate, as I discussed above.

Furthermore, even if we are dealing with a continuous random variable X, we still end up defining discrete events A and B in terms of X, in which case once again, in order to have meaningful conditional probabilities, the marginal probabilities cannot equal zero.

-- Metacomet 16:55, 22 December 2005 (UTC)[reply]

Good example of a flame-out

[snide remarks deleted]

The reason I am editing this page is because it desperately needed improvement. One of the biggest problems, which was not identified by me but rather by others, was that it was written in a way that was way too technical for a general audience to understand. Oh, and look, that is exactly at the heart of our disagreement over the cookies example. You still have made no credible attempt to identify any valid reasons for deleting the cookies example. What are you afraid of? Why are you totally opposed to helping people understand technical concepts? Or is it more fun to obfuscate ideas with obscure terminology? -- Metacomet 04:45, 5 December 2005 (UTC)[reply]

More straw men. Wile E. Heresiarch 05:59, 5 December 2005 (UTC)[reply]

Good answer.

OK, I'm going to get to this article soon, when I'm feeling energetic. Michael Hardy 03:21, 5 December 2005 (UTC)[reply]

So, is it worth discussing here under what circumstances the identity applies, or is that sufficiently covered under conditional probability? --MarkSweep (call me collect) 04:00, 5 December 2005 (UTC)[reply]

I'm inclined to steer around the difficulty in this article and leave interesting details to conditional probability. But it could go the other way too; if we had some alternative texts in front of us, it might be easier to choose. Wile E. Heresiarch 06:04, 5 December 2005 (UTC)[reply]

In all seriousness, tell me what I am missing. If P(B) = 0, then from P(A&B) = P(A|B) P(B) two things are clear: (1) P(A&B) = 0, and (2) P(A|B) is an indeterminate quantity (okay, not undefined, but indeterminate). Is that correct?

On the other hand, in the real world, if P(B) = 0, then why would I care what P(A|B) is? If event B never happens, then trying to find P(A|B) is completely meaningless. Who would even want to ask such a question? -- Metacomet 05:29, 5 December 2005 (UTC)[reply]

"I don't understand what's going on here, but I'll tell you what to do anyway" is a weak position to argue from, but you don't let that slow you down. I'm accustomed to arguing with people who know what they're talking about; I really don't know how to deal with you. Wile E. Heresiarch 05:59, 5 December 2005 (UTC)[reply]

[snide remarks deleted]

It is interesting to note that rather than answering my (legitimate) question, you chose to attack me personally. This is not about me. It is about trying in good faith to improve Wikipedia in general and this article in particular. Or maybe for you it is about something else.... -- Metacomet 13:56, 5 December 2005 (UTC)[reply]

No response....

Also, it was Michael Hardy who claimed that it is not necessary for the marginal probabilities to be nonzero. I am not convinced. That doesn't mean I don't know what is going on, that means that he made a statement that I do not understand. I have asked for an explanation, but so far none has been forthcoming. If I am wrong, then I will be the first to admit it (unlike some other people). I am not so fragile that I cannot admit when I do not understand something or when I make a mistake. Grow up! -- Metacomet 14:00, 5 December 2005 (UTC)[reply]

No response...

I am accustomed to dealing with people who are interested in learning and growing, not people who need to feed their own ego by showing how much smarter they are than everyone else. -- Metacomet 14:02, 5 December 2005 (UTC)[reply]

No response...

Two more points:

At least I have an open mind, and I am willing to consider a point of view different from mine.
Where I come from, asking a question when I don't understand something is not a sign of weakness; it's a sign of strength. The weak person is the one who pretends to understand even when he doesn't.

-- Metacomet 15:22, 5 December 2005 (UTC)[reply]

No response...

Simplified explanation on top?

I am proposing putting something like the following into the article:

If I flip a double-headed coin, the probability of getting a head is 1. However, if I flip a coin and get a head, what is the probability that the coin was double-headed? This is an example where Bayes' theorem will apply.

x42bn6 Talk 07:00, 5 December 2005 (UTC)[reply]
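
A sketch of the coin question above. The proposal does not say how likely a double-headed coin is beforehand or what the alternative is, so this assumes the only alternative is a fair coin and takes the prior as a parameter; the value 1/2 in the usage line is purely hypothetical.

```python
from fractions import Fraction


def p_double_headed_given_head(prior_double):
    """P(double-headed | head), assuming the only alternative is a fair coin."""
    p_head_given_double = Fraction(1)
    p_head_given_fair = Fraction(1, 2)
    # Law of total probability for P(head).
    p_head = (p_head_given_double * prior_double
              + p_head_given_fair * (1 - prior_double))
    return p_head_given_double * prior_double / p_head


# Hypothetical prior: the coin is equally likely to be double-headed or fair.
print(p_double_headed_given_head(Fraction(1, 2)))   # prints 2/3
```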

Possible plagiarism

The 'false positives in a medical test' example is lifted directly from "A First Course in Probability" 6th. ed, by Sheldon Ross. ISBN 0-13-033851-6.

Is it lifted word for word, including the same numbers, or is it just the same example? The medical test example is very common. Deco 04:11, 17 December 2005 (UTC)[reply]

We could change it to an actual disease with actual figures, which would be better pedagogically anyway. See above for one possibility. Bill Jefferys 20:41, 17 December 2005 (UTC)[reply]

Bayes theorem requires a key assumption

I just wanted to point out that Bayes' theorem is only useful IF there is no correlation between the frequency with which the information given is given, and the outcome, regardless of your awareness of such a correlation. If there is such a correlation, an adjustment needs to be made.

To demonstrate this, just look at the Monty Hall problem. If you were not told how the host in this problem was choosing doors to open, you would simply use Bayes' theorem along with the given information that you did not choose the goat he shows you. This calculation would lead you to a 1/2 chance to have chosen a goat or the car, given you didn't choose the goat he shows you. But empirical results would show a 1/3 chance for you to get the car by staying and a 2/3 chance to get the car by switching. This is because there was a correlation between how often you were given the information you were given and the outcome... whether or not you knew it.

Perhaps one can simply adjust by using this formula in place of Bayes' theorem when such a correlation is known:

Any time such a correlation exists, whether you know it or not, Bayes' theorem would lead you to an incorrect answer. - T. Z. K.

I have deleted this sentence, because the theorem as stated is correct and because the sentence is incomprehensible without explanation:

Bayes' theorem depends on the assumption that there is no correlation between the frequency with which information is given and the outcome.

Although I have not digested what is written above, I think that if there's an error, it is an erroneous way of applying the theorem, rather than an error in the way the theorem was stated. It says "you would simply use Bayes' theorem along with the given information that you did not choose the goat he shows you". But I think that if there is additional information to be used, it should have been included within "B" in the expression P(A|B). I don't know what "B is given" means above. The comments above are really not written clearly. Michael Hardy 22:08, 9 March 2006 (UTC)[reply]

Oh -- one other thing: "compliment" and "complement" are two different words that mean two different things. You wrote the former where you clearly meant the latter. Michael Hardy 22:39, 9 March 2006 (UTC)[reply]

Correlation isn't the right word, in any case. In statistics, 'correlation' has a specific meaning. What is meant is 'independence'. Things can be uncorrelated, but still dependent. And, in the Monty Hall example above, the mistake is failure to write down the correct likelihood function, assuming independence when it doesn't hold. Bill Jefferys 02:41, 10 March 2006 (UTC)[reply]
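
To illustrate the point about the likelihood function, here is a sketch that applies Bayes' theorem to the Monty Hall problem twice. The first likelihood models a host who knows where the car is and never opens the contestant's door or the car's door. The second is the "naive" model in which the opened door was picked at random from the two unpicked doors and merely happened to show a goat. The door numbering and function names are illustrative assumptions.

```python
from fractions import Fraction

DOORS = (1, 2, 3)
PRIOR = {car: Fraction(1, 3) for car in DOORS}


def host_likelihood(car, pick, opened):
    """P(host opens `opened` | car position) for a host who knows where the car is."""
    if opened == pick or opened == car:
        return Fraction(0)
    allowed = [d for d in DOORS if d != pick and d != car]
    return Fraction(1, len(allowed))


def naive_likelihood(car, pick, opened):
    """P(a randomly chosen unpicked door is `opened` and shows a goat | car position)."""
    if opened == pick or opened == car:
        return Fraction(0)
    return Fraction(1, 2)


def posterior(likelihood, pick, opened):
    """Bayes' theorem over the three possible car positions."""
    unnormalised = {car: PRIOR[car] * likelihood(car, pick, opened) for car in DOORS}
    evidence = sum(unnormalised.values())
    return {car: value / evidence for car, value in unnormalised.items()}


# Contestant picks door 1, door 3 is opened and shows a goat.
print(posterior(host_likelihood, pick=1, opened=3)[1])    # prints 1/3
print(posterior(naive_likelihood, pick=1, opened=3)[1])   # prints 1/2
```

Bayes' theorem gives the right answer in both cases; the 1/2 result comes from feeding it a likelihood that does not describe how the host actually behaves.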

Ludicrously overtechnical and hard to read

I understood the introduction. So far, so good. Then I went to "Statement..." expecting a clear statement of what the theorem actually is in plain English. Oh dear. This article is absolutely useless to the layperson. I daresay I could walk away with an understanding of Bayes' theorem if I waded through all the symbols but why would I bother? I'm sure I can find a straightforward explanation of it somewhere else. And I can't sofixit. I have no idea what it should say; I only know it isn't saying it.-- Grace Note