
Talk:Bayes' theorem: Difference between revisions

From Wikipedia, the free encyclopedia
move stuff to archive
add bot for archiving
{{User:MiszaBot/config
{{Talk header|noarchive=yes}}
|archiveheader = {{talk archive navigation}}
|maxarchivesize = 70K
|counter = 5
|minthreadsleft = 5
|minthreadstoarchive = 1
|algo = old(90d)
|archive = Talk:Bayes' theorem/Archive %(counter)d
}}
{{talkheader}}
{{WikiProjectBannerShell|1=
{{WikiProjectBannerShell|1=
{{WP1.0|v0.7=pass|class=B|category=Math}}
{{WP1.0|v0.7=pass|class=B|category=Math}}

Revision as of 14:45, 28 March 2013

Bad introductory example

I can see where the writer was coming from, and it's certainly an engaging example, but it's probably a bad choice, as it uses a fixed probability p=0.01 for the likelihood of an arbitrary member of the population having cancer. As the author no doubt knew, this is a vast simplification, and in fact one could use Bayes' theorem again to estimate the likelihood that you have cancer given that you've agreed to have a mammogram (significantly higher), since when making that choice you may be basing it on a high prevalence of cancer in your family history, etc. Perhaps it could be modified to say that the whole population was screened at 50, regardless of family history (though then you'd also have to say that they were forced to have mammograms even if they didn't want them, as people declining would also require a Bayesian adjustment). You see where I'm coming from? Otherwise I'd swap it for a non-biological example, as this one complicates matters. Hai2410 (talk) 23:26, 24 May 2010 (UTC)

Same for the drug testing example, of course - you'd usually only test suspicious cases, where you expect the chances of actual abuse to be much higher. --195.57.192.25 (talk) 09:04, 17 February 2011 (UTC)

In the drug testing example, it would be useful to show the calculation for P(User|-); it would be about 5 x 10^-5. So, even though there is only a 0.33 probability that someone with a positive test is a user, that is still roughly 6,600 times greater than the probability that someone with a *negative* test is a user. After all, users are 99 times more likely to have a positive test than non-users (positive LR=99). — Preceding unsigned comment added by 174.60.36.236 (talk) 03:46, 19 February 2012 (UTC)
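The two posteriors mentioned above can be computed with a short sketch. The prevalence (0.5%) and the 99% sensitivity and specificity are assumptions: they are not stated in this comment, but they are consistent with the quoted positive LR of 99 and the 0.33 posterior. The function name `posterior` is hypothetical.

```python
def posterior(prior, sensitivity, specificity, positive):
    """P(user | test result) by Bayes' theorem."""
    if positive:
        like_user, like_non_user = sensitivity, 1 - specificity
    else:
        like_user, like_non_user = 1 - sensitivity, specificity
    numerator = like_user * prior
    # Denominator is P(result), by the law of total probability.
    return numerator / (numerator + like_non_user * (1 - prior))

PRIOR = 0.005  # assumed prevalence of drug use
SENS = 0.99    # assumed P(+ | user)
SPEC = 0.99    # assumed P(- | non-user); positive LR = 0.99 / 0.01 = 99

p_pos = posterior(PRIOR, SENS, SPEC, positive=True)
p_neg = posterior(PRIOR, SENS, SPEC, positive=False)
print(f"P(user|+) = {p_pos:.3f}, P(user|-) = {p_neg:.2e}, ratio = {p_pos / p_neg:.0f}")
```

With unrounded values the ratio comes out near 6,500; the 6,600 figure comes from dividing the rounded 0.33 by 5 x 10^-5.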

I don't like the new one either. One point worth making is that implied assumptions are something which should be introduced later on when talking about probability, i.e. don't automatically assume that the likelihood of meeting a woman on a train, on the assumption that you are going to meet somebody, is 0.5. Women might tend not to travel so much on certain trains at certain times of day (an assumption I make; I do not know). It is a good idea to use a sex-based example, since not only are the sexes very distinguishable but each sex generally has differing traits, such as what clothing each sex tends to wear; again, though, this is speaking in general terms. Why not try the faraway teacher example? A school specifically prohibits pupils from climbing out of and into the ground-floor windows of the school house. A teacher taking a lunchtime walk some distance from the school house sees a person wearing trousers jumping from a window of the school house. He is too far away to tell whether the person jumping from the window is male or female. He assumes that the person jumping from the window is a pupil (or there is additional information allowing him to verify that fact). If the school comprises equal numbers of females and males, and all the males wear trousers but only 40% of the females wear trousers, then what is the probability that the teacher has seen a female? Assume, importantly, that there are no social or psychological factors in play (females less likely to break school rules, males more likely to be outside anyway playing football, and so on). Any thoughts? — Preceding unsigned comment added by Blueawr (talk • contribs) 13:35, 21 February 2012 (UTC)
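The teacher's question can be answered directly with Bayes' theorem. A minimal sketch using exact fractions, with the figures from the comment above (equal numbers of girls and boys, all boys and 40% of girls wearing trousers); the variable names are my own.

```python
from fractions import Fraction

p_female = Fraction(1, 2)                 # equal numbers of girls and boys
p_male = 1 - p_female
p_trousers_given_female = Fraction(2, 5)  # 40% of girls wear trousers
p_trousers_given_male = Fraction(1)       # all boys wear trousers

# Law of total probability: P(trousers).
p_trousers = p_trousers_given_female * p_female + p_trousers_given_male * p_male
# Bayes' theorem: P(female | trousers).
p_female_given_trousers = p_trousers_given_female * p_female / p_trousers
print(p_female_given_trousers, float(p_female_given_trousers))
```

This gives 2/7, about 0.29: seeing trousers lowers the probability of a girl from 1/2 to 2/7.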

Maybe somebody can come up with an example that isn't sexist? — Preceding unsigned comment added by 69.142.244.49 (talk) 03:38, 22 February 2012 (UTC)

Agree with above. Can we make a more culturally agnostic example than our quilting example so that readers do not have to grasp an assumption being made in order to understand it? Here is a rough idea - one based on the idea of rain - since its relevant properties are more timeless and more likely to be taken as trivially true regardless of a person's time and place:

Suppose I live in a relatively dry climate where it rains only 1 in 100 days. Thus if $R$ is the event that it rained today, the probability that it rained today is $P(R) = 0.01$, provided we have no further information. Now suppose that we go outside and notice that the ground is wet; let this event be called $W$. This observed evidence drastically increases the probability that it rained today. As a result, our updated probability of $R$ given our evidence is going to be greater than $P(R)$. Let $P(R \mid W)$ denote our updated probability: the probability that it rained today given that the ground is wet. We can calculate this quantity using Bayes' theorem as:
$$P(R \mid W) = \frac{P(W \mid R)\,P(R)}{P(W \mid R)\,P(R) + P(W \mid \neg R)\,P(\neg R)},$$
where $P(W \mid R)$ is the probability that the ground is wet given that it rained, $P(\neg R)$ is the probability that it did not rain, and $P(W \mid \neg R)$ is the probability that the ground is wet given that it did not rain. Since there are several potential explanations for the ground being wet, $P(R \mid W)$ will not be 1. For instance, it may be the case that it did not rain but that the ground is wet because of a flood. In this case, we would say that neither $P(W \mid \neg R)$ nor $P(\neg R)$ is 0.

Mmattb (talk) 20:47, 31 May 2012 (UTC)
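A quick numerical version of the rain example. The prior P(R) = 0.01 is from the comment above; the two likelihoods P(W|R) = 0.9 and P(W|not R) = 0.05 are made-up values for illustration only.

```python
# Prior from the comment: it rains on 1 day in 100.
p_rain = 0.01
# Hypothetical likelihoods, chosen only for illustration.
p_wet_given_rain = 0.9      # the ground is usually wet after rain
p_wet_given_no_rain = 0.05  # other causes: sprinklers, floods, etc.

# Law of total probability for the denominator P(W).
p_wet = p_wet_given_rain * p_rain + p_wet_given_no_rain * (1 - p_rain)
# Bayes' theorem: P(R | W) = P(W | R) P(R) / P(W).
p_rain_given_wet = p_wet_given_rain * p_rain / p_wet
print(f"P(R | W) = {p_rain_given_wet:.3f}")
```

Under these assumed likelihoods the probability of rain rises from 0.01 to about 0.15, well short of 1 because dry days with a wet ground are still far more common than rainy days.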

An alternative approach

An alternative approach to finding conditional distributions is through the notion of disintegration.

Let X and Y be any two random variables (discrete or continuous or neither, in any combination). It's a theorem that the joint distribution of the two can be built up in the obvious way by combining (a) the marginal distribution of X, and (b) the family of conditional distributions of Y given X=x. Note on my use of the word "the": it is a theorem that any two choices of disintegration will be equal, up to events of probability zero.

So from this point of view all you have to do is to guess the answer and then check that it's correct. How to check the answer? Well, you just have to check that the probabilities of enough events coincide, e.g. rectangles.

Reference: any modern book on measure-theoretic probability, for instance David Pollard's book A User's Guide to Measure Theoretic Probability (CUP).

Moral: no need to go through all this differentiation stuff. The defining property of a probability density is that when you integrate it you get probabilities of events. Differentiating a cumulative distribution is not the definition. It's a useful property of distribution functions which are smooth enough. There are also cumulative distribution functions which you can differentiate almost everywhere, but such that the result is not the probability density.
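One standard illustration of the last point (my choice, not taken from the comment) is a random variable X equal to 0 with probability one. Its distribution function is differentiable everywhere except at 0, yet the derivative integrates to 0 rather than 1, so it is not a probability density:

```latex
F(x) = P(X \le x) =
\begin{cases} 0, & x < 0, \\ 1, & x \ge 0, \end{cases}
\qquad
F'(x) = 0 \ \text{for all } x \neq 0,
\qquad
\int_{-\infty}^{\infty} F'(x)\,dx = 0 \neq 1 = P(X \in \mathbb{R}).
```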

Conditional probability distributions are also defined by what happens when you integrate them. Example: we want to be able to calculate expectation values of functions of several random variables as follows: E(g(X,Y)) = E( E(g(X,Y)|X ) ). In other words: first of all fix X=x and compute E(g(x,Y)|X=x) by using the conditional distribution of Y given X=x. This results in something, say h(x), which in principle can depend on x. Now compute E h(X) by using the marginal distribution of X.
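The recipe E(g(X,Y)) = E(E(g(X,Y)|X)) can be checked numerically in the discrete case. The joint distribution and the function g below are arbitrary hypothetical choices; exact fractions avoid rounding noise.

```python
from fractions import Fraction as Fr

# Hypothetical joint distribution of (X, Y) on a 2 x 2 grid.
joint = {(0, 0): Fr(1, 8), (0, 1): Fr(3, 8), (1, 0): Fr(1, 4), (1, 1): Fr(1, 4)}

def g(x, y):
    return (x + 1) * (y + 2)

# Marginal distribution of X.
marginal_x = {}
for (x, y), p in joint.items():
    marginal_x[x] = marginal_x.get(x, 0) + p

def conditional_y(x):
    """Conditional distribution of Y given X = x."""
    return {y: p / marginal_x[x] for (xx, y), p in joint.items() if xx == x}

def h(x):
    """h(x) = E(g(x, Y) | X = x), using the conditional distribution."""
    return sum(g(x, y) * q for y, q in conditional_y(x).items())

direct = sum(g(x, y) * p for (x, y), p in joint.items())      # E g(X, Y)
iterated = sum(h(x) * px for x, px in marginal_x.items())     # E h(X)
print(direct, iterated)
```

Both sums come out as 31/8 here; the point is that they agree, whatever joint table is used.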

And yes indeed. It's a theorem that there exists a family of probability distributions of Y, one for each possible value of X, which we call the conditional distribution of Y given X, such that this recipe works (except possibly in situations where expectation values don't make any sense at all, i.e. +infinity for the positive part, -infinity for the negative part). And moreover that family of probability distributions is uniquely defined (up to probability zero events for values of X).

Conclusion: use your intuition or use a heuristic limiting argument to guess the conditional distribution of Y given X. Then check that together with the marginal distribution of X it reproduces the joint distribution of X and Y. Richard Gill (talk) 13:55, 29 October 2011 (UTC)
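In the discrete case the "guess, then check" procedure can be sketched as follows: a guessed conditional distribution of Y given X is combined with the marginal of X and compared with the joint distribution on every rectangle {X in A, Y in B}. The joint table and both guesses are hypothetical; for continuous variables one would compare integrals rather than finite sums.

```python
from fractions import Fraction as Fr
from itertools import chain, combinations

# Hypothetical joint distribution of (X, Y).
joint = {(0, 0): Fr(1, 8), (0, 1): Fr(3, 8), (1, 0): Fr(1, 4), (1, 1): Fr(1, 4)}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})
marginal_x = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}

def subsets(values):
    """All subsets of a finite set, including the empty set."""
    return chain.from_iterable(combinations(values, r) for r in range(len(values) + 1))

def reproduces_joint(cond):
    """cond[x][y] is a guessed P(Y = y | X = x); compare on all rectangles A x B."""
    for a in subsets(xs):
        for b in subsets(ys):
            target = sum(joint.get((x, y), 0) for x in a for y in b)
            rebuilt = sum(marginal_x[x] * cond[x][y] for x in a for y in b)
            if target != rebuilt:
                return False
    return True

good_guess = {x: {y: joint[(x, y)] / marginal_x[x] for y in ys} for x in xs}
bad_guess = {x: {y: Fr(1, 2) for y in ys} for x in xs}  # ignores the dependence on X
print(reproduces_joint(good_guess), reproduces_joint(bad_guess))
```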

PS see my essay on probability notation [1] Richard Gill (talk) 14:32, 29 October 2011 (UTC)

I did so some time ago, and I doubt whether it is always only laziness for people to treat f(x) and f(y) as different functions. I would never show such nonsense to students. I also dislike the notation , as what is actually given is the event {Y=y}. This is shown, but really clumsily, in . Nijdam (talk) 07:12, 30 October 2011 (UTC)

Article typical of much wiki nonsense -- instead of helping layman becomes evermore unavailable

Can anyone read the first several paragraphs of this article and understand it without having a good understanding of probability?

I actually have a BS in math, from 30 years ago, but since I haven't used it frequently, I am immediately put off by notation like P(A|B) WHICH IS NEVER DEFINED IN THIS ARTICLE.

I am sure this article only gets more and more precise and accurate.

DOES IT EVER GET MORE ACCESSIBLE TO THE LAYMAN?

Making articles ever more technical, and ever less accessible is a form of wiki rot and wiki masturbation.

The guideline covering these points is WP:MTAA. I recently updated much of the current article, which from my point of view is "one level down". However, maybe I misjudged that. I have just made some minor revisions that are hopefully improvements. I think you do have to be careful though, because trying to make things too accessible or too self-sufficient can detract from quality. For example, what exactly would you hope to take from an explanation of Bayes' theorem without mention of conditional probability or events? Should these things be explained in every article in which they are used? I think it is good to expect a particular body of knowledge, as long as readers are also clearly pointed to the prerequisites. Gnathan87 (talk) 22:43, 19 November 2011 (UTC)
I agree totally with the OP. I see this all over the place. It's useless to teach this way. I don't think it's typical of Wiki, but it is common. I call it "show-off writing."
I don't really have a great aptitude for reading or writing formulas. I know they're needed, but when mixed with complicated explanations, total confusion abounds. Longinus876 (talk) 12:32, 26 March 2013 (UTC)

"___ interpretation of probability"

Would it be acceptable to revise the visible text "under the frequentist interpretation of probability, probability is..." to "under the frequentist interpretation, probability is..." and "under the Bayesian interpretation of probability, probability is..." to "under the Bayesian interpretation, probability is..."? I think it would improve clarity. —Monado (talk) 04:41, 8 December 2011 (UTC)

Agree. I've just made some edits to the lead which include this. Gnathan87 (talk) 01:52, 11 December 2011 (UTC)

Introduction

I wasn't completely happy with the former introduction, but I'm even less happy with the present one. It's not the mathematical relation that counts, but the interpretation. Somehow I would explain that if an event A may occur on the basis of several "causes" B1,...,Bn, Bayes shows how likely it is that Bi was the cause when A occurred. If others agree, I (we) may find a suitable formulation. Nijdam (talk) 18:20, 11 December 2011 (UTC)

Just to check, are you referring to the description of the frequentist interpretation or the Bayesian interpretation? If frequentist, I totally agree. What I have found tricky is that a description similar to that you provided is really just an interpretation of conditional probability, not of Bayes' theorem. The best way to understand it seems to me just to have a good understanding of conditional probability... I think we should be wary of trying to explain anything complex in the lead. IMHO what is appropriate is just to provide a straightforward interpretation of terms in the simple statement. For the Bayesian interpretation, the idea in any case was to move the details to Bayesian inference, which is why I thought it suitable just to provide a flavour of belief updating here. Gnathan87 (talk) 21:46, 11 December 2011 (UTC)

No, just to the very first sentences. "... gives the relationship between the probabilities of A and B, P(A) and P(B), and the conditional probabilities of A given B and B given A, P(A | B) and P(B | A)..." is not informative concerning the meaning of the theorem. Nijdam (talk) 22:09, 11 December 2011 (UTC)

Hmmm... I'm not sure that's so bad actually. The interpretation is not so straightforward, and the second part of the lead is clearly devoted to it (beginning with "The meaning of Bayes' theorem..."). I think it's better to put the more general, simpler material up front. Part of the motivation for putting that stuff there was also to address some of the concerns that were raised about the lack of definitions. Gnathan87 (talk) 22:24, 11 December 2011 (UTC)

I'll give it a try: An event A may or may not occur together with another event B, with known conditional probabilities given that B has happened and given that B has not happened. If A has actually occurred, Bayes' theorem relates the conditional probability that B also happened to the aforementioned reverse probabilities. As an example, think of a certain disease A that may be caused by a vitamin shortage V or by some other cause, with known conditional probabilities P(A|V) of getting the disease given the shortage and P(A|not V) of getting it without the shortage. If the disease A actually occurs, what is the conditional probability P(V|A) that it was caused by V? Bayes' formula relates: ... etc. Nijdam (talk) 09:47, 12 December 2011 (UTC)

OK, I see where you're going. I've adapted what you wrote for use as the description of the frequentist interpretation. (It's now in the same format as the description of the Bayesian interpretation. Not to suggest I think the new version is perfect, but it is better than what was there.) To copy here:
In the frequentist interpretation, probability measures the proportion of outcomes in which an event occurs. Bayes' theorem then links inverse representations of the frequency with which events $A$ and $B$ occur. For example, suppose that members of a population may or may not have a risk factor for a medical condition, and may or may not have the condition. The proportion with the condition will depend on whether the group with or without the risk factor is examined. The proportion having the risk factor will depend on whether the group with or without the condition is examined. If the proportions are known in one view, they may be converted to the other using Bayes' theorem. For events $A$ and $B$, $P(A \mid B)$ is the proportion of outcomes in $B$ that are outcomes in $A$, and $P(B \mid A)$ is the proportion of outcomes in $A$ that are outcomes in $B$. $P(A)$ and $P(B)$ are the overall proportions of $A$ and $B$.
I'm still not sure that this should replace what is at the top - I suspect that a suitable description of the different interpretations is always going to be too long to fit in the first paragraph. I think it's better to use the opening paragraph for the most general - and neutral - description, which also serves as a convenient place to define the notation. Gnathan87 (talk) 22:43, 12 December 2011 (UTC)
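To illustrate the frequentist description quoted above, here is a sketch with hypothetical population counts (risk factor, condition): the proportion of the risk-factor group having the condition is converted into the proportion of the condition group having the risk factor, and then checked against a direct count. The counts and the helper `prop` are invented for illustration.

```python
from fractions import Fraction as Fr

# Hypothetical counts: (has risk factor, has condition) -> number of people.
counts = {(True, True): 30, (True, False): 170, (False, True): 20, (False, False): 780}
total = sum(counts.values())

def prop(pred):
    """Overall proportion of the population satisfying pred(risk, condition)."""
    return Fr(sum(n for key, n in counts.items() if pred(*key)), total)

p_risk = prop(lambda r, c: r)   # P(A), A = has risk factor
p_cond = prop(lambda r, c: c)   # P(B), B = has condition
# One view: proportion of the risk-factor group with the condition, P(B | A).
p_cond_given_risk = prop(lambda r, c: r and c) / p_risk
# Convert to the other view with Bayes' theorem: P(A | B) = P(B | A) P(A) / P(B).
p_risk_given_cond = p_cond_given_risk * p_risk / p_cond
# Direct count within the condition group, for comparison.
direct = Fr(counts[(True, True)], counts[(True, True)] + counts[(False, True)])
print(p_cond_given_risk, p_risk_given_cond, direct)
```

Only 3/20 of the risk-factor group have the condition, yet 3/5 of those with the condition have the risk factor; both routes to the second figure agree.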

Content

There are complaints about the technicality of the article. Articles always seem to tend towards much more technicality than is needed for an encyclopedia. We have to focus on the more-or-less layman interested in the subject. That's why I rewrote part of the introduction and gave the introductory example. Nijdam (talk) 23:59, 1 January 2012 (UTC)

Of course, I'm all for accessibility, particularly in the lead. The big issue I have with the current lead is NPOV. It reads like the subjective interpretation is the indisputable meaning of Bayes' theorem:
"In probability theory and statistics, Bayes' theorem is a method of incorporating new knowledge to update the value of the probability of occurence of an event."
Objectivists (such as Popper) would strongly disagree. I think both interpretations must be given equal weight in the lead. (Having said this, I also think that first foot forward in Bayes' theorem should tend to be the frequency perspective, basically because Bayesian inference has its own article.) Starting with "Bayes' theorem is [one particular view]" is problematic from an NPOV perspective, which is why I have always kept the more neutral, mathematical explanation right at the top. It is also my preference, because this article is about Bayes' theorem, not Bayesian inference, or anything else. Bayes' theorem is fundamentally a mathematical relation, and so that is how I think it should be introduced.
I would also point out that the description is actually wrong - Bayes' theorem is not a "method". The current description is of Bayesian inference.
Part of the problem with accessibility may well be that the frequency interpretation is difficult to explain succinctly (at least, I have found it difficult). It is certainly tricky to explain its significance to the layman. However, I would argue that this is not a reason to remove it from the lead. Incidentally, I would now suggest that we do not worry about interpreting each term in the lead, since that material is now laid out below. This will greatly improve accessibility.
Finally, I'm not sure about having the example up front like that. First of all, it is not NPOV. Secondly, although this is certainly the approach I would take if writing e.g. a textbook, it doesn't strike me as suitable here. Not everybody will want to use the article like a tutorial, some more like a reference. And it is pretty straightforward for the reader to skip to the examples as necessary (and no doubt they would expect that format anyway). Actually, before you added the example I was trying to avoid getting too much into Bayesian inference in Bayes' theorem, but having seen it I actually really like the idea of having a short Bayesian example alongside the others. (By the way, might the thing about quilting come over as politically incorrect?! ^^)
Gnathan87 (talk) 07:15, 2 January 2012 (UTC)

Edited the lead again, attempted to take everything that has been said on board. The new lead is definitely much more layman-friendly, and NPOV. Hopefully we are progressing towards a consensus? :) (Although, I am still not sure about the introductory example.) Gnathan87 (talk) 07:32, 8 January 2012 (UTC)

I'm not happy with the new intro. It seems that the meaning of the theorem and the interpretation of probability are somehow interwoven. Nijdam (talk) 20:09, 13 January 2012 (UTC)
That is in my view as it should be: is it not the interpretation of probability that bears directly on how the theorem is to be interpreted as a whole? For example, using a frequency interpretation, epistemological terms such as "updating beliefs" are meaningless. Of course, it is often possible to view the same example either way, e.g. the beetles example could be seen as either updating the state of knowledge about the rarity of the beetle, or calculating the frequency with which a patterned beetle is rare. Gnathan87 (talk) 14:01, 14 January 2012 (UTC)

An excellent figure

On the topic of making the article more accessible, I suggest using figure 2 of the article by Spiegelhalter et al. 2011. The only problem is that it is probably copyright protected, but something similar should be easy to come up with (using your beetles example, for instance).

Spiegelhalter, D., M. Pearson and I. Short. 2011. Visualizing Uncertainty about the Future. Science, vol. 333, pp. 1393-1400.

event?

I speak English. I've read many books about science and physics and I can follow most of them without any problem. But when I read the introductory example on this page, it seemed like Greek to me: "Call W the event he spoke to a woman, and Q the event 'a visitor of the quilt exhibition'."

This may be a valid use of the word event in some technical circles, but it's nonsense to most people. Does it mean "Call W the event WHERE he spoke to a woman"? Or does it mean "Call W the probability that he spoke to a woman"? Or perhaps it means "Call W the event OF speaking to a woman"? Or maybe it means something else entirely. I don't know and I haven't a clue what it means. Rodeored (talk) 00:27, 17 February 2012 (UTC)

Spelling of Bayes's

I believe the proper grammar is Bayes's as opposed to Bayes'.

Nope, the proper way to make a noun ending in "s" possessive is to add an apostrophe without an additional "s". Hence "Thomas' shoes" and "Bayes' Theorem".
It is not that simple. Both the New York Times and Oxford University Press write "Bayes's Theorem". Is "Bayes's" a more British or a more archaic version? I am not a native speaker, and I think I was taught to write "Charles's" many, many years ago... 82.181.47.81 (talk) 08:24, 24 June 2012 (UTC)

In the English language, the possessive of a singular noun is formed by adding apostrophe and s, regardless of whatever letter the word ends with, e.g., Thomas's pen. The possessive of a plural noun (such as Joneses) is formed by adding s and apostrophe, unless the word already ends in s, in which case just add apostrophe, e.g., the Joneses' house. The point is that apostrophe and s versus s and apostrophe marks a vital semantic distinction. The term Bayes' theorem thus correctly applies only to a theorem drafted by a group of people all named Baye. By contrast, Bayes's theorem denotes a theorem drafted by one person named Bayes.

Reverend Thomas Bayes was not a plural person, and his surname was not Baye.

Many Wikipedians will be inclined to cite any number of popular contrary uses as if they somehow made gratuitous confusion between singular possessive and plural possessive acceptable. Sadly, one of those uses would be the title of Sharon Bertsch McGrayne's excellent book on the good Reverend's work. To those Wikipedians I would say: How do you distinguish between recording of specie (payment, singular) and recording of species (types, plural), if you'd say species' recording for both?

(It's possible that specie in that sense is a mass noun, not countable, and it's difficult to think offhand of a better pair of common nouns for this illustration, but I hope the point is clear anyway.)
(The unsigned comment added by 208.88.176.15 (talk) 1 November 2012)

I agree that proper use is Bayes's rather than Bayes', whereas the latter could be quite common due to popular confusion or mis-application of grammar rules. Most respected traditional English language grammar styles would unambiguously agree on "Bayes's". I'll go ahead and make changes unless there's good, overwhelming evidence that the current version (Bayes') is one of the few rare exceptions. cherkash (talk) 00:18, 26 January 2013 (UTC)
Don't ignore the preceding discussion at http://en.wikipedia.org/wiki/Talk:Bayes%27_theorem/Archive_1#Spelling_of_of_possessive_ending_in_.27s.27 Further, a typical grammar book says "names ending in "-es" pronounced iz are treated like plurals and take only an apostrophe ..." (Oxford English, OUP). 81.98.35.149 (talk) 23:50, 27 January 2013 (UTC)
You will need to provide a better reference; the one you gave is very vague (is "Oxford English" a book? who's the author? ISBN?). The reference you are alluding to is also dubious, as according to it Jones's should be Jones', etc., which is not something most manuals of style would agree with. Besides, the above argument on Baye's and Bayes' is as reasonable as anything you could find in support of proper English grammar. Further, although some speakers may prefer to pronounce Bayes's as Bayes' (the main argument that most proponents of using loose rules on possessives would allude to), there's clear value in adhering to Bayes's in writing, as it avoids ambiguity between plural and singular possessives. cherkash (talk) 01:22, 28 January 2013 (UTC)
The ref has ISBN 0198691696, which is better detail than anything you have provided. You clearly haven't taken on board the difference in pronunciation between Jones and Bayes, which is of some importance. In particular, everyone says "Bayes theorem", rather than "Bayeses theorem". Further, Wikipedia rules are to follow the general usage within the general field concerned, here statistics and probability, rather than to impose some global set of rules for supposed uniformity. The fact is that general usage in statistics and probability is Bayes' theorem. 81.98.35.149 (talk) 08:44, 28 January 2013 (UTC)
Both spellings are acceptable in British English but Bayes' is far more common for this subject. Martin Hogbin (talk) 14:57, 22 March 2013 (UTC)

Introductory example - Sexism?

The sentence reads: "If he told you the person he spoke to was going to visit a quilt exhibition, it is far more likely than 50% it is a woman." Clearly, in our modern society this is sexist, as men are just as likely to go to quilt exhibitions. I don't think that Wikipedia should be biased. — Preceding unsigned comment added by 129.215.5.255 (talk) 14:12, 12 April 2012 (UTC)

I don't know about sexist, but it could certainly do with being more explicit. Add, for example, that you read in the paper that 95% of quilt conference attendees are female, and you have something that relies less on your individual assumptions about gender roles and more on a rational interpretation of known facts. ...BTW, there are quilt conferences? I have never heard of this. — Preceding unsigned comment added by 184.187.186.33 (talk) 23:41, 24 May 2012 (UTC)
Shouldn't the question be "what percentage of people with long hair are women"? Not what percentage of women have long hair. Or are these reciprocals of each other? I don't really understand these things very well. It's just a question. Longinus876 (talk) 12:15, 26 March 2013 (UTC)

Removed the technical tag

I've reworked the introductory example, and I hope I've made it a bit clearer (and also a bit less sexist). So I'm removing the {{technical}} template. Feel free to re-add if you still think this needs work. --Zvika (talk) 09:45, 10 June 2012 (UTC)

I do not like the new formulation of the introductory example. It's much more complicated than the former one. The one I introduced showed all that's needed. Nijdam (talk) 21:20, 10 June 2012 (UTC)
The present version looks OK to me. Of course there is the unstated assumption that equal numbers of men and women might be encountered on a train, which seems unlikely to hold ... perhaps it actually needs to be expanded to make this assumption clearer. If someone does have reason to replace the {{technical}} template, it would help if they could indicate a section rather than the whole article, or be more specific here on the talk page. Melcombe (talk) 21:45, 10 June 2012 (UTC)
What was wrong with my "quilt" example? Will anyone change the introductory example if they like their version better? Nijdam (talk) 08:36, 11 June 2012 (UTC)

Spelling

If English spelling demands that Bayes's theorem be spelled this way, we should correct this wherever it is necessary. Nijdam (talk) 11:43, 19 November 2012 (UTC)

No comment? Nijdam (talk) 16:49, 10 December 2012 (UTC)

See a few sections above. 81.98.35.149 (talk) 23:51, 27 January 2013 (UTC)

Bayes' Rule vs Bayes' Theorem

The correct way in English to express the possessive when a name ends in an s is by a single apostrophe after the s which is already there. See for instance http://www.cs.ubc.ca/~murphyk/Bayes/bayesrule.html, http://plato.stanford.edu/entries/bayes-theorem/

The text on Bayes' rule said that it depended on the Bayesian interpretation of probability but that is not true. Bayes' rule is equivalent to Bayes' theorem and both are valid for any probability interpretation. The equivalence is a mathematical fact which follows from the normalization of probability. If a probability space is partitioned into some events A, B, C, ... and you know the probabilities of A, B, C ... up to proportionality, then you know them absolutely: you just have to divide by the sum of what you already have. Richard Gill (talk) 13:23, 22 March 2013 (UTC)
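The normalization argument can be sketched with a hypothetical three-event partition {A, B, C} and some evidence E: multiplying priors by likelihoods gives the posteriors up to proportionality, and dividing by their sum (which is P(E), by the law of total probability) recovers exactly what Bayes' theorem gives term by term. All numbers are invented for illustration.

```python
from fractions import Fraction as Fr

# Hypothetical partition {A, B, C}: prior P(h) and likelihood P(E | h).
prior = {"A": Fr(1, 2), "B": Fr(3, 10), "C": Fr(1, 5)}
likelihood = {"A": Fr(1, 10), "B": Fr(1, 2), "C": Fr(9, 10)}

# Bayes' rule: posterior is proportional to prior * likelihood.
unnormalized = {h: prior[h] * likelihood[h] for h in prior}
# Dividing by the sum normalizes; the sum is P(E).
evidence = sum(unnormalized.values())
posterior_rule = {h: w / evidence for h, w in unnormalized.items()}

# Bayes' theorem written out term by term, with the denominator expanded.
denominator = sum(likelihood[k] * prior[k] for k in prior)
posterior_theorem = {h: likelihood[h] * prior[h] / denominator for h in prior}
print(evidence, posterior_rule == posterior_theorem)
```

No interpretation of probability enters anywhere: the two computations agree purely because the events partition the space.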