Talk:Bayesian inference

Formula in Philosophical section

The formula and the text use P(P|E) and similar. I think these should be written like P(A|B), as in the Bayes' theorem article. --Pasixxxx (talk) 19:04, 6 January 2012 (UTC)

Disagree. This notation here makes clearer the epistemological interpretation of Bayes' theorem (dealing with a "Proposition" and "Evidence"). However, Bayes' theorem is fundamentally a mathematical relation on a probability space with no particular interpretation, which is why in my view it should be presented in its own article with more neutral symbols such as A and B. Admittedly, focus in Bayes' theorem has recently shifted towards a Bayesian interpretation; I have opposed that on the talk page because I do not think it is sufficiently NPOV. Gnathan87 (talk) 23:21, 7 January 2012 (UTC)
The letter P should not mean both probability and proposition in the same formula, so P(P|E) cannot be accepted. Bo Jacoby (talk) 20:20, 8 January 2012 (UTC).
Hmm. Technically, the syntactic context resolves any ambiguity (i.e. the probability function is defined to take propositions as arguments, and evaluates to a real, not a proposition). I think that the benefits of labelling "proposition" as "P" outweigh the (quite small) potential for confusion here. Gnathan87 (talk) 06:44, 12 January 2012 (UTC)
Thinking about it, one solution would be to use the function C() for "Credence" instead of P() for "Probability". C() is used in philosophy to emphasise that the interpretation of probability being used is subjective probability, aka credence. I'm not overly keen on this, though, because it adds extra complexity and is inconsistent with the notation in the rest of the article. Also, I'm not sure readers would be familiar with the term credence. Gnathan87 (talk) 12:21, 12 January 2012 (UTC)
This problem can be overcome by use of the Pr notation. After all, <math>\Pr</math> produces "Pr" automatically. It is one of the notations in List of mathematical symbols (under P), and is widely used. There is no need to invent new notation (and anyway that would not be allowable on Wikipedia). Melcombe (talk) 16:05, 12 January 2012 (UTC)
Although it may be worth bearing in mind that again, in the philosophical literature, Pr() is used to distinguish "Propensity" (i.e. objective probability) from P() for "Probability" (see e.g. http://www.joelvelasco.net/teaching/3865/humphreys%2085%20-%20why%20propensities%20cannot%20be%20probabilities.pdf) Gnathan87 (talk) 19:17, 12 January 2012 (UTC)

Bayes' theorem?

It is misleading to say that Bayesian statistics is based on Bayes' theorem. The crucial point of Bayesian reasoning is that we treat our hypothesis as a random variable and average our predictions over all possible values of H. The relevant rule is the law of total probability, since we are predicting new observations on the basis of old observations:


\begin{align}
P(O_\mathrm{new} \mid O_\mathrm{old}) &= \sum_{h \in H} P(O_\mathrm{new} \mid H=h, O_\mathrm{old}) \, P(H=h \mid O_\mathrm{old}) && \text{law of total probability} \\
&= \sum_{h \in H} P(O_\mathrm{new} \mid H=h) \, P(H=h \mid O_\mathrm{old}) && \text{new observations are conditionally independent of old observations, given the hypothesis}
\end{align}

with summations replaced by integrals for a continuous h.
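To make the averaging concrete, here is a minimal numerical sketch (plain Python; a hypothetical coin-bias example with a discrete hypothesis set, and the numbers are purely illustrative):

# Posterior predictive by averaging over hypotheses (law of total probability),
# illustrated with a discrete set of possible coin biases.
hypotheses = [0.1, 0.3, 0.5, 0.7, 0.9]             # candidate values of h = P(heads)
prior = [1.0 / len(hypotheses)] * len(hypotheses)  # uniform prior over h
old_obs = [1, 1, 0, 1]                             # observed flips (1 = heads)

# P(H=h | O_old), proportional to prior times likelihood of the old observations
likelihood = []
for h in hypotheses:
    l = 1.0
    for x in old_obs:
        l *= h if x == 1 else (1.0 - h)
    likelihood.append(l)
unnorm = [p * l for p, l in zip(prior, likelihood)]
posterior = [u / sum(unnorm) for u in unnorm]

# P(O_new = heads | O_old) = sum_h P(O_new = heads | H=h) * P(H=h | O_old)
p_next_heads = sum(h * post for h, post in zip(hypotheses, posterior))
print(p_next_heads)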

While Bayes' rule is often used in Bayesian models, it is not what makes a model Bayesian. Bayesian reasoning (averaging over hypotheses) and Bayes' rule simply happen to have been discovered by the same person, Thomas Bayes. What say the editors? Raptur (talk) 23:26, 5 February 2012 (UTC)

I don't see where the article says "is based on"; it is more "uses". What you say above looks reasonable, but can you find a "reliable source" taking the same approach? You also need to consider the article Bayesian probability, which goes into the Bayesian interpretation of probability. (That is, consider finding the most appropriate place to include your points.) Melcombe (talk) 15:15, 6 February 2012 (UTC)
In the "Philosophical Background" section, it says that Bayes Rule is the "essence" of Bayesian inference, which is not the case. Raptur (talk) 22:02, 13 February 2012 (UTC)
I do agree that "the essence of" might be misleading when conceptualising Bayesian inference as modifying a distribution. However, as explained below, I do think it is right to begin with this "building block" before moving onto the bigger picture. I've changed "the essence of" to "the fundamental idea in", catering better for all views? Gnathan87 (talk) 03:17, 24 February 2012 (UTC)
What the Bayesian inference article describes is in fact based on Bayes' theorem, for the most part involving the use of new evidence to update one's likelihood estimate for the truth of a single fixed hypothesis, not involving any form of averaging over hypotheses. Bayesians such as I. J. Good have emphasized this in their published writings. The sum over hypotheses shown in the article is just a normalizing factor and is independent of which particular "M" has been selected. What you describe may be some related topic, but it is not what this article is discussing. — DAGwyn (talk) 22:39, 8 February 2012 (UTC)
No. See equation (3) from this paper on Bayesian Hidden Markov Models, equation (1) from this field review, equation (9) from this paper on a Bayesian explanation of the perceptual magnet effect, or Griffiths and Yuille (2006) (some lecture notes available here). Bayesian statistics is about computing a joint probability over observations and hypotheses. Of course, once you have a full joint, you can easily compute any particular conditional probability, including the one computed by Bayes' rule. Hidden Markov models, to use the example from the first paper I provided, use Bayes' rule in computing marginals over the hidden variables. Parameter estimation (notably the expectation-maximization algorithm) for hidden Markov models, however, often takes only a maximum likelihood estimate rather than estimating the full joint over model parameters (and data). Thus, an HMM trained with the expectation-maximization algorithm uses Bayes' rule and changes its expectations after seeing evidence ("training data" in the parlance of the paper), but it is not Bayesian because it maintains no uncertainty about model parameters. Raptur (talk) 22:02, 13 February 2012 (UTC)
None of these 3 sources say anything like "Bayesian inference is ....". I don't see them mentioning inference at all. Instead they talk about "Bayesian model averaging" and "Bayesian modeling" ... which you might think amount to the same thing. But do find something that both meets WP:RELIABLESOURCES and is explicitly about "Bayesian inference". On what you said in the first contrib to this section ... this seems to correspond somewhat to predictive inference. Melcombe (talk) 00:46, 14 February 2012 (UTC)
I've been avoiding this source, since I don't have an electronic version of it, but "Bayesian Data Analysis: Second Edition" by Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin opens with, on page 1, headed "Part 1: Fundamentals of Bayesian inference": "Bayesian inference is the process of fitting a probability model to a set of data and summarizing the result by a probability distribution on the parameters of the model and on unobserved quantities such as predictions for new observations." This is exactly the statement that Bayesian inference is about maintaining probability distributions over both data and models. On page 8, it says: "The primary task of any specific application is to develop the model p(\theta, y) and perform the necessary computations to summarize p(\theta|y) in appropriate ways," where \theta comprises our model parameters and y comprises our data. The probability of a model given data is an important part of Bayesian inference (indeed, it's the second term of the marginalization expression), but what distinguishes Bayesian inference from other statistical approaches is maintaining uncertainty about your model.
I should point out that the other papers I provided do take this approach. The Bayesian HMM paper says "In contrast [to a point estimate such as Maximum Likelihood or Maximum a posteriori], the Bayesian approach seeks to identify a distribution over latent variables directly, without ever fixing particular values for the model parameters. The distribution over latent variables given the observed data is obtained by integrating over all possible values of \theta:" and presents the object of computation as P(t|w) = \int P(t|w,\theta) P(\theta|w) \, d\theta. The perceptual magnet effect paper says on page 5 that listeners are trying to compute the posterior on targets, and on page 6 says that this quantity is computed by marginalizing over category membership: P(T|S) = \sum_c P(T|S, c) P(c|S). The lecture notes I referred to give, in section 6, titled "Bayesian Estimation", precisely the derivation I opened with, and explicitly contrast Bayesian inference with point estimates of model parameters. — Preceding unsigned comment added by Raptur (talk · contribs) 12:28, 14 February 2012 (UTC)
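As a concrete contrast with a point estimate, here is a small sketch (plain Python with NumPy; a hypothetical coin example with a uniform prior on \theta and a grid approximation to the integral over \theta):

# Point estimate vs. Bayesian marginalization over theta for predicting the next flip.
import numpy as np

heads, flips = 2, 3                          # observed data y (hypothetical numbers)
theta = np.linspace(0.0, 1.0, 10001)         # grid over the continuous parameter theta

# Posterior p(theta | y) on the grid: uniform prior times binomial likelihood, normalized
weights = theta**heads * (1.0 - theta)**(flips - heads)
weights /= weights.sum()

# Point estimate: plug in the maximum-likelihood value theta_hat = heads/flips
print("plug-in MLE prediction:", heads / flips)                     # 0.667

# Bayesian: P(next = heads | y) = integral of theta * p(theta | y) d theta
print("marginalized prediction:", float((theta * weights).sum()))   # about 0.6, i.e. (heads+1)/(flips+2)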
The single hypothesis vs. joint distribution thing is something that has been disputed for some time (and in fact was the subject of comment from Gelman himself). My personal conclusion is this: Bayesian inference is, in practice, used most often by scientists and engineers, who tend to use it exclusively over distributions. In this context, it makes sense to think of inference as being fundamentally something you do on a joint distribution. However, Bayesian inference in philosophy of science tends to be expressed in terms of single hypotheses.
As Bayesian inference is an important topic in philosophy of science, and Bayes' theorem is in any case the building block of the joint distribution view, I certainly think it is appropriate to begin with an explanation of the single hypothesis view before covering distributions. Also, applications in a courtroom setting, discussed later in the article, have to date used the single hypothesis view (as seems appropriate in that context), so it should at least be mentioned. A final reason to cover the single hypothesis view is that, at least in my view, it is more accessible and insightful for those new to the topic than charging ahead to joint distributions. Gnathan87 (talk) 21:05, 23 February 2012 (UTC)
This is an article on mathematical statistics, so it seems strange to defer to terminology from the philosophy of science or criminal law. I do think a brief summary or a link to an article discussing the Bayesian view of probability is warranted. Scientists, statisticians, and engineers often use (what this article calls) "Bayesian Inference" without averaging over models (as in conditional random fields and MLE Bayes nets, which all use Bayes' theorem), but they don't call it "Bayesian Inference."
And again, Bayes' theorem is not the defining building block of the joint distribution view; the rule of total probability is. Bayes' theorem is just the easiest way to compute the second term of the marginalization expression. Finally, I don't think it's a good idea to focus on a topic just because it's easier, if it's only partially related to the title of the article. This misleads readers (including students from a course for which I am a TA) into thinking "Bayesian inference" is inference using Bayes' theorem, when really it is inference that maintains subjective uncertainty about your model. Raptur (talk) 15:30, 26 February 2012 (UTC)
I know this is an old discussion, but Raptur is entirely correct here, and the Gelman characterization of Bayesian inference is far, far better than the current introduction. In fact, I would go as far as to say that the current intro is simply wrong. Bayes' theorem is often important in Bayesian inference, but the way the first sentence is phrased implies Bayes' theorem is the basis of it, when that simply is not the case. Also, the intro confuses the idea of "Bayesian updating" (without even really defining it, although I can guess what the author means) with Bayesian inference. I have yet to read the whole article, but the intro hardly instills confidence in what comes below... Atshal (talk) 16:49, 14 November 2012 (UTC)
I just want to emphasize Atshal's point that the "Bayesian updating" and rationality stuff is really misleading as currently written. After reading up to that section, one would have the impression that Bayesian inference is the application of Bayes' rule, and that there is something controversial about Bayes' rule. There is nothing whatsoever controversial about Bayes' rule; it is derived directly from the definition of conditional probability using very simple algebra. The controversy over Bayesian inference stems from the treatment of hypotheses as random variables rather than as fixed, and from the necessary use of a prior. Raptur (talk) 15:37, 20 November 2012 (UTC)
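(For reference, the simple algebra in question: by the definition of conditional probability,

P(A \mid B) = \frac{P(A \cap B)}{P(B)} \qquad \text{and} \qquad P(B \mid A) = \frac{P(A \cap B)}{P(A)},

so P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A), and dividing by P(B) gives Bayes' rule, P(A \mid B) = P(B \mid A)\,P(A)/P(B).)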
Ugh, the very first section is "Introduction to Bayes' Rule". Why is it not "Introduction to Bayesian Inference", since this article is about Bayesian inference and not Bayes' rule? Atshal (talk) 16:52, 14 November 2012 (UTC)

Deletion of unnecessary maths portions

Despite the warnings that stuff that does not meet Wikipedia standards will be deleted, someone feels an explanation is needed. Thus this appeared on my talk page following the re-insertion of clearly inappropriate material. " Melcombe. While still not using the discussion page you remove useful subsections. That is plain vandalism on your part. Bo Jacoby (talk) 23:44, 20 February 2012 (UTC)."


I responded as follows and repeat it here for info. Melcombe (talk) 22:47, 22 February 2012 (UTC)

This is of course nonsense. The material is plainly WP:OR as no WP:Reliable sources have been included. It also fails the test of being important to the description of Bayesian inference in general terms, and serves only to get in the way of expansion in useful directions. Moreover, it fails any reasonable excuse for being included as a "proof" of a result such as outlined as possibilities in WP:MOSMATH. Applying the rules of Wikipedia standards is clearly not vandalism. If this stuff were "useful" someone could provide a source for it and it might then be sensible to construct an article specific to Bayesian inference for that specific distribution, in which this stuff could be included. Melcombe (talk) 22:47, 22 February 2012 (UTC)

The matter was discussed above and on the archived talk page here. Melcombe removed two sections. The one on Bayesian estimation of the parameter of a binomial distribution has Bayes himself as a source. The other, on Bayesian estimation of the parameter of a hypergeometric distribution, with Karl Pearson as the source, generalizes the first and is conceptually simpler, because the estimated parameter takes only a finite number of possible values, and so the prior distribution is defined by the principle of insufficient reason (which was not applicable for the continuous parameter of the binomial distribution). My question to Melcombe regarding his need for further explanation or proof was repeated on his talk page and remains unanswered. Melcombe's contribution is destructive, and he is seeking neither consensus nor compromise. Bo Jacoby (talk) 04:27, 23 February 2012 (UTC).

The material deleted has neither person as an explicit source, and no explicit source at all. The second sentence of WP:Verifiability is "The threshold for inclusion in Wikipedia is verifiability, not truth — whether readers can check that material in Wikipedia has already been published by a reliable source, not whether editors think it is true." ... and this amounts to providing WP:Reliable sources in the article. Then there is the question of whether the amount of mathematical detail that was included should appear in an encyclopedia article. It is clear that it should not, as it fails for reasons described at WP:NOTTEXTBOOK and in WP:MOSMATH. No amount of discussion can reasonably override these established policies. What was left is a summary of the result at a suitable level of detail. Bayes' contribution is contained in a much shortened section, as there is now a separate article on this source publication. Melcombe (talk) 18:09, 24 February 2012 (UTC)

I suppose that Melcombe is right in assuming that he can learn nothing from other Wikipedia editors. The deleted sections were referenced from here, showing that they are useful in answering elementary questions. The missing sources were mentioned on the talk page, which Melcombe didn't care to read before removing the material. An improvement, rather than a deletion, could have been made. Bo Jacoby (talk) 18:43, 27 February 2012 (UTC).

Bo,
Trotting out your first sentence harms your case. The usual wicked pleasure of provoking another editor won't be available here, because Melcombe will just ignore the bait anyhow. You'd have better luck with me!
WP is not a textbook, so the helpfulness of the material in answering questions, while welcome, is irrelevant.
That said, updating a distribution for the Bernoulli proportion is the simplest and most conventional example, so I should hope that you and Melcombe could agree on a version. Why not look at Donald Berry's book (or the earlier and rare book by David Blackwell), or the famous article for psychologists by Savage, Edwards, and Lindman?
Maybe you both could take a break from beating on each other, and beat on me for a while? ;D *LOL* I made a lot of edits in the last day, and some of them must strike you as terrible!
Cheers,  Kiefer.Wolfowitz 19:03, 27 February 2012 (UTC)

Thanks to Kiefer.Wolfowitz for the reaction. I have no case to harm. I included the formulas for the mean value and standard deviation of the number K of hits in a population of N items, knowing the number k of hits in a sample of n items. I did not myself consider these formulas to be original research on my part, because they are found in the works of Karl Pearson, but, not knowing these formulas himself, Melcombe considers them to be original research on my part, and so he removed them from Wikipedia. To me this is a win-win situation: either my contribution is retained in Wikipedia, or I get undeserved credit for inventing the formulas. Cheers! Bo Jacoby (talk) 16:07, 28 February 2012 (UTC).
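For readers following this thread, here is a minimal sketch of the kind of calculation being discussed (plain Python; a uniform prior on K and a hypergeometric likelihood, with N, n, k as hypothetical numbers; this is not the deleted text itself):

# Posterior over K (number of hits in a population of N items), given k hits
# observed in a sample of n items, under a uniform prior on K = 0..N.
from math import comb

N, n, k = 50, 10, 3  # hypothetical numbers

def likelihood(K):
    # Hypergeometric probability of seeing k hits in the sample when the population contains K hits
    if K < k or N - K < n - k:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

weights = [likelihood(K) for K in range(N + 1)]  # uniform prior, so the posterior is proportional to the likelihood
total = sum(weights)
posterior = [w / total for w in weights]

mean = sum(K * p for K, p in enumerate(posterior))
var = sum((K - mean) ** 2 * p for K, p in enumerate(posterior))
print("posterior mean of K:", mean, " posterior sd:", var ** 0.5)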

"Inference over a distribution"[edit]

(@Melcombe) A query on your revert: the intended "distribution" was the probability distribution over some set of exclusive and exhaustive possibilities. Is there some reason that is technically not a "distribution"? I think "Inference over a distribution" is a better name for that section, because it clearly marks it as an extension of the single-hypothesis inference in the previous section. Gnathan87 (talk) 23:11, 24 February 2012 (UTC)

What was in the section at the time contained no mention of "distribution", nor gave any indication of what was distributed over what. You are relying far too much on telepathy on the part of readers, with unexplained equations using unexplained and newly invented notations. What WP:Reliable sources are you using? Melcombe (talk) 07:56, 26 February 2012 (UTC)
OK, I see. On second thoughts I do agree. I've tried a new name; might still be able to do better though.
I do admit to not having added sufficient sources recently. My view has been that, given the frequent changes to the article, it has been in its interests to concentrate on the basic structure and text, particularly since the content can be found in, or inferred from, any text covering Bayesian inference. Once it begins to stabilize (as it is, I think, now doing), it will be much clearer which sources are appropriate and where. As for "unexplained equations using unexplained and newly invented notations", I must say that I am unsure what you refer to. Nothing appears to me to be left unexplained inappropriately for the level of the subject, or to be newly invented. Gnathan87 (talk) 15:07, 26 February 2012 (UTC)

Rationality

All this stuff about Bayesian inference being the model of rationality is ignorant nonsense.

I clarified this more than a year ago, with in-line cites to the highest-quality, most reliable sources, and the clarification remains in the article, which I suppose was intolerable.

I don't understand how people can write an article that is shown to be patent nonsense by the later discussion in the same article.  Kiefer.Wolfowitz 17:11, 26 February 2012 (UTC)

The article has now been changed, but I would mention that nowhere did it previously state that Bayesian inference was "the" model of rationality. The previous lead spoke of "how the degree of belief in a proposition might change due to evidence" and said that it is "a model of rational reasoning". Later on, we have "The philosophy of Bayesian probability claims". None of these suggest that Bayes' theorem is the sole theory of rationality. It was simply keeping the information relevant to the article, rather than expanding on other possible theories and techniques. Maybe the issue is that this simply was not sufficiently emphasised. Gnathan87 (talk) 19:57, 26 February 2012 (UTC)
Before my edits, the article's lede asserted uniqueness of Bayesian updating as the rational system.  Kiefer.Wolfowitz 21:26, 26 February 2012 (UTC)