Talk:Bayesian inference

From Wikipedia, the free encyclopedia
WikiProject Statistics (Rated C-class, Top-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as C-Class on the quality scale.
This article has been rated as Top-importance on the importance scale.
WikiProject Mathematics (Rated C-class, High-importance)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating:
C Class
High Importance
 Field: Probability and statistics
One of the 500 most frequently viewed mathematics articles.

Bayes theorem?

It is misleading to say that Bayesian statistics is based on Bayes' theorem. The crucial point of Bayesian reasoning is that we treat our hypothesis as a random variable H and take the expectation over all values of H. The relevant rule is the law of total probability, as we are predicting new observations on the basis of old observations:

$$P(x_{\text{new}} \mid x_{\text{old}}) = \sum_{h} P(x_{\text{new}} \mid h)\, P(h \mid x_{\text{old}})$$

with summations replaced by integrals for continuous h.

While Bayes' rule is often used in Bayesian models, it is not what makes a model Bayesian. Bayesian reasoning (averaging over hypotheses) and Bayes' rule simply happen to have been discovered by the same person, Thomas Bayes. What say the editors? Raptur (talk) 23:26, 5 February 2012 (UTC)

I don't see where the article says "is based on"; it is more "uses". What you say above looks reasonable, but can you find a "reliable source" taking the same approach? You also need to consider the article Bayesian probability, which goes into the Bayesian interpretation of probability (that is, you need to find the most appropriate place to include your points). Melcombe (talk) 15:15, 6 February 2012 (UTC)
In the "Philosophical Background" section, it says that Bayes Rule is the "essence" of Bayesian inference, which is not the case. Raptur (talk) 22:02, 13 February 2012 (UTC)
I do agree that "the essence of" might be misleading when conceptualising Bayesian inference as modifying a distribution. However, as explained below, I do think it is right to begin with this "building block" before moving onto the bigger picture. I've changed "the essence of" to "the fundamental idea in", catering better for all views? Gnathan87 (talk) 03:17, 24 February 2012 (UTC)
What the Bayesian inference article describes is in fact based on Bayes' theorem, for the most part involving the use of new evidence to update one's likelihood estimate for the truth of a single fixed hypothesis, not involving any form of averaging over hypotheses. Bayesians such as I. J. Good have emphasized this in their published writings. The sum over hypotheses shown in the article is just a normalizing factor and is independent of which particular "M" has been selected. What you describe may be some related topic, but it is not what this article is discussing. — DAGwyn (talk) 22:39, 8 February 2012 (UTC)
No. See equation (3) from this paper on Bayesian Hidden Markov Models, equation (1) from this field review, equation (9) from this paper on a Bayesian explanation of the perceptual magnet effect, or Griffiths and Yuille (2006) (some lecture notes available here). Bayesian statistics is about computing a joint probability over observations and hypotheses. Of course, once you have a full joint distribution, you can easily compute any particular conditional probability, including the one computed by Bayes' rule. Hidden Markov models, to use the example from the first paper I provided, use Bayes' rule in computing marginals over the hidden variables. Parameter estimation (notably the expectation-maximization algorithm) for Hidden Markov Models, however, often takes only a Maximum Likelihood Estimate rather than estimating the full joint distribution over model parameters (and data). Thus, an HMM trained with the expectation-maximization algorithm uses Bayes' rule and changes its expectations after seeing evidence ("training data" in the parlance of the paper), but it is not Bayesian because it maintains no uncertainty about model parameters. Raptur (talk) 22:02, 13 February 2012 (UTC)
None of these 3 sources say anything like "Bayesian inference is ....". I don't see them mentioning inference at all. Instead they talk about "Bayesian model averaging" and "Bayesian modeling" ... which you might think amount to the same thing. But do find something that both meets WP:RELIABLESOURCES and is explicitly about "Bayesian inference". On what you said in the first contrib to this section ... this seems to correspond somewhat to predictive inference. Melcombe (talk) 00:46, 14 February 2012 (UTC)
I've been avoiding this source, since I don't have an electronic version of it, but "Bayesian Data Analysis: Second Edition" by Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin opens with, on page 1, headed "Part 1: Fundamentals of Bayesian inference": "Bayesian inference is the process of fitting a probability model to a set of data and summarizing the result by a probability distribution on the parameters of the model and on unobserved quantities such as predictions for new observations." This is exactly the statement that Bayesian inference is about maintaining probability distributions over both data and models. On page 8, it says: "The primary task of any specific application is to develop the model $p(\theta, y)$ and perform the necessary computations to summarize $p(\theta \mid y)$ in appropriate ways," where $\theta$ comprises our model parameters and $y$ comprises our data. The probability of a model given data is an important part of Bayesian inference (indeed, it's the second term of the marginalization expression), but what distinguishes Bayesian inference from other statistical approaches is maintaining uncertainty about your model.
I should point out that the other papers I provided do take this approach. The Bayesian HMM paper says "In contrast [to a point estimate such as Maximum Likelihood or Maximum a posteriori], the Bayesian approach seeks to identify a distribution over latent variables directly, without ever fixing particular values for the model parameters. The distribution over latent variables given the observed data is obtained by integrating over all possible values of $\theta$:" and presents the object of computation as $P(\mathbf{t} \mid \mathbf{w}) = \int P(\mathbf{t} \mid \mathbf{w}, \theta)\, P(\theta \mid \mathbf{w})\, d\theta$. The perceptual magnet effect paper says on page 5 that listeners are trying to compute the posterior on targets, and on page 6 says that this quantity is computed by marginalizing over category membership: $p(T \mid S) = \sum_{c} p(T \mid S, c)\, p(c \mid S)$. The lecture notes I referred to give, in section 6 titled "Bayesian Estimation," precisely the derivation I opened with, and explicitly contrast Bayesian inference with point estimates of model parameters. -- Preceding unsigned comment added by Raptur (talk • contribs) 12:28, 14 February 2012 (UTC)
The single hypothesis vs. joint distribution thing is something that has been disputed for some time (and in fact was the subject of comment from Gelman himself). My personal conclusion is this: Bayesian inference is, in practice, used most often by scientists and engineers, who tend to use it exclusively over distributions. In this context, it makes sense to think of inference as being fundamentally something you do on a joint distribution. However - Bayesian inference in philosophy of science tends to be expressed in terms of single hypotheses.
As Bayesian inference is an important topic in philosophy of science, and Bayes' theorem is in any case the building block of the joint distribution view, I certainly think it is appropriate to begin with an explanation of the single hypothesis view before covering distributions. Also, applications in a courtroom setting, discussed later in the article, have to date been the single hypothesis view (as seems appropriate in that context), so it should at least have been mentioned. A final reason to cover the single hypothesis view is because, at least in my view, it is more accessible and insightful for those new to the topic than charging ahead to joint distributions. Gnathan87 (talk) 21:05, 23 February 2012 (UTC)
This is an article on mathematical statistics, so it seems strange to defer to terminology from the philosophy of science or criminal law. I do think a brief summary or a link to an article discussing the Bayesian view of probability is warranted. Scientists, statisticians, and engineers often use (what this article calls) "Bayesian Inference" without averaging over models (as in conditional random fields and MLE Bayes nets, which all use Bayes' theorem), but they don't call it "Bayesian Inference."
And again, Bayes' theorem is not the defining building block of the joint distribution view; the rule of total probability is. Bayes' theorem is just the easiest way to compute the second term of the marginalization expression. Finally, I don't think it's a good idea to focus on a topic just because it's easier if it's actually only partially related to the title of the article. This misleads readers (including students from a course for which I am a TA) into thinking "Bayesian Inference" is inference using Bayes' Theorem, when really it is inference that appreciates subjective uncertainty about your model. Raptur (talk) 15:30, 26 February 2012 (UTC)
I know this is an old discussion, but Raptur is entirely correct here, and the Gelman characterization of Bayesian inference is far, far better than the current introduction. In fact, I would go as far as to say that the current intro is simply wrong. Bayes' theorem is often important in Bayesian inference, but the way the first sentence is phrased implies Bayes' theorem is the basis of it, when that simply is not the case. Also, the intro confuses the idea of "Bayesian updating" (without even really defining it, although I can guess what the author means) with Bayesian inference. I have yet to read the whole article, but the intro hardly instills confidence in what comes below... Atshal (talk) 16:49, 14 November 2012 (UTC)
I just want to emphasize Atshal's point that the "Bayesian updating" and rationality stuff is really misleading as currently written. After reading up to that section, one would have the impression that Bayesian inference is the application of Bayes' rule, and that there is something controversial about Bayes' rule. There is nothing whatsoever controversial about Bayes' rule; it is derived directly from the definition of conditional probability using very simple algebra. The controversy over Bayesian inference stems from the treatment of hypotheses as random variables rather than as fixed, and over the necessary use of a prior. Raptur (talk) 15:37, 20 November 2012 (UTC)
Ugh, the very first section is "Introduction to Bayes' Rule". Why is it not "Introduction to Bayesian Inference", since this article is about Bayesian inference and not Bayes' Rule? Atshal (talk) 16:52, 14 November 2012 (UTC)
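
The distinction argued over in this thread (a point estimate of the hypothesis versus averaging predictions over all hypotheses) can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and comes from none of the cited sources: the coin-flip setting, the grid of candidate values for h, the uniform prior, and all variable names.

```python
# Hypothetical example: a coin with unknown heads-probability h.
# Hypotheses: a coarse grid of candidate values for h with a uniform prior
# (both the grid and the prior are assumptions made for illustration).
hypotheses = [i / 10 for i in range(11)]
prior = [1 / len(hypotheses)] * len(hypotheses)

heads, tails = 3, 0  # old observations: three heads, no tails


def likelihood(h, heads, tails):
    """P(old observations | h), treating flips as independent given h."""
    return h ** heads * (1 - h) ** tails


# Bayes' rule: posterior distribution over the hypotheses.
unnormalized = [likelihood(h, heads, tails) * p for h, p in zip(hypotheses, prior)]
evidence = sum(unnormalized)
posterior = [u / evidence for u in unnormalized]

# Point-estimate route: keep only the maximum-likelihood hypothesis
# and predict the next flip from it alone.
h_mle = max(hypotheses, key=lambda h: likelihood(h, heads, tails))
p_next_heads_plugin = h_mle

# Law of total probability: average the prediction over all hypotheses,
# weighting each by its posterior probability.
p_next_heads_averaged = sum(h * w for h, w in zip(hypotheses, posterior))

print(p_next_heads_plugin, p_next_heads_averaged)
```

With three heads and no tails, the plug-in prediction for the next flip is 1.0, while the averaged prediction is roughly 0.84, because the posterior still puts weight on hypotheses with h below 1. Both routes use Bayes' rule or the likelihood; only the second keeps uncertainty about h when predicting.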

Deletion of unnecessary maths portions

Despite the warnings that stuff that does not meet Wikipedia standards will be deleted, someone feels an explanation is needed. Thus this appeared on my talk page following the re-insertion of clearly inappropriate material. " Melcombe. While still not using the discussion page you remove useful subsections. That is plain vandalism on your part. Bo Jacoby (talk) 23:44, 20 February 2012 (UTC)."

I responded as follows and repeat it here for info. Melcombe (talk) 22:47, 22 February 2012 (UTC)

This is of course nonsense. The material is plainly WP:OR as no WP:Reliable sources have been included. It also fails the test of being important to the description of Bayesian inference in general terms, and serves only to get in the way of expansion in useful directions. Moreover, it fails any reasonable excuse for being included as a "proof" of a result such as outlined as possibilities in WP:MOSMATH. Applying the rules of Wikipedia standards is clearly not vandalism. If this stuff were "useful" someone could provide a source for it and it might then be sensible to construct an article specific to Bayesian inference for that specific distribution, in which this stuff could be included. Melcombe (talk) 22:47, 22 February 2012 (UTC)

The matter was discussed above and on the archived talk page here. Melcombe removed two sections. The one on Bayesian estimation of the parameter of a binomial distribution has Bayes himself as a source. The other one, on Bayesian estimation of the parameter of a hypergeometric distribution, has Karl Pearson as the source. It generalizes the first one and is conceptually simpler, because the estimated parameter takes only a finite number of possible values, and so the prior distribution is defined by the principle of insufficient reason (which was not applicable to the continuous parameter of the binomial distribution). My question to Melcombe regarding his need for further explanation or proof was repeated on his talk page and remains unanswered. Melcombe's contribution is destructive, and he is seeking neither consensus nor compromise. Bo Jacoby (talk) 04:27, 23 February 2012 (UTC).

The material deleted has neither person as an explicit source, and no explicit source at all. The second sentence of WP:Verifiability is "The threshold for inclusion in Wikipedia is verifiability, not truth — whether readers can check that material in Wikipedia has already been published by a reliable source, not whether editors think it is true." ... and this amounts to providing WP:Reliable sources in the article. Then there is the question of whether the amount of mathematical detail that was included should appear in an encyclopedia article. It is clear that it should not, as it fails for reasons described at WP:NOTTEXTBOOK and in WP:MOSMATH. No amount of discussion can reasonably override these established policies. What was left is a summary of the result at a suitable level of detail. Bayes' contribution is contained in a much shortened section, as there is now a separate article on this source publication. Melcombe (talk) 18:09, 24 February 2012 (UTC)

I suppose that Melcombe is right in assuming that he can learn nothing from other wikipedia editors. The deleted sections were referenced from here showing that they are useful in answering elementary questions. The missing sources were mentioned on the talk page, which Melcombe didn't care to read before removing the material. An improvement, rather than a deletion, could have been made. Bo Jacoby (talk) 18:43, 27 February 2012 (UTC).

Trotting out your first sentence harms your case. The usual wicked pleasure of provoking another editor won't be available here, because Melcombe will just ignore the bait anyhow. You'd have better luck with me!
WP is not a textbook, so the helpfulness of the material in answering questions is an irrelevant although good fact.
That said, updating a distribution for the Bernoulli proportion is the simplest and most conventional example, so I should hope that you and Melcombe could agree on a version. Why not look at Donald Berry's book (or the earlier and rare book by David Blackwell), or the famous article for psychologists by Savage, Edwards, and Lindman?
Maybe you both could take a break from beating on each other, and beat on me for a while? ;D *LOL* I made a lot of edits in the last day, and some of them must strike you as terrible!
Cheers,  Kiefer.Wolfowitz 19:03, 27 February 2012 (UTC)

Thanks to Kiefer.Wolfowitz for the reaction. I have got no case to harm. I included the formulas for mean value and standard deviation for the number K of hits in a population of N items, knowing the number k of hits in a sample of n items. I did not myself consider these formulas to be original research on my part, because they are found in the works of Karl Pearson, but, not knowing these formulas himself, Melcombe considers them to be original research on my part, and so he removed them from wikipedia. To me this is a win-win situation: Either my contribution is retained in wikipedia, or I get undeserved credit for inventing the formulas. Cheers! Bo Jacoby (talk) 16:07, 28 February 2012 (UTC).
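
For readers trying to follow what is being estimated here, the following is a minimal numerical sketch of the setting described above: the posterior for the number K of hits in a population of N items, given k hits in a sample of n items drawn without replacement. It assumes a uniform prior on K (the principle of insufficient reason mentioned earlier in this section). The function names and example numbers are hypothetical, and the code does not reproduce the deleted formulas; it only computes the mean and standard deviation by direct summation.

```python
from math import comb, sqrt


def posterior_hits(N, n, k):
    """Posterior over K in {0, ..., N} given k hits in a sample of n.

    Assumes a uniform prior on K and sampling without replacement.
    """
    # Hypergeometric likelihood, up to a factor that does not depend on K.
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible values of K automatically get zero weight.
    weights = [comb(K, k) * comb(N - K, n - k) for K in range(N + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_and_sd(dist):
    """Mean and standard deviation of a distribution indexed by K = 0, 1, ..."""
    mean = sum(K * p for K, p in enumerate(dist))
    variance = sum((K - mean) ** 2 * p for K, p in enumerate(dist))
    return mean, sqrt(variance)


# Hypothetical numbers: 2 hits observed in a sample of 4 from a population of 10.
post = posterior_hits(N=10, n=4, k=2)
mean_K, sd_K = mean_and_sd(post)
print(mean_K, sd_K)
```

When the sample is the whole population (n = N), the posterior puts all of its mass on K = k, which is a quick sanity check on the sketch.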


All this stuff about Bayesian inference being the model of rationality is ignorant nonsense.

I clarified this more than a year ago, with in-line cites to the highest quality most reliable sources, and the clarification remains in the article, which I suppose was intolerable.

I don't understand how people can write an article that is shown to be patent nonsense by the later discussion in the same article.  Kiefer.Wolfowitz 17:11, 26 February 2012 (UTC)

The article has now been changed, but I would mention that nowhere did it previously state that Bayesian inference was "the" model of rationality. The previous lead read "how the degree of belief in a proposition might change due to evidence." and that it is "a model of rational reasoning". Later on, we have "The philosophy of Bayesian probability claims". None of these suggest that Bayes' theorem is the sole theory of rationality. It was simply keeping the information relevant to the article, rather than expanding on other possible theories and techniques. Maybe the issue is that this simply was not sufficiently emphasised. Gnathan87 (talk) 19:57, 26 February 2012 (UTC)
Before my edits, the article's lede asserted uniqueness of Bayesian updating as the rational system.  Kiefer.Wolfowitz 21:26, 26 February 2012 (UTC)
What the article said is that the philosophy of Bayesian probability asserts that this method of updating is the rational one. The distinction seems clear to me. Removing citation needed tag, obviously it is rational in the sense that sophisticated reason underlies the decision. (talk) 00:19, 2 October 2014 (UTC)


In the end I can only find arguments of why Bayesian updating is not the only rational rule, but I am still left wondering what bayesian updating actually is. Could anyone please fill the section with appropriate content? Thanks Elferdo (talk) 14:00, 14 October 2015 (UTC)

Section "Multiple observations" - conditional independence vs independence

In the current version (17 Jun 2016), the section "Multiple observations" requires "a sequence of independent and identically distributed observations", whereas the key to combining multiple observations is conditional independence, as confirmed two lines below by the equation $P(\mathbf{E} \mid M) = \prod_k P(e_k \mid M)$. Please notice that $P(\mathbf{E} \mid M) = \prod_k P(e_k \mid M)$ neither implies nor is implied by $P(\mathbf{E}) = \prod_k P(e_k)$. -- (talk) 09:54, 17 June 2016 (UTC)
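
One direction of this point, that observations can be conditionally independent given the model without being independent marginally, can be checked with a small sketch. The two-model coin example, its probabilities, and the variable names are assumptions chosen purely for illustration. The converse direction is not shown here.

```python
# Hypothetical two-model example: a coin that is either heads-biased or
# tails-biased, with equal prior probability for each model M.
p_model = {"heads_biased": 0.5, "tails_biased": 0.5}              # P(M)
p_heads_given_model = {"heads_biased": 0.9, "tails_biased": 0.1}  # P(e = heads | M)

# Marginal probability of heads on a single flip: P(e1 = heads).
p_h = sum(p_model[m] * p_heads_given_model[m] for m in p_model)

# Joint probability of two heads, built from conditional independence given M:
# P(e1, e2 | M) = P(e1 | M) * P(e2 | M), then summed over M.
p_hh = sum(p_model[m] * p_heads_given_model[m] ** 2 for m in p_model)

# If the two flips were marginally independent, p_hh would equal p_h ** 2.
marginally_independent = abs(p_hh - p_h ** 2) < 1e-12

print(p_h, p_hh, marginally_independent)
```

Here P(e1 = heads) is 0.5, so marginal independence would require a joint probability of 0.25 for two heads, but the mixture gives 0.41: seeing one head makes the heads-biased model more likely and therefore raises the probability of the next head.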