Talk:Bayesian inference

WikiProject Statistics (Rated C-class, Top-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as C-Class on the quality scale.
This article has been rated as Top-importance on the importance scale.
 
WikiProject Mathematics (Rated C-Class)
This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Mathematics rating: C Class. Priority: High. Field: Probability and statistics.
One of the 500 most frequently viewed mathematics articles.

Please update this rating as the article progresses, or if the rating is inaccurate.


Archives
Archive 1


Wording

Most occurrences of "uncertainty" should read "probability". Just one example:

Instead of "the uncertainty of one model tends to 1 while that of the rest tend to 0."
I suggest "the probability of one hypothesis tends to 1 while the probabilities of the other hypotheses tend to 0."

Rainald62 (talk) 15:44, 25 November 2011 (UTC)

I agree that "uncertainty" is misleading. I would prefer "credibility". Like this: "the credibility of one hypothesis tends to 1 while the credibilities of the other hypotheses tend to 0." The word "probability" is often already used to signify a parameter, and it is confusing to reuse it. Consider this case. A fair die has the probability one sixth of showing a six. You may suspect that the die is unfair when (say) four throws gave four sixes. Then you talk about "the probability that the probability of throwing a six is greater than one sixth". That is hard to understand. It is a little easier to say "the credibility that the probability of throwing a six is greater than one sixth". Bo Jacoby (talk) 08:05, 26 November 2011 (UTC).
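The die example can be made concrete with a minimal sketch. It assumes a uniform Beta(1, 1) prior on the unknown probability p of throwing a six; that prior is an assumption made here for illustration and is not fixed anywhere in the discussion. The function name is hypothetical, and the sketch only covers the case where every throw shows a six.

```python
from fractions import Fraction

def credibility_p_exceeds(threshold, throws):
    """Posterior credibility that p > threshold after `throws` throws
    that ALL showed a six, assuming a uniform Beta(1, 1) prior on p.

    In that case the posterior is Beta(throws + 1, 1), whose cumulative
    distribution function is F(x) = x ** (throws + 1).
    """
    return 1 - Fraction(threshold) ** (throws + 1)

# Before any throws: credibility that p > 1/6 under the uniform prior.
before = credibility_p_exceeds(Fraction(1, 6), throws=0)

# After four throws that gave four sixes.
after = credibility_p_exceeds(Fraction(1, 6), throws=4)

print(before, after, float(after))
```

Under these assumptions the credibility that the probability of a six exceeds one sixth rises from 5/6 to 7775/7776. A different prior would give different numbers.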
I also do not like the word "uncertainty". I think the word "uncertainty" implies the opposite of "certainty", and "certainty" could be taken in the sense "degree of certainty" or "probability". (i.e. uncertainty = 1 - probability). I prefer the word "confidence" when referring to probability specifically under a Bayesian interpretation. To link to the above discussion, the reason I retained "uncertainty" was because it had been added by an expert. I am not sure about "credibility", but only because I have not seen it used in this sense very often. Gnathan87 (talk) 18:05, 29 November 2011 (UTC)
"Confidence" is a terrible term to use, as it leads directly to confusion with the non-Bayesian idea of "confidence intervals". "Degree of belief" or "degree of plausibility" or just "support" are standard terms. --50.135.4.90 (talk) 05:29, 2 December 2011 (UTC)
Good point. I am happy with "degree of belief", which I prefer to "degree of plausibility" because it is shorter. Just to add, shortening "degree of belief" to "belief" should probably be used sparingly if at all; "belief" sounds to me more like a yes/no thing. Gnathan87 (talk) 10:36, 2 December 2011 (UTC)
Glad you like "degree of belief". "State of belief" is not standard however, and its use in the introduction suggests that something importantly different is being defined, which is not the case. The introduction currently uses technical ideas ("degree of belief" being one) before they are defined; this is confusing, particularly to non-experts.--McPastry (talk) 19:13, 2 December 2011 (UTC)
Updated the lead to reflect these points. Gnathan87 (talk) 15:10, 11 December 2011 (UTC)

Goal of Bayesian evaluation

I think we need a "why do we care" functional statement, e.g. Bayesian inference "can be used to evaluate how reliable is an estimate of probability" or "is used to evaluate how likely a statistical correlation (or anything?) is to be true." Or something--if it can be stated non-mathematically. Help me out, here. I had a statement in mind but I lost it while I was composing the introductory sentence. —Monado (talk) 04:52, 8 December 2011 (UTC)

I've added some new material to the first paragraph. Does that convey the importance better? Gnathan87 (talk) 15:33, 11 December 2011 (UTC)

Revert of lead

[I would just like to note: since I have been particularly active in editing this article recently, I hope the revert does not appear oppressive (not at all my intention).] A few comments on my reasoning:

  • The material on philosophy is fundamental to this topic, as it justifies Bayesian inference. Even if philosophy is not discussed at length, IMHO it should at least be mentioned. Otherwise for example, you leave hanging: why should the update provided by Bayes' theorem be used? Why should some other method not be equally valid, or better? Mention of the Ramsey-de Finetti theorem was in fact added partly in response to the comments of Monado (above).
  • On second thoughts, maybe reference to "Bayesian philosophy" is better than picking one justification. Hopefully it is now better?
  • The removed material explained the ideas to somebody new to the topic, or a lay reader, without going into mathematical details.
  • The structure of the lead was I think previously pretty good, and the edits removed the sense of developing the ideas. First paragraph, conveys the basic idea and background. Second paragraph, more in depth explanation of how Bayesian inference more generally applies to distributions of exclusive/exhaustive possibilities. Then it went on to discuss, as per the above discussion re Andrew Gelman, how it is practically applied.

Gnathan87 (talk) 14:21, 14 December 2011 (UTC)

Since the lead section has been reverted again, I would add some further comments, as in my view, the new version is a significant step backwards. Particularly, I will compare http://en.wikipedia.org/w/index.php?title=Bayesian_inference&oldid=465836189 to http://en.wikipedia.org/w/index.php?title=Bayesian_inference&oldid=465830221. (I agree that the distributions example was not ideal, although I would still like to find a really good but terse example to put in there).

To begin with something more superficial, the lead should be written as prose, not a list of bullet points. (Admittedly, the new version is effectively an expanded version of the old first paragraph. But it was pretty much prose, and the intention had ultimately been to try and make it flow better.) More importantly, the aim of this paragraph was to concisely present the very most important points, with the proviso that further details followed. The new version is missing any sense that Bayesian inference, in full generality, acts on a distribution. It is not explained that beliefs cannot exist in isolation (unless they have probability 1). This is bad for a number of reasons. Firstly, from WP:LEAD, "The lead serves as an introduction to the article and a summary of its most important aspects." Secondly, there is now nothing for those who do not wish to study the mathematics to grasp a flavour of the details. (Also bearing in mind that many readers, e.g. engineers and scientists, are more used to starting from intuitive ideas rather than formulae.) Thirdly, it may convey a misleading impression of what Bayesian inference achieves.

To address a few more specific differences:

  • The edit labelled "a detailed repetition of model selection method removed from the abstract". This is not technically "model selection", in which one would typically use ratios rather than the direct probabilities. Furthermore, this is a useful introduction to the ideas pushed for by Gelman; I would agree that it is an important generic method warranting its own mention in the lead.
  • The edits concerning "removal of unclear adjectives". One presumably wants to emphasise to the lay reader that Bayesian inference is very general, hence the use of phrases such as "many diverse applications" and specifying that the applications are only "including" these fields. For a reader who is not minded to always be logically pedantic, the phrase now reads as if these are the only four fields in which Bayesian inference applies.
  • The edit concerning the use of Bayesian inference by the brain. "It has been suggested that the brain uses Bayesian inference to update beliefs" does not in my view go far enough. This is admittedly not an area I am very familiar with, so please correct me if I am wrong. But my understanding is that there is certainly evidence for this, past the mere suggestion that this is how the brain works.
  • The introduction of Bayesian probability as "A degree of belief is represented as a Bayesian probability". What should the uninformed reader understand by a "Bayesian probability"? Is it a normal probability? Is it different to a normal probability? Is this some complicated prerequisite? Much better to briefly explain, e.g. "the Bayesian interpretation of probability, in which degrees of belief are represented by probabilities."

Gnathan87 (talk) 16:46, 14 December 2011 (UTC)

Thank you for the above comment.
  1. My edit was not a revert. It was a step in the (IMHO) right direction.
  2. My objection to the lead is that it was confusing and messy. What is the meaning of "many diverse applications" as opposed to "many applications"? The word "diverse" is confusing and basically nonsensical in the context. What is the meaning of "many applications" as opposed to "applications"? It is not easy to count applications. The present formulation "Bayesian inference has applications in science, engineering, medicine and law" is straightforward and implies neither that the number of applications within any of the four fields is low, nor that the applications are restricted to the four fields only. I also find it difficult to imagine an application which is not somehow included in the field of 'science'. The application of Bayesian inference to 'law' is an application of 'science'.
  3. Re "It has been suggested that the brain uses Bayesian inference to update beliefs". It is better if it does not go far enough than if it goes too far. This article on Bayesian inference does not elaborate on the subject, so why not just link to an article that does.
  4. If any aspect needs to be addressed in the lead, it should have a subsection of its own in the article.
  5. My objection against a link to Ramsey-de Finetti theorem is that no such article is found in wikipedia. Write it first, link to it later.
Bo Jacoby (talk) 17:59, 14 December 2011 (UTC).
Bo - my concern is really over how the phrases read.
2. The difference is in conveying a qualitative generality. Just "applications" is likely to be understood (rightly or wrongly) as in some sense restricted. Presumably, Bayesian inference should be understood as having a proliferation of applications spanning virtually any subject. I think it is quite correct to go on to list some fields in which Bayesian inference has well known specific uses. (What is implied is a more restricted view of "science" as "what a scientist does", "law" as "what a lawyer does" etc.) However, a reader will almost certainly take that it is limited (at least mainly) to these areas unless this is qualified with "including".
3. I agree, but I think the wording is wrong. "It has been suggested" suggests a paucity of evidence, whereas "there is evidence" suggests more validity (and hence justifies this information going in the lead). I don't think mentioning that evidence exists necessarily means you then have to go into details; diverting to Bayesian brain is fine. How about the version that's there now?
4. The deleted material does have its own subsection - Method. The deleted material was basically interpreting the mathematics.
Gnathan87 (talk) 21:28, 14 December 2011 (UTC)
2. You cannot stop people misunderstanding, but you can stop writing bullshit. "many diverse applications" is bullshit. Abandoning logic is abandoning communication. The purpose of an encyclopedia is not 'conveying a sense' but stating facts.
3. "There is evidence that the brain uses Bayesian inference to update beliefs" is a very strong formulation calling for proof and documentation and references. Personally I do not believe it to be true. The operation of the brain is an area of current research.
4. The nonmathematical summary of 'Method' is contained in the sentence: "Bayes' theorem is used to calculate how belief in a proposition changes due to evidence."
Bo Jacoby (talk) 22:40, 14 December 2011 (UTC).
2. With respect, good communication is about being specific. Good understanding is about being logical. "There are applications" leaves open non-factual options such as for example that the applications are limited. Furthermore, this is the understanding to which readers will probably gravitate. This is poor communication. I think there would be overwhelming, if not complete consensus that the applications of Bayesian inference are both "many" and "diverse". Why not qualify with "many diverse applications"?
3. Yes. That is why I originally wrote "there is evidence to suggest" :) Maybe then we need a more neutral phrasing. It is certainly clear that the brain does not always use Bayesian inference. Perhaps "Research has suggested"?
4. That is a summary, but the summary surely warrants more detail? For example, where in that summary does it explain that degrees of belief must always sum to 1 over the exhaustive/exclusive possibilities? What is there to explain that you cannot just take two arbitrary degrees of belief for heads and tails and then just update them independently of each other?
Gnathan87 (talk) 22:58, 14 December 2011 (UTC)
2. I do not know the limitations of the application of Bayesian inference. Do you? It may turn out to be more (or less) limited than what we thought before, or more (or less) limited than what we think the reader thinks. This kind of information is not encyclopedic and should simply be omitted. When I read "many diverse applications" my feeling is: Please cut the crap and give me the facts. Provide rock solid knowledge, not hints or feelings.
3. Are there hard encyclopedic results from this brain research?
4. The fact that any probability distribution sums to 1 is general and not specific to Bayesian inference. The details belong to the subsection and not to the summary.
Bo Jacoby (talk) 09:36, 15 December 2011 (UTC).
2. I agree that the descriptive version is not "rock solid fact", due to the inherent subjectivity in the meaning of adjectives. However, I somewhat object to the idea that anything that is not a "rock solid fact" is inappropriate. Many people will only read the lead, in order to come away with a feel for the topic. You must ask: if the reader had all of the details, would they apply this characterisation? If the answer is virtually certain to be yes, it is then helpful and appropriate. (Besides, what is a "fact" if not just something that everybody agrees with?) On the other hand, I do agree that it is good to strive for objectivity. Maybe there is some other way of putting it that would achieve consensus.
3. As I say, I'm not familiar with this area and I hope somebody more knowledgeable can step in. However, in the meantime I've done some research and the answer seems to be yes. According to http://www.bayesiancognition.org/readings/, "There are no comprehensive treatments of the relevance of Bayesian methods to cognitive science." However, by following references it is not hard to find examples. e.g. http://cogsci.uwaterloo.ca/courses/COGSCI600.2009/KorWol_TICS_06.pdf, "It thus seems that people are able to continuously update their estimates based on information coming in from the sensors in a way predicted by Bayesian statistics". From http://www.mrc-cbu.cam.ac.uk/people/dennis.norris/personal/BayesianReader.pdf, "The Bayesian Reader successfully simulates some of the most significant data on human reading.". From http://web.mit.edu/cocosci/Papers/f881-XuTenenbaum.pdf, "We report two experiments with 3- and 4-year-old children, providing evidence that the basic principles of Bayesian inference are employed when children acquire new words at different hierarchical level"
4. Yes, but the fact that a belief distribution must sum to 1 is a constraint imposed by Bayesian inference. A valid probability space implies valid degrees of belief, but valid degrees of belief do not imply a valid probability space. Because degrees of belief need not be coherent, it is not necessarily intuitive to view an individual belief as part of a set, or to think of degrees of belief as being dependent on one another. Particularly for those who do not study the mathematics, there is nothing to emphasise the important idea that this is the constraint imposed. I still think this should be explicitly explained in the lead.
Gnathan87 (talk) 19:57, 15 December 2011 (UTC)
2. "if the reader had all of the details, would they apply this characterisation?" No they would not. The characterization "many diverse applications" is void of meaning, and the reader having all the details will recognize that fact.
3. Don't write in wikipedia until you really know what you are talking about.
4. The lead should be specific to the article. Other articles tell the properties of probability in general. I don't understand what you mean by "degrees of belief need not be coherent".
Bo Jacoby (talk) 22:52, 15 December 2011 (UTC).
2. I disagree, but this is really not worth arguing any more... 3. Thank you, but I was not being reckless. I am quite familiar with the topic, I have just not worked in that particular sub-discipline. As it happens, I have studied under one of the authors I linked to above and knew full well that this research existed. 4. Coherence is the property defined by Ramsey and De Finetti from which the axioms of probability are implied. Subjective degrees of belief need not be coherent, but it is only if they are coherent that they are seen to be "rational". Gnathan87 (talk) 23:07, 15 December 2011 (UTC)
3. If you provide references to prove your claim as an encyclopedic fact, then I have no objection. Otherwise it does not belong here.
4. The reader has no chance to follow your distinction between rational and irrational beliefs.
Bo Jacoby (talk) 00:08, 16 December 2011 (UTC).
I'm not suggesting that coherence should be explained. What I am saying is this: Just because you say that "degrees of belief are represented by probabilities", the reader does not necessarily immediately conceptualize degrees of belief as part of a set that must sum to 1 and should be updated together. Why? Because that is not how degrees of belief are intuitively represented. Understanding the philosophy of Bayesian inference is about constructing the analogy in your mind between degrees of belief and probability spaces. This is a fundamental paradigm shift that I think should be explicitly walked through in the lead, even though it is technically implied. Particularly for those who are not so mathematically experienced, or do not study the mathematics.
I would still assert that the old version (which I am by no means suggesting was perfect, but had been under continuous development and was the result of much thought and balancing) was beginning to present a clear and understandable development of these ideas. I would also re-emphasise the reasoning behind the structure of the old version: The first paragraph, as has been retained in the current version, was deliberately "dumbed down", containing just the essential ideas for the casual reader. The second and third paragraphs then went on to build on these ideas, acting as 1. a more detailed summary for those who do not study the mathematics or 2. a preparatory introduction to the mathematics. Finally, there was a pointer to Bayesian model selection, a use for Bayesian inference that warrants mention in this article, but is covered elsewhere. Gnathan87 (talk) 17:15, 16 December 2011 (UTC)
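The point about degrees of belief forming a set that sums to 1 and is updated together can be illustrated with a minimal sketch. The two coin hypotheses, the prior and the function name are hypothetical choices made for illustration; they do not come from the article.

```python
from fractions import Fraction

def bayes_update(prior, likelihood):
    """Jointly update degrees of belief over exclusive, exhaustive hypotheses.

    prior:      dict mapping hypothesis -> degree of belief (values sum to 1)
    likelihood: dict mapping hypothesis -> P(evidence | hypothesis)
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())  # P(evidence), the normalizing factor
    if total == 0:
        raise ValueError("evidence is impossible under every hypothesis")
    # Dividing by the shared total is what ties the beliefs together:
    # no single hypothesis can be updated in isolation.
    return {h: w / total for h, w in unnormalized.items()}

# Hypothetical example: is the coin fair, or does it have two heads?
prior = {"fair": Fraction(1, 2), "two-headed": Fraction(1, 2)}
heads = {"fair": Fraction(1, 2), "two-headed": Fraction(1)}

belief = prior
for _ in range(3):  # observe three heads in a row
    belief = bayes_update(belief, heads)

print(belief, sum(belief.values()))
```

After three heads the beliefs are 1/9 for "fair" and 8/9 for "two-headed", and they still sum to 1. Multiplying each prior by its likelihood without the shared normalization would give 1/16 and 1/2, which is not a probability distribution.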
Hmm, I happened to be using this book today: http://www.amazon.co.uk/Bayesian-Networks-Introduction-Probability-Statistics/dp/0470743042/ref=sr_1_1?ie=UTF8&qid=1324409881&sr=8-1#reader_0470743042. Of course it is not an encyclopaedia, but I would point out what (quite coincidentally) is written in the first paragraph on page 1: "The topic provides a natural tool for dealing with a large class of problems containing uncertainty and complexity. These features occur throughout applied mathematics and engineering and therefore the material has diverse applications in the engineering sciences." Gnathan87 (talk) 19:46, 20 December 2011 (UTC)
Nice! Without loss of meaning the author could simply have written: "The topic provides a tool for dealing with uncertainty and complexity, which occur throughout applied mathematics and engineering, and therefore the material has applications in the engineering sciences." Some authors get paid per page. We don't. Bo Jacoby (talk) 22:34, 20 December 2011 (UTC).
2. "throughout" could be the hint preventing the wrong interpretation of the current wording. Note the title of Jaynes' book, "... - the logic of science".
3. Although I'm pretty sure that my brain works the Bayesian way, I would prefer the cautious wording.
Besides, the very beginning, "In statistics ..." does not make sense to me. At least the current content of Statistics may be taken as proof that Bayesian inference is not "in" statistics. -- Rainald62 (talk) 00:17, 18 February 2012 (UTC)

Formula in Philosophical section

P(P|E) and others in the formula and in the text. I think these should be like P(A|B), as in the Bayes' theorem article. --Pasixxxx (talk) 19:04, 6 January 2012 (UTC)

Disagree. This notation here makes clearer the epistemological interpretation of Bayes' theorem (dealing with a "Proposition" and "Evidence"). However, Bayes' theorem is fundamentally a mathematical relation on a probability space with no particular interpretation, which is why in my view it should be presented in its own article with more neutral symbols such as A and B. Admittedly, focus in Bayes' theorem has recently shifted towards a Bayesian interpretation; I have opposed that on the talk page because I do not think it is sufficiently NPOV. Gnathan87 (talk) 23:21, 7 January 2012 (UTC)
Letter P should not mean both probability and proposition in the same formula. So P(P|E) cannot be accepted. Bo Jacoby (talk) 20:20, 8 January 2012 (UTC).
Hmm. Technically, the syntactic context resolves any ambiguity (i.e. the probability function is defined to take propositions as arguments, and evaluates to a real, not a proposition). I think that the benefits of labelling "proposition" as "P" outweigh the (quite small) potential for confusion here. Gnathan87 (talk) 06:44, 12 January 2012 (UTC)
Thinking about it, one solution would be to use the function C() for "Credence" instead of P() for "Probability". C() is used in philosophy to emphasise that the interpretation of probability being used is subjective probability, aka credence. I'm not overly keen on this, though, because it adds extra complexity and is inconsistent with the notation in the rest of the article. Also, I'm not sure readers would be familiar with the term credence. Gnathan87 (talk) 12:21, 12 January 2012 (UTC)
This problem can be overcome by use of the Pr notation. After all, <math>\Pr</math> produces \Pr automatically. It is one of the notations in List of mathematical symbols (under P), and is widely used. There is no need to invent new notation (and anyway that would not be allowable on Wikipedia). Melcombe (talk) 16:05, 12 January 2012 (UTC)
Although it may be worth bearing in mind that again, in the philosophical literature, Pr() is used to distinguish "Propensity" (i.e. objective probability) from P() for "Probability" (see e.g. http://www.joelvelasco.net/teaching/3865/humphreys%2085%20-%20why%20propensities%20cannot%20be%20probabilities.pdf) Gnathan87 (talk) 19:17, 12 January 2012 (UTC)

Bayes theorem?

It is misleading to say that Bayesian statistics is based on Bayes theorem. The crucial point of Bayesian reasoning is that we are treating our hypothesis as a random variable, and getting the average expectation based on all values of H. The relevant rule is the Law of total probability, as we are predicting new observations on the basis of old observations:


\begin{align}
P(O_{\text{new}} \mid O_{\text{old}}) &= \sum_{h \in H} P(O_{\text{new}} \mid H=h, O_{\text{old}})\, P(H=h \mid O_{\text{old}}) && \text{(law of total probability)} \\
 &= \sum_{h \in H} P(O_{\text{new}} \mid H=h)\, P(H=h \mid O_{\text{old}}) && \text{(new observations are conditionally independent of old observations, given the hypothesis)}
\end{align}

With summations replaced with integrals for continuous h.

While Bayes rule is often used in Bayesian models, it is not what makes a model Bayesian. Bayesian reasoning (averaging over hypotheses) and Bayes rule simply happen to have been discovered by the same person, Thomas Bayes. What say the editors? Raptur (talk) 23:26, 5 February 2012 (UTC)
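The averaging over hypotheses described above can be sketched for a discrete H. The three candidate coin biases, the uniform prior and the function names are hypothetical choices for illustration only. Bayes' rule supplies the weights P(H=h | O_old), and the law of total probability combines them into the prediction.

```python
from fractions import Fraction

def posterior(prior, likelihood_old):
    """P(H=h | O_old) for each h, via Bayes' rule with normalization."""
    weights = {h: prior[h] * likelihood_old(h) for h in prior}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

def predictive(post, likelihood_new):
    """P(O_new | O_old) = sum over h of P(O_new | H=h) * P(H=h | O_old)."""
    return sum(likelihood_new(h) * p for h, p in post.items())

# Hypotheses: the coin's probability of heads is 1/4, 1/2 or 3/4.
biases = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
prior = {h: Fraction(1, 3) for h in biases}

# Old observations: two heads, so P(O_old | H=h) = h ** 2.
post = posterior(prior, lambda h: h ** 2)

# New observation: one more head, so P(O_new | H=h) = h.
p_next_heads = predictive(post, lambda h: h)

print(post, p_next_heads)
```

With these assumed numbers the posterior weights are 1/14, 4/14 and 9/14, and the predicted probability of another head is 9/14. No single value of h is ever selected.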

I don't see where the article says "is based on"; it is more "uses". What you say above looks reasonable, but can you find a "reliable source" taking the same approach? You also need to consider the article Bayesian probability, which goes into the Bayesian interpretation of probability. (That is, finding the most appropriate place to include your points.) Melcombe (talk) 15:15, 6 February 2012 (UTC)
In the "Philosophical Background" section, it says that Bayes Rule is the "essence" of Bayesian inference, which is not the case. Raptur (talk) 22:02, 13 February 2012 (UTC)
I do agree that "the essence of" might be misleading when conceptualising Bayesian inference as modifying a distribution. However, as explained below, I do think it is right to begin with this "building block" before moving onto the bigger picture. I've changed "the essence of" to "the fundamental idea in", catering better for all views? Gnathan87 (talk) 03:17, 24 February 2012 (UTC)
What the Bayesian inference article describes is in fact based on Bayes' theorem, for the most part involving the use of new evidence to update one's likelihood estimate for the truth of a single fixed hypothesis, not involving any form of averaging over hypotheses. Bayesians such as I. J. Good have emphasized this in their published writings. The sum over hypotheses shown in the article is just a normalizing factor and is independent of which particular "M" has been selected. What you describe may be some related topic, but it is not what this article is discussing. — DAGwyn (talk) 22:39, 8 February 2012 (UTC)
No. See equation (3) from this paper on Bayesian Hidden Markov Models, equation (1) from this field review, equation (9) from this paper on a Bayesian explanation of the perceptual magnet effect, or Griffiths and Yuille (2006) (some lecture notes available here). Bayesian statistics is about computing a joint probability over observations and hypotheses. Of course, once you have a full joint, you can easily compute any particular conditional probability, including the one computed by Bayes Rule. Hidden Markov models, to use the example from the first paper I provided, use Bayes rule in computing marginals over the hidden variables. Parameter estimation (notably the Expectation-maximization algorithm) for Hidden Markov Models, however, often takes only a Maximum Likelihood Estimate rather than estimating the full joint over model parameters (and data). Thus, an HMM trained with the expectation-maximization algorithm uses Bayes rule and changes its expectations after seeing evidence ("training data" in the parlance of the paper), but it is not Bayesian because it maintains no uncertainty about model parameters. Raptur (talk) 22:02, 13 February 2012 (UTC)
None of these 3 sources say anything like "Bayesian inference is ....". I don't see them mentioning inference at all. Instead they talk about "Bayesian model averaging" and "Bayesian modeling" ... which you might think amount to the same thing. But do find something that both meets WP:RELIABLESOURCES and is explicitly about "Bayesian inference". On what you said in the first contrib to this section ... this seems to correspond somewhat to predictive inference. Melcombe (talk) 00:46, 14 February 2012 (UTC)
I've been avoiding this source, since I don't have an electronic version of it, but "Bayesian Data Analysis: Second Edition" by Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin opens with, on page 1, headed "Part 1: Fundamentals of Bayesian inference": "Bayesian inference is the process of fitting a probability model to a set of data and summarizing the result by a probability distribution on the parameters of the model and on unobserved quantities such as predictions for new observations." This is exactly the statement that Bayesian inference is about maintaining probability distributions over both data and models. On page 8, it says: "The primary task of any specific application is to develop the model p(\theta, y) and perform the necessary computations to summarize p(\theta|y) in appropriate ways," where \theta comprises our model parameters and y comprises our data. The probability of a model given data is an important part of Bayesian inference (indeed, it's the second term of the marginalization expression), but what distinguishes Bayesian inference from other statistical approaches is maintaining uncertainty about your model.
I should point out that the other papers I provided do take this approach. The Bayesian HMM paper says "In contrast [to a point estimate such as Maximum Likelihood or Maximum a posteriori], the Bayesian approach seeks to identify a distribution over latent variables directly, without ever fixing particular values for the model parameters. The distribution over latent variables given the observed data is obtained by integrating over all possible values of \theta:" and presents the object of computation as P(t \mid w) = \int P(t \mid w, \theta) P(\theta \mid w) \, d\theta. The perceptual magnet effect paper says on page 5 that listeners are trying to compute the posterior on targets, and on page 6 says that this quantity is computed by marginalizing over category membership: p(T \mid S) = \sum_c P(T \mid S, c) P(c \mid S). The lecture notes I referred to give, in section 6 titled "Bayesian Estimation," precisely the derivation I opened with, and explicitly contrast Bayesian inference with point estimates of model parameters. — Preceding unsigned comment added by Raptur (talk • contribs) 12:28, 14 February 2012 (UTC)
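The contrast between a point estimate and integrating over \theta can be shown with a much simpler model than the HMM in the cited paper. This is a minimal sketch for a coin with unknown heads probability \theta, assuming a Beta(a, b) prior (uniform by default). The model, the prior and the function names are assumptions made here for illustration.

```python
from fractions import Fraction

def mle_predictive(heads, tosses):
    """Plug-in prediction: fix theta at its maximum likelihood estimate."""
    return Fraction(heads, tosses)

def bayes_predictive(heads, tosses, a=1, b=1):
    """Prediction that integrates over theta instead of fixing it.

    With a Beta(a, b) prior the posterior is
    Beta(a + heads, b + tosses - heads), and the integral of
    theta * posterior(theta) is the posterior mean.
    """
    return Fraction(a + heads, a + b + tosses)

# Four tosses, four heads.
print(mle_predictive(4, 4))    # probability of heads on the next toss, MLE
print(bayes_predictive(4, 4))  # same quantity, averaging over theta
```

Under these assumptions the plug-in estimate says the next toss is certain to be heads (probability 1), while the averaged prediction is 5/6, because uncertainty about \theta is retained.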
The single hypothesis vs. joint distribution thing is something that has been disputed for some time (and in fact was the subject of comment from Gelman himself). My personal conclusion is this: Bayesian inference is, in practice, used most often by scientists and engineers, who tend to use it exclusively over distributions. In this context, it makes sense to think of inference as being fundamentally something you do on a joint distribution. However - Bayesian inference in philosophy of science tends to be expressed in terms of single hypotheses.
As Bayesian inference is an important topic in philosophy of science, and Bayes' theorem is in any case the building block of the joint distribution view, I certainly think it is appropriate to begin with an explanation of the single hypothesis view before covering distributions. Also, applications in a courtroom setting, discussed later in the article, have to date been the single hypothesis view (as seems appropriate in that context), so it should at least have been mentioned. A final reason to cover the single hypothesis view is because, at least in my view, it is more accessible and insightful for those new to the topic than charging ahead to joint distributions. Gnathan87 (talk) 21:05, 23 February 2012 (UTC)
This is an article on mathematical statistics, so it seems strange to defer to terminology from the philosophy of science or criminal law. I do think a brief summary or a link to an article discussing the Bayesian view of probability is warranted. Scientists, statisticians, and engineers often use (what this article calls) "Bayesian Inference" without averaging over models (as in conditional random fields and MLE Bayes nets, which all use Bayes' theorem), but they don't call it "Bayesian Inference."
And again, Bayes' theorem is not the defining building block of the joint distribution view; the rule of total probability is. Bayes' theorem is just the easiest way to compute the second term of the marginalization expression. Finally, I don't think it's a good idea to focus on a topic just because it's easier if it's actually only partially related to the title of the article. This misleads readers (including students from a course for which I am a TA) into thinking "Bayesian Inference" is inference using Bayes' Theorem, when really it is inference that appreciates subjective uncertainty about your model. Raptur (talk) 15:30, 26 February 2012 (UTC)
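
For illustration only, the distinction argued over in this thread can be sketched numerically. The snippet below is a minimal sketch under stated assumptions, not taken from any of the cited papers: the coin-flip data, the uniform prior on a grid of parameter values, and the function names `posterior_on_grid`, `predictive_heads` and `plug_in_heads` are all hypothetical. It computes the probability of heads on the next flip in two ways: by marginalizing over the parameter (the rule of total probability, with Bayes' theorem supplying the posterior weights), and by plugging in a single maximum-likelihood point estimate.

```python
def posterior_on_grid(heads, flips, grid):
    # Bayes' theorem with a uniform prior over the grid (an assumption):
    # the posterior weight of each theta is proportional to its likelihood.
    weights = [t**heads * (1 - t)**(flips - heads) for t in grid]
    total = sum(weights)
    return [w / total for w in weights]


def predictive_heads(heads, flips, grid):
    # Rule of total probability:
    # P(next = heads | data) = sum over theta of P(heads | theta) * P(theta | data)
    post = posterior_on_grid(heads, flips, grid)
    return sum(t * p for t, p in zip(grid, post))


def plug_in_heads(heads, flips):
    # Point-estimate alternative: fix theta at its maximum-likelihood value.
    return heads / flips


# Hypothetical data: four heads in four flips.
grid = [i / 1000 for i in range(1001)]
marginal = predictive_heads(4, 4, grid)
point = plug_in_heads(4, 4)
print(f"marginalized: {marginal:.3f}, plug-in: {point:.3f}")
```

With these assumptions the plug-in estimate treats heads as certain, while the marginalized prediction stays below 1 because some posterior weight remains on smaller parameter values.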

Deletion of unnecessary maths portions

Despite the warnings that stuff that does not meet Wikipedia standards will be deleted, someone feels an explanation is needed. Thus this appeared on my talk page following the re-insertion of clearly inappropriate material. " Melcombe. While still not using the discussion page you remove useful subsections. That is plain vandalism on your part. Bo Jacoby (talk) 23:44, 20 February 2012 (UTC)."


I responded as follows and repeat it here for info. Melcombe (talk) 22:47, 22 February 2012 (UTC)

This is of course nonsense. The material is plainly WP:OR as no WP:Reliable sources have been included. It also fails the test of being important to the description of Bayesian inference in general terms, and serves only to get in the way of expansion in useful directions. Moreover, it fails any reasonable excuse for being included as a "proof" of a result such as outlined as possibilities in WP:MOSMATH. Applying the rules of Wikipedia standards is clearly not vandalism. If this stuff were "useful" someone could provide a source for it and it might then be sensible to construct an article specific to Bayesian inference for that specific distribution, in which this stuff could be included. Melcombe (talk) 22:47, 22 February 2012 (UTC)

The matter was discussed above and on the archived talk page here. Melcombe removed two sections. The one on Bayesian estimation of the parameter of a binomial distribution has Bayes himself as a source. The other one, on Bayesian estimation of the parameter of a hypergeometric distribution, having Karl Pearson as the source, generalizes the first one and is conceptually simpler, because the estimated parameter only takes a finite number of possible values, and so the prior distribution is defined by the principle of insufficient reason (which was not applicable for the continuous parameter of the binomial distribution). My question to Melcombe regarding his need for further explanation or proof was repeated on his talk page and remains unanswered. Melcombe's contribution is destructive, and he is neither seeking consensus nor compromise. Bo Jacoby (talk) 04:27, 23 February 2012 (UTC).

The material deleted has neither person as an explicit source, and no explicit source at all. The second sentence of WP:Verifiability is "The threshold for inclusion in Wikipedia is verifiability, not truth — whether readers can check that material in Wikipedia has already been published by a reliable source, not whether editors think it is true." ... and this amounts to providing WP:Reliable sources in the article. Then there is the question of whether the amount of mathematical detail that was included should appear in an encyclopedia article. It is clear that it should not, as it fails for reasons described at WP:NOTTEXTBOOK and in WP:MOSMATH. No amount of discussion can reasonably override these established policies. What was left is a summary of the result at a suitable level of detail. Bayes' contribution is contained in a much shortened section, as there is now a separate article on this source publication. Melcombe (talk) 18:09, 24 February 2012 (UTC)

I suppose that Melcombe is right in assuming that he can learn nothing from other Wikipedia editors. The deleted sections were referenced from here, showing that they are useful in answering elementary questions. The missing sources were mentioned on the talk page, which Melcombe didn't care to read before removing the material. An improvement, rather than a deletion, could have been made. Bo Jacoby (talk) 18:43, 27 February 2012 (UTC).

Bo,
Trotting out your first sentence harms your case. The usual wicked pleasure of provoking another editor won't be available here, because Melcombe will just ignore the bait anyhow. You'd have better luck with me!
WP is not a textbook, so the helpfulness of the material in answering questions is an irrelevant although good fact.
That said, updating a distribution for the Bernoulli proportion is the simplest and most conventional example, so I should hope that you and Melcombe could agree on a version. Why not look at Donald Berry's book (or the earlier and rare book by David Blackwell), or the famous article for psychologists by Savage, Edwards, and Lindman?
Maybe you both could take a break from beating on each other, and beat on me for a while? ;D *LOL* I made a lot of edits in the last day, and some of them must strike you as terrible!
Cheers,  Kiefer.Wolfowitz 19:03, 27 February 2012 (UTC)

Thanks to Kiefer.Wolfowitz for the reaction. I have got no case to harm. I included the formulas for mean value and standard deviation for the number K of hits in a population of N items, knowing the number k of hits in a sample of n items. I did not myself consider these formulas to be original research on my part, because they are found in the works of Karl Pearson, but, not knowing these formulas himself, Melcombe considers them to be original research on my part, and so he removed them from Wikipedia. To me this is a win-win situation: Either my contribution is retained in Wikipedia, or I get undeserved credit for inventing the formulas. Cheers! Bo Jacoby (talk) 16:07, 28 February 2012 (UTC).
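
For readers following this thread, the quantity under dispute can be sketched numerically. This is a minimal sketch under stated assumptions, not a reproduction of the deleted section or of Pearson's formulas: it assumes sampling without replacement (a hypergeometric likelihood) and a uniform prior over K = 0, ..., N, as the principle of insufficient reason mentioned earlier in the thread suggests. The function name `posterior_hits` is hypothetical. It returns the posterior mean and standard deviation of the number K of hits in a population of N items, given k hits in a sample of n items.

```python
import math
from fractions import Fraction


def posterior_hits(N, n, k):
    # Unnormalized posterior weight of each candidate K under a uniform prior
    # (an assumption): proportional to the hypergeometric likelihood
    # C(K, k) * C(N - K, n - k). math.comb returns 0 for impossible values of K.
    weights = [math.comb(K, k) * math.comb(N - K, n - k) for K in range(N + 1)]
    total = sum(weights)
    # Exact rational arithmetic for the first two posterior moments.
    mean = Fraction(sum(K * w for K, w in enumerate(weights)), total)
    second = Fraction(sum(K * K * w for K, w in enumerate(weights)), total)
    variance = second - mean**2
    return float(mean), math.sqrt(float(variance))


# Hypothetical numbers: 3 hits in a sample of 10 from a population of 100.
mean, sd = posterior_hits(100, 10, 3)
print(f"posterior mean of K: {mean:.2f}, standard deviation: {sd:.2f}")
```

Because K takes only finitely many values, the whole posterior can be enumerated directly, which is the conceptual simplification claimed above for the hypergeometric case.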

"Inference over a distribution"

(@Melcombe) A query on your revert: the intended "distribution" was the probability distribution over some set of exclusive/exhaustive possibilities. Is there some reason that is technically not a "distribution"? I think "Inference over a distribution" is a better name for that section, because it then clearly distinguishes it as an extension of inference on a single hypothesis from the previous section. Gnathan87 (talk) 23:11, 24 February 2012 (UTC)

What was in the section at the time contained no mention of "distribution", nor gave any indication of what was distributed over what. You are relying far too much on telepathy on the part of readers, with unexplained equations using unexplained and newly invented notations. What WP:Reliable sources are you using? Melcombe (talk) 07:56, 26 February 2012 (UTC)
OK, I see. On second thoughts I do agree. I've tried a new name; might still be able to do better though.
I do admit to not having added sufficient sources recently. My view has been that given the frequent changes to the article, it has been in the interests of the article to concentrate on the basic structure and text, particularly since the content can be found or inferred from any text covering Bayesian inference. Once it begins to stabilize (as it is, I think, now doing), then it will be much clearer which sources are appropriate and where. As for "unexplained equations using unexplained and newly invented notations", I must say that I am unsure what you refer to. There does not appear to me to be anything unexplained inappropriately to the level of the subject, or newly invented. Gnathan87 (talk) 15:07, 26 February 2012 (UTC)

Rationality

All this stuff about Bayesian inference being the model of rationality is ignorant nonsense.

I clarified this more than a year ago, with in-line cites to the highest quality most reliable sources, and the clarification remains in the article, which I suppose was intolerable.

I don't understand how people can write an article that is shown to be patent nonsense by the later discussion in the same article.  Kiefer.Wolfowitz 17:11, 26 February 2012 (UTC)

The article has now been changed, but I would mention that nowhere did it previously state that Bayesian inference was "the" model of rationality. The previous lead read "how the degree of belief in a proposition might change due to evidence." and that it is "a model of rational reasoning". Later on, we have "The philosophy of Bayesian probability claims". None of these suggest that Bayes' theorem is the sole theory of rationality. It was simply keeping the information relevant to the article, rather than expanding on other possible theories and techniques. Maybe the issue is that this simply was not sufficiently emphasised. Gnathan87 (talk) 19:57, 26 February 2012 (UTC)
Before my edits, the article's lede asserted uniqueness of Bayesian updating as the rational system.  Kiefer.Wolfowitz 21:26, 26 February 2012 (UTC)