# Talk:Prior probability

WikiProject Statistics (Rated Start-class, High-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

Start  This article has been rated as Start-Class on the quality scale.
High  This article has been rated as High-importance on the importance scale.

## Jaynes statement

The statement: "For example, Edwin T. Jaynes has published an argument [a reference here would be useful] based on Lie groups that if one is so uncertain about the value of the aforementioned proportion p that one knows only that at least one voter will vote for Kerry and at least one will not, then the conditional probability distribution of p given one's state of ignorance is the uniform distribution on the interval [0, 1]." seems highly improbable, unless Jaynes posthumously thought that the electorate of the United States was infinite. --Henrygb 21:51, 19 Feb 2005 (UTC)

I think I know what's being referred to here. Jaynes wrote a paper, "Prior Probabilities," [IEEE Transactions on Systems Science and Cybernetics, SSC-4, Sept. 1968, 227-241], which is reprinted in E. T. Jaynes: Papers on Probability, Statistics and Statistical Physics, Dordrecht, Holland: Reidel Publishing Company (1983), pp. 116-130. On p. 128 of my copy (corresponding to p. 239 of the IEEE paper, I presume) Jaynes, after deriving from a group-theoretic argument the prior ${\displaystyle \theta ^{-1}(1-\theta )^{-1}}$, remarks: "The prior (60) thus accounts for the kind of inductive inferences noted in the case of the chemical, which we all make intuitively. However, once we have seen at least one success and one failure, then we know that the experiment is a true binary one, in the sense of physical possibility, and from that point on all posterior distributions (69) remain normalized, permitting definite inferences about ${\displaystyle \theta }$."

The reference to "the chemical" in this excerpt refers to Jaynes' example on the previous page, where he discusses a chemical dissolving or not dissolving in water, with the inference that it will do so reliably if it does so once; only when both cases are observed does one strongly think that the parameter might be on (0,1).

I infer from the passage that Jaynes would say that if we have one success and one failure, then for all other observations (excluding these two), the prior would be flat (after applying Bayes' theorem to these two observations using the prior displayed above).

Parenthetically, the Jeffreys prior, which many feel to be the right one in this case, is ${\displaystyle \theta ^{-1/2}(1-\theta )^{-1/2}}$--Billjefferys 18:50, 4 Apr 2005 (UTC)

I have expanded the section on uninformative priors and the references.--Bill Jefferys 18:53, 26 Apr 2005 (UTC)

I have replaced the example about voting with the actual example that Jaynes gave. The two are genuinely quite different: in Jaynes' example the variable over which the prior is defined is in itself an epistemic probability, whereas in the voting example it is a frequency of actual events. Jaynes was pretty hot on the distinction, even going as far as to call confusion between the two a logical fallacy. It must have seemed to that section's original author that one could freely translate between the two, but Jaynes gives plenty of good arguments as to why you can't, and in this case confusing the two made a nonsense of his argument. There are also some unattributed arguments in that paragraph, regarding criticism of Jaynes' prior, which I have flagged. I would be genuinely interested if anyone can track them down.

In the voting case the obvious appropriate prior is not the Jaynes/Haldane prior, the Jeffreys prior, nor even the uniform prior; the obvious approach is simply to assume that each voter has a 50% probability of voting for each candidate, independently of the other voters. This results in a prior proportional to ${\displaystyle p^{N}(1-p)^{N}}$, where ${\displaystyle N}$ is the population of the United States. This prior is heavily centred around ${\displaystyle p=0.5}$.

Nathaniel Virgo (talk) 10:24, 29 June 2010 (UTC)
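As a quick numerical aside (an editor's sketch, not part of the original comment): the "fair-coin voter" prior suggested above, proportional to ${\displaystyle p^{N}(1-p)^{N}}$, is an unnormalized Beta(N+1, N+1) density, and its standard deviation shrinks like ${\displaystyle 1/\sqrt{8N}}$, so for an electorate of hundreds of millions it is indeed extremely concentrated at p = 0.5:

```python
import math

# Sketch: the prior proportional to p^N (1-p)^N is a Beta(N+1, N+1)
# distribution; its spread collapses toward p = 0.5 as N grows.

def beta_mean_std(a: float, b: float) -> tuple[float, float]:
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

N = 300_000_000  # rough US population; purely illustrative
mean, std = beta_mean_std(N + 1, N + 1)
print(mean, std)  # mean is 0.5; std is on the order of 2e-5
```

This illustrates why such a prior makes landslide outcomes essentially impossible a priori, which is part of why the choice of prior here is contentious.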

## Exponents reversed?

In the article, the Jeffrey's prior for a binomial proportion is given as ${\displaystyle p^{1/2}(1-p)^{1/2}}$. However, a number of other sources on the internet give a Beta(0.5,0.5) distribution as the prior. But this corresponds to ${\displaystyle p^{-1/2}(1-p)^{-1/2}}$. Similarly, my reading leads me to believe that Jaynes' prior would be a Beta(2,2) distribution, corresponding to ${\displaystyle p^{1}(1-p)^{1}}$, rather than the negative exponent.

Is it standard to give a prior in inverted form without the constants, or is there some convention I am unaware of? If so, perhaps it would be good to include it in the page. As a novice, I am also puzzled by the introduction of priors such as ${\displaystyle p^{-1}(1-p)^{-1}}$, for which the integral over (0,1) doesn't even exist.--TheKro 12:50, 13 October 2006 (UTC)

You're right, the exponents should be ${\displaystyle -1/2}$ in the case of the Jeffreys prior (note, no apostrophe, it's not Jeffrey's). The Jaynes prior is correct; it is an improper prior.
I have corrected the article. Bill Jefferys 14:41, 13 October 2006 (UTC)
${\displaystyle p^{-1}(1-p)^{-1}}$ was not original to Jaynes, but that was the form he used. In general the normalizing constant is not vital, as explained in the first part of the improper priors section, since it is implicit for proper priors and meaningless for improper ones.--Henrygb 09:13, 14 October 2006 (UTC)
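A quick check of the two claims above (an editor's sketch, not part of the thread): the Jeffreys prior ${\displaystyle p^{-1/2}(1-p)^{-1/2}}$ is the unnormalized Beta(1/2, 1/2) density with finite normalizing constant ${\displaystyle B(1/2,1/2)=\pi }$, whereas the Haldane/Jaynes prior ${\displaystyle p^{-1}(1-p)^{-1}}$ has no finite integral over (0,1):

```python
import math

# The Jeffreys prior p^(-1/2) (1-p)^(-1/2) is proper: its normalizer is
# B(1/2, 1/2) = Gamma(1/2)^2 / Gamma(1) = pi.
jeffreys_norm = math.gamma(0.5) ** 2 / math.gamma(1.0)
print(jeffreys_norm)  # pi ≈ 3.14159...

# The Haldane prior p^(-1) (1-p)^(-1) is improper: its integral over
# (eps, 1 - eps) is 2*ln((1 - eps)/eps), which diverges as eps -> 0.
for eps in (1e-2, 1e-4, 1e-8):
    print(2 * math.log((1 - eps) / eps))  # grows without bound
```

This is exactly the sense in which the missing normalizing constant is "implicit for proper priors and meaningless for improper ones."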

## a priori

Hello all, I found this article after spending quite some time working on the article a priori (statistics). I'm thinking that article should probably be integrated into this one, what do people think? The general article on a priori and a priori (math modeling) are both in need of work, and I thought I would engage some editors from this article in the work. Really, the math modeling article should be integrated as well. jugander (t) 22:02, 14 October 2006 (UTC)

Yes, I agree; it would make for a more complete view of the subject. Alfaisanomega (talk) 10:28, 7 December 2010 (UTC)

## Haldane prior

"The Haldane prior has been criticized on the grounds that it yields an improper posterior distribution that puts 100% of the probability content at either p = 0 or at p = 1 if a finite sample of voters all favor the same candidate, even though mathematically the posterior probability is simply not defined and thus we cannot even speak of a probability content."

Shouldn't it be "puts 100% of the probability content NEAR either p = 0 or p = 1" because you get a continuous distribution and {0,1} has measure zero? —Preceding unsigned comment added by JumpDiscont (talkcontribs) 17:17, 12 October 2009 (UTC)

Such a posterior puts infinite measure on any set of the form (0, ε) no matter how small ε is, and finite measure on any set bounded away from 0. Neither "near" nor "at" really captures this. Michael Hardy (talk) 18:29, 12 October 2009 (UTC)
Thanks to whoever is contributing to this article, it has helped me understand a lot. I'm still confused on this point about the Haldane prior. You can't integrate it from zero (because as you said it would be infinite) - so how can it make sense? At least in the other distributions, I can integrate from zero to, say, .7, and I get the probability that p < .7. What would be the probability that p < .7 in this case? I think the paragraph above is trying to address it, but it's still baffling me. Maxsklar (talk) 18:38, 17 July 2010 (UTC)
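Regarding the question above, a sketch of the point Jaynes makes (an editor's aside, not a reply in the thread): the Haldane prior alone cannot answer "P(p < 0.7)", but once at least one success and one failure are observed the posterior becomes the proper Beta(s, f) distribution (for s successes and f failures), and such probabilities are well defined:

```python
import math

# Haldane prior p^(-1) (1-p)^(-1) times likelihood p^s (1-p)^f gives a
# posterior proportional to p^(s-1) (1-p)^(f-1), i.e. Beta(s, f),
# which is proper whenever s >= 1 and f >= 1.

def beta_posterior_cdf(s: int, f: int, x: float, steps: int = 100_000) -> float:
    """P(p < x) under a Beta(s, f) posterior, by midpoint-rule integration."""
    norm = math.gamma(s + f) / (math.gamma(s) * math.gamma(f))
    h = x / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * h  # midpoints avoid the endpoint singularities
        total += p ** (s - 1) * (1 - p) ** (f - 1) * h
    return norm * total

# e.g. 3 successes and 2 failures -> posterior Beta(3, 2)
print(beta_posterior_cdf(3, 2, 0.7))  # ≈ 0.6517
```

Before both outcomes have been seen, the "posterior" is still improper, which is why no probability content can be assigned; that is the situation the flagged article sentence is describing.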

## Acronym

I've found the APP acronym for a priori probability in some works, but I can't find a reference/source for this; sometimes the AAP is used instead. For example, looking for "a priori probability + app" in Google shows differences in usage. What do you think? Alfaisanomega (talk) 10:28, 7 December 2010 (UTC)

I work with Bayesian statistics all the time, but I don't recall ever coming across the acronym "APP." Given that priori and posteriori both start with "p," it seems like it would be a fairly confusing acronym. —Quantling (talk | contribs) 13:31, 7 December 2010 (UTC)

## Diffuse Prior

The term 'diffuse prior' links to this page but does not appear in the article. I think this is the same idea as an uninformative prior, however because I'm not 100% sure I do not want to edit the article. Would someone who knows more like to confirm or deny this? — Preceding unsigned comment added by Mgwalker (talkcontribs) 01:05, 18 October 2011 (UTC)

## What is an "uncertain quantity"?

Seems like a weasel word, replace by "unknown quantity"? Is it opposed to a "certain quantity"? The word uncertainty is used a lot in Bayesian statistics, but it's not always illuminating. Biker333 (talk) 11:14, 10 March 2013 (UTC)

## Improper priors

This section states that the beta prior with parameters (1,1) - which is uniform - is improper. This is wrong. The distribution has finite support and is finite everywhere, so it can be integrated. It is in any case a beta distribution, which can always be normalized. — Preceding unsigned comment added by 92.74.64.12 (talk) 06:12, 7 July 2016 (UTC)

Removed reference to beta(1,1) distribution. – Jt512 (talk) 20:12, 12 March 2017 (UTC)

The paragraph starts with: "If Bayes' theorem is written as ..."

It seems to me that the following formula holds only if ${\displaystyle \sum _{j}P(A_{j})=1}$ and the events ${\displaystyle A_{j}}$ are mutually exclusive, because only then do we have ${\displaystyle P(B)=\sum _{j}P(B\,\&\,A_{j})}$. Am I correct? If yes, then I think that this clarification should be added to the paragraph. — Preceding unsigned comment added by 194.126.102.10 (talk) 09:22, 7 March 2014 (UTC)

Preceding comment was by me. Because no-one commented, I added the change to the document. See also https://en.wikipedia.org/wiki/Bayes%27_theorem#Extended_form. — Preceding unsigned comment added by RMasta (talkcontribs) 07:47, 10 March 2014 (UTC)
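The condition discussed above can be sanity-checked numerically (an editor's sketch with a toy example): the denominator ${\displaystyle \sum _{j}P(B\mid A_{j})P(A_{j})}$ equals ${\displaystyle P(B)}$ exactly when the ${\displaystyle A_{j}}$ form a partition (mutually exclusive and summing to 1):

```python
# Toy example: a fair six-sided die. The A_j below partition the sample
# space, so the law of total probability must reproduce P(B) exactly.
outcomes = range(1, 7)
P = 1 / 6  # each outcome equally likely

partition = [{1, 2}, {3, 4}, {5, 6}]  # A_1, A_2, A_3: a genuine partition
B = {2, 4, 6}                         # event "roll is even"

p_B_direct = sum(P for o in outcomes if o in B)
p_B_total = 0.0
for A in partition:
    p_A = sum(P for o in outcomes if o in A)
    p_B_given_A = sum(P for o in outcomes if o in A and o in B) / p_A
    p_B_total += p_B_given_A * p_A

print(p_B_direct, p_B_total)  # both 0.5
```

If the ${\displaystyle A_{j}}$ overlapped or failed to cover the sample space, the two quantities would generally differ, which is exactly the clarification proposed above.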

## Problems with first paragraph

The first paragraph said:

p is the probability distribution that would express one's uncertainty about p before some evidence is taken into account.

It's circular to say that p expresses one's uncertainty about p. In fact p expresses one's beliefs (which are typically uncertain) about some situation. So, I'll change this to

p is the probability distribution that would express one's beliefs about a situation before some evidence is taken into account.

Feel free to improve it, but let's not say p expresses our uncertainty about p.

John Baez (talk) 18:10, 25 March 2015 (UTC)

## Improper posterior

The Improper prior section states: "[...] However, the posterior distribution need not be a proper distribution if the prior is improper. This is clear from the case where event B is independent of all of the Aj." I do not see how this makes the posterior improper, since the constant term that multiplies the priors is still in both the numerator and the denominator of the posterior. Can this be further elaborated on?
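One possible reading (an editor's sketch, not sourced from the article): if B is independent of every A_i, the likelihood is a constant that cancels from Bayes' theorem, so the posterior is proportional to the prior itself; when the prior is improper, its normalizing sum or integral diverges, and the posterior cannot be normalized either:

```latex
% If B is independent of each A_i, then P(B \mid A_i) = P(B), so
P(A_i \mid B)
  = \frac{P(B \mid A_i)\,P(A_i)}{\sum_j P(B \mid A_j)\,P(A_j)}
  = \frac{P(B)\,P(A_i)}{P(B)\,\sum_j P(A_j)}
  \propto P(A_i).
% For an improper prior, \sum_j P(A_j) (or the corresponding integral)
% diverges, so the posterior inherits the impropriety of the prior.
```

In other words, the constant does cancel, but what remains is the unnormalizable prior itself, which is why the posterior is improper in this case.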