Talk:Conjugate prior

From Wikipedia, the free encyclopedia
WikiProject Statistics (Rated Start-class, Low-importance)

This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

WikiProject Mathematics (Rated Start-class, Low-importance)

This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Field: Probability and statistics


You may want to see some of the pages on Empirical Bayes, the Beta-Binomial Model, and Bayesian Linear Regression. Charlesmartin14 23:43, 19 October 2006 (UTC)

This article and disagree about the hyperparameters of the posterior. —Preceding unsigned comment added by (talk) 00:57, 12 May 2009 (UTC)

This was my comment last night, sorry I didn't sign it, I've just corrected this on the page. Bazugb07 (talk) 14:32, 12 May 2009 (UTC)

Could someone fill in the table for multivariate normals and Pareto? 06:21, 21 February 2007 (UTC)

It would be nice to actually state which parameters mean what, since the naming in the table does not correspond to the naming on the pages for the corresponding distributions (at the moment I have a problem figuring out which of the hyperparameters for the prior for the normal (variance and mean) belong to the inverse gamma distribution and which to the normal). —Preceding unsigned comment added by (talk) 10:44, 14 September 2007 (UTC)

For the Gamma likelihood with prior over the rate parameter, the posterior parameters are for any . This is in the Fink reference. Paulpeeling (talk) 11:53, 24 May 2008 (UTC)

May want to consider splitting the tables into scalar and multivariate conjugate distributions.

Changed "assuming dependence" (under normal with no known parameters) to "assuming exchangeability". "Dependence" is wrong; "independence" is better, but since technically that should be "independence, conditional on parameters", I replaced it with the usual "exchangeability" for brevity. (talk) 00:48, 18 October 2008 (UTC)

Wasn't the "dependence" referring to dependence among the parameters, not the data? -- (talk) 23:00, 30 March 2009 (UTC)

Family of distributions[edit]

How does one tell whether two distributions are conjugate priors? What distinguishes "families"?

Incorrect posterior parameters[edit]

Has anyone else noticed the posterior parameters are wrong? At least according to (DeGroot, 1970), the multivariate normal distribution posterior in terms of precision is listed incorrectly: it should be what the multivariate normal distribution in terms of the covariance matrix is listed as on the table. I don't really have the time to make these changes right now or check any of the other posterior parameters for accuracy, but someone needs to double-check these tables. Maybe I'll do it when I'm not so busy. Also, the Fink (1995) article disagrees with DeGroot on a number of points, so I question its legitimacy, given that the latter is published work and the former is an ongoing report. Maybe it should be removed as a source? DeverLite (talk) 23:22, 8 January 2010 (UTC)

I just implemented the multivariate Gaussian with Normal-Wishart conjugate distribution according to the article and found that it does not integrate to one. I corrected the posterior distribution in that case, but the others probably also need to be corrected. — Preceding unsigned comment added by (talk) 01:55, 20 August 2012 (UTC)

To prevent confusion, it should be made clear that the Student's t distribution specified as the posterior for the multivariate normal cases is a multivariate Student's t distribution parametrized by a precision matrix, not by covariance as in the Wikipedia article on the multivariate Student's t distribution. — Preceding unsigned comment added by (talk) 02:03, 20 August 2012 (UTC)

I was just looking at it and it looked wrong to me. The precision-based posterior parameters should have no inversions (as can be seen in the univariate case, for example). I can fix that according to DeGroot's formulation. --Olethros (talk) 15:35, 14 September 2010 (UTC)

Marginal distributions[edit]

I think it would be useful to augment the tables with the "marginal distribution" as well. The drawback here is the tables will widen, and they are already pretty dense. Thoughts? -- (talk) 23:00, 30 March 2009 (UTC)

I am not clear what you mean by marginal distribution here ... if it is what I first thought (marginal dist of the observations) then these marginal distributions might find a better and useful place under an article named like compound distributions. Or is it the marginal distribution of new observations conditional on the existing observations marginalised over the parameters (ie predictive distributions)? Melcombe (talk) 08:59, 31 March 2009 (UTC)
I was referring to the marginal distribution of the observations (not the predictive distribution). I often use this page as a reference guide (much simpler than pulling out my copy of Gelman et al.) and at times I have wanted to know the marginal distribution of the data. Granted, many books don't include this information, but it would be useful. As an example, in the Poisson-Gamma model (when the gamma is parameterized by rate). This information is largely contained in Negative binomial#Gamma-Poisson_mixture but that article does not specifically mention that it is the marginal distribution of the data in the Bayesian setting. Plus, it would be more convenient to have the information in one place. Your proposal to put it on a dedicated page may be a reasonable compromise since the tables are already large and this information is used much less frequently. -- (talk) 17:27, 1 April 2009 (UTC)
I thought that giving such marginal distributions would be unusual in a Bayesian context, but I see that Bernardo & Smith do include them in the table in their book .. but they do this by a having a separate list of results for each distribution/model, which would be a drastic rearrangement of what is here. An article on compound distributions does seem to be needed for its own sake. Melcombe (talk) 13:21, 2 April 2009 (UTC)
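As a sanity check on the Poisson-Gamma marginal discussed above, here is a small pure-Python sketch (the shape and rate values are illustrative choices, not taken from the article's tables) that numerically integrates the Poisson pmf against a Gamma(shape, rate) density and compares the result with the negative binomial pmf described at Negative binomial#Gamma-Poisson_mixture:

```python
import math

def poisson_pmf(x, lam):
    return lam ** x * math.exp(-lam) / math.factorial(x)

def gamma_pdf(lam, shape, rate):
    # Gamma density parameterized by shape and rate
    return rate ** shape * lam ** (shape - 1) * math.exp(-rate * lam) / math.gamma(shape)

def marginal_pmf(x, shape, rate, upper=60.0, steps=60000):
    # numerically integrate Poisson(x | lam) * Gamma(lam | shape, rate) over lam
    h = upper / steps
    return h * sum(poisson_pmf(x, i * h) * gamma_pdf(i * h, shape, rate)
                   for i in range(1, steps + 1))

def negbin_pmf(x, shape, rate):
    # negative binomial pmf; integer shape so math.comb applies
    return (math.comb(x + shape - 1, x)
            * (rate / (rate + 1)) ** shape * (1 / (rate + 1)) ** x)

shape, rate = 3, 2.0
max_diff = max(abs(marginal_pmf(x, shape, rate) - negbin_pmf(x, shape, rate))
               for x in range(10))
```

If the marginal really is the Gamma-Poisson mixture, `max_diff` comes out at quadrature-error level, far below the size of the probabilities themselves.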


It keeps getting changed back and forth, but I have the hyperparameters as: <math>\alpha + n,\ \beta + \sum_{i=1}^n x_i</math>

-- There is certainly a problem as it currently stands. The Wikipedia page on the gamma distribution explicitly gives the two forms, in particular k = alpha and beta = 1/theta. Hence the update rules must be consistent with this notation. I have corrected this for now. —Preceding unsigned comment added by (talk) 15:23, 20 January 2010 (UTC)

Please add discussion if this is incorrect before changing it! —Preceding unsigned comment added by Occawen (talkcontribs) 05:06, 6 December 2009 (UTC)
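For what it's worth, the <math>\alpha + n,\ \beta + \sum x_i</math> update can be checked numerically. The sketch below (pure Python; the exponential likelihood and the specific numbers are my assumption, since the thread does not name the table row) verifies both that the rate-form update agrees with the k = alpha, theta = 1/beta scale form, and that prior times likelihood is proportional to the claimed posterior:

```python
import math

def gamma_pdf(lam, shape, rate):
    return rate ** shape * lam ** (shape - 1) * math.exp(-rate * lam) / math.gamma(shape)

alpha, beta = 2.0, 3.0            # prior shape and *rate* (beta = 1/theta)
data = [0.8, 1.3, 0.4, 2.1]       # hypothetical exponential observations
n, s = len(data), sum(data)

# update in rate form, as given in the thread above
post_shape, post_rate = alpha + n, beta + s

# the same update expressed in shape/scale form (k = alpha, theta = 1/beta)
theta = 1.0 / beta
post_theta = 1.0 / (1.0 / theta + s)
rate_mismatch = abs(post_rate - 1.0 / post_theta)

# grid check: prior(lam) * exponential likelihood ~ claimed posterior, up to a constant
grid = [i * 0.01 for i in range(1, 2001)]
unnorm = [gamma_pdf(l, alpha, beta) * l ** n * math.exp(-l * s) for l in grid]
post = [gamma_pdf(l, post_shape, post_rate) for l in grid]
ratios = [u / p for u, p in zip(unnorm, post)]
rel_spread = (max(ratios) - min(ratios)) / ratios[0]
```

A near-zero `rel_spread` means the unnormalized posterior and the Gamma(alpha + n, beta + sum x_i) density differ only by a normalizing constant, i.e. the update rule is self-consistent in either parameterization.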

Most unintelligible article on Wikipedia[edit]

Just a cheeky comment to say that this is the hardest article to understand of all those I've read so far. It assumes a lot of background knowledge of statistics. Maybe a real-world analogy or example would help clarify what a conjugate prior is. Abstractions are valuable but people need concrete examples if they want to jump in half-way through the course. I'm really keen to understand the relationship between the beta distribution and the binomial distribution, but this article (and the ones it links to) just leave me befuddled. (talk) 00:39, 21 June 2010 (UTC)

You haven't read enough of Wikipedia if you think this is its most unintelligible! However, I agree that it's baffling. I came here with no prior knowledge of what a conjugate prior is, following a link from a page that mentioned the beta distribution being the conjugate of various other distributions. I find myself reading a page that tells me a conjugate prior is (in effect) a likelihood function that changes hyperparameters but not form when given new data; this does not tell me how this prior's form is conjugate *to* any other distribution, which was what I was trying to glean.

Lurking in the back is the fact that the variate being modelled has a distribution, let's call it X; when the prior for its parameter has distribution Y, then data about the primary refines our knowledge of the parameter to a posterior likelihood of the same form as the prior Y; in such a case, I'm guessing "the form of Y" is what's being described as "conjugate to" (possibly the form of) X; but I don't actually see the text **saying that**, so I'm left wondering whether I've guessed wrong. An early remark about the Gaussian seemed to be saying that, but it was hard to be sure, because it was being described as self-conjugate and similar phrasing was used to describe the prior and posterior as conjugate, so I was left in doubt as to whether X = Gauss has Y = Gauss work.

I lost hope of finding any confirmation or correction for my guess as the subsequent page descended into unintelligible gibberish. (It might not seem like that to its author, but that's the problem of knowing what you're talking about and only being used to talking about it to others who already understand it: as you talk about it, you say the things you think when you think about it, and can't see that, although it all fits nicely together within any mind that understands it already, it *conveys nothing* to anyone who doesn't already understand it. Such writing will satisfy examiners or your peers that you understand the subject matter, but won't teach a student anything.) -- Eddy (talk) 06:14, 30 July 2015 (UTC)
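Since a couple of readers above ask specifically about the beta/binomial relationship, here is a minimal numerical sketch of the fact the article is trying to state (pure Python; the prior values and counts are made up for illustration): a Beta(a, b) prior updated with s successes and f failures yields a Beta(a + s, b + f) posterior, which is exactly what "the beta family is conjugate to the binomial likelihood" means.

```python
import math

def beta_pdf(t, a, b):
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * t ** (a - 1) * (1 - t) ** (b - 1)

a, b = 2.0, 2.0      # Beta prior on the coin's success probability theta
s, f = 7, 3          # observed successes and failures

grid = [i / 1000 for i in range(1, 1000)]
# prior * binomial likelihood; the binomial coefficient is a constant in theta
unnorm = [beta_pdf(t, a, b) * t ** s * (1 - t) ** f for t in grid]
# claimed conjugate posterior: Beta(a + s, b + f)
post = [beta_pdf(t, a + s, b + f) for t in grid]
ratios = [u / p for u, p in zip(unnorm, post)]
rel_spread = (max(ratios) - min(ratios)) / ratios[0]
```

`rel_spread` sitting at rounding level confirms the two curves differ only by a normalizing constant: the posterior has stayed inside the beta family, with the data simply added to the hyperparameters.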

Another less cheeky comment.[edit]

Paragraphs 1 through to the contents - great. The rest - incomprehensible. I have no doubt that if you already know the content it is probably superb, but I saw a long trail of introduced jargon going seemingly in no particular direction. I was looking for a "what" and some "why do this", but I did not find it here. Many thanks for the opening paragraphs. Yes, I may be asking for you to be the first ever to actually explain Bayesian (conjugate) priors in an intuitive way. [not logged in] — Preceding unsigned comment added by (talk) 20:13, 1 August 2011 (UTC)

So, working through the example (thanks for one, it being my only hope to work out what it all means): "If we sample this random ..." - f: ah, not the "f" of a few lines above. x: ah, "s,f" - that's x; and "x" - well, that's the value for q = x, that's theta from a few lines above. I'm rewriting it on my page just to get the example clear. — Preceding unsigned comment added by (talk) 03:12, 10 August 2011 (UTC)

This article[edit]


Simple English sans maths in the intro would be great. —Preceding unsigned comment added by (talk) 14:48, 24 March 2011 (UTC)

Broken link[edit]

The external link is broken. Should I remove it? — Preceding unsigned comment added by (talk) 17:38, 12 December 2011 (UTC)

Wrong posterior[edit]

Some of the posteriors are wrong. I just discovered one:

Normal with known precision τ, parameter μ (the mean): the posterior variance is (τ0 + nτ)^−1. — Preceding unsigned comment added by (talk) 04:09, 15 May 2012 (UTC)
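The claimed fix can be checked numerically. Below is a small pure-Python sketch (the prior, precision, and data values are invented for the check) verifying that, for a normal likelihood with known precision tau and a normal prior on mu, prior times likelihood is proportional to a normal with precision tau0 + n*tau, i.e. posterior variance (tau0 + n*tau)^-1:

```python
import math

def normal_pdf(x, mean, prec):
    # normal density parameterized by precision (inverse variance)
    return math.sqrt(prec / (2 * math.pi)) * math.exp(-0.5 * prec * (x - mean) ** 2)

mu0, tau0 = 0.0, 1.0     # prior mean and precision for mu
tau = 4.0                # known observation precision
data = [0.9, 1.4, 0.6]
n, s = len(data), sum(data)

post_prec = tau0 + n * tau                     # posterior variance = 1 / (tau0 + n*tau)
post_mean = (tau0 * mu0 + tau * s) / post_prec

grid = [-5.0 + i * 0.01 for i in range(1001)]
unnorm = [normal_pdf(m, mu0, tau0) *
          math.prod(normal_pdf(x, m, tau) for x in data) for m in grid]
post = [normal_pdf(m, post_mean, post_prec) for m in grid]
ratios = [u / p for u, p in zip(unnorm, post)]
rel_spread = (max(ratios) - min(ratios)) / ratios[0]
```

A `rel_spread` at rounding level means the unnormalized posterior matches the N(post_mean, precision tau0 + n*tau) density up to a constant, consistent with the correction above.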

That Table[edit]

Yeah... That table, while informative, is not formatted very well. It wasn't clear at first what the Posterior Hyperparameters column represented, or what any of the variables meant in the Posterior Predictive column. — Preceding unsigned comment added by (talk) 05:00, 10 December 2013 (UTC)

Assessment comment[edit]

The comment(s) below were originally left at Talk:Conjugate prior/Comments, and are posted here for posterity. Following several discussions in past years, these subpages are now deprecated. The comments may be irrelevant or outdated; if so, please feel free to remove this section.

Last edited at 21:57, 25 September 2007 (UTC). Substituted at 19:53, 1 May 2016 (UTC)