Talk:Conjugate prior
WikiProject Statistics (Start-class, Low-importance)
WikiProject Mathematics (Start-class, Low-priority)
Untitled
You may want to see some of the pages on Empirical Bayes, the Beta-Binomial Model, and Bayesian Linear Regression. Charlesmartin14 23:43, 19 October 2006 (UTC).
This article and http://en.wikipedia.org/wiki/Poisson_distribution#Bayesian_inference disagree about the hyperparameters of the posterior. —Preceding unsigned comment added by 98.202.187.2 (talk) 00:57, 12 May 2009 (UTC)
- This was my comment last night; sorry I didn't sign it. I've just corrected this on the page. Bazugb07 (talk) 14:32, 12 May 2009 (UTC)
Could someone fill in the table for multivariate normals and Pareto? 128.114.60.100 06:21, 21 February 2007 (UTC)
It would be nice to actually state which parameters mean what, since the naming in the table does not correspond to the naming on the pages for the corresponding distributions (at the moment I have a problem figuring out which of the hyperparameters of the prior for the normal (variance and mean) belong to the inverse-gamma distribution and which to the normal). —Preceding unsigned comment added by 129.26.160.2 (talk) 10:44, 14 September 2007 (UTC)
For the Gamma likelihood (known shape α) with a gamma prior over the rate parameter, the posterior parameters are α0 + nα, β0 + ∑x_i, for any α. This is in the Fink reference. Paulpeeling (talk) 11:53, 24 May 2008 (UTC)
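For illustration, a minimal Python sketch (all numbers invented, function and variable names mine) that checks this update numerically by comparing a grid-normalized posterior against the claimed Gamma posterior:

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

# x_i ~ Gamma(shape=alpha, rate=3), with a Gamma(a0, rate b0) prior on the rate.
alpha, a0, b0 = 2.0, 1.5, 2.0
rng = np.random.default_rng(0)
x = rng.gamma(shape=alpha, scale=1/3.0, size=40)
n = len(x)

# Claimed conjugate update (rate parameterization): Gamma(a0 + n*alpha, b0 + sum(x)).
a_post, b_post = a0 + n * alpha, b0 + x.sum()

# Brute-force check: prior * likelihood on a grid, normalized, vs. the claimed posterior.
beta = np.linspace(1e-3, 10, 4000)
log_post = stats.gamma.logpdf(beta, a0, scale=1/b0)          # log prior on the rate
for xi in x:
    log_post += stats.gamma.logpdf(xi, alpha, scale=1/beta)  # log-likelihood terms
post = np.exp(log_post - log_post.max())
post /= np.trapz(post, beta)

print(np.max(np.abs(post - stats.gamma.pdf(beta, a_post, scale=1/b_post))))  # ~0
</syntaxhighlight>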
May want to consider splitting the tables into scalar and multivariate conjugate distributions.
Changed "assuming dependence" (under normal with no known parameters) to "assuming exchangability". "Dependence" is wrong; "independence" is better, but since technically that should be "independence, conditional on parameters", I replaced it with the usual "exchangability" for brevity. 128.59.111.72 (talk) 00:48, 18 October 2008 (UTC)
- Wasn't the "dependence" referring to dependence among the parameters, not the data? --128.187.80.2 (talk) 23:00, 30 March 2009 (UTC)
Family of distributions
How does one tell whether two distributions are conjugate priors? What distinguishes "families"?
Incorrect posterior parameters
Has anyone else noticed that the posterior parameters are wrong? At least according to (DeGroot, 1970), the multivariate normal distribution posterior in terms of precision is listed incorrectly: it should be what the multivariate normal distribution in terms of the covariance matrix is listed as in the table. I don't really have the time to make these changes right now or to check any of the other posterior parameters for accuracy, but someone needs to double-check these tables. Maybe I'll do it when I'm not so busy. Also, the Fink (1995) article disagrees with DeGroot on a number of points, so I question its legitimacy, given that the latter is published work and the former is an ongoing report. Maybe it should be removed as a source? DeverLite (talk) 23:22, 8 January 2010 (UTC)
I just implemented the multivariate Gaussian with the Normal-Wishart conjugate prior according to the article and found that the posterior does not integrate to one. I corrected the posterior distribution in that case, but the others probably also need to be checked. — Preceding unsigned comment added by 169.229.222.176 (talk) 01:55, 20 August 2012 (UTC)
To prevent confusion, it should be made clear that the Student's t distribution specified as the posterior predictive for the multivariate normal cases is a multivariate Student's t distribution parametrized by the precision matrix, not by the covariance as in the Wikipedia article on the multivariate Student's t distribution. — Preceding unsigned comment added by 169.229.222.176 (talk) 02:03, 20 August 2012 (UTC)
I was just looking at it and it looked wrong to me. The precision-based posterior parameters should have no inversions (as can be seen in the univariate case, for example). I can fix that according to DeGroot's formulation. --Olethros (talk) 15:35, 14 September 2010 (UTC)
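Since this thread is about the precision form, here is a short sketch (my own notation, not taken from the article or from DeGroot) of the multivariate update for the mean with known precision, illustrating that the hyperparameter updates themselves involve no matrix inversions:

<syntaxhighlight lang="python">
import numpy as np

def mvn_mean_update(mu0, Lam0, Lam, X):
    """Prior mu ~ N(mu0, Lam0^-1); likelihood x_i ~ N(mu, Lam^-1) with Lam known.
    The precision update Lam0 + n*Lam is inversion-free; a single linear solve
    recovers the posterior mean."""
    n = X.shape[0]
    Lam_n = Lam0 + n * Lam
    mu_n = np.linalg.solve(Lam_n, Lam0 @ mu0 + Lam @ X.sum(axis=0))
    return mu_n, Lam_n
</syntaxhighlight>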
Marginal distributions
I think it would be useful to augment the tables with the "marginal distribution" as well. The drawback here is the tables will widen, and they are already pretty dense. Thoughts? --128.187.80.2 (talk) 23:00, 30 March 2009 (UTC)
- I am not clear what you mean by marginal distribution here ... if it is what I first thought (marginal dist of the observations) then these marginal distributions might find a better and useful place under an article named like compound distributions. Or is it the marginal distribution of new observations conditional on the existing observations marginalised over the parameters (ie predictive distributions)? Melcombe (talk) 08:59, 31 March 2009 (UTC)
- I was referring to the marginal distribution of the observations (not the predictive distribution). I often use this page as a reference guide (much simpler than pulling out my copy of Gelman et al.) and at times I have wanted to know the marginal distribution of the data. Granted, many books don't include this information, but it would be useful. As an example, take the Poisson-Gamma model (when the gamma is parameterized by rate). This information is largely contained in Negative binomial#Gamma-Poisson_mixture, but that article does not specifically mention that it is the marginal distribution of the data in the Bayesian setting. Plus, it would be more convenient to have the information in one place. Your proposal to put it on a dedicated page may be a reasonable compromise, since the tables are already large and this information is used much less frequently. --128.187.80.2 (talk) 17:27, 1 April 2009 (UTC)
- I thought that giving such marginal distributions would be unusual in a Bayesian context, but I see that Bernardo & Smith do include them in the table in their book .. but they do this by having a separate list of results for each distribution/model, which would be a drastic rearrangement of what is here. An article on compound distributions does seem to be needed for its own sake. Melcombe (talk) 13:21, 2 April 2009 (UTC)
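As a concrete instance of the Poisson-Gamma case mentioned above, a hedged Monte Carlo sketch (parameters invented) confirming that the marginal of the data is negative binomial:

<syntaxhighlight lang="python">
import numpy as np
from scipy import stats

# x | lam ~ Poisson(lam), lam ~ Gamma(a, rate b)  =>  marginally x ~ NB(a, p=b/(b+1)).
a, b = 3.0, 2.0
rng = np.random.default_rng(1)
lam = rng.gamma(shape=a, scale=1/b, size=200_000)
x = rng.poisson(lam)

ks = np.arange(10)
empirical = np.array([(x == k).mean() for k in ks])
analytic = stats.nbinom.pmf(ks, a, b / (b + 1))
print(np.abs(empirical - analytic).max())  # small sampling error only
</syntaxhighlight>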
Poisson-Gamma
It keeps getting changed back and forth, but I have the hyperparameters as: \alpha + \sum_{i=1}^n x_i,\ \beta + n\! Please add discussion here if this is incorrect before changing it! —Preceding unsigned comment added by Occawen (talk • contribs) 05:06, 6 December 2009 (UTC)
-- There is certainly a problem as it currently stands. The Wikipedia page on the gamma distribution explicitly gives both forms, in particular k = α, β = 1/θ. Hence the update rules must be consistent with this notation. I have corrected this for now. —Preceding unsigned comment added by 129.215.197.80 (talk) 15:23, 20 January 2010 (UTC)
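For the record, a one-line sketch of the update being discussed, under the rate parameterization (the scale form follows from θ = 1/β):

<syntaxhighlight lang="python">
def poisson_gamma_update(alpha, beta, x):
    """Gamma(alpha, rate beta) prior on a Poisson rate; returns the posterior
    hyperparameters. Under the scale parameterization the second one becomes
    theta / (1 + len(x) * theta)."""
    return alpha + sum(x), beta + len(x)
</syntaxhighlight>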
Most unintelligible article on Wikipedia
Just a cheeky comment to say that this is the hardest article to understand of all those I've read so far. It assumes a lot of background knowledge of statistics. Maybe a real-world analogy or example would help clarify what a conjugate prior is. Abstractions are valuable but people need concrete examples if they want to jump in half-way through the course. I'm really keen to understand the relationship between the beta distribution and the binomial distribution, but this article (and the ones it links to) just leave me befuddled. 111.69.251.147 (talk) 00:39, 21 June 2010 (UTC)
- You haven't read enough of Wikipedia if you think this is its most unintelligible! However, I agree that it's baffling. I came here with no prior knowledge of what a conjugate prior is, following a link from a page that mentioned the beta distribution being the conjugate of various other distributions. I find myself reading a page that tells me a conjugate prior is (in effect) a prior that changes hyperparameters but not form when given new data; this does not tell me how this prior's form is conjugate *to* any other distribution, which was what I was trying to glean. Lurking in the back is the fact that the variate being modelled has a distribution, let's call it X; when the prior for its parameter has distribution Y, then data about the primary variate refines our knowledge of the parameter to a posterior of the same form as the prior Y; in such a case, I'm guessing "the form of Y" is what's being described as "conjugate to" (possibly the form of) X; but I don't actually see the text **saying that**, so I'm left wondering whether I've guessed wrong. An early remark about the Gaussian seemed to be saying that, but it was hard to be sure, because it was described as self-conjugate and similar phrasing was used to describe the prior and posterior as conjugate, so I was left in doubt as to whether X = Gauss has Y = Gauss work. I lost hope of finding any confirmation or correction for my guess as the subsequent page descended into unintelligible gibberish. (It might not seem like that to its author, but that's the problem of knowing what you're talking about and only being used to talking about it with others who already understand it: as you talk about it, you say the things you think when you think about it, and you can't see that, although it all fits nicely together within any mind that understands it already, it *conveys nothing* to anyone who doesn't. Such writing will satisfy examiners or your peers that you understand the subject matter, but it won't teach a student anything.) -- Eddy 84.215.30.244 (talk) 06:14, 30 July 2015 (UTC)
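Since the beta/binomial relationship keeps coming up in this thread, a minimal sketch of what the conjugacy buys you (numbers invented for illustration): the Beta prior and Beta posterior differ only in their hyperparameters.

<syntaxhighlight lang="python">
from scipy import stats

def beta_binomial_update(a, b, successes, failures):
    # Beta(a, b) prior on the success probability q; binomial data
    # just add counts, so the posterior is again a Beta.
    return a + successes, b + failures

a_post, b_post = beta_binomial_update(1.0, 1.0, successes=7, failures=3)
print(stats.beta(a_post, b_post).mean())  # posterior mean 8/12 ~ 0.667
</syntaxhighlight>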
Another less cheeky comment.
Paragraphs 1 through to the contents box: great. The rest: incomprehensible. I have no doubt that if you already know the content it is probably superb, but I saw a long trail of introduced jargon going in no particular direction. I was looking for the "what" and some of the "why do this", but I did not find it here. Many thanks for the opening paragraphs. Yes, I may be asking for you to be the first ever to actually explain Bayesian (conjugate) priors in an intuitive way. [not logged in] — Preceding unsigned comment added by 131.203.13.81 (talk) 20:13, 1 August 2011 (UTC)
So, working through the example (thanks for including one, it being my only hope of working out what it all means): "If we sample this random..." f? Ah, not the "f" of a few lines above. x? Ah, "s, f", that's x; and "x", well, that's the value for q = x, i.e. the theta from a few lines above. I'm rewriting it on my own page just to get the example clear. — Preceding unsigned comment added by 131.203.13.81 (talk) 03:12, 10 August 2011 (UTC)
This article
wat
Simple English sans maths in the intro would be great. —Preceding unsigned comment added by 78.101.145.17 (talk) 14:48, 24 March 2011 (UTC)
Broken link
The external link is broken. Should I remove it? — Preceding unsigned comment added by 163.1.211.163 (talk) 17:38, 12 December 2011 (UTC)
Wrong posterior
Some of the posteriors are wrong. I just discovered one: for "Normal with known precision τ" (parameter μ, the mean), the posterior variance is (τ0 + nτ)^−1. — Preceding unsigned comment added by 173.19.34.157 (talk) 04:09, 15 May 2012 (UTC)
I just discovered another, for "Normal with unknown mean and variance": the second hyperparameter should be ν + n. — Preceding unsigned comment added by 193.48.2.5 (talk) 10:56, 17 January 2019 (UTC)
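A small sketch of the univariate known-precision case from the first report above (names mine, values illustrative), with the posterior variance (τ0 + nτ)^−1 made explicit:

<syntaxhighlight lang="python">
import numpy as np

def normal_known_precision_update(mu0, tau0, tau, x):
    """Prior mu ~ N(mu0, 1/tau0); likelihood x_i ~ N(mu, 1/tau), tau known.
    Posterior precision is tau0 + n*tau, i.e. variance (tau0 + n*tau)**-1."""
    x = np.asarray(x)
    n = x.size
    tau_n = tau0 + n * tau
    mu_n = (tau0 * mu0 + tau * x.sum()) / tau_n
    return mu_n, tau_n
</syntaxhighlight>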
That Table
Yeah... That table, while informative, is not formatted very well. It wasn't clear at first what the Posterior Hyperparameters column represented, or what any of the variables meant in the Posterior Predictive column. — Preceding unsigned comment added by 129.93.5.131 (talk) 05:00, 10 December 2013 (UTC)
Is there some reference for the log-normal to normal conversion? It seems strange that the estimates would be optimal after exponentiation. — Preceding unsigned comment added by Amrozack (talk • contribs) 21:50, 17 June 2020 (UTC)
Assessment comment
The comment(s) below were originally left at Talk:Conjugate prior/Comments, and are posted here for posterity. Following several discussions in past years, these subpages are now deprecated. The comments may be irrelevant or outdated; if so, please feel free to remove this section.
Please add useful comments here--Cronholm144 09:02, 24 May 2007 (UTC)
2. The fourth reference should be removed because of serious errors; it is a reference, and the errors are in the material for which it is supposed to be a reference: Fink, D. 1995. A Compendium of Conjugate Priors. http://www.people.cornell.edu/pages/df36/CONJINTRnew%20TEX.pdf At least two of the posterior hyperparameters in that reference are incorrect. The first, p. 11, is for a gamma prior for the rate of a Poisson. The prior is parameterized with beta as a scale (eqn 31), but the posterior scale hyperparameter is given as beta/(1 + n), which is incorrect; it should be beta/(1 + beta*n). The second, p. 18, is for a gamma prior for the precision of a normal. The prior is parameterized with beta as a scale hyperparameter (eqn 62), but the posterior scale hyperparameter is given as beta + n (eqn 63), which is correct only when beta is a rate, not a scale. Dstivers 21:57, 25 September 2007 (UTC)
Last edited at 21:57, 25 September 2007 (UTC). Substituted at 19:53, 1 May 2016 (UTC)
Any appetite for a "practical application" section?
I've recently used Bayesian conjugate priors for computing the probability that there will be at least one rental car available in my area on any given day. Would there be any appetite for a section showing how one can use the table to compute something like this? If so, I would write that up in the next few days. I figure it might help make the page a little more understandable. — Preceding unsigned comment added by Rasmusbergpalm (talk • contribs) 08:59, 11 February 2020 (UTC)
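For what it's worth, the calculation presumably looks something like this sketch (all counts and prior values invented; real data would replace them): a Gamma posterior on the daily rate, and a negative binomial posterior predictive for tomorrow's count.

<syntaxhighlight lang="python">
from scipy import stats

counts = [0, 2, 1, 3, 0, 1, 2]              # hypothetical daily available-car counts
a0, b0 = 1.0, 1.0                           # Gamma(a0, rate b0) prior on the rate
a, b = a0 + sum(counts), b0 + len(counts)   # conjugate Poisson-Gamma update

# Posterior predictive is NB(a, p=b/(b+1)); "at least one car" is 1 - P(0).
print(1 - stats.nbinom.pmf(0, a, b / (b + 1)))
</syntaxhighlight>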
Beta note
I am not persuaded that saying the interpretation of a Beta prior/posterior distribution with hyperparameters α and β is α − 1 successes and β − 1 failures, with the note
The exact interpretation of the parameters of a beta distribution in terms of number of successes and failures depends on what function is used to extract a point estimate from the distribution. The mode of a beta distribution is (α − 1)/(α + β − 2), which corresponds to α − 1 successes and β − 1 failures; but the mean is α/(α + β), which corresponds to α successes and β failures. The use of α − 1 and β − 1 has the advantage that a uniform prior corresponds to 0 successes and 0 failures, but the use of α and β is somewhat more convenient mathematically and also corresponds well with the fact that Bayesians generally prefer to use the posterior mean rather than the posterior mode as a point estimate. The same issues apply to the Dirichlet distribution.
My problem is that this becomes a nonsense when used with the Jeffreys prior Beta(1/2, 1/2): the fractional values might be explainable away, but the negative values really cannot. I would much prefer saying the interpretation of α and β is α successes and β failures, with a note like the following - 11:03, 23 June 2020 (UTC)
The exact interpretation of the parameters of a beta distribution in terms of number of successes and failures depends on what function is used to extract a point estimate from the distribution. The mean of a beta distribution is α/(α + β), which corresponds to α successes and β failures, while the mode is (α − 1)/(α + β − 2), which corresponds to α − 1 successes and β − 1 failures. Bayesians generally prefer to use the posterior mean rather than the posterior mode as a point estimate, justified by a quadratic loss function, and the use of α and β is more convenient mathematically, while the use of α − 1 and β − 1 has the advantage that a uniform prior corresponds to 0 successes and 0 failures. The same issues apply to the Dirichlet distribution.
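To make the two readings concrete, a tiny sketch using the formulas from the proposed note (function names mine):

<syntaxhighlight lang="python">
def beta_mean(a, b):   # reads as a successes, b failures
    return a / (a + b)

def beta_mode(a, b):   # reads as a-1 successes, b-1 failures (valid for a, b > 1)
    return (a - 1) / (a + b - 2)

print(beta_mean(0.5, 0.5))  # Jeffreys prior: mean 0.5, i.e. half a "success"
# Under the a-1, b-1 reading the Jeffreys prior encodes -1/2 "successes":
# the negative pseudo-count objected to above.
</syntaxhighlight>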
Looks like an error in the posterior predictives
For a Poisson likelihood and gamma prior, the posterior predictive is given as NB(α, θ/(1 + θ)) when specifying the gamma distribution using the scale θ, and as NB(α, β/(1 + β)) when specifying it using the rate β. Since β = 1/θ, I do not see how both equations can be correct.
Furthermore, there are a number of conventions for the negative binomial (NB), and the table does not specify which one is used.
The root of all these problems is that the equations are unsourced. Every single equation should be sourced, or deleted. Adpete (talk) 01:38, 26 August 2020 (UTC)
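A quick numeric check of the flagged discrepancy, using P(x = 0), which has the closed form (β/(β + 1))^α for the Gamma-Poisson mixture (rate β; values invented):

<syntaxhighlight lang="python">
from scipy import stats

a, theta = 2.0, 0.5
beta = 1 / theta
exact = (beta / (beta + 1)) ** a                    # mixture P(x = 0)
print(exact)
print(stats.nbinom.pmf(0, a, beta / (beta + 1)))    # rate form: matches
print(stats.nbinom.pmf(0, a, theta / (theta + 1)))  # scale form as stated: does not
</syntaxhighlight>

(Under SciPy's NB convention, at least; as noted above, a different NB convention could make the scale form come out right, which is exactly why the convention should be stated and sourced.)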
Mean + variance Gaussian chain
These notes, https://people.eecs.berkeley.edu/~jordan/courses/260-spring10/lectures/lecture5.pdf, present a chained prior for the Gaussian when neither the mean nor the variance is fixed; would it be a good idea to put this in the table too? Thank you.
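If it were added, the update from that chained (Normal-Gamma) prior amounts to something like this sketch (the standard result as I understand it, names mine; worth checking against the notes and a published source before putting it in the table):

<syntaxhighlight lang="python">
import numpy as np

def normal_gamma_update(mu0, k0, a0, b0, x):
    """Chained prior: tau ~ Gamma(a0, rate b0), mu | tau ~ N(mu0, 1/(k0*tau)).
    Returns the posterior hyperparameters after observing x_1..x_n."""
    x = np.asarray(x)
    n, xbar = x.size, x.mean()
    mu_n = (k0 * mu0 + n * xbar) / (k0 + n)
    k_n = k0 + n
    a_n = a0 + n / 2
    b_n = (b0 + 0.5 * ((x - xbar) ** 2).sum()
           + k0 * n * (xbar - mu0) ** 2 / (2 * (k0 + n)))
    return mu_n, k_n, a_n, b_n
</syntaxhighlight>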