# Talk:Exponential family

WikiProject Statistics (Rated C-class, High-importance)

This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

WikiProject Mathematics (Rated C-class, Mid-importance)

This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of Mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks. Field: Probability and statistics

## Must H be a pdf?

Question: must the reference distribution H be a probability distribution, or will a positive measure do? Note that the normalization condition applies to F, not to H. I am thinking of cases where the reference "distribution" is Lebesgue measure (to wit: the Normal distribution) or counting measure. — Miguel 11:08, 2005 Apr 16 (UTC)

Certainly it is Lebesgue measure in some cases. And counting measure on positive integers -- clearly not assigning finite measure to the whole space -- in some cases. Which causes me to notice that this article is woefully deficient in examples. I'll be back.... Michael Hardy 20:36, 16 Apr 2005 (UTC)

## Article reversion

Would you mind explaining the reversion of my edits on the article on the Exponential family? — Miguel 07:14, 2005 Apr 18 (UTC)

It said:

A is important in its own right, as it the cumulant-generating function of the probability distribution of the sufficient statistic T(X) when the distribution of X is H.

You changed it to:

A is important in its own right, as it is the cumulant-generating function of the probability distribution of the sufficient statistic T(X).

The edit consisted of deleting the words "when the distribution of X is H." The statement doesn't make sense without those words. A cumulant-generating function is always a cumulant-generating function of some particular probability distribution.

Well, actually the derivatives of A(η) evaluated at η instead of at zero give you the cumulants of dF(x|η), which is what I meant. The cumulants of dH are actually irrelevant to dF, what is interesting is that the cumulants of the entire family of exponential distributions with the same dH and T are encoded in A. Miguel 09:34, 2005 Apr 19 (UTC)
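Miguel's point about where the cumulants live can be written out explicitly. This is only a sketch, using the plus-sign convention $dF(x|\eta) = e^{\eta^{\top} T(x) - A(\eta)}\, dH(x)$ (the sign conventions on this page differ between sections):

```latex
% Normalization forces  e^{A(\eta)} = \int e^{\eta^{\top} T(x)}\, dH(x).
% The moment-generating function of T(X) under dF(\cdot|\eta) is then
\operatorname{E}_\eta\!\left[e^{u^{\top} T(X)}\right]
  = \int e^{u^{\top} T(x)}\, e^{\eta^{\top} T(x) - A(\eta)}\, dH(x)
  = e^{A(\eta + u) - A(\eta)},
% so the cumulant-generating function of T(X) is
K(u) = A(\eta + u) - A(\eta),
% and its derivatives at u = 0, i.e. derivatives of A evaluated at \eta,
% give the cumulants of T(X) under dF(\cdot|\eta) -- not under dH.
```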

I see now that you also changed some other things. I haven't looked at those closely, but I now see that you changed "cdf" to "Lebesgue-Stieltjes integrator". I don't think that change makes sense either. That it is the Lebesgue-Stieltjes integrator is true, but the fact that it is the cdf is more to the point in this context. Michael Hardy 21:59, 18 Apr 2005 (UTC)

Except that, as you agree above, dH need not be a probability distribution, and hence it need not have a cdf. dH is a positive measure, and H is its integrator. I have never seen H(x) = x on the whole real line called a cdf. It is fine if you want to call it a cdf, but then you'll have to explain somewhere else that the corresponding probability distribution may be "non-normalizable", and that will raise some eyebrows (not mine, though).
You also reverted a lot of valid content about the relationship between the exponential family and information entropy, as well as a reorganization of the existing information into sections, plus placeholders for discussing estimation and testing. The two edits that so bothered you were the last of a long series spanning two days. You could have been a little more careful. The appropriate thing would have been to discuss these things on this page. Miguel 09:34, 2005 Apr 19 (UTC)

## Disputed

The current revision says that the Weibull distributions do not form an exponential family. This seems to ignore non-canonical exponential families (where the natural parameter may be transformed by another function). What am I missing? --MarkSweep (call me collect) 23:28, 14 November 2005 (UTC)

I may have lifted that from another source without checking the pdf. This is not open to interpretation. Either the Weibull distribution is exponential or it isn't according to the definition. Transforming the natural parameter is not the issue: the normal distribution's natural parameters are not the mean and variance. Miguel 09:31, 15 November 2005 (UTC)

AFAIK, it's not an exponential family according to the definition given in this article. So I guess I put the {{dubious}} tag in the wrong place. What I was trying to point out is that there is a more general definition of exponential family in common use, see e.g. [1]. The difference between these definitions becomes apparent when one considers the Weibull distribution: it's not an exponential family according to the definition used here, but it is according to the more general definition (unless I'm missing something). --MarkSweep (call me collect) 19:49, 15 November 2005 (UTC)

If you "transform" the natural parameter, you're using a different parametrization of the family of probability distributions involved, but you're not looking at a different family of probability distributions. Is that what you're talking about? Michael Hardy 22:46, 15 November 2005 (UTC)

The Weibull distribution is not in the exponential family according to the definition given here, which is one that can also be found in widely-used textbooks such as Casella and Berger, Statistical Inference (2nd edition), 2002, page 114. Later in the article it is noted that you NEED this specific definition for a distribution to have sufficient statistics (a result of Darmois, Koopman and Pitman from the 1930s), so you can't generalize the definition any further without penalty. To shorten the controversy about Weibull, I propose that ALL MENTION of Weibull be dropped from the article, leaving only Cauchy as the agreed-upon non-member of the exponential family. Ed 02:01, 16 June 2006 (UTC)

OK, further study does not find any references supporting Weibull in the exponential family, so I suggest that we remove the "dispute" tag and let the original wording stand, the one where Cauchy and Weibull are both left out of the family. I consulted books on the exponential family by Lawrence Brown and Ole Barndorff-Nielsen. EdJohnston 17:14, 23 June 2006 (UTC)

It is an exponential family, and is one according to the definition given in this article. The cdf defines a measure. It does not depend on the particular parameterization of the family of distributions. The natural parameters of the Weibull are (λ^-k, k-1) where λ and k are the parameters given in Wikipedia's defn of the Weibull. H is then Lebesgue measure, and you can work out A yourself. Note that the Weibull is a generalization of the exponential distribution, which is Weibull with k=1. CWoo on PlanetMath gets this right (though his article seems to simplify some things). Odometer 05:34, 27 December 2006 (UTC)
To put Weibull in the exponential family needs a reference, in my opinion. Membership in the family is not expected to be invariant under transformation of the parameters. I left a question about this at User_talk:Odometer but have not received a response. I found that I was unable to 'work out A myself'. PlanetMath is interesting but it is not a reliable source for our purposes. EdJohnston 18:06, 8 January 2007 (UTC)
Look at the planetmath definition of exponential family. It's an equivalent definition when density functions exist, and it's the one that's often used because it's much easier to comprehend and pattern match on, so it's probably the more appropriate definition to use on wikipedia. Casella and Berger use that defn in their book. You just factor the density into a parameter part, a "data" part, and an interaction part. Generally speaking the 'A' doesn't matter that much when determining if it's an exponential family. 'A' is just the normalizing constant to make the density integrate to 1. It's sometimes called the log partition function while the rest of the density is sometimes called the kernel. In this case it's -k log(λ) + log(k) using the parameterization in wikipedia. You can stick that in the natural parameterization if you want. Also membership in the family is definitely invariant under transformation of the parameters. A family of distributions is just a collection of probability measures which are indexed by some parameters. You can change the indices, and it's still the same collection. Odometer 00:33, 16 February 2007 (UTC)
Can you provide the functions a, b, c and d needed at [2] for the Weibull distribution? I would be happy if a simpler pattern for the exponential family could be used in this article. It's not crystal clear that measure theory is essential for explaining this stuff. It does set a high prerequisite for understanding the article. EdJohnston 02:15, 16 February 2007 (UTC)

It's worth noting that McCullagh and Nelder's "Generalized Linear Models" talks about knowing the dispersion parameter (i.e. $\sigma^2$ in the normal case) as being akin to having a one-parameter ($\mu$) distribution. But this implies that the distribution could be placed in exponential family form before the dispersion parameter was fixed. They also talk about fitting with a Weibull distribution and mention that the fitting procedure isn't entirely inside the framework of a GLM--meaning it can't be placed in exponential family form except when it's coincident with the exponential distribution (p. 423). O18 06:54, 9 July 2007 (UTC)
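For reference, the factorization at issue can be sketched directly from the density. This is only a sketch in the thread's notation; it supports the reading that the two-parameter Weibull family does not fit the definition, while the fixed-$k$ subfamily does:

```latex
% Weibull density (Wikipedia's parameterization):
f(x;\lambda,k) = \frac{k}{\lambda}\Bigl(\frac{x}{\lambda}\Bigr)^{k-1} e^{-(x/\lambda)^k},
  \qquad x > 0,
% so
\log f(x;\lambda,k) = \log k - k\log\lambda + (k-1)\log x - \lambda^{-k} x^k.
% With k FIXED this is the one-parameter exponential-family form
% h(x)\exp\{\eta\,T(x) - A(\eta)\}, with
T(x) = x^k, \quad \eta = -\lambda^{-k}, \quad h(x) = k\,x^{k-1},
  \quad A(\eta) = -\log(-\eta).
% With k free, the term \lambda^{-k} x^k cannot be split into
% (function of the parameters) \times (fixed statistic of x):
% the statistic x^k itself depends on the parameter k,
% so no finite-dimensional T(x) independent of (\lambda, k) works.
```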

## Question

Questions: Is the "prior" mentioned in the opening section the same as the "prior distribution" mentioned later? If so, isn't the convention that the first mention of something links to the Wikipedia article on it? (And calling it the "prior distribution" there would be good too - people like me don't actually know any stats.)

Also shouldn't "cdf" appear in brackets after the words "Cumulative distribution function" for clarity rather than just straight in the text? 20:25, 18 May 2006 (BST)

## Technical(expert) tag

The general idea is to start simple (i.e. without invoking complicated concepts), which is definitely possible as I have noted two examples from textbooks, and then to get to more rigorous treatments later on. Thanks in advance to anyone who helps with this! O18 17:17, 1 June 2007 (UTC)

Yes, this article would be more useful without the measure theory and the Lebesgue-Stieltjes integration. No books on my shelf seem to do a patient exposition of the exponential family. Maybe the treatment you found in McCullagh and Nelder would provide inspiration for our article? Also there's quite a bit of matrix notation used in the article and that adds to the reader's burden. EdJohnston 04:31, 24 June 2007 (UTC)
I think the MC&N method is too focused on GLM, I think others might be better. O18 14:43, 24 June 2007 (UTC)

## minus sign mismatch? and no previous mention of $K(u)$

The definition of the exponential family is given at the top of the page by $dF(x|\eta) = e^{-\eta^{\top} T(x) - A(\eta)}\, dH(x)$, which has a minus sign before the first $\eta$. However, section "Differential identities: an example" claims the "exponential family with canonical parameter" is not preceded by a minus sign. Is the minus absorbed into $\eta$ in the "Differential identities" in order for the Expectation and Variance formulas to work? If so, perhaps the form of the exponential family should remain the same and this result be formulated consistent with the single form.

Also, section "Differential identities: an example" claims that "As mentioned above $\scriptstyle K(u) = A(u + \eta) - A(\eta)$", but $\scriptstyle K(u)$ is not previously mentioned.

Erik Barry Erhardt 21:10, 23 June 2007 (UTC)

I personally prefer the definition with the plus sign, but then there is an inconsistency among the three definitions given at the beginning. The elementary single-parameter definition and the vector-parameter definition have a + sign in front of the η but a - sign in front of the 'A', whereas the measure-theoretic definition has minus signs on both terms. Miguel (talk) 17:17, 14 April 2010 (UTC)

## uniform dist

This used to say that the uniform distribution is not in the exponential family. But the uniform dist is a special kind of beta dist, and the beta dist is in the family. Benwing 06:02, 22 August 2007 (UTC)

I've never seen anyone miss the point of a definition so completely. Yes the uniform distribution is a beta distribution, yes, the beta distributions form an exponential family. To suggest that that in some way implies that the uniform distributions form an exponential family is ridiculous. You really need to read and understand the definition of "exponential family" before you can understand such things. An exponential family is a not just a distribution; it is a family of distributions. Every uniform distribution belongs to some exponential family, and in fact to more than one exponential family. That does not mean that the family of distributions that includes only the uniform distributions is an exponential family. It obviously is not, since the support of the various distributions in the family varies. Michael Hardy 00:16, 23 August 2007 (UTC)
OK, fine, I made a mistake, but you have too -- you've forgotten to be civil, see WP:CIVIL. Benwing 07:40, 23 August 2007 (UTC)

"since the support of the various distributions in the family varies. "- what does this mean? — Preceding unsigned comment added by Wikiusers1 (talkcontribs) 19:54, 2 March 2011 (UTC)

## far too technical

This page is a very good example of how a math page should *NOT* be. It's totally opaque to someone who doesn't already have a PhD in statistics, and such a person has no need for this page anyway. It would be far, far better if this page dispensed with all this Lebesgue measure business and gave a simple explanation, plus simple, clearly explained examples, plus a simple derivation of the relation to maximum entropy, etc. Then -- maybe -- include a completely, technically correct discussion at the bottom.

It should go something like this:

First, explain the simple case:

• Explain the simple case of one parameter, f(x|t) = exp(a(t) + b(x) + c(t)*d(x)) = A(t)*h(x)*exp(c(t)*d(x)) = B(u)*h(x)*exp(u*d(x)).
• Show briefly why these different definitions are equivalent, and how you can reparameterize from t to u to eliminate c(t), and describe that u is a "natural parameter".
• Explain how A(t) is just a normalization factor.
• Explain briefly that d(x) is the basis of a sufficient statistic.
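The simple one-parameter recipe in the bullets above can be illustrated concretely. The sketch below uses the Bernoulli distribution as the worked instance; the function names (`natural_param`, `B`, `pmf`) are illustrative only, not from any library:

```python
import math

# One-parameter exponential family in the form  B(u) * h(x) * exp(u * d(x)),
# worked out for the Bernoulli distribution with h(x) = 1 and d(x) = x.

def natural_param(p):
    # reparameterize from t = p to the natural parameter u = c(p) = log(p/(1-p))
    return math.log(p / (1 - p))

def B(u):
    # normalization factor: B(u) = 1 / sum_x h(x) exp(u * d(x)) = 1 / (1 + e^u)
    return 1.0 / (1.0 + math.exp(u))

def pmf(x, p):
    u = natural_param(p)
    return B(u) * 1.0 * math.exp(u * x)   # h(x) = 1, d(x) = x

# sanity check: the exponential-family form reproduces the Bernoulli pmf
p = 0.3
assert abs(pmf(1, p) - p) < 1e-12
assert abs(pmf(0, p) - (1 - p)) < 1e-12
assert abs(pmf(0, p) + pmf(1, p) - 1.0) < 1e-12  # B(u) really normalizes
```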

Then expand the definition to cover vector-valued x and t.

Then rewrite sections 1-4 to eliminate the discussion in terms of CDF's, Lebesgue measure, Radon-Nikodym derivatives, Einstein's summation convention, and anything else that an undergrad is likely to find opaque. Show how the maxent relationship is derived, rather than just saying "it's a simple matter of variational calculus" -- and do it *without* invoking any calculus of variations. (If you're not sure how to do this, look up standard NLP papers on maximum entropy.)

Then, finally, if you want, include the full gory advanced details.

If you are a statistics whiz, you may think that doing what I suggested is intolerably stupid or obvious, etc. But keep in mind that Wikipedia articles are *NOT* addressed to fellow experts, but to a general audience. See WP:WPM and WP:MOSDEF for more discussion.

The corresponding article on PlanetMath would be a good place to start.

Benwing 07:40, 23 August 2007 (UTC)

Actually, I am no slouch at statistics, yet what you propose (aside from details to be fleshed out later) is neither stupid nor intolerably obvious. I hesitate to dive right in though, as my tendency is to keep as much intrinsic material as possible and appropriate but change the writing style, yet my comfort level with these topics would make my work quite drudging and painful.
A good start might be to adapt the section from generalized linear model; I should note I had a hand at writing that section. Check it out and let us know if that is closer to what you had in mind (I know it would have to be adapted somewhat away from GLM context...). Baccyak4H (Yak!) 14:29, 23 August 2007 (UTC)
I agree with every word Benwing said. In fact, I have been meaning to work on this but never seem to get around to it. I think there is no escaping a major rewrite here. --Zvika 18:55, 7 September 2007 (UTC)

"It's totally opaque to someone who doesn't already have a PhD in statistics"

That is nonsense. Any mathematics graduate student who knows the basic definitions in probability theory would understand it. Many mathematicians who do not already know this material would understand it readily. Michael Hardy 19:06, 7 September 2007 (UTC)

Perhaps nonsense in letter, but not in spirit. The thrust of this talk section is discussion to improve the article, not to point out true but minor pedagogical points which will not help improve the article, and may as a side effect disparage well-meaning editors. That aside, I know you have a good handle on these topics; your participation could be very helpful. Would you care to contribute to this effort? Baccyak4H (Yak!) 19:21, 7 September 2007 (UTC)
(edit conflict) The point is that a basically simple idea is explained in overly technical terms. You should think not of a mathematics graduate but of an engineer or scientist, who has taken a basic course in probability. These courses very often do not include any measure theory at all; many readers will certainly never have heard of a Lebesgue-Stieltjes integral. Yet the idea of an exponential family -- a class of distributions having a particular probability function -- can be understood both at the technical and at the intuitive level by such a reader. All we have to do is initially talk only about continuous and discrete distributions, and defer the general treatment to a later section. "Things should be made as simple as possible, but not any simpler." [3] --Zvika 19:33, 7 September 2007 (UTC)

I agree that it could be made accessible to a wider audience. But there's no reason to dismiss as worthless the many mathematicians not familiar with this concept who could learn it by reading this article. Michael Hardy 23:21, 7 September 2007 (UTC)

(edit conflict) Michael, do you agree with Benwing's thought that maybe this is the least important audience? (Added text: Wikipedia:Make_technical_articles_accessible suggests that this topic should be accessible to the widest possible audience.)
Would anyone else be willing to work on a (sandbox version)? O18 00:14, 8 September 2007 (UTC) (edit O18 00:41, 8 September 2007 (UTC)) -- edited to remove broken link O18 04:10, 30 September 2007 (UTC)

What, specifically, is the least important audience? I don't see anything above where Benwing says any particular audience is the least important one. Michael Hardy 00:21, 8 September 2007 (UTC)

## proposed overhaul

Several editors have written an alternative version of this page that is intended to be more readable for readers without Ph.D.s in maths. This effort resulted from the above conversation, and is located at (User:Pdbailey/Sandbox/Exponential_family). I intend to replace the body of this article in a few days, if there are no objections. Please also feel free to edit the linked page before it is moved or after it is moved. O18 19:51, 22 September 2007 (UTC) -- edited to remove broken sandbox link

Update: The rewrite has now been carried out. We hope you like it. Feel free to directly edit the exponential family page. --Zvika 08:38, 27 September 2007 (UTC)

## Conjugate priors

Hello, the article says that "In the case of a likelihood which belongs to the exponential family there exists a conjugate prior, which is often also in the exponential family." This suggests that exponential families exist with conjugate priors that are not in the exponential family, is that true? But the form given for the conjugate prior, $\pi(\eta) \propto \exp(-\eta^{\top} \alpha - \beta\, A(\eta))$, looks like it is in the exponential family (for instance if we append to the vector $\eta$ a component $A(\eta)$). Thanks in advance. A5 10:05, 1 November 2007 (UTC)
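One way to see A5's observation, sketched in the page's minus-sign notation (this is a sketch of the standard argument, not sourced article text):

```latex
% The conjugate prior, viewed as a family indexed by (\alpha, \beta):
\pi(\eta \mid \alpha, \beta) \propto
  \exp\bigl(-\eta^{\top}\alpha - \beta\, A(\eta)\bigr).
% This is itself of exponential-family form over \eta, with
%   sufficient statistic   T(\eta) = \bigl(-\eta,\, -A(\eta)\bigr),
%   natural parameter      (\alpha, \beta),
%   base measure           Lebesgue measure on the natural parameter space,
% provided the normalizer
%   Z(\alpha,\beta) = \int \exp(-\eta^{\top}\alpha - \beta A(\eta))\, d\eta
% is finite on an open set of (\alpha, \beta).
```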

Isn't it rather the Darmois–Koopman–Pitman theorem? (capital letters... and maybe the dates) —Preceding unsigned comment added by 129.175.52.7 (talk) 13:45, 14 February 2008 (UTC)

## Still too technical ? (and inaccurate)

I changed the first sentence and I hope you find it clearer. Regarding the use of "the exponential family" vs. "an exponential family", two standard textbooks that I have with me both use the "an" option:
• Lehmann and Casella, Theory of Point Estimation, 2nd ed., p.22.
• Shao, Mathematical Statistics, 2nd ed., p.96.
In both cases, they define an exponential family as a parametric family of distributions; each such family is a different exponential family. So there is no "the" (in the sense of "one and only") exponential family. --Zvika (talk) 09:14, 19 February 2008 (UTC)
Previous talk on this page has tried to figure out if an exponential family is just a set of functions, or if any of the parameters also separate one family from another. I don't recall the result, but certainly the premise was that there are many families. Even if 130.223.123.54 can not produce a reference, it might be worth considering having a section that addresses exactly this topic--notably, I am not qualified to write it. O18 (talk) 21:05, 20 February 2008 (UTC)
There may be a confusion here with the exponential distribution. The exponential distributions do form an exponential family. Miguel (talk) 18:08, 14 April 2010 (UTC)
No, that is not the confusion. The question is, what is the definition of a family? 018 (talk) 18:27, 14 April 2010 (UTC)
Right. I see you linked to natural exponential family below. I wonder whether the content of that article shouldn't be merged into this one. In fact, the opening paragraph of the article as it now stands claims
an exponential family is a class of probability distributions sharing a certain form, specified below. It is said that such distributions belong to the exponential class of density functions
So we have exponential function, exponential distribution, exponential family, natural exponential family, and exponential class... No wonder people get confused. — Miguel (talk) 19:07, 14 April 2010 (UTC)

## Redirection from Pitman-Koopman theorem

Pitman-Koopman theorem redirects to here; it deserves its own page. The theorem is also not listed on this page under its own name. Agalmic (talk) 01:05, 4 July 2008 (UTC)

## Negative Binomial

Mention of the negative binomial should be qualified, too. As far as I can tell, it wouldn't be in an exponential family unless the parameter, r, is fixed. Schomerus (talk) 23:29, 12 October 2009 (UTC)

I think you're right, and I changed the article accordingly. P.S. In the future please add new discussions at the bottom of the talk page. --Zvika (talk) 05:45, 13 October 2009 (UTC)

## Normal distribution

No objections to the article and no typos found.

I came across this article looking for inspiration on the following question. Having not gained any, I pose the question here:

The family of distributions N(0, sigma^2) (or any normal distribution with known mean), is evidently not complete, and thus also not an exponential family. Where does the definition of an exponential family fail? As the article indicates, should the mean be unknown, things follow through. —Preceding unsigned comment added by 131.155.71.49 (talk) 12:23, 20 October 2009 (UTC)

Actually, unless I'm not seeing straight today, if the mean is known to be $\mu$, the definition is satisfied as
$f_\mu(x;\theta)dx = e^{{-1\over 2}\bigl({x-\mu\over\sigma}\bigr)^2}{dx\over\sqrt{2\pi\sigma^2}}= h(x) \exp(\eta(\theta) T_\mu(x) - A(\theta))dx$
if we let $\theta = \sigma^2$ and
$h(x)={1\over\sqrt{2\pi}}$
$\eta(\theta)={-1\over 2\theta}$
$T_\mu(x)=(x-\mu)^2$
$A(\theta) = {1\over 2}\log\theta$
See Cramér–Rao bound. --- Miguel (talk) 13:35, 29 November 2009 (UTC)
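A quick numeric check of the factorization above (an illustrative sketch; the function names are mine, not from any library):

```python
import math

# Sanity check of the known-mean normal factorization
#   f(x) = h(x) * exp(eta(theta) * T_mu(x) - A(theta)),  theta = sigma^2,
# with h(x) = 1/sqrt(2*pi), eta = -1/(2*theta), T_mu(x) = (x - mu)^2,
# A(theta) = (1/2) log(theta).

mu, sigma2 = 1.5, 0.8

def factorized(x):
    h = 1.0 / math.sqrt(2 * math.pi)
    eta = -1.0 / (2 * sigma2)
    T = (x - mu) ** 2
    A = 0.5 * math.log(sigma2)
    return h * math.exp(eta * T - A)

def normal_pdf(x):
    # standard N(mu, sigma2) density
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

for x in (-2.0, 0.0, 1.5, 3.7):
    assert abs(factorized(x) - normal_pdf(x)) < 1e-12
```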
I wonder if this editor was thinking of Natural exponential family. 018 (talk) 18:32, 14 April 2010 (UTC)
A natural exponential family is just an exponential family for which the sample mean is a sufficient statistic... — Miguel (talk) 19:53, 14 April 2010 (UTC)

## Canonical parameters

I believe the exposition would be clearer and easier to understand if we start with the exponential family in its canonical form, where θ is the natural parameter of the family. In that way we would be able to talk about the dual parameters, and about estimation of the parameters (e.g. the theorem that the exponential family is the only family where parameters can be estimated efficiently). stpasha » 19:34, 24 December 2009 (UTC)

## Measure-theoretic formulation excludes dx !

I edited

Suppose H is a non-decreasing function of a real variable and H(x) approaches 0 as x approaches −∞.

into

Suppose H is a non-decreasing function of a real variable.

Clearly we want to include the case where H(x) = x. — Miguel (talk) 16:53, 14 April 2010 (UTC)

## Newbie question. . . but it really bugs me! Math Notation, read as . . .

How do we read the expressions fx(x|θ) and fσ(x;μ)? While I'm on the topic, wouldn't it be better if every new symbol introduced into a mathematics article were also posted in the Table of Mathematical Symbols [[4]]? As it is, whenever I read a mathematical article, I am then forced to look up the symbols elsewhere to get the pronunciation, rendering the otherwise fine math content of Wikipedia pretty much redundant. Wikipedia is, after all, supposed to be a resource for the non-expert as well as the expert. — Preceding unsigned comment added by Romanicus (talkcontribs) 17:14, 16 December 2010 (UTC)

I guess you are looking for the Greek alphabet article? // stpasha » 21:41, 16 December 2010 (UTC)

I mean, how does one read the entire expression? For instance, this - "2+3" - is not read as "two and a little cross and three". It's read as, "two plus three". So, how are these two expressions - "fx(x|θ)" and "fσ(x;μ)" - read? — Preceding unsigned comment added by Romanicus (talkcontribs) 21:12, 14 January 2011 (UTC)

They're not read, they're written. Higher-level maths isn't a spoken language, it's a written one. Even when face-to-face, mathematicians need pencil and paper, or blackboard and chalk, or whiteboard and marker, to communicate. --Qwfp (talk) 08:57, 15 January 2011 (UTC)

In the definition, h(x) is not just any arbitrary function - there are conditions imposed on it, like being differentiable everywhere. Right? —Preceding unsigned comment added by 91.125.78.233 (talk) 12:03, 16 March 2011 (UTC)

Since x can be discrete, differentiability of h(x) can play no role. Melcombe (talk) 12:54, 16 March 2011 (UTC)

Dear Qwfp. I've been told that:
"fx(x|θ)" can be read as, "f of x, at x, restricted to the possible choices of parameter θ".
"fσ(x;μ)" can be read as, "f of x, at σ, across the possible choices of parameter μ".
But then, I don't know "higher math". (your math class must have had a zen-like silence).

## Query on Correctness of Moment generating function section

The MGF given is stated to apply for all exponential families... I believe this might be an error and that it only applies to Natural Exponential Families. If someone can refute this please do so. Otherwise I will try to find a reference that clears up the matter one way or the other.... —Preceding unsigned comment added by 137.194.233.21 (talk) 12:41, 22 April 2011 (UTC)

OK... I think I see what is happening: the distribution of the sufficient statistic has an MGF with the given form, but the MGF of the distribution of x will only be of that form if it is a natural exponential family (not for some other exponential families)...

As this confused me for some time maybe something could be added.... —Preceding unsigned comment added by 137.194.233.21 (talk) 13:07, 22 April 2011 (UTC) --137.194.233.21 (talk) 14:49, 22 April 2011 (UTC)

## Table of distributions

I think this article would benefit from a table of commonly encountered exponential family distributions and their corresponding representations as such. There are a couple in the article now, but I think having a table with many more would be useful. I've created an initial version of what I have in mind (below) which no doubt contains at least a few mistakes and omissions. Thoughts?

Nippashish (talk) 08:20, 7 March 2012 (UTC)

Here's an updated version of the table. I've fixed a few mistakes and added a column which gives $A$ in terms of the natural parameter, in addition to what I had before. I'll leave this here for a week or two and if there are still no objections I'll add it to the article.

Nippashish (talk) 01:09, 17 March 2012 (UTC)

Very strange! I was obviously thinking the same thing, and went ahead and added a table. But I had no idea until just now that you already made one; I populated the table myself, and it took a fair amount of math, since I couldn't always find sources or they had errors in them. I'll cross-check my results against yours and maybe incorporate the column where you write A() in terms of the standard parameters rather than the natural parameters. Benwing (talk) 08:02, 1 April 2012 (UTC)
I did the math to populate my version myself also, since I had the same experience looking for sources as you did. Hopefully our combined effort has led to something with few mistakes. Nippashish (talk) 19:25, 1 April 2012 (UTC)

I think the table on the page could benefit from a column with the moments of the sufficient statistics. What do you think?

BorisYangel (talk) 21:43, 17 March 2013 (UTC)

### Table of distributions

In the following table the argument of the density is denoted $x$. The symbol $\theta$ stands for the distribution parameters as they appear on their own wiki page; for example, in the entry for the Bernoulli, $\theta=p$, whereas in the entry for the Normal distribution, $\theta = \{\mu, \sigma\}$.

Distribution $h(x)$ $T(x)$ $\eta(\theta)$ $A(\theta)$ $A(\eta)$
Bernoulli $1$ $x$ $\log\left(\frac{p}{1-p}\right)$ $-\log(1-p)$ $\log(1 + \exp(\eta))$
Binomial (known n) ${n \choose x}$ $x$ $\log\left(\frac{p}{1-p}\right)$ $-n\log(1-p)$ $n\log(1 + \exp(\eta))$
Multinomial (known n) $n!\left(\prod_{i=1}^Kx_i!\right)^{-1}$ $\begin{bmatrix}x_1 \\ \vdots \\ x_K\end{bmatrix}$ $\begin{bmatrix}\log(p_1) \\ \vdots \\ \log(p_K) \end{bmatrix}$ $0$ $0$
Poisson $\frac{1}{x!}$ $x$ $\log(\lambda)$ $\lambda$ $\exp(\eta)$
Normal (known variance) $\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{x^2}{2\sigma^2}\right)$ $\frac{x}{\sigma}$ $\frac{\mu}{\sigma}$ $\frac{\mu^2}{2\sigma^2}$ $\frac{\eta^2}{2}$
Normal $\frac{1}{\sqrt{2\pi}}$ $\begin{bmatrix} x \\ x^2 \end{bmatrix}$ $\begin{bmatrix}\frac{\mu}{\sigma^2} \\ -\frac{1}{2\sigma^2} \end{bmatrix}$ $\frac{\mu^2}{2\sigma^2} + \log(\sigma)$ $-\frac{\eta_1^2}{4\eta_2}-\frac{1}{2}\log(-2\eta_2)$
Multivariate Normal (d dimensions) $(2\pi)^{-d/2}$ $\begin{bmatrix} x \\ xx^{\mathrm{T}}\end{bmatrix}$ $\begin{bmatrix}\Sigma^{-1}\mu \\ -\frac{1}{2}\Sigma^{-1}\end{bmatrix}$ $\frac{1}{2}\mu^{\mathrm{T}}\Sigma^{-1}\mu + \frac{1}{2}\log|\Sigma|$ $-\frac{1}{4}\eta_1^{\mathrm{T}}\eta_2^{-1}\eta_1 - \frac{1}{2}\log|-2\eta_2|$
Exponential (rate) $1$ $x$ $-\lambda$ $-\log(\lambda)$ $-\log(-\eta)$
Gamma (rate) $1$ $\begin{bmatrix}x \\ \log(x)\end{bmatrix}$ $\begin{bmatrix}-\lambda \\ k-1 \end{bmatrix}$ $\log(\Gamma(k)) - k\log(\lambda)$ $\log(\Gamma(\eta_2+1)) - (\eta_2+1)\log(-\eta_1)$
Chi-squared $\exp\left(-\frac{x}{2}\right)$ $\log(x)$ $\frac{k}{2}-1$ $\frac{k}{2}\log(2) + \log(\Gamma(\frac{k}{2}))$ $(\eta+1)\log(2) + \log(\Gamma(\eta+1))$
Beta $1$ $\begin{bmatrix}\log(x) \\ \log(1-x)\end{bmatrix}$ $\begin{bmatrix}\alpha-1 \\ \beta-1 \end{bmatrix}$ $\log(B(\alpha,\beta))$ $\log(B(\eta_1+1,\eta_2+1))$
Dirichlet $1$ $\begin{bmatrix}\log(x_1) \\ \vdots \\ \log(x_K)\end{bmatrix}$ $\begin{bmatrix}\alpha_1 - 1 \\ \vdots \\ \alpha_K - 1\end{bmatrix}$ $\sum_{i=1}^K\log(\Gamma(\alpha_i)) - \log(\Gamma(\sum_{i=1}^K\alpha_i))$ $\sum_{i=1}^K\log(\Gamma(\eta_i+1)) - \log(\Gamma(\sum_{i=1}^K\eta_i + K))$
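Since the table above was assembled by hand, one way to guard against mistakes is to plug $h(x)\exp(\eta^{\top}T(x) - A(\theta))$ back in and compare with the standard density. A spot-check of two rows (function names are mine, for illustration only):

```python
import math

# Gamma (rate parameterization): h = 1, T = (x, log x),
# eta = (-lam, k - 1), A(theta) = log Gamma(k) - k log(lam)
def gamma_row(x, k, lam):
    eta = (-lam, k - 1.0)
    T = (x, math.log(x))
    A = math.lgamma(k) - k * math.log(lam)
    return math.exp(eta[0] * T[0] + eta[1] * T[1] - A)

def gamma_pdf(x, k, lam):
    # standard Gamma(k, rate=lam) density
    return lam ** k * x ** (k - 1) * math.exp(-lam * x) / math.gamma(k)

for x in (0.5, 1.0, 4.2):
    assert abs(gamma_row(x, 2.5, 1.7) - gamma_pdf(x, 2.5, 1.7)) < 1e-9

# Bernoulli: h = 1, T = x, eta = log(p/(1-p)), A(theta) = -log(1-p)
def bernoulli_row(x, p):
    eta = math.log(p / (1 - p))
    A = -math.log(1 - p)
    return math.exp(eta * x - A)

assert abs(bernoulli_row(1, 0.3) - 0.3) < 1e-12
assert abs(bernoulli_row(0, 0.3) - 0.7) < 1e-12
```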