The scaled inverse chi-squared distribution is the distribution for x = 1/s2, where s2 is a sample mean of the squares of ν independent normal random variables that have mean 0 and inverse variance 1/σ2 = τ2. The distribution is therefore parametrised by the two quantities ν and τ2, referred to as the number of chi-squared degrees of freedom and the scaling parameter, respectively.
This family of scaled inverse chi-squared distributions is closely related to two other distribution families, those of the inverse-chi-squared distribution and the inverse gamma distribution. Compared to the inverse-chi-squared distribution, the scaled distribution has an extra parameter τ2, which scales the distribution horizontally and vertically, representing the inverse-variance of the original underlying process. Also, the scale inverse chi-squared distribution is presented as the distribution for the inverse of the mean of ν squared deviates, rather than the inverse of their sum. The two distributions thus have the relation that if
Compared to the inverse gamma distribution, the scaled inverse chi-squared distribution describes the same data distribution, but using a different parametrization, which may be more convenient in some circumstances. Specifically, if
Either form may be used to represent the maximum entropy distribution for a fixed first inverse moment and first logarithmic moment .
The scaled inverse chi-squared distribution also has a particular use in Bayesian statistics, somewhat unrelated to its use as a predictive distribution for x = 1/s2. Specifically, the scaled inverse chi-squared distribution can be used as a conjugate prior for the variance parameter of a normal distribution. In this context the scaling parameter is denoted by σ02 rather than by τ2, and has a different interpretation. The application has been more usually presented using the inverse gamma distribution formulation instead; however, some authors, following in particular Gelman et al. (1995/2004) argue that the inverse chi-squared parametrisation is more intuitive.
where D represents the data and I represents any initial information about σ2 that we may already have.
The simplest scenario arises if the mean μ is already known; or, alternatively, if it is the conditional distribution of σ2 that is sought, for a particular assumed value of μ.
Then the likelihood term L(σ2|D) = p(D|σ2) has the familiar form
Combining this with the rescaling-invariant prior p(σ2|I) = 1/σ2, which can be argued (e.g. following Jeffreys) to be the least informative possible prior for σ2 in this problem, gives a combined posterior probability
This form can be recognised as that of a scaled inverse chi-squared distribution, with parameters ν = n and τ2 = s2 = (1/n) Σ (xi-μ)2
Gelman et al remark that the re-appearance of this distribution, previously seen in a sampling context, may seem remarkable; but given the choice of prior the "result is not surprising".
In particular, the choice of a rescaling-invariant prior for σ2 has the result that the probability for the ratio of σ2 / s2 has the same form (independent of the conditioning variable) when conditioned on s2 as when conditioned on σ2:
In the sampling-theory case, conditioned on σ2, the probability distribution for (1/s2) is a scaled inverse chi-squared distribution; and so the probability distribution for σ2 conditioned on s2, given a scale-agnostic prior, is also a scaled inverse chi-squared distribution.
If more is known about the possible values of σ2, a distribution from the scaled inverse chi-squared family, such as Scale-inv-χ2(n0, s02) can be a convenient form to represent a less uninformative prior for σ2, as if from the result of n0 previous observations (though n0 need not necessarily be a whole number):
Such a prior would lead to the posterior distribution
which is itself a scaled inverse chi-squared distribution. The scaled inverse chi-squared distributions are thus a convenient conjugate prior family for σ2 estimation.
If the mean is not known, the most uninformative prior that can be taken for it is arguably the translation-invariant prior p(μ|I) ∝ const., which gives the following joint posterior distribution for μ and σ2,
The marginal posterior distribution for σ2 is obtained from the joint posterior distribution by integrating out over μ,
This is again a scaled inverse chi-squared distribution, with parameters and .