= Normal-inverse-Wishart distribution =

In probability theory and statistics, the normal-inverse-Wishart distribution (or Gaussian-inverse-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with an unknown mean and covariance matrix (the inverse of the precision matrix).

==Definition==
Suppose

$\boldsymbol\mu|\boldsymbol\mu_0,\lambda,\boldsymbol\Sigma \sim \mathcal{N}\left(\boldsymbol\mu\Big|\boldsymbol\mu_0,\frac{1}{\lambda}\boldsymbol\Sigma\right)$
has a multivariate normal distribution with mean $\boldsymbol\mu_0$ and covariance matrix $\tfrac{1}{\lambda}\boldsymbol\Sigma$, where

$\boldsymbol\Sigma|\boldsymbol\Psi,\nu \sim \mathcal{W}^{-1}(\boldsymbol\Sigma|\boldsymbol\Psi,\nu)$
has an inverse Wishart distribution. Then $(\boldsymbol\mu,\boldsymbol\Sigma)$
has a normal-inverse-Wishart distribution, denoted as
$(\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) .$

==Characterization==

===Probability density function===

$f(\boldsymbol\mu,\boldsymbol\Sigma|\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) = \mathcal{N}\left(\boldsymbol\mu\Big|\boldsymbol\mu_0,\frac{1}{\lambda}\boldsymbol\Sigma\right) \mathcal{W}^{-1}(\boldsymbol\Sigma|\boldsymbol\Psi,\nu)$

The full version of the PDF is as follows:

$f(\boldsymbol\mu,\boldsymbol\Sigma|\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) = \frac{\lambda^{D/2}|\boldsymbol\Psi|^{\nu/2}|\boldsymbol\Sigma|^{-\frac{\nu+D+2}{2}}}{(2\pi)^{D/2}2^{\frac{\nu D}{2}}\Gamma_D(\frac{\nu}{2})}\exp\left\{-\frac{1}{2}\mathrm{Tr}(\boldsymbol\Psi\boldsymbol\Sigma^{-1})-\frac{\lambda}{2}(\boldsymbol\mu-\boldsymbol\mu_0)^T\boldsymbol\Sigma^{-1}(\boldsymbol\mu-\boldsymbol\mu_0)\right\}$

Here $D$ is the dimension of $\boldsymbol\mu$, $\Gamma_D(\cdot)$ is the multivariate gamma function, and $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix.
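As a brief sketch of where the closed form comes from, the two factors of the density can be written out separately, with $D$ denoting the dimension of $\boldsymbol\mu$:

$\mathcal{N}\left(\boldsymbol\mu\Big|\boldsymbol\mu_0,\tfrac{1}{\lambda}\boldsymbol\Sigma\right) = \frac{\lambda^{D/2}}{(2\pi)^{D/2}|\boldsymbol\Sigma|^{1/2}}\exp\left\{-\frac{\lambda}{2}(\boldsymbol\mu-\boldsymbol\mu_0)^T\boldsymbol\Sigma^{-1}(\boldsymbol\mu-\boldsymbol\mu_0)\right\}$

$\mathcal{W}^{-1}(\boldsymbol\Sigma|\boldsymbol\Psi,\nu) = \frac{|\boldsymbol\Psi|^{\nu/2}}{2^{\frac{\nu D}{2}}\Gamma_D(\frac{\nu}{2})}|\boldsymbol\Sigma|^{-\frac{\nu+D+1}{2}}\exp\left\{-\frac{1}{2}\mathrm{Tr}(\boldsymbol\Psi\boldsymbol\Sigma^{-1})\right\}$

The factor $\lambda^{D/2}$ arises because $\left|\tfrac{1}{\lambda}\boldsymbol\Sigma\right| = \lambda^{-D}|\boldsymbol\Sigma|$, and multiplying the two determinant powers gives $|\boldsymbol\Sigma|^{-\frac{1}{2}}\,|\boldsymbol\Sigma|^{-\frac{\nu+D+1}{2}} = |\boldsymbol\Sigma|^{-\frac{\nu+D+2}{2}}$, which is the exponent appearing in the full density.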

==Properties==

===Marginal distributions===
By construction, the marginal distribution over $\boldsymbol\Sigma$ is an inverse Wishart distribution, and the conditional distribution over $\boldsymbol\mu$ given $\boldsymbol\Sigma$ is a multivariate normal distribution. The marginal distribution over $\boldsymbol\mu$ is a multivariate t-distribution.
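For reference, the form of this marginal commonly quoted in textbook treatments of the conjugate multivariate normal model is stated here rather than derived. Writing $D$ for the dimension of $\boldsymbol\mu$ and assuming $\nu > D-1$, it is

$\boldsymbol\mu \sim t_{\nu-D+1}\left(\boldsymbol\mu_0,\frac{\boldsymbol\Psi}{\lambda(\nu-D+1)}\right)$

that is, a multivariate t-distribution with $\nu-D+1$ degrees of freedom, location $\boldsymbol\mu_0$ and scale matrix $\boldsymbol\Psi/(\lambda(\nu-D+1))$.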

== Posterior distribution of the parameters ==
Suppose the sampling density is a multivariate normal distribution

$\boldsymbol{y_i}|\boldsymbol\mu,\boldsymbol\Sigma \sim \mathcal{N}_p(\boldsymbol\mu,\boldsymbol\Sigma)$

where $\boldsymbol{y}$ is an $n\times p$ matrix and $\boldsymbol{y_i}$ (of length $p$) is row $i$ of the matrix. The dimension $p$ here plays the role of $D$ in the density above.

When the mean and covariance matrix of the sampling distribution are unknown, we can place a normal-inverse-Wishart prior on the mean and covariance parameters jointly

$(\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu).$

The resulting posterior distribution for the mean and covariance matrix is also a normal-inverse-Wishart distribution

$(\boldsymbol\mu,\boldsymbol\Sigma|y) \sim \mathrm{NIW}(\boldsymbol\mu_n,\lambda_n,\boldsymbol\Psi_n,\nu_n),$

where

$\boldsymbol\mu_n = \frac{\lambda\boldsymbol\mu_0 + n \bar{\boldsymbol y}}{\lambda+n}$

$\lambda_n = \lambda + n$

$\nu_n = \nu + n$

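$\boldsymbol\Psi_n = \boldsymbol\Psi + \boldsymbol S + \frac{\lambda n}{\lambda+n}(\bar{\boldsymbol y}-\boldsymbol\mu_0)(\bar{\boldsymbol y}-\boldsymbol\mu_0)^T \quad\mathrm{with}\quad \boldsymbol S = \sum_{i=1}^{n} (\boldsymbol y_i-\bar{\boldsymbol y})(\boldsymbol y_i-\bar{\boldsymbol y})^T$

and $\bar{\boldsymbol y} = \tfrac{1}{n}\sum_{i=1}^{n}\boldsymbol y_i$ is the sample mean.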
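The update equations can be illustrated with a minimal sketch in plain Python, using nested lists for matrices and no external libraries. The function name `niw_posterior` and its argument names are hypothetical, chosen only for this example.

```python
def niw_posterior(mu0, lam, psi, nu, ys):
    """Return (mu_n, lam_n, psi_n, nu_n) for a NIW prior and data rows ys.

    mu0: prior mean (list of length p); lam: prior lambda;
    psi: prior scale matrix (p x p nested list); nu: prior degrees of freedom;
    ys: list of n observations, each a list of length p.
    """
    n = len(ys)
    p = len(mu0)

    # Sample mean y-bar
    ybar = [sum(y[j] for y in ys) / n for j in range(p)]

    # Scatter matrix S = sum_i (y_i - ybar)(y_i - ybar)^T
    S = [[sum((y[j] - ybar[j]) * (y[k] - ybar[k]) for y in ys)
          for k in range(p)] for j in range(p)]

    lam_n = lam + n
    nu_n = nu + n

    # Posterior mean: weighted average of the prior mean and the sample mean
    mu_n = [(lam * mu0[j] + n * ybar[j]) / lam_n for j in range(p)]

    # Psi_n = Psi + S + (lam * n / (lam + n)) (ybar - mu0)(ybar - mu0)^T
    weight = lam * n / lam_n
    d = [ybar[j] - mu0[j] for j in range(p)]
    psi_n = [[psi[j][k] + S[j][k] + weight * d[j] * d[k]
              for k in range(p)] for j in range(p)]

    return mu_n, lam_n, psi_n, nu_n
```

For example, with $\boldsymbol\mu_0 = (0,0)$, $\lambda = 1$, $\boldsymbol\Psi$ the identity matrix, $\nu = 4$ and the three observations $(1,2)$, $(3,4)$, $(5,6)$, the function returns $\boldsymbol\mu_n = (2.25, 3)$, $\lambda_n = 4$, $\nu_n = 7$ and $\boldsymbol\Psi_n$ with rows $(15.75, 17)$ and $(17, 21)$.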

To sample from the joint posterior of $(\boldsymbol\mu,\boldsymbol\Sigma)$, first draw $\boldsymbol\Sigma|\boldsymbol y \sim \mathcal{W}^{-1}(\boldsymbol\Psi_n,\nu_n)$, then draw $\boldsymbol\mu | \boldsymbol{\Sigma,y} \sim \mathcal{N}_p(\boldsymbol\mu_n,\boldsymbol\Sigma/\lambda_n)$. To draw from the posterior predictive distribution of a new observation, draw $\tilde{\boldsymbol y}|\boldsymbol{\mu,\Sigma,y} \sim \mathcal{N}_p(\boldsymbol\mu,\boldsymbol\Sigma)$, using the values of $\boldsymbol\mu$ and $\boldsymbol\Sigma$ already drawn.

== Generating normal-inverse-Wishart random variates ==
Generation of random variates is straightforward:
1. Sample $\boldsymbol\Sigma$ from an inverse Wishart distribution with parameters $\boldsymbol\Psi$ and $\nu$.
2. Sample $\boldsymbol\mu$ from a multivariate normal distribution with mean $\boldsymbol\mu_0$ and covariance matrix $\tfrac{1}{\lambda}\boldsymbol\Sigma$.
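The two steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a reference implementation: it assumes $\nu$ is an integer with $\nu \ge D$, so that a Wishart draw can be formed as a sum of $\nu$ outer products of normal vectors and then inverted to give the inverse Wishart draw. The helper names `cholesky`, `inverse`, `mvn` and `sample_niw` are hypothetical. In practice a library sampler (for instance `scipy.stats.invwishart`) would usually replace step 1.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def inverse(a):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment a with the identity matrix
    m = [list(row) + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        pv = m[c][c]
        m[c] = [x / pv for x in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]


def mvn(mean, cov, rng):
    """One draw from N(mean, cov) using the Cholesky factor of cov."""
    L = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [mean[i] + sum(L[i][k] * z[k] for k in range(i + 1))
            for i in range(len(mean))]


def sample_niw(mu0, lam, psi, nu, rng):
    """Draw (mu, Sigma) from NIW(mu0, lam, psi, nu); assumes integer nu >= D."""
    d = len(mu0)
    # Step 1: Sigma ~ inverse Wishart(psi, nu).
    # W = sum of nu outer products of N(0, psi^{-1}) vectors is
    # Wishart(psi^{-1}, nu), and its inverse is inverse Wishart(psi, nu).
    psi_inv = inverse(psi)
    zero = [0.0] * d
    w = [[0.0] * d for _ in range(d)]
    for _ in range(nu):
        x = mvn(zero, psi_inv, rng)
        for i in range(d):
            for j in range(d):
                w[i][j] += x[i] * x[j]
    sigma = inverse(w)
    # Step 2: mu ~ N(mu0, Sigma / lam)
    mu = mvn(mu0, [[s / lam for s in row] for row in sigma], rng)
    return mu, sigma
```

A call such as `sample_niw([0.0, 0.0], 2.0, [[2.0, 0.5], [0.5, 1.0]], 6, random.Random(0))` returns one pair $(\boldsymbol\mu,\boldsymbol\Sigma)$. Passing an explicit `random.Random` instance keeps the draws reproducible.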

== Related distributions ==
- The normal-Wishart distribution is essentially the same distribution parameterized by the precision matrix rather than the covariance matrix. If $(\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu)$ then $(\boldsymbol\mu,\boldsymbol\Sigma^{-1}) \sim \mathrm{NW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi^{-1},\nu)$.
- The normal-inverse-gamma distribution is the one-dimensional equivalent.
- The multivariate normal distribution and inverse Wishart distribution are the component distributions out of which this distribution is made.
