In multivariate statistics, if ${\displaystyle \varepsilon }$ is a vector of ${\displaystyle n}$ random variables, and ${\displaystyle \Lambda }$ is an ${\displaystyle n\times n}$ symmetric matrix, then the scalar quantity ${\displaystyle \varepsilon ^{T}\Lambda \varepsilon }$ is known as a quadratic form in ${\displaystyle \varepsilon }$.

## Expectation

It can be shown that[1]

${\displaystyle \operatorname {E} \left[\varepsilon ^{T}\Lambda \varepsilon \right]=\operatorname {tr} \left[\Lambda \Sigma \right]+\mu ^{T}\Lambda \mu }$

where ${\displaystyle \mu }$ and ${\displaystyle \Sigma }$ are the expected value and variance-covariance matrix of ${\displaystyle \varepsilon }$, respectively, and tr denotes the trace of a matrix. This result only depends on the existence of ${\displaystyle \mu }$ and ${\displaystyle \Sigma }$; in particular, normality of ${\displaystyle \varepsilon }$ is not required.

A book treatment of the topic of quadratic forms in random variables is that of Mathai and Provost.[2]
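Because the expectation identity is distribution-free, it can be verified exactly on a small finite-support distribution by direct enumeration. The following NumPy sketch does so; the support points, probabilities, and the matrix are arbitrary illustrative choices:

```python
import itertools
import numpy as np

# Exact check of E[eps^T Lam eps] = tr(Lam Sigma) + mu^T Lam mu on a
# small discrete distribution; the identity is distribution-free, so a
# finite-support example suffices.  All values here are illustrative.
rng = np.random.default_rng(0)
support = [np.array(p) for p in itertools.product([-1.0, 0.5, 2.0], repeat=2)]
probs = rng.dirichlet(np.ones(len(support)))          # an arbitrary pmf
Lam = np.array([[2.0, 1.0], [1.0, 3.0]])              # symmetric Lambda

mu = sum(p * x for p, x in zip(probs, support))       # exact mean
Sigma = sum(p * np.outer(x - mu, x - mu) for p, x in zip(probs, support))

lhs = sum(p * (x @ Lam @ x) for p, x in zip(probs, support))  # exact expectation
rhs = np.trace(Lam @ Sigma) + mu @ Lam @ mu
assert np.isclose(lhs, rhs)
```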

### Proof

Since the quadratic form is a scalar quantity, ${\displaystyle \varepsilon ^{T}\Lambda \varepsilon =\operatorname {tr} (\varepsilon ^{T}\Lambda \varepsilon )}$.

Next, by the cyclic property of the trace operator,

${\displaystyle \operatorname {E} [\operatorname {tr} (\varepsilon ^{T}\Lambda \varepsilon )]=\operatorname {E} [\operatorname {tr} (\Lambda \varepsilon \varepsilon ^{T})].}$

Since the trace is a linear combination of the entries of its argument, it follows from the linearity of the expectation operator that

${\displaystyle \operatorname {E} [\operatorname {tr} (\Lambda \varepsilon \varepsilon ^{T})]=\operatorname {tr} (\Lambda \operatorname {E} (\varepsilon \varepsilon ^{T})).}$

Since ${\displaystyle \operatorname {E} (\varepsilon \varepsilon ^{T})=\operatorname {var} (\varepsilon )+\operatorname {E} (\varepsilon )\operatorname {E} (\varepsilon )^{T}=\Sigma +\mu \mu ^{T}}$ (a standard property of the variance-covariance matrix), this is

${\displaystyle \operatorname {tr} (\Lambda (\Sigma +\mu \mu ^{T})).}$

Applying the cyclic property of the trace operator again, we get

${\displaystyle \operatorname {tr} (\Lambda \Sigma )+\operatorname {tr} (\Lambda \mu \mu ^{T})=\operatorname {tr} (\Lambda \Sigma )+\operatorname {tr} (\mu ^{T}\Lambda \mu )=\operatorname {tr} (\Lambda \Sigma )+\mu ^{T}\Lambda \mu .}$
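The two trace manipulations used above (a scalar equals its own trace, and cyclic permutation inside the trace) are easy to confirm numerically; the values below are arbitrary:

```python
import numpy as np

# Numeric illustration of the trace identities used in the proof.
eps = np.array([1.5, -2.0, 0.5])
Lam = np.array([[2.0, 1.0, 0.0],
                [1.0, 3.0, 1.0],
                [0.0, 1.0, 1.0]])              # symmetric Lambda

q = eps @ Lam @ eps                            # the scalar quadratic form
col = eps.reshape(-1, 1)                       # eps as a column vector

# A 1x1 matrix equals its own trace:
assert np.isclose(q, np.trace(col.T @ Lam @ col))
# Cyclic property: tr(eps^T Lam eps) = tr(Lam eps eps^T):
assert np.isclose(q, np.trace(Lam @ (col @ col.T)))
```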

## Variance in the Gaussian case

In general, the variance of a quadratic form depends greatly on the distribution of ${\displaystyle \varepsilon }$. However, if ${\displaystyle \varepsilon }$ does follow a multivariate normal distribution, the variance of the quadratic form becomes particularly tractable. Assume for the moment that ${\displaystyle \Lambda }$ is a symmetric matrix. Then,

${\displaystyle \operatorname {var} \left[\varepsilon ^{T}\Lambda \varepsilon \right]=2\operatorname {tr} \left[\Lambda \Sigma \Lambda \Sigma \right]+4\mu ^{T}\Lambda \Sigma \Lambda \mu }$.[3]

In fact, this can be generalized to find the covariance between two quadratic forms on the same ${\displaystyle \varepsilon }$ (once again, ${\displaystyle \Lambda _{1}}$ and ${\displaystyle \Lambda _{2}}$ must both be symmetric):

${\displaystyle \operatorname {cov} \left[\varepsilon ^{T}\Lambda _{1}\varepsilon ,\varepsilon ^{T}\Lambda _{2}\varepsilon \right]=2\operatorname {tr} \left[\Lambda _{1}\Sigma \Lambda _{2}\Sigma \right]+4\mu ^{T}\Lambda _{1}\Sigma \Lambda _{2}\mu }$.[4]

In addition, a quadratic form such as this follows a generalized chi-squared distribution.
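These Gaussian moment formulas can be checked by Monte Carlo simulation. The sketch below uses arbitrary illustrative parameters and a fixed seed; agreement holds only up to sampling error:

```python
import numpy as np

# Monte Carlo check of the Gaussian variance and covariance formulas.
# Dimensions, matrices, and sample size are arbitrary choices.
rng = np.random.default_rng(42)
mu = np.array([1.0, -0.5, 2.0])
A = rng.standard_normal((3, 3))
Sigma = A @ A.T + 3 * np.eye(3)                  # a positive definite covariance
L1 = np.diag([1.0, 2.0, 3.0])                    # symmetric Lambda_1
L2 = np.array([[1.0, 0.5, 0.0],
               [0.5, 2.0, 0.5],
               [0.0, 0.5, 1.0]])                 # symmetric Lambda_2

eps = rng.multivariate_normal(mu, Sigma, size=400_000)
q1 = np.einsum('ij,jk,ik->i', eps, L1, eps)      # eps^T L1 eps for each sample
q2 = np.einsum('ij,jk,ik->i', eps, L2, eps)

var_theory = 2 * np.trace(L1 @ Sigma @ L1 @ Sigma) + 4 * mu @ L1 @ Sigma @ L1 @ mu
cov_theory = 2 * np.trace(L1 @ Sigma @ L2 @ Sigma) + 4 * mu @ L1 @ Sigma @ L2 @ mu

assert abs(np.var(q1) - var_theory) < 0.05 * var_theory
assert abs(np.cov(q1, q2)[0, 1] - cov_theory) < 0.05 * abs(cov_theory)
```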

### Computing the variance in the non-symmetric case

The case for general ${\displaystyle \Lambda }$ can be derived by noting that

${\displaystyle \varepsilon ^{T}\Lambda ^{T}\varepsilon =\varepsilon ^{T}\Lambda \varepsilon }$

so

${\displaystyle \varepsilon ^{T}{\tilde {\Lambda }}\varepsilon =\varepsilon ^{T}\left(\Lambda +\Lambda ^{T}\right)\varepsilon /2}$

is a quadratic form in the symmetric matrix ${\displaystyle {\tilde {\Lambda }}=\left(\Lambda +\Lambda ^{T}\right)/2}$, so the mean and variance expressions are the same, provided ${\displaystyle \Lambda }$ is replaced by ${\displaystyle {\tilde {\Lambda }}}$ therein.
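This symmetrization step is easy to illustrate numerically: the value of the quadratic form is unchanged when a generic non-symmetric ${\displaystyle \Lambda }$ is replaced by its symmetric part (the values below are arbitrary):

```python
import numpy as np

# The quadratic form only sees the symmetric part of Lambda.
rng = np.random.default_rng(1)
Lam = rng.standard_normal((4, 4))          # generic, non-symmetric
Lam_tilde = (Lam + Lam.T) / 2              # its symmetric part
eps = rng.standard_normal(4)

assert np.isclose(eps @ Lam @ eps, eps @ Lam_tilde @ eps)
```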

In the setting where one has a set of observations ${\displaystyle y}$ and an operator matrix ${\displaystyle H}$ that maps observations to fitted values, the residual sum of squares can be written as a quadratic form in ${\displaystyle y}$:

${\displaystyle {\textrm {RSS}}=y^{T}(I-H)^{T}(I-H)y.}$

For procedures where the matrix ${\displaystyle H}$ is symmetric and idempotent, and the errors are Gaussian with covariance matrix ${\displaystyle \sigma ^{2}I}$, ${\displaystyle {\textrm {RSS}}/\sigma ^{2}}$ has a noncentral chi-squared distribution with ${\displaystyle k}$ degrees of freedom and noncentrality parameter ${\displaystyle \lambda }$, where

${\displaystyle k=\operatorname {tr} \left[(I-H)^{T}(I-H)\right]}$
${\displaystyle \lambda =\mu ^{T}(I-H)^{T}(I-H)\mu /(2\sigma ^{2})}$

may be found by matching the mean and variance of a noncentral chi-squared random variable to the expressions given in the first two sections. If ${\displaystyle Hy}$ estimates ${\displaystyle \mu }$ with no bias, then the noncentrality ${\displaystyle \lambda }$ is zero and ${\displaystyle {\textrm {RSS}}/\sigma ^{2}}$ follows a central chi-squared distribution.
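For ordinary least squares, where ${\displaystyle H=X(X^{T}X)^{-1}X^{T}}$ is a projection onto the column space of the design matrix ${\displaystyle X}$, these quantities simplify to ${\displaystyle k=n-p}$, and ${\displaystyle \lambda =0}$ whenever ${\displaystyle \mu }$ lies in that column space. A quick numeric sketch (arbitrary design matrix, taking ${\displaystyle \sigma ^{2}=1}$):

```python
import numpy as np

# OLS example: H = X (X^T X)^{-1} X^T is symmetric and idempotent, so
# (I-H)^T (I-H) = I-H and k = tr(I-H) = n - p.  The design matrix X and
# the coefficient vector are arbitrary; sigma^2 is taken to be 1.
rng = np.random.default_rng(7)
n, p = 20, 4
X = rng.standard_normal((n, p))
H = X @ np.linalg.solve(X.T @ X, X.T)        # hat matrix

I = np.eye(n)
assert np.allclose(H, H.T)                   # symmetric
assert np.allclose(H @ H, H)                 # idempotent

k = np.trace((I - H).T @ (I - H))
assert np.isclose(k, n - p)                  # degrees of freedom

mu = X @ rng.standard_normal(p)              # a mean in the column space of X
lam = mu @ (I - H).T @ (I - H) @ mu / 2      # noncentrality (sigma^2 = 1)
assert np.isclose(lam, 0.0)                  # unbiased fit: central chi-squared
```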