In multivariate statistics, if $\varepsilon$ is a vector of $n$ random variables, and $\Lambda$ is an $n$ -dimensional symmetric matrix, then the scalar quantity $\varepsilon ^{T}\Lambda \varepsilon$ is known as a quadratic form in $\varepsilon$ .

## Expectation

It can be shown that

$\operatorname {E} \left[\varepsilon ^{T}\Lambda \varepsilon \right]=\operatorname {tr} \left[\Lambda \Sigma \right]+\mu ^{T}\Lambda \mu ,$

where $\mu$ and $\Sigma$ are the expected value and variance-covariance matrix of $\varepsilon$, respectively, and tr denotes the trace of a matrix. This result depends only on the existence of $\mu$ and $\Sigma$; in particular, normality of $\varepsilon$ is not required.
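As a numerical sanity check, the expectation formula can be verified by Monte Carlo simulation. The sketch below uses NumPy; the particular $\mu$, $\Sigma$, and $\Lambda$ are arbitrary choices for illustration (Gaussian draws are used only for convenience, since the result itself does not require normality):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# Arbitrary mean vector, covariance matrix, and symmetric Lambda for illustration.
mu = np.array([1.0, -2.0, 0.5])
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)   # symmetric positive definite
L = rng.standard_normal((n, n))
Lam = (L + L.T) / 2               # symmetric

# Theoretical expectation: tr(Lambda Sigma) + mu^T Lambda mu
expected = np.trace(Lam @ Sigma) + mu @ Lam @ mu

# Monte Carlo estimate of E[eps^T Lam eps] over many draws of eps
eps = rng.multivariate_normal(mu, Sigma, size=200_000)
estimate = np.mean(np.einsum('ij,jk,ik->i', eps, Lam, eps))

print(expected, estimate)  # the two values should agree to within Monte Carlo error
```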

A book-length treatment of quadratic forms in random variables is given by Mathai and Provost.

### Proof

Since the quadratic form is a scalar quantity, $\varepsilon ^{T}\Lambda \varepsilon =\operatorname {tr} (\varepsilon ^{T}\Lambda \varepsilon )$ .

Next, by the cyclic property of the trace operator,

$\operatorname {E} [\operatorname {tr} (\varepsilon ^{T}\Lambda \varepsilon )]=\operatorname {E} [\operatorname {tr} (\Lambda \varepsilon \varepsilon ^{T})].$

Since the trace operator is a linear combination of the components of the matrix, it follows from the linearity of the expectation operator that

$\operatorname {E} [\operatorname {tr} (\Lambda \varepsilon \varepsilon ^{T})]=\operatorname {tr} (\Lambda \operatorname {E} (\varepsilon \varepsilon ^{T})).$

A standard property of variances, $\operatorname {E} (\varepsilon \varepsilon ^{T})=\Sigma +\mu \mu ^{T}$, then tells us that this is

$\operatorname {tr} (\Lambda (\Sigma +\mu \mu ^{T})).$

Applying the cyclic property of the trace operator again, we get

$\operatorname {tr} (\Lambda \Sigma )+\operatorname {tr} (\Lambda \mu \mu ^{T})=\operatorname {tr} (\Lambda \Sigma )+\operatorname {tr} (\mu ^{T}\Lambda \mu )=\operatorname {tr} (\Lambda \Sigma )+\mu ^{T}\Lambda \mu .$

## Variance in the Gaussian case

In general, the variance of a quadratic form depends greatly on the distribution of $\varepsilon$ . However, if $\varepsilon$ does follow a multivariate normal distribution, the variance of the quadratic form becomes particularly tractable. Assume for the moment that $\Lambda$ is a symmetric matrix. Then,

$\operatorname {var} \left[\varepsilon ^{T}\Lambda \varepsilon \right]=2\operatorname {tr} \left[\Lambda \Sigma \Lambda \Sigma \right]+4\mu ^{T}\Lambda \Sigma \Lambda \mu$ .

In fact, this can be generalized to find the covariance between two quadratic forms on the same $\varepsilon$ (once again, $\Lambda _{1}$ and $\Lambda _{2}$ must both be symmetric):

$\operatorname {cov} \left[\varepsilon ^{T}\Lambda _{1}\varepsilon ,\varepsilon ^{T}\Lambda _{2}\varepsilon \right]=2\operatorname {tr} \left[\Lambda _{1}\Sigma \Lambda _{2}\Sigma \right]+4\mu ^{T}\Lambda _{1}\Sigma \Lambda _{2}\mu$ .
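Both the variance and covariance formulas can be checked numerically. The following sketch (NumPy; the matrices $\mu$, $\Sigma$, $\Lambda _{1}$, $\Lambda _{2}$ are arbitrary illustrative choices) estimates both quantities by simulation and compares them with the closed-form expressions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3

# Illustrative mean, covariance matrix, and two symmetric matrices.
mu = np.array([0.5, 1.0, -1.0])
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)   # symmetric positive definite
L1 = rng.standard_normal((n, n)); Lam1 = (L1 + L1.T) / 2
L2 = rng.standard_normal((n, n)); Lam2 = (L2 + L2.T) / 2

# Closed-form covariance: 2 tr(Lam1 Sigma Lam2 Sigma) + 4 mu^T Lam1 Sigma Lam2 mu
cov_theory = 2 * np.trace(Lam1 @ Sigma @ Lam2 @ Sigma) + 4 * mu @ Lam1 @ Sigma @ Lam2 @ mu
# The variance is the special case Lam1 = Lam2.
var_theory = 2 * np.trace(Lam1 @ Sigma @ Lam1 @ Sigma) + 4 * mu @ Lam1 @ Sigma @ Lam1 @ mu

# Monte Carlo estimates from Gaussian draws (normality matters here)
eps = rng.multivariate_normal(mu, Sigma, size=500_000)
q1 = np.einsum('ij,jk,ik->i', eps, Lam1, eps)
q2 = np.einsum('ij,jk,ik->i', eps, Lam2, eps)

print(var_theory, np.var(q1))           # should agree within Monte Carlo error
print(cov_theory, np.cov(q1, q2)[0, 1])
```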

In addition, a quadratic form such as this follows a generalized chi-squared distribution.

### Computing the variance in the non-symmetric case

Some texts incorrectly state that the above variance or covariance results hold without requiring $\Lambda$ to be symmetric. The case for general $\Lambda$ can be derived by noting that

$\varepsilon ^{T}\Lambda ^{T}\varepsilon =\varepsilon ^{T}\Lambda \varepsilon ,$

so

$\varepsilon ^{T}\Lambda \varepsilon =\varepsilon ^{T}\left(\Lambda +\Lambda ^{T}\right)\varepsilon /2=\varepsilon ^{T}{\tilde {\Lambda }}\varepsilon ,$

which is a quadratic form in the symmetric matrix ${\tilde {\Lambda }}=\left(\Lambda +\Lambda ^{T}\right)/2$. The mean and variance expressions therefore apply, provided $\Lambda$ is replaced by ${\tilde {\Lambda }}$ therein.
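The symmetrization step is exact, not approximate: because $\varepsilon ^{T}\Lambda ^{T}\varepsilon$ and $\varepsilon ^{T}\Lambda \varepsilon$ are the same scalar, replacing $\Lambda$ by its symmetric part leaves every realization of the quadratic form unchanged. A minimal check (NumPy; the matrix and vector are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

L = rng.standard_normal((n, n))   # a general, non-symmetric matrix
Lam_tilde = (L + L.T) / 2         # its symmetric part

eps = rng.standard_normal(n)

# The quadratic form is unchanged by symmetrizing Lambda:
q_general = eps @ L @ eps
q_symmetric = eps @ Lam_tilde @ eps
print(q_general, q_symmetric)  # identical up to floating-point rounding
```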

In the setting where one has a set of observations $y$ and an operator matrix $H$ that maps $y$ to the vector of fitted values, the residual sum of squares can be written as a quadratic form in $y$:

${\textrm {RSS}}=y^{T}(I-H)^{T}(I-H)y.$

For procedures where the matrix $H$ is symmetric and idempotent, and the errors are Gaussian with covariance matrix $\sigma ^{2}I$, ${\textrm {RSS}}/\sigma ^{2}$ has a noncentral chi-squared distribution with $k$ degrees of freedom and noncentrality parameter $\lambda$, where

$k=\operatorname {tr} \left[(I-H)^{T}(I-H)\right]$

$\lambda =\mu ^{T}(I-H)^{T}(I-H)\mu /2.$

These parameters may be found by matching the first two central moments of a noncentral chi-squared random variable to the expressions given in the first two sections. If $Hy$ estimates $\mu$ without bias, then the noncentrality $\lambda$ is zero and ${\textrm {RSS}}/\sigma ^{2}$ follows a central chi-squared distribution with $k$ degrees of freedom.
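This can be illustrated with ordinary least squares, where the hat matrix $H=X(X^{T}X)^{-1}X^{T}$ is symmetric and idempotent, so $k=\operatorname {tr} (I-H)=n-p$. The sketch below (NumPy; the design matrix, coefficients, and $\sigma$ are arbitrary illustrative choices) checks this and verifies that ${\textrm {RSS}}/\sigma ^{2}$ has mean $k$ in the unbiased, central case:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 3       # illustrative sizes: 20 observations, 3 regressors
sigma = 1.5

# Hat matrix of ordinary least squares: H = X (X^T X)^{-1} X^T
X = rng.standard_normal((n, p))
H = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - H

# H is symmetric and idempotent, so (I-H)^T (I-H) = I-H and
# k = tr[(I-H)^T (I-H)] = tr(I-H) = n - p
k = np.trace(M.T @ M)
print(k)  # equals n - p = 17 up to rounding

# With mu in the column space of X, Hy is unbiased for mu, lambda = 0, and
# RSS/sigma^2 is central chi-squared with k degrees of freedom (mean k).
mu = X @ np.array([1.0, -2.0, 0.5])   # arbitrary true coefficients
y = mu + sigma * rng.standard_normal((100_000, n))
rss = np.sum((y - y @ H.T) ** 2, axis=1)
print(np.mean(rss) / sigma**2)  # should be close to k
```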