Multivariate analysis of covariance

Multivariate analysis of covariance (MANCOVA) is an extension of analysis of covariance (ANCOVA) methods to cover cases where there is more than one dependent variable and where the control of concomitant continuous independent variables (covariates) is required. The most prominent benefit of the MANCOVA design over the simple MANOVA is the 'factoring out' of noise or error that has been introduced by the covariates.[1] A commonly used multivariate version of the ANOVA F-statistic is Wilks' Lambda (Λ), which represents the ratio of the error variance (or covariance) to the sum of the error and effect variance (or covariance).[1]

Goals of MANCOVA

Similarly to all tests in the ANOVA family, the primary aim of the MANCOVA is to test for significant differences between group means.[1] Characterising a covariate in a data source allows the magnitude of the error term, represented in the MANCOVA design as MSerror, to be reduced. Because Wilks' Lambda shrinks as the error term shrinks relative to the effect, the overall Wilks' Lambda then becomes smaller and is more likely to be characterised as significant.[1] This grants the researcher more statistical power to detect differences within the data. The multivariate aspect of the MANCOVA allows the characterisation of differences in group means with regard to a linear combination of multiple dependent variables, while simultaneously controlling for covariates.

Example situation where MANCOVA is appropriate: Suppose a scientist is interested in testing two new drugs for their effects on depression and anxiety scores. Also suppose that the scientist has information pertaining to the overall responsivity to drugs for each patient; accounting for this covariate will grant the test higher sensitivity in determining the effects of each drug on both dependent variables.
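
The drug example can be sketched numerically. The following is a minimal illustration on synthetic data, not a full MANCOVA implementation: the sample size, effect sizes, and variable names (`drug`, `responsivity`, `wilks_lambda`) are assumptions made for this sketch. It computes Wilks' Lambda for the drug effect as det(E) / det(E + H), where E is the residual sums-of-squares-and-cross-products matrix of the full model and H is the extra variation explained by the drug term, once with the covariate in the model and once without.

```python
import numpy as np

def wilks_lambda(Y, X_full, X_reduced):
    """Wilks' Lambda for the terms in X_full that are absent from X_reduced."""
    def residual_sscp(X):
        # Least-squares fit of all dependent variables at once
        B, *_ = np.linalg.lstsq(X, Y, rcond=None)
        R = Y - X @ B
        return R.T @ R
    E = residual_sscp(X_full)             # error SSCP of the full model
    H = residual_sscp(X_reduced) - E      # SSCP attributable to the tested term
    return np.linalg.det(E) / np.linalg.det(E + H)

# Synthetic data (all numbers are illustrative assumptions)
rng = np.random.default_rng(0)
n = 40                                         # patients per drug
drug = np.repeat([0.0, 1.0], n)                # 0 = first drug, 1 = second drug
responsivity = rng.normal(size=2 * n)          # covariate: overall drug responsivity
noise = rng.normal(size=(2 * n, 2))
# Columns of Y: depression score, anxiety score
Y = (np.outer(drug, [-1.0, -0.8])
     + np.outer(responsivity, [2.0, 2.0])
     + noise)

ones = np.ones(2 * n)
# MANCOVA: drug effect tested after adjusting for the covariate
lam_mancova = wilks_lambda(
    Y,
    np.column_stack([ones, drug, responsivity]),
    np.column_stack([ones, responsivity]),
)
# MANOVA: same test with the covariate left out of the model
lam_manova = wilks_lambda(
    Y,
    np.column_stack([ones, drug]),
    ones[:, None],
)
print("Wilks' Lambda with covariate:   ", lam_mancova)
print("Wilks' Lambda without covariate:", lam_manova)
```

In this simulated setting the covariate explains much of the score variation, so the adjusted analysis yields the smaller Lambda, that is, the stronger evidence for a drug effect. A significance test would additionally require converting Lambda to an approximate F statistic, which is omitted here.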

Assumptions

Certain assumptions must be met for the MANCOVA to be used appropriately:

1. Normality: For each group, each dependent variable must represent a normal distribution of scores. Furthermore, any linear combination of dependent variables must be normally distributed. Transformation or removal of outliers can help ensure this assumption is met.[2] Violation of this assumption may lead to an increase in Type I error rates.[3]
2. Independence of observations: Each observation must be independent of all other observations; this assumption can be met by employing random sampling techniques. Violation of this assumption may lead to an increase in Type I error rates.[3]
3. Homogeneity of variances: Each dependent variable must demonstrate similar levels of variance across the levels of each independent variable. Violation of this assumption can be conceptualised as a correlation existing between the variances and the means of dependent variables. This violation is often called 'heteroscedasticity'[4] and can be tested for using Levene's test.[5]
4. Homogeneity of covariances: The intercorrelation matrix between dependent variables must be equal across all levels of the independent variable. Violation of this assumption may lead to an increase in Type I error rates as well as decreased statistical power.[3]
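
As an illustration of checking the homogeneity-of-variances assumption, the sketch below computes the Levene W statistic for one dependent variable across groups, using the mean-centred form of the test. The function name `levene_w` and the toy data are assumptions for this example; in practice a library routine would normally be used, and the p-value is obtained by comparing W with an F distribution with (k − 1, N − k) degrees of freedom, which is not done here.

```python
import numpy as np

def levene_w(*groups):
    """Levene's W statistic (mean-centred version) for k groups."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_i = np.array([len(g) for g in groups])
    n_total = n_i.sum()
    # Absolute deviations of each observation from its own group mean
    z = [np.abs(g - g.mean()) for g in groups]
    z_bar_i = np.array([zi.mean() for zi in z])   # mean deviation per group
    z_bar = np.concatenate(z).mean()              # overall mean deviation
    between = np.sum(n_i * (z_bar_i - z_bar) ** 2)
    within = sum(np.sum((zi - zb) ** 2) for zi, zb in zip(z, z_bar_i))
    return (n_total - k) / (k - 1) * between / within

# Toy data: same spread with shifted means, then clearly different spread
w_equal = levene_w([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
w_unequal = levene_w([1, 2, 3, 4, 5], [0, 10, 20, 30, 40])
print(w_equal, w_unequal)
```

A W near zero indicates similar spread across groups, while a large W points towards heteroscedasticity. In a MANCOVA setting the check would be repeated for each dependent variable.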

Logic of MANOVA

Analogous to ANOVA, MANOVA is based on the product of the model variance matrix, ${\displaystyle \Sigma _{\text{model}}}$, and the inverse of the error variance matrix, ${\displaystyle \Sigma _{\text{res}}^{-1}}$, or ${\displaystyle A=\Sigma _{\text{model}}\times \Sigma _{\text{res}}^{-1}}$. The hypothesis that ${\displaystyle \Sigma _{\text{model}}=\Sigma _{\text{res}}}$ implies that the product ${\displaystyle A\sim I}$.[6] Invariance considerations imply the MANOVA statistic should be a measure of magnitude of the singular value decomposition of this matrix product, but there is no unique choice owing to the multi-dimensional nature of the alternative hypothesis.

The most common[7][8] statistics are summaries based on the roots (or eigenvalues) ${\displaystyle \lambda _{p}}$ of the ${\displaystyle A}$ matrix:

Wilks' lambda, distributed as lambda (Λ): ${\displaystyle \Lambda _{\text{Wilks}}=\prod _{1\ldots p}{\frac {1}{1+\lambda _{p}}}=\det(I+A)^{-1}={\frac {\det(\Sigma _{\text{res}})}{\det(\Sigma _{\text{res}}+\Sigma _{\text{model}})}}}$
Pillai's trace: ${\displaystyle \Lambda _{\text{Pillai}}=\sum _{1\ldots p}{\frac {\lambda _{p}}{1+\lambda _{p}}}=\operatorname {tr} (A(I+A)^{-1})}$
Lawley–Hotelling trace: ${\displaystyle \Lambda _{\text{LH}}=\sum _{1\ldots p}\lambda _{p}=\operatorname {tr} (A)}$
Roy's greatest root: ${\displaystyle \Lambda _{\text{Roy}}=\max _{p}(\lambda _{p})=\|A\|_{\infty }}$
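
The four summaries can be computed directly from the eigenvalues of ${\displaystyle A}$. The sketch below assumes that the two variance matrices are already available as NumPy arrays; the function name `manova_statistics` and the example matrices are hypothetical. Pillai's trace is computed in its standard form, the sum of ${\displaystyle \lambda _{p}/(1+\lambda _{p})}$.

```python
import numpy as np

def manova_statistics(sigma_model, sigma_res):
    """Four common MANOVA summaries from the roots of A = model @ inv(res)."""
    A = sigma_model @ np.linalg.inv(sigma_res)
    # The roots are real and non-negative for valid variance matrices;
    # .real only discards numerical round-off in the imaginary part
    lam = np.linalg.eigvals(A).real
    return {
        "wilks": np.prod(1.0 / (1.0 + lam)),
        "pillai": np.sum(lam / (1.0 + lam)),
        "lawley_hotelling": np.sum(lam),
        "roy": np.max(lam),
    }

# Hypothetical matrices chosen so that the roots of A are 0.5 and 1
sigma_model = np.array([[2.0, 0.0], [0.0, 2.0]])
sigma_res = np.array([[4.0, 0.0], [0.0, 2.0]])
stats = manova_statistics(sigma_model, sigma_res)

# Cross-check Wilks' lambda against the determinant form
wilks_det = np.linalg.det(sigma_res) / np.linalg.det(sigma_res + sigma_model)
print(stats, wilks_det)
```

With roots 0.5 and 1, Wilks' lambda is (1/1.5)(1/2) = 1/3, Pillai's trace is 1/3 + 1/2 = 5/6, the Lawley–Hotelling trace is 1.5, and Roy's greatest root is 1.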

Covariates

In statistics, a covariate represents a source of variation that has not been controlled in the experiment and is believed to affect the dependent variable.[9] The aim of such techniques as ANCOVA is to remove the effects of such uncontrolled variation, in order to increase statistical power and to ensure an accurate measurement of the true relationship between independent and dependent variables.[9]

An example is provided by the analysis of trend in sea-level by Woodworth (1987). Here the dependent variable (and variable of most interest) was the annual mean sea level at a given location for which a series of yearly values were available. The primary independent variable was "time". Use was made of a "covariate" consisting of yearly values of annual mean atmospheric pressure at sea level. The results showed that inclusion of the covariate allowed improved estimates of the trend against time to be obtained, compared to analyses which omitted the covariate.
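
The effect described in the sea-level example can be imitated on synthetic data. The sketch below does not use Woodworth's data or reproduce his results: the trend, the pressure coefficient, the noise level, and the units in the comments are all assumptions chosen for illustration. It fits the trend against time by ordinary least squares with and without the pressure covariate and compares the standard error of the estimated trend.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares: coefficients, standard errors, residual SS."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    dof = len(y) - X.shape[1]
    cov = (rss / dof) * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov)), rss

# Synthetic yearly series (illustrative values only)
rng = np.random.default_rng(1)
years = np.arange(50, dtype=float)
pressure = rng.normal(0.0, 2.0, size=50)       # pressure anomaly, e.g. in mbar
sea_level = (1.5 * years                       # assumed trend, e.g. mm per year
             - 10.0 * pressure                 # assumed pressure effect
             + rng.normal(0.0, 5.0, size=50))  # unexplained noise

ones = np.ones_like(years)
# Trend estimated without the covariate
beta_no, se_no, rss_no = ols(np.column_stack([ones, years]), sea_level)
# Trend estimated with the covariate included
beta_cov, se_cov, rss_cov = ols(np.column_stack([ones, years, pressure]), sea_level)

print("trend without covariate:", beta_no[1], "+/-", se_no[1])
print("trend with covariate:   ", beta_cov[1], "+/-", se_cov[1])
```

Because the covariate absorbs variation that would otherwise be treated as error, the residual sum of squares cannot increase when it is added, and in this simulated setting the trend is estimated with a smaller standard error.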