Two-way analysis of variance
In statistics, the two-way analysis of variance (ANOVA) is an extension of the one-way ANOVA that examines the influence of two different categorical independent variables on one dependent variable. The two-way ANOVA not only determines the main effect of each independent variable but also identifies whether there is a significant interaction effect between them.
In 1925, Ronald Fisher mentioned the two-way ANOVA in his celebrated book Statistical Methods for Research Workers (chapters 7 and 8). In 1934, Frank Yates published procedures for the unbalanced case. Since then, an extensive literature has been produced, reviewed in 1993 by Fujikoshi. In 2005, Andrew Gelman proposed a different approach to ANOVA, viewing it as a multilevel model.
Assumptions to use two-way ANOVA
As with other parametric tests, we make the following assumptions when using two-way ANOVA:
- The errors of populations from which the samples are obtained must be normally distributed.
- Sampling is done correctly: observations within and between groups must be independent.
- The variances among populations must be equal (homoscedastic).
- Data are interval or ratio.
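The equal-variance assumption above can be screened informally before fitting the model. The following sketch, using hypothetical treatment groups and data, compares the largest and smallest sample variances across groups (the ratio is Hartley's F_max statistic); values near 1 support homoscedasticity.

```python
# Sketch: rough check of the equal-variance (homoscedasticity) assumption.
# Group labels and response values below are hypothetical.

def sample_variance(xs):
    """Unbiased sample variance."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

# Hypothetical replicates for the four treatments of a 2x2 design
groups = {
    ("low", "A"):  [4.1, 3.9, 4.4, 4.0],
    ("low", "B"):  [5.2, 5.0, 4.8, 5.3],
    ("high", "A"): [6.0, 6.3, 5.8, 6.1],
    ("high", "B"): [7.1, 6.8, 7.4, 7.0],
}

variances = {g: sample_variance(xs) for g, xs in groups.items()}
f_max = max(variances.values()) / min(variances.values())
print(f"F_max = {f_max:.2f}")  # ratios near 1 support equal variances
```

A formal alternative would be Levene's test; large F_max ratios (relative to tabulated critical values) suggest the homoscedasticity assumption is doubtful.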
Let us imagine a data set for which a dependent variable may be influenced by two factors (sources of variation). The first factor has $I$ levels ($i = 1, \ldots, I$) and the second has $J$ levels ($j = 1, \ldots, J$). Each combination $(i, j)$ defines a treatment, for a total of $I \times J$ treatments. We represent the number of replicates for treatment $(i, j)$ by $n_{ij}$, and let $k$ be the index of the replicate in this treatment ($k = 1, \ldots, n_{ij}$).
From these data, we can build a contingency table, where $n_{i+} = \sum_j n_{ij}$ and $n_{+j} = \sum_i n_{ij}$, and the total number of replicates is equal to $n = \sum_{i,j} n_{ij} = \sum_i n_{i+} = \sum_j n_{+j}$.
The design is balanced if each treatment has the same number of replicates, $K$. In such a case, the design is also said to be orthogonal, allowing one to fully distinguish the effects of both factors. We hence can write $\forall i,\; n_{i+} = JK$, and $\forall j,\; n_{+j} = IK$.
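The replicate counts above can be tabulated directly from raw observations. The following sketch, with hypothetical factor levels and responses, builds the contingency table of counts $n_{ij}$, its marginals, and checks whether the design is balanced.

```python
# Sketch: building the replicate-count (contingency) table n_ij from raw
# observations and testing for balance. Levels and data are hypothetical.

from collections import Counter

# Each observation: (level of factor 1, level of factor 2, response)
observations = [
    ("low", "A", 4.1), ("low", "A", 3.9),
    ("low", "B", 5.2), ("low", "B", 5.0),
    ("high", "A", 6.0), ("high", "A", 6.3),
    ("high", "B", 7.1), ("high", "B", 6.8),
]

n = Counter((i, j) for i, j, _ in observations)  # n_ij per treatment (i, j)
rows = sorted({i for i, _, _ in observations})   # levels of factor 1
cols = sorted({j for _, j, _ in observations})   # levels of factor 2

n_row = {i: sum(n[(i, j)] for j in cols) for i in rows}  # marginals n_{i+}
n_col = {j: sum(n[(i, j)] for i in rows) for j in cols}  # marginals n_{+j}
total = sum(n.values())                                  # grand total n

# Balanced design: every treatment has the same replicate count K
balanced = len(set(n.values())) == 1
print(total, balanced)
```

Here each of the $I \times J = 4$ treatments has $K = 2$ replicates, so the marginals satisfy $n_{i+} = JK = 4$ and $n_{+j} = IK = 4$.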
Let us denote by $y_{ijk}$ the value of the dependent variable for unit $k$ which received treatment $(i, j)$. The two-way ANOVA model can be written as:
$$y_{ijk} = \mu_{ij} + \epsilon_{ijk},$$
where the errors $\epsilon_{ijk}$ are independent and normally distributed with mean zero and common variance $\sigma^2$.
The effects of both factors are explicitly written as:
$$\mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij},$$
where $\mu$ is the grand mean, $\alpha_i$ is the additive main effect of level $i$ from the first factor ($i$-th row in the contingency table), $\beta_j$ is the additive main effect of level $j$ from the second factor ($j$-th column in the contingency table), and $\gamma_{ij}$ is the non-additive interaction effect of treatment $(i, j)$ from both factors (cell at row $i$ and column $j$ in the contingency table).
To ensure identifiability of the parameters, we can add the following "sum-to-zero" constraints:
$$\sum_i \alpha_i = 0, \qquad \sum_j \beta_j = 0, \qquad \sum_i \gamma_{ij} = 0 \;\; \forall j, \qquad \sum_j \gamma_{ij} = 0 \;\; \forall i.$$
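For a balanced design, the least-squares estimates of these parameters under the sum-to-zero constraints reduce to simple differences of cell and marginal means. The following sketch, with hypothetical data for a 2x2 balanced design, computes the estimates and verifies the constraints numerically.

```python
# Sketch: parameter estimates for a balanced two-way design under the
# sum-to-zero constraints. Data below are hypothetical (K = 2 replicates).

cells = {  # replicates y_ijk for each treatment (i, j)
    ("low", "A"):  [4.1, 3.9], ("low", "B"):  [5.2, 5.0],
    ("high", "A"): [6.0, 6.2], ("high", "B"): [7.1, 6.9],
}
rows = ["low", "high"]
cols = ["A", "B"]

def mean(xs):
    return sum(xs) / len(xs)

cell_mean = {ij: mean(ys) for ij, ys in cells.items()}
grand = mean(list(cell_mean.values()))  # estimate of the grand mean mu
row_mean = {i: mean([cell_mean[(i, j)] for j in cols]) for i in rows}
col_mean = {j: mean([cell_mean[(i, j)] for i in rows]) for j in cols}

# Main effects: deviations of marginal means from the grand mean
alpha = {i: row_mean[i] - grand for i in rows}
beta = {j: col_mean[j] - grand for j in cols}
# Interaction effects: what remains of each cell mean after the additive part
gamma = {(i, j): cell_mean[(i, j)] - grand - alpha[i] - beta[j]
         for i in rows for j in cols}

# The sum-to-zero constraints hold by construction:
assert abs(sum(alpha.values())) < 1e-9
assert abs(sum(beta.values())) < 1e-9
assert all(abs(sum(gamma[(i, j)] for i in rows)) < 1e-9 for j in cols)
assert all(abs(sum(gamma[(i, j)] for j in cols)) < 1e-9 for i in rows)
```

Note that with unbalanced data the estimates no longer decompose this simply, which is why the unbalanced case required the separate procedures published by Yates.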
- Yates, Frank (March 1934). "The analysis of multiple classifications with unequal numbers in the different classes". Journal of the American Statistical Association (American Statistical Association) 29 (185): 51–66. Retrieved 19 June 2014.
- Fujikoshi, Yasunori (1993). "Two-way ANOVA models with unbalanced data". Discrete Mathematics (Elsevier) 116 (1): 315–334. doi:10.1016/0012-365X(93)90410-U.
- Gelman, Andrew (February 2005). "Analysis of variance? why it is more important than ever". The Annals of Statistics 33 (1): 1–53. doi:10.1214/009053604000001048.
- Ko, Yi-An; et al. (September 2013). "Novel Likelihood Ratio Tests for Screening Gene-Gene and Gene-Environment Interactions with Unbalanced Repeated-Measures Data". Genetic Epidemiology 37 (6): 581–591. doi:10.1002/gepi.21744.