Random effects model
In statistics, a random effects model, also called a variance components model, is a statistical model in which the model parameters are random variables. It is a kind of hierarchical linear model, which assumes that the data being analysed are drawn from a hierarchy of different populations whose differences relate to that hierarchy. In econometrics, random effects models are used in the analysis of hierarchical or panel data when one assumes no fixed effects; the model still allows for individual effects. A random effects model is a special case of a mixed model.
This contrasts with usage in biostatistics, where "fixed" and "random" effects refer respectively to population-average and subject-specific effects (the latter are generally assumed to be unknown, latent variables).
Random effects models assist in controlling for unobserved heterogeneity when the heterogeneity is constant over time and uncorrelated with the independent variables. This constant can be removed from longitudinal data through differencing, since taking a first difference removes any time-invariant components of the model.
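The differencing argument can be sketched in a few lines of Python. This is an illustration rather than anything taken from the source: it assumes a linear panel model y_it = a_i + βx_it + e_it with a single regressor, and the function names and simulated parameter values are hypothetical.

```python
import random

def first_difference(series):
    """Return successive differences series[t] - series[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]

def slope_through_origin(xs, ys):
    """OLS slope for y = b * x with no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def simulate_panel(n_units=200, n_periods=5, beta=2.0, seed=0):
    """Simulate y_it = a_i + beta * x_it + e_it (assumed model, illustrative values)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        a = rng.gauss(0.0, 5.0)  # time-invariant unit effect
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        ys = [a + beta * x + rng.gauss(0.0, 0.5) for x in xs]
        panel.append((xs, ys))
    return panel

def first_difference_estimate(panel):
    """Estimate beta from differenced data; a_i cancels in y_it - y_i(t-1)."""
    dx, dy = [], []
    for xs, ys in panel:
        dx.extend(first_difference(xs))
        dy.extend(first_difference(ys))
    return slope_through_origin(dx, dy)

beta_hat = first_difference_estimate(simulate_panel())
print(f"first-difference estimate of beta: {beta_hat:.3f}")
```

Because the unit effect a_i appears in both y_it and y_i(t-1), it drops out of every difference, so the estimate does not depend on how large or how variable the unit effects are.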
Two common assumptions can be made about the individual-specific effect: the random effects assumption and the fixed effects assumption. The random effects assumption is that the individual unobserved heterogeneity is uncorrelated with the independent variables. The fixed effects assumption is that the individual-specific effect is correlated with the independent variables.
If the random effects assumption holds, the random effects estimator is more efficient than the fixed effects estimator. However, if this assumption does not hold, the random effects estimator is not consistent.
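The consistency point can be illustrated with a small simulation. The sketch below is not from the source. It assumes a balanced panel with one regressor, variance components that are known rather than estimated, and the standard quasi-demeaning form of the random effects GLS estimator, with θ = 1 − sqrt(σ_e² / (σ_e² + T σ_a²)). The function names and the `corr` parameter are hypothetical.

```python
import math
import random

def simulate_panel(n_units, n_periods, beta, corr, seed):
    """Simulate y_it = beta * x_it + a_i + e_it.

    `corr` sets how strongly the regressor loads on the unit effect a_i;
    corr = 0 satisfies the random effects assumption.
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        a = rng.gauss(0.0, 1.0)  # unit effect, variance 1
        xs = [corr * a + rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        ys = [beta * x + a + rng.gauss(0.0, 1.0) for x in xs]  # noise variance 1
        panel.append((xs, ys))
    return panel

def quasi_demeaned_slope(panel, theta):
    """OLS slope (with intercept) after subtracting theta times each unit mean.

    theta = 1 gives the fixed effects (within) estimator, theta = 0 gives
    pooled OLS, and the GLS value of theta gives the random effects estimator.
    """
    xt, yt = [], []
    for xs, ys in panel:
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        xt.extend(x - theta * x_bar for x in xs)
        yt.extend(y - theta * y_bar for y in ys)
    mx = sum(xt) / len(xt)
    my = sum(yt) / len(yt)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xt, yt))
    sxx = sum((x - mx) ** 2 for x in xt)
    return sxy / sxx

T = 5
# GLS weight for unit-effect variance 1 and noise variance 1 (both assumed known).
THETA_RE = 1.0 - math.sqrt(1.0 / (1.0 + T * 1.0))

correlated = simulate_panel(1000, T, beta=1.0, corr=1.0, seed=42)
uncorrelated = simulate_panel(1000, T, beta=1.0, corr=0.0, seed=42)

fe_corr = quasi_demeaned_slope(correlated, 1.0)
re_corr = quasi_demeaned_slope(correlated, THETA_RE)
fe_unc = quasi_demeaned_slope(uncorrelated, 1.0)
re_unc = quasi_demeaned_slope(uncorrelated, THETA_RE)

print(f"effect correlated with x:   FE = {fe_corr:.3f}, RE = {re_corr:.3f}")
print(f"effect uncorrelated with x: FE = {fe_unc:.3f}, RE = {re_unc:.3f}")
```

With the true slope set to 1, both estimators should land near 1 when the unit effect is uncorrelated with the regressor, while the random effects estimate should drift away from 1 when the two are correlated. In applied work θ is estimated from the data rather than known.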
Suppose m large elementary schools are chosen randomly from among thousands in a large country. Suppose also that n pupils of the same age are chosen randomly at each selected school. Their scores on a standard aptitude test are ascertained. Let Yij be the score of the jth pupil at the ith school. A simple way to model the relationships of these quantities is
$$Y_{ij} = \mu + U_i + W_{ij},$$
where μ is the average test score for the entire population. In this model Ui is the school-specific random effect: it measures the difference between the average score at school i and the average score in the entire country. The term Wij is the individual-specific random effect, i.e., it is the deviation of the jth pupil’s score from the average for the ith school.
The model can be augmented by including additional explanatory variables, which would capture differences in scores among different groups. For example:
$$Y_{ij} = \mu + \beta_1 \mathrm{Sex}_{ij} + \beta_2 \mathrm{ParentsEduc}_{ij} + U_i + W_{ij},$$
where Sexij is the dummy variable for boys/girls, ParentsEducij records, say, the average education level of a child’s parents, and β1 and β2 are fixed coefficients. This is a mixed model, not a purely random effects model, as it introduces fixed-effects terms for Sex and Parents' Education.
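The variance of Yij is the sum of the variances τ² and σ² of Ui and Wij respectively.
Let
$$\bar{Y}_{i\bullet} = \frac{1}{n} \sum_{j=1}^{n} Y_{ij}$$
be the average, not of all scores at the ith school, but of those at the ith school that are included in the random sample. Let
$$\bar{Y}_{\bullet\bullet} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} Y_{ij}$$
be the grand average. Let
$$SSW = \sum_{i=1}^{m} \sum_{j=1}^{n} \left(Y_{ij} - \bar{Y}_{i\bullet}\right)^2, \qquad SSB = n \sum_{i=1}^{m} \left(\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet}\right)^2$$
be respectively the sum of squares due to differences within groups and the sum of squares due to differences between groups. Then it can be shown that
$$\frac{1}{m(n-1)} E(SSW) = \sigma^2 \qquad \text{and} \qquad \frac{1}{(m-1)n} E(SSB) = \frac{\sigma^2}{n} + \tau^2.$$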
The ratio τ²/(τ² + σ²), the share of the total variance attributable to differences between schools, is called the intraclass correlation coefficient.
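The sums of squares can be turned into method-of-moments estimates of the variance components. The following Python sketch assumes a balanced design (the same number n of pupils at each of the m schools) and normally distributed effects. The function names and the simulated values of μ, τ and σ are illustrative choices, not values from the source. The intraclass correlation is computed here as the ratio τ²/(τ² + σ²).

```python
import random

def simulate_scores(m, n, mu, tau, sigma, seed):
    """Draw scores Y[i][j] = mu + U_i + W_ij for m schools with n pupils each."""
    rng = random.Random(seed)
    scores = []
    for _ in range(m):
        u = rng.gauss(0.0, tau)  # school effect U_i, standard deviation tau
        scores.append([mu + u + rng.gauss(0.0, sigma) for _ in range(n)])
    return scores

def variance_components(scores):
    """Method-of-moments estimates (sigma^2, tau^2, ICC) for a balanced design."""
    m, n = len(scores), len(scores[0])
    school_means = [sum(row) / n for row in scores]
    grand_mean = sum(school_means) / m
    # Within-group and between-group sums of squares.
    ssw = sum((y - mean) ** 2
              for row, mean in zip(scores, school_means)
              for y in row)
    ssb = n * sum((mean - grand_mean) ** 2 for mean in school_means)
    # Equate the mean squares to their expectations and solve.
    sigma2_hat = ssw / (m * (n - 1))
    tau2_hat = ssb / ((m - 1) * n) - sigma2_hat / n  # can be negative in small samples
    icc_hat = tau2_hat / (tau2_hat + sigma2_hat)
    return sigma2_hat, tau2_hat, icc_hat

# Simulated data with tau^2 = 9 and sigma^2 = 16, so the true ICC is 9 / 25.
scores = simulate_scores(m=400, n=25, mu=100.0, tau=3.0, sigma=4.0, seed=1)
sigma2_hat, tau2_hat, icc_hat = variance_components(scores)
print(f"sigma^2 = {sigma2_hat:.2f}, tau^2 = {tau2_hat:.2f}, ICC = {icc_hat:.3f}")
```

In practice, variance components are more often estimated by maximum likelihood or restricted maximum likelihood, which also handle unbalanced designs. The moment estimator is shown only because it follows directly from the two expectations.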
In general, the random effects estimator is efficient and should be used (over fixed effects) if the assumptions underlying it are believed to be satisfied. For random effects to work in the school example, the school-specific effects must be uncorrelated with the other covariates of the model. This can be tested by estimating the model by fixed effects and by random effects and then performing a Hausman specification test. If the test rejects, the random effects estimator is inconsistent and the fixed effects estimator is preferred.
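For reference, the usual form of the Hausman statistic is written out below. The notation is an assumption of this sketch, not the source's: the two β-hats are the fixed effects and random effects estimates of the coefficients on the k time-varying regressors, and the Var-hats are their estimated covariance matrices.

```latex
H = \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)^{\top}
    \left[\widehat{\operatorname{Var}}\left(\hat{\beta}_{FE}\right)
        - \widehat{\operatorname{Var}}\left(\hat{\beta}_{RE}\right)\right]^{-1}
    \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)
\;\overset{a}{\sim}\; \chi^2_{k}
\quad \text{under } H_0 .
```

Under the null hypothesis that the random effects assumption holds, both estimators are consistent and the difference between them should be small. A large value of H relative to the chi-squared distribution with k degrees of freedom is evidence against the random effects assumption.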
- Bühlmann model
- Hierarchical linear modeling
- Fixed effects
- Covariance estimation
- Conditional variance