# Fixed effects model

In statistics, a fixed effects model is a statistical model in which the model parameters are fixed or non-random quantities. This is in contrast to random effects models and mixed models, in which all or some of the model parameters are random variables. In many applications, including econometrics[1] and biostatistics,[2][3][4][5] a fixed effects model refers to a regression model in which the group means are fixed (non-random). By contrast, a random effects model treats the group means as a random sample from a population.[6] Data can generally be grouped by several observed factors, and the group means for each grouping can be modeled as either fixed or random effects. In a fixed effects model, each group mean is a group-specific fixed quantity.

In panel data, where longitudinal observations exist for the same subject, fixed effects represent the subject-specific means. In panel data analysis, the term fixed effects estimator (also known as the within estimator) refers to an estimator for the coefficients of a regression model that includes those fixed effects (one time-invariant intercept for each subject).

## Qualitative description

Such models help control for unobserved heterogeneity when this heterogeneity is constant over time. This heterogeneity can be removed from the data through differencing. For example, taking a first difference removes any time-invariant components of the model.

There are two common assumptions about the individual-specific effect: the random effects assumption and the fixed effects assumption. The random effects assumption (made in a random effects model) is that the individual-specific effects are uncorrelated with the independent variables. The fixed effects assumption allows the individual-specific effects to be correlated with the independent variables. If the random effects assumption holds, the random effects estimator is more efficient than the fixed effects estimator. However, if this assumption does not hold, the random effects estimator is not consistent. The Durbin–Wu–Hausman test is often used to discriminate between the fixed and the random effects model.[7][8]

## Formal description

### Classical Representation

Consider the linear unobserved effects model for ${\displaystyle N}$ individuals observed over ${\displaystyle T}$ time periods:

${\displaystyle y_{it}=X_{it}\mathbf {\beta } +\alpha _{i}+u_{it}}$ for ${\displaystyle t=1,\dots ,T}$ and ${\displaystyle i=1,\dots ,N}$

Where:

• ${\displaystyle y_{it}}$ is the dependent variable observed for individual ${\displaystyle i}$ at time ${\displaystyle t}$.
• ${\displaystyle X_{it}}$ is the ${\displaystyle 1\times k}$ vector of time-variant regressors, where ${\displaystyle k}$ is the number of independent variables.
• ${\displaystyle \beta }$ is the ${\displaystyle k\times 1}$ vector of parameters.
• ${\displaystyle \alpha _{i}}$ is the unobserved time-invariant individual effect, for example the innate ability of individuals or historical and institutional factors for countries.
• ${\displaystyle u_{it}}$ is the error term.

Unlike ${\displaystyle X_{it}}$, ${\displaystyle \alpha _{i}}$ cannot be directly observed.

Unlike the random effects model, where the unobserved ${\displaystyle \alpha _{i}}$ is independent of ${\displaystyle X_{it}}$ for all ${\displaystyle t=1,\dots ,T}$, the fixed effects (FE) model allows ${\displaystyle \alpha _{i}}$ to be correlated with the regressors ${\displaystyle X_{it}}$. Strict exogeneity with respect to the idiosyncratic error term ${\displaystyle u_{it}}$, however, is still required.

Since ${\displaystyle \alpha _{i}}$ is not observable, it cannot be directly controlled for. The FE model eliminates ${\displaystyle \alpha _{i}}$ by demeaning the variables using the within transformation:

${\displaystyle y_{it}-{\overline {y_{i}}}=\left(X_{it}-{\overline {X_{i}}}\right)\beta +\left(\alpha _{i}-{\overline {\alpha _{i}}}\right)+\left(u_{it}-{\overline {u_{i}}}\right)\implies {\ddot {y_{it}}}={\ddot {X_{it}}}\beta +{\ddot {u_{it}}}}$

where ${\displaystyle {\overline {y_{i}}}={\frac {1}{T}}\sum \limits _{t=1}^{T}y_{it}}$, ${\displaystyle {\overline {X_{i}}}={\frac {1}{T}}\sum \limits _{t=1}^{T}X_{it}}$, and ${\displaystyle {\overline {u_{i}}}={\frac {1}{T}}\sum \limits _{t=1}^{T}u_{it}}$.

Since ${\displaystyle \alpha _{i}}$ is constant, ${\displaystyle {\overline {\alpha _{i}}}=\alpha _{i}}$ and hence the effect is eliminated. The FE estimator ${\displaystyle {\hat {\beta }}_{FE}}$ is then obtained by an OLS regression of ${\displaystyle {\ddot {y}}}$ on ${\displaystyle {\ddot {X}}}$.
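
As a rough illustration, the within transformation can be sketched in Python with NumPy. The function name `within_estimator`, the integer individual labels `ids` (assumed to run from 0 to N-1), and the simulated data are assumptions made for this example. The code subtracts each individual's mean from y and X and then runs OLS on the demeaned data, so it is a sketch rather than a reference implementation.

```python
import numpy as np


def within_estimator(y, X, ids):
    """Fixed effects (within) estimate of beta.

    y: (n,) outcomes; X: (n, k) time-variant regressors;
    ids: (n,) integer individual labels 0..N-1 (assumed contiguous).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    ids = np.asarray(ids)
    n_groups = ids.max() + 1
    counts = np.bincount(ids, minlength=n_groups)
    # Individual means y_bar_i and X_bar_i
    y_bar = np.bincount(ids, weights=y, minlength=n_groups) / counts
    X_bar = np.column_stack([
        np.bincount(ids, weights=X[:, j], minlength=n_groups) / counts
        for j in range(X.shape[1])
    ])
    # Within transformation: alpha_i cancels because it equals its own mean
    y_dd = y - y_bar[ids]
    X_dd = X - X_bar[ids]
    # OLS of the demeaned y on the demeaned X (no intercept needed)
    beta, *_ = np.linalg.lstsq(X_dd, y_dd, rcond=None)
    return beta


rng = np.random.default_rng(0)
N, T = 50, 4
ids = np.repeat(np.arange(N), T)
alpha = rng.normal(size=N)
# Regressors correlated with alpha_i, which would bias pooled OLS
X = rng.normal(size=(N * T, 2)) + alpha[ids, None]
beta_true = np.array([1.5, -2.0])
y = X @ beta_true + alpha[ids] + 0.1 * rng.normal(size=N * T)
print(within_estimator(y, X, ids))  # close to [1.5, -2.0]
```

Because the individual effects are correlated with X here, pooled OLS would be biased, while the within estimate stays close to the true coefficients.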

At least three alternatives to the within transformation exist, each with variations.

One is to add a dummy variable for each individual ${\displaystyle i>1}$ (omitting the first individual to avoid perfect multicollinearity with the intercept). This is numerically, but not computationally, equivalent to the within transformation, and it only works if the sum of the number of series and the number of global parameters is smaller than the number of observations.[9] The dummy variable approach is particularly demanding of computer memory, and it is not recommended for problems larger than the available RAM and the compiled program can accommodate.
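
A minimal sketch of the dummy variable (least squares dummy variable, LSDV) approach follows, assuming an intercept plus one dummy for every individual except the first. The helper names `lsdv` and `within` and the simulated data are hypothetical. The dense dummy matrix has one column per individual, which illustrates why memory use grows with the number of series.

```python
import numpy as np


def lsdv(y, X, ids):
    """LSDV fit: intercept + a dummy for every individual except the first."""
    n_groups = ids.max() + 1
    # One-hot dummies; drop column 0 to avoid collinearity with the intercept
    D = np.eye(n_groups)[ids][:, 1:]
    A = np.column_stack([np.ones(len(y)), X, D])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:1 + X.shape[1]]  # slope coefficients beta only


def within(y, X, ids):
    """Within estimator (demeaning by individual), for comparison."""
    counts = np.bincount(ids)
    y_dd = y - (np.bincount(ids, weights=y) / counts)[ids]
    sums = np.column_stack(
        [np.bincount(ids, weights=X[:, j]) for j in range(X.shape[1])]
    )
    X_dd = X - (sums / counts[:, None])[ids]
    return np.linalg.lstsq(X_dd, y_dd, rcond=None)[0]


rng = np.random.default_rng(0)
N, T = 20, 3
ids = np.repeat(np.arange(N), T)
alpha = rng.normal(size=N)
X = rng.normal(size=(N * T, 2)) + alpha[ids, None]
y = X @ np.array([2.0, 0.5]) + alpha[ids] + rng.normal(size=N * T)
print(lsdv(y, X, ids), within(y, X, ids))  # identical up to rounding
```

The two slope estimates coincide (a consequence of the Frisch–Waugh–Lovell theorem), while the LSDV design matrix needs N*T x (N + k) entries instead of N*T x k.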

The second alternative is to alternate repeatedly between local (series-specific) and global parameter estimation.[10] This approach is well suited to low-memory systems, on which it is much more computationally efficient than the dummy variable approach.
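
For the linear model, one simple way to alternate local and global estimation is a backfitting loop. Given β, each α_i is the mean residual of series i. Given the α_i, β is an OLS fit of y_it − α_i on X_it. This is a hypothetical illustration under a linear model, not necessarily the procedure used in the cited work. The name `alternating_fe` and the tolerance settings are assumptions. No dummy matrix is ever stored, and since every step lowers the same sum of squares, the loop converges to the within estimate.

```python
import numpy as np


def alternating_fe(y, X, ids, tol=1e-12, max_iter=10_000):
    """Alternate local (alpha_i) and global (beta) least squares steps."""
    n_groups = ids.max() + 1
    counts = np.bincount(ids, minlength=n_groups)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        # Local step: given beta, alpha_i is the mean residual of series i
        alpha = np.bincount(ids, weights=y - X @ beta, minlength=n_groups) / counts
        # Global step: given alpha, OLS of (y - alpha_i) on X
        beta_new = np.linalg.lstsq(X, y - alpha[ids], rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, alpha
        beta = beta_new
    return beta, alpha


rng = np.random.default_rng(1)
N, T = 30, 5
ids = np.repeat(np.arange(N), T)
a = rng.normal(size=N)
X = rng.normal(size=(N * T, 2)) + a[ids, None]
y = X @ np.array([0.7, 1.3]) + a[ids] + 0.1 * rng.normal(size=N * T)

beta_iter, alpha_iter = alternating_fe(y, X, ids)

# Reference: within estimator computed from demeaned data
counts = np.bincount(ids)
X_means = np.stack([np.bincount(ids, weights=X[:, j]) for j in range(2)], axis=1)
Xd = X - (X_means / counts[:, None])[ids]
yd = y - (np.bincount(ids, weights=y) / counts)[ids]
beta_within = np.linalg.lstsq(Xd, yd, rcond=None)[0]
print(beta_iter, beta_within)
```

Convergence is slower when X is strongly correlated with the individual effects, because each step then removes less of the remaining error.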

The third approach is nested estimation, in which the local estimation for individual series is programmed as part of the model definition.[11] This approach is the most computationally and memory efficient, but it requires proficient programming skills and access to the model's programming code, although it can be implemented even in SAS.[12][13]

Finally, each of the above alternatives can be improved if the series-specific estimation is linear (within a nonlinear model). In that case, the direct linear solution for individual series can be programmed as part of the nonlinear model definition.[14]
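
To illustrate nested estimation with linear series-specific parameters, consider a hypothetical nonlinear model ${\displaystyle y_{it}=\alpha _{i}+\exp(\theta x_{it})+u_{it}}$. For any trial value of θ, each α_i has a closed-form least squares solution: the mean of y_it − exp(θ x_it) over t. Only the global parameter θ then has to be searched numerically. The model, the function names, and the golden-section search are all assumptions of this sketch.

```python
import numpy as np


def profiled_ssr(theta, y, x, ids):
    """Sum of squared residuals with each alpha_i solved in closed form."""
    n_groups = ids.max() + 1
    counts = np.bincount(ids, minlength=n_groups)
    r = y - np.exp(theta * x)  # residual before the series-specific intercepts
    alpha = np.bincount(ids, weights=r, minlength=n_groups) / counts
    return np.sum((r - alpha[ids]) ** 2)


def golden_section(f, lo, hi, tol=1e-8):
    """Minimize a unimodal scalar function f on [lo, hi]."""
    g = (np.sqrt(5) - 1) / 2
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if f(c) < f(d):
            hi, d = d, c
            c = hi - g * (hi - lo)
        else:
            lo, c = c, d
            d = lo + g * (hi - lo)
    return (lo + hi) / 2


rng = np.random.default_rng(2)
N, T = 40, 5
ids = np.repeat(np.arange(N), T)
x = rng.uniform(0, 2, size=N * T)
y = rng.normal(size=N)[ids] + np.exp(0.5 * x) + 0.01 * rng.normal(size=N * T)
theta_hat = golden_section(lambda th: profiled_ssr(th, y, x, ids), 0.0, 1.5)
print(theta_hat)  # close to 0.5
```

The search runs over one dimension regardless of how many series there are, because the N local parameters are eliminated analytically inside the objective.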

### Generalization with input uncertainty

When the ${\displaystyle y}$ data have input uncertainty ${\displaystyle \delta y}$, the ${\displaystyle \chi ^{2}}$ value, rather than the sum of squared residuals, should be minimized.[15] This can be achieved directly with the substitution rules:

${\displaystyle {\frac {y_{it}}{\delta y_{it}}}={\frac {X_{it}}{\delta y_{it}}}\mathbf {\beta } +\alpha _{i}{\frac {1}{\delta y_{it}}}+{\frac {u_{it}}{\delta y_{it}}}}$,

after which the values and standard deviations of ${\displaystyle \mathbf {\beta } }$ and ${\displaystyle \alpha _{i}}$ can be determined from a classical ordinary least squares analysis and its variance-covariance matrix.
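
The following is a sketch of the χ² generalization. It assumes that `dy` holds known per-observation standard deviations and that the α_i are estimated through one dummy column per individual. Every row, including the dummy column (which becomes 1/δy_it), is divided by δy_it before ordinary least squares. The name `chi2_fixed_effects` and the simulated data are hypothetical.

```python
import numpy as np


def chi2_fixed_effects(y, X, ids, dy):
    """Minimize chi^2 = sum(((y - X beta - alpha_i) / dy)^2) via scaled OLS."""
    n_groups = ids.max() + 1
    D = np.eye(n_groups)[ids]                  # one dummy per individual, no intercept
    A = np.column_stack([X, D]) / dy[:, None]  # columns X_it/dy_it and 1/dy_it
    b = y / dy                                 # y_it/dy_it
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Treating dy as known standard deviations, Cov(coef) = (A'A)^{-1}
    cov = np.linalg.inv(A.T @ A)
    se = np.sqrt(np.diag(cov))
    k = X.shape[1]
    return coef[:k], coef[k:], se


rng = np.random.default_rng(6)
N, T = 10, 6
ids = np.repeat(np.arange(N), T)
X = rng.normal(size=(N * T, 2))
dy = rng.uniform(0.5, 2.0, size=N * T)  # per-observation uncertainties
y = X @ np.array([1.0, -1.0]) + rng.normal(size=N)[ids] + dy * rng.normal(size=N * T)
beta_hat, alpha_hat, se = chi2_fixed_effects(y, X, ids, dy)

# Cross-check against the weighted normal equations (Z' W Z) c = Z' W y
Z = np.column_stack([X, np.eye(N)[ids]])
W = np.diag(1.0 / dy**2)
coef_ref = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ y)
```

If δy is only known up to a common scale factor, the covariance would usually be rescaled by the reduced χ². That refinement is left out here.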

## Equality of Fixed Effects (FE) and First Differences (FD) estimators when T=2

For the special two-period case (${\displaystyle T=2}$), the FE estimator and the FD estimator are numerically equivalent. This is because the FE estimator effectively "doubles the data set" used in the FD estimator. To see this, note that the fixed effects estimator is:

${\displaystyle {FE}_{T=2}=\left[\sum _{i=1}^{N}(x_{i1}-{\bar {x}}_{i})(x_{i1}-{\bar {x}}_{i})'+(x_{i2}-{\bar {x}}_{i})(x_{i2}-{\bar {x}}_{i})'\right]^{-1}\left[\sum _{i=1}^{N}(x_{i1}-{\bar {x}}_{i})(y_{i1}-{\bar {y}}_{i})+(x_{i2}-{\bar {x}}_{i})(y_{i2}-{\bar {y}}_{i})\right]}$

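Since ${\displaystyle {\bar {x}}_{i}={\tfrac {x_{i1}+x_{i2}}{2}}}$, the deviations simplify to ${\displaystyle x_{i1}-{\bar {x}}_{i}={\dfrac {x_{i1}-x_{i2}}{2}}}$ and ${\displaystyle x_{i2}-{\bar {x}}_{i}={\dfrac {x_{i2}-x_{i1}}{2}}}$ (and likewise for ${\displaystyle y}$), so the estimator can be rewritten as: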

${\displaystyle {FE}_{T=2}=\left[\sum _{i=1}^{N}{\dfrac {x_{i1}-x_{i2}}{2}}{\dfrac {x_{i1}-x_{i2}}{2}}'+{\dfrac {x_{i2}-x_{i1}}{2}}{\dfrac {x_{i2}-x_{i1}}{2}}'\right]^{-1}\left[\sum _{i=1}^{N}{\dfrac {x_{i1}-x_{i2}}{2}}{\dfrac {y_{i1}-y_{i2}}{2}}+{\dfrac {x_{i2}-x_{i1}}{2}}{\dfrac {y_{i2}-y_{i1}}{2}}\right]}$

${\displaystyle =\left[\sum _{i=1}^{N}2{\dfrac {x_{i2}-x_{i1}}{2}}{\dfrac {x_{i2}-x_{i1}}{2}}'\right]^{-1}\left[\sum _{i=1}^{N}2{\dfrac {x_{i2}-x_{i1}}{2}}{\dfrac {y_{i2}-y_{i1}}{2}}\right]}$
${\displaystyle =2\left[\sum _{i=1}^{N}(x_{i2}-x_{i1})(x_{i2}-x_{i1})'\right]^{-1}\left[\sum _{i=1}^{N}{\frac {1}{2}}(x_{i2}-x_{i1})(y_{i2}-y_{i1})\right]}$
${\displaystyle =\left[\sum _{i=1}^{N}(x_{i2}-x_{i1})(x_{i2}-x_{i1})'\right]^{-1}\sum _{i=1}^{N}(x_{i2}-x_{i1})(y_{i2}-y_{i1})={FD}_{T=2}}$
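
The equivalence can also be checked numerically. This Python sketch uses purely illustrative simulated data. It computes the FE estimator by OLS on the stacked, demeaned two-period data and the FD estimator by OLS without intercept on the differences, and the two agree to rounding error.

```python
import numpy as np

rng = np.random.default_rng(3)
N, k = 100, 2
alpha = rng.normal(size=N)
x1 = rng.normal(size=(N, k)) + alpha[:, None]  # period-1 regressors
x2 = rng.normal(size=(N, k)) + alpha[:, None]  # period-2 regressors
beta = np.array([1.0, -0.5])
y1 = x1 @ beta + alpha + rng.normal(size=N)
y2 = x2 @ beta + alpha + rng.normal(size=N)

# FE: demean within each individual, stack both periods, run OLS
xbar, ybar = (x1 + x2) / 2, (y1 + y2) / 2
Xd = np.vstack([x1 - xbar, x2 - xbar])
yd = np.concatenate([y1 - ybar, y2 - ybar])
beta_fe = np.linalg.lstsq(Xd, yd, rcond=None)[0]

# FD: OLS of (y2 - y1) on (x2 - x1), no intercept
beta_fd = np.linalg.lstsq(x2 - x1, y2 - y1, rcond=None)[0]
print(beta_fe, beta_fd)  # equal up to floating-point rounding
```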

## Hausman–Taylor method

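The Hausman–Taylor method extends the model with time-invariant regressors ${\displaystyle Z_{i}}$: ${\displaystyle y_{it}=X_{it}\beta +Z_{i}\gamma +\alpha _{i}+u_{it}}$. It requires more than one time-variant regressor (${\displaystyle X}$) and time-invariant regressor (${\displaystyle Z}$), with at least one ${\displaystyle X}$ and one ${\displaystyle Z}$ uncorrelated with ${\displaystyle \alpha _{i}}$.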

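Partition the ${\displaystyle X}$ and ${\displaystyle Z}$ variables as ${\displaystyle {\begin{array}{c}X=[{\underset {TN\times K_{1}}{X_{1it}}}\;\vdots \;{\underset {TN\times K_{2}}{X_{2it}}}]\\Z=[{\underset {TN\times G_{1}}{Z_{1i}}}\;\vdots \;{\underset {TN\times G_{2}}{Z_{2i}}}]\end{array}}}$ where ${\displaystyle X_{1}}$ and ${\displaystyle Z_{1}}$ are uncorrelated with ${\displaystyle \alpha _{i}}$, while ${\displaystyle X_{2}}$ and ${\displaystyle Z_{2}}$ may be correlated with it. Identification requires ${\displaystyle K_{1}\geq G_{2}}$, that is, at least as many exogenous time-variant regressors as endogenous time-invariant ones.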

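Let ${\displaystyle {\widehat {d}}_{i}={\overline {y_{i}}}-{\overline {X_{i}}}{\widehat {\beta }}_{FE}}$ be the individual means of the residuals ${\displaystyle y_{it}-X_{it}{\widehat {\beta }}_{FE}}$. Estimating ${\displaystyle \gamma }$ by two-stage least squares on ${\displaystyle {\widehat {d}}_{i}=Z_{i}\gamma +\varphi _{it}}$, using ${\displaystyle X_{1}}$ and ${\displaystyle Z_{1}}$ as instruments, yields a consistent estimate.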
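
The γ step can be sketched on simulated data as follows. Several things here are assumptions: the data-generating process, the use of the individual means of X_1 (together with a constant and Z_1) as the instruments, and the manual two-stage least squares. This covers only the first stage of Hausman–Taylor. The full method continues with a GLS step, and the naive second-stage standard errors would not be valid.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 5000, 4
ids = np.repeat(np.arange(N), T)
alpha = rng.normal(size=N)                # unobserved effect
m = rng.normal(size=N)                    # individual-level driver of X1
X1 = m[ids] + rng.normal(size=N * T)      # time-variant, uncorrelated with alpha
X2 = alpha[ids] + rng.normal(size=N * T)  # time-variant, correlated with alpha
Z1 = rng.normal(size=N)                   # time-invariant, uncorrelated with alpha
Z2 = m + 0.5 * alpha + rng.normal(size=N) # time-invariant, correlated with alpha
X = np.column_stack([X1, X2])
Z = np.column_stack([np.ones(N), Z1, Z2]) # constant treated as exogenous
beta, gamma = np.array([1.0, 2.0]), np.array([0.0, 0.5, -1.0])
y = X @ beta + (Z @ gamma)[ids] + alpha[ids] + rng.normal(size=N * T)


def group_mean(v):
    """Mean over t for each individual (balanced panel)."""
    return np.bincount(ids, weights=v, minlength=N) / T


# Step 1: within (FE) estimate of beta; Z_i gamma and alpha_i drop out
Xbar = np.column_stack([group_mean(X[:, j]) for j in range(X.shape[1])])
ybar = group_mean(y)
beta_fe = np.linalg.lstsq(X - Xbar[ids], y - ybar[ids], rcond=None)[0]

# Step 2: d_hat_i = ybar_i - Xbar_i beta_FE  (= Z_i gamma + alpha_i + noise)
d_hat = ybar - Xbar @ beta_fe

# Step 3: 2SLS of d_hat on Z with instruments [1, Z1, mean of X1]
H = np.column_stack([np.ones(N), Z1, Xbar[:, 0]])
Z_fit = H @ np.linalg.lstsq(H, Z, rcond=None)[0]         # first stage
gamma_iv = np.linalg.lstsq(Z_fit, d_hat, rcond=None)[0]  # second stage
print(beta_fe, gamma_iv)
```

A plain OLS regression of `d_hat` on `Z` would be biased for the coefficient on `Z2`, because `Z2` is correlated with `alpha`. The instrument `Xbar[:, 0]` is relevant through the shared driver `m`.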

## Testing fixed effects (FE) vs. random effects (RE)

We can test whether a fixed or random effects model is appropriate using a Durbin–Wu–Hausman test.

${\displaystyle H_{0}}$: ${\displaystyle \alpha _{i}\perp X_{it},Z_{i}}$
${\displaystyle H_{a}}$: ${\displaystyle \alpha _{i}\not \perp X_{it},Z_{i}}$

If ${\displaystyle H_{0}}$ is true, both ${\displaystyle {\widehat {\beta }}_{RE}}$ and ${\displaystyle {\widehat {\beta }}_{FE}}$ are consistent, but only ${\displaystyle {\widehat {\beta }}_{RE}}$ is efficient. If ${\displaystyle H_{a}}$ is true, ${\displaystyle {\widehat {\beta }}_{FE}}$ is consistent and ${\displaystyle {\widehat {\beta }}_{RE}}$ is not.

${\displaystyle {\widehat {Q}}={\widehat {\beta }}_{RE}-{\widehat {\beta }}_{FE}}$

${\displaystyle {\widehat {HT}}={\widehat {Q}}^{\prime }[\operatorname {Var} ({\widehat {\beta }}_{FE})-\operatorname {Var} ({\widehat {\beta }}_{RE})]^{-1}{\widehat {Q}}\sim \chi _{K}^{2}}$ asymptotically, where ${\displaystyle \operatorname {Var} (\cdot )}$ is the estimated covariance matrix of each estimator and ${\displaystyle K=\dim(Q)}$.
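
A small sketch of the statistic, assuming the inputs are the two coefficient vectors and their estimated covariance matrices, so no additional scaling factor is applied. The function name `hausman` is hypothetical. The Moore-Penrose inverse is used because, in finite samples, the difference of the two covariance matrices need not be positive definite.

```python
import numpy as np


def hausman(b_fe, b_re, v_fe, v_re):
    """Hausman statistic q' (V_FE - V_RE)^+ q with q = b_RE - b_FE."""
    q = np.asarray(b_re, dtype=float) - np.asarray(b_fe, dtype=float)
    v_diff = np.asarray(v_fe, dtype=float) - np.asarray(v_re, dtype=float)
    # pinv guards against a singular or indefinite difference matrix
    stat = float(q @ np.linalg.pinv(v_diff) @ q)
    return stat, q.size  # compare stat with a chi-squared(q.size) critical value


stat, df = hausman([1.0], [1.2], [[0.04]], [[0.02]])
print(stat, df)  # approximately 2.0 with 1 degree of freedom
```

A p-value could then be obtained from the χ² survival function, for example `scipy.stats.chi2.sf(stat, df)` if SciPy is available.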

The Hausman test is a specification test, so a large test statistic may indicate errors-in-variables (EIV) or a misspecified model. If the FE assumption is true, the long-difference (LD), first-difference (FD), and fixed effects estimators should roughly agree: ${\displaystyle {\widehat {\beta }}_{LD}\approx {\widehat {\beta }}_{FD}\approx {\widehat {\beta }}_{FE}}$.

A simple heuristic is that if ${\displaystyle \left\vert {\widehat {\beta }}_{LD}\right\vert >\left\vert {\widehat {\beta }}_{FE}\right\vert >\left\vert {\widehat {\beta }}_{FD}\right\vert }$ there could be EIV.
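
The heuristic can be illustrated with a simulation. It assumes a random-walk true regressor (so the signal is persistent) plus classical measurement error of unit variance. Under these assumptions, differencing over a single period removes much more signal than noise. A back-of-envelope calculation gives probability limits of roughly 0.67β (LD), 0.5β (FE), and 0.33β (FD) for T = 5. The setup and the helper `slope` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, beta = 20000, 5, 1.0
# True regressor: a random walk for each individual (persistent signal)
x_true = np.cumsum(rng.normal(size=(N, T)), axis=1)
alpha = rng.normal(size=N)
y = beta * x_true + alpha[:, None] + 0.1 * rng.normal(size=(N, T))
x_obs = x_true + rng.normal(size=(N, T))  # classical measurement error


def slope(dx, dy):
    """OLS slope through the origin."""
    return np.sum(dx * dy) / np.sum(dx * dx)


b_fd = slope(np.diff(x_obs, axis=1), np.diff(y, axis=1))      # first differences
b_ld = slope(x_obs[:, -1] - x_obs[:, 0], y[:, -1] - y[:, 0])  # long difference
b_fe = slope(x_obs - x_obs.mean(axis=1, keepdims=True),
             y - y.mean(axis=1, keepdims=True))                # within
print(b_ld, b_fe, b_fd)  # ordered LD > FE > FD, all attenuated below beta
```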

## Steps in Fixed Effects Model for sample data

1. Calculate the group means and the grand mean.
2. Let k = number of groups, n = number of observations per group, and N = total number of observations (k × n).
3. Calculate SS-total (the total sum of squares): square each score's deviation from the grand mean, then sum.
4. Calculate SS-treat (the treatment sum of squares): square each group mean's deviation from the grand mean, sum, and multiply by n.
5. Calculate SS-error (the error sum of squares): square each score's deviation from its group mean, then sum.
6. Calculate the degrees of freedom: df-total = N - 1, df-treat = k - 1, and df-error = k(n - 1).
7. Calculate the mean squares: MS-treat = SS-treat/df-treat and MS-error = SS-error/df-error.
8. Calculate the obtained F value: MS-treat/MS-error.
9. Use an F table or the F distribution function to look up the critical F value for df-treat and df-error degrees of freedom at the chosen significance level.
10. Conclude whether the treatment effect significantly affects the variable of interest: it does if the obtained F value exceeds the critical value.
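
Steps 1 to 8 can be sketched for a balanced design, with the data stored as a k × n array. The function name `one_way_anova` and the example data are hypothetical. The critical value in step 9 is left to a table or library, for example `scipy.stats.f.ppf(0.95, df_treat, df_error)` if SciPy is available.

```python
import numpy as np


def one_way_anova(groups):
    """Balanced one-way fixed effects ANOVA: sums of squares, df and F."""
    data = np.asarray(groups, dtype=float)  # shape (k groups, n per group)
    k, n = data.shape
    N = k * n
    group_means = data.mean(axis=1)          # step 1
    grand_mean = data.mean()
    ss_total = np.sum((data - grand_mean) ** 2)                 # step 3
    ss_treat = n * np.sum((group_means - grand_mean) ** 2)      # step 4
    ss_error = np.sum((data - group_means[:, None]) ** 2)       # step 5
    df_treat, df_error = k - 1, k * (n - 1)                     # step 6
    ms_treat, ms_error = ss_treat / df_treat, ss_error / df_error  # step 7
    return {"ss_total": ss_total, "ss_treat": ss_treat, "ss_error": ss_error,
            "df_total": N - 1, "df_treat": df_treat, "df_error": df_error,
            "F": ms_treat / ms_error}                           # step 8


result = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result["F"])  # 27.0
```

Here SS-treat = 54 and SS-error = 6, so MS-treat = 27, MS-error = 1, and F = 27. The identity SS-total = SS-treat + SS-error provides a quick sanity check.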

## Notes

1. ^ Greene, W. H. (2011). Econometric Analysis (7th ed.). Prentice Hall.
2. ^ Diggle, Peter J.; Heagerty, Patrick; Liang, Kung-Yee; Zeger, Scott L. (2002). Analysis of Longitudinal Data (2nd ed.). Oxford University Press. pp. 169–171. ISBN 0-19-852484-6.
3. ^ Fitzmaurice, Garrett M.; Laird, Nan M.; Ware, James H. (2004). Applied Longitudinal Analysis. Hoboken: John Wiley & Sons. pp. 326–328. ISBN 0-471-21487-6.
4. ^ Laird, Nan M.; Ware, James H. (1982). "Random-Effects Models for Longitudinal Data". Biometrics. 38 (4): 963–974. JSTOR 2529876.
5. ^ Gardiner, Joseph C.; Luo, Zhehui; Roman, Lee Anne (2009). "Fixed effects, random effects and GEE: What are the differences?". Statistics in Medicine. 28: 221–239. doi:10.1002/sim.3478.
6. ^ Ramsey, F.; Schafer, D. (2002). The Statistical Sleuth: A Course in Methods of Data Analysis (2nd ed.). Duxbury Press.
7. ^ Cameron, A. Colin; Trivedi, Pravin K. (2005). Microeconometrics: Methods and Applications. Cambridge University Press. pp. 717–19.
8. ^ Nerlove, Marc (2005). Essays in Panel Data Econometrics. Cambridge University Press. pp. 36–39.
9. ^ Garcia, Oscar. (1983). "A stochastic differential equation model for the height growth of forest stands". Biometrics: 1059–1072.
10. ^ Tait, David; Cieszewski, Chris J.; Bella, Imre E. (1986). "The stand dynamics of lodgepole pine". Can. J. For. Res. 18: 1255–1260.
11. ^ Strub, Mike; Cieszewski, Chris J. (2006). "Base–age invariance properties of two techniques for estimating the parameters of site index models". Forest Science. 52 (2): 182–186.
12. ^ Strub, Mike; Cieszewski, Chris J. (2003). "Fitting global site index parameters when plot or tree site index is treated as a local nuisance parameter". In: Burkhart, H. A. (ed.). Proceedings of the Symposium on Statistics and Information Technology in Forestry; 2002 September 8–12; Blacksburg, Virginia. Virginia Polytechnic Institute and State University. pp. 97–107.
13. ^ Cieszewski, Chris J.; Harrison, Mike; Martin, Stacey W. (2000). "Practical methods for estimating non-biased parameters in self-referencing growth and yield models" (PDF). PMRC Technical Report. 2000 (7): 12.
14. ^ Schnute, Jon; McKinnell, Skip (1984). "A biologically meaningful approach to response surface analysis". Can. J. Fish. Aquat. Sci. 41: 936–953.
15. ^ Ren, Bin; Dong, Ruobing; Esposito, Thomas M.; Pueyo, Laurent; Debes, John H.; Poteet, Charles A.; Choquet, Élodie; Benisty, Myriam; Chiang, Eugene; Grady, Carol A.; Hines, Dean C.; Schneider, Glenn; Soummer, Rémi (2018). "A Decade of MWC 758 Disk Images: Where Are the Spiral-Arm-Driving Planets?". The Astrophysical Journal Letters. 857: L9. arXiv:1803.06776. Bibcode:2018ApJ...857L...9R. doi:10.3847/2041-8213/aab7f5.

## References

• Christensen, Ronald (2002). Plane Answers to Complex Questions: The Theory of Linear Models (Third ed.). New York: Springer. ISBN 0-387-95361-2.
• Gujarati, Damodar N.; Porter, Dawn C. (2009). "Panel Data Regression Models". Basic Econometrics (Fifth international ed.). Boston: McGraw-Hill. pp. 591–616. ISBN 978-007-127625-2.
• Hsiao, Cheng (2003). "Fixed-effects models". Analysis of Panel Data (2nd ed.). New York: Cambridge University Press. pp. 95–103. ISBN 0-521-52271-4.
• Wooldridge, Jeffrey M. (2013). "Fixed Effects Estimation". Introductory Econometrics: A Modern Approach (Fifth international ed.). Mason, OH: South-Western. pp. 466–474. ISBN 978-1-111-53439-4.