# Blinder–Oaxaca decomposition

The Blinder–Oaxaca decomposition is a statistical method that explains the difference in the means of a dependent variable between two groups by decomposing the gap into two parts: one due to differences between the groups in the mean values of the independent variables, and one due to group differences in the effects of those independent variables. The method was introduced by sociologist and demographer Evelyn M. Kitagawa in 1955.[1] Ronald Oaxaca introduced the method to economics in his doctoral thesis at Princeton University and eventually published it in 1973.[2] The decomposition technique also carries the name of Alan Blinder, who proposed a similar approach in the same year.[3] Oaxaca's original research question was the wage differential between two different groups of workers, but his method has since been applied to numerous other topics.[4]

## Method

The following three equations illustrate this decomposition. Estimate separate linear wage regressions for individuals i in groups A and B:

$$
\begin{aligned}
(1)\qquad \ln(\text{wages}_{A_i}) &= X_{A_i}\beta_A + \mu_{A_i} \\
(2)\qquad \ln(\text{wages}_{B_i}) &= X_{B_i}\beta_B + \mu_{B_i}
\end{aligned}
$$

where $X$ is a vector of explanatory variables such as education, experience, industry, and occupation, $\beta_A$ and $\beta_B$ are vectors of coefficients, and $\mu$ is an error term.

Let $b_A$ and $b_B$ be the regression estimates of $\beta_A$ and $\beta_B$, respectively. Then, since the residuals of a linear regression that includes an intercept average to zero, we have:

$$
\begin{aligned}
(3)\qquad &\operatorname{mean}(\ln(\text{wages}_A)) - \operatorname{mean}(\ln(\text{wages}_B)) \\[4pt]
={}& \operatorname{mean}(X_A)\,b_A - \operatorname{mean}(X_B)\,b_B \\[4pt]
={}& (\operatorname{mean}(X_A) - \operatorname{mean}(X_B))\,b_A + \operatorname{mean}(X_B)\,(b_A - b_B)
\end{aligned}
$$

The first part of the last line of (3) is the impact of between-group differences in the explanatory variables X, evaluated using the coefficients for group A. The second part is the differential not explained by these differences in observed characteristics X.
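
The decomposition in (3) can be sketched numerically. The example below is a minimal illustration on synthetic data, not a reference implementation: the function name `oaxaca_twofold`, the two regressors (schooling and experience), and all coefficient values are assumptions made up for the demonstration. It fits separate ordinary least squares regressions with an intercept for each group and then splits the mean gap into the two parts of the last line of (3), using group A's coefficients as the reference.

```python
import numpy as np


def oaxaca_twofold(X_a, y_a, X_b, y_b):
    """Split mean(y_a) - mean(y_b) into the two parts of equation (3),
    using group A's coefficients as the reference."""
    # Prepend an intercept column so that OLS residuals average to zero.
    Xa = np.column_stack([np.ones(len(y_a)), X_a])
    Xb = np.column_stack([np.ones(len(y_b)), X_b])

    # Separate OLS fits for each group (equations (1) and (2)).
    b_a, *_ = np.linalg.lstsq(Xa, y_a, rcond=None)
    b_b, *_ = np.linalg.lstsq(Xb, y_b, rcond=None)

    mean_a, mean_b = Xa.mean(axis=0), Xb.mean(axis=0)
    explained = (mean_a - mean_b) @ b_a    # differences in characteristics
    unexplained = mean_b @ (b_a - b_b)     # differences in coefficients
    gap = y_a.mean() - y_b.mean()
    return gap, explained, unexplained


# Synthetic data (hypothetical values): columns are schooling and experience.
rng = np.random.default_rng(0)
n = 500
X_a = np.column_stack([rng.normal(13, 2, n), rng.normal(12, 4, n)])
X_b = np.column_stack([rng.normal(12, 2, n), rng.normal(10, 4, n)])
y_a = 1.0 + 0.08 * X_a[:, 0] + 0.02 * X_a[:, 1] + rng.normal(0, 0.3, n)
y_b = 0.9 + 0.06 * X_b[:, 0] + 0.02 * X_b[:, 1] + rng.normal(0, 0.3, n)

gap, explained, unexplained = oaxaca_twofold(X_a, y_a, X_b, y_b)
print(f"gap={gap:.4f} explained={explained:.4f} unexplained={unexplained:.4f}")
```

Because both regressions include an intercept, the two components add up to the raw gap in mean log wages up to floating-point error. Using group B's coefficients as the reference instead would generally give a different split of the same gap.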

## Interpretation

The unexplained differential in wages for the same values of explanatory variables should not be interpreted as the amount of the difference in wages due only to discrimination. This is because other explanatory variables not included in the regression (e.g. because they are unobserved) may also account for wage differences. For example, David Card and Alan Krueger found in a paper entitled "School Quality and Black-White Relative Earnings: A Direct Assessment"[5] that improvements in the quality of schools for Black men born in the Southern states of the United States between 1915 and 1966 increased the return to education for these men, leading to a narrowing of the Black-White earnings gap. In terms of wage regressions, the poor quality of schools for Black men had meant a lower value of the β coefficient on years of schooling for Black men than for White men. Thus, some of this lower β coefficient reflected a difference in the quality of education for Black workers, which could otherwise have been interpreted as an effect of discrimination.
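
The omitted-variable point can be illustrated with a small simulation. This is a hedged sketch on invented data and does not replicate Card and Krueger's analysis: the variable `quality`, the helper `decompose`, and every numeric value are hypothetical. Both groups are given an identical wage equation, so there is no difference in coefficients by construction, yet leaving the group-varying characteristic out of the regression moves its contribution into the unexplained component.

```python
import numpy as np


def decompose(X_a, y_a, X_b, y_b):
    """Return (gap, explained, unexplained) with group A as the reference."""
    Xa = np.column_stack([np.ones(len(y_a)), X_a])  # add intercept
    Xb = np.column_stack([np.ones(len(y_b)), X_b])
    b_a, *_ = np.linalg.lstsq(Xa, y_a, rcond=None)
    b_b, *_ = np.linalg.lstsq(Xb, y_b, rcond=None)
    mean_a, mean_b = Xa.mean(axis=0), Xb.mean(axis=0)
    explained = (mean_a - mean_b) @ b_a
    unexplained = mean_b @ (b_a - b_b)
    return y_a.mean() - y_b.mean(), explained, unexplained


rng = np.random.default_rng(1)
n = 5000

# Hypothetical characteristics: group A has more schooling and a higher
# average value of a second characteristic, "quality".
school_a, school_b = rng.normal(13, 2, n), rng.normal(12, 2, n)
quality_a, quality_b = rng.normal(1.0, 1.0, n), rng.normal(0.0, 1.0, n)


def log_wage(school, quality):
    # The same wage equation applies to both groups (assumed values).
    return 1.0 + 0.08 * school + 0.2 * quality + rng.normal(0, 0.3, len(school))


y_a = log_wage(school_a, quality_a)
y_b = log_wage(school_b, quality_b)

# Full model: both characteristics are observed.
full = decompose(np.column_stack([school_a, quality_a]), y_a,
                 np.column_stack([school_b, quality_b]), y_b)

# Restricted model: "quality" is unobserved and left out.
omitted = decompose(school_a[:, None], y_a, school_b[:, None], y_b)

print("full model:      gap=%.3f explained=%.3f unexplained=%.3f" % full)
print("quality omitted: gap=%.3f explained=%.3f unexplained=%.3f" % omitted)
```

In the full model the unexplained part should be close to zero, while in the restricted model it should be roughly the assumed effect of `quality` times the group difference in its mean, even though the simulated wage-setting rule is the same for both groups.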