Correlation correction for attenuation

Correction for attenuation is a statistical procedure, due to Spearman (1904), to "rid a correlation coefficient from the weakening effect of measurement error" (Jensen, 1998), a phenomenon also known as regression dilution. In measurement and statistics, it is also called disattenuation. The correlation between two sets of parameters or measurements is estimated in a manner that accounts for measurement error contained within the estimates of those parameters.

Background

Correlations between parameters are diluted or weakened by measurement error. Disattenuation provides for a more accurate estimate of the correlation between the parameters by accounting for this effect.

Definition

The disattenuated estimate of the correlation between two sets of parameters or measures, $\beta$ and $\theta$, with estimates ${\hat {\beta }}$ and ${\hat {\theta }}$, is

\rho ={\frac {\operatorname {corr} ({\hat {\beta }},{\hat {\theta }})}{\sqrt {R_{\beta }R_{\theta }}}}.

That is, the disattenuated correlation is obtained by dividing the correlation between the estimates by the geometric mean of the separation indices of the two sets of estimates. Expressed in terms of classical test theory, the observed correlation is divided by the geometric mean of the reliability coefficients of the two tests.

Given two random variables $X$ and $Y$ with observed correlation $r_{xy}$ and known reliabilities $r_{xx}$ and $r_{yy}$, the correlation between $X$ and $Y$ corrected for attenuation is $r_{x'y'}={\frac {r_{xy}}{\sqrt {r_{xx}r_{yy}}}}$.

How well the variables are measured affects the observed correlation of $X$ and $Y$. The correction for attenuation indicates what the correlation would be if $X$ and $Y$ could be measured with perfect reliability.

If $X$ and $Y$ are taken to be imperfect measurements of underlying variables $X'$ and $Y'$ with independent errors, then $r_{x'y'}$ estimates the true correlation between $X'$ and $Y'$.
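The formula for $r_{x'y'}$ can be illustrated with a short Python sketch. The function name `disattenuate`, the range check on the reliabilities, and the numbers in the example are assumptions made for illustration; none of them come from the cited sources.

```python
import math


def disattenuate(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Correct an observed correlation for attenuation.

    r_xy is the observed correlation; r_xx and r_yy are the reliabilities
    of the two measures, each assumed here to lie in (0, 1].
    """
    if not (0.0 < r_xx <= 1.0 and 0.0 < r_yy <= 1.0):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    # Divide by the geometric mean of the two reliabilities.
    return r_xy / math.sqrt(r_xx * r_yy)


# Illustrative numbers only, not taken from any study.
corrected = disattenuate(r_xy=0.40, r_xx=0.80, r_yy=0.80)
print(round(corrected, 3))  # prints 0.5
```

Because reliabilities are themselves estimated in practice, the corrected value can exceed 1 in a given sample; the sketch returns such a value unchanged rather than truncating it.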

Derivation of the formula

Let $\beta$ and $\theta$ be the true values of two attributes of some person or statistical unit. These values are regarded as random variables by virtue of the statistical unit being selected randomly from some population. Let ${\hat {\beta }}$ and ${\hat {\theta }}$ be estimates of $\beta$ and $\theta$ derived either directly by observation-with-error or from application of a measurement model, such as the Rasch model. Also, let

{\hat {\beta }}=\beta +\epsilon _{\beta },\quad \quad {\hat {\theta }}=\theta +\epsilon _{\theta },

where $\epsilon _{\beta }$ and $\epsilon _{\theta }$ are the measurement errors associated with the estimates ${\hat {\beta }}$ and ${\hat {\theta }}$.

The correlation between the two sets of estimates is

\operatorname {corr} ({\hat {\beta }},{\hat {\theta }})={\frac {\operatorname {cov} ({\hat {\beta }},{\hat {\theta }})}{\sqrt {\operatorname {var} [{\hat {\beta }}]\operatorname {var} [{\hat {\theta }}]}}}

={\frac {\operatorname {cov} (\beta +\epsilon _{\beta },\theta +\epsilon _{\theta })}{\sqrt {\operatorname {var} [\beta +\epsilon _{\beta }]\operatorname {var} [\theta +\epsilon _{\theta }]}}},

which, assuming the errors are uncorrelated with each other and with the true attribute values (so that the cross-covariance terms vanish and the variances add), gives

\operatorname {corr} ({\hat {\beta }},{\hat {\theta }})={\frac {\operatorname {cov} (\beta ,\theta )}{\sqrt {(\operatorname {var} [\beta ]+\operatorname {var} [\epsilon _{\beta }])(\operatorname {var} [\theta ]+\operatorname {var} [\epsilon _{\theta }])}}}

={\frac {\operatorname {cov} (\beta ,\theta )}{\sqrt {\operatorname {var} [\beta ]\operatorname {var} [\theta ]}}}\cdot {\frac {\sqrt {\operatorname {var} [\beta ]\operatorname {var} [\theta ]}}{\sqrt {(\operatorname {var} [\beta ]+\operatorname {var} [\epsilon _{\beta }])(\operatorname {var} [\theta ]+\operatorname {var} [\epsilon _{\theta }])}}}

=\rho {\sqrt {R_{\beta }R_{\theta }}},

where $R_{\beta }$ is the separation index of the set of estimates of $\beta$, and $R_{\theta }$ is defined in the same way for $\theta$. The separation index is analogous to Cronbach's alpha; that is, in terms of classical test theory, $R_{\beta }$ is analogous to a reliability coefficient. Specifically, the separation index is given as follows:

R_{\beta }={\frac {\operatorname {var} [\beta ]}{\operatorname {var} [\beta ]+\operatorname {var} [\epsilon _{\beta }]}}={\frac {\operatorname {var} [{\hat {\beta }}]-\operatorname {var} [\epsilon _{\beta }]}{\operatorname {var} [{\hat {\beta }}]}},

where the mean of the squared standard errors of the person estimates gives an estimate of the error variance, $\operatorname {var} [\epsilon _{\beta }]$. The standard errors are normally produced as a by-product of the estimation process (see Rasch model estimation).
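The derivation can be checked numerically with a small simulation. The following is a minimal sketch using only the Python standard library, and everything in it is an assumption made for illustration: the helper names, the sample size, the true correlation of 0.6, normally distributed true values with unit variance, and a constant standard error per attribute (0.5 and 0.75). It estimates each separation index from the variance of the estimates and the mean squared standard error, then divides the observed correlation by their geometric mean.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance (divides by n)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / math.sqrt(variance(xs) * variance(ys))


def separation_index(estimates, standard_errors):
    """R = (var[estimates] - mean squared standard error) / var[estimates]."""
    error_var = mean([se ** 2 for se in standard_errors])
    observed_var = variance(estimates)
    return (observed_var - error_var) / observed_var


def simulate(n=20000, rho=0.6, se_beta=0.5, se_theta=0.75, seed=0):
    """Simulate estimates with independent normal errors (hypothetical values)."""
    rng = random.Random(seed)
    beta_hat, theta_hat = [], []
    for _ in range(n):
        beta = rng.gauss(0.0, 1.0)
        # theta has unit variance and correlation rho with beta.
        theta = rho * beta + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Errors are drawn independently of each other and of the true values.
        beta_hat.append(beta + rng.gauss(0.0, se_beta))
        theta_hat.append(theta + rng.gauss(0.0, se_theta))
    observed = correlation(beta_hat, theta_hat)
    r_beta = separation_index(beta_hat, [se_beta] * n)
    r_theta = separation_index(theta_hat, [se_theta] * n)
    corrected = observed / math.sqrt(r_beta * r_theta)
    return observed, r_beta, r_theta, corrected


observed, r_beta, r_theta, corrected = simulate()
print(f"observed correlation: {observed:.3f}")
print(f"separation indices:   {r_beta:.3f}, {r_theta:.3f}")
print(f"disattenuated:        {corrected:.3f}")
```

Under these assumptions the population separation indices are 1/1.25 = 0.8 and 1/1.5625 = 0.64, so the observed correlation should fall near 0.6 × √(0.8 × 0.64) ≈ 0.43 and the disattenuated value near 0.6, up to sampling error.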

References

Jensen, A.R. (1998). The g Factor: The Science of Mental Ability. Praeger, Connecticut, USA. ISBN 0-275-96103-6.
Spearman, C. (1904). "The Proof and Measurement of Association between Two Things". The American Journal of Psychology, 15 (1), 72–101. JSTOR 1412159.
