Fisher transformation

In statistics, the Fisher transformation^[1]^[2] (also known as the Fisher z-transformation) of the sample correlation coefficient can be used to test hypotheses about the value of the population correlation coefficient ρ between variables X and Y.

Definition

Given a set of N bivariate sample pairs (X_i, Y_i), i = 1, ..., N, the sample correlation coefficient r is given by

$$r = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{N}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{N}(Y_i-\bar{Y})^2}}.$$

Here $\operatorname{cov}(X,Y)$ is the sample covariance of $X$ and $Y$, and $\sigma_X$ and $\sigma_Y$ are the sample standard deviations of the respective variables. Fisher's z-transformation of r is defined as

$$z := \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right) = \operatorname{arctanh}(r),$$

where "ln" is the natural logarithm function and "arctanh" is the inverse hyperbolic tangent function.
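As an illustration of the two definitions above, the following minimal Python sketch computes r from its sum-of-products form and then applies the logarithm form of the transformation. The function names `sample_correlation` and `fisher_z` and the five data points are hypothetical, chosen only for this example.

```python
import math


def sample_correlation(xs, ys):
    """Pearson sample correlation r of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator and denominator terms of the sum-of-products formula for r.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Fisher z-transformation of r, written in its logarithm form."""
    return 0.5 * math.log((1 + r) / (1 - r))


# Illustrative data only (not from any real study).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

r = sample_correlation(xs, ys)
z = fisher_z(r)
print(r, z, math.atanh(r))  # the last two values agree up to rounding
```

In practice `math.atanh(r)` can be called directly; the logarithm form is spelled out here only to mirror the definition.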

If (X, Y) has a bivariate normal distribution, and if the pairs (X_i, Y_i) are independent, then z is approximately normally distributed with mean

$$\frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right),$$

and standard error

$$\frac{1}{\sqrt{N-3}},$$

where N is the sample size, and ρ is the true correlation coefficient.
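The approximate normality of z suggests a simple two-sided test of the hypothesis ρ = ρ₀. The sketch below is one possible way to set it up, assuming the bivariate normal model and independent pairs described above. The function name `fisher_z_test` and the example values r = 0.5, N = 28 are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_test(r, n, rho0=0.0):
    """Two-sided test of H0: rho == rho0 using the Fisher transformation.

    Returns the standardized statistic and its approximate p-value.
    """
    if n <= 3:
        raise ValueError("n must exceed 3 for the standard error 1/sqrt(n - 3)")
    z = math.atanh(r)                   # transformed sample correlation
    mean_under_h0 = math.atanh(rho0)    # approximate mean of z when rho == rho0
    std_error = 1.0 / math.sqrt(n - 3)  # approximate standard error of z
    statistic = (z - mean_under_h0) / std_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(statistic)))
    return statistic, p_value


# Hypothetical example: r = 0.5 observed in a sample of 28 pairs.
statistic, p_value = fisher_z_test(r=0.5, n=28, rho0=0.0)
print(statistic, p_value)
```

The p-value relies on the normal approximation, so it should be read as approximate, particularly for small N.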

This transformation, and its inverse

$$r = \frac{\exp(2z)-1}{\exp(2z)+1} = \operatorname{tanh}(z),$$

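can be used to construct a large-sample confidence interval for ρ using standard normal theory: an interval is formed for the mean of z, and its endpoints are mapped back to the correlation scale with the inverse transformation.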
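A minimal sketch of such an interval for the population correlation ρ is given below, assuming the normal approximation for z with standard error 1/√(N − 3). The function name `fisher_confidence_interval` and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_confidence_interval(r, n, confidence=0.95):
    """Approximate confidence interval for rho via the Fisher transformation."""
    if n <= 3:
        raise ValueError("n must exceed 3 for the standard error 1/sqrt(n - 3)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    z = math.atanh(r)
    std_error = 1.0 / math.sqrt(n - 3)
    # Two-sided critical value of the standard normal distribution.
    critical = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    lower_z = z - critical * std_error
    upper_z = z + critical * std_error
    # Map the endpoints back to the correlation scale with the inverse (tanh).
    return math.tanh(lower_z), math.tanh(upper_z)


# Hypothetical example: r = 0.5 observed in a sample of 28 pairs.
lower, upper = fisher_confidence_interval(r=0.5, n=28)
print(lower, upper)
```

Because tanh is monotone and maps into (−1, 1), the resulting interval always stays inside the valid range for a correlation and is generally not symmetric about r.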

Discussion

The Fisher transformation is an approximate variance-stabilizing transformation for r when X and Y follow a bivariate normal distribution. This means that the variance of z is approximately constant for all values of the population correlation coefficient ρ. Without the Fisher transformation, the variance of r grows smaller as |ρ| gets closer to 1. Since the Fisher transformation is approximately the identity function when |r| < 1/2, it is sometimes useful to remember that the variance of r is well approximated by 1/N as long as |ρ| is not too large and N is not too small. This is related to the fact that, for bivariate normal data, the asymptotic variance of √N·r is (1 − ρ²)², which is close to 1 when |ρ| is small.
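The variance-stabilizing behavior can be checked informally with a small Monte Carlo experiment. The sketch below is illustrative only: the sample size, number of replications, seed, and function names are arbitrary choices made for this example. It simulates bivariate normal samples for several values of ρ and compares the empirical variances of r and z with 1/(N − 3).

```python
import math
import random
from statistics import variance


def simulate_r(rho, n, rng):
    """Draw n bivariate normal pairs with correlation rho; return the sample r."""
    scale = math.sqrt(1.0 - rho * rho)
    xs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        xs.append(u)
        ys.append(rho * u + scale * v)  # corr(xs, ys) equals rho by construction
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def variances_of_r_and_z(rho, n, replications=2000, seed=0):
    """Empirical variances of r and of z = arctanh(r) over repeated samples."""
    rng = random.Random(seed)
    rs = [simulate_r(rho, n, rng) for _ in range(replications)]
    zs = [math.atanh(r) for r in rs]
    return variance(rs), variance(zs)


n = 50
results = {rho: variances_of_r_and_z(rho, n) for rho in (0.0, 0.5, 0.9)}
for rho, (var_r, var_z) in results.items():
    print(f"rho={rho:.1f}  var(r)={var_r:.5f}  var(z)={var_z:.5f}  "
          f"1/(N-3)={1 / (n - 3):.5f}")
```

Under these assumptions the variance of r should shrink markedly as ρ approaches 1, while the variance of z should stay close to 1/(N − 3) throughout.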

The behavior of this transform has been extensively studied since Fisher introduced it in 1915. Fisher himself found the exact distribution of z for data from a bivariate normal distribution in 1921; Gayen in 1951^[3] determined the exact distribution of z for data from a bivariate Type A Edgeworth distribution. Hotelling in 1953 calculated the Taylor series expressions for the moments of z and several related statistics,^[4] and Hawkins in 1989 derived the asymptotic distribution of z for data from a distribution with bounded fourth moments.^[5]

Other uses

While the Fisher transformation is mainly associated with the Pearson product-moment correlation coefficient for bivariate normal observations, it can also be applied to Spearman's rank correlation coefficient in more general cases. A similar result for the asymptotic distribution applies, but with a minor adjustment factor; see the article on Spearman's rank correlation coefficient for details.

References

[1] Fisher, R. A. (1915). "Frequency distribution of the values of the correlation coefficient in samples of an indefinitely large population". Biometrika. 10 (4). Biometrika Trust: 507–521. doi:10.2307/2331838. JSTOR 2331838.
[2] Fisher, R. A. (1921). "On the 'probable error' of a coefficient of correlation deduced from a small sample". Metron. 1: 3–32.
[3] Gayen, A. K. (1951). "The Frequency Distribution of the Product-Moment Correlation Coefficient in Random Samples of Any Size Drawn from Non-Normal Universes". Biometrika. 38 (1/2). Biometrika Trust: 219–247. doi:10.1093/biomet/38.1-2.219. JSTOR 2332329.
[4] Hotelling, H. (1953). "New light on the correlation coefficient and its transforms". Journal of the Royal Statistical Society, Series B. 15 (2). Blackwell Publishing: 193–225. JSTOR 2983768.
[5] Hawkins, D. L. (1989). "Using U statistics to derive the asymptotic distribution of Fisher's Z statistic". The American Statistician. 43 (4). American Statistical Association: 235–237. doi:10.2307/2685369. JSTOR 2685369.
