# Shapiro–Wilk test

The Shapiro–Wilk test is a test of normality in frequentist statistics. It was published in 1965 by Samuel Sanford Shapiro and Martin Wilk.[1]

## Theory

The Shapiro–Wilk test tests the null hypothesis that a sample ${\displaystyle x_{1},\ldots ,x_{n}}$ came from a normally distributed population. The test statistic is

${\displaystyle W={\left(\sum _{i=1}^{n}a_{i}x_{(i)}\right)^{2} \over \sum _{i=1}^{n}(x_{i}-{\overline {x}})^{2}},}$

where

• ${\displaystyle x_{(i)}}$ (with parentheses enclosing the subscript index i) is the ith order statistic, i.e., the ith-smallest number in the sample;
• ${\displaystyle {\overline {x}}=\left(x_{1}+\cdots +x_{n}\right)/n}$ is the sample mean;
• the constants ${\displaystyle a_{i}}$ are given by[1]
${\displaystyle (a_{1},\dots ,a_{n})={m^{\mathsf {T}}V^{-1} \over (m^{\mathsf {T}}V^{-1}V^{-1}m)^{1/2}},}$
where
${\displaystyle m=(m_{1},\dots ,m_{n})^{\mathsf {T}}\,}$
and ${\displaystyle m_{1},\ldots ,m_{n}}$ are the expected values of the order statistics of independent and identically distributed random variables sampled from the standard normal distribution, and ${\displaystyle V}$ is the covariance matrix of those order statistics.
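
The definition above can be checked numerically. The following is a minimal sketch rather than the published procedure: it assumes NumPy is available, estimates ${\displaystyle m}$ and ${\displaystyle V}$ by Monte Carlo simulation instead of using exact values, and uses the illustrative function names `sw_coefficients` and `shapiro_wilk_w`, which do not come from any library.

```python
import numpy as np

def sw_coefficients(n, reps=200_000, seed=0):
    """Approximate (a_1, ..., a_n) from simulated standard normal order statistics."""
    rng = np.random.default_rng(seed)
    # Each row is one sorted sample, so column i holds draws of the ith order statistic.
    order_stats = np.sort(rng.standard_normal((reps, n)), axis=1)
    m = order_stats.mean(axis=0)            # estimate of the expected order statistics
    V = np.cov(order_stats, rowvar=False)   # estimate of their covariance matrix
    w = np.linalg.solve(V, m)               # V^{-1} m, without forming the inverse
    a = w / np.linalg.norm(w)               # divide by (m^T V^{-1} V^{-1} m)^{1/2}
    # The exact coefficients satisfy a_i = -a_{n+1-i}; enforcing this removes
    # part of the simulation noise (an assumption of this sketch).
    a = (a - a[::-1]) / 2
    return a / np.linalg.norm(a)

def shapiro_wilk_w(x, a):
    """Compute W for the sample x using the coefficient vector a."""
    x_sorted = np.sort(np.asarray(x, dtype=float))
    numerator = np.dot(a, x_sorted) ** 2
    denominator = np.sum((x_sorted - x_sorted.mean()) ** 2)
    return numerator / denominator

a = sw_coefficients(10)
sample = np.random.default_rng(1).normal(loc=5.0, scale=2.0, size=10)
print(shapiro_wilk_w(sample, a))
```

Because the coefficients sum to zero and have unit length, the Cauchy–Schwarz inequality bounds W above by 1; values close to 1 are consistent with normality.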

## Interpretation

The null hypothesis of this test is that the population is normally distributed. If the p-value is less than the chosen alpha level, the null hypothesis is rejected and there is evidence that the data tested are not from a normally distributed population (e.g., for an alpha level of 0.05, a data set with a p-value of 0.02 rejects the null hypothesis). Conversely, if the p-value is greater than the chosen alpha level, the null hypothesis that the data came from a normally distributed population cannot be rejected.[2] However, because the test is sensitive to sample size,[3] in large samples it may return a statistically significant result even for small departures from normality. A Q–Q plot is therefore required for verification in addition to the test.
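
As an illustration of this decision rule, the sketch below applies `scipy.stats.shapiro` to two simulated samples and also computes the points of a normal Q–Q plot with `scipy.stats.probplot`. It assumes NumPy and SciPy are installed; the helper `reject_normality`, the sample sizes, and the seed are choices made for this example only.

```python
import numpy as np
from scipy import stats

def reject_normality(p_value, alpha=0.05):
    """Return True when the null hypothesis of normality is rejected."""
    return p_value < alpha

rng = np.random.default_rng(0)
samples = {
    "normal": rng.normal(size=200),
    "exponential": rng.exponential(size=200),
}

for name, x in samples.items():
    result = stats.shapiro(x)
    # probplot returns the Q-Q points and a fitted straight line;
    # a correlation r close to 1 means the points lie close to that line.
    (theoretical_q, ordered_x), (slope, intercept, r) = stats.probplot(x, dist="norm")
    print(f"{name}: W={result.statistic:.4f}, p={result.pvalue:.4g}, "
          f"reject={reject_normality(result.pvalue)}, Q-Q r={r:.4f}")
```

The arrays `theoretical_q` and `ordered_x` can be passed to any plotting library to draw the Q–Q plot itself.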

## Power analysis

In a Monte Carlo comparison of the Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors, and Anderson–Darling tests, Shapiro–Wilk had the best power for a given significance level, followed closely by Anderson–Darling.[4]
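
A power comparison of this kind can be sketched as follows. This is not the cited study's design: the alternative distribution (exponential), the sample size, the number of replications, and the helper names are all assumptions for illustration, and only two of the four tests are included. It relies on `scipy.stats.shapiro` and `scipy.stats.anderson`.

```python
import numpy as np
from scipy import stats

def rejects_shapiro(x, alpha=0.05):
    """Shapiro-Wilk decision at level alpha."""
    return stats.shapiro(x).pvalue < alpha

def rejects_anderson(x):
    """Anderson-Darling decision at the 5% level, using SciPy's tabulated critical value."""
    res = stats.anderson(x, dist="norm")
    idx = list(res.significance_level).index(5.0)
    return res.statistic > res.critical_values[idx]

def draw_exponential(rng, n):
    return rng.exponential(size=n)

def draw_normal(rng, n):
    return rng.normal(size=n)

def rejection_rate(test, sampler, n, reps, rng):
    """Fraction of simulated samples for which the test rejects normality."""
    return float(np.mean([test(sampler(rng, n)) for _ in range(reps)]))

rng = np.random.default_rng(0)
for label, sampler in [("normal", draw_normal), ("exponential", draw_exponential)]:
    sw = rejection_rate(rejects_shapiro, sampler, n=20, reps=500, rng=rng)
    ad = rejection_rate(rejects_anderson, sampler, n=20, reps=500, rng=rng)
    print(f"{label}: Shapiro-Wilk {sw:.3f}, Anderson-Darling {ad:.3f}")
```

Under the normal sampler the rejection rate estimates the type I error rate and should be near the significance level; under the exponential sampler it estimates the power against that alternative.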

## Approximation

Royston proposed an alternative method of calculating the coefficient vector by providing an algorithm for computing its values, which extended the usable sample size to 2000.[5] This technique is used in several software packages, including Stata,[6][7] SPSS and SAS.[8] Rahman and Govindarajulu extended the sample size further, up to 5000.[9]
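
The flavour of such an approximation can be sketched in a few lines. The code below is an assumption-laden illustration, not a reference implementation: it assumes the approach of Royston's 1992 approximation, in which the expected order statistics are replaced by normal scores and the two largest coefficients are corrected by polynomials in ${\displaystyle n^{-1/2}}$. The polynomial constants should be checked against the original paper before any serious use, the function name `royston_coefficients` is hypothetical, and only ${\displaystyle n>5}$ is handled.

```python
import math
from statistics import NormalDist

# Polynomial constants for the corrections to a_n and a_{n-1}
# (assumed from Royston's 1992 approximation; verify before relying on them).
C_LAST = (0.221157, -0.147981, -2.071190, 4.434685, -2.706056)
C_SECOND = (0.042981, -0.293762, -1.752461, 5.682633, -3.582633)

def royston_coefficients(n):
    """Approximate the Shapiro-Wilk coefficients a_1..a_n for n > 5."""
    if n <= 5:
        raise ValueError("this sketch only covers n > 5")
    inv_cdf = NormalDist().inv_cdf
    # Normal scores approximating the expected standard normal order statistics.
    m = [inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    ss = sum(v * v for v in m)
    u = 1.0 / math.sqrt(n)

    def correction(constants):
        return sum(c * u ** (k + 1) for k, c in enumerate(constants))

    a_last = m[-1] / math.sqrt(ss) + correction(C_LAST)
    a_second = m[-2] / math.sqrt(ss) + correction(C_SECOND)
    # Rescale the remaining scores so that the full vector has unit length.
    phi = (ss - 2 * m[-1] ** 2 - 2 * m[-2] ** 2) / (1 - 2 * a_last ** 2 - 2 * a_second ** 2)
    a = [v / math.sqrt(phi) for v in m]
    a[-1], a[-2] = a_last, a_second
    a[0], a[1] = -a_last, -a_second
    return a

print([round(v, 4) for v in royston_coefficients(10)])
```

For ${\displaystyle n=10}$ the largest coefficient produced this way is approximately 0.574, close to the value 0.5739 tabulated for the exact coefficients, and no covariance matrix has to be inverted.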