
Wilcoxon signed-rank test: Difference between revisions

{{Short description|Non-parametric statistical hypothesis test used to compare two related samples to assess whether their population mean ranks differ}}

The '''Wilcoxon signed-rank test''' is a [[Non-parametric statistics|non-parametric]] [[statistical hypothesis testing|statistical hypothesis test]] used to test either the location of a set of samples, or to compare the locations of two populations using a set of matched samples.<ref name="Conover">{{cite book|last=Conover|first=W. J.|title=Practical nonparametric statistics|edition=3rd|publisher=John Wiley & Sons, Inc.|year=1999|isbn=0-471-16068-7}}, p. 350</ref> When applied to test the location of a set of samples, it serves the same purpose as the one-sample [[Student's t-test|Student's ''t''-test]].<ref>{{Cite web|url=http://www.biostathandbook.com/wilcoxonsignedrank.html|title=Wilcoxon signed-rank test - Handbook of Biological Statistics|website=www.biostathandbook.com|access-date=2021-09-02}}</ref> On a set of matched samples, it is a [[paired difference test]] like the paired Student's ''t''-test (also known as the "''t''-test for matched pairs" or "''t''-test for dependent samples"). Unlike the Student's ''t''-test, the Wilcoxon signed-rank test does not assume that the data is normally distributed, so it gives correct results on a wider variety of data sets. The cost of this applicability is that, on normally distributed data, the Wilcoxon signed-rank test has less [[statistical power]] than the Student's ''t''-test, meaning that it is less likely to detect a truly significant result.


==History==
The test is named for [[Frank Wilcoxon]] (1892–1965) who, in a single paper, proposed both it and the [[Mann-Whitney-Wilcoxon test|rank-sum test]] for two independent samples.<ref name="Wilcoxon">{{cite journal|last=Wilcoxon|first=Frank|title=Individual comparisons by ranking methods|journal=Biometrics Bulletin|date=Dec 1945|volume=1|issue=6|pages=80–83|url=http://sci2s.ugr.es/keel/pdf/algorithm/articulo/wilcoxon1945.pdf|doi=10.2307/3001968|jstor=3001968|hdl=10338.dmlcz/135688}}</ref> The test was popularized by [[Sidney Siegel]] (1956) in his influential textbook on non-parametric statistics.<ref name="Siegel">{{cite book|last=Siegel|first=Sidney|title=Non-parametric statistics for the behavioral sciences|year=1956|publisher=McGraw-Hill|location=New York|pages=75–83|isbn=9780070573482|url=https://books.google.com/books?id=ebfRAAAAMAAJ&q=Wilcoxon}}</ref> Siegel used the symbol ''T'' for the test statistic, and consequently, the test is sometimes referred to as the '''Wilcoxon ''T''-test'''.

==Assumptions==
# Data are paired and come from the same population.
# Each pair is chosen randomly and independently{{Citation needed|date=February 2017}}.
# When, as is usual, the test is performed on within-pair ''differences'', the data must be measured on at least an [[interval scale]]; if only within-pair comparisons (greater than, less than, or equal) are used, an [[ordinal scale]] suffices.


==Test procedure==
There are two variants of the signed-rank test. From a theoretical point of view, the one-sample test is more fundamental because the paired sample test is performed by converting the data to the situation of the one-sample test. However, most practical applications of the signed-rank test arise from paired data.

For a paired sample test, the data consists of samples <math>(X_1, Y_1), \dots, (X_n, Y_n)</math>. Each sample is a pair of measurements on an [[interval scale]]. The measurements are converted to [[real number]]s, and the paired sample test is converted to a one-sample test by replacing each pair of numbers <math>(X_i, Y_i)</math> by its difference <math>X_i - Y_i</math>.<ref>Conover, p. 352</ref>

The data for a one-sample test is a set of real number samples <math>X_1, \dots, X_n</math>. Assume for simplicity that the samples are all different and that no sample equals zero. (Zeros and ties introduce several complications; see below.) The test is performed as follows:<ref>Conover, p. 353</ref><ref>{{cite book|last1=Pratt|first1=John W.|last2=Gibbons|first2=Jean D.|title=Concepts of Nonparametric Theory|year=1981|publisher=Springer-Verlag|isbn=978-1-4612-5933-6}}, p. 148</ref>

# Compute <math>|X_1|, \dots, |X_n|</math>.
# Sort these quantities and assign ranks: let <math>R_i</math> denote the rank of <math>|X_i|</math>, so that the smallest absolute value receives rank 1 and the largest receives rank <math>n</math>.
# Let <math>\sgn</math> denote the [[sign function]]: <math>\sgn(x) = 1</math> if <math>x > 0</math> and <math>\sgn(x) = -1</math> if <math>x < 0</math>. The [[test statistic]] is the ''signed-rank sum'' <math>T</math>: <math display="block">T = \sum_{i=1}^n \sgn(X_i)R_i.</math>
# Produce a <math>p</math>-value by comparing <math>T</math> to its distribution under the null hypothesis.
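
To make the steps concrete, here is a minimal Python sketch of the one-sample procedure, assuming no zeros and no tied absolute values; the function and variable names are illustrative rather than standard.

<syntaxhighlight lang="python">
def signed_rank_statistic(x):
    """Signed-rank sum T for a one-sample test (assumes no zeros or tied |x|)."""
    n = len(x)
    # Rank the absolute values: the smallest |x_i| receives rank 1, the largest rank n.
    order = sorted(range(n), key=lambda i: abs(x[i]))
    rank = [0] * n
    for r, i in enumerate(order, start=1):
        rank[i] = r
    # T is the sum of sgn(x_i) * R_i.
    return sum((1 if x[i] > 0 else -1) * rank[i] for i in range(n))

# Example with a small artificial sample.
sample = [1.1, -2.3, 0.4, 3.0, -0.8]
T = signed_rank_statistic(sample)  # ranks of |x|: 3, 4, 1, 5, 2, so T = 3 - 4 + 1 + 5 - 2 = 3
</syntaxhighlight>

The resulting value of <math>T</math> is then compared to its null distribution, as described in the section on computing the null distribution below.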

The signed-rank sum <math>T</math> is closely related to two other test statistics. The ''positive-rank sum'' <math>T^+</math> and the ''negative-rank sum'' <math>T^-</math> are defined by<ref>Pratt and Gibbons, p. 148</ref>
<math display="block">\begin{align}
T^+ &= \sum_{1 \le i \le n,\ X_i > 0} R_i, \\
T^- &= \sum_{1 \le i \le n,\ X_i < 0} R_i.
\end{align}</math>
Because <math>T^+ + T^-</math> equals the sum of all the ranks, which is <math>1 + 2 + \dots + n = n(n + 1)/2</math>, these three statistics are related by:<ref>Pratt and Gibbons, p. 148</ref>
<math display="block">\begin{align}
T^+ &= \frac{n(n + 1)}{2} - T^- = \frac{n(n + 1)}{4} + \frac{T}{2}, \\
T^- &= \frac{n(n + 1)}{2} - T^+ = \frac{n(n + 1)}{4} - \frac{T}{2}, \\
T &= T^+ - T^- = 2T^+ - \frac{n(n + 1)}{2} = \frac{n(n + 1)}{2} - 2T^-.
\end{align}</math>
Because <math>T</math>, <math>T^+</math>, and <math>T^-</math> carry the same information, any of them may be used as the test statistic.

The positive-rank sum and negative-rank sum have the following alternative interpretations. Define the ''Walsh average'' <math>W_{ij}</math> to be <math>\tfrac12(X_i + X_j)</math>. Then:<ref>Pratt and Gibbons, p. 150</ref>
<math display="block">\begin{align}
T^+ = \#\{W_{ij} > 0 \colon 1 \le i \le j \le n\}, \\
T^- = \#\{W_{ij} < 0 \colon 1 \le i \le j \le n\}.
\end{align}</math>
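
As a check on this interpretation, the following sketch counts positive Walsh averages directly and compares the count with <math>T^+</math> computed from the ranks; it assumes no zeros, no tied absolute values, and no Walsh average exactly equal to zero (names are illustrative).

<syntaxhighlight lang="python">
def positive_rank_sum(x):
    """T+ : sum of the ranks of the positive observations."""
    n = len(x)
    order = sorted(range(n), key=lambda i: abs(x[i]))
    rank = [0] * n
    for r, i in enumerate(order, start=1):
        rank[i] = r
    return sum(rank[i] for i in range(n) if x[i] > 0)

def positive_walsh_averages(x):
    """Number of Walsh averages (x_i + x_j)/2 with i <= j that are positive."""
    n = len(x)
    return sum(1 for i in range(n) for j in range(i, n) if x[i] + x[j] > 0)

sample = [1.1, -2.3, 0.4, 3.0, -0.8]
assert positive_rank_sum(sample) == positive_walsh_averages(sample)  # both equal 9 here
</syntaxhighlight>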

==Null and alternative hypotheses==

===One-sample test===
The one-sample Wilcoxon signed-rank test can be used to test whether data comes from a symmetric population with a specified median.<ref>Conover, pp. 352–357</ref> If the population median is known, then it can be used to test whether data is symmetric about its center.<ref name="Hettmansperger">{{cite book|last=Hettmansperger|first=Thomas P.|title=Statistical Inference Based on Ranks|publisher=John Wiley & Sons|year=1984|isbn=0-471-88474-X}}, pp. 32, 50</ref>

To explain the null and alternative hypotheses formally, assume that the data consists of [[independent and identically distributed random variables|independent and identically distributed]] samples from a distribution <math>F</math>. If <math>X_1</math> and <math>X_2</math> are IID <math>F</math>-distributed random variables, define <math>F^{(2)}</math> to be the cumulative distribution function of <math>\tfrac12(X_1 + X_2)</math>. Set
<math display="block">p_2 = \Pr(\tfrac12(X_1 + X_2) > 0) = 1 - F^{(2)}(0).</math>
Assume that <math>F</math> is continuous. The one-sample Wilcoxon signed-rank sum test is a test for the following null hypothesis against one of the following alternative hypotheses:<ref>Pratt and Gibbons, p. 153</ref>
; Null hypothesis ''H''<sub>0</sub> : <math>p_2 = \tfrac12</math>
; One-sided alternative hypothesis ''H''<sub>1</sub> : <math>p_2 > \tfrac12</math>.
; One-sided alternative hypothesis ''H''<sub>2</sub> : <math>p_2 < \tfrac12</math>.
; Two-sided alternative hypothesis ''H''<sub>3</sub> : <math>p_2 \neq \tfrac12</math>.
The alternative hypothesis being tested depends on whether the ''p''-value is one-sided (and if so, which side) or two-sided. The test can also be used as a test for the value of <math>\Pr(\tfrac12(X_1 + X_2) > \mu)</math> by subtracting <math>\mu</math> from every data point.

The above null and alternative hypotheses are derived from the fact that <math>2T^+ / n^2</math> is a consistent estimator of <math>p_2</math>.<ref>Pratt and Gibbons, pp. 153–154</ref> It can also be derived from the description of <math>T^+</math> and <math>T^-</math> in terms of Walsh averages, since that description shows that the Wilcoxon test is the same as the sign test applied to the set of Walsh averages.<ref>Hettmansperger, pp. 38–39</ref>

Restricting the distributions of interest can lead to more interpretable null and alternative hypotheses. One mildly restrictive assumption is that <math>F^{(2)}</math> has a unique median. This median is called the [[pseudomedian]] of <math>F</math>; in general it is different from the mean and the median, even when all three exist. If the existence of a unique pseudomedian can be assumed true under both the null and alternative hypotheses, then these hypotheses can be restated as:
; Null hypothesis ''H''<sub>0</sub> : The pseudomedian of <math>F</math> is located at zero.
; One-sided alternative hypothesis ''H''<sub>1</sub> : The pseudomedian of <math>F</math> is located at <math>\mu < 0</math>.
; One-sided alternative hypothesis ''H''<sub>2</sub> : The pseudomedian of <math>F</math> is located at <math>\mu > 0</math>.
; Two-sided alternative hypothesis ''H''<sub>3</sub> : The pseudomedian of <math>F</math> is located at <math>\mu \neq 0</math>.

Most often, the null and alternative hypotheses are stated under the assumption of symmetry. Fix a real number <math>\mu</math>. Define <math>F</math> to be ''symmetric about <math>\mu</math>'' if a random variable <math>X</math> with distribution <math>F</math> satisfies <math>\Pr(X \le \mu - x) = \Pr(X \ge \mu + x)</math> for all <math>x</math>. If <math>F</math> has a density function <math>f</math>, then <math>F</math> is symmetric about <math>\mu</math> if and only if <math>f(\mu + x) = f(\mu - x)</math> for every <math>x</math>.<ref>Pratt and Gibbons, pp. 146–147</ref>

If the null and alternative distributions of <math>F</math> can be assumed symmetric, then the null and alternative hypotheses simplify to the following:<ref>Pratt and Gibbons, pp. 146–147</ref>
; Null hypothesis ''H''<sub>0</sub> : <math>F</math> is symmetric about <math>\mu = 0</math>.
; One-sided alternative hypothesis ''H''<sub>1</sub> : <math>F</math> is symmetric about <math>\mu < 0</math>.
; One-sided alternative hypothesis ''H''<sub>2</sub> : <math>F</math> is symmetric about <math>\mu > 0</math>.
; Two-sided alternative hypothesis ''H''<sub>3</sub> : <math>F</math> is symmetric about <math>\mu \neq 0</math>.
If in addition <math>\Pr(X = \mu) = 0</math>, then <math>\mu</math> is a median of <math>F</math>. If this median is unique, then the Wilcoxon signed-rank sum test becomes a test for the location of the median.<ref>Hettmansperger, pp. 30–31</ref> When the mean of <math>F</math> is defined, then the mean is <math>\mu</math>, and the test is also a test for the location of the mean.<ref>Conover, p. 353</ref>

The restriction that the alternative distribution is symmetric is highly restrictive, but for one-sided tests it can be weakened. Say that <math>F</math> is ''stochastically smaller than a distribution symmetric about zero'' if an <math>F</math>-distributed random variable <math>X</math> satisfies <math>\Pr(X < -x) \ge \Pr(X > x)</math> for all <math>x \ge 0</math>. Similarly, <math>F</math> is ''stochastically larger than a distribution symmetric about zero'' if <math>\Pr(X < -x) \le \Pr(X > x)</math> for all <math>x \ge 0</math>. Then the Wilcoxon signed-rank sum test can also be used for the following null and alternative hypotheses:<ref>Pratt and Gibbons, pp. 155–156</ref><ref>Hettmansperger, pp. 49–50</ref>
; Null hypothesis ''H''<sub>0</sub> : <math>F</math> is symmetric about <math>\mu = 0</math>.
; One-sided alternative hypothesis ''H''<sub>1</sub> : <math>F</math> is stochastically smaller than a distribution symmetric about zero.
; One-sided alternative hypothesis ''H''<sub>2</sub> : <math>F</math> is stochastically larger than a distribution symmetric about zero.

The hypothesis that the data are IID can be weakened. Each data point may be taken from a different distribution, as long as all the distributions are assumed to be continuous and symmetric about a common point <math>\mu_0</math>. The data points are not required to be independent as long as the conditional distribution of each observation given the others is symmetric about <math>\mu_0</math>.<ref>Pratt and Gibbons, p. 155</ref>

===Paired data test===
Because the paired data test arises from taking paired differences, its null and alternative hypotheses can be derived from those of the one-sample test. In each case, they become assertions about the behavior of the differences <math>X_i - Y_i</math>.
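
For instance, reusing the hypothetical <code>signed_rank_statistic</code> function from the sketch in the test procedure section above, paired data would be handled along these lines (the pairs themselves are made up for illustration):

<syntaxhighlight lang="python">
# Paired data: reduce each pair to a difference, then apply the one-sample procedure.
pairs = [(110, 125), (122, 115), (125, 130), (120, 140), (140, 139)]  # hypothetical measurements
differences = [x - y for x, y in pairs]   # -15, 7, -5, -20, 1: no zeros, no tied magnitudes
T = signed_rank_statistic(differences)
</syntaxhighlight>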

Let <math>F(x, y)</math> be the joint cumulative distribution of the pairs <math>(X_i, Y_i)</math>. If <math>F</math> is continuous, then the most general null and alternative hypotheses are expressed in terms of
<math display="block">p_2 = Pr(\tfrac12(X_i - Y_i + X_j - Y_j) > 0)</math>
and are identical to the one-sample case:
; Null hypothesis ''H''<sub>0</sub> : <math>p_2 = \tfrac12</math>
; One-sided alternative hypothesis ''H''<sub>1</sub> : <math>p_2 > \tfrac12</math>.
; One-sided alternative hypothesis ''H''<sub>2</sub> : <math>p_2 < \tfrac12</math>.
; Two-sided alternative hypothesis ''H''<sub>3</sub> : <math>p_2 \neq \tfrac12</math>.
Like the one-sample case, under some restrictions the test can be interpreted as a test for whether the pseudomedian of the differences is located at zero.

A common restriction is to symmetric distributions of differences. In this case, the null and alternative hypotheses are:<ref>Conover, p. 354</ref><ref name="Hollander-Wolfe-Chicken">{{cite book|last1=Hollander|first1=Myles|last2=Wolfe|first2=Douglas A.|last3=Chicken|first3=Eric|title=Nonparametric Statistical Methods|edition=Third|publisher=John Wiley & Sons, Inc.|year=2014|isbn=978-0-470-38737-5}}, pp. 39–41</ref>
; Null hypothesis ''H''<sub>0</sub> : The observations <math>X_i - Y_i</math> are symmetric about <math>\mu = 0</math>.
; One-sided alternative hypothesis ''H''<sub>1</sub> : The observations <math>X_i - Y_i</math> are symmetric about <math>\mu < 0</math>.
; One-sided alternative hypothesis ''H''<sub>2</sub> : The observations <math>X_i - Y_i</math> are symmetric about <math>\mu > 0</math>.
; Two-sided alternative hypothesis ''H''<sub>3</sub> : The observations <math>X_i - Y_i</math> are symmetric about <math>\mu \neq 0</math>.
These can also be expressed more directly in terms of the original pairs:<ref>Pratt and Gibbons, p. 147</ref>
; Null hypothesis ''H''<sub>0</sub> : The observations <math>(X_i, Y_i)</math> are ''exchangeable'', meaning that <math>(X_i, Y_i)</math> and <math>(Y_i, X_i)</math> have the same distribution. Equivalently, <math>F(x, y) = F(y, x)</math>.
; One-sided alternative hypothesis ''H''<sub>1</sub> : For some <math>\mu < 0</math>, the pairs <math>(X_i, Y_i)</math> and <math>(Y_i + \mu, X_i)</math> have the same distribution.
; One-sided alternative hypothesis ''H''<sub>2</sub> : For some <math>\mu > 0</math>, the pairs <math>(X_i, Y_i)</math> and <math>(Y_i + \mu, X_i)</math> have the same distribution.
; Two-sided alternative hypothesis ''H''<sub>3</sub> : For some <math>\mu \neq 0</math>, the pairs <math>(X_i, Y_i)</math> and <math>(Y_i + \mu, X_i)</math> have the same distribution.
The null hypothesis of exchangeability can arise from a matched pair experiment with a treatment group and a control group. Randomizing the treatment and control within each pair makes the observations exchangeable. For an exchangeable distribution, <math>X_i - Y_i</math> has the same distribution as <math>Y_i - X_i</math>, and therefore, under the null hypothesis, the distribution is symmetric about zero.<ref>Pratt and Gibbons, p. 147</ref>

Symmetry of the differences is a very restrictive condition on the alternative hypothesis. However, because the one-sample test can be used as a one-sided test for stochastic dominance, the paired difference Wilcoxon test can be used to compare the following hypotheses:<ref>Hettmansperger, pp. 49–50</ref>
; Null hypothesis ''H''<sub>0</sub> : The observations <math>(X_i, Y_i)</math> are exchangeable.
; One-sided alternative hypothesis ''H''<sub>1</sub> : The differences <math>X_i - Y_i</math> are stochastically smaller than a distribution symmetric about zero, that is, for every <math>x \ge 0</math>, <math>\Pr(X_i < Y_i - x) \ge \Pr(X_i > Y_i + x)</math>.
; One-sided alternative hypothesis ''H''<sub>2</sub> : The differences <math>X_i - Y_i</math> are stochastically larger than a distribution symmetric about zero, that is, for every <math>x \ge 0</math>, <math>\Pr(X_i < Y_i - x) \le \Pr(X_i > Y_i + x)</math>.

==Zeros and ties==
In real data, it sometimes happens that there is a sample <math>X_i</math> which equals zero or a pair <math>(X_i, Y_i)</math> with <math>X_i = Y_i</math>. It can also happen that there are tied samples. This means that for some <math>i \neq j</math>, we have <math>X_i = X_j</math> (in the one-sample case) or <math>X_i - Y_i = X_j - Y_j</math> (in the paired sample case). This is particularly common for discrete data. When this happens, the test procedure defined above is usually undefined because there is no way to uniquely rank the data. (The sole exception is if there is a single sample <math>X_i</math> which is zero and no other zeros or ties.) Because of this, the test statistic needs to be modified.

===Zeros===
Wilcoxon's original paper did not address the question of observations (or, in the paired sample case, differences) that equal zero. However, in later surveys, he recommended removing zeros from the sample.<ref>{{cite book|last=Wilcoxon|first=Frank|title=Some Rapid Approximate Statistical Procedures|publisher=American Cyanamid Co.|year=1949}}</ref> Then the standard signed-rank test could be applied to the resulting data, as long as there were no ties. This is now called the ''reduced sample procedure.''

However, Pratt<ref name="Pratt">{{cite journal|last1=Pratt|first1=J.|title=Remarks on zeros and ties in the Wilcoxon signed rank procedures|journal=Journal of the American Statistical Association|date=1959|volume=54|issue=287|pages=655–667|doi=10.1080/01621459.1959.10501526}}</ref> observed that the reduced sample procedure can lead to paradoxical behavior. He gives the following example. Suppose that we are in the one-sample situation and have the following thirteen observations:
:0, 2, 3, 4, 6, 7, 8, 9, 11, 14, 15, 17, &minus;18.
The reduced sample procedure removes the zero. To the remaining data, it assigns the signed ranks:
:1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, &minus;12.
This has a one-sided ''p''-value of <math>55/2^{12}</math>, and therefore the sample is not significantly positive at any significance level <math>\alpha < 55/2^{12} \approx 0.0134</math>. Pratt argues that one would expect that decreasing the observations should certainly not make the data appear more positive. However, if the zero observation is decreased by an amount less than 2, or if all observations are decreased by an amount less than 1, then the signed ranks become:
:&minus;1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, &minus;13.
This has a one-sided ''p''-value of <math>109/2^{13}</math>. Therefore the sample would be judged significantly positive at any significance level <math>\alpha > 109/2^{13} \approx 0.0133</math>. The paradox is that, if <math>\alpha</math> is between <math>109/2^{13}</math> and <math>55/2^{12}</math>, then ''decreasing'' an insignificant sample causes it to appear significantly ''positive''.

Pratt therefore proposed the ''signed-rank zero procedure.'' This procedure includes the zeros when ranking the samples. However, it excludes them from the test statistic, or equivalently takes <math>\sgn(0) = 0</math>. Pratt proved that the signed-rank zero procedure has several desirable behaviors not shared by the reduced sample procedure:<ref>Pratt, p. 659</ref>
# Increasing the observed values does not make a significantly positive sample insignificant, and it does not make an insignificant sample significantly negative.
# If the distribution of the observations is symmetric, then the values of <math>\mu</math> which the test does not reject form an interval.
# A sample is significantly positive, not significant, or significantly negative, if and only if it is so when the zeros are assigned arbitrary non-zero signs, if and only if it is so when the zeros are replaced with non-zero values which are smaller in absolute value than any non-zero observation.
# For a fixed significance threshold <math>\alpha</math>, and for a test which is randomized to have level exactly <math>\alpha</math>, the probability of calling a set of observations significantly positive (respectively, significantly negative) is a non-decreasing (respectively, non-increasing) function of the observations.
Pratt remarks that, when the signed-rank zero procedure is combined with the average rank procedure for resolving ties, the resulting test is a consistent test against the alternative hypothesis that, for all <math>i \neq j</math>, <math>\Pr(X_i + X_j > 0)</math> and <math>\Pr(X_i + X_j < 0)</math> differ by at least a fixed constant that is independent of <math>i</math> and <math>j</math>.<ref>Pratt, p. 663</ref>
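
A sketch of the signed-rank zero procedure, assuming no ties among the non-zero absolute values (names illustrative):

<syntaxhighlight lang="python">
def signed_rank_statistic_pratt(x):
    """Signed-rank sum with Pratt's signed-rank zero procedure.

    Zeros are included when assigning ranks but contribute sgn(0) = 0 to the
    statistic.  Assumes no ties among the non-zero absolute values.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: abs(x[i]))  # zeros receive the lowest ranks
    rank = [0] * n
    for r, i in enumerate(order, start=1):
        rank[i] = r
    return sum(((x[i] > 0) - (x[i] < 0)) * rank[i] for i in range(n))

# Pratt's example above: the zero keeps rank 1 but is dropped from the sum.
data = [0, 2, 3, 4, 6, 7, 8, 9, 11, 14, 15, 17, -18]
T = signed_rank_statistic_pratt(data)  # ranks 2..12 positive and 13 negative: T = 77 - 13 = 64
</syntaxhighlight>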

The signed-rank zero procedure has the disadvantage that, when zeros occur, the null distribution of the test statistic changes, so tables of ''p''-values can no longer be used.

When the data is on a Likert scale with equally spaced categories, the signed-rank zero procedure is more likely to maintain the Type I error rate than the reduced sample procedure.<ref name="IndivLikert">{{cite journal|last1=Derrick|first1=B|last2=White|first2=P| title=Comparing Two Samples from an Individual Likert Question |journal=International Journal of Mathematics and Statistics |date=2017|volume=18|issue=3|pages=1–13}}</ref>

===Ties===
When the data does not have ties, the ranks <math>R_i</math> are used to calculate the test statistic. In the presence of ties, the ranks are not defined. There are two main approaches to resolving this.

The most common procedure for handling ties, and the one originally recommended by Wilcoxon, is called the ''average rank'' or ''midrank procedure.'' This procedure assigns numbers between 1 and ''n'' to the observations, with two observations getting the same number if and only if they have the same absolute value. These numbers are conventionally called ranks even though the set of these numbers is not equal to <math>\{1, \dots, n\}</math> (except when there are no ties). The rank assigned to an observation is the average of the possible ranks it would have if the ties were broken in all possible ways. Once the ranks are assigned, the test statistic is computed in the same way as usual.

For example, suppose that the observations satisfy
<math display="block">
|X_3|< |X_2|= |X_5|< |X_6|< |X_1|= |X_4|= |X_7|.
</math>
In this case, <math>X_3</math> is assigned rank 1, <math>X_2</math> and <math>X_5</math> are assigned rank <math>(2 + 3) / 2 = 2.5</math>, <math>X_6</math> is assigned rank 4, and <math>X_1</math>, <math>X_4</math>, and <math>X_7</math> are assigned rank <math>(5 + 6 + 7) / 3 = 6</math>. Formally, suppose that there is a set of observations all having the same absolute value <math>v</math>, that <math>k - 1</math> observations have absolute value less than <math>v</math>, and that <math>\ell</math> observations have absolute value less than or equal to <math>v</math>. If the ties among the observations with absolute value <math>v</math> were broken, then these observations would occupy ranks <math>k</math> through <math>\ell</math>. The average rank procedure therefore assigns them the rank <math>(k + \ell) / 2</math>.
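
A sketch of the average rank assignment, mirroring the numerical example above (names illustrative):

<syntaxhighlight lang="python">
def average_ranks(abs_values):
    """Assign midranks: tied absolute values share the average of the ranks they span."""
    n = len(abs_values)
    order = sorted(range(n), key=lambda i: abs_values[i])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        # Find the block of observations tied with the one at sorted position pos.
        end = pos
        while end + 1 < n and abs_values[order[end + 1]] == abs_values[order[pos]]:
            end += 1
        midrank = (pos + 1 + end + 1) / 2  # average of the ranks pos+1, ..., end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = midrank
        pos = end + 1
    return ranks

# The example above: |X_3| < |X_2| = |X_5| < |X_6| < |X_1| = |X_4| = |X_7|
abs_x = [5.0, 2.0, 1.0, 5.0, 2.0, 3.0, 5.0]  # hypothetical magnitudes with that ordering
print(average_ranks(abs_x))                  # [6.0, 2.5, 1.0, 6.0, 2.5, 4.0, 6.0]
</syntaxhighlight>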

Under the average rank procedure, the null distribution is different in the presence of ties. The average rank procedure also has some disadvantages that are similar to those of the reduced sample procedure for zeros. It is possible that a sample can be judged significantly positive by the average rank procedure; but increasing some of the values so as to break the ties, or breaking the ties in any way whatsoever, results in a sample that the test judges to be not significant. However, increasing all the observed values by the same amount cannot turn a significantly positive result into an insignificant one, nor an insignificant one into a significantly negative one. Furthermore, if the observations are distributed symmetrically, then the values of <math>\mu</math> which the test does not reject form an interval.

The other option for handling ties is a tiebreaking procedure. In a tiebreaking procedure, the observations are assigned distinct ranks in the set <math>\{1, \dots, n\}</math>. The rank assigned to an observation depends on its absolute value and the tiebreaking rule. Observations with smaller absolute values are always given smaller ranks, just as in the standard rank-sum test. The tiebreaking rule is used to assign ranks to observations with the same absolute value.

''Random tiebreaking'' breaks the ties at random. Under random tiebreaking, the null distribution is the same as when there are no ties, but the result of the test depends not only on the data but on additional random choices. Averaging over these random choices results in the average rank procedure. ''Conservative tiebreaking'' breaks the ties in favor of the null hypothesis. When performing a one-sided test in which negative values of <math>T</math> tend to be more significant, ties are broken by assigning lower ranks to negative observations and higher ranks to positive ones. When the test makes positive values of <math>T</math> significant, ties are broken the other way, and when large absolute values of <math>T</math> are significant, ties are broken so as to make <math>|T|</math> as small as possible. Pratt observes that when ties are likely, the conservative tiebreaking procedure "presumably has low power, since it amounts to breaking all ties in favor of the null hypothesis."<ref>Pratt, p. 661</ref>

The average rank procedure can disagree with tiebreaking procedures. Pratt gives the following example. Suppose that the observations are:
:1, 1, 1, 1, 2, 3, &minus;4.
The average rank procedure assigns these the signed ranks
:2.5, 2.5, 2.5, 2.5, 5, 6, &minus;7.
This sample is significantly positive at the one-sided level <math>\alpha = 14 / 2^7</math>. On the other hand, any tiebreaking rule will assign the ranks
:1, 2, 3, 4, 5, 6, &minus;7.
At the same one-sided level <math>\alpha = 14 / 2^7</math>, this is not significant.

==Computing the null distribution==
Computing ''p''-values requires knowing the distribution of <math>T</math> under the null hypothesis. There is no closed formula for this distribution.<ref>Hettmansperger, p. 34</ref> However, for small values of <math>n</math>, the distribution may be computed exactly. Under the null hypothesis that the data is symmetric about zero, each <math>X_i</math> is exactly as likely to be positive as it is negative. Therefore the probability that <math>T = t</math> under the null hypothesis is equal to the number of sign combinations that yield <math>T = t</math> divided by the number of possible sign combinations <math>2^n</math>. This can be used to compute the exact distribution of <math>T</math> under the null hypothesis.<ref>Pratt and Gibbons, pp. 148–149</ref>

Computing the distribution of <math>T</math> by considering all possibilities requires computing <math>2^n</math> sums, which is intractable for all but the smallest <math>n</math>. However, there is an efficient recursion for the distribution of <math>T^+</math>.<ref>Pratt and Gibbons, pp. 148–149, pp. 186–187</ref><ref>Hettmansperger, p. 171</ref> Define <math>u_n(t^+)</math> to be the number of sign combinations for which <math>T^+ = t^+</math>. This is equal to the number of subsets of <math>\{1, \dots, n\}</math> which sum to <math>t^+</math>. The base cases of the recursion are <math>u_0(0) = 1</math>, <math>u_0(t^+) = 0</math> for all <math>t^+ \neq 0</math>, and <math>u_n(t^+) = 0</math> for all <math>t^+ < 0</math> or <math>t^+ > n(n + 1)/2</math>. The recursive formula is
<math display="block">u_n(t^+) = u_{n - 1}(t^+) + u_{n - 1}(t^+ - n).</math>
The formula is true because every subset of <math>\{1, \dots, n\}</math> which sums to <math>t^+</math> either does not contain <math>n</math>, in which case it is also a subset of <math>\{1, \dots, n - 1\}</math>, or it does contain <math>n</math>, in which case removing <math>n</math> from the subset produces a subset of <math>\{1, \dots, n - 1\}</math> which sums to <math>t^+ - n</math>. Under the null hypothesis, the probability mass function of <math>T^+</math> satisfies <math>\Pr(T^+ = t^+) = u_n(t^+) / 2^n</math>. The function <math>u_n</math> is closely related to the integer [[partition function (number theory)|partition function]].<ref>Pratt and Gibbons, p. 187</ref>
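
The recursion can be implemented directly. The following sketch tabulates <math>u_n</math> by dynamic programming and uses it to compute an exact one-sided <math>p</math>-value (function names are illustrative):

<syntaxhighlight lang="python">
def subset_sum_counts(n):
    """u_n(t): the number of subsets of {1, ..., n} whose elements sum to t."""
    max_sum = n * (n + 1) // 2
    u = [0] * (max_sum + 1)
    u[0] = 1                                   # u_0: only the empty set, which sums to 0
    for k in range(1, n + 1):
        # Add the rank k: a subset either omits k or contains it.
        # Iterating t downward keeps u[t - k] at its previous-stage value.
        for t in range(k * (k + 1) // 2, k - 1, -1):
            u[t] += u[t - k]
    return u

def exact_p_value(n, t_plus):
    """Pr(T+ >= t_plus) under the null hypothesis."""
    u = subset_sum_counts(n)
    return sum(u[t_plus:]) / 2 ** n

exact_p_value(3, 6)  # 0.125: only the all-positive sample reaches T+ = 6 when n = 3
</syntaxhighlight>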

If <math>p_n(t^+)</math> is the probability that <math>T^+ = t^+</math> under the null hypothesis when there are <math>n</math> samples, then <math>p_n(t^+)</math> satisfies a similar recursion:<ref>Pratt and Gibbons, p. 187</ref>
<math display="block">2p_n(t^+) = p_{n-1}(t^+) + p_{n-1}(t^+ - n)</math>
with similar boundary conditions. There is also a recursive formula for the cumulative distribution function <math>\Pr(T^+ \le t^+)</math>.<ref>Pratt and Gibbons, p. 187</ref>

For very large <math>n</math>, even the above recursion is too slow. In this case, the null distribution can be approximated. The null distributions of <math>T</math>, <math>T^+</math>, and <math>T^-</math> are asymptotically normal with means and variances:<ref>Pratt and Gibbons, p. 149</ref>
<math display="block">\begin{align}
\mathbf{E}[T^+] &= \mathbf{E}[T^-] = \frac{n(n + 1)}{4}, \\
\mathbf{E}[T] &= 0, \\
\operatorname{Var}(T^+) &= \operatorname{Var}(T^-) = \frac{n(n + 1)(2n + 1)}{24}, \\
\operatorname{Var}(T) &= \frac{n(n + 1)(2n + 1)}{6}.
\end{align}</math>

Better approximations can be produced using Edgeworth expansions. Using a fourth-order Edgeworth expansion shows that:<ref name="Kolassa">{{cite journal|last=Kolassa|first=John E.|title=Edgeworth approximations for rank sum test statistics|journal=Statistics and Probability Letters|volume=24|year=1995|pages=169–171}}</ref><ref>Hettmansperger, p. 37</ref>
<math display="block">\Pr(T^+ \le k) \approx \Phi(t) + \phi(t)\Big(\frac{3n^2 + 3n - 1}{10n(n + 1)(2n + 1)}\Big)(t^3 - 3t),</math>
where
<math display="block">t = \frac{k + \tfrac12 - \frac{n(n + 1)}{4}}{\sqrt{\frac{n(n + 1)(2n + 1)}{6}}}.</math>
The technical underpinnings of these expansions are rather involved, because conventional Edgeworth expansions apply to sums of IID continuous random variables, while <math>T^+</math> is a sum of non-identically distributed discrete random variables. The final result, however, is that the above expansion has an error of <math>O(n^{-3/2})</math>, just like a conventional fourth-order Edgeworth expansion.<ref name="Kolassa"/>
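
A sketch of this approximation, using only the Python standard library and the standardization above (names illustrative):

<syntaxhighlight lang="python">
from math import erf, exp, pi, sqrt

def edgeworth_cdf(n, k):
    """Approximate Pr(T+ <= k) by the fourth-order Edgeworth expansion above."""
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    t = (k + 0.5 - mean) / sqrt(var)         # continuity-corrected standardization
    Phi = 0.5 * (1 + erf(t / sqrt(2)))       # standard normal CDF
    phi = exp(-t * t / 2) / sqrt(2 * pi)     # standard normal density
    correction = (3 * n**2 + 3 * n - 1) / (10 * n * (n + 1) * (2 * n + 1))
    return Phi + phi * correction * (t**3 - 3 * t)

# For moderate n this is close to the exact value; compare, e.g., with
# sum(subset_sum_counts(10)[:11]) / 2**10 from the sketch earlier in this section.
</syntaxhighlight>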

The moment generating function of <math>T^+</math> has the exact formula:<ref>Hettmansperger, p. 35</ref>
<math display="block">M(t) = \frac{1}{2^n}\prod_{j=1}^n (1 + e^{jt}).</math>

When zeros are present and the signed-rank zero procedure is used, or when ties are present and the average rank procedure is used, the null distribution of <math>T</math> changes. Cureton derived a normal approximation for this situation.<ref name="Cureton">{{cite journal|last=Cureton|first=Edward E.|title=The normal approximation to the signed-rank sampling distribution when zero differences are present|journal=Journal of the American Statistical Association|year=1967|volume=62|number=319|pages=1068–1069}}</ref><ref>Pratt and Gibbons, p. 193</ref> Suppose that the original number of observations was <math>n</math> and the number of zeros was <math>z</math>. The tie correction is
<math display="block">c = \sum t^3 - t,</math>
where the sum is over all the size <math>t</math> of each group of tied observations. The expectation of <math>T</math> is still zero, while the expectation of <math>T^+</math> is
<math display="block">\mathbf{E}[T^+] = \frac{n(n + 1)}{4} - \frac{z(z + 1)}{4}.</math>
If
<math display="block">\sigma^2 = \frac{n(n + 1)(2n + 1) - z(z + 1)(2z + 1) - c/2}{6},</math>
then
<math display="block">\begin{align}
\operatorname{Var}(T) &= \sigma^2, \\
\operatorname{Var}(T^+) &= \sigma^2 / 4.
\end{align}</math>
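
The corrected moments can be computed directly from the number of zeros and the tie-group sizes. A sketch, assuming the tie groups are counted among the non-zero observations (names illustrative):

<syntaxhighlight lang="python">
def cureton_moments(n, zeros, tie_group_sizes):
    """Mean of T+ and variances of T and T+ under the signed-rank zero and
    average rank procedures, following Cureton's normal approximation."""
    z = zeros
    c = sum(t**3 - t for t in tie_group_sizes)                       # tie correction
    mean_t_plus = n * (n + 1) / 4 - z * (z + 1) / 4
    var_t = (n * (n + 1) * (2 * n + 1) - z * (z + 1) * (2 * z + 1) - c / 2) / 6
    return mean_t_plus, var_t, var_t / 4                             # E[T+], Var(T), Var(T+)
</syntaxhighlight>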


==Alternative statistics==
Wilcoxon<ref>Wilcoxon, p. 82</ref> originally defined the signed-rank test statistic to be <math>\min(T^+, T^-)</math>. Early authors such as Siegel<ref>Siegel, p. 76</ref> followed Wilcoxon. This is appropriate for two-sided hypothesis tests, but it cannot be used for one-sided tests.


Instead of assigning ranks between 1 and ''n'', it is also possible to assign ranks between 0 and <math>n - 1</math>. These are called ''modified ranks''.<ref>Pratt and Gibbons, p. 158</ref> The modified signed-rank sum <math>T_0</math>, the modified positive-rank sum <math>T_0^+</math>, and the modified negative-rank sum <math>T_0^-</math> are defined analogously to <math>T</math>, <math>T^+</math>, and <math>T^-</math> but with the modified ranks in place of the ordinary ranks. The probability that the sum of two independent <math>F</math>-distributed random variables is positive can be estimated as <math>2T_0^+/(n(n - 1))</math>.<ref>Pratt and Gibbons, p. 159</ref> When consideration is restricted to continuous distributions, this is a minimum variance unbiased estimator of <math>p_2</math>.<ref>Pratt and Gibbons, p. 191</ref>


==Example==
{| border="0" | style="text-align: right;"
{| border="0" | style="text-align: right;"
|
|
Line 204: Line 371:


:<math>W = 1.5+1.5-3-4-5-6+7+8+9=9 </math>
:<math>|W| < W_{\operatorname{crit}(\alpha = 0.05,\ 9 \text{, two-sided})} = 15 </math><ref>{{Cite web|last=Lowry|first=Richard|title=Concepts & Applications of Inferential Statistics|url=http://vassarstats.net/textbook/ch12a.html|access-date=2020-12-17|language=en-US}}</ref>
:<math> \therefore \text{failed to reject } H_0</math> that the median of pairwise differences is zero.
:The <math>p</math>-value for this result is <math>0.6113</math>

===Historical ''T'' statistic===
In historical sources a different statistic, denoted by Siegel as the ''T'' statistic, was used. The ''T'' statistic is the smaller of the two sums of ranks of given sign; in the example, therefore, ''T'' would equal 3+4+5+6=18. Low values of ''T'' are required for significance. ''T'' is easier to calculate by hand than ''W'' and the test is equivalent to the two-sided test described above; however, the distribution of the statistic under <math> H_0 </math> has to be adjusted.
:<math>T > T_{\operatorname{crit}(\alpha = 0.05,\ 9 \text{, two-sided})} = 5</math>
:<math> \therefore \text{failed to reject } H_0</math> that the two medians are the same.
Note: Critical ''T'' values (<math>T_\operatorname{crit}</math>), tabulated by sample size, can be found in appendices of statistics textbooks, for example in Table B-3 of ''Nonparametric Statistics: A Step-by-Step Approach'', 2nd Edition, by Dale I. Foreman and Gregory W. Corder (https://www.oreilly.com/library/view/nonparametric-statistics-a/9781118840429/bapp02.xhtml).

Alternatively, if ''n'' is sufficiently large, the distribution of ''T'' under <math>H_0</math> can be approximated by a normal distribution with mean <math>\frac{n(n+1)}{4}</math> and variance <math>\frac{n(n+1)(2n+1)}{24}</math>.



==Effect size==
To compute an [[effect size]] for the signed-rank test, one can use the [[Mann–Whitney_U_test#Rank-biserial_correlation|rank-biserial correlation]].


If the test statistic ''W'' is reported, the rank correlation r is equal to the test statistic ''W'' divided by the total rank sum ''S'', or&nbsp;''r''&nbsp;=&nbsp;''W''/''S''.
<ref name="Kerby2014">{{Citation
<ref name="Kerby2014">{{Citation
| last = Kerby
| last = Kerby
Line 234: Line 389:
| pages = 11.IT.3.1
| pages = 11.IT.3.1
| date = 2014
| date = 2014
| doi = 10.2466/11.IT.3.1}}</ref> Using the above example, the test statistic is ''T'' = 9. The sample size of 9 has a total rank sum of ''S'' = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9) = 45. Hence, the rank correlation is 9/45, so ''r'' = 0.20.
| doi = 10.2466/11.IT.3.1}}</ref>
Using the above example, the test statistic is ''W'' = 9. The sample size of 9 has a total rank sum of ''S'' = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9) = 45. Hence, the rank correlation is 9/45, so ''r'' = 0.20.


If the test statistic ''T'' is reported, an equivalent way to compute the rank correlation is with the difference in proportion between the two rank sums, which is the Kerby (2014) simple difference formula.<ref name="Kerby2014"/> To continue with the current example, the sample size is 9, so the total rank sum is 45. ''T'' is the smaller of the two rank sums, so ''T'' is 3 + 4 + 5 + 6 = 18. From this information alone, the remaining rank sum can be computed, because it is the total sum ''S'' minus ''T'', or in this case 45 − 18 = 27. Next, the two rank-sum proportions are 27/45 = 60% and 18/45 = 40%. Finally, the rank correlation is the difference between the two proportions (.60 minus .40), hence ''r'' = .20.
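
Both computations of the rank-biserial correlation can be sketched in a few lines, using the values from the example above:

<syntaxhighlight lang="python">
n = 9
S = n * (n + 1) / 2                 # total rank sum: 45
W = 9                               # signed-rank sum from the example
r_from_W = W / S                    # 0.2

# Kerby simple difference formula: difference of the two rank-sum proportions.
positive_rank_sum, negative_rank_sum = 27, 18
r_simple_difference = positive_rank_sum / S - negative_rank_sum / S  # 0.6 - 0.4 = 0.2
</syntaxhighlight>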


==Software implementations==


==See also==
*[[Mann–Whitney U test|Mann–Whitney–Wilcoxon test]] (the variant for two independent samples)
*[[Sign test]] (like the Wilcoxon test, but without the assumption of a symmetric distribution of the differences around the median, and without using the magnitude of the differences)


==References==

Revision as of 00:35, 5 September 2021

The Wilcoxon signed-rank test is a non-parametric statistical hypothesis test used to test either the location of a set of samples, or to compare the locations of two populations using a set of matched samples.[1] When applied to test the location of a set of samples, it serves the same purpose as the one-sample Student's t-test.[2] On a set of matched samples, it is a paired difference test like the paired Student's t-test (also known as the "t-test for matched pairs" or "t-test for dependent samples"). Unlike the Student's t-test, the Wilcoxon signed-rank test does not assume that the data is normally distributed, so it gives correct results on a wider variety of data sets. The cost of this applicability is that, on normally distributed data, the Wilcoxon signed-rank test has less statistical power than the Student's t-test, meaning that it is less likely to detect a truly significant result.

History

The test is named for Frank Wilcoxon (1892–1965) who, in a single paper, proposed both it and the rank-sum test for two independent samples.[3] The test was popularized by Sidney Siegel (1956) in his influential textbook on non-parametric statistics.[4] Siegel used the symbol T for the test statistic, and consequently, the test is sometimes referred to as the Wilcoxon T-test.

Test procedure

There are two variants of the signed-rank test. From a theoretical point of view, the one-sample test is more fundamental because the paired sample test is performed by converting the data to situation of the one-sample test. However, most practical applications of the signed-rank test arise from paired data.

For a paired sample test, the data consists of samples . Each sample is a pair of measurements on an interval scale. The measurements are converted to real numbers, and the paired sample test is converted to a one-sample test by replacing each pair of numbers by its difference .[5]

The data for a one-sample test is a set of real number samples . Assume for simplicity that the samples are all different and that no sample equals zero. (Zeros and ties introduce several complications; see below.) The test is performed as follows:[6][7]

  1. Compute .
  2. Sort these quantities. Define so that .
  3. Let denote the sign function: if and if . The test statistic is the signed-rank sum :
  4. Produce a -value by comparing to its distribution under the null hypothesis.

The signed-rank sum is closely related to two other test statistics. The positive-rank sum and the negative-rank sum are defined by[8] Because equals the sum of all the ranks, which is , these three statistics are related by:[9] Because , , and carry the same information, any of them may be used as the test statistic.

The positive-rank sum and negative-rank sum have the following alternative interpretations. Define the Walsh average to be . Then:[10]

Null and alternative hypotheses

One-sample test

The one-sample Wilcoxon signed-rank test can be used to test whether data comes from a symmetric population with a specified median.[11] If the population median is known, then it can be used to test whether data is symmetric about its center.[12]

To explain the null and alternative hypotheses formally, assume that the data consists of independent and identically distributed samples from a distribution . If and are IID -distributed random variables, define to be the cumulative distribution function of . Set Assume that is continuous. The one-sample Wilcoxon signed-rank sum test is a test for the following null hypothesis against one of the following alternative hypotheses:[13]

Null hypothesis H0
One-sided alternative hypothesis H1
.
One-sided alternative hypothesis H2
.
Two-sided alternative hypothesis H3
.

The alternative hypothesis being tested depends on whether the p-value is one-sided (and if so, which side) or two-sided. The test can also be used as a test for the value of by subtracting from every data point.

The above null and alternative hypotheses are derived from the fact that is a consistent estimator of .[14] It can also be derived from the description of and in terms of Walsh averages, since that description shows that the Wilcoxon test is the same as the sign test applied to the set of Walsh averages.[15]

Restricting the distributions of interest can lead to more interpretable null and alternative hypotheses. One mildly restrictive assumption is that has a unique median. This median is called the pseudomedian of ; in general it is different from the mean and the median, even when all three exist. If the existence of a unique pseudomedian can be assumed true under both the null and alternative hypotheses, then these hypotheses can be restated as:

Null hypothesis H0
The pseudomedian of is located at zero.
One-sided alternative hypothesis H1
The pseudomedian of is located at .
One-sided alternative hypothesis H2
The pseudomedian of is located at .
Two-sided alternative hypothesis H3
The pseudomedian of is located at .

Most often, the null and alternative hypotheses are stated under the assumption of symmetry. Fix a real number . Define to be symmetric about if a random variable with distribution satisfies for all . If has a density function , then is symmetric about if and only if for every .[16]

If the null and alternative distributions of can be assumed symmetric, then the null and alternative hypotheses simplify to the following:[17]

Null hypothesis H0
is symmetric about .
One-sided alternative hypothesis H1
is symmetric about .
One-sided alternative hypothesis H2
is symmetric about .
Two-sided alternative hypothesis H3
is symmetric about .

If in addition , then is a median of . If this median is unique, then the Wilcoxon signed-rank sum test becomes a test for the location of the median.[18] When the mean of is defined, then the mean is , and the test is also a test for the location of the mean.[19]

The restriction that the alternative distribution is symmetric is highly restrictive, but for one-sided tests it can be weakened. Say that is stochastically smaller than a distribution symmetric about zero if an -distributed random variable satsifies for all . Similarly, is stochastically larger than a distribution symmetric about zero if for all . Then the Wilcoxon signed-rank sum test can also be used for the following null and alternative hypotheses:[20][21]

Null hypothesis H0
is symmetric about .
One-sided alternative hypothesis H1
is stochastically smaller than a distribution symmetric about zero.
One-sided alternative hypothesis H2
is stochastically larger than a distribution symmetric about zero.

The hypothesis that the data are IID can be weakened. Each data point may be taken from a different distribution, as long as all the distributions are assumed to be continuous and symmetric about a common point . The data points are not required to be independent as long as the the conditional distribution of each observation given the others is symmetric about .[22]

Paired data test

Because the paired data test arises from taking paired differences, its null and alternative hypotheses can be derived from those of the one-sample test. In each case, they become assertions about the behavior of the differences .

Let be the joint cumulative distribution of the pairs . If is continuous, then the most general null and alternative hypotheses are expressed in terms of and are identical to the one-sample case:

Null hypothesis H0
One-sided alternative hypothesis H1
.
One-sided alternative hypothesis H2
.
Two-sided alternative hypothesis H3
.

Like the one-sample case, under some restrictions the test can be interpreted as a test for whether the pseudomedian of the differences is located at zero.

A common restriction is to symmetric distributions of differences. In this case, the null and alternative hypotheses are:[23][24]

Null hypothesis H0
The observations are symmetric about .
One-sided alternative hypothesis H1
The observations are symmetric about .
One-sided alternative hypothesis H2
The observations are symmetric about .
Two-sided alternative hypothesis H3
The observations are symmetric about .

These can also be expressed more directly in terms of the original pairs:[25]

Null hypothesis H0
The observations are exchangeable, meaning that and have the same distribution. Equivalently, .
One-sided alternative hypothesis H1
For some , the pairs and have the same distribution.
One-sided alternative hypothesis H2
For some , the pairs and have the same distribution.
Two-sided alternative hypothesis H3
For some , the pairs and have the same distribution.

The null hypothesis of exchangeability can arise from a matched pair experiment with a treatment group and a control group. Randomizing the treatment and control within each pair makes the observations exchangeable. For an exchangeable distribution, has the same distribution as , and therefore, under the null hypothesis, the distribution is symmetric about zero.[26]

Symmetry of the differences is a very restrictive condition on the alternative hypothesis. However, because the one-sample test can be used as a one-sided test for stochastic dominance, the paired difference Wilcoxon test can be used to compare the following hypotheses:[27]

Null hypothesis H0
The observations are exchangeable.
One-sided alternative hypothesis H1
The differences are stochastically smaller than a distribution symmetric about zero, that is, for every , .
One-sided alternative hypothesis H2
The differences are stochastically larger than a distribution symmetric about zero, that is, for every , .

Zeros and ties

In real data, it sometimes happens that there is a sample which equals zero or a pair with . It can also happen that there are tied samples. This means that for some , we have (in the one-sample case) or (in the paired sample case). This is particularly common for discrete data. When this happens, the test procedure defined above is usually undefined because there is no way to uniquely rank the data. (The sole exception is if there is a single sample which is zero and no other zeros or ties.) Because of this, the test statistic needs to be modified.

Zeros

Wilcoxon's original paper did not address the question of observations (or, in the paired sample case, differences) that equal zero. However, in later surveys, he recommended removing zeros from the sample.[28] Then the standard signed-rank test could be applied to the resulting data, as long as there were no ties. This is now called the reduced sample procedure.

However, Pratt[29] observed that the reduced sample procedure can lead to paradoxical behavior. He gives the following example. Suppose that we are in the one-sample situation and have the following thirteen observations:

0, 2, 3, 4, 6, 7, 8, 9, 11, 14, 15, 17, 18.

The reduced sample procedure removes the zero. To the remaining data, it assigns the signed ranks:

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, −12.

This has a one-sided p-value of , and therefore the sample is not significantly positive at any significance level . Pratt argues that one would expect that decreasing the observations should certainly not make the data appear more positive. However, if the zero observation is decreased by an amount less than 2, or if all observations are decreased by an amount less than 1, then the signed ranks become:

−1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, −13.

This has a one-sided p-value of . Therefore the sample would be judged significantly positive at any significance level . The paradox is that, if is between and , then decreasing an insignificant sample causes it to appear significantly positive.

Pratt therefore proposed the signed-rank zero procedure. This procedure includes the zeros when ranking the samples. However, it excludes them from the test statistic, or equivalently it defines sgn(0) = 0. Pratt proved that the signed-rank zero procedure has several desirable behaviors not shared by the reduced sample procedure:[30]

  1. Increasing the observed values does not make a significantly positive sample insignificant, and it does not make an insignificant sample significantly negative.
  2. If the distribution of the observations is symmetric, then the values of μ which the test does not reject form an interval.
  3. A sample is significantly positive, not significant, or significantly negative, if and only if it is so when the zeros are assigned arbitrary non-zero signs, if and only if it is so when the zeros are replaced with non-zero values which are smaller in absolute value than any non-zero observation.
  4. For a fixed significance threshold α, and for a test which is randomized to have level exactly α, the probability of calling a set of observations significantly positive (respectively, significantly negative) is a non-decreasing (respectively, non-increasing) function of the observations.

Pratt remarks that, when the signed-rank zero procedure is combined with the average rank procedure for resolving ties, the resulting test is a consistent test against the alternative hypothesis that, for all i and j, Pr(X_i + X_j > 0) and 1/2 differ by at least a fixed constant that is independent of i and j.[31]

The signed-rank zero procedure has the disadvantage that, when zeros occur, the null distribution of the test statistic changes, so tables of p-values can no longer be used.

When the data is on a Likert scale with equally spaced categories, the signed-rank zero procedure is more likely to maintain the Type I error rate than the reduced sample procedure.[32]
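In practice, both conventions for zeros are available in SciPy's scipy.stats.wilcoxon through its zero_method argument: "wilcox" implements the reduced sample procedure and "pratt" the signed-rank zero procedure. A minimal illustration follows; the exact numbers reported depend on which exact or approximate method SciPy selects, so they may differ slightly from the hand calculation above.

    from scipy import stats

    # One-sample data containing a zero (Pratt's example).
    x = [0, 2, 3, 4, 6, 7, 8, 9, 11, 14, 15, 17, -18]

    # Reduced sample procedure: the zero is discarded before ranking.
    print(stats.wilcoxon(x, zero_method="wilcox", alternative="greater"))

    # Signed-rank zero procedure: the zero is ranked but excluded from the statistic.
    print(stats.wilcoxon(x, zero_method="pratt", alternative="greater"))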

Ties

When the data does not have ties, the ranks are used to calculate the test statistic. In the presence of ties, the ranks are not defined. There are two main approaches to resolving this.

The most common procedure for handling ties, and the one originally recommended by Wilcoxon, is called the average rank or midrank procedure. This procedure assigns numbers between 1 and n to the observations, with two observations getting the same number if and only if they have the same absolute value. These numbers are conventionally called ranks even though the set of these numbers is not equal to {1, 2, ..., n} (except when there are no ties). The rank assigned to an observation is the average of the possible ranks it would have if the ties were broken in all possible ways. Once the ranks are assigned, the test statistic is computed in the same way as usual.

For example, suppose that the observations satisfy |X_1| < |X_2| = |X_3| < |X_4| < |X_5| = |X_6| = |X_7|. In this case, X_1 is assigned rank 1, X_2 and X_3 are assigned rank (2 + 3)/2 = 2.5, X_4 is assigned rank 4, and X_5, X_6, and X_7 are assigned rank (5 + 6 + 7)/3 = 6. Formally, suppose that there is a set of observations all having the same absolute value v, that k observations have absolute value less than v, and that k + t observations have absolute value less than or equal to v. If the ties among the t observations with absolute value v were broken, then these observations would occupy ranks k + 1 through k + t. The average rank procedure therefore assigns each of them the rank (k + 1 + k + t)/2 = k + (t + 1)/2.
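As a concrete sketch of this procedure, scipy.stats.rankdata assigns midranks to tied absolute values by default, from which the signed-rank sum can be formed. The function name below is illustrative, and the data are the observations from Pratt's tie example later in this section.

    import numpy as np
    from scipy.stats import rankdata

    def signed_rank_sum(x):
        """Signed-rank sum W using the average rank (midrank) procedure."""
        x = np.asarray(x, dtype=float)
        ranks = rankdata(np.abs(x))      # ties receive the average of their ranks
        return float(np.sum(np.sign(x) * ranks))

    print(signed_rank_sum([1, 1, 1, 1, 2, 3, -4]))   # 2.5*4 + 5 + 6 - 7 = 14.0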

Under the average rank procedure, the null distribution is different in the presence of ties. The average rank procedure also has some disadvantages that are similar to those of the reduced sample procedure for zeros. It is possible for a sample to be judged significantly positive by the average rank procedure even though increasing some of the values so as to break the ties, or breaking the ties in any way whatsoever, results in a sample that the test judges to be not significant. However, increasing all the observed values by the same amount cannot turn a significantly positive result into an insignificant one, nor an insignificant one into a significantly negative one. Furthermore, if the observations are distributed symmetrically, then the values of μ which the test does not reject form an interval.

The other option for handling ties is a tiebreaking procedure. In a tiebreaking procedure, the observations are assigned distinct ranks in the set {1, 2, ..., n}. The rank assigned to an observation depends on its absolute value and the tiebreaking rule. Observations with smaller absolute values are always given smaller ranks, just as in the standard signed-rank test. The tiebreaking rule is used to assign ranks to observations with the same absolute value.

Random tiebreaking breaks the ties at random. Under random tiebreaking, the null distribution is the same as when there are no ties, but the result of the test depends not only on the data but on additional random choices. Averaging over these random choices results in the average rank procedure. Conservative tiebreaking breaks the ties in favor of the null hypothesis. When performing a one-sided test in which negative values of W tend to be more significant, ties are broken by assigning lower ranks to negative observations and higher ranks to positive ones. When the test makes positive values of W significant, ties are broken the other way, and when large absolute values of W are significant, ties are broken so as to make |W| as small as possible. Pratt observes that when ties are likely, the conservative tiebreaking procedure "presumably has low power, since it amounts to breaking all ties in favor of the null hypothesis."[33]

The average rank procedure can disagree with tiebreaking procedures. Pratt gives the following example. Suppose that the observations are:

1, 1, 1, 1, 2, 3, −4.

The average rank procedure assigns these the signed ranks

2.5, 2.5, 2.5, 2.5, 5, 6, −7.

This sample is significantly positive at the one-sided level α = 14/2^7 ≈ 0.109. On the other hand, any tiebreaking rule will assign the ranks

1, 2, 3, 4, 5, 6, −7.

At the same one-sided level, this is not significant.

Computing the null distribution

Computing p-values requires knowing the distribution of W under the null hypothesis. There is no closed formula for this distribution.[34] However, for small values of n, the distribution may be computed exactly. Under the null hypothesis that the data is symmetric about zero, each X_i is exactly as likely to be positive as it is negative. Therefore the probability that W = w under the null hypothesis is equal to the number of sign combinations that yield W = w divided by the total number of possible sign combinations, 2^n. This can be used to compute the exact distribution of W under the null hypothesis.[35]

Computing the distribution of W by considering all possibilities requires computing 2^n sums, which is intractable for all but the smallest n. However, there is an efficient recursion for the distribution of the positive-rank sum T+ (from which the distribution of W follows, since W = 2T+ − n(n + 1)/2).[36][37] Define u_n(t) to be the number of sign combinations for which T+ = t. This is equal to the number of subsets of {1, ..., n} which sum to t. The base cases of the recursion are u_0(0) = 1, u_0(t) = 0 for all t ≠ 0, and u_n(t) = 0 for all t < 0 or t > n(n + 1)/2. The recursive formula is

u_n(t) = u_{n−1}(t) + u_{n−1}(t − n).

The formula is true because every subset of {1, ..., n} which sums to t either does not contain n, in which case it is also a subset of {1, ..., n − 1}, or it does contain n, in which case removing n from the subset produces a subset of {1, ..., n − 1} which sums to t − n. Under the null hypothesis, the probability mass function of T+ satisfies Pr(T+ = t) = u_n(t)/2^n. The function u_n is closely related to the integer partition function.[38]
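The recursion translates directly into code. In the sketch below, u(n, t) stands for the number of subsets of {1, ..., n} summing to t; the function names are chosen for this example only.

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def u(n, t):
        """Number of subsets of {1, ..., n} whose elements sum to t."""
        if t < 0 or t > n * (n + 1) // 2:
            return 0
        if n == 0:
            return 1 if t == 0 else 0
        # A subset summing to t either omits n or contains it.
        return u(n - 1, t) + u(n - 1, t - n)

    def null_pmf(n):
        """Exact null probability mass function of the positive-rank sum T+."""
        return {t: u(n, t) / 2 ** n for t in range(n * (n + 1) // 2 + 1)}

    pmf = null_pmf(12)
    print(sum(pmf.values()))   # 1.0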

If q_n(t) is the probability that T+ = t under the null hypothesis when there are n samples, then q_n satisfies a similar recursion:[39]

2 q_n(t) = q_{n−1}(t) + q_{n−1}(t − n),

with similar boundary conditions. There is also a recursive formula for the cumulative distribution function Pr(T+ ≤ t).[40]

For very large n, even the above recursion is too slow. In this case, the null distribution can be approximated. The null distributions of W, T+, and T− are asymptotically normal with means and variances:[41]

E[W] = 0, Var(W) = n(n + 1)(2n + 1)/6;
E[T+] = E[T−] = n(n + 1)/4, Var(T+) = Var(T−) = n(n + 1)(2n + 1)/24.
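A minimal sketch of the resulting normal approximation, written for the W form of the statistic (the function name is illustrative):

    import math

    def normal_approx_one_sided_p(w, n):
        """Approximate Pr(W >= w) using the asymptotic normal null distribution
        of W, which has mean 0 and variance n(n+1)(2n+1)/6."""
        sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 6)
        z = w / sigma
        return 0.5 * math.erfc(z / math.sqrt(2))   # upper-tail probability

    # Compare with the exact one-sided value 70/2**12 ~ 0.0171 computed earlier.
    print(normal_approx_one_sided_p(54, 12))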

Better approximations can be produced using Edgeworth expansions. Using a fourth-order Edgeworth expansion gives a refined approximation to the null distribution of W.[42][43] The technical underpinnings of these expansions are rather involved, because conventional Edgeworth expansions apply to sums of IID continuous random variables, while W is a sum of non-identically distributed discrete random variables. The final result, however, is that the expansion has the same order of error as a conventional fourth-order Edgeworth expansion.[42]

The moment generating function of W has the exact formula:[44]

E[exp(tW)] = cosh(t) cosh(2t) ⋯ cosh(nt).

When zeros are present and the signed-rank zero procedure is used, or when ties are present and the average rank procedure is used, the null distribution of W changes. Cureton derived a normal approximation for this situation.[45][46] Suppose that the original number of observations was n and the number of zeros was z. The tie correction is

c = Σ (t³ − t),

where the sum is over the sizes t of all groups of tied observations. The expectation of W is still zero, while the expectation of T+ is

E[T+] = (n(n + 1) − z(z + 1))/4.

If

σ² = (n(n + 1)(2n + 1) − z(z + 1)(2z + 1))/6 − c/12,

then W/σ is approximately distributed as a standard normal.
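The corrected approximation can be sketched as follows; the function name and the choice to report a two-sided p-value are illustrative assumptions of this example.

    import math
    from collections import Counter

    def corrected_normal_p(x):
        """Two-sided normal-approximation p-value for the signed-rank sum W,
        with zeros handled by the signed-rank zero procedure and ties by midranks."""
        x = list(x)
        n = len(x)
        z = sum(1 for v in x if v == 0)
        # Midranks of the absolute values, zeros included in the ranking.
        order = sorted(range(n), key=lambda i: abs(x[i]))
        ranks = [0.0] * n
        i = 0
        while i < n:
            j = i
            while j < n and abs(x[order[j]]) == abs(x[order[i]]):
                j += 1
            for k in range(i, j):
                ranks[order[k]] = (i + 1 + j) / 2   # average of ranks i+1 .. j
            i = j
        w = sum(math.copysign(r, v) if v != 0 else 0.0 for v, r in zip(x, ranks))
        # Tie correction over groups of tied nonzero absolute values.
        c = sum(t ** 3 - t for t in Counter(abs(v) for v in x if v != 0).values())
        var = (n * (n + 1) * (2 * n + 1) - z * (z + 1) * (2 * z + 1)) / 6 - c / 12
        zscore = abs(w) / math.sqrt(var)
        return math.erfc(zscore / math.sqrt(2))    # two-sided tail probability

    print(corrected_normal_p([0, 2, 3, 4, 6, 7, 8, 9, 11, 14, 15, 17, -18]))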

Alternative statistics

Wilcoxon[47] originally defined the Wilcoxon signed-rank statistic to be min(T+, T−), the smaller of the two rank sums. Early authors such as Siegel[48] followed Wilcoxon. This is appropriate for two-sided hypothesis tests, but it cannot be used for one-sided tests.

Instead of assigning ranks between 1 and n, it is also possible to assign ranks between 0 and n − 1. These are called modified ranks.[49] The modified signed-rank sum, the modified positive-rank sum, and the modified negative-rank sum are defined analogously to W, T+, and T− but with the modified ranks in place of the ordinary ranks. The probability that the sum of two independent random variables with the same distribution as the X_i is positive can be estimated as the modified positive-rank sum divided by n(n − 1)/2.[50] When consideration is restricted to continuous distributions, this is a minimum variance unbiased estimator of this probability.[51]
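Because (assuming no ties) the modified positive-rank sum equals the number of pairs i < j whose sum X_i + X_j is positive, the estimator can be sketched directly; the names below are illustrative.

    from itertools import combinations

    def estimate_prob_positive_sum(x):
        """Proportion of pairwise sums X_i + X_j (i < j) that are positive,
        an unbiased estimate of Pr(X1 + X2 > 0) for independent draws."""
        pairs = list(combinations(x, 2))
        return sum(1 for a, b in pairs if a + b > 0) / len(pairs)

    print(estimate_prob_positive_sum([1.2, -0.4, 0.8, 2.5, -1.1]))   # 8/10 = 0.8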

Example

i     x2,i   x1,i   sgn   |x2,i − x1,i|
1     125    110     1         15
2     115    122    −1          7
3     130    125     1          5
4     140    120     1         20
5     140    140                0
6     115    124    −1          9
7     140    123     1         17
8     125    137    −1         12
9     140    135     1          5
10    135    145    −1         10

ordered by absolute difference

i     x2,i   x1,i   sgn   |x2,i − x1,i|   Ri    sgn × Ri
5     140    140                0
3     130    125     1          5         1.5       1.5
9     140    135     1          5         1.5       1.5
2     115    122    −1          7         3        −3
6     115    124    −1          9         4        −4
10    135    145    −1         10         5        −5
8     125    137    −1         12         6        −6
1     125    110     1         15         7         7
7     140    123     1         17         8         8
4     140    120     1         20         9         9

sgn is the sign function, |x2,i − x1,i| is the absolute value of the difference, and Ri is the rank. Notice that pairs 3 and 9 are tied in absolute value. They would be ranked 1 and 2, so each gets the average of those ranks, 1.5.

The signed ranks sum to W = 1.5 + 1.5 − 3 − 4 − 5 − 6 + 7 + 8 + 9 = 9. Since this value is small relative to its null distribution, there is insufficient evidence to conclude that the median of pairwise differences is different from zero. The two-sided p-value for this result is approximately 0.6.

Effect size

To compute an effect size for the signed-rank test, one can use the rank-biserial correlation.

If the test statistic W is reported, the rank correlation r is equal to the test statistic W divided by the total rank sum S, or r = W/S.[52] Using the above example, the test statistic is W = 9. The sample size of 9 has a total rank sum of S = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9) = 45. Hence, the rank correlation is 9/45, so r = 0.20.

If the test statistic T is reported, an equivalent way to compute the rank correlation is with the difference in proportion between the two rank sums, which is the Kerby (2014) simple difference formula.[52] To continue with the current example, the sample size is 9, so the total rank sum is 45. T is the smaller of the two rank sums, so T is 3 + 4 + 5 + 6 = 18. From this information alone, the remaining rank sum can be computed, because it is the total sum S minus T, or in this case 45 − 18 = 27. Next, the two rank-sum proportions are 27/45 = 60% and 18/45 = 40%. Finally, the rank correlation is the difference between the two proportions (.60 minus .40), hence r = .20.
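The same computation can be sketched from the signed ranks of the example; the function name is illustrative.

    def rank_biserial(signed_ranks):
        """Rank-biserial correlation: (positive rank sum - negative rank sum)
        divided by the total rank sum."""
        pos = sum(r for r in signed_ranks if r > 0)
        neg = sum(-r for r in signed_ranks if r < 0)
        return (pos - neg) / (pos + neg)

    # Signed ranks from the example above (the zero-difference pair is excluded).
    print(rank_biserial([1.5, 1.5, -3, -4, -5, -6, 7, 8, 9]))   # (27 - 18) / 45 = 0.2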

Software implementations

  • R includes an implementation of the test as wilcox.test(x,y, paired=TRUE), where x and y are vectors of equal length.[53]
  • ALGLIB includes an implementation of the Wilcoxon signed-rank test in C++, C#, Delphi, Visual Basic, etc.
  • GNU Octave implements various one-tailed and two-tailed versions of the test in the wilcoxon_test function.
  • SciPy includes an implementation of the Wilcoxon signed-rank test in Python (scipy.stats.wilcoxon)
  • Accord.NET includes an implementation of the Wilcoxon signed-rank test in C# for .NET applications
  • MATLAB implements this test as [p,h] = signrank(x,y), which returns the p-value and a logical value h indicating the test decision: h = 1 indicates a rejection of the null hypothesis, and h = 0 indicates a failure to reject the null hypothesis at the 5% significance level
  • Julia's HypothesisTests package includes the Wilcoxon signed-rank test as SignedRankTest(x, y); the corresponding p-value is obtained with pvalue(SignedRankTest(x, y))

See also

References

  1. ^ Conover, W. J. (1999). Practical nonparametric statistics (3rd ed.). John Wiley & Sons, Inc. ISBN 0-471-16068-7., p. 350
  2. ^ "Wilcoxon signed-rank test - Handbook of Biological Statistics". www.biostathandbook.com. Retrieved 2021-09-02.
  3. ^ Wilcoxon, Frank (Dec 1945). "Individual comparisons by ranking methods" (PDF). Biometrics Bulletin. 1 (6): 80–83. doi:10.2307/3001968. hdl:10338.dmlcz/135688. JSTOR 3001968.
  4. ^ Siegel, Sidney (1956). Non-parametric statistics for the behavioral sciences. New York: McGraw-Hill. pp. 75–83. ISBN 9780070573482.
  5. ^ Conover, p. 352
  6. ^ Conover, p. 353
  7. ^ Pratt, John W.; Gibbons, Jean D. (1981). Concepts of Nonparametric Theory. Springer-Verlag. ISBN 978-1-4612-5933-6., p. 148
  8. ^ Pratt and Gibbons, p. 148
  9. ^ Pratt and Gibbons, p. 148
  10. ^ Pratt and Gibbons, p. 150
  11. ^ Conover, pp. 352–357
  12. ^ Hettmansperger, Thomas P. (1984). Statistical Inference Based on Ranks. John Wiley & Sons. ISBN 0-471-88474-X., pp. 32, 50
  13. ^ Pratt and Gibbons, p. 153
  14. ^ Pratt and Gibbons, pp. 153–154
  15. ^ Hettmansperger, pp. 38–39
  16. ^ Pratt and Gibbons, pp. 146–147
  17. ^ Pratt and Gibbons, pp. 146–147
  18. ^ Hettmansperger, pp. 30–31
  19. ^ Conover, p. 353
  20. ^ Pratt and Gibbons, pp. 155–156
  21. ^ Hettmansperger, pp. 49–50
  22. ^ Pratt and Gibbons, p. 155
  23. ^ Conover, p. 354
  24. ^ Hollander, Myles; Wolfe, Douglas A.; Chicken, Eric (2014). Nonparametric Statistical Methods (Third ed.). John Wiley & Sons, Inc. ISBN 978-0-470-38737-5., pp. 39–41
  25. ^ Pratt and Gibbons, p. 147
  26. ^ Pratt and Gibbons, p. 147
  27. ^ Hettmansperger, pp. 49–50
  28. ^ Wilcoxon, Frank (1949). Some Rapid Approximate Statistical Procedures. American Cyanamid Co.
  29. ^ Pratt, J. (1959). "Remarks on zeros and ties in the Wilcoxon signed rank procedures". Journal of the American Statistical Association. 54 (287): 655–667. doi:10.1080/01621459.1959.10501526.
  30. ^ Pratt, p. 659
  31. ^ Pratt, p. 663
  32. ^ Derrick, B; White, P (2017). "Comparing Two Samples from an Individual Likert Question". International Journal of Mathematics and Statistics. 18 (3): 1–13.
  33. ^ Pratt, p. 661
  34. ^ Hettmansperger, p. 34
  35. ^ Pratt and Gibbons, pp. 148–149
  36. ^ Pratt and Gibbons, pp. 148–149, pp. 186–187
  37. ^ Hettmansperger, p. 171
  38. ^ Pratt and Gibbons, p. 187
  39. ^ Pratt and Gibbons, p. 187
  40. ^ Pratt and Gibbons, p. 187
  41. ^ Pratt and Gibbons, p. 149
  42. ^ a b Kolassa, John E. (1995). "Edgeworth approximations for rank sum test statistics". Statistics and Probability Letters. 24: 169–171.
  43. ^ Hettmansperger, p. 37
  44. ^ Hettmansperger, p. 35
  45. ^ Cureton, Edward E. (1967). "The normal approximation to the signed-rank sampling distribution when zero differences are present". Journal of the American Statistical Association. 62 (319): 1068–1069.
  46. ^ Pratt and Gibbons, p. 193
  47. ^ Wilcoxon, p. 82
  48. ^ Siegel, p. 76
  49. ^ Pratt and Gibbons, p. 158
  50. ^ Pratt and Gibbons, p. 159
  51. ^ Pratt and Gibbons, p. 191
  52. ^ a b Kerby, Dave S. (2014), "The simple difference formula: An approach to teaching nonparametric correlation.", Comprehensive Psychology, 3: 11.IT.3.1, doi:10.2466/11.IT.3.1
  53. ^ Dalgaard, Peter (2008). Introductory Statistics with R. Springer Science & Business Media. pp. 99–100. ISBN 978-0-387-79053-4.

External links