Wald–Wolfowitz runs test

The runs test (also called the Wald–Wolfowitz test, after Abraham Wald and Jacob Wolfowitz) is a non-parametric statistical test that checks a randomness hypothesis for a two-valued data sequence. More precisely, it can be used to test the hypothesis that the elements of the sequence are mutually independent.

A "run" of a sequence is a maximal non-empty segment of the sequence consisting of adjacent equal elements. For example, the 22-element-long sequence "++++−−−+++−−++++++−−−−" consists of 6 runs, 3 of which consist of "+" and the other 3 of "−". The runs test is based on the null hypothesis that each element in the sequence is independently drawn from the same distribution.
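The definition of a run can be illustrated with a short Python sketch. The function names `count_runs` and `run_lengths` are chosen here for illustration only, and an ASCII hyphen stands in for the minus sign of the example sequence.

```python
from itertools import groupby


def count_runs(sequence):
    """Count the maximal segments of adjacent equal elements."""
    # groupby starts a new group every time the element changes,
    # so the number of groups equals the number of runs.
    return sum(1 for _ in groupby(sequence))


def run_lengths(sequence):
    """List each run as a (symbol, length) pair, in order of appearance."""
    return [(symbol, len(list(group))) for symbol, group in groupby(sequence)]


example = "++++---+++--++++++----"  # the 22-element sequence from the text
print(len(example), count_runs(example))
print(run_lengths(example))
```

Grouping adjacent equal elements mirrors the definition directly: a run ends exactly where the next element differs from the current one.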

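Under the null hypothesis, the number of runs in a sequence of N elements[1] is a random variable whose conditional distribution, given the observation of N+ positive values[2] and N− negative values (N = N+ + N−), is approximately normal, with:[3][4]

  mean: \mu = \frac{2 N_+ N_-}{N} + 1
  variance: \sigma^2 = \frac{2 N_+ N_- (2 N_+ N_- - N)}{N^2 (N - 1)} = \frac{(\mu - 1)(\mu - 2)}{N - 1}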

These parameters do not assume that the positive and negative elements have equal probabilities of occurring, but only assume that the elements are independent and identically distributed. If the number of runs is significantly higher or lower than expected, the hypothesis of statistical independence of the elements may be rejected.
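A minimal sketch of the test statistic follows, assuming the standard normal-approximation parameters for the number of runs: mean 2·N+·N−/N + 1 and variance (mean − 1)(mean − 2)/(N − 1). The function name `runs_test`, its `positive` argument, and the two-sided p-value are illustrative choices, not part of any particular library.

```python
import math


def runs_test(sequence, positive="+"):
    """Return (runs, mean, variance, z, p_value) for a two-valued sequence.

    Sketch only: relies on the normal approximation, which is poor for
    short sequences or when one value is very rare.
    """
    seq = list(sequence)
    n_pos = sum(1 for x in seq if x == positive)
    n_neg = len(seq) - n_pos
    n = n_pos + n_neg
    if n_pos == 0 or n_neg == 0:
        raise ValueError("the sequence must contain both values")

    # A new run starts at the first element and at every change of value.
    runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

    mean = 2.0 * n_pos * n_neg / n + 1.0
    variance = (mean - 1.0) * (mean - 2.0) / (n - 1.0)
    if variance <= 0.0:
        raise ValueError("the sequence is too short for the approximation")

    z = (runs - mean) / math.sqrt(variance)
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return runs, mean, variance, z, p_value


print(runs_test("++++---+++--++++++----"))
```

For the 22-element example sequence (13 "+" and 9 "−"), this yields 6 runs against an expected value of about 11.6 and a z of roughly −2.55, that is, fewer runs than independence would suggest.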

Runs tests can be used to test:

  1. the randomness of a distribution, by taking the data in the given order and marking with + the data greater than the median, and with − the data less than the median (numbers equal to the median are omitted);
  2. whether a function fits a data set well, by marking the data exceeding the function value with + and the other data with −. For this use, the runs test, which takes into account the signs but not the distances, is complementary to the chi-squared test, which takes into account the distances but not the signs.
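The two uses listed above differ only in how the data are turned into a sequence of signs. The following sketch shows both conversions; the helper names `signs_about_median` and `signs_about_fit`, the sample data, and the candidate function are all hypothetical.

```python
import statistics
from itertools import groupby


def signs_about_median(data):
    """Mark values above the median with '+', below with '-'; drop ties."""
    med = statistics.median(data)
    return ["+" if x > med else "-" for x in data if x != med]


def signs_about_fit(xs, ys, f):
    """Mark points lying above f(x) with '+', all other points with '-'."""
    return ["+" if y > f(x) else "-" for x, y in zip(xs, ys)]


def count_runs(signs):
    """Number of maximal blocks of adjacent equal signs."""
    return sum(1 for _ in groupby(signs))


# Use 1: a data series in its given order, split at its median.
series = [1, 5, 2, 8, 3]
print(signs_about_median(series))

# Use 2: quadratic data compared with a constant function. The misfit is
# systematic, so the signs cluster into a small number of runs.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]
fit_signs = signs_about_fit(xs, ys, lambda x: 4)
print("".join(fit_signs), count_runs(fit_signs))
```

In either case the resulting sign sequence is what the runs test operates on; the original magnitudes are discarded, which is why the test sees the signs but not the distances.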

The Kolmogorov–Smirnov test is more powerful, if it can be applied.[citation needed]

References

  1. ^ N is the number of elements, not the number of runs.
  2. ^ N+ is the number of elements with positive values, not the number of positive runs.
  3. ^ http://www.itl.nist.gov/div898/handbook/eda/section3/eda35d.htm
  4. ^ http://support.sas.com/kb/33/092.html