Fay and Wu's H

From Wikipedia, the free encyclopedia

Fay and Wu's H is a statistical test created by and named after two researchers, Justin Fay and Chung-I Wu.[1] The purpose of the test is to distinguish between a DNA sequence evolving randomly ("neutrally") and one evolving under positive selection. This test is an advancement over Tajima's D,[2] which is used to differentiate neutrally evolving sequences from those evolving non-randomly (through directional or balancing selection, demographic expansion or contraction, or genetic hitchhiking). Fay and Wu's H is frequently used to identify sequences that have experienced selective sweeps in their evolutionary history.


Consider a DNA sequence that shows very few polymorphisms across different populations. This could arise from at least three causes:

  1. The sequence is under strong negative (purifying) selection, so any new mutation in it is deleterious and is purged immediately; or
  2. The sequence recently experienced a selective sweep (an allele rose to fixation or near fixation), which homogenized the sequence across individuals, so the few polymorphisms now observed are very recent; or
  3. There was a population bottleneck, so all individuals in the population descend from a small number of common ancestors (possibly a single one).

If Tajima's D is calculated using all the alleles across all populations, the excess of rare polymorphisms makes D negative, indicating that the sequence is not evolving neutrally. However, D alone cannot tell you the cause. The signal could come from negative selection, from a recent selective sweep, or from a demographic change such as a population contraction followed by expansion. Fay and Wu's H helps to distinguish among these possibilities.[3]
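The effect of an excess of rare polymorphisms on Tajima's D can be illustrated with a short sketch. The function below follows Tajima's (1989) definition. It compares the mean pairwise difference (θπ) with Watterson's estimator (S/a1) and scales the difference by its standard deviation. The function name `tajimas_d` and the input format (one allele count per biallelic segregating site, no missing data) are illustrative assumptions and do not come from any particular software package.

```python
import math
from typing import Sequence


def tajimas_d(allele_counts: Sequence[int], n: int) -> float:
    """Tajima's D from per-site allele counts in a sample of n sequences.

    Hypothetical helper for illustration. allele_counts holds, for each
    biallelic site, how many of the n sequences carry one of the two alleles
    (either allele may be counted, since i * (n - i) is symmetric).
    Assumes no missing data.
    """
    if n < 4:
        # For n = 2 or 3 the variance constants are zero, so D is undefined.
        raise ValueError("Tajima's D needs at least 4 sequences")
    seg = [i for i in allele_counts if 0 < i < n]  # segregating sites only
    S = len(seg)
    if S == 0:
        raise ValueError("no segregating sites")

    # theta_pi: mean number of pairwise differences between sequences.
    theta_pi = sum(2 * i * (n - i) for i in seg) / (n * (n - 1))

    # Watterson's theta is S / a1; the remaining constants give Var(d).
    a1 = sum(1 / k for k in range(1, n))
    a2 = sum(1 / k**2 for k in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    return (theta_pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

For 10 sequences in which every polymorphism is a singleton, `tajimas_d([1] * 10, 10)` is negative. If the same number of sites are all at intermediate frequency, `tajimas_d([5] * 10, 10)` is positive. Either way, the sign only says that the frequency spectrum departs from neutral expectations. It does not say why.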

Fay and Wu's H uses not only population polymorphism data but also data from an outgroup species. The outgroup lets you infer the ancestral state of each polymorphic site, that is, the allele present before the two lineages split. Suppose, for example, that the allele now at high frequency in the population differs from the ancestral one. In that case, derived alleles have risen to high frequency, which suggests a selective sweep in that region. The sweep may have acted on a linked site rather than on the site itself. The more extreme the value of H, the stronger the evidence for such a sweep. If the high-frequency allele is the ancestral one, the pattern is more consistent with negative selection maintaining the ancestral state. An H close to 0 means that there is no evidence of deviation from neutrality.
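Polarizing polymorphisms with an outgroup can be sketched as follows. The sketch treats the outgroup base at each aligned site as ancestral, which is a parsimony assumption with no correction for multiple hits. It then counts how many sampled sequences carry the other (derived) allele. The helper name `derived_allele_counts` and its rules for skipping sites (gaps, `N`, more than two alleles, or an outgroup base not seen in the sample) are hypothetical choices made for illustration. Real analyses often use several outgroups or probabilistic reconstruction of the ancestral state.

```python
from typing import List, Sequence


def derived_allele_counts(samples: Sequence[str], outgroup: str) -> List[int]:
    """Derived-allele count at each polarizable polymorphic site.

    Hypothetical helper for illustration. Assumes aligned sequences of equal
    length and takes the outgroup base as the ancestral state (parsimony,
    no multiple hits).
    """
    if any(len(s) != len(outgroup) for s in samples):
        raise ValueError("all sequences must have the same length")
    counts = []
    for col, ancestral in enumerate(outgroup):
        alleles = [s[col] for s in samples]
        observed = set(alleles)
        # Skip monomorphic or multi-allelic sites, gapped or ambiguous sites,
        # and sites whose outgroup base is absent from the sample.
        if len(observed) != 2 or ancestral not in observed or observed & {"-", "N"}:
            continue
        counts.append(sum(1 for a in alleles if a != ancestral))
    return counts
```

With the four sequences `["AAGT", "AAGT", "ACGA", "CCGA"]` and the outgroup `"CAGA"`, the first site has the derived allele `A` in three of the four sequences. It is therefore a high-frequency derived variant, the kind of site that pushes H negative.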


A significantly positive Fay and Wu's H indicates a deficit of moderate- and high-frequency derived SNPs relative to equilibrium expectations, whereas a significantly negative Fay and Wu's H indicates an excess of high-frequency derived SNPs.[4]
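Why high-frequency derived SNPs push H below zero becomes clear from the statistic's definition in Fay and Wu (2000). Let S_i be the number of sites whose derived allele appears in i of n sampled sequences. Then H = θπ − θH, where θπ = Σ 2·S_i·i·(n − i)/(n(n − 1)) and θH = Σ 2·S_i·i²/(n(n − 1)). Each site therefore contributes 2i(n − 2i)/(n(n − 1)). That contribution is positive when the derived allele is below half frequency and negative when it is above. The sketch below computes only this unnormalized statistic, under the assumption that the derived counts have already been polarized with an outgroup. The function name `fay_wu_h` is hypothetical. Significance is typically assessed against coalescent simulations under a neutral model, and this sketch does not attempt that.

```python
from typing import Sequence


def fay_wu_h(derived_counts: Sequence[int], n: int) -> float:
    """Unnormalized Fay and Wu's H = theta_pi - theta_H.

    Hypothetical helper for illustration. derived_counts holds, per
    polymorphic site, how many of the n sampled sequences carry the derived
    allele (already polarized with an outgroup). Summing over sites is
    equivalent to summing over frequency classes i weighted by S_i.
    """
    if n < 2:
        raise ValueError("need at least 2 sequences")
    seg = [i for i in derived_counts if 0 < i < n]  # polymorphic sites only
    denom = n * (n - 1)
    theta_pi = sum(2 * i * (n - i) for i in seg) / denom
    theta_h = sum(2 * i * i for i in seg) / denom  # weights high-frequency derived alleles
    return theta_pi - theta_h
```

For example, with n = 10 and five sites whose derived allele is carried by 9 of the 10 sequences, `fay_wu_h([9] * 5, 10)` returns `-8.0`. Five derived singletons instead give a small positive value.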

References

  1. Fay JC, Wu CI (July 2000). "Hitchhiking under positive Darwinian selection". Genetics. 155 (3): 1405–13. PMC 1461156. PMID 10880498.
  2. Tajima F (November 1989). "Statistical method for testing the neutral mutation hypothesis by DNA polymorphism". Genetics. 123 (3): 585–95. PMC 1203831. PMID 2513255.
  3. Hedrick PW (2005). Genetics of Populations. Jones & Bartlett Learning. p. 436. ISBN 978-0-7637-4772-5.
  4. Sterken R, Kiekens R, Coppens E, Vercauteren I, Zabeau M, Inzé D, Flowers J, Vuylsteke M (October 2009). "A population genomics study of the Arabidopsis core cell cycle genes shows the signature of natural selection". Plant Cell. 21 (10): 2987–98. doi:10.1105/tpc.109.067017. PMC 2782269. PMID 19880799.

Further reading

  • Hartl, Daniel L.; Clark, Andrew G. (2007). Principles of Population Genetics (4th ed.). Sinauer Associates. ISBN 978-0878933082.

External links

Computational tools: