Fixation index

The fixation index (F_ST) is a measure of population differentiation due to genetic structure. It is frequently estimated from genetic polymorphism data, such as single-nucleotide polymorphisms (SNPs) or microsatellites. Developed as a special case of Wright's F-statistics, it is one of the most commonly used statistics in population genetics.

Definition

Two of the most commonly used definitions of F_ST at a given locus are based on the variance of allele frequencies between populations and on the probability of identity by descent.

If ${\bar {p}}$ is the average frequency of an allele in the total population, $\sigma _{S}^{2}$ is the variance in the frequency of the allele between different subpopulations, weighted by the sizes of the subpopulations, and $\sigma _{T}^{2}$ is the variance of the allelic state in the total population, F_ST is defined as ^[1]

F_{ST}={\frac {\sigma _{S}^{2}}{\sigma _{T}^{2}}}={\frac {\sigma _{S}^{2}}{{\bar {p}}(1-{\bar {p}})}}

Wright's definition illustrates that F_ST measures the amount of genetic variance that can be explained by population structure. This can also be thought of as the fraction of total diversity that is not a consequence of the average diversity within subpopulations, where diversity is measured by the probability that two randomly selected alleles are different, namely $2p(1-p)$ . If the allele frequency in the $i$ th population is $p_{i}$ and the relative size of the $i$ th population is $c_{i}$ , then

F_{ST}={\frac {{\bar {p}}(1-{\bar {p}})-\sum c_{i}p_{i}(1-p_{i})}{{\bar {p}}(1-{\bar {p}})}}={\frac {{\bar {p}}(1-{\bar {p}})-{\overline {p(1-p)}}}{{\bar {p}}(1-{\bar {p}})}}
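The two expressions above are algebraically equivalent, which a short numerical check can illustrate. The following is a minimal sketch, not a reference implementation: the function names and the example frequencies are hypothetical, and the relative sizes $c_{i}$ are assumed to sum to one.

```python
def mean_frequency(p, c):
    """Weighted mean allele frequency p_bar, with weights c assumed to sum to 1."""
    return sum(ci * pi for ci, pi in zip(c, p))


def fst_variance(p, c):
    """F_ST as the among-subpopulation variance divided by p_bar * (1 - p_bar)."""
    p_bar = mean_frequency(p, c)
    if p_bar in (0.0, 1.0):
        raise ValueError("allele is fixed or absent overall; F_ST is undefined")
    var_s = sum(ci * (pi - p_bar) ** 2 for ci, pi in zip(c, p))
    return var_s / (p_bar * (1 - p_bar))


def fst_diversity(p, c):
    """F_ST as the share of total diversity not found within subpopulations."""
    p_bar = mean_frequency(p, c)
    if p_bar in (0.0, 1.0):
        raise ValueError("allele is fixed or absent overall; F_ST is undefined")
    total = p_bar * (1 - p_bar)
    within = sum(ci * pi * (1 - pi) for ci, pi in zip(c, p))
    return (total - within) / total


# Hypothetical example: two equally sized subpopulations.
p = [0.2, 0.8]
c = [0.5, 0.5]
print(f"{fst_variance(p, c):.3f}")   # prints 0.360
print(f"{fst_diversity(p, c):.3f}")  # prints 0.360
```

Both functions treat the supplied frequencies as exact population values; they make no correction for sampling error.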

Alternatively,^[2]

F_{ST}={\frac {f_{0}-{\bar {f}}}{1-{\bar {f}}}}

where $f_{0}$ is the probability of identity by descent of two individuals given that the two individuals are in the same subpopulation, and ${\bar {f}}$ is the probability that two individuals from the total population are identical by descent. Using this definition, F_ST can be interpreted as measuring how much more closely related two individuals from the same subpopulation are than two individuals drawn from the total population. If the mutation rate is small, this interpretation can be made more explicit by linking the probability of identity by descent to coalescence times: let $T_{0}$ and $T$ denote the average time to coalescence for individuals from the same subpopulation and from the total population, respectively. Then,

F_{ST}\approx 1-{\frac {T_{0}}{T}}

This formulation has the advantage that the expected time to coalescence can easily be estimated from genetic data, which led to the development of various estimators for F_ST.
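The coalescent approximation can be motivated by a short sketch. It assumes (this is not spelled out above) that each lineage mutates at a small rate $\mu$ per generation, so that two lineages whose common ancestor lived $t$ generations ago are still identical with probability roughly $1-2\mu t$. Averaging over coalescence times gives

f_{0}\approx 1-2\mu T_{0},\qquad {\bar {f}}\approx 1-2\mu T

and substituting into the identity-by-descent definition yields

F_{ST}={\frac {f_{0}-{\bar {f}}}{1-{\bar {f}}}}\approx {\frac {2\mu T-2\mu T_{0}}{2\mu T}}=1-{\frac {T_{0}}{T}}

The mutation rate cancels, so under this approximation the result depends only on the ratio of the two coalescence times.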

Estimation

In practice, none of the quantities used for the definitions can be easily measured. As a consequence, various estimators have been proposed. A particularly simple estimator applicable to DNA sequence data is:^[3]

F_{ST}={\frac {\pi _{\text{Between}}-\pi _{\text{Within}}}{\pi _{\text{Between}}}}

where $\pi _{\text{Between}}$ and $\pi _{\text{Within}}$ are the average numbers of pairwise differences between two individuals sampled from different subpopulations and from the same subpopulation, respectively. The average pairwise difference within a population can be calculated as the sum of the pairwise differences divided by the number of pairs. However, this estimator is biased when sample sizes are small or vary between populations. Therefore, more elaborate methods are used to compute F_ST in practice. Two of the most widely used procedures are the estimator of Weir & Cockerham (1984)^[4] and analysis of molecular variance (AMOVA). A list of implementations is available at the end of this article.
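The simple pairwise-difference estimator can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a recommended method: the function names and the toy sequences are hypothetical, sequences are assumed to be aligned strings of equal length, and $\pi _{\text{Within}}$ is taken here as the unweighted mean of the two per-population averages, which is one possible choice. No bias correction is attempted.

```python
from itertools import combinations, product


def count_differences(seq_a, seq_b):
    """Number of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def mean_pairwise_difference(pairs):
    """Sum of pairwise differences divided by the number of pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair of sequences")
    return sum(count_differences(a, b) for a, b in pairs) / len(pairs)


def fst_from_pairwise_differences(pop1, pop2):
    """(pi_between - pi_within) / pi_between for two samples of sequences."""
    # Assumption: average the two within-population means with equal weight.
    pi_within = 0.5 * (
        mean_pairwise_difference(combinations(pop1, 2))
        + mean_pairwise_difference(combinations(pop2, 2))
    )
    pi_between = mean_pairwise_difference(product(pop1, pop2))
    if pi_between == 0:
        raise ValueError("no differences between populations; F_ST is undefined")
    return (pi_between - pi_within) / pi_between


# Hypothetical toy data: two samples of two sequences each.
pop1 = ["AAAA", "AAAT"]
pop2 = ["TTTT", "TTTA"]
print(f"{fst_from_pairwise_differences(pop1, pop2):.3f}")  # prints 0.714
```

With samples this small the value is dominated by sampling noise, which is the bias the text warns about.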

Interpretation

This comparison of genetic variability within and between populations is frequently used in applied population genetics. The values range from 0 to 1. A zero value implies complete panmixis; that is, that the two populations are interbreeding freely. A value of one implies that all genetic variation is explained by the population structure, and that the two populations do not share any genetic diversity.

For idealized models such as Wright's finite island model, F_ST can be used to estimate migration rates. Under that model, the migration rate is

{\hat {M}}\approx {\frac {1}{2}}\left({\frac {1}{F_{ST}}}-1\right)
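As a worked instance of the island-model formula, the sketch below evaluates ${\hat {M}}$ for a given F_ST. The function name is hypothetical, and the result is only as meaningful as the idealized model behind the formula.

```python
def migration_from_fst(fst):
    """Evaluate M_hat = (1/2) * (1/F_ST - 1) for 0 < F_ST <= 1."""
    if not 0 < fst <= 1:
        raise ValueError("F_ST must lie in the interval (0, 1]")
    return 0.5 * (1.0 / fst - 1.0)


# Example value: an F_ST of 0.12.
print(f"{migration_from_fst(0.12):.2f}")  # prints 3.67
```

The estimate grows without bound as F_ST approaches zero, so small errors in a low F_ST translate into large changes in ${\hat {M}}$.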

The interpretation of F_ST can be difficult when the data analyzed are highly polymorphic. In this case, the probability of identity by descent is very low and F_ST can have an arbitrarily low upper bound, which might lead to misinterpretation of the data. Also, strictly speaking, F_ST is not a genetic distance, as it does not satisfy the triangle inequality. As a consequence, new tools for measuring genetic differentiation continue to be developed.
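The low upper bound for highly polymorphic loci can be made concrete with a small calculation. The sketch below is an assumption-laden illustration: it extends the diversity-based definition to many alleles by measuring diversity as one minus the sum of squared allele frequencies, and it uses a hypothetical scenario of two equally sized populations that share no alleles, each carrying $k$ equally frequent private alleles.

```python
def diversity(freqs):
    """Probability that two randomly drawn alleles differ: 1 - sum of p^2."""
    return 1.0 - sum(f * f for f in freqs)


def fst_private_alleles(k):
    """F_ST for two equal-sized populations with k private alleles each."""
    pop1 = [1.0 / k] * k + [0.0] * k
    pop2 = [0.0] * k + [1.0 / k] * k
    pooled = [(a + b) / 2 for a, b in zip(pop1, pop2)]
    within = (diversity(pop1) + diversity(pop2)) / 2  # mean within-population diversity
    total = diversity(pooled)                         # diversity of the pooled population
    return (total - within) / total


for k in (1, 2, 10):
    print(k, f"{fst_private_alleles(k):.3f}")
# prints:
# 1 1.000
# 2 0.333
# 10 0.053
```

Even though the two populations share no alleles in any of these cases, the value falls as $1/(2k-1)$, so a low F_ST at a highly polymorphic locus does not by itself indicate that the populations are similar.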

F_ST in humans

Autosomal genetic distances based on classical markers

In their study The History and Geography of Human Genes (1994), Cavalli-Sforza, Menozzi and Piazza provide some of the most detailed and comprehensive estimates of genetic distances between human populations, within and across continents. Their initial database contains 76,676 gene frequencies (using 120 blood polymorphisms), corresponding to 6,633 samples in different locations. By culling and pooling such samples, they restrict their analysis to 491 populations. They focus on aboriginal populations that were at their present location at the end of the 15th century, when the great European migrations began.^[5] When studying genetic difference at the world level, the number is reduced to 42 representative populations, aggregating subpopulations characterized by a high level of genetic similarity. For these 42 populations, Cavalli-Sforza and coauthors report bilateral distances computed from 120 alleles. Among this set of 42 world populations, the greatest genetic distance observed is between Mbuti Pygmies and Papua New Guineans, where the F_ST distance is 0.4573, while the smallest genetic distance (0.0021) is between the Danish and the English. When considering more disaggregated data for 26 European populations, the smallest genetic distance (0.0009) is between the Dutch and the Danes, and the largest (0.0667) is between the Lapps and the Sardinians. The mean genetic distance among the 861 available pairs in the world population is 0.1338. The table below lists F_ST values calculated by Cavalli-Sforza et al. (1994) for selected populations. Entries are F_ST multiplied by 10,000, so the Danish and English entry of 21 corresponds to 0.0021.

F_ST × 10,000 (Cavalli-Sforza et al. 1994)	W.African	Berber	Indian	Iranian	Near Eastern	Japanese	Basque	Lapp	Sardinian	Danish	English	Greek	Italian
W.African	0	1642	1748	1796	1454	2252	1299	1689	2062	1459	1487	1356	1794
Berber	1642	0	497	408	263	1707	392	736	619	313	273	429	315
Indian	1748	497	0	154	229	718	418	459	449	293	280	272	261
Iranian	1796	408	154	0	158	1059	285	423	314	179	197	70	133
Near Eastern	1454	263	229	158	0	1056	246	423	329	238	236	129	208
Japanese	2252	1707	718	1059	1056	0	1481	947	1558	1176	1244	1175	1145
Basque	1299	392	418	285	246	1481	0	629	348	184	119	231	141
Lapp	1689	736	459	423	423	947	629	0	667	334	404	308	339
Sardinian	2062	619	449	314	329	1558	348	667	0	348	340	190	221
Danish	1459	313	293	179	238	1176	184	334	348	0	21	191	72
English	1487	273	280	197	236	1244	119	404	340	21	0	204	51
Greek	1356	429	272	70	129	1175	231	308	190	191	204	0	77
Italian	1794	315	261	133	208	1145	141	339	221	72	51	77	0

Autosomal genetic distances based on SNPs

More recently, the International HapMap Project estimated F_ST for three human populations using SNP data. Across the autosomes, F_ST was estimated to be 0.12. The significance of this value in humans is contentious. Because an F_ST of zero indicates no divergence between populations, whereas an F_ST of one indicates complete isolation of populations, anthropologists often cite Lewontin's 1972 work, which arrived at a similar value and interpreted it as meaning that there are few biological differences between human races.^[6] On the other hand, while an F_ST value of 0.12 is lower than that found between populations of many other species, Henry Harpending argued that this value implies, on a world scale, a "kinship between two individuals of the same human population is equivalent to kinship between grandparent and grandchild or between half siblings".^[7]

Intercontinental autosomal genetic distances based on SNPs^[8]
	Europe (CEU)	Sub-Saharan Africa (Yoruba)	East-Asia (Japanese)
Sub-Saharan Africa (Yoruba)	0.153
East-Asia (Japanese)	0.111	0.190
East-Asia (Chinese)	0.110	0.192	0.007

Intra-European/Mediterranean autosomal genetic distances based on SNPs^[8]^[9]
	Italians	Palestinians	Swedish	Finns	Spanish	Germans	Russians
Palestinians	0.0064
Swedish	0.0064-0.0090	0.0191
Finns	0.0130-0.0230		0.0050-0.0110
Spanish	0.0010-0.0050	0.0101	0.0040-0.0055	0.0110-0.0170
Germans	0.0029-0.0080	0.0136	0.0007-0.0010	0.0060-0.0130	0.0015-0.0030
Russians	0.0088-0.0120	0.0202	0.0030-0.0036	0.0060-0.0120	0.0070-0.0079	0.0030-0.0037
French	0.0030-0.0050		0.0020	0.0080-0.0150	0.0010	0.0010	0.0050
Greeks	0.0000	0.0057	0.0084		0.0035	0.0039	0.0108

Programs for calculating F_ST

Modules for calculating F_ST

References

[1] Holsinger, Kent E.; Weir, Bruce S. (2009). "Genetics in geographically structured populations: defining, estimating and interpreting FST". Nat Rev Genet. 10 (9): 639–650. doi:10.1038/nrg2611. ISSN 1471-0056. PMID 19687804.
[2] Durrett, Richard (12 August 2008). Probability Models for DNA Sequence Evolution. Springer. ISBN 978-0-387-78168-6. Retrieved 25 October 2012.
[3] Hudson, R. R.; Slatkin, M.; Maddison, W. P. (Oct 1992). "Estimation of Levels of Gene Flow from DNA Sequence Data". Genetics. 132 (2): 583–9. PMC 1205159. PMID 1427045.
[4] Weir, B. S.; Cockerham, C. Clark (1984). "Estimating F-Statistics for the Analysis of Population Structure". Evolution. 38 (6): 1358. doi:10.2307/2408641. ISSN 0014-3820.
[5] Cavalli-Sforza et al., 1994, p. 24.
[6] Lewontin, Richard C. (1972). "The apportionment of human diversity". Evolutionary Biology. 6 (38): 381–398. doi:10.1007/978-1-4684-9063-3_14.
[7] Harpending, Henry (2002-11-01). "Kinship and Population Subdivision". Population & Environment. 24 (2): 141–147. doi:10.1023/A:1020815420693. JSTOR 27503827.
[8] Nelis, Mari; et al. (2009-05-08). Fleischer, Robert C. (ed.). "Genetic Structure of Europeans: A View from the North–East". PLoS ONE. 4 (5): e5472. Bibcode:2009PLoSO...4.5472N. doi:10.1371/journal.pone.0005472. PMC 2675054. PMID 19424496. See table.
[9] Tian, Chao; et al. (November 2009). "European Population Genetic Substructure: Further Definition of Ancestry Informative Markers for Distinguishing among Diverse European Ethnic Groups". Molecular Medicine. 15 (11–12): 371–383. doi:10.2119/molmed.2009.00094. ISSN 1076-1551. PMC 2730349. PMID 19707526. See table.
[10] Crawford, Nicholas G. (2010). "smogd: software for the measurement of genetic diversity". Molecular Ecology Resources. 10 (3): 556–557. doi:10.1111/j.1755-0998.2009.02801.x. PMID 21565057.

External links

BioPerl - Bio::PopGen::PopStats
