Human Genetic Diversity: Lewontin's Fallacy

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Chamaemelum (talk | contribs) at 03:20, 17 June 2023. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Edwards' claim was that genetic variants, when considered together, are able to classify humans into population groups, but this is overlooked when considering each genetic variant individually.

"Human Genetic Diversity: Lewontin's Fallacy" is a 2003 paper by A. W. F. Edwards.[1] He criticises an argument first made in Richard Lewontin's 1972 article "The Apportionment of Human Diversity", that race is taxonomically invalid because most genetic variation is due to individual differences within populations, not between populations.[2] Edwards argued that this does not refute the biological reality of race since genetic analysis can usually make correct inferences about the perceived race of a person from whom a sample is taken, and that the rate of success increases when more genetic loci are examined.[1]

Edwards argues that Lewontin's conclusions rest on analyzing the data as though they contained no information beyond that revealed by a variant-by-variant analysis, and that the taxonomic significance of genetic data arises from "correlations amongst the different loci."[1]

Edwards' paper was reprinted, commented upon by experts such as Noah Rosenberg,[3] and given further context in an interview with philosopher of science Rasmus Grønfeldt Winther in a 2018 anthology.[4] Edwards' critique is discussed in a number of academic and popular science books, with varying degrees of support.[5][6][7] For example, Winther and Jonathan Marks dispute the premise of Lewontin's fallacy, arguing that Edwards' critique does not contradict Lewontin's argument.[7][8][9] A 2007 paper in Genetics by David J. Witherspoon et al. concluded that the two arguments are in fact compatible, and that Lewontin's observation about the distribution of genetic differences across ancestral population groups applies "even when the most distinct populations are considered and hundreds of loci are used".[10] Indeed, Edwards recognized that "there is nothing wrong with Lewontin’s statistical analysis of variation," only claiming that it is not "relevant to classification."[1]

Lewontin's original argument

In the 1972 study "The Apportionment of Human Diversity", Richard Lewontin performed a fixation index (FST) statistical analysis using 17 markers, including blood group proteins, from individuals across classically defined "races" (Caucasian, African, Mongoloid, South Asian Aborigines, Amerinds, Oceanians, and Australian Aborigines). He found that most of the total genetic variation between humans (i.e., of the 0.1% of DNA that varies between individuals), 85.4%, is found within populations, 8.3% is found between populations within a "race", and only 6.3% is accounted for by racial classification. Subsequent studies have confirmed the finding that roughly 85% of the locus-by-locus variance is found within populations.[6] Based on this analysis, Lewontin concluded, "Since such racial classification is now seen to be of virtually no genetic or taxonomic significance either, no justification can be offered for its continuance." Lewontin's argument has been used to claim that racial categories are biologically meaningless, and that behavioral differences between groups are not caused by genetic differences.[7] The figure of 6.3–10% of total genetic variation attributed to variation between races is calculated by averaging over the separate contributions of the 17 individual genes, which were sampled in different studies.[2]

Edwards' critique

Edwards argued that Lewontin's results on variability at the locus-by-locus level are technically correct but have no relevance to classification: it is nonetheless possible to classify individuals into different racial groups with an accuracy that approaches 100 percent when one takes into account the frequencies of the alleles at multiple loci at the same time. This happens because allele-frequency differences at different loci are correlated across populations: alleles that are more frequent in one population at two or more loci tend to occur together when the two populations are considered simultaneously. In other words, allele frequencies tend to cluster differently for different populations.[11]

Using Lewontin’s 84% figure for the variability within groups, Edwards' calculations show that the probability of misclassification quickly decreases as the number of gene loci increases.

In Edwards' words, "most of the information that distinguishes populations is hidden in the correlation structure of the data". These relationships can be elucidated using commonly used ordination and cluster analysis techniques. Edwards argued that, even if the probability of misclassifying an individual based on the frequency of alleles at a single locus is as high as 30% or 50% (as Lewontin reported in 1972), the misclassification probability becomes close to zero if enough loci are studied.[12]

Edwards' paper stated that the underlying logic was discussed in the early years of the 20th century. Edwards wrote that he and Luigi Luca Cavalli-Sforza had presented a contrasting analysis to Lewontin's, using very similar data, already at the 1963 International Congress of Genetics. Lewontin participated in the conference but did not refer to this in his later paper. Edwards argued that Lewontin used his analysis to attack human classification in science for social reasons,[12] citing Lewontin's writing that "the whole history of the problem of genetic variation is a vivid illustration of the role that deeply embedded ideological assumptions play in determining scientific 'truth.'"

Principal Component Analysis

Two graphs showing the first two principal components of genetic variation. The top graph shows the first two principal components within continental populations, and the bottom graph shows the first two principal components of the genetic variation within Britain.

Principal Component Analysis (PCA) is a statistical procedure commonly used in data analysis to simplify complex, multivariate data sets. It reduces the dimensionality (i.e., the number of variables) of these datasets while preserving as much of their structure as possible. Critically, it can be applied to multi-locus data. The first principal component explains as much of the variance as is possible with one dimension. Edwards wrote that for the first principal component, the between-population variance "is very much greater" than the within-population variance.

The relevance of PCA to Lewontin's Fallacy comes into play when considering the role of multiple genetic markers, or loci, in classifying individuals into groups. Each genetic marker can be thought of as a dimension in a high-dimensional space. Just as PCA can reveal the 'principal components' or directions of highest variance in a complex dataset, it can similarly reveal the principal axes along which genetic information varies the most. The first principal component would account for the most genetic variance and might correspond to a major differentiation, like continental ancestry. Subsequent components would account for progressively less variance and might correspond to more subtle differentiations, such as regional ancestries or longitude and latitude. In this way, PCA can reveal patterns in genetic data that allow for an accurate classification of individuals into populations. Despite most genetic variation on a locus-by-locus level being found within populations, the variation between populations can still be statistically significant and informative when considering multiple genetic loci.
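The idea can be illustrated with a minimal sketch on simulated data. Everything here is an assumption made for illustration rather than a reproduction of any published analysis: the toy model (two populations, 200 independent loci, '+' allele frequency 0.3 in one population and 0.7 in the other), the hypothetical function names, and the use of a hand-written power iteration in place of a library PCA routine.

```python
import random


def simulate_genotypes(n_per_pop, n_loci, p, rng):
    """Return (rows, labels): one 0/1 row per haploid individual.

    The '+' allele (coded 1) has frequency p in population 1 and 1 - p in
    population 2 at every locus. This is an assumed toy model, not real data.
    """
    rows, labels = [], []
    for label, freq in ((1, p), (2, 1.0 - p)):
        for _ in range(n_per_pop):
            rows.append([1.0 if rng.random() < freq else 0.0 for _ in range(n_loci)])
            labels.append(label)
    return rows, labels


def first_pc_scores(rows, rng, iterations=50):
    """Project each individual onto the first principal component.

    Power iteration on the centered data stands in for a library PCA call.
    """
    n, k = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(k)]
    centered = [[row[j] - means[j] for j in range(k)] for row in rows]
    v = [rng.gauss(0.0, 1.0) for _ in range(k)]  # random starting direction
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(k)]
        norm = sum(w * w for w in v) ** 0.5
        v = [w / norm for w in v]
    return [sum(x * w for x, w in zip(row, v)) for row in centered]


def variance(values):
    """Population variance (divides by the number of values)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


rng = random.Random(0)
rows, labels = simulate_genotypes(n_per_pop=30, n_loci=200, p=0.3, rng=rng)
scores = first_pc_scores(rows, rng)
pop1 = [s for s, lab in zip(scores, labels) if lab == 1]
pop2 = [s for s, lab in zip(scores, labels) if lab == 2]

# With equal group sizes, total variance = mean within-group variance
# + variance of the group means.
within = (variance(pop1) + variance(pop2)) / 2
between = variance(scores) - within
print(f"PC1 variance within populations:  {within:.3f}")
print(f"PC1 variance between populations: {between:.3f}")
```

In this toy setting the scores on the first component should fall into two non-overlapping groups, even though each locus taken alone is only weakly informative. Real genotype data are far less cleanly structured than this simulation.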

Edwards' derivation

As an illustration of the statistical error referred to as Lewontin's Fallacy, Edwards proposed a hypothetical situation involving two populations of equal size. He imagined that each of these populations possessed a particular gene, labeled as '+' or '-', at a single diallelic locus. For Population 1, the frequency of the '+' gene was denoted as $p$, and for Population 2, the frequency was designated as $q$.

The sum of the frequencies of the two genes, $p$ and $q$, must equal one ($p + q = 1$), so the '-' gene's frequency is $q$ in Population 1 and $p$ in Population 2. Using Lewontin's figure, 84% of the gene variability occurs within populations when $p = 0.3$ and $q = 0.7$, because the ratio of the within-population sum of squares to the total sum of squares is $4pq = 0.84$.
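One way to see where the ratio $4pq$ comes from is sketched below. It assumes a scoring convention introduced here for illustration (the '+' gene coded as 1 and the '-' gene as 0), two populations of equal size, and '+' frequencies $p$ and $q = 1 - p$ in Populations 1 and 2. Because the populations are the same size, the ratio of sums of squares equals the ratio of the corresponding variances.

```latex
\begin{align*}
\text{within-population variance} &= p(1-p) = q(1-q) = pq \\
\text{pooled frequency of '+'} &= \tfrac{1}{2}(p + q) = \tfrac{1}{2} \\
\text{total variance} &= \tfrac{1}{2}\left(1 - \tfrac{1}{2}\right) = \tfrac{1}{4} \\
\frac{\text{within-population sum of squares}}{\text{total sum of squares}}
  &= \frac{pq}{1/4} = 4pq = 4(0.3)(0.7) = 0.84
\end{align*}
```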

The misclassification probability of an individual based on the '+' gene alone is $p$. This suggests that a single locus is an inadequate indicator of an individual's population origin: for $p = 0.3$, classification from a single, typical locus would be wrong 30% of the time.

However, Edwards extended this model by introducing $k$ loci, each having a '+' gene frequency of $p$ in Population 1 and $q$ in Population 2. Despite the addition of loci, the within-to-total variability ratio remains unchanged at each locus (still 84%). The number of '+' genes an individual carries will follow a binomial distribution with mean $kp$ in Population 1 and $kq$ in Population 2, with a variance of $kpq$ in both populations. If we maintain the same gene frequencies as before and assume $k = 100$ loci, the means are 30 and 70, with variances of 21 and standard deviations of approximately 4.58. This results in extremely minimal overlap between the distributions and an almost zero probability of misclassification based solely on the count of '+' genes.
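The fall in misclassification probability as loci are added can be computed exactly from the binomial distribution. The sketch below assumes $k$ independent loci with '+' frequency $p$ in Population 1 and $1 - p$ in Population 2. It uses a simple decision rule chosen here for illustration (split the count of '+' genes at $k/2$, with ties decided by a fair coin), and the function name is hypothetical.

```python
from math import comb


def misclassification_probability(k, p=0.3):
    """Chance of assigning an individual to the wrong population.

    Assumed rule (for illustration): count the '+' genes over k loci and split
    at k / 2, breaking ties with a fair coin. Requires p < 0.5. By symmetry the
    error is the same for individuals from either population, so only an
    individual from Population 1 (count ~ Binomial(k, p)) is considered.
    """
    error = 0.0
    for count in range(k + 1):
        prob = comb(k, count) * p**count * (1 - p) ** (k - count)
        if 2 * count > k:       # looks like Population 2: misclassified
            error += prob
        elif 2 * count == k:    # tie: wrong half of the time
            error += 0.5 * prob
    return error


for k in (1, 5, 20, 100):
    print(k, misclassification_probability(k))
```

With a single locus the function returns $p$ itself, matching the single-locus case, and the error shrinks steadily as $k$ grows.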

To further illustrate the power of multiple loci in discrimination, Edwards proceeded to explore a scenario where the '+' and '-' labels were randomly swapped at each locus with a probability of 0.5, and the population of origin for each individual was unknown. Edwards showed that in this case, a cluster analysis that maximizes the between-cluster sum of squares (or equivalently, minimizes the sum of the within-cluster sums of squares) could successfully separate the populations.

By comparing pairs of individuals across the $k$ loci, Edwards found that the probability of a match at a given locus is $p^2 + q^2$ for individuals from the same population and $2pq$ for individuals from different populations. With $k$ loci, the number of matching loci between two individuals from the same population follows a binomial distribution with mean $k(p^2 + q^2)$ and variance $2kpq(p^2 + q^2)$. For individuals from different populations it follows a binomial distribution with mean $2kpq$ and the same variance. With $k = 100$, $p = 0.3$, and $q = 0.7$, the mean numbers of matches are 58 and 42 respectively, so the mean distances (numbers of mismatching loci) are 42 and 58, with variances of 24.36 and standard deviations of approximately 4.936. The two means are therefore more than 3 standard deviations apart, enabling effective classification, with the probability of misclassification approaching zero as the number of loci increases.
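These pairwise figures can be reproduced with a few lines of arithmetic. The sketch below assumes independent loci with '+' frequency $p$ in one population and $q = 1 - p$ in the other. The helper names are hypothetical, and "distance" is taken to mean the number of loci at which two individuals differ. It also checks that swapping the '+'/'-' labels at a locus, which exchanges $p$ and $q$ in both populations, leaves the match probabilities unchanged.

```python
from math import sqrt


def match_probability(freq_a, freq_b):
    """P(two individuals carry the same gene at one locus), given the '+'
    frequency in each individual's population."""
    return freq_a * freq_b + (1 - freq_a) * (1 - freq_b)


def distance_moments(k, prob_match):
    """Mean and variance of the number of mismatching loci out of k."""
    prob_mismatch = 1 - prob_match
    return k * prob_mismatch, k * prob_mismatch * prob_match


p, q, k = 0.3, 0.7, 100
same = match_probability(p, p)        # p^2 + q^2
different = match_probability(p, q)   # 2pq

# Relabelling a locus exchanges p and q in both populations; the match
# probabilities are symmetric in p and q, so nothing changes.
assert abs(match_probability(q, q) - same) < 1e-12
assert abs(match_probability(q, p) - different) < 1e-12

mean_same, var_same = distance_moments(k, same)
mean_diff, var_diff = distance_moments(k, different)
gap_in_sd = (mean_diff - mean_same) / sqrt(var_same)
print(mean_same, mean_diff, var_same, gap_in_sd)
```

The printed values should be close to 42, 58, 24.36, and 3.24. This invariance under relabelling is why the pairwise approach works without knowing which allele is '+' at each locus.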

Support and criticism

Evolutionary biologist Richard Dawkins discusses genetic variation across human races in his book The Ancestor's Tale.[5] In the chapter "The Grasshopper's Tale", he characterizes the genetic variation between races as a very small fraction of the total human genetic variation, but he disagrees with Lewontin's conclusions about taxonomy, writing: "However small the racial partition of the total variation may be, if such racial characteristics as there are highly correlate with other racial characteristics, they are by definition informative, and therefore of taxonomic significance."[5] Neven Sesardić has argued that, unbeknownst to Edwards, Jeffry B. Mitton had already made the same argument about Lewontin's claim in two articles published in The American Naturalist in the late 1970s.[13][14][15]

Biological anthropologist Jonathan M. Marks agrees with Edwards that correlations between geographical areas and genetics obviously exist in human populations but goes on to write:

What is unclear is what this has to do with 'race' as that term has been used through much in the twentieth century—the mere fact that we can find groups to be different and can reliably allot people to them is trivial. Again, the point of the theory of race was to discover large clusters of people that are principally homogeneous within and heterogeneous between, contrasting groups. Lewontin's analysis shows that such groups do not exist in the human species, and Edwards' critique does not contradict that interpretation.[7]

The view that while geographic clustering of biological traits does exist, this does not lend biological validity to racial groups, was proposed by several evolutionary anthropologists and geneticists prior to the publication of Edwards' critique of Lewontin.[16][17][18][19][20]

In the 2007 paper "Genetic Similarities Within and Between Human Populations",[10] Witherspoon et al. attempt to answer the question "How often is a pair of individuals from one population genetically more dissimilar than two individuals chosen from two different populations?" The answer depends on the number of polymorphisms used to define that dissimilarity, and the populations being compared. When they analysed three geographically distinct populations (European, African, and East Asian) and measured genetic similarity over many thousands of loci, the answer to their question was "never"; however, measuring similarity using smaller numbers of loci yielded substantial overlap between these populations. Rates of between-population similarity also increased when geographically intermediate and admixed populations were included in the analysis.[10]
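The dependence on the number of loci can be illustrated with a toy calculation. This is not the estimator or the data used by Witherspoon et al. It is a sketch under an assumed two-population model (independent loci, '+' frequency 0.3 in one population and 0.7 in the other), with hypothetical function names. Real loci are typically much less differentiated than this, so real overlap is greater than the sketch suggests.

```python
from math import comb


def binomial_pmf(k, prob):
    """Probabilities of 0..k successes in k independent trials."""
    return [comb(k, i) * prob**i * (1 - prob) ** (k - i) for i in range(k + 1)]


def prob_within_pair_more_dissimilar(k, p=0.3):
    """P(a same-population pair differs at more loci than a cross-population pair).

    Toy model (an assumption for illustration): k independent loci, '+'
    frequency p in one population and 1 - p in the other, and the two pairs
    drawn independently.
    """
    q = 1 - p
    within = binomial_pmf(k, 2 * p * q)        # mismatches, same population
    between = binomial_pmf(k, p * p + q * q)   # mismatches, different populations
    total = 0.0
    between_below = 0.0  # running P(cross-population distance < d)
    for d in range(k + 1):
        total += within[d] * between_below
        between_below += between[d]
    return total


for k in (1, 100, 400):
    print(k, prob_within_pair_more_dissimilar(k))
```

Under these assumptions the probability is appreciable for a single locus and becomes very small once hundreds of loci are used, in line with the qualitative pattern described above.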

Witherspoon et al. write:

Since an individual's geographic ancestry can often be inferred from his or her genetic makeup, knowledge of one's population of origin should allow some inferences about individual genotypes. To the extent that phenotypically important genetic variation resembles the variation studied here, we may extrapolate from genotypic to phenotypic patterns. ... However, the typical frequencies of alleles responsible for common complex diseases remain unknown. The fact that, given enough genetic data, individuals can be correctly assigned to their populations of origin is compatible with the observation that most human genetic variation is found within populations, not between them. It is also compatible with our finding that, even when the most distinct populations are considered and hundreds of loci are used, individuals are frequently more similar to members of other populations than to members of their own population. Thus, caution should be used when using geographic or genetic ancestry to make inferences about individual phenotypes.[10]

Witherspoon et al. add: "A final complication arises when racial classifications are used as proxies for geographic ancestry. Although many concepts of race are correlated with geographic ancestry, the two are not interchangeable, and relying on racial classifications will reduce predictive power still further."[10]

In a 2014 paper, reprinted in the 2018 Edwards Cambridge University Press volume, Rasmus Grønfeldt Winther argues that "Lewontin's fallacy" is effectively a misnomer, as there really are two different sets of methods and questions at play in studying the genomic population structure of our species: "variance partitioning" and "clustering analysis". According to Winther, they are "two sides of the same mathematics coin" and neither "necessarily implies anything about the reality of human groups".[8]

References

  1. ^ a b c d Edwards, A. W. F. (2003). "Human genetic diversity: Lewontin's fallacy". BioEssays. 25 (8): 798–801. doi:10.1002/bies.10315. PMID 12879450.
  2. ^ a b Lewontin, R. C. (1972). "The Apportionment of Human Diversity". Evolutionary Biology. pp. 381–398. doi:10.1007/978-1-4684-9063-3_14. ISBN 978-1-4684-9065-7. S2CID 21095796.
  3. ^ Rosenberg, N. (2018). "Variance-Partitioning and Classification in Human Population Genetics". In R.G. Winther (ed.). Phylogenetic Inference, Selection Theory, and History of Science: Selected Papers of AWF Edwards with Commentaries. pp. 399–403. ISBN 9781107111721.
  4. ^ Edwards, A.W.F. (2018). "Human Genetic Diversity: Lewontin's Fallacy". In R.G. Winther (ed.). Phylogenetic Inference, Selection Theory, and History of Science: Selected Papers of AWF Edwards with Commentaries. pp. 249–253. ISBN 9781107111721.
  5. ^ a b c Dawkins, R. (2005). The Ancestor's Tale: A Pilgrimage to the Dawn of Evolution. with additional research by Y. Wong. New York: Houghton Mifflin Harcourt. pp. 406–407. ISBN 9780618619160.
  6. ^ a b Ramachandran, S.; Tang, H.; Gutenkunst, R. N.; Bustamante, C. D. (2010). "Chapter 20: Genetics and Genomics of Human Population Structure" (PDF). In Speicher, M. R.; et al. (eds.). Vogel and Motulsky's Human Genetics: Problems and Approaches. Heidelberg: Springer. p. 596. doi:10.1007/978-3-540-37654-5. ISBN 978-3-540-37653-8. Archived from the original (PDF) on 3 December 2013. Retrieved 29 October 2013.
  7. ^ a b c d Marks, Jonathan M. (2010). "Ten Facts about Human Variation". In Muehlenbein, M. P. (ed.). Human Evolutionary Biology. Cambridge University Press. p. 270. ISBN 9781139789004.
  8. ^ a b Winther, R.G. (2018). "The Genetic Reification of "Race"? A Story of Two Mathematical Methods". In R.G. Winther (ed.). Phylogenetic Inference, Selection Theory, and History of Science: Selected Papers of AWF Edwards with Commentaries. pp. 489, 488–508. ISBN 9781107111721.
  9. ^ Winther, R.G. (2018). "Race and Biology". In Paul C. Taylor; Linda Martín Alcoff; Luvell Anderson (eds.). The Routledge Companion to the Philosophy of Race. pp. 305–320. ISBN 9781107111721.
  10. ^ a b c d e Witherspoon, David. J.; Wooding, S.; Rogers, A. R.; Marchani, E. E.; Watkins, W. S.; Batzer, M. A.; Jorde, L. B. (2007). "Genetic Similarities Within and Between Human Populations". Genetics. 176 (1): 351–359. doi:10.1534/genetics.106.067355. PMC 1893020. PMID 17339205.
  11. ^ Bhatt, C. (2010). "The spirit lives on: race and the disciplines". In Hill Collins, P.; Solomos, J. (eds.). The SAGE handbook of race and ethnic studies. London: SAGE. p. 115. ISBN 9780761942207.
  12. ^ a b McCabe, Linda L.; McCabe, Edward R. B. (2008). DNA: promise and peril. University of California Press. pp. 76–77. ISBN 9780520933934. Retrieved July 13, 2011.
  13. ^ Sesardić, Neven (2010). "Race: a social destruction of a biological concept". Biology & Philosophy. 25 (2): 143–162. CiteSeerX 10.1.1.638.939. doi:10.1007/s10539-009-9193-7. S2CID 3013094.
  14. ^ Mitton, J. B. (1977). "Genetic Differentiation of Races of Man as Judged by Single-Locus and Multilocus Analyses". The American Naturalist. 111 (978): 203–212. doi:10.1086/283155. S2CID 85018125.
  15. ^ Mitton, J. B. (1978). "Measurement of Differentiation: Reply to Lewontin, Powell, and Taylor". The American Naturalist. 112 (988): 1142–1144. doi:10.1086/283359. S2CID 86524123.
  16. ^ American Anthropological Association (1998). "American Anthropological Association Statement on 'Race'".
  17. ^ Weiss, K. M.; Fullerton, S. M. (2005). "Racing around, getting nowhere". Evolutionary Anthropology: Issues, News, and Reviews. 14 (5): 165. doi:10.1002/evan.20079. S2CID 84927946.
  18. ^ Graves, Joseph L. (2003). The Emperor's New Clothes: Biological Theories of Race at the Millennium. Rutgers University Press. ISBN 978-0-8135-2847-2.
  19. ^ Brace, C (2005). "Race" is a four-letter word : the genesis of the concept. New York: Oxford University Press. ISBN 9780195173512.
  20. ^ "RACE: Are We So Different? - Learn and Teach". www.aaanet.org.