Stepwise mutation model
The stepwise mutation model (SMM) is a mathematical theory, developed by Motoo Kimura and Tomoko Ohta, for investigating the equilibrium distribution of allele frequencies in a finite population in which neutral alleles are produced in a stepwise fashion.
The original model assumes that when an allele mutates, it changes state by exactly one step in either the positive or the negative direction. For repetitive regions of the genome, this means that a mutation adds or removes a single repeat unit, and that such mutations occur at a fixed rate. Allele states can therefore be indexed by integers (..., A₋₁, A₀, A₁, ...). The model also assumes random mating and that all alleles at a locus are selectively equivalent. The SMM differs from the Kimura-Crow model, also known as the infinite alleles model (IAM), in one notable respect: as the population size increases indefinitely while the product of the effective population size (Ne) and the mutation rate is held fixed, the mean number of different alleles in the population rapidly reaches a plateau, at a value close to the effective number of alleles (the reciprocal of homozygosity).
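The assumptions above can be illustrated with a small simulation. The following is a minimal sketch, not the analytical treatment of Kimura and Ohta: it assumes a Wright-Fisher style resampling of gene copies for random mating, symmetric one-step mutation, and arbitrary parameter values. The function names `simulate_smm` and `allele_counts_summary` are hypothetical helpers introduced only for this example.

```python
import random
from collections import Counter


def simulate_smm(n_gene_copies=200, mutation_rate=0.01, generations=500,
                 start_state=0, seed=1):
    """Simulate neutral allele states under strict one-step mutation.

    All parameter values are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    population = [start_state] * n_gene_copies
    for _ in range(generations):
        # Random mating: every gene copy of the next generation
        # draws its parent copy at random from the current one.
        population = [rng.choice(population) for _ in range(n_gene_copies)]
        # Stepwise mutation: a mutating copy moves exactly one
        # state up or down; all alleles are selectively equivalent.
        population = [
            state + rng.choice((-1, 1)) if rng.random() < mutation_rate else state
            for state in population
        ]
    return population


def allele_counts_summary(population):
    """Return (number of different alleles, effective number of alleles)."""
    counts = Counter(population)
    n = len(population)
    homozygosity = sum((c / n) ** 2 for c in counts.values())
    # The effective number of alleles is the reciprocal of homozygosity.
    return len(counts), 1.0 / homozygosity


population = simulate_smm()
n_alleles, n_effective = allele_counts_summary(population)
print("different alleles:", n_alleles)
print("effective number of alleles:", round(n_effective, 2))
```

Running the sketch with larger values of `n_gene_copies` and proportionally smaller values of `mutation_rate` is one way to explore how the number of different alleles behaves when the product of population size and mutation rate is held fixed.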
Differences in the length of simple sequence repeats (SSRs) between individuals can thus be used to construct phylogenies (i.e. to determine the relatedness of individuals) or to estimate the genetic distance between groups of individuals. For example, more genetically distant individuals are expected to show larger differences in SSR size than more closely related individuals. Because its assumptions suit loci that change by the gain or loss of repeat units, the SMM has been widely adopted for use with microsatellite markers, which contain repeat regions, are codominant, and have high rates of mutation.
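The idea that larger size differences indicate greater genetic distance can be sketched as follows. This is a deliberately simplified illustration, not a specific published distance statistic: it assumes one repeat count per locus per individual (a haploid simplification), and the function name, locus names, and genotypes are all hypothetical.

```python
def mean_squared_repeat_difference(genotype_a, genotype_b):
    """Average squared difference in repeat count over shared loci.

    Each genotype is a dict mapping a locus name to a repeat count.
    This is an illustrative measure, assumed for this example only.
    """
    shared_loci = sorted(set(genotype_a) & set(genotype_b))
    if not shared_loci:
        raise ValueError("the two genotypes share no loci")
    total = sum((genotype_a[locus] - genotype_b[locus]) ** 2
                for locus in shared_loci)
    return total / len(shared_loci)


# Hypothetical repeat counts at three microsatellite loci.
individual_1 = {"locus1": 10, "locus2": 15, "locus3": 8}
individual_2 = {"locus1": 11, "locus2": 15, "locus3": 9}
individual_3 = {"locus1": 14, "locus2": 19, "locus3": 5}

d12 = mean_squared_repeat_difference(individual_1, individual_2)
d13 = mean_squared_repeat_difference(individual_1, individual_3)
print(d12, d13)  # the smaller value suggests the more closely related pair
```

Squaring the differences reflects the stepwise assumption that alleles far apart in size are separated by more mutational steps than alleles close in size.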
The original SMM has been modified in multiple ways, including:
- taking into account the upper size limit of most microsatellites
- allowing for the tendency of large alleles to mutate at higher rates than small alleles
- splitting mutations between point mutations that disrupt stretches of repeats and additions or removals of repeat units. This last modification provides an explanation for why microsatellites do not evolve into arrays of enormous size.
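The three modifications listed above can be combined in a single mutation step, sketched below. This is a hypothetical illustration rather than any published model: the linear dependence of the mutation rate on allele length, the rule that an interrupted array keeps its longer uninterrupted stretch, and every default parameter value are assumptions made for this example.

```python
import random


def slippage_rate(repeats, base_rate, rate_per_repeat):
    """Assumed length-dependent rate: longer alleles mutate more often."""
    return min(1.0, base_rate + rate_per_repeat * repeats)


def mutate_repeat_count(repeats, rng, base_rate=1e-3, rate_per_repeat=2e-4,
                        interruption_rate=1e-4, min_repeats=1, max_repeats=40):
    """Apply one generation of mutation to a repeat count (hypothetical)."""
    # Point mutation: one repeat unit is disrupted, which splits the
    # array in two; the longer uninterrupted stretch is kept.
    if repeats > min_repeats and rng.random() < interruption_rate:
        hit = rng.randrange(repeats)
        return max(min_repeats, hit, repeats - 1 - hit)
    # Stepwise change: gain or loss of one repeat unit, bounded above
    # by max_repeats and below by min_repeats.
    if rng.random() < slippage_rate(repeats, base_rate, rate_per_repeat):
        step = rng.choice((-1, 1))
        return min(max_repeats, max(min_repeats, repeats + step))
    return repeats


rng = random.Random(42)
allele = 20
for _ in range(10_000):
    allele = mutate_repeat_count(allele, rng)
print(allele)
```

In this sketch the upper bound and the occasional interruptions both act against unlimited growth of the repeat array, which is the effect the modifications are meant to capture.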
A number of summary statistics can be used to estimate genetic differentiation under the SMM. These include the number of alleles, observed and expected heterozygosity, and allele frequencies. The SMM takes into account the frequency of mismatches between microsatellite loci, that is, how often comparisons show no mismatch, a single mismatch, two mismatches, and so on. Variance in allele size is used to make inferences about the genetic distance between individuals or populations. By comparing summary statistics at different levels of organization, it is possible to make inferences about population histories. For example, the variance of allele size within a subpopulation can be compared with the variance within the total population to infer something about population history.
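The summary statistics named above can be computed directly from genotype data. The sketch below assumes diploid genotypes at a single locus, written as pairs of repeat counts, and uses the simple (uncorrected for sample size) expected heterozygosity of one minus the sum of squared allele frequencies. The function names and the two small subpopulation samples are hypothetical.

```python
from collections import Counter
from statistics import pvariance


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def summary_statistics(genotypes):
    """Summary statistics for one locus; genotypes are (size, size) pairs."""
    alleles = [size for genotype in genotypes for size in genotype]
    counts = Counter(alleles)
    freqs = {size: count / len(alleles) for size, count in sorted(counts.items())}
    return {
        "n_alleles": len(counts),
        "allele_frequencies": freqs,
        "observed_heterozygosity": observed_heterozygosity(genotypes),
        # Uncorrected estimate: 1 - sum of squared allele frequencies.
        "expected_heterozygosity": 1.0 - sum(p * p for p in freqs.values()),
        "allele_size_variance": pvariance(alleles),
    }


# Hypothetical samples from two subpopulations (repeat counts).
subpop_a = [(10, 10), (10, 11), (11, 11), (10, 11)]
subpop_b = [(14, 15), (15, 15), (14, 14), (15, 16)]

stats_a = summary_statistics(subpop_a)
stats_b = summary_statistics(subpop_b)
stats_total = summary_statistics(subpop_a + subpop_b)

for name, stats in (("A", stats_a), ("B", stats_b), ("total", stats_total)):
    print(name, stats["n_alleles"], stats["allele_size_variance"])
```

In this made-up data set the variance in allele size is much larger in the pooled sample than within either subpopulation, the kind of contrast between levels of organization that the comparison relies on.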
Construction of phylogenies under the SMM is, however, complicated by the fact that a repeat unit can be either gained or lost, so alleles that are identical in size are not necessarily identical by descent (i.e. they show marker-size homoplasy). The SMM therefore cannot be used to determine the exact number of mutational events separating two individuals. For example, individual A might have gained a single repeat (from an ancestor who had 9), whereas individual B might have lost a single repeat (from an ancestor who had 11), leaving both individuals with the same number of microsatellite repeats (10) at that locus.
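The example above, and the more general reason why the number of mutational events cannot be recovered from allele sizes, can be sketched in a few lines. The helper names are hypothetical; the counting function simply enumerates how many sequences of one-step gains and losses produce a given net change in size.

```python
from math import comb


def apply_steps(ancestral_repeats, steps):
    """Apply a sequence of +1 / -1 mutations to an ancestral repeat count."""
    return ancestral_repeats + sum(steps)


def paths_with_net_change(n_mutations, net_change):
    """Number of +1/-1 sequences of length n_mutations with this net change."""
    if (n_mutations + net_change) % 2 or abs(net_change) > n_mutations:
        return 0
    gains = (n_mutations + net_change) // 2
    return comb(n_mutations, gains)


# The example from the text: different ancestors, identical descendants.
individual_a = apply_steps(9, [+1])
individual_b = apply_steps(11, [-1])
print(individual_a, individual_b)

# Two alleles of the same size (net change 0) are compatible with
# 0, 2, 4, ... mutational events on the lineage connecting them.
for n in range(5):
    print(n, paths_with_net_change(n, 0))
```

Because any even number of steps can cancel out to a net change of zero, an observed size difference gives only a lower bound on the number of mutations that actually occurred.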
Some important caveats and limitations to consider when choosing molecular markers for estimating the relatedness of individuals or distinguishing between populations include the following:
- There are limitations associated with the various marker types, and the number of markers used can heavily influence analytical results (with a higher number of markers generally giving a greater ability to resolve genetic differences).
- Molecular markers provide only a “sample” of the genetic information with which to compare individuals or populations, and the differentiation they show can differ from the actual genetic differentiation. For example, two individuals may be identical at a given locus, having inherited the same mutation from their common ancestor, but differ at other loci that were not observed (or sequenced).
- Kimura, M., & Ohta, T. (1978). Stepwise mutation model and distribution of allelic frequencies in a finite population. Proceedings of the National Academy of Sciences, 75(6), 2868-2872.
- Valdes, A. M., Slatkin, M., & Freimer, N. B. (1993). Allele frequencies at microsatellite loci: the stepwise mutation model revisited. Genetics, 133(3). ISSN 0016-6731.
- Chen, X., Cho, Y., & McCouch, S. (2002). Sequence divergence of rice microsatellites in Oryza and other plant species. Molecular Genetics and Genomics, 268(3), 331-343.
- Ellegren, H. (2004). Microsatellites: simple sequences with complex evolution. Nature Reviews Genetics, 5, 435-445.
- Laval, G., SanCristobal, M., & Chevalet, C. (2002). Measuring genetic distances between breeds: use of some distances in various short term evolution models. Genet. Sel. Evol., 34, 481-507.
- Estoup, A., Jarne, P., & Cornuet, J. M. (2002). Homoplasy and mutation model at microsatellite loci and their consequences for population genetics analysis. Molecular ecology, 11(9), 1591-1604.