Molecular evolution is a change in the sequence composition of cellular molecules such as DNA, RNA, and proteins over long periods of time. The field of molecular evolution uses the principles of evolutionary biology and population genetics to explain biological change at the molecular and cellular level. Major topics in molecular evolution concern the rates and impacts of single nucleotide change, neutral evolution vs. natural selection, origins of new genes, the genetic nature of complex traits, the genetic basis of speciation, evolution of development, and ways that evolutionary forces influence genomic and phenotypic changes.
- 1 Forces in molecular evolution
- 2 Genome architecture
- 3 Origins of new genes
- 4 Molecular phylogenetics
- 5 The driving forces of evolution
- 6 Journals and societies
- 7 See also
- 8 References
- 9 Further reading
Forces in molecular evolution
The content and structure of a genome is the product of the molecular and population genetic forces which act upon that genome. Novel genetic variants will arise through mutation and will spread and be maintained in populations due to genetic drift or natural selection.
Mutations are permanent, transmissible changes to the genetic material (DNA or RNA) of a cell or virus. Mutations result from errors in DNA replication during cell division, from exposure to radiation, chemicals, and other environmental stressors, or from the activity of viruses and transposable elements. Most mutations that occur are single nucleotide polymorphisms, which modify single bases of the DNA sequence. Other types of mutations modify larger segments of DNA and can cause duplications, insertions, deletions, inversions, and translocations.
Most organisms display a strong bias in the types of mutations that occur, and this bias strongly influences GC content. Transitions (A ↔ G or C ↔ T) are more common than transversions (purine ↔ pyrimidine) and are less likely to alter amino acid sequences of proteins.
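The two substitution classes named above can be told apart mechanically: a transition stays within the purines (A, G) or within the pyrimidines (C, T), while a transversion crosses between them. The following is a minimal sketch, not a standard library routine; the function names `classify_substitution` and `count_ts_tv` are hypothetical, and the sequences are assumed to be pre-aligned DNA strings of equal length with no gaps or ambiguity codes.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("identical bases are not a substitution")
    # Same chemical class (purine-purine or pyrimidine-pyrimidine) = transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_ts_tv(seq_a, seq_b):
    """Count (transitions, transversions) between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b:
            continue  # identical sites carry no substitution
        if classify_substitution(a, b) == "transition":
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions
```

For example, `count_ts_tv("ACGT", "GCTT")` compares four sites and finds one transition (A to G) and one transversion (G to T).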
Mutations are stochastic and typically occur randomly across genes. Mutation rates for single nucleotide sites for most organisms are very low, roughly 10⁻⁹ to 10⁻⁸ per site per generation, though some viruses have higher mutation rates on the order of 10⁻⁶ per site per generation. Among these mutations, some will be neutral or beneficial and will remain in the genome unless lost via genetic drift, and others will be detrimental and will be eliminated from the genome by natural selection.
Because mutations are extremely rare, they accumulate very slowly across generations. While the number of mutations which appears in any single generation may vary, over very long time periods they will appear to accumulate at a regular pace. Using the mutation rate per generation and the number of nucleotide differences between two sequences, divergence times can be estimated effectively via the molecular clock.
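The molecular clock calculation described above can be sketched in a few lines. This is an illustration under simplifying assumptions that the text does not state: the two sequences are already aligned, the differences are neutral, both lineages accumulate changes independently at the same rate (so the expected divergence per site is 2 × rate × time), and no correction is made for multiple changes at the same site. The function names are hypothetical.

```python
def count_differences(seq_a, seq_b):
    """Number of sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def divergence_time_generations(n_differences, n_sites, mutation_rate):
    """Estimate generations since two lineages split.

    mutation_rate is per site per generation. The factor of 2 reflects
    the assumption that both lineages have been accumulating changes
    since the split.
    """
    if n_sites <= 0 or mutation_rate <= 0:
        raise ValueError("n_sites and mutation_rate must be positive")
    per_site_divergence = n_differences / n_sites
    return per_site_divergence / (2 * mutation_rate)
```

As a worked case, 20 differences across 1000 aligned sites at a rate of 10⁻⁸ per site per generation gives a per-site divergence of 0.02, and 0.02 / (2 × 10⁻⁸) is about one million generations. Converting to years would require a generation time, which is a further assumption.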
Recombination is a process that results in genetic exchange between chromosomes or chromosomal regions. Recombination counteracts physical linkage between adjacent genes, thereby reducing genetic hitchhiking. The resulting independent inheritance of genes results in more efficient selection, meaning that regions with higher recombination will harbor fewer detrimental mutations, more selectively favored variants, and fewer errors in replication and repair. Recombination can also generate particular types of mutations if chromosomes are misaligned.
Gene conversion is a type of recombination that is the product of DNA repair where nucleotide damage is corrected using homologous genomic regions as a template. Damaged bases are first excised, the damaged strand is then aligned with an undamaged homolog, and DNA synthesis repairs the excised region using the undamaged strand as a guide. Gene conversion is often responsible for homogenizing the sequence of duplicate genes over long time periods, reducing nucleotide divergence.
Genetic drift is the change of allele frequencies from one generation to the next due to stochastic effects of random sampling in finite populations. If organisms are selected at random to reproduce, some variants may enter the subsequent generation more or less often, simply due to chance. Genetic drift will be more severe in smaller populations. Genetic drift reduces the efficiency of natural selection and sexual selection and can result in the accumulation of detrimental mutations in spite of strong negative effects on the organism. Detrimental mutations which have smaller negative effects can be 'nearly neutral' and their prevalence in populations will be governed largely by the effects of genetic drift. Many genomic features have been ascribed to accumulation of nearly neutral detrimental mutations as a result of small population sizes.
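Random sampling in a finite population can be illustrated with a small simulation. The sketch below uses the Wright-Fisher model, which is a swapped-in textbook idealization rather than anything specified in the text: a diploid population of constant size, non-overlapping generations, no selection and no new mutation. The function name `wright_fisher` and its parameters are hypothetical.

```python
import random


def wright_fisher(pop_size, init_freq, generations, seed=None):
    """Simulate the frequency of one allele under pure drift.

    pop_size   -- number of diploid individuals (2 * pop_size gene copies)
    init_freq  -- starting allele frequency, between 0 and 1
    Returns the list of frequencies, one per generation, including the start.
    """
    if not 0.0 <= init_freq <= 1.0:
        raise ValueError("init_freq must be between 0 and 1")
    rng = random.Random(seed)  # private generator so runs are repeatable
    n_copies = 2 * pop_size
    freq = init_freq
    trajectory = [freq]
    for _ in range(generations):
        # Each gene copy in the next generation is drawn at random
        # from the current pool (binomial sampling).
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        freq = count / n_copies
        trajectory.append(freq)
    return trajectory
```

Running `wright_fisher(10, 0.5, 100, seed=1)` and `wright_fisher(10000, 0.5, 100, seed=1)` side by side is one way to see the stronger fluctuations in the smaller population. Once the frequency reaches 0 or 1 it stays there, since no mutation reintroduces the lost allele.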
Mutations with 4Nₑs < 1, where Nₑ is the effective population size and s is the selection coefficient, will be effectively neutral. Hence, when populations are small, a larger variety of mutations will behave as if they are neutral due to inefficiency of selection.
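The threshold above is easy to apply numerically. In this sketch the function name is hypothetical, and taking the absolute value of the selection coefficient is an assumption made here so that detrimental mutations (negative s) and beneficial ones (positive s) are treated symmetrically; the text itself states the condition without a sign convention.

```python
def is_effectively_neutral(effective_pop_size, selection_coefficient):
    """Apply the 4 * Ne * s < 1 rule of thumb from the text.

    The absolute value is an assumption so that the sign of s
    (beneficial vs. detrimental) does not matter.
    """
    if effective_pop_size <= 0:
        raise ValueError("effective_pop_size must be positive")
    return abs(4 * effective_pop_size * selection_coefficient) < 1
```

With a selection coefficient of magnitude 10⁻⁵, the product is 0.04 in a population with an effective size of 1,000, so the mutation behaves as neutral, but it is 40 in a population of 1,000,000, where selection can act on it.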
Selection occurs when organisms with greater fitness, i.e. greater ability to survive or reproduce, are favored in subsequent generations, thereby increasing the frequency of the underlying genetic variants in a population. Selection can be the product of natural selection, artificial selection, or sexual selection. Natural selection is any selective process that occurs due to the fitness of an organism to its environment. In contrast, sexual selection is a product of mate choice and can favor the spread of genetic variants which act counter to natural selection but increase desirability to the opposite sex or increase mating success. Artificial selection, also known as selective breeding, is imposed by an outside entity, typically humans, in order to increase the frequency of desired traits.
The principles of population genetics apply similarly to all types of selection, though in fact each may produce distinct effects due to clustering of genes with different functions in different parts of the genome, or due to different properties of genes in particular functional classes. For instance, sexual selection could be more likely to affect molecular evolution of the sex chromosomes due to clustering of sex-specific genes on the X, Y, Z, or W chromosomes.
Selection can operate at the gene level at the expense of organismal fitness, resulting in a selective advantage for selfish genetic elements in spite of a host cost. Examples of such selfish elements include transposable elements, meiotic drivers, killer X chromosomes, selfish mitochondria, and self-propagating introns. (See Intragenomic conflict.)
Genome architecture
Genome size is influenced by the amount of repetitive DNA as well as number of genes in an organism. The C-value paradox refers to the lack of correlation between organism 'complexity' and genome size. Explanations for the so-called paradox are two-fold. First, repetitive genetic elements can comprise large portions of the genome for many organisms, thereby inflating DNA content of the haploid genome. Secondly, the number of genes is not necessarily indicative of the number of developmental stages or tissue types in an organism. An organism with few developmental stages or tissue types may have large numbers of genes that influence non-developmental phenotypes, inflating gene content relative to developmental gene families.
Neutral explanations for genome size suggest that when population sizes are small, many mutations become nearly neutral. Hence, in small populations repetitive content and other 'junk' DNA can accumulate without placing the organism at a competitive disadvantage. There is little evidence to suggest that genome size is under strong widespread selection in multicellular eukaryotes. Genome size, independent of gene content, correlates poorly with most physiological traits and many eukaryotes, including mammals, harbor very large amounts of repetitive DNA.
However, birds likely have experienced strong selection for reduced genome size, in response to changing energetic needs for flight. Birds, unlike humans, produce nucleated red blood cells, and larger nuclei lead to lower levels of oxygen transport. Bird metabolism is far higher than that of mammals, due largely to flight, and oxygen needs are high. Hence, most birds have small, compact genomes with few repetitive elements. Indirect evidence suggests that non-avian theropod dinosaur ancestors of modern birds also had reduced genome sizes, consistent with endothermy and high energetic needs for running speed. Many bacteria have also experienced selection for small genome size, as time of replication and energy consumption are so tightly correlated with fitness.
Transposable elements are self-replicating, selfish genetic elements which are capable of proliferating within host genomes. Many transposable elements are related to viruses, and share several proteins in common.
DNA transposons are cut and paste transposable elements which excise DNA and move it to alternate sections of the genome.
Alu elements comprise over XX % of the human genome. They are short non-autonomous repeat sequences.
Chromosome number and organization
The number of chromosomes in an organism's genome also does not necessarily correlate with the amount of DNA in its genome. The ant Myrmecia pilosula has only a single pair of chromosomes, whereas the adder's-tongue fern Ophioglossum reticulatum has up to 1260 chromosomes. Ciliate genomes house each gene in individual chromosomes, resulting in a genome which is not physically linked. Reduced linkage through creation of additional chromosomes should effectively increase the efficiency of selection.
Changes in chromosome number can play a key role in speciation, as differing chromosome numbers can serve as a barrier to reproduction in hybrids. Human chromosome 2 was created from a fusion of two ancestral chromosomes that remain separate in chimpanzees, and it still contains central telomeres as well as a vestigial second centromere. Polyploidy, especially allopolyploidy, which occurs often in plants, can also result in reproductive incompatibilities with parental species. Agrodiaetus blue butterflies have diverse chromosome numbers ranging from n=10 to n=134 and additionally have one of the highest rates of speciation identified to date.
Gene content and distribution
Different organisms house different numbers of genes within their genomes as well as different patterns in the distribution of genes throughout the genome. Some organisms, such as most bacteria, Drosophila, and Arabidopsis have particularly compact genomes with little repetitive content or non-coding DNA. Other organisms, like mammals or maize, have large amounts of repetitive DNA, long introns, and substantial spacing between different genes. The content and distribution of genes within the genome can influence the rate at which certain types of mutations occur and can influence the subsequent evolution of different species. Genes with long introns are more likely to recombine due to increased physical distance over the coding sequence. As such, long introns may facilitate ectopic recombination, and result in higher rates of new gene formation.
In addition to the nuclear genome, endosymbiont organelles contain their own genetic material, typically as circular DNA molecules. Mitochondrial and chloroplast DNA varies across taxa, but membrane-bound proteins, especially electron transport chain constituents, are most often encoded in the organelle. Chloroplasts and mitochondria are maternally inherited in most species, as the organelles must pass through the egg. In a rare departure, some species of mussels are known to inherit mitochondria from father to son.
Origins of new genes
New genes arise from several different genetic mechanisms including gene duplication, de novo origination, retrotransposition, chimeric gene formation, recruitment of non-coding sequence, and gene truncation. In gene duplication, a gene sequence is copied to create redundancy. Duplicated gene sequences can then mutate to develop new functions or to specialize so that each new gene performs a subset of the original ancestral functions.
Retrotransposition creates new genes by copying mRNA to DNA and inserting it into the genome. Retrogenes often insert into new genomic locations, and often develop new expression patterns and functions. Chimeric genes form when duplication, deletion, or incomplete retrotransposition combine portions of two different coding sequences to produce a novel gene sequence. Chimeras often cause regulatory changes and can shuffle protein domains to produce novel adaptive functions.
Novel genes can also arise from previously non-coding DNA. For instance, Levine and colleagues reported the origin of five new genes in the D. melanogaster genome from noncoding DNA. Similar de novo origin of genes has also been shown in other organisms such as yeast, rice, and humans. De novo genes may evolve from transcripts that are already expressed at low levels.
Recruitment of formerly non-coding sequence can similarly form new genes when the stop codon is removed or shifted, or when a partial duplication copies the 5 prime end of a gene. This newly recruited sequence can contribute to peptide sequence, creating genes with altered functions.
Molecular phylogenetics
Molecular systematics is a product of the traditional fields of systematics and molecular genetics. It uses DNA, RNA, or protein sequences to resolve questions in systematics, i.e. about the correct scientific classification or taxonomy of organisms from the point of view of evolutionary biology.
Molecular systematics has been made possible by the availability of techniques for DNA sequencing, which allow the determination of the exact sequence of nucleotides or bases in either DNA or RNA. At present it is still a long and expensive process to sequence the entire genome of an organism, and this has been done for only a few species. However, it is quite feasible to determine the sequence of a defined area of a particular chromosome. Typical molecular systematic analyses require the sequencing of around 1000 base pairs.
The driving forces of evolution
Depending on the relative importance assigned to the various forces of evolution, three perspectives provide evolutionary explanations for molecular evolution.
While recognizing the importance of random drift for silent mutations, selectionist hypotheses argue that balancing and positive selection are the driving forces of molecular evolution. Those hypotheses are often based on the broader view called panselectionism, the idea that selection is the only force strong enough to explain evolution, relegating random drift and mutations to minor roles.
Neutralist hypotheses emphasize the importance of mutation, purifying selection, and random genetic drift. The introduction of the neutral theory by Kimura, quickly followed by King and Jukes' own findings, led to a fierce debate about the relevance of neodarwinism at the molecular level. The neutral theory of molecular evolution states that most mutations are deleterious and quickly removed by natural selection, but of the remaining ones, the vast majority are neutral with respect to fitness while the number of advantageous mutations is vanishingly small. The fate of neutral mutations is governed by genetic drift, and these mutations contribute to both nucleotide polymorphism and fixed differences between species.
Mutationist hypotheses emphasize random drift and biases in mutation patterns. Sueoka was the first to propose a modern mutationist view. He proposed that the variation in GC content was not the result of positive selection, but a consequence of the GC mutational pressure.
Journals and societies
Journals dedicated to molecular evolution include Molecular Biology and Evolution, Journal of Molecular Evolution, and Molecular Phylogenetics and Evolution. Research in molecular evolution is also published in journals of genetics, molecular biology, genomics, systematics, or evolutionary biology. The Society for Molecular Biology and Evolution publishes the journal "Molecular Biology and Evolution" and holds an annual international meeting.
References
- Lynch, M. (2007). The Origins of Genome Architecture. Sinauer. ISBN 0-87893-484-7.
- Organ, C. L., A. M. Shedlock, A. Meade, M. Pagel, S. V. Edwards. (2007). Origin of avian genome size and structure in nonavian dinosaurs. Nature. 446: 180-184
- Crosland, M.W.J., Crozier, R.H. (1986). "Myrmecia pilosula, an ant with only one pair of chromosomes". Science 231 (4743): 1278. Bibcode:1986Sci...231.1278C. doi:10.1126/science.231.4743.1278. PMID 17839565.
- Gerardus J. H. Grubben (2004). Vegetables. PROTA. p. 404. ISBN 978-90-5782-147-9. Retrieved 10 March 2013.
- Nikolai P. Kandul, Vladimir A. Lukhtanov, Naomi E. Pierce (2007). "Karyotypic diversity and speciation in Agrodiaetus butterflies". Evolution 61 (3): 546–559. doi:10.1111/j.1558-5646.2007.00046.x
- Tautz, Diethard and Domazet-Lošo, Tomislav (2011). "The evolutionary origin of orphan genes". Nature Reviews Genetics 12 (10): 692–702. doi:10.1038/nrg3053.
- Levine MT, Jones CD, Kern AD et al (2006). "Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression". Proc Natl Acad Sci USA 103 (26): 9935–9939. doi:10.1073/pnas.0509809103. PMC 1502557. PMID 16777968.
- Zhou Q, Zhang G, Zhang Y et al (2008). "On the origin of new genes in Drosophila". Genome Res 18 (9): 1446–1455. doi:10.1101/gr.076588.108. PMC 2527705. PMID 18550802.
- Cai J, Zhao R, Jiang H et al (2008). "De novo origination of a new protein-coding gene in Saccharomyces cerevisiae". Genetics 179 (1): 487–496. doi:10.1534/genetics.107.084491. PMC 2390625. PMID 18493065.
- Xiao W, Liu H, Li Y et al (2009). El-Shemy, Hany A, ed. "A rice gene of de novo origin negatively regulates pathogen-induced defense response". PLoS One 4 (2): e4603. doi:10.1371/journal.pone.0004603. PMC 2643483. PMID 19240804.
- Knowles DG, McLysaght A (2009). "Recent de novo origin of human protein-coding genes". Genome Res 19 (10): 1752–1759. doi:10.1101/gr.095026.109. PMC 2765279. PMID 19726446.
- Wilson, Ben A.; Joanna Masel (2011). "Putatively Noncoding Transcripts Show Extensive Association with Ribosomes". Genome Biology & Evolution 3: 1245–1252. doi:10.1093/gbe/evr099.
- Graur, D. and Li, W.-H. (2000). Fundamentals of molecular evolution. Sinauer. ISBN 0-87893-266-6.
- Gillespie, J. H (1991). The Causes of Molecular Evolution. Oxford University Press, New York. ISBN 0-19-506883-1.
- Kimura, M. (1983). The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge. ISBN 0-521-23109-4.
- Kimura, Motoo (1968). "Evolutionary rate at the molecular level". Nature 217 (5129): 624–626. doi:10.1038/217624a0. PMID 5637732.
- King, J.L. and Jukes, T.H. (1969). "Non-Darwinian Evolution". Science 164 (3881): 788–798. doi:10.1126/science.164.3881.788. PMID 5767777.
- Nachman M. (2006). "Detecting selection at the molecular level". In: C.W. Fox and J.B. Wolf, ed. Evolutionary Genetics: Concepts and Case Studies. pp. 103–118.
- The nearly neutral theory expanded the neutralist perspective, suggesting that several mutations are nearly neutral, which means both random drift and natural selection are relevant to their dynamics.
- Ohta, T (1992). "The nearly neutral theory of molecular evolution". Annual Review of Ecology and Systematics 23 (1): 263–286. doi:10.1146/annurev.es.23.110192.001403.
- Nei, M. (2005). "Selectionism and Neutralism in Molecular Evolution". Molecular Biology and Evolution 22 (12): 2318–2342. doi:10.1093/molbev/msi242. PMC 1513187. PMID 16120807.
- Sueoka, N. (1964). "On the evolution of informational macromolecules". In: Bryson, V. and Vogel, H.J. Evolving genes and proteins. Academic Press, New York. pp. 479–496.
Further reading
- Li, W.-H. (2006). Molecular Evolution. Sinauer. ISBN 0-87893-480-4.
- Lynch, M. (2007). The Origins of Genome Architecture. Sinauer. ISBN 0-87893-484-7.
- A. Meyer (Editor), Y. van de Peer, "Genome Evolution: Gene and Genome Duplications and the Origin of Novel Gene Functions", 2003, ISBN 978-1-4020-1021-7
- T. Ryan Gregory, "The Evolution of the Genome", 2004, ISBN 978-0123014634