A gene family is a set of homologous genes within one organism. A gene cluster is part of a gene family: it is a group of two or more genes found within an organism's DNA that encode similar polypeptides, or proteins, which collectively share a generalized function and are often located within a few thousand base pairs of each other. The size of gene clusters can vary significantly, from a few genes to several hundred genes. Portions of the DNA sequence of each gene within a gene cluster are identical; however, the protein produced by each gene is distinct from the protein produced by any other gene within the cluster. Genes in a gene cluster may be found near one another on the same chromosome or on different but homologous chromosomes. An example of a gene cluster is the Hox gene cluster, which is made up of eight genes and is part of the Homeobox gene family.
Historically, four models have been proposed for the formation and persistence of gene clusters.
Gene duplication and divergence
This model has been generally accepted since the mid-1970s. It postulates that gene clusters were formed as a result of gene duplication and divergence. Clusters thought to have formed this way include the Hox gene cluster, the human β-globin gene cluster, and four clustered human growth hormone (hGH)/chorionic somatomammotropin genes.
Conserved gene clusters, such as Hox and the human β-globin gene cluster, may be formed as a result of the process of gene duplication and divergence. A gene is duplicated during cell division, so that its descendants have two end-to-end copies of the gene where the ancestor had one copy, initially coding for the same protein or otherwise having the same function. In the course of subsequent evolution, the copies diverge, so that the products they code for have different but related functions, with the genes still being adjacent on the chromosome. Ohno theorized that the origin of new genes during evolution was dependent on gene duplication. If only a single copy of a gene existed in the genome of a species, the protein encoded by this gene would be essential to the survival of that species. Because there was only a single copy, the gene could not undergo mutations that might potentially result in new genes; gene duplication, however, allows mutations to accumulate in the duplicated copy of an essential gene, which would ultimately give rise to new genes over the course of evolution.
Mutations in the duplicated copy were tolerated because the original copy retained the genetic information for the essential gene's function. Species that have gene clusters are thought to have a selective evolutionary advantage, because natural selection must keep the genes together. Over a short span of time, the new genetic information carried by the duplicated copy of the essential gene would not confer a practical advantage; over a long evolutionary time period, however, the duplicated copy may undergo additional and drastic mutations, so that its protein serves a different role from that of the original essential gene. Over this long period, the two similar genes would diverge until the protein of each gene was unique in its function. Hox gene clusters of various sizes are found among several phyla.
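The duplication-and-divergence process described above can be illustrated with a minimal Python sketch. It is a toy model rather than a published method: the ten-base sequence, the chosen substitutions, and the function names `tandem_duplicate`, `mutate`, and `identity` are all hypothetical and serve only to show one copy staying intact while its adjacent duplicate accumulates changes.

```python
def tandem_duplicate(chromosome, index):
    """Return a new gene list with a copy of chromosome[index] placed right after it."""
    return chromosome[:index + 1] + [chromosome[index]] + chromosome[index + 1:]


def mutate(sequence, substitutions):
    """Apply point substitutions given as {position: new_base} (0-based positions)."""
    bases = list(sequence)
    for position, base in substitutions.items():
        bases[position] = base
    return "".join(bases)


def identity(seq_a, seq_b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


# Hypothetical single-copy ancestral gene (made-up sequence).
ancestral = ["ATGGCCAAGT"]

# Step 1: duplication gives two end-to-end copies with the same sequence.
duplicated = tandem_duplicate(ancestral, 0)

# Step 2: divergence. The original copy (index 0) is left intact, while the
# duplicate (index 1) accumulates substitutions (arbitrary example changes).
diverged = [duplicated[0], mutate(duplicated[1], {3: "A", 7: "C"})]

print(diverged)
print(identity(diverged[0], diverged[1]))
```

In this toy example the two copies remain adjacent and still share most of their sequence, mirroring the observation that clustered genes are partly identical yet encode distinct products.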
When gene duplication occurs to produce a gene cluster, one or multiple genes may be duplicated at once. In the case of the Hox genes, a shared ancestral ProtoHox cluster was duplicated, resulting in the Hox gene cluster as well as the ParaHox gene cluster, an evolutionary sister complex of the Hox cluster. The exact number of genes contained in the duplicated ProtoHox cluster is unknown; however, models exist suggesting that it originally contained four, three, or two genes.
When a gene cluster is duplicated, some genes may be lost. The loss of genes depends on the number of genes originally present in the gene cluster. In the four-gene model, the ProtoHox cluster contained four genes and gave rise to two twin clusters: the Hox cluster and the ParaHox cluster. In the two-gene model, as its name indicates, the Hox cluster and the ParaHox cluster arose from a ProtoHox cluster that contained only two genes. The three-gene model was originally proposed in conjunction with the four-gene model; however, rather than the Hox cluster and the ParaHox cluster arising from a cluster containing three genes, it holds that they resulted from single-gene tandem duplication, which produces identical genes adjacent to one another on the same chromosome. This was independent of duplication of the ancestral ProtoHox cluster.
Cis vs. trans duplication
Gene duplication may occur via cis-duplication or trans-duplication. Cis-duplication, or intrachromosomal duplication, entails the duplication of genes within the same chromosome, whereas trans-duplication, or interchromosomal duplication, consists of duplicating genes on neighboring but separate chromosomes. The Hox cluster and the ParaHox cluster were both formed by intrachromosomal duplication, although they were initially thought to have been formed by interchromosomal duplication.
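The distinction can be stated as a simple rule, sketched below in Python. The function name `classify_duplication` and the chromosome labels are illustrative assumptions, not part of any standard tool.

```python
def classify_duplication(source_chromosome, copy_chromosome):
    """Label a duplication by where the copy lies relative to the original gene.

    "cis"   = intrachromosomal: the copy is on the same chromosome.
    "trans" = interchromosomal: the copy is on a separate chromosome.
    """
    return "cis" if source_chromosome == copy_chromosome else "trans"


# Hypothetical chromosome labels, for illustration only.
print(classify_duplication("chr7", "chr7"))   # same chromosome
print(classify_duplication("chr7", "chr12"))  # separate chromosomes
```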
Fisher Model
The Fisher Model was proposed in 1930 by Ronald Fisher. Under the Fisher Model, gene clusters are a result of two alleles working well with one another. In other words, gene clusters may exhibit co-adaptation. The Fisher Model was considered unlikely and later dismissed as an explanation for gene cluster formation.
Coregulation Model
Under the coregulation model, genes are organized into clusters, each consisting of a single promoter and a cluster of coding sequences, which are therefore co-regulated, showing coordinated gene expression. Coordinated gene expression was once considered to be the most common mechanism driving the formation of gene clusters. However, coregulation, and thus coordinated gene expression, cannot drive the formation of gene clusters.
Molarity Model
The Molarity Model considers the constraints of cell size. Transcribing and translating genes together is beneficial to the cell; thus, the formation of clustered genes generates a high local concentration of cytoplasmic protein products. Spatial segregation of protein products has been observed in bacteria; however, the Molarity Model does not consider co-transcription or the distribution of genes found within an operon.
Gene clusters vs. tandem arrays
Repeated genes can occur in two major patterns: gene clusters and tandem repeats, formerly called tandemly arrayed genes. Although similar, gene clusters and tandemly arrayed genes may be distinguished from one another.
The genes of a gene cluster are found close to one another on the same chromosome. They are dispersed randomly; however, they are normally within, at most, a few thousand bases of each other. The distance between each gene in the gene cluster can vary. The DNA found between each repeated gene in the gene cluster is non-conserved. Portions of the DNA sequence of a gene are identical among the genes contained in a gene cluster. Gene conversion is the only mechanism by which gene clusters may become homogenized. Although the size of a gene cluster may vary, it rarely comprises more than 50 genes, making clusters stable in number. Gene clusters change over a long evolutionary time period, which does not result in genetic complexity.
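The proximity criterion can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Gene` record, the gene names and coordinates, and the 5,000 bp default for `max_gap` are hypothetical, and the function groups genes by position only. Establishing that the grouped genes are homologous, as the definition of a gene cluster requires, would need a separate sequence comparison.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chromosome: str
    start: int  # start coordinate in base pairs
    end: int    # end coordinate in base pairs (inclusive)


def group_by_proximity(genes, max_gap=5000):
    """Group genes on one chromosome whose neighbours lie within max_gap bp.

    Only groups of two or more genes are returned.
    """
    groups = []
    group_end = None  # furthest end coordinate seen in the current group
    for gene in sorted(genes, key=lambda g: (g.chromosome, g.start)):
        same_chromosome = bool(groups) and groups[-1][-1].chromosome == gene.chromosome
        if same_chromosome and gene.start - group_end <= max_gap:
            groups[-1].append(gene)
            group_end = max(group_end, gene.end)
        else:
            groups.append([gene])
            group_end = gene.end
    return [group for group in groups if len(group) >= 2]


# Made-up annotation records, for illustration only.
example_genes = [
    Gene("geneA", "chr1", 1000, 2500),
    Gene("geneB", "chr1", 4000, 5200),    # 1,500 bp after geneA
    Gene("geneC", "chr1", 90000, 91000),  # far from geneB
    Gene("geneD", "chr2", 3000, 4000),    # different chromosome
]

clusters = group_by_proximity(example_genes)
print([[gene.name for gene in cluster] for cluster in clusters])
```

Here only `geneA` and `geneB` are grouped, since the other two genes have no neighbour within the chosen distance on the same chromosome.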
A tandem array is a group of genes with the same or similar function that are repeated consecutively without space between each gene. The genes are organized in the same orientation. Unlike gene clusters, tandemly arrayed genes consist of consecutive, identical repeats, separated only by a nontranscribed spacer region.
While the genes contained in a gene cluster encode similar proteins, tandemly arrayed genes encode identical proteins or functional RNAs. Unequal recombination changes the number of repeats by placing duplicated genes next to the original gene. Unlike gene clusters, tandemly arrayed genes change rapidly in response to the needs of the environment, causing an increase in genetic complexity.
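How unequal recombination changes the number of repeats can be shown with a small Python sketch. It assumes a simplified model in which two copies of a tandem array pair out of register and exchange segments at different repeat boundaries; the function name `unequal_crossover`, the repeat labels, and the break points are hypothetical.

```python
def unequal_crossover(array_a, array_b, break_a, break_b):
    """Exchange segments between two tandem arrays at misaligned break points.

    Each recombinant joins the left part of one array to the right part of
    the other. When break_a != break_b, one product gains repeats and the
    other loses the same number.
    """
    recombinant_1 = array_a[:break_a] + array_b[break_b:]
    recombinant_2 = array_b[:break_b] + array_a[break_a:]
    return recombinant_1, recombinant_2


# Two copies of a hypothetical four-repeat tandem array.
parent_a = ["r1", "r2", "r3", "r4"]
parent_b = ["r1", "r2", "r3", "r4"]

# Misaligned exchange: break after repeat 3 in one array, after repeat 1 in the other.
expanded, contracted = unequal_crossover(parent_a, parent_b, 3, 1)
print(len(expanded), expanded)
print(len(contracted), contracted)
```

The total number of repeats across the two products is unchanged, but one array is expanded, with duplicated repeats now lying next to the originals, while the other is contracted.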
Gene conversion allows tandemly arrayed genes to become homogenized, or identical. Gene conversion may be allelic or ectopic. Allelic gene conversion occurs when one allele of a gene is converted to the other allele as a result of mismatched base pairing during meiotic homologous recombination. Ectopic gene conversion occurs when one homologous DNA sequence is replaced by another. Ectopic gene conversion is the driving force for concerted evolution of gene families.
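The homogenizing effect of gene conversion can be sketched in Python. The model is deliberately simplified: conversion is treated as a non-reciprocal copy of a whole donor repeat over an acceptor repeat, and the sequences and the names `gene_conversion` and `is_homogenized` are illustrative assumptions.

```python
def gene_conversion(array, donor, acceptor):
    """Return a copy of the array in which the acceptor repeat has been
    overwritten by the donor repeat; the donor itself is unchanged."""
    converted = list(array)
    converted[acceptor] = converted[donor]
    return converted


def is_homogenized(array):
    """True when every repeat in the array has the same sequence."""
    return len(set(array)) == 1


# Hypothetical three-repeat array in which the middle repeat has diverged.
repeats = ["ACGT", "ACGA", "ACGT"]
print(is_homogenized(repeats))

# Conversion of the diverged repeat (index 1) using repeat 0 as the donor.
converted = gene_conversion(repeats, donor=0, acceptor=1)
print(converted, is_homogenized(converted))
```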
Tandemly arrayed genes are essential to maintaining large gene families, such as the ribosomal RNA genes. In the eukaryotic genome, the ribosomal RNA genes are tandemly arrayed. Tandemly repeated rRNA genes are essential to maintain the supply of RNA transcript: a single RNA gene may not be able to provide a sufficient amount of RNA, whereas tandem repeats of the gene allow a sufficient amount to be produced. For example, human embryonic cells contain 5–10 million ribosomes and double in number within 24 hours. In order to provide such a large number of ribosomes, multiple RNA polymerases must consecutively transcribe multiple rRNA genes.
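The scale of this demand can be made concrete with a short back-of-the-envelope calculation in Python. It assumes that doubling means producing as many new ribosomes in 24 hours as the cell already contains, and that production is spread evenly over the day; both are simplifications introduced here, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def ribosomes_per_second(new_ribosomes, seconds=SECONDS_PER_DAY):
    """Average production rate needed to make new_ribosomes in the given time."""
    return new_ribosomes / seconds


# Doubling a pool of 5-10 million ribosomes requires 5-10 million new ones.
low = ribosomes_per_second(5_000_000)
high = ribosomes_per_second(10_000_000)
print(f"{low:.1f} to {high:.1f} new ribosomes per second")
```

Under these assumptions the cell must complete roughly 58 to 116 ribosomes every second, each of which needs its own rRNA, which gives a sense of why many gene copies transcribed in parallel are required.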
References
- Yi G, Sze SH, Thon MR (May 2007). "Identifying clusters of functionally related genes in genomes". Bioinformatics. 23 (9): 1053–60. doi:10.1093/bioinformatics/btl673. PMID 17237058.
- Lawrence J (December 1999). "Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes" (PDF). Current Opinion in Genetics & Development. 9 (6): 642–8. doi:10.1016/s0959-437x(99)00025-8. PMID 10607610. Archived from the original (PDF) on 2010-05-28.
- Lawrence JG, Roth JR (August 1996). "Selfish operons: horizontal transfer may drive the evolution of gene clusters". Genetics. 143 (4): 1843–60. PMC 1207444. PMID 8844169.
- Ohno S (1970). Evolution by gene duplication. Springer-Verlag. ISBN 978-0-04-575015-3.
- Klug W, Cummings M, Spencer C, Palladino M (2009). "Chromosome Mutations: Variation in chromosome number and arrangement". In Wilbur B (ed.). Concepts of Genetics (9th ed.). San Francisco, CA: Pearson Benjamin Cummings. pp. 213–214. ISBN 978-0-321-54098-0.
- Overbeek R, Fonstein M, D'Souza M, Pusch GD, Maltsev N (March 1999). "The use of gene clusters to infer functional coupling". Proceedings of the National Academy of Sciences of the United States of America. 96 (6): 2896–901. doi:10.1073/pnas.96.6.2896. PMC 15866. PMID 10077608.
- Garcia-Fernàndez J (February 2005). "Hox, ParaHox, ProtoHox: facts and guesses". Heredity. 94 (2): 145–52. doi:10.1038/sj.hdy.6800621. PMID 15578045.
- Garcia-Fernàndez J (December 2005). "The genesis and evolution of homeobox gene clusters". Nature Reviews. Genetics. 6 (12): 881–92. doi:10.1038/nrg1723. PMID 16341069.
- Gómez MJ, Cases I, Valencia A (2004). "Gene order in Prokaryotes: conservation and implications". In Vicente M, Tamames J, Valencia A, Mingorance J (eds.). Molecules in Time and Space: Bacterial Shape, Division, and Phylogeny. New York: Kluwer Academic/Plenum Publishers. pp. 221–224. doi:10.1007/0-306-48579-6_11. ISBN 978-0-306-48578-7.
- Graham GJ (July 1995). "Tandem genes and clustered genes". Journal of Theoretical Biology. 175 (1): 71–87. doi:10.1006/jtbi.1995.0122. PMID 7564393.
- Lodish H, Berk A, Kaiser C, Krieger M, Bretscher A, Ploegh H, Amon A, Scott M (2013). "Genes, Genomics, and Chromosomes". Molecular Cell Biology (7th ed.). New York: W.H. Freeman Company. pp. 227–230. ISBN 978-1-4292-3413-9.
- Galtier N, Piganeau G, Mouchiroud D, Duret L (October 2001). "GC-content evolution in mammalian genomes: the biased gene conversion hypothesis". Genetics. 159 (2): 907–11. PMC 1461818. PMID 11693127.
- Duret L, Galtier N (2009). "Biased gene conversion and the evolution of mammalian genomic landscapes". Annual Review of Genomics and Human Genetics. 10: 285–311. doi:10.1146/annurev-genom-082908-150001. PMID 19630562.