Codon usage bias
Codon usage bias refers to differences in the frequency of occurrence of synonymous codons in coding DNA. A codon is a series of three nucleotides (a triplet) that either encodes a specific amino acid residue in a polypeptide chain or signals the termination of translation (a stop codon).
There are 64 different codons (61 codons encoding amino acids plus 3 stop codons) but only 20 different translated amino acids. The overabundance in the number of codons allows many amino acids to be encoded by more than one codon. Because of this redundancy, the genetic code is said to be degenerate. Different organisms often show particular preferences for one of the several codons that encode the same amino acid; that is, one codon is found at a greater frequency than expected by chance. How such preferences arise is a much debated area of molecular evolution.
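The codon counts above, and the idea of a codon occurring more often than expected by chance, can be sketched in a few lines of Python. This is a minimal illustration and not a method described in this article: it assumes the standard genetic code, and it uses relative synonymous codon usage (RSCU), a common measure that divides each codon's observed count by the count expected if all synonymous codons for that amino acid were used equally. The function names `split_codons` and `rscu` are made up for this example.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (third position varying fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group synonymous codons by the amino acid (or stop signal) they encode.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)

# Number of codons per amino acid, i.e. the degeneracy of the code.
DEGENERACY = Counter(CODON_TABLE.values())


def split_codons(seq):
    """Split a coding sequence into complete triplets (reading frame 0)."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def rscu(seq):
    """Relative synonymous codon usage for each codon of every amino acid
    that occurs in seq. A value of 1.0 means 'used as often as expected
    under equal usage'; values above 1.0 indicate a preferred codon."""
    counts = Counter(c for c in split_codons(seq) if c in CODON_TABLE)
    values = {}
    for aa, family in SYNONYMS.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue  # skip stop codons and amino acids absent from seq
        expected = total / len(family)
        for c in family:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    print(len(CODON_TABLE), "codons,", DEGENERACY["*"], "of them stop codons")
    print("codons per amino acid:", dict(DEGENERACY))
    print(rscu("CTGCTGCTGCTA"))
```

In the toy sequence at the end, leucine is encoded four times, three of them by CTG, so CTG receives an RSCU well above 1 while the four unused leucine codons receive 0.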
It is generally acknowledged that codon preferences reflect a balance between mutational biases and natural selection for translational optimization. Optimal codons in fast-growing microorganisms, like Escherichia coli or Saccharomyces cerevisiae (baker's yeast), reflect the composition of their respective genomic tRNA pool. It is thought that optimal codons help to achieve faster translation rates and high accuracy. As a result of these factors, translational selection is expected to be stronger in highly expressed genes, as is indeed the case for the above-mentioned organisms. In other organisms that do not show high growth rates or that have small genomes, codon usage optimization is normally absent, and codon preferences are determined by the characteristic mutational biases seen in that particular genome. Examples of this are Homo sapiens (human) and Helicobacter pylori. Organisms that show an intermediate level of codon usage optimization include Drosophila melanogaster (fruit fly), Caenorhabditis elegans (nematode worm), Strongylocentrotus purpuratus (sea urchin) and Arabidopsis thaliana (thale cress).
The nature of the codon usage-tRNA optimization has been fiercely debated. It is not clear whether codon usage drives tRNA evolution or vice versa. At least one mathematical model has been developed in which codon usage and tRNA expression co-evolve in a feedback fashion (i.e., codons already present in high frequencies drive up the expression of their corresponding tRNAs, and tRNAs normally expressed at high levels drive up the frequency of their corresponding codons); however, this model does not yet seem to have experimental confirmation. A further difficulty is that the evolution of tRNA genes has been a very inactive area of research.
Factors contributing to codon usage bias
Different factors have been proposed to be related to codon usage bias, including gene expression level (reflecting selection for optimizing the translation process according to tRNA abundance), %G+C composition (reflecting horizontal gene transfer or mutational bias), GC skew (reflecting strand-specific mutational bias), amino acid conservation, protein hydropathy, transcriptional selection, RNA stability, optimal growth temperature and hypersaline adaptation.
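Two of the compositional factors listed above are simple to compute from a sequence. The sketch below uses the usual definitions, with %G+C as the fraction of G and C among unambiguous bases and GC skew as (G − C)/(G + C) on one strand. The third-position GC content (often called GC3) is added as an assumption of this example because synonymous changes are concentrated at that position; it is not named in the text above. All function names are illustrative.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / total


def gc_skew(seq):
    """(G - C) / (G + C) for the given strand; 0.0 if it has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def gc3(seq):
    """GC content at third codon positions, reading frame 0,
    using complete codons only."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return gc_content(seq[2:usable:3])


if __name__ == "__main__":
    gene = "ATGGCCCTGAAAGGC"  # toy sequence, not a real gene
    print(gc_content(gene), gc_skew(gene), gc3(gene))
```

In practice, GC skew is usually computed in sliding windows along a chromosome rather than for a single gene, since the strand-specific bias it reflects is a property of the replication strands.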
Methods of analyzing codon usage bias
In bioinformatics and computational biology, many statistical methods have been proposed and used to analyze codon usage bias. Methods such as the 'frequency of optimal codons' (Fop), the 'Relative Codon Adaptation' (RCA) and the 'Codon Adaptation Index' (CAI) are used to predict gene expression levels, while methods such as the 'effective number of codons' (Nc) and Shannon entropy from information theory are used to measure the evenness of codon usage. Multivariate statistical methods, such as correspondence analysis and principal component analysis, are widely used to analyze variations in codon usage among genes. Many computer programs implement these statistical analyses, including CodonW, GCUA and INCA.
Codon optimization has applications in designing synthetic genes and DNA vaccines. Several software packages are available online for this purpose (refer to external links). Because most amino acids can be encoded by more than one codon, the number of possible reverse-translated gene sequences grows exponentially with gene length, so exhaustively searching them for a sequence with the desired composition and the desired or undesired motifs quickly becomes infeasible. For this reason, the problem can be addressed using optimization algorithms such as genetic algorithms (Sandhu et al., In Silico Biol. 2008;8(2):187-92).
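As an illustration of one of the indices named in this section, the following is a minimal sketch of the Codon Adaptation Index. It assumes the usual formulation: each codon gets a relative adaptiveness weight equal to its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon, and the CAI of a gene is the geometric mean of the weights of its codons. Skipping stop, methionine and tryptophan codons and using a pseudocount of 0.5 for codons absent from the reference set are conventions assumed here, not details taken from this article. The function names are illustrative and do not correspond to any of the programs listed above.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (bases ordered T, C, A, G); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def split_codons(seq):
    """Split a coding sequence into complete triplets (reading frame 0)."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """Weight w = count / (count of the most used synonymous codon),
    estimated from a reference set of (assumed) highly expressed genes."""
    counts = Counter()
    for gene in reference_genes:
        counts.update(split_codons(gene))
    weights = {}
    for aa, family in SYNONYMS.items():
        if aa == "*" or len(family) == 1:
            continue  # stop codons, Met and Trp carry no synonymous choice
        # Assumed convention: unseen codons get a small pseudocount so
        # that a single rare codon does not force the index to zero.
        family_counts = [counts[c] or pseudocount for c in family]
        top = max(family_counts)
        for c, n in zip(family, family_counts):
            weights[c] = n / top
    return weights


def cai(seq, weights):
    """Geometric mean of the weights of the scored codons in seq."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("sequence contains no codons that can be scored")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    reference = ["CTGCTGCTGCTA"]  # toy reference set, not real genes
    w = relative_adaptiveness(reference)
    print(cai("CTGCTG", w))  # uses only the preferred leucine codon
    print(cai("CTGCTA", w))  # mixes a preferred and a rarer codon
```

A gene built only from the codons most used in the reference set scores 1.0, and the score falls as rarer synonymous codons are used. The quality of the result depends heavily on how the reference set is chosen.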
External links
- Composition Analysis Toolkit: estimating codon usage bias and its statistical significance
- Codon Usage Database
- GCUA - General Codon Usage Analysis
- Graphical Codon Usage Analyser
- JCat - Java Codon Usage Adaptation Tool
- INCA - Interactive Codon Analysis software
- ACUA - Automated Codon Usage Analysis Tool
- OPTIMIZER - Codon usage optimization
- HEG-DB - Highly Expressed Genes Database
- E-CAI - Expected value of Codon Adaptation Index
- CAIcal - Set of tools to assess codon usage adaptation
- scRCA - Automatic determination of translational codon usage bias
- Online Synonymous Codon Usage Analyses with the ade4 and seqinR packages
- Genetic Algorithm Simulation for Codon Optimization