Kozak consensus sequence
The Kozak consensus sequence, Kozak consensus or Kozak sequence is a sequence which occurs on eukaryotic mRNA and has the consensus (gcc)gccRccAUGG. The Kozak consensus sequence plays a major role in the initiation of the translation process.[1] The sequence was named after the scientist who discovered it, Marilyn Kozak.
The sequence is identified by the notation (gcc)gccRccAUGG, which summarizes data analysed by Kozak from a wide variety of sources (699 vertebrate mRNAs in all)[2] as follows:
- a lower-case letter denotes the most common base at a position where the base can nevertheless vary;
- upper-case letters indicate highly conserved bases, i.e. the 'AUGG' sequence is constant or rarely, if ever, changes; the exception is the IUPAC ambiguity code[3] 'R', which indicates that a purine (adenine or guanine) is always observed at this position (with adenine being claimed by Kozak to be more frequent); and
- the sequence in parentheses (gcc) is of uncertain significance.
Kozak's paper was limited to a subset of vertebrates (i.e. human, cow, cat, dog, chicken, guinea pig, hamster, mouse, pig, rabbit, sheep, and Xenopus).
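The notation rules above can be made concrete with a short Python sketch. It is illustrative only: the helper names `parse_consensus` and `conserved_regex` are hypothetical, the IUPAC table is limited to the codes needed here, and treating every lower-case position as "any base" is a simplifying assumption rather than part of Kozak's analysis.

```python
import re

# Subset of IUPAC nucleotide codes needed here (RNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG",    # purine: adenine or guanine
    "N": "ACGU",  # any base
}

def parse_consensus(notation):
    """Split a consensus string into one record per position."""
    positions = []
    in_parens = False
    for ch in notation:
        if ch == "(":
            in_parens = True
        elif ch == ")":
            in_parens = False
        else:
            positions.append({
                "symbol": ch.upper(),
                "allowed": IUPAC[ch.upper()],
                "conserved": ch.isupper(),  # upper case = highly conserved
                "uncertain": in_parens,     # parenthesised = uncertain significance
            })
    return positions

def conserved_regex(notation):
    """Build a regex that constrains only the upper-case positions.

    Assumption: lower-case positions accept any base, and the
    parenthesised prefix is ignored entirely.
    """
    parts = []
    for p in parse_consensus(notation):
        if p["uncertain"]:
            continue
        parts.append("[" + p["allowed"] + "]" if p["conserved"] else "[ACGU]")
    return re.compile("".join(parts))

KOZAK = "(gcc)gccRccAUGG"
pattern = conserved_regex(KOZAK)
```

With this sketch, `pattern.fullmatch("GCCACCAUGG")` succeeds, while `pattern.fullmatch("GCCUCCAUGG")` returns `None` because a pyrimidine occupies the 'R' position.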
Introduction
This sequence on an mRNA molecule is recognized by the ribosome as the translational start site, from which a protein is coded by that mRNA molecule. The ribosome requires this sequence, or a possible variation (see below), to initiate translation. The Kozak sequence is not to be confused with the ribosomal binding site (RBS), which is either the 5' cap of a messenger RNA or an internal ribosome entry site (IRES).
In vivo, this site is often not matched exactly on different mRNAs, and the amount of protein synthesized from a given mRNA depends on the strength of the Kozak sequence.[4] Some nucleotides in this sequence are more important than others: the AUG is most important because it is the actual initiation codon, encoding a methionine amino acid at the N-terminus of the protein. (Rarely, GUG is used as an initiation codon, but methionine is still the first amino acid, as it is the met-tRNA in the initiation complex that binds to the mRNA.) The A nucleotide of the "AUG" is referred to as number 1 (+1); there is no number 0 position, so the nucleotide immediately upstream is -1. For a 'strong' consensus, the nucleotides at positions +4 (i.e. G in the consensus) and -3 (i.e. either A or G in the consensus) relative to the number 1 nucleotide must both match the consensus. An 'adequate' consensus has only one of these sites, while a 'weak' consensus has neither. The cc at -1 and -2 are not as conserved, but contribute to the overall strength.[5] There is also evidence that a G in the -6 position is important in the initiation of translation.[1]
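The strong/adequate/weak classification can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names `base_at` and `kozak_strength` are hypothetical, the caller supplies the 0-based index of the A of the start codon, and only AUG starts are handled (the rare GUG case is left out for brevity).

```python
def base_at(seq, aug_index, position):
    """Return the base at a Kozak-style position relative to the AUG.

    Position +1 is the A of AUG. There is no position 0, so -1 is the
    base immediately upstream of that A. Returns None if the position
    falls outside the sequence.
    """
    if position == 0:
        raise ValueError("there is no position 0")
    offset = position - 1 if position > 0 else position
    i = aug_index + offset
    if i < 0 or i >= len(seq):
        return None
    return seq[i]

def kozak_strength(seq, aug_index):
    """Classify the context of an AUG as 'strong', 'adequate' or 'weak'."""
    seq = seq.upper().replace("T", "U")  # accept DNA or RNA spelling
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at aug_index")
    minus3_ok = base_at(seq, aug_index, -3) in ("A", "G")  # purine at -3
    plus4_ok = base_at(seq, aug_index, +4) == "G"          # G at +4
    matches = int(minus3_ok) + int(plus4_ok)
    return ("weak", "adequate", "strong")[matches]
```

For example, `kozak_strength("GCCACCAUGG", 6)` returns `"strong"`, replacing the A at -3 with U gives `"adequate"`, and additionally replacing the G at +4 with C gives `"weak"`.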
There are examples in vivo of each of these types of Kozak consensus, and they probably evolved as yet another mechanism of gene regulation. Lmx1b is an example of a gene with a weak Kozak consensus sequence.[6] For initiation of translation from such a site, other features are required in the mRNA sequence in order for the ribosome to recognize the initiation codon.
Mutations
Research has shown that a G→C mutation at the -6 position of the β-globin gene (β+45; human) disrupted the haematological and biosynthetic phenotype. This was the first mutation found in the Kozak sequence. It was found in a family from southeast Italy whose members suffered from thalassaemia intermedia.[1]
Variations in the consensus sequence
The Kozak consensus has been variously described as:[7]
(gcc)gccRccAUGG (Kozak 1987)
AGNNAUGN
ANNAUGG
ACCAUGG (Spotts et al., 1997, mentioned in Kozak 2002)
GACACCAUGG (H. sapiens HBB, HBD, R. norvegicus Hbb, etc.)
| Biota | Phylum | Consensus sequences |
|---|---|---|
| Vertebrate (Kozak 1987) | | gccRccATGG[2] |
| Fruit fly (Drosophila spp.) | Arthropoda | atMAAMATGamc[8] |
| Budding yeast (Saccharomyces cerevisiae) | Ascomycota | aAaAaAATGTCt[9] |
| Slime mold (Dictyostelium discoideum) | Amoebozoa | aaaAAAATGRna[10] |
| Ciliate | Ciliophora | nTaAAAATGRct[10] |
| Malarial protozoa (Plasmodium spp.) | Apicomplexa | taaAAAATGAan[10] |
| Toxoplasma (Toxoplasma gondii) | Apicomplexa | gncAaaATGg[11] |
| Trypanosomatidae | Euglenozoa | nnnAnnATGnC[10] |
| Terrestrial plants | | acAACAATGGC[12] |
| Microalga (Dunaliella salina) | Chlorophyta | gccaagATGgcg[13] |
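To illustrate how the taxon-specific consensus sequences in the table differ, the following Python sketch scores a start-codon context against several of them. The scoring scheme (the fraction of flanking positions that agree, with every position weighted equally regardless of case) is an assumption made for illustration and is not taken from the cited papers; the function name `score_context` is hypothetical. Sequences use the DNA alphabet, as in the table.

```python
# IUPAC codes that occur in the consensus sequences used below.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "M": "AC",    # amino: A or C
    "N": "ACGT",  # any base
}

# A few consensus sequences copied from the table above.
CONSENSUS = {
    "vertebrate": "gccRccATGG",
    "Drosophila": "atMAAMATGamc",
    "S. cerevisiae": "aAaAaAATGTCt",
    "terrestrial plants": "acAACAATGGC",
}

def score_context(seq, atg_index, consensus):
    """Fraction of flanking consensus positions that seq agrees with.

    The two sequences are aligned on their ATG. The start codon itself
    is not counted, and consensus positions that fall outside seq are
    skipped.
    """
    seq = seq.upper()
    anchor = consensus.find("ATG")  # the start codon is upper case in the table
    matched = total = 0
    for k, symbol in enumerate(consensus):
        if anchor <= k < anchor + 3:
            continue
        i = atg_index + (k - anchor)
        if i < 0 or i >= len(seq):
            continue
        total += 1
        if seq[i] in IUPAC_DNA[symbol.upper()]:
            matched += 1
    return matched / total if total else 0.0

def best_match(seq, atg_index):
    """Name of the consensus with the highest score for this context."""
    return max(CONSENSUS, key=lambda name: score_context(seq, atg_index, CONSENSUS[name]))
```

Under this scheme, the context `GCCACCATGG` (ATG at index 6) scores 1.0 against the vertebrate consensus and lower against the others, so `best_match("GCCACCATGG", 6)` returns `"vertebrate"`.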
See also
- Shine-Dalgarno sequence, the ribosomal binding site of prokaryotes.
References
1. De Angioletti M, Lacerra G, Sabato V, Carestia C (2004). "Beta+45 G --> C: a novel silent beta-thalassaemia mutation, the first in the Kozak sequence". Br J Haematol. 124 (2): 224–31. doi:10.1046/j.1365-2141.2003.04754.x. PMID 14687034.
2. Kozak M (October 1987). "An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs". Nucleic Acids Res. 15 (20): 8125–8148. doi:10.1093/nar/15.20.8125. PMC 306349. PMID 3313277.
3. Nomenclature for Incompletely Specified Bases in Nucleic Acid Sequences, NC-IUB, 1984.
4. Kozak M (1984). "Point mutations close to the AUG initiator codon affect the efficiency of translation of rat preproinsulin in vivo". Nature. 308 (5956): 241–246. doi:10.1038/308241a0. PMID 6700727.
5. Kozak M (1986). "Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes". Cell. 44 (2): 283–92. doi:10.1016/0092-8674(86)90762-2. PMID 3943125.
6. Dunston JA, Hamlington JD, Zaveri J, et al. (September 2004). "The human LMX1B gene: transcription unit, promoter, and pathogenic mutations". Genomics. 84 (3): 565–76. doi:10.1016/j.ygeno.2004.06.002. PMID 15498463.
7. Tang, Sen-Lin; Chang, Bill C.H.; Halgamuge, Saman K. (August 2010). "Gene functionality's influence on the second codon: A large-scale survey of second codon composition in three domains". Genomics. 96 (2): 92–101. doi:10.1016/j.ygeno.2010.04.001. Retrieved 3 August 2018.
8. Cavener DR (February 1987). "Comparison of the consensus sequence flanking translational start sites in Drosophila and vertebrates". Nucleic Acids Res. 15 (4): 1353–61. doi:10.1093/nar/15.4.1353. PMC 340553. PMID 3822832.
9. Hamilton R, Watanabe CK, de Boer HA (April 1987). "Compilation and comparison of the sequence context around the AUG startcodons in Saccharomyces cerevisiae mRNAs". Nucleic Acids Res. 15 (8): 3581–93. doi:10.1093/nar/15.8.3581. PMC 340751. PMID 3554144.
10. Yamauchi K (May 1991). "The sequence flanking translational initiation site in protozoa". Nucleic Acids Res. 19 (10): 2715–20. doi:10.1093/nar/19.10.2715. PMC 328191. PMID 2041747.
11. Seeber, F. (1997). "Consensus sequence of translational initiation sites from Toxoplasma gondii genes". Parasitology Research. 83 (3): 309–311. doi:10.1007/s004360050254. PMID 9089733.
12. Lütcke HA, Chow KC, Mickel FS, Moss KA, Kern HF, Scheele GA (January 1987). "Selection of AUG initiation codons differs in plants and animals". EMBO J. 6 (1): 43–8. PMC 553354. PMID 3556162.
13. Kadkhodaei, Saeid; Hashemi, Farahnaz S. Golestan; Rezaei, Morvarid Akhavan; Abbasiliasi, Sahar; Shun, Tan Joo; Memari, Hamid R. Rajabi; Moradpour, Mahdi; Ariff, Arbakariya B. (2016-07-05). "Cis/transgene optimization: systematic discovery of some key gene expression elements integrating bioinformatics and computational biology". bioRxiv 061945.
Further reading
- Kozak M (November 1990). "Downstream secondary structure facilitates recognition of initiator codons by eukaryotic ribosomes". Proc. Natl. Acad. Sci. U.S.A. 87 (21): 8301–5. doi:10.1073/pnas.87.21.8301. PMC 54943. PMID 2236042.
- Kozak M (November 1991). "An analysis of vertebrate mRNA sequences: intimations of translational control". J. Cell Biol. 115 (4): 887–903. doi:10.1083/jcb.115.4.887. PMC 2289952. PMID 1955461.
- Kozak M (October 2002). "Pushing the limits of the scanning mechanism for initiation of translation". Gene. 299 (1–2): 1–34. doi:10.1016/S0378-1119(02)01056-9. PMID 12459250.