Transcription is the first step of gene expression, in which a particular segment of DNA is copied into RNA (especially mRNA) by the enzyme RNA polymerase. Both DNA and RNA are nucleic acids, which use base pairs of nucleotides as a complementary language. During transcription, a DNA sequence is read by an RNA polymerase, which produces a complementary, antiparallel RNA strand called a primary transcript.
Transcription proceeds in the following general steps:
- RNA polymerase, together with one or more general transcription factors, binds to promoter DNA.
- RNA polymerase creates a transcription bubble, which separates the two strands of the DNA helix. This is done by breaking the hydrogen bonds between complementary DNA nucleotides.
- RNA polymerase adds RNA nucleotides (which are complementary to the nucleotides of one DNA strand).
- The RNA sugar-phosphate backbone forms with assistance from RNA polymerase, yielding an RNA strand.
- Hydrogen bonds of the RNA–DNA helix break, freeing the newly synthesized RNA strand.
- If the cell has a nucleus, the RNA may be further processed. This may include polyadenylation, capping, and splicing.
- The RNA may remain in the nucleus or exit to the cytoplasm through the nuclear pore complex.
The stretch of DNA transcribed into an RNA molecule is called a transcription unit and encodes at least one gene. If the gene encodes a protein, transcription produces messenger RNA (mRNA); the mRNA in turn serves as a template for the protein's synthesis through translation. Alternatively, the transcribed gene may encode non-coding RNA (such as microRNA), ribosomal RNA (rRNA), transfer RNA (tRNA), or enzymatic RNA molecules called ribozymes. Overall, RNA helps synthesize, regulate, and process proteins; it therefore plays a fundamental role in performing functions within a cell.
In virology, the term may also be used when referring to mRNA synthesis from an RNA molecule (i.e., RNA replication). For instance, the genome of a negative-sense single-stranded RNA (ssRNA -) virus may be the template for a positive-sense single-stranded RNA (ssRNA +). This is because the positive-sense strand contains the information needed to translate the viral proteins for viral replication afterwards. This process is catalyzed by a viral RNA replicase.
A DNA transcription unit encoding a protein may contain both a coding sequence, which will be translated into the protein, and regulatory sequences, which direct and regulate the synthesis of that protein. The regulatory sequence before ("upstream" from) the coding sequence is called the five prime untranslated region (5'UTR); the sequence after ("downstream" from) the coding sequence is called the three prime untranslated region (3'UTR).
Only one of the two DNA strands serves as a template for transcription. The antisense strand of DNA is read by RNA polymerase from the 3' end to the 5' end during transcription (3' → 5'). The complementary RNA is created in the opposite direction, 5' → 3', matching the sequence of the sense strand except that uracil replaces thymine. This directionality arises because RNA polymerase can only add nucleotides to the 3' end of the growing mRNA chain. Because only the 3' → 5' DNA strand is used, transcription does not need the Okazaki fragments that are seen in DNA replication. Unlike DNA replication, transcription also does not need an RNA primer to initiate synthesis.
The non-template (sense) strand of DNA is called the coding strand, because its sequence is the same as the newly created RNA transcript (except for the substitution of uracil for thymine). This is the strand that is used by convention when presenting a DNA sequence.
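The strand relationships described above can be illustrated with a short Python sketch. It is a toy model of base pairing only, not of the enzyme; the function names (`reverse_complement`, `transcribe_template`, `coding_to_rna`) and the example sequence are hypothetical, and every sequence is assumed to be a plain string written 5' → 3'.

```python
# Toy model of strand relationships in transcription.
# Assumption: all sequences are strings written in the 5' -> 3' direction.

DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna_5to3: str) -> str:
    """Return the opposite DNA strand, also written 5' -> 3'."""
    return "".join(DNA_PAIR[base] for base in reversed(dna_5to3.upper()))


def transcribe_template(template_5to3: str) -> str:
    """Return the RNA (5' -> 3') copied from a template (antisense) strand."""
    # The template is read 3' -> 5', so walk the 5' -> 3' string in reverse
    # while appending the pairing ribonucleotide to the 3' end of the RNA.
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5to3.upper()))


def coding_to_rna(coding_5to3: str) -> str:
    """Return the RNA predicted from the coding (sense) strand: U replaces T."""
    return coding_5to3.upper().replace("T", "U")


coding = "ATGGCCTTA"                    # hypothetical coding strand
template = reverse_complement(coding)   # the strand the polymerase reads
rna = transcribe_template(template)
print(template, rna)
```

Both routes give the same transcript, which is why sequences are conventionally presented using the coding strand.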
Transcription has some proofreading mechanisms, but they are fewer and less effective than the controls for copying DNA; therefore, transcription has a lower copying fidelity than DNA replication.
Transcription is divided into initiation, promoter escape, elongation and termination.
Transcription begins with the binding of RNA polymerase, together with one or more general transcription factors, to a specific DNA sequence referred to as a "promoter" to form an RNA polymerase-promoter "closed complex" (called a "closed complex" because the promoter DNA is fully double-stranded).
RNA polymerase, assisted by one or more general transcription factors, then unwinds approximately 14 base pairs of DNA to form an RNA polymerase-promoter "open complex" (called an "open complex" because the promoter DNA is partly unwound and single-stranded). The unwound, single-stranded DNA region of approximately 14 base pairs is referred to as the "transcription bubble."
RNA polymerase, assisted by one or more general transcription factors, then selects a transcription start site in the transcription bubble, binds to an initiating NTP and an extending NTP (or a short RNA primer and an extending NTP) complementary to the transcription start site sequence, and catalyzes bond formation to yield an initial RNA product.
In bacteria, RNA polymerase core enzyme consists of five subunits: 2 α subunits, 1 β subunit, 1 β' subunit, and 1 ω subunit. In bacteria, there is one general RNA transcription factor: sigma. RNA polymerase core enzyme binds to the bacterial general transcription factor sigma to form RNA polymerase holoenzyme and then binds to a promoter.
In archaea and eukaryotes, RNA polymerase contains subunits homologous to each of the five RNA polymerase subunits in bacteria and also contains additional subunits. In archaea and eukaryotes, the functions performed by the bacterial general transcription factor sigma are performed by multiple general transcription factors that work together. In archaea, there are three general transcription factors: TBP, TFB, and TFE. In eukaryotes, in RNA polymerase II-dependent transcription, there are six general transcription factors: TFIIA, TFIIB (an ortholog of archaeal TFB), TFIID (a multisubunit factor in which the key subunit, TBP, is an ortholog of archaeal TBP), TFIIE (an ortholog of archaeal TFE), TFIIF, and TFIIH. In archaea and eukaryotes, the RNA polymerase-promoter closed complex usually is referred to as the "preinitiation complex."
Transcription initiation is regulated by additional proteins, known as activators and repressors, and, in some cases, associated coactivators or corepressors, which modulate formation and function of the transcription initiation complex.
After the first bond is synthesized, the RNA polymerase must escape the promoter. During this time there is a tendency to release the RNA transcript and produce truncated transcripts. This is called abortive initiation and is common for both eukaryotes and prokaryotes. Abortive initiation continues to occur until an RNA product of a threshold length of approximately 10 nucleotides is synthesized, at which point promoter escape occurs and a transcription elongation complex is formed.
Mechanistically, promoter escape occurs through a scrunching mechanism, in which the energy built up by DNA scrunching is used to break interactions between RNA polymerase holoenzyme and the promoter.
In bacteria, upon and following promoter clearance, the σ factor is released according to a stochastic model.
In eukaryotes, at an RNA polymerase II-dependent promoter, upon promoter clearance, TFIIH phosphorylates serine 5 on the carboxy terminal domain of RNA polymerase II, leading to the recruitment of capping enzyme (CE). The exact mechanism of how CE induces promoter clearance in eukaryotes is not yet known.
One strand of the DNA, the template strand (or noncoding strand), is used as a template for RNA synthesis. As transcription proceeds, RNA polymerase traverses the template strand and uses base pairing complementarity with the DNA template to create an RNA copy. Although RNA polymerase traverses the template strand from 3' → 5', the coding (non-template) strand and newly formed RNA can also be used as reference points, so transcription can be described as occurring 5' → 3'. This produces an RNA molecule from 5' → 3', an exact copy of the coding strand (except that thymines are replaced with uracils, and the nucleotides are composed of a ribose (5-carbon) sugar where DNA has deoxyribose (one fewer oxygen atom) in its sugar-phosphate backbone).
mRNA transcription can involve multiple RNA polymerases on a single DNA template and multiple rounds of transcription (amplification of particular mRNA), so many mRNA molecules can be rapidly produced from a single copy of a gene.
Elongation also involves a proofreading mechanism that can replace incorrectly incorporated bases. In eukaryotes, this may correspond with short pauses during transcription that allow appropriate RNA editing factors to bind. These pauses may be intrinsic to the RNA polymerase or due to chromatin structure.
Bacteria use two different strategies for transcription termination: Rho-independent termination and Rho-dependent termination. In Rho-independent transcription termination, also called intrinsic termination, RNA transcription stops when the newly synthesized RNA molecule forms a G-C-rich hairpin loop followed by a run of Us. When the hairpin forms, the mechanical stress breaks the weak rU-dA bonds that now fill the DNA–RNA hybrid. This pulls the poly-U transcript out of the active site of the RNA polymerase, terminating transcription. In Rho-dependent termination, a protein factor called Rho destabilizes the interaction between the template and the mRNA, thus releasing the newly synthesized mRNA from the elongation complex.
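The sequence signature of intrinsic termination (a G-C-rich hairpin followed by a run of Us) can be searched for with a simple scan. The following is a minimal sketch under stated assumptions: the function name `find_intrinsic_terminators` and all thresholds (stem length, loop size, minimum G-C fraction, length of the U run) are hypothetical, and only perfect Watson-Crick stems are accepted. Real terminators tolerate mismatches and G-U pairs, so this is not a substitute for a dedicated prediction tool.

```python
# Naive scan for hairpin-plus-U-run motifs in an RNA string (5' -> 3').
# Assumption: every threshold below is illustrative, not taken from the text.

RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def find_intrinsic_terminators(rna, stem_len=5, loop_range=(3, 8),
                               min_u=4, min_gc=0.6):
    """Return (start, end) index pairs of candidate terminators."""
    rna = rna.upper()
    hits = []
    for start in range(len(rna)):
        left = rna[start:start + stem_len]
        if len(left) < stem_len:
            break
        # Require a G-C-rich left arm of the stem.
        if sum(base in "GC" for base in left) / stem_len < min_gc:
            continue
        for loop in range(loop_range[0], loop_range[1] + 1):
            right_start = start + stem_len + loop
            right = rna[right_start:right_start + stem_len]
            if len(right) < stem_len:
                break
            # The right arm must pair with the left arm read backwards.
            paired = all(RNA_PAIR.get(l) == r
                         for l, r in zip(left, reversed(right)))
            tail_start = right_start + stem_len
            if paired and rna[tail_start:tail_start + min_u] == "U" * min_u:
                hits.append((start, tail_start + min_u))
    return hits


# Hypothetical transcript: stem GCCGC, loop GAAA, stem GCGGC, then a U run.
example = "CCGCCGCGAAAGCGGCUUUUUUAA"
print(find_intrinsic_terminators(example))
```

Each hit spans from the first base of the stem to the end of the minimum U run; replacing the U run with another base removes the hit, which mirrors the requirement for weak rU-dA pairing behind the hairpin.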
Transcription termination in eukaryotes is less understood but involves cleavage of the new transcript followed by template-independent addition of adenines at its new 3' end, in a process called polyadenylation.
Transcription inhibitors can be used as antibiotics against, for example, pathogenic bacteria (antibacterials) and fungi (antifungals). An example of such an antibacterial is rifampicin, which inhibits bacterial transcription of DNA into mRNA by binding the beta subunit of DNA-dependent RNA polymerase. 8-Hydroxyquinoline is an antifungal transcription inhibitor. Histone methylation may also inhibit transcription.
CpG islands in promoters
In humans, about 70% of promoters located near the transcription start site of a gene (proximal promoters) contain a CpG island. CpG islands are generally 200 to 2000 base pairs long, have a C:G base pair content >50%, and contain frequent CpG sites, that is, positions where a cytosine nucleotide is immediately followed by a guanine nucleotide in the linear sequence of bases along its 5' → 3' direction.
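The criteria above can be turned into a rough sequence check. The sketch below is only illustrative: the function names `cpg_stats` and `looks_like_cpg_island` are hypothetical, the length and G+C thresholds come from the description above, and the observed/expected CpG ratio cutoff of 0.6 is an added assumption (a commonly used convention for quantifying "frequent" CpG sites, not a value stated in this article).

```python
# Rough CpG island check for a DNA string written 5' -> 3'.
# Assumption: the 0.6 observed/expected cutoff is a convention, not from the text.


def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # a C immediately followed by a G
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were placed independently.
    expected = c * g / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, max_len=2000,
                          min_gc=0.5, min_obs_exp=0.6):
    """Apply the length, G+C content and CpG frequency criteria."""
    if not (min_len <= len(seq) <= max_len):
        return False
    gc_fraction, obs_exp = cpg_stats(seq)
    return gc_fraction > min_gc and obs_exp > min_obs_exp


print(looks_like_cpg_island("CG" * 150))             # CpG-dense, 300 bp
print(looks_like_cpg_island("C" * 150 + "G" * 150))  # G+C-rich but CpG-poor
```

The second example shows why G+C content alone is not enough: a sequence can be entirely C and G yet contain almost no CpG dinucleotides.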
Genes may also have distant promoters (distal promoters) and these frequently contain CpG islands as well. An example is the promoter of the DNA repair gene ERCC1, where the CpG island-containing promoter is located about 5,400 nucleotides upstream of the coding region of the ERCC1 gene. CpG islands also occur frequently in promoters for functional noncoding RNAs such as microRNAs.
Transcription inhibition due to methylation of CpG islands
In humans, DNA methylation occurs at the 5 position (carbon 5) of the pyrimidine ring of the cytosine residues within CpG sites to form 5-methylcytosines. The presence of multiple methylated CpG sites in CpG islands of promoters causes stable inhibition (silencing) of genes. Inhibition of transcription of a gene may be initiated by other mechanisms, but this is often followed by methylation of CpG sites in the promoter CpG island to cause the stable inhibition of the gene.
Transcription inhibition/activation in cancers
In cancers, loss of expression of genes occurs about 10 times more frequently by transcription inhibition (caused by promoter hypermethylation of CpG islands) than by mutations. As Vogelstein et al. point out, in a colorectal cancer there are usually about 3 to 6 driver mutations and 33 to 66 hitchhiker or passenger mutations. In contrast, in colon tumors compared to adjacent normal-appearing colonic mucosa, there are about 600 to 800 heavily methylated CpG islands in promoters of genes in the tumors while these CpG islands are not methylated in the adjacent mucosa.
Using gene set enrichment analysis, 569 out of 938 gene sets were hypermethylated and 369 were hypomethylated in cancers. Hypomethylation of CpG islands in promoters results in increased transcription of the genes or gene sets affected.
One study listed 147 specific genes with colon cancer-associated hypermethylated promoters, along with the frequency with which these hypermethylations were found in colon cancers. At least 10 of those genes had hypermethylated promoters in nearly 100% of colon cancers. They also indicated 11 microRNAs whose promoters were hypermethylated in colon cancers at frequencies between 50% and 100% of cancers. MicroRNAs (miRNAs) are small endogenous RNAs that pair with sequences in messenger RNAs to direct post-transcriptional repression. On average, each microRNA represses or inhibits transcription of several hundred target genes. Thus microRNAs with hypermethylated promoters may be allowing enhanced transcription of hundreds to thousands of genes in a cancer.
The information above shows that, in cancers, promoter CpG hyper/hypo-methylation of genes and of microRNAs causes loss of transcription (or sometimes increased transcription) of far more genes than does mutation.
Transcription inhibition and activation by nuclear microRNAs
For more than 20 years, microRNAs have been known to act in the cytoplasm to degrade or block translation of specific target gene messenger RNAs (see microRNA history). However, recently, Gagnon et al. showed that as many as 75% of microRNAs may be shuttled back into the nucleus of cells. Some nuclear microRNAs have been shown to mediate transcriptional gene activation or transcriptional gene inhibition.
Active transcription units are clustered in the nucleus, in discrete sites called transcription factories or euchromatin. Such sites can be visualized by allowing engaged polymerases to extend their transcripts in tagged precursors (Br-UTP or Br-U) and immuno-labeling the tagged nascent RNA. Transcription factories can also be localized using fluorescence in situ hybridization or marked by antibodies directed against polymerases. There are ~10,000 factories in the nucleoplasm of a HeLa cell, among which are ~8,000 polymerase II factories and ~2,000 polymerase III factories. Each polymerase II factory contains ~8 polymerases. As most active transcription units are associated with only one polymerase, each factory usually contains ~8 different transcription units. These units might be associated through promoters and/or enhancers, with loops forming a ‘cloud’ around the factory.
A molecule that allows the genetic material to be realized as a protein was first hypothesized by François Jacob and Jacques Monod. Severo Ochoa won a Nobel Prize in Physiology or Medicine in 1959 for developing a process for synthesizing RNA in vitro with polynucleotide phosphorylase, which was useful for cracking the genetic code. RNA synthesis by RNA polymerase was established in vitro by several laboratories by 1965; however, the RNA synthesized by these enzymes had properties that suggested the existence of an additional factor needed to terminate transcription correctly.
In 1972, Walter Fiers became the first person to prove the existence of the terminating enzyme.
Measuring and detecting
Transcription can be measured and detected in a variety of ways:
- G-Less Cassette transcription assay: measures promoter strength
- Run-off Transcription assay: identifies transcription start sites (TSS)
- Nuclear Run-on assay: measures the relative abundance of newly formed transcripts
- RNase protection assay and ChIP-Chip of RNAP: detect active transcription sites
- RT-PCR: measures the absolute abundance of total or nuclear RNA levels, which may however differ from transcription rates
- DNA microarrays: measure the relative abundance of the global total or nuclear RNA levels; however, these may differ from transcription rates
- In situ hybridization: detects the presence of a transcript
- MS2 tagging: RNA stem loops, such as MS2, are inserted into a gene so that they become part of the newly synthesized RNA. The stem loops can then be detected using a fusion of GFP and the MS2 coat protein, which has a high-affinity, sequence-specific interaction with the MS2 stem loops. The recruitment of GFP to the site of transcription is visualized as a single fluorescent spot. This approach has revealed that transcription occurs in discontinuous bursts, or pulses (see Transcriptional bursting). With the notable exception of in situ techniques, most other methods provide cell population averages, and are not capable of detecting this fundamental property of genes.
- Northern blot: the traditional method, and until the advent of RNA-Seq, the most quantitative
- RNA-Seq: applies next-generation sequencing techniques to sequence whole transcriptomes, which allows the measurement of relative abundance of RNA, as well as the detection of additional variations such as fusion genes, post-transcriptional edits and novel splice sites
Some viruses (such as HIV, the cause of AIDS) have the ability to transcribe RNA into DNA. HIV has an RNA genome that is reverse transcribed into DNA. The resulting DNA can be merged with the DNA genome of the host cell. The main enzyme responsible for synthesis of DNA from an RNA template is called reverse transcriptase.
In the case of HIV, reverse transcriptase is responsible for synthesizing a complementary DNA strand (cDNA) to the viral RNA genome. The enzyme ribonuclease H then digests the RNA strand, and reverse transcriptase synthesizes a complementary strand of DNA to form a double helix DNA structure ("cDNA"). The cDNA is integrated into the host cell's genome by the enzyme integrase, which causes the host cell to generate viral proteins that reassemble into new viral particles. In HIV, the infected host cell (a T cell) subsequently undergoes programmed cell death, or apoptosis. However, in other retroviruses, the host cell remains intact as the virus buds out of the cell.
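The two synthesis steps described above (a first DNA strand copied from the RNA, then a second DNA strand copied from the first) can be sketched at the level of base pairing. The function names `first_strand_cdna` and `second_strand` and the example sequence are hypothetical, all sequences are assumed to be written 5' → 3', and the sketch ignores priming, RNase H digestion and integration.

```python
# Toy model of reverse transcription as two rounds of complementary copying.
# Assumption: sequences are plain strings written 5' -> 3'.

RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna_5to3):
    """Return the DNA strand complementary to the RNA, written 5' -> 3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_5to3.upper()))


def second_strand(dna_5to3):
    """Return the DNA strand complementary to the first cDNA strand."""
    return "".join(DNA_TO_DNA[base] for base in reversed(dna_5to3.upper()))


viral_rna = "AUGGCCUUA"                 # hypothetical genome fragment
minus_strand = first_strand_cdna(viral_rna)
plus_strand = second_strand(minus_strand)
print(minus_strand, plus_strand)
```

The second strand carries the same sequence as the original RNA with thymine in place of uracil, so the double-stranded product preserves the information of the RNA genome.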
Some eukaryotic cells contain an enzyme with reverse transcription activity called telomerase. Telomerase is a reverse transcriptase that lengthens the ends of linear chromosomes. Telomerase carries an RNA template from which it synthesizes a repeating sequence of DNA, or "junk" DNA. This repeated sequence of DNA is called a telomere and can be thought of as a "cap" for a chromosome. The telomere is important because a linear chromosome is shortened every time it is duplicated. With this "junk" DNA or "cap" at the ends of chromosomes, the shortening eliminates some of the non-essential, repeated sequence rather than the protein-encoding DNA sequence, which is farther away from the chromosome end.
Telomerase is often activated in cancer cells, enabling them to duplicate their genomes indefinitely without losing important protein-coding DNA sequence. Activation of telomerase could be part of the process that allows cancer cells to become immortal. Immortalization through telomerase-mediated telomere lengthening has been shown to occur in 90% of all carcinogenic tumors in vivo, with the remaining 10% using an alternative telomere maintenance route called ALT (Alternative Lengthening of Telomeres).
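The buffering role of telomeres, and the effect of telomerase on it, can be illustrated with a small counting model. This is a sketch only: the function name `divisions_until_exhausted` is hypothetical, the base-pair numbers are arbitrary placeholders rather than measured values, and real shortening per division is not constant.

```python
# Counting model of telomere loss per cell division.
# Assumption: all numbers are arbitrary illustrative values, not measurements.


def divisions_until_exhausted(telomere_bp, loss_per_division,
                              added_per_division=0, max_divisions=1000):
    """Count divisions until the telomere buffer is used up (or the cap is hit)."""
    divisions = 0
    while telomere_bp > 0 and divisions < max_divisions:
        # Each duplication trims the chromosome end; telomerase may add repeats.
        telomere_bp += added_per_division - loss_per_division
        divisions += 1
    return divisions


# Without telomerase the buffer runs out after a fixed number of divisions.
print(divisions_until_exhausted(1000, 100))
# With telomerase matching the loss, only the simulation cap stops the count.
print(divisions_until_exhausted(1000, 100, added_per_division=100))
```

In the second call the count is limited only by `max_divisions`, which corresponds to the indefinite duplication described for cells with active telomerase.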
- Crick's central dogma - DNA is transcribed to RNA, which is translated to polypeptides (polypeptides cannot "reverse translate" into RNA or DNA)
- Eukaryotic transcription
- Gene regulation
- Bacterial transcription
- RNA Polymerase
- Reverse transcription - process viruses use to make DNA from RNA
- Splicing - process of removing introns from precursor messenger RNA (pre-mRNA) to make messenger RNA (mRNA)
- Translation - process of decoding RNA to form polypeptides
- Transcription factor
- Eldra P. Solomon, Linda R. Berg, Diana W. Martin. Biology, 8th Edition, International Student Edition. Thomson Brooks/Cole. ISBN 978-0495317142
- "Tentative identification of RNA-dependent RNA polymerases of dsRNA viruses and their relationship to positive strand RNA viral polymerases". FEBS Letters. 252: 42–46. July 1989. doi:10.1016/0014-5793(89)80886-5. PMID 2759231.
- Berg J, Tymoczko JL, Stryer L (2006). Biochemistry (6th ed.). San Francisco: W. H. Freeman. ISBN 0-7167-8724-5.
- Watson JD, Baker TA, Bell SP, Gann AA, Levine M, Losick RM (2013). Molecular Biology of the Gene (7th ed.). Pearson.
- Goldman, S.; Ebright, R.; Nickels, B. (May 2009). "Direct detection of abortive RNA transcripts in vivo". Science. 324 (5929): 927–928. doi:10.1126/science.1169237. PMID 19443781.
- Revyakin, A.; Liu, C.; Ebright, R.; Strick, T. (2006). "Abortive initiation and productive initiation by RNA polymerase involve DNA scrunching". Science. 314 (5802): 1139–1143. doi:10.1126/science.1131398. PMID 17110577.
- Raffaelle, M.; Kanin, E. I.; Vogt, J.; Burgess, R. R.; Ansari, A. Z. (2005). "Holoenzyme Switching and Stochastic Release of Sigma Factors from RNA Polymerase in Vivo". Molecular Cell. 20 (3): 357–366. doi:10.1016/j.molcel.2005.10.011. PMID 16285918.
- Mandal, S. S.; Chu, C.; Wada, T.; Handa, H.; Shatkin, A. J.; Reinberg, D. (2004). "Functional interactions of RNA-capping enzyme with factors that positively and negatively regulate promoter escape by RNA polymerase II". Proceedings of the National Academy of Sciences. 101 (20): 7572–7577. doi:10.1073/pnas.0401493101. PMID 15136722.
- Goodrich, J. A.; Tjian, R. (1994). "Transcription factors IIE and IIH and ATP hydrolysis direct promoter clearance by RNA polymerase II". Cell. 77 (1): 145–156. doi:10.1016/0092-8674(94)90242-9. PMID 8156590.
- Richardson, J (2002). "Rho-dependent termination and ATPases in transcript termination". Biochimica et Biophysica Acta. 1577 (2): 251–260. doi:10.1016/S0167-4781(02)00456-6.
- Lykke-Andersen, S; Jensen, TH (2007). "Overlapping pathways dictate termination of RNA polymerase II transcription". Biochimie. 89 (10): 1177–82. doi:10.1016/j.biochi.2007.05.007.
- 8-Hydroxyquinoline info from SIGMA-ALDRICH. Retrieved Feb 2012
- Saxonov S, Berg P, Brutlag DL (2006). "A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters". Proc. Natl. Acad. Sci. U.S.A. 103 (5): 1412–7. doi:10.1073/pnas.0510310103. PMID 16432200.
- Deaton AM, Bird A (2011). "CpG islands and the regulation of transcription". Genes Dev. 25 (10): 1010–22. doi:10.1101/gad.2037511. PMID 21576262.
- Chen HY, Shao CJ, Chen FR, Kwan AL, Chen ZP (2010). "Role of ERCC1 promoter hypermethylation in drug resistance to cisplatin in human gliomas". Int. J. Cancer. 126 (8): 1944–54. doi:10.1002/ijc.24772. PMID 19626585.
- Bird A (2002). "DNA methylation patterns and epigenetic memory". Genes Dev. 16 (1): 6–21. doi:10.1101/gad.947102. PMID 11782440.
- Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz LA, Kinzler KW (2013). "Cancer genome landscapes". Science. 339 (6127): 1546–58. doi:10.1126/science.1235122. PMID 23539594.
- Illingworth RS, Gruenewald-Schneider U, Webb S, Kerr AR, James KD, Turner DJ, Smith C, Harrison DJ, Andrews R, Bird AP (2010). "Orphan CpG islands identify numerous conserved promoters in the mammalian genome". PLoS Genet. 6 (9): e1001134. doi:10.1371/journal.pgen.1001134. PMID 20885785.
- Wei J, Li G, Dang S, Zhou Y, Zeng K, Liu M (2016). "Discovery and Validation of Hypermethylated Markers for Colorectal Cancer". Dis. Markers. 2016: 2192853. doi:10.1155/2016/2192853. PMID 27493446.
- Beggs AD, Jones A, El-Bahrawy M, El-Bahwary M, Abulafi M, Hodgson SV, Tomlinson IP (2013). "Whole-genome methylation analysis of benign and malignant colorectal tumours". J. Pathol. 229 (5): 697–704. doi:10.1002/path.4132. PMID 23096130.
- Schnekenburger M, Diederich M (2012). "Epigenetics Offer New Horizons for Colorectal Cancer Prevention". Curr Colorectal Cancer Rep. 8 (1): 66–81. doi:10.1007/s11888-011-0116-z. PMID 22389639.
- Friedman RC, Farh KK, Burge CB, Bartel DP (2009). "Most mammalian mRNAs are conserved targets of microRNAs". Genome Res. 19 (1): 92–105. doi:10.1101/gr.082701.108. PMID 18955434.
- Gagnon KT, Li L, Chu Y, Janowski BA, Corey DR (2014). "RNAi factors are present and active in human cell nuclei". Cell Rep. 6 (1): 211–21. doi:10.1016/j.celrep.2013.12.013. PMID 24388755.
- Catalanotto C, Cogoni C, Zardo G (2016). "MicroRNA in Control of Gene Expression: An Overview of Nuclear Functions". Int J Mol Sci. 17 (10). doi:10.3390/ijms17101712. PMID 27754357.
- Papantonis, A (2012-10-26). "TNFα signals through specialized factories where responsive coding and miRNA genes are transcribed". Nature EMBO J.
- "Chemistry 2006". Nobel Foundation. Retrieved March 29, 2007.
- Raj, A.; van Oudenaarden, A. (2008). "Nature, nurture, or chance: stochastic gene expression and its consequences". Cell. 135: 216–26. doi:10.1016/j.cell.2008.09.050. PMID 18957198.
- Kolesnikova I. N. (2000). "Some patterns of apoptosis mechanism during HIV-infection". Dissertation (in Russian). Retrieved February 20, 2011.
- ALT and Telomerase from Nature. Retrieved May 2010