Complementarity (molecular biology)
In molecular biology, complementarity describes a relationship between two structures that each follow the lock-and-key principle. In nature, complementarity is the basic principle of DNA replication and transcription: it is a property shared between two DNA or RNA sequences such that, when they are aligned antiparallel to each other, the nucleotide bases at each position are complementary, much like looking in a mirror and seeing the reverse of things. This pairing between the sequences is what allows cells to copy information from one generation to the next and even to find and repair damage to the information stored in the sequences. The degree of complementarity between two nucleic acid strands may vary from total complementarity (each nucleotide is across from its complement) to none (no nucleotide is across from its complement), and it determines how stably the two strands stay together. Furthermore, various DNA repair functions as well as regulatory functions are based on base pair complementarity. In biotechnology, the principle of base pair complementarity allows the generation of hybrids between RNA and DNA and opens the door to modern tools such as cDNA libraries. While most complementarity is seen between two separate strands of DNA or RNA, it is also possible for a sequence to have internal complementarity, resulting in the sequence binding to itself in a folded configuration.
DNA base pair complementarity
Four nucleobases are involved in DNA complementarity: adenine, thymine, guanine and cytosine; in RNA, uracil takes the place of thymine. Adenine and guanine are purines, while thymine, cytosine and uracil are pyrimidines.
|Nucleic Acid||Nucleobases||Base complement|
|DNA||adenine(A), thymine(T), guanine(G), cytosine(C)||A=T , G≡C|
|RNA||adenine(A), uracil(U), guanine(G), cytosine(C)||A=U , G≡C|
In nucleic acids, nucleobases are held together by hydrogen bonding, which only works efficiently between adenine and thymine and between guanine and cytosine. The base pair A=T shares two hydrogen bonds, while the base pair G≡C shares three. All other configurations between nucleobases would hinder double helix formation. The two DNA strands are oriented in opposite directions; they are said to be antiparallel.
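The pairing rules above can be sketched in a few lines of Python. This is only an illustration, assuming plain uppercase DNA strings written 5'→3'; the names `COMPLEMENT`, `H_BONDS`, `is_antiparallel_complement` and `hydrogen_bonds` are hypothetical and not taken from any library.

```python
# Watson-Crick partners and hydrogen bonds per base pair (DNA alphabet only).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def is_antiparallel_complement(strand_5to3: str, other_5to3: str) -> bool:
    """Return True if the two strands pair perfectly when aligned antiparallel."""
    if len(strand_5to3) != len(other_5to3):
        return False
    # Reading the second strand backwards lines its 3' end up with
    # the 5' end of the first strand.
    return all(
        COMPLEMENT.get(a) == b
        for a, b in zip(strand_5to3, reversed(other_5to3))
    )


def hydrogen_bonds(strand_5to3: str) -> int:
    """Count hydrogen bonds in a fully paired duplex built on this strand."""
    return sum(H_BONDS[base] for base in strand_5to3)


print(is_antiparallel_complement("GATTC", "GAATC"))  # True
print(hydrogen_bonds("GATTC"))                       # 12
```

A G/C-rich strand yields a higher bond count than an A/T-rich strand of the same length, which is one simple way to see why such duplexes are harder to separate.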
A complementary strand of DNA or RNA may be constructed based on nucleobase complementarity. Each base pair, A=T or G≡C, takes up roughly the same space, thereby enabling a twisted DNA double helix to form without any spatial distortions. Hydrogen bonding between the nucleobases also stabilizes the DNA double helix.
Complementarity of DNA strands in a double helix makes it possible to use one strand as a template to construct the other. This principle plays an important role in DNA replication, setting the foundation of heredity by explaining how genetic information can be passed down to the next generation. Complementarity is also utilized in DNA transcription, which generates an RNA strand from a DNA template. Translation then follows the rules of the genetic code to turn that RNA message into cellular building blocks such as proteins.
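As a minimal sketch of template-directed synthesis, the functions below build the new strand from a DNA template. It assumes uppercase sequences written 5'→3' and ignores everything enzymatic; `replicate` and `transcribe` are illustrative names, not an existing API.

```python
# Translation tables built with the standard str.maketrans helper.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")         # DNA template -> DNA copy
DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")  # DNA template -> RNA copy


def replicate(template_5to3: str) -> str:
    """Return the complementary DNA strand, written 5'->3'."""
    # Complement each base, then reverse because the strands are antiparallel.
    return template_5to3.translate(DNA_COMPLEMENT)[::-1]


def transcribe(template_5to3: str) -> str:
    """Return the RNA transcribed from a DNA template strand, written 5'->3'."""
    return template_5to3.translate(DNA_TO_RNA_COMPLEMENT)[::-1]


print(replicate("TTACAT"))   # ATGTAA
print(transcribe("TTACAT"))  # AUGUAA
```

The only difference between the two tables is that adenine in the template is paired with uracil rather than thymine when the product is RNA.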
Nucleic acid strands may also form hybrids, in which single-stranded DNA readily anneals with complementary DNA or RNA. This principle is the basis of commonly performed laboratory techniques such as the polymerase chain reaction (PCR).
Genome-wide studies have shown that RNA antisense transcripts occur commonly in nature. They are generally believed to increase the coding potential of the genetic code and add an overall layer of complexity to gene regulation. So far, it is known that 40% of the human genome is transcribed in both directions, underlining the potential significance of antisense transcription. It has been suggested that complementary regions between sense and antisense transcripts allow double-stranded RNA hybrids to form, which may play an important role in gene regulation. For example, hypoxia-inducible factor 1α mRNA and β-secretase mRNA are transcribed bidirectionally, and it has been shown that the antisense transcript acts as a stabilizer of the sense transcript.
Kissing hairpins are formed when a single strand of nucleic acid is complementary to itself, creating loops of RNA in the form of a hairpin. When two hairpins come into contact with each other in vivo, the complementary bases of the two strands pair up and begin to unwind the hairpins until either a double-stranded RNA (dsRNA) complex is formed or, because of mismatches between the hairpins, the complex unwinds back into two separate strands. The secondary structure of the hairpin prior to kissing provides a stable structure with a relatively fixed change in energy. These structures balance the stability of the hairpin loop against the binding strength with a complementary strand. If the initial binding to a bad location is too strong, the strands will not unwind quickly enough; if the initial binding is too weak, the strands will never fully form the desired complex. The hairpin structures expose enough bases to provide a strong enough check on the initial binding, while the internal binding is weak enough to allow unfolding once a favorable match has been found.
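Internal complementarity of this kind can be illustrated with a rough Python sketch. It is a toy model under stated assumptions: only Watson-Crick pairs are counted (G-U wobble pairs and folding energies are ignored), the stem is assumed to form between the two ends of the sequence, and `stem_length`, `loops_complementary` and the `min_loop` parameter are hypothetical names chosen for this example.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def stem_length(rna: str, min_loop: int = 3) -> int:
    """Count consecutive base pairs between the 5' and 3' ends of one strand.

    At least `min_loop` bases are left unpaired to form the loop.
    """
    pairs = 0
    i, j = 0, len(rna) - 1
    while j - i - 1 >= min_loop and RNA_COMPLEMENT.get(rna[i]) == rna[j]:
        pairs += 1
        i += 1
        j -= 1
    return pairs


def loops_complementary(loop_a: str, loop_b: str) -> bool:
    """Return True if two loop sequences (5'->3') could pair antiparallel."""
    if len(loop_a) != len(loop_b):
        return False
    return all(
        RNA_COMPLEMENT.get(a) == b for a, b in zip(loop_a, reversed(loop_b))
    )


print(stem_length("GGGAAAACCC"))            # 3
print(loops_complementary("GGAC", "GUCC"))  # True
```

Real structure prediction weighs the energy of every possible pairing; this sketch only shows that the same complement rule applies within a strand and between two loops.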
[Figure: text diagram of two RNA hairpins with complementary loops and the extended duplex they form, ---CCUGCAACUUAGGCAGG--- paired with ---GGACGUUGAAUCCGUCC---.]
Kissing hairpins meeting up at the top of the loops. The complementarity of the two heads encourages the hairpin to unfold and straighten out to become one flat sequence of two strands rather than two hairpins.
miRNAs and siRNAs
miRNAs (microRNAs) are short RNA sequences that provide a means to regulate the translation of genes that have already been transcribed. Current research indicates that circulating miRNAs may be used as novel biomarkers and therefore show promise for disease diagnostics. miRNAs are formed from longer RNA sequences, transcribed from a regulator gene, that are cut free by a Dicer enzyme. These short strands bind into a RISC complex. Owing to their complementarity, they match up with sequences in the upstream region of a transcribed gene and act as a silencer for the gene in three ways: first, by preventing a ribosome from binding and initiating translation; second, by degrading the mRNA that the complex has bound to; and third, by providing a new double-stranded RNA (dsRNA) sequence that Dicer can act upon to create more miRNA to find and degrade more copies of the gene. Small interfering RNAs (siRNAs) serve a similar purpose to miRNAs but come from other sources of RNA. Despite their short length, the rules of complementarity mean that they can still be very discriminating in their choice of targets. With four choices for each base and a length of 20 to 22 bases for an mi/siRNA, there are more than 1×10^12 possible combinations. Given that the human genome is ~3.1 billion bases in length, each miRNA should find a match at most about once in the entire human genome by accident.
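The counting argument can be checked with a short calculation. The sketch below assumes a genome of roughly 3.1 billion bases, treats the genome as a uniformly random sequence, and counts both strands; these are simplifying assumptions made for this example (real genomes contain repeats), and the function names are illustrative.

```python
GENOME_LENGTH = 3.1e9  # assumed approximate size of the human genome in bases


def possible_sequences(length: int) -> int:
    """Number of distinct sequences of the given length over four bases."""
    return 4 ** length


def expected_chance_matches(length: int, genome_length: float = GENOME_LENGTH) -> float:
    """Expected exact matches of one fixed sequence in a random genome.

    Both strands are counted, hence the factor of two.
    """
    positions = 2 * (genome_length - length + 1)
    return positions / possible_sequences(length)


print(possible_sequences(20))  # 1099511627776, just over 1×10^12
print(expected_chance_matches(20) < 1)  # True
```

Under this simple model the expected number of accidental matches for a 20-base sequence is well below one, and it falls by a further factor of 16 for 22 bases.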
Complementarity allows the information found in DNA or RNA to be stored in a single strand. The complementing strand can be determined from the template and vice versa, as in cDNA libraries. This also allows for analysis, such as comparing the sequences of two different species. Shorthands have been developed for writing down sequences when there are mismatches (ambiguity codes) or to speed up reading the opposite sequence in the complement (ambigrams).
A cDNA library is a collection of DNA inserts derived from expressed genes and is seen as a useful reference tool in gene identification and cloning processes. cDNA libraries are constructed from mRNA using reverse transcriptase (RT), an RNA-dependent DNA polymerase that copies an mRNA template into DNA. Therefore, a cDNA library can only contain inserts derived from genes that are transcribed into mRNA. This process relies on the principle of DNA/RNA complementarity. The end product is double-stranded DNA, which may be inserted into plasmids. Hence, cDNA libraries are a powerful tool in modern research.
In systematic biology, it may be necessary to complement IUPAC ambiguity codes that mean "any of the two" or "any of the three" bases. The code R (any purine) is complemented into Y (any pyrimidine), and M (amino) into K (keto). W (weak) and S (strong) are usually not swapped, although some tools have swapped them in the past. The names W and S come from "weak" and "strong", referring to the number of hydrogen bonds that a nucleotide uses to pair with its complementing partner: two for A and T, three for G and C. Because a partner uses the same number of bonds to make the complementing pair, the complement of W is W and the complement of S is S.
A code that specifically excludes one of the four nucleotides can be complemented into the code that excludes the complementing nucleotide. For instance, V (A, C or G; "not T") can be complemented into B (C, G or T; "not A").
|Symbol||Description||Bases represented||Number of bases|
|B||not A (B comes after A)||C, G, T||3|
|D||not C (D comes after C)||A, G, T||3|
|H||not G (H comes after G)||A, C, T||3|
|V||not T (V comes after T and U)||A, C, G||3|
|N or -||any base (not a gap)||A, C, G, T||4|
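The complement rules for ambiguity codes described above can be collected into a lookup table. The following Python sketch is one possible arrangement, assuming DNA input (T rather than U) and leaving W and S unchanged; the names `IUPAC_COMPLEMENT` and `reverse_complement` are chosen for this example.

```python
# Complement of each IUPAC nucleotide code (DNA alphabet).
IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R",  # purine <-> pyrimidine
    "M": "K", "K": "M",  # amino <-> keto
    "W": "W", "S": "S",  # weak and strong are their own complements
    "B": "V", "V": "B",  # not A <-> not T
    "D": "H", "H": "D",  # not C <-> not G
    "N": "N", "-": "-",  # any base stays any base
}


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence that may contain ambiguity codes."""
    return "".join(IUPAC_COMPLEMENT[code] for code in reversed(seq.upper()))


print(reverse_complement("GTCA"))  # TGAC
print(reverse_complement("ARVN"))  # NBYT
```

An unknown character raises a `KeyError`, which is a deliberate choice here so that malformed input is not silently passed through.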
By assigning suitable (ambigraphic) characters to complementary bases (i.e. guanine = b, cytosine = q, adenine = n, and thymine = u), it is possible to complement entire DNA sequences by simply rotating the text "upside down". For instance, with this alphabet, buqn (GTCA) would read as ubnq (TGAC, the reverse complement) if turned upside down. Custom fonts, rather than ordinary ASCII or even Unicode characters, have been proposed to enhance this effect.
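The rotation trick can be simulated in Python by reversing the text and swapping each letter for its upside-down counterpart. This is a small sketch using the b/q/n/u alphabet given above; the function names are illustrative.

```python
TO_AMBIGRAPHIC = {"G": "b", "C": "q", "A": "n", "T": "u"}
FROM_AMBIGRAPHIC = {glyph: base for base, glyph in TO_AMBIGRAPHIC.items()}
# What each glyph looks like after a 180-degree rotation.
ROTATE_180 = {"b": "q", "q": "b", "n": "u", "u": "n"}


def encode(dna: str) -> str:
    """Write a DNA sequence in the ambigraphic alphabet."""
    return "".join(TO_AMBIGRAPHIC[base] for base in dna.upper())


def rotate(text: str) -> str:
    """Turn ambigraphic text upside down: reverse it and rotate each glyph."""
    return "".join(ROTATE_180[glyph] for glyph in reversed(text))


def decode(text: str) -> str:
    """Read ambigraphic text back as a DNA sequence."""
    return "".join(FROM_AMBIGRAPHIC[glyph] for glyph in text)


print(encode("GTCA"))                  # buqn
print(rotate("buqn"))                  # ubnq
print(decode(rotate(encode("GTCA"))))  # TGAC
```

Rotating twice returns the original text, mirroring the fact that the reverse complement of a reverse complement is the starting sequence.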