Complementarity (molecular biology)

From Wikipedia, the free encyclopedia

In molecular biology, complementarity is a property shared between two nucleic acid sequences, such that when they are aligned antiparallel to each other, the nucleotide bases at each position will be complementary. Two bases are complementary if they form Watson-Crick base pairs. The degree of complementarity between two nucleic acid strands may vary, from total complementarity to none.
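A minimal sketch of how the varying degree of complementarity can be measured. The function name `degree_of_complementarity` is hypothetical. The sketch assumes two equal-length strands, both written 5' to 3', and counts only Watson-Crick pairs (A with T or U, G with C) as complementary.

```python
# Watson-Crick pairs for DNA (A-T, G-C) and RNA (A-U, G-C). Wobble pairs such as
# G-U are deliberately not counted in this sketch.
WATSON_CRICK = {frozenset("AT"), frozenset("GC"), frozenset("AU")}


def degree_of_complementarity(strand1: str, strand2: str) -> float:
    """Return the fraction of positions that form Watson-Crick pairs.

    Both strands are given 5'->3'; reversing strand2 aligns them
    antiparallel, so the 5' end of strand1 faces the 3' end of strand2.
    """
    if not strand1 or len(strand1) != len(strand2):
        raise ValueError("this sketch expects two non-empty strands of equal length")
    antiparallel = reversed(strand2.upper())
    paired = sum(
        frozenset((a, b)) in WATSON_CRICK
        for a, b in zip(strand1.upper(), antiparallel)
    )
    return paired / len(strand1)


print(degree_of_complementarity("AGTCATG", "CATGACT"))  # 1.0 (total complementarity)
print(degree_of_complementarity("AAAA", "AAAA"))        # 0.0 (none)
```

Values between 0 and 1 correspond to partial complementarity, for example strands that differ from perfect partners at a few mismatched positions.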


Rules

Left: the nucleotide base pairs that can form in double-stranded DNA. Between A and T there are two hydrogen bonds, while there are three between C and G. Right: two complementary strands of DNA.

For DNA, adenine (A) bases complement thymine (T) bases and vice versa; guanine (G) bases complement cytosine (C) bases and vice versa. In RNA the rules are the same, except that uracil (U) takes the place of thymine, so adenine (A) bases complement uracil (U) bases.
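As a minimal sketch, the pairing rules above fit in two lookup tables, one for DNA and one for RNA. The names `DNA_COMPLEMENT`, `RNA_COMPLEMENT` and `complement_base` are illustrative and not taken from any library.

```python
# Pairing rules as lookup tables; RNA uses U in place of T.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complement_base(base: str, rna: bool = False) -> str:
    """Return the Watson-Crick partner of one DNA (default) or RNA base."""
    table = RNA_COMPLEMENT if rna else DNA_COMPLEMENT
    try:
        return table[base.upper()]
    except KeyError:
        kind = "RNA" if rna else "DNA"
        raise ValueError(f"{base!r} is not a {kind} base") from None


print(complement_base("G"))            # C
print(complement_base("A"))            # T
print(complement_base("A", rna=True))  # U
```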

Since there is only one complementary base for each of the bases found in DNA and in RNA, one can reconstruct a complementary strand for any single strand. Every C in one strand pairs with a G in the complementary strand, every A with a T (or with a U in RNA), and so on. In a DNA double helix, the two strands of DNA are complementary; this plays an important role in DNA replication, as each strand can act as a template for the construction of the other.[1]
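Because each base has exactly one partner, reconstructing the complementary strand is a position-by-position substitution. The following Python sketch (hypothetical name `complementary_strand`, DNA only) also checks the template property: complementing the complement returns the original strand.

```python
# DNA pairing rules (A<->T, G<->C) as a str.translate table.
DNA_PAIRS = str.maketrans("ATGC", "TACG")


def complementary_strand(template: str) -> str:
    """Build the strand that pairs base-for-base with `template`.

    The result is aligned position by position, so if the template is read
    5'->3' the returned string reads 3'->5'.
    """
    template = template.upper()
    invalid = set(template) - set("ATGC")
    if invalid:
        raise ValueError(f"not DNA bases: {sorted(invalid)}")
    return template.translate(DNA_PAIRS)


strand = "AGTCATG"
partner = complementary_strand(strand)
# Each strand is a template for the other: complementing twice restores it.
assert complementary_strand(partner) == strand
print(partner)  # TCAGTAC
```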

For example, the complementary strand of the DNA sequence

5' A G T C A T G 3'

is

3' T C A G T A C 5'

Because sequences are conventionally written from the 5' end to the 3' end, the complementary strand is usually given as its reverse complement, with the 5' end on the left and the 3' end on the right:

5' C A T G A C T 3'
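The two ways of writing the partner strand differ only in orientation: the reverse complement is the complement read backwards. A minimal Python sketch with illustrative function names, assuming DNA input; characters other than A, C, G and T are passed through unchanged rather than rejected.

```python
COMPLEMENT = str.maketrans("ATGC", "TACG")


def complement(seq: str) -> str:
    """Complementary strand, aligned with `seq` (so it reads 3'->5')."""
    # Non-ACGT symbols pass through unchanged in this sketch.
    return seq.upper().translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complementary strand rewritten in the usual 5'->3' orientation."""
    return complement(seq)[::-1]


print(complement("AGTCATG"))          # TCAGTAC, read 3'->5'
print(reverse_complement("AGTCATG"))  # CATGACT, read 5'->3'
```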

A sequence that is equal to its reverse complement is said to be a palindromic sequence.
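Palindromic here means equal to the reverse complement, which is not the same as an ordinary text palindrome: GAATTC reads CTTAAG when simply reversed, yet its reverse complement is GAATTC again. A small sketch with a hypothetical `is_palindromic` helper:

```python
COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if the DNA sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


print(is_palindromic("GAATTC"))   # True
print(is_palindromic("AGTCATG"))  # False
```

Under this definition an odd-length DNA sequence can never be palindromic, because its middle base would have to equal its own complement.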

Ambiguity codes

In systematic biology it may be necessary to complement IUPAC ambiguity codes that stand for either of two or any of three bases. The code R (purine: A or G) complements to Y (pyrimidine: C or T), and M (amino: A or C) complements to K (keto: G or T). W (weak) and S (strong) are usually not swapped,[2] although some tools have swapped them in the past.[3] The names "weak" and "strong" refer to the number of hydrogen bonds a nucleotide forms with its complementary partner: two for A and T, three for G and C. Because both partners in a pair use the same number of bonds, W (A or T) and S (G or C) are each their own complement.[1]
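The two-base rules above can be written as a lookup table. In this sketch (illustrative names), the optional `swap_w_s` flag is an assumption added only to mimic the older tools mentioned above; it does not follow from the pairing rules, since the complement of W (A or T) is again A or T.

```python
# Two-base IUPAC codes: R = A/G, Y = C/T, M = A/C, K = G/T, W = A/T, S = C/G.
TWO_BASE_COMPLEMENT = {
    "R": "Y", "Y": "R",  # purine <-> pyrimidine
    "M": "K", "K": "M",  # amino <-> keto
    "W": "W", "S": "S",  # weak and strong are their own complements
}


def complement_two_base_code(code: str, swap_w_s: bool = False) -> str:
    """Complement a two-base IUPAC ambiguity code.

    swap_w_s=True reproduces the legacy W<->S swap; it is off by default
    because it contradicts the base-pairing rules.
    """
    code = code.upper()
    if swap_w_s and code in ("W", "S"):
        return "S" if code == "W" else "W"
    return TWO_BASE_COMPLEMENT[code]


print(complement_two_base_code("R"))                 # Y
print(complement_two_base_code("W"))                 # W
print(complement_two_base_code("W", swap_w_s=True))  # S (legacy behaviour)
```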

A code that excludes exactly one of the four nucleotides can be complemented into the code that excludes that nucleotide's complement. For instance, V (A, C or G, i.e. "not T") complements to B (C, G or T, i.e. "not A").
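Rather than listing each pair, the complement of any IUPAC code can be derived: expand the code to its set of bases, complement each base, and look up the code for the resulting set. This Python sketch (names are illustrative) uses the standard IUPAC nucleotide codes and reproduces R to Y, M to K, W unchanged, and V to B.

```python
# Standard IUPAC nucleotide codes, expanded to the bases they stand for.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "W": "AT", "S": "CG",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",  # not A, C, G, T respectively
    "N": "ACGT",
}
BASE_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Reverse lookup: set of bases -> code (every code has a distinct set).
CODE_FOR_BASES = {frozenset(bases): code for code, bases in IUPAC_BASES.items()}


def complement_iupac(code: str) -> str:
    """Complement any IUPAC code by complementing each base it stands for."""
    bases = IUPAC_BASES[code.upper()]
    return CODE_FOR_BASES[frozenset(BASE_COMPLEMENT[b] for b in bases)]


for code in "RMWVH":
    print(code, "->", complement_iupac(code))
# R -> Y, M -> K, W -> W, V -> B, H -> D
```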

Ambigrams

By assigning suitable (ambigraphic) characters to complementary bases (i.e. guanine = b, cytosine = q, adenine = n, and thymine = u), it is possible to complement an entire DNA sequence simply by turning the text upside down.[4] For instance, with this alphabet, buqn (GTCA) reads as ubnq (TGAC, the reverse complement of GTCA) when turned upside down. Custom fonts have been proposed to enhance this feature, in place of ordinary ASCII or even Unicode characters.[5]
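Turning a line upside down reverses its order and flips each glyph over (b and q swap, as do n and u), which under the alphabet above is exactly reverse complementation. A sketch with hypothetical helper names, treating the 180-degree rotation as a simple character mapping:

```python
# Letter assignments from the text: G = b, C = q, A = n, T = u.
BASE_TO_GLYPH = {"G": "b", "C": "q", "A": "n", "T": "u"}
GLYPH_TO_BASE = {glyph: base for base, glyph in BASE_TO_GLYPH.items()}
# Turning a letter upside down (180-degree rotation): b <-> q, n <-> u.
UPSIDE_DOWN = {"b": "q", "q": "b", "n": "u", "u": "n"}


def encode(seq: str) -> str:
    return "".join(BASE_TO_GLYPH[base] for base in seq.upper())


def decode(text: str) -> str:
    return "".join(GLYPH_TO_BASE[glyph] for glyph in text)


def rotate_180(text: str) -> str:
    """Rotating a line by 180 degrees reverses it and flips every glyph."""
    return "".join(UPSIDE_DOWN[glyph] for glyph in reversed(text))


glyphs = encode("GTCA")       # buqn
rotated = rotate_180(glyphs)  # ubnq
print(decode(rotated))        # TGAC, the reverse complement of GTCA
```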


References

  1. Reverse-complement tool page with documented IUPAC code conversion; source code available.
  2. Jeremiah Faith (2011). Conversion table.
  3. arep.med.harvard.edu: tool page with a note about the applied W-S conversion patch.
  4. Rozak DA (2006). "The practical and pedagogical advantages of an ambigraphic nucleic acid notation". Nucleosides Nucleotides Nucleic Acids 25 (7): 807–13. doi:10.1080/15257770600726109. PMID 16898419.
  5. Flower, R. H.; Knoll, A. H.; Yuan, X. (1955). "Status of Endoceroid Classification". Journal of Paleontology 29 (3): 329–371. doi:10.2144/000112727.
