Canonical sequence

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Lizia7 at 17:26, 16 September 2013 (inline citation). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A canonical sequence is a DNA, RNA, or amino acid sequence that reflects the most common base or amino acid at each position. Many databases use the canonical sequence or provide only that sequence. For example, UniProtKB/Swiss-Prot describes all the protein products encoded by one gene in a single entry, and its policy uses the following criteria to choose the canonical sequence for that entry:[1]

  1. It is the most prevalent.
  2. It is the most similar to orthologous sequences found in other species.
  3. By virtue of its length or amino acid composition, it allows the clearest description of domains, isoforms, polymorphisms, post-translational modifications, etc.
  4. In the absence of any such information, the longest sequence is chosen.
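The per-position definition in the first sentence, taking the most common base or amino acid at each position, can be illustrated with a minimal Python sketch. It is not a UniProt procedure: it assumes the input sequences are already aligned to the same length (the alignment step is not shown), and the function name `most_common_per_position` is hypothetical.

```python
from collections import Counter


def most_common_per_position(aligned: list[str]) -> str:
    """Return the most common residue in each column of equal-length sequences.

    Assumes the sequences are already aligned (no alignment is done here).
    Ties go to the residue seen first in the column, because
    Counter.most_common orders equal counts by first occurrence.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*aligned) yields one tuple of residues per position.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned)
    )


if __name__ == "__main__":
    reads = ["ACGTT", "AGGTA", "ACCTA"]
    print(most_common_per_position(reads))  # ACGTA
```

In this example the result, `ACGTA`, matches none of the three inputs exactly, which shows that a per-position choice need not correspond to any single observed sequence.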
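The UniProtKB/Swiss-Prot criteria listed above can be read as an ordered series of tie-breakers. The sketch below assumes that reading, which is a simplification: actual curation involves judgment, and criterion 3 is not modelled at all. The `Isoform` record, its `prevalence` and `ortholog_identity` fields, and `pick_canonical` are hypothetical names, not part of any UniProt API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Isoform:
    """Hypothetical isoform record; field names are illustrative only."""
    name: str
    sequence: str
    prevalence: Optional[float] = None         # e.g. relative abundance; None if unknown
    ortholog_identity: Optional[float] = None  # similarity to orthologs; None if unknown


def _known(value: Optional[float]) -> tuple[bool, float]:
    # Any known value outranks an unknown one: (False, 0.0) < (True, x).
    return (value is not None, value if value is not None else 0.0)


def pick_canonical(isoforms: list[Isoform]) -> Isoform:
    """Pick one isoform, comparing criteria in order as tie-breakers.

    First highest prevalence (criterion 1), then highest similarity to
    orthologs (criterion 2), then sequence length. Length is the final
    tie-breaker, which yields criterion 4 when nothing else is known.
    Criterion 3 needs curator judgment and is not modelled.
    """
    if not isoforms:
        raise ValueError("need at least one isoform")
    return max(
        isoforms,
        key=lambda iso: (
            _known(iso.prevalence),
            _known(iso.ortholog_identity),
            len(iso.sequence),
        ),
    )


if __name__ == "__main__":
    # No data at all: falls through to length (criterion 4).
    no_data = [Isoform("iso1", "MKTAYIAK"), Isoform("iso2", "MKTAYIAKQRQ")]
    print(pick_canonical(no_data).name)  # iso2

    # Prevalence known: the more prevalent, shorter isoform wins (criterion 1).
    with_data = [
        Isoform("iso1", "MKT", prevalence=0.7),
        Isoform("iso2", "MKTAAA", prevalence=0.3),
    ]
    print(pick_canonical(with_data).name)  # iso1
```

Ranking unknown values below any known value is a design choice of this sketch: it lets the comparison fall through to length only when no candidate has prevalence or ortholog data.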

References

  1. "What is the canonical sequence? Are all isoforms described in one entry?". UniProt.