Gap penalty
Gap penalties are used in sequence alignment to score gaps, the positions where a character in one sequence is aligned with no character in the other. Because gap penalties contribute to the overall alignment score, the size of the gap penalty relative to the entries in the similarity matrix affects which alignment is finally selected. A larger gap penalty causes less favourable characters to be aligned with each other, so that fewer gaps are created.
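The trade-off between gaps and unfavourable pairings can be illustrated with a minimal sketch. It assumes a simple match/mismatch score (+1/-1) standing in for a similarity matrix and a penalty charged per gap position; the sequences, the function names `score_alignment` and `best`, and all numeric values are hypothetical and chosen only for illustration.

```python
def score_alignment(row_a, row_b, gap, match=1, mismatch=-1):
    """Score two equal-length aligned rows; '-' marks a gap position."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            score += gap        # gap penalty (a negative number)
        elif a == b:
            score += match      # stand-in for a similarity-matrix entry
        else:
            score += mismatch
    return score


# Two candidate alignments of the hypothetical sequences ACGT and AGTC.
ungapped = ("ACGT", "AGTC")     # 1 match, 3 mismatches, no gaps
gapped = ("ACGT-", "A-GTC")     # 3 matches, 2 gap positions


def best(gap):
    """Return the name of the higher-scoring candidate for a given gap penalty."""
    candidates = {"ungapped": ungapped, "gapped": gapped}
    return max(candidates, key=lambda name: score_alignment(*candidates[name], gap=gap))


print(best(-1))  # mild penalty: the gapped alignment scores higher
print(best(-3))  # harsh penalty: the ungapped alignment scores higher
```

With a penalty of -1 the gapped candidate scores 1 against -2 for the ungapped one; with a penalty of -3 it scores -3, so the alignment that pairs mismatched characters is selected instead.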
Constant gap penalty
Constant gap penalties are the simplest type of gap penalty. The only parameter, d, is added to the alignment score once, when a gap is first opened, so every gap receives the same penalty regardless of its length.
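As a rough sketch of this rule, the snippet below charges d once per run of consecutive gap characters in one aligned row. The helper names `gap_lengths` and `constant_gap_score`, the `-` gap character, and the value d = -5 are assumptions made for the example, not part of any particular alignment tool.

```python
from itertools import groupby


def gap_lengths(aligned_row, gap_char="-"):
    """Return the lengths of the runs of consecutive gap characters."""
    return [
        sum(1 for _ in run)
        for is_gap, run in groupby(aligned_row, key=lambda c: c == gap_char)
        if is_gap
    ]


def constant_gap_score(aligned_row, d):
    """Add d once for every gap, whatever its length."""
    return d * len(gap_lengths(aligned_row))


print(constant_gap_score("AC-GT", -5))           # one gap of length 1
print(constant_gap_score("AC----------GT", -5))  # one gap of length 10
print(constant_gap_score("AC---GT-A", -5))       # two gaps
```

The first two rows receive the same contribution, -5, because each contains a single gap; the third contains two gaps and receives -10.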
Linear gap penalty
Linear gap penalties have only one parameter, d, which is a penalty per unit length of gap, so a gap of length l contributes l times d to the alignment score. d is almost always negative, so that an alignment with fewer gap positions is favoured over one with more. Under a linear gap penalty, the overall penalty for one large gap is the same as for many small gaps of the same total length.
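The equivalence between one large gap and many small ones can be checked with a short sketch. The function name `linear_gap_score` and the value d = -2 are illustrative assumptions.

```python
def linear_gap_score(gap_lengths, d):
    """Total contribution of the gaps: d for every unit of gap length."""
    return sum(d * length for length in gap_lengths)


d = -2  # hypothetical per-position penalty
one_large_gap = linear_gap_score([10], d)       # a single gap of length 10
ten_small_gaps = linear_gap_score([1] * 10, d)  # ten gaps of length 1

print(one_large_gap, ten_small_gaps)
```

Both calls give -20, since only the total number of gap positions matters under this scheme.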
Affine gap penalty
In some sequences, one large gap is more likely than many small gaps. For example, a biological sequence is much more likely to have one gap of length 10, caused by a single insertion or deletion event, than 10 separate gaps of length 1. Affine gap penalties use a gap opening penalty, o, and a gap extension penalty, e. A gap of length l is then given a penalty of o + (l-1)e. So that gaps are discouraged, o is almost always negative. Because a few large gaps are preferred to many small gaps, e, though also negative, is almost always smaller in magnitude than o, which encourages extending an existing gap rather than opening a new one.
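The formula o + (l-1)e can be worked through for the example of one gap of length 10 versus ten gaps of length 1. The sketch below uses hypothetical values o = -10 and e = -1; the function names are also assumptions for illustration.

```python
def affine_gap_penalty(length, o, e):
    """Penalty for one gap of the given length: o + (length - 1) * e."""
    if length < 1:
        raise ValueError("a gap must have length at least 1")
    return o + (length - 1) * e


def affine_gap_score(gap_lengths, o, e):
    """Sum the affine penalty over every gap in an alignment."""
    return sum(affine_gap_penalty(length, o, e) for length in gap_lengths)


o, e = -10, -1  # hypothetical opening and extension penalties
one_large_gap = affine_gap_score([10], o, e)       # -10 + 9 * (-1)
ten_small_gaps = affine_gap_score([1] * 10, o, e)  # 10 * (-10)

print(one_large_gap, ten_small_gaps)
```

With these values the single long gap costs -19 while the ten short gaps cost -100, so the alignment with one insertion or deletion event is strongly preferred.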