| Finished writing a draft article? Are you ready to request review of it by an experienced editor for possible inclusion in Wikipedia? Submit your draft for review! |
Probalign is a sequence alignment tool that calculates a maximum expected accuracy alignment using partition function posterior probabilities.[1] Base pair probabilities are estimated using an estimate similar to Boltzmann distribution. The partition function is calculated using a dynamic programming approach.
Algorithm[edit]
The following describes the algorithm used by probalign to determine the base pair probabilities.[2]
Alignment score[edit]
To score an alignment of two sequences two things are needed:
- a similarity function (e.g. PAM, BLOSUM,...)
- affine gap penalty:
The score of an alignment a is defined as:
Now the boltzmann weighted score of an alignment a is:
Where is a scaling factor.
The probability of an alignment assuming boltzmann distribution is given by
Where is the partition function, i.e. the sum of the boltzmann weights of all alignments.
Dynamic Programming[edit]
Let denote the partition function of the prefixes and . Three different cases are considered:
- the partition function of all alignments of the two prefixes that end in a match.
- the partition function of all alignments of the two prefixes that end in an insertion .
- the partition function of all alignments of the two prefixes that end in a deletion .
Then we have:
Initialization[edit]
The matrixes are initialized as follows:
Recursion[edit]
The partition function for the alignments of two sequences and is given by , which can be recursively computed:
- analogously
Base pair probability[edit]
Finally the probability that positions and form a base pair is given by:
References[edit]
- ^ U. Roshan and D. R. Livesay, Probalign: multiple sequence alignment using partition function posterior probabilities, Bioinformatics, 22(22):2715-21, 2006 (PDF)
- ^ Lecture "Bioinformatics II" at University of Freiburg [1]
See also[edit]
External Links[edit]
Probalign Webservice