BLOSUM62: more or less than 62% identity?
"The Henikoffs took a big database of trusted alignments (their BLOCKS database), and (in effect) only counted pairwise sequence alignments related by less than some threshold percentage identity. A threshold of 62% identity or less resulted in the target frequencies for the BLOSUM62 matrix. An 80% threshold gave the more highly conserved target frequencies of the BLOSUM80 matrix, and a 45% threshold gave the more divergent BLOSUM45 matrix."
Source: Sean R. Eddy, Where did the BLOSUM62 alignment score matrix come from? Nature Biotechnology 22, 1035-1036 (2004), doi:10.1038/nbt0804-1035
"In order to avoid over-weighting closely-related sequences, the Henikoffs replaced groups of proteins that have sequence identities higher than a threshold by either a single representative or a weighted average. The threshold of 62% produces the commonly used BLOSUM62 substitution matrix."
Source: Arthur M. Lesk, Introduction to Bioinformatics, Oxford University Press, 2002, p. 175
Winterschlaefer 15:52, 14 February 2007 (UTC)
- As far as I know, a BLOSUM62 matrix is good for alignments that have 62% or MORE identity. XApple 00:32, 25 February 2007 (UTC)
I agree with Winterschlaefer. For BLOSUM62, the Henikoffs weighted all sequences with 62% or more identity as one single sequence, so that those sequences contribute less to the matrix. As the paper reads,
"To reduce multiple contributions to amino acid pair frequencies from the most closely related members of a family, sequences are clustered within blocks and each cluster is weighted as a single sequence in counting pairs. This is done by specifying a clustering percentage in which sequence segments that are identical for at least that percentage of amino acids are grouped together."
Also, as I can see in the history of this article, the following statement used to be part of the references section: "BLOSUM62 is for sequences of 62% OR GREATER sequence identity, not less than 62% (Voet, D., Voet,J., 2005)", and this may well be what Voet & Voet claim. However, it is different from the following statement, which is now referenced with Voet & Voet: "BLOSUM62 is the matrix calculated by using the observed substitutions between proteins which have 62% or more". What I'm saying is that this reference does not support this claim. The BLOSUM62 matrix is actually calculated (primarily) from sequences that have 62% or less sequence identity. Still, IMHO, BLOSUM62 is designed for sequences with similarities around 62%, not more. If I wanted to compare sequences with a similarity of 80%, I'd choose BLOSUM80.
Source: Henikoff & Henikoff, Amino acid substitution matrices from protein blocks, PNAS 89, pp. 10915-10919. 18.104.22.168 21:09, 28 May 2007 (UTC)
- It is definitely the case that BLOSUM62 is based only on sequences that have 62% or more identity, while BLOSUM80 is based on sequences with 80% or more identity. Which one you use is up to your personal taste, but as far as I know you would use a BLOSUM matrix that is around your sequence identity, and on that point I agree with the previous commenter. The error was fixed here. Greetings --hroest 03:39, 4 June 2008 (UTC)
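For anyone trying to follow the two readings in this thread, here is a minimal sketch of the clustering step as described in the Henikoff & Henikoff passage quoted above. It is an illustration under stated assumptions, not the original BLOCKS software: the function names are hypothetical, single-linkage clustering is assumed, and the three-segment block is toy data.

```python
from collections import defaultdict
from itertools import combinations


def percent_identity(a, b):
    """Percentage of aligned positions at which two equal-length segments match."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def cluster_segments(segments, threshold):
    """Assumed single-linkage clustering: segments that are identical for
    at least `threshold` percent of positions end up in the same cluster.
    Returns one cluster label per segment."""
    parent = list(range(len(segments)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(segments)), 2):
        if percent_identity(segments[i], segments[j]) >= threshold:
            parent[find(i)] = find(j)

    return [find(i) for i in range(len(segments))]


def weighted_pair_counts(segments, threshold):
    """Count amino acid pairs column by column, weighting each cluster
    as a single sequence. Pairs inside a cluster are not counted."""
    labels = cluster_segments(segments, threshold)
    sizes = defaultdict(int)
    for label in labels:
        sizes[label] += 1

    counts = defaultdict(float)
    for i, j in combinations(range(len(segments)), 2):
        if labels[i] == labels[j]:
            continue  # same cluster: treated as one sequence
        # Each member contributes 1/size of its cluster.
        weight = 1.0 / (sizes[labels[i]] * sizes[labels[j]])
        for x, y in zip(segments[i], segments[j]):
            counts[tuple(sorted((x, y)))] += weight
    return dict(counts)


# Toy block (hypothetical data): the first two segments are 75% identical.
block = ["AAAA", "AAAC", "CCCC"]
print(weighted_pair_counts(block, 62))
print(weighted_pair_counts(block, 80))
```

With a threshold of 62, the first two segments are merged and only their comparisons against the third segment are counted, each at half weight. With a threshold of 80 nothing is merged and every pair counts fully. Under the single-linkage assumption, every pair that is actually counted comes from two segments with less than the threshold identity, which matches the "62% or less" reading of how the matrix is built.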
This badly needs a picture of a typical BLOSUM matrix. XApple 14:52, 12 February 2007 (UTC)
- It did get one. --hroest 05:50, 7 March 2008 (UTC)
"BLOSUM matrix" is correct
Some people say that one must say "BLOSUM" instead of "BLOSUM matrix" because the "M" in BLOSUM already means "matrix". That reading of the abbreviation is correct, but the term BLOSUM is by now a name, not just an abbreviation. BLOSUM is a technical term. It is common practice in the scientific community to speak of "BLOSUM matrices". Just saying "BLOSUM" is counterintuitive and not idiomatic.
Furthermore, if we wanted to get it linguistically really right, the article itself contained mistakes. It said: "To calculate a matrix for BLOSUM, ...". This is grammatically wrong, whatever opinion one has about BLOSUM matrices. 22.214.171.124 (talk) 10:49, 22 September 2010 (UTC)