Protein mass spectrometry
Protein mass spectrometry refers to the application of mass spectrometry to the study of proteins. Mass spectrometry is an important emerging method for the characterization of proteins. The two primary methods for ionization of whole proteins are electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI). In keeping with the performance and mass range of available mass spectrometers, two approaches are used for characterizing proteins. In the first, intact proteins are ionized by either of the two techniques described above and then introduced into a mass analyzer. This approach is referred to as the "top-down" strategy of protein analysis. In the second, proteins are enzymatically digested into smaller peptides using a protease such as trypsin. These peptides are then introduced into the mass spectrometer and identified by peptide mass fingerprinting or tandem mass spectrometry. This latter approach (also called "bottom-up" proteomics) therefore uses identification at the peptide level to infer the existence of proteins.
Whole protein mass analysis is primarily conducted using either time-of-flight (TOF) MS or Fourier transform ion cyclotron resonance (FT-ICR). These two types of instrument are preferable here because of their wide mass range and, in the case of FT-ICR, its high mass accuracy. Mass analysis of proteolytic peptides is a much more popular method of protein characterization, as cheaper instrument designs can be used. Additionally, sample preparation is easier once whole proteins have been digested into smaller peptide fragments. The most widely used instruments for peptide mass analysis are MALDI time-of-flight instruments, as they permit the acquisition of peptide mass fingerprints (PMFs) at a high rate (one PMF can be analyzed in approximately 10 seconds). Multiple-stage quadrupole time-of-flight and quadrupole ion trap instruments also find use in this application.
Protein and peptide fractionation coupled with mass spectrometry
Proteins of interest to biological researchers are usually part of a very complex mixture of other proteins and molecules that co-exist in the biological medium. This presents two significant problems. First, the two ionization techniques used for large molecules only work well when the mixture contains roughly equal amounts of constituents, while in biological samples, different proteins tend to be present in widely differing amounts. If such a mixture is ionized using electrospray or MALDI, the more abundant species have a tendency to "drown" or suppress signals from less abundant ones. The second problem is that the mass spectrum from a complex mixture is very difficult to interpret because of the overwhelming number of mixture components. This is exacerbated by the fact that enzymatic digestion of a protein gives rise to a large number of peptide products.
To contend with these problems, two methods are widely used to fractionate proteins, or their peptide products from an enzymatic digestion. The first method, two-dimensional gel electrophoresis, fractionates whole proteins. The second method, high performance liquid chromatography (HPLC), is used to fractionate peptides after enzymatic digestion. In some situations, it may be necessary to combine both of these techniques.
Gel spots identified on a 2D gel are usually attributable to one protein. If the identity of the protein is desired, the method of in-gel digestion is usually applied: the protein spot of interest is excised and digested proteolytically. The peptide masses resulting from the digestion can be determined by mass spectrometry using peptide mass fingerprinting. If this information does not allow unequivocal identification of the protein, its peptides can be subjected to tandem mass spectrometry for de novo sequencing.
Characterization of protein mixtures using HPLC/MS is also called shotgun proteomics or MudPIT (Multi-Dimensional Protein Identification Technology). A peptide mixture that results from digestion of a protein mixture is fractionated by one or two steps of liquid chromatography. The eluent from the chromatography stage can be either introduced directly into the mass spectrometer through electrospray ionization, or laid down on a series of small spots for later mass analysis using MALDI.
There are two main ways MS is used to identify proteins. Peptide mass fingerprinting (mentioned in the previous section) uses the masses of proteolytic peptides as input to a search of a database of predicted masses that would arise from digestion of a list of known proteins. If a protein sequence in the reference list gives rise to a significant number of predicted masses that match the experimental values, there is some evidence that this protein was present in the original sample.
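The matching step in peptide mass fingerprinting can be illustrated with a minimal Python sketch. It assumes a simplified trypsin rule (cleavage after K or R unless the next residue is P, no missed cleavages), neutral monoisotopic masses, and a fixed mass tolerance. The function names, the tolerance value, and the use of a raw match count as the score are illustrative assumptions; real search tools use more elaborate statistical scoring.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # mass of H2O added to the residue sum of a free peptide


def tryptic_digest(sequence):
    """Cut after K or R unless followed by P (simplified rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, predicted, tol=0.01):
    """Number of observed masses lying within tol Da of any predicted mass."""
    return sum(any(abs(o - p) <= tol for p in predicted) for o in observed)


def rank_proteins(observed, database, tol=0.01):
    """Rank database proteins by how many observed peptide masses they explain."""
    scores = {
        name: count_matches(observed, [peptide_mass(p) for p in tryptic_digest(seq)], tol)
        for name, seq in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_proteins` with a list of measured peptide masses and a dictionary mapping protein names to sequences returns the candidates ordered by match count. A high count is only evidence of presence, not proof, which is why the text above speaks of "some evidence".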
Tandem MS is becoming a more popular experimental method for identifying proteins. Collision-induced dissociation is used in mainstream applications to generate a set of fragments from a specific peptide ion. The fragmentation process primarily gives rise to cleavage products that break along peptide bonds. Because of this simplicity in fragmentation, it is possible to match the observed fragment masses against a database of predicted masses for many candidate peptide sequences. Tandem MS of whole protein ions has been investigated recently using electron capture dissociation. It has been shown to yield extensive sequence information in principle, but it is not in common practice. This is sometimes referred to as the "top-down" approach, in that it starts with the whole protein and then pulls it apart, rather than starting with pieces (proteolytic fragments) and piecing the protein back together (bottom-up).
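Because cleavage occurs mainly at peptide bonds, the predicted fragment masses for a candidate sequence follow directly from the residue masses. The sketch below computes singly charged b ions (N-terminal fragments) and y ions (C-terminal fragments) for an unmodified peptide. The function name `fragment_ions` is a hypothetical helper, and modifications, higher charge states, and neutral losses are deliberately ignored.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O
PROTON = 1.007276   # charge carrier for a singly protonated ion


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_i contains the first i residues; y_i contains the last i residues
    plus one water. Index 0 of each list corresponds to b1 / y1.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions
```

For a peptide of n residues, each complementary pair b_i and y_(n-i) sums to the neutral peptide mass plus two protons, which gives a quick consistency check when comparing predicted and observed fragments.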
De novo (peptide) sequencing
De novo (peptide) sequencing for mass spectrometry is performed without prior knowledge of the amino acid sequence. It is the process of assigning amino acids from the peptide fragment masses of a protein. De novo sequencing has proven successful for confirming and expanding upon results from database searches.
As de novo sequencing is based on mass, and some amino acids have identical masses (e.g. leucine and isoleucine), accurate manual sequencing can be difficult. It may therefore be necessary to use a sequence homology search application that combines database searching with de novo sequencing to address this inherent limitation.
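The idea of reading a sequence from fragment masses, and the leucine/isoleucine ambiguity, can be sketched as follows. The example assumes an idealized, complete ladder of fragment ions of one type (for example consecutive b ions) and a fixed tolerance. The function `read_ladder` and its tolerance are hypothetical; real spectra contain gaps, noise, and mixed ion series.

```python
# Monoisotopic residue masses in daltons (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(ion_masses, tol=0.01):
    """Assign a residue to each gap between consecutive ladder masses.

    Returns one entry per gap: a residue letter, several letters joined
    by "/" when masses are indistinguishable, or "?" when nothing fits.
    """
    ordered = sorted(ion_masses)
    calls = []
    for low, high in zip(ordered, ordered[1:]):
        delta = high - low
        candidates = sorted(
            aa for aa, mass in RESIDUE_MASS.items() if abs(delta - mass) <= tol
        )
        calls.append("/".join(candidates) if candidates else "?")
    return calls
```

A gap of 113.08406 Da is reported as `I/L` because the two residues have the same mass; mass alone cannot resolve it, which is where homology or database information helps.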
Database searching has the advantage of quickly identifying sequences, provided they have already been documented in a database. Its inherent limitations include:
- Sequence modifications/mutations: some database searches do not adequately account for alterations to the 'documented' sequence and can thus miss valuable information.
- The unknown: if a sequence is not documented, it will not be found.
- False positives.
- Incomplete and corrupted data: a common and often unnoticed problem.
An annotated peptide spectral library can also be used as a reference for protein/peptide identification. It offers the unique strength of a reduced search space and increased specificity. The limitations include the following:
- Spectra not included in the library will not be identified.
- Spectra collected from different types of mass spectrometers can have quite distinct features.
- Reference spectra in the library may contain noise peaks, which may lead to false positive identifications.
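Spectral library searching compares a query spectrum directly with stored reference spectra. A minimal sketch, assuming spectra are given as lists of (m/z, intensity) pairs, that peaks are grouped into fixed-width bins, and that cosine similarity is used as the score, is shown below. The bin width, the 0.7 acceptance threshold, and the function names are illustrative assumptions rather than values from any particular tool.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine of the angle between two binned spectra (0 = no overlap, 1 = identical shape)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_library_match(query, library, min_score=0.7):
    """Return (name, score) of the best reference, or (None, score) if below threshold."""
    best_name, best_score = None, 0.0
    for name, reference in library.items():
        score = cosine_similarity(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_score:
        return None, best_score
    return best_name, best_score
```

A query whose peptide is absent from the library scores poorly against every reference and is returned as unidentified, which mirrors the first limitation listed above. Noise peaks in a reference spectrum contribute to the dot product as well, so they can raise the score of an incorrect match.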
A number of different algorithmic approaches have been described to identify peptides and proteins from tandem mass spectrometry (MS/MS) data, including peptide de novo sequencing and sequence tag-based searching.
Several recent methods allow for the quantitation of proteins by mass spectrometry (quantitative proteomics). Typically, stable (i.e. non-radioactive) heavier isotopes of carbon (13C) or nitrogen (15N) are incorporated into one sample, while the other is labeled with the corresponding light isotopes (e.g. 12C and 14N). The two samples are mixed before the analysis. Peptides derived from the different samples can be distinguished by their mass difference. The ratio of their peak intensities corresponds to the relative abundance ratio of the peptides (and proteins). The most popular methods for isotope labeling are SILAC (stable isotope labeling by amino acids in cell culture), trypsin-catalyzed 18O labeling, ICAT (isotope coded affinity tagging), and iTRAQ (isobaric tags for relative and absolute quantitation). "Semi-quantitative" mass spectrometry can be performed without labeling of samples. Typically, this is done with MALDI analysis (in linear mode). Here the peak intensity, or the peak area, from individual molecules (typically proteins) is correlated with the amount of protein in the sample. However, the individual signal depends on the primary structure of the protein, on the complexity of the sample, and on the settings of the instrument. Other types of "label-free" quantitative mass spectrometry use the spectral counts (or peptide counts) of digested proteins as a means of determining relative protein amounts.
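The ratio calculation for an isotope-labeled peptide pair can be sketched in a few lines. The example assumes labeling with 13C only, a known number of labeled carbon atoms per peptide, and a peak list of (m/z, intensity) pairs. The function name, the tolerance, and the restriction to a single isotopic peak per form are simplifying assumptions; practical tools integrate whole isotope envelopes over the chromatographic peak.

```python
C13_SHIFT = 1.003355  # mass difference between 13C and 12C in daltons


def heavy_light_ratio(peaks, light_mz, n_labeled_carbons, charge=1, tol=0.01):
    """Heavy/light intensity ratio for one peptide pair.

    peaks: list of (m/z, intensity) tuples from a mixed light + heavy sample.
    light_mz: m/z of the unlabeled peptide ion.
    n_labeled_carbons: number of 13C atoms in the heavy form (assumed known).
    Returns None when no light signal is found within the tolerance.
    """
    # The mass shift is divided by the charge because the axis is m/z.
    heavy_mz = light_mz + n_labeled_carbons * C13_SHIFT / charge
    light = sum(i for mz, i in peaks if abs(mz - light_mz) <= tol)
    heavy = sum(i for mz, i in peaks if abs(mz - heavy_mz) <= tol)
    if light == 0:
        return None
    return heavy / light
```

For a doubly charged peptide carrying six 13C atoms, the heavy peak is expected about 3.01 m/z units above the light one, and a ratio of 2.0 would indicate that the peptide is twice as abundant in the heavy-labeled sample.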
Characteristics indicative of the 3-dimensional structure of proteins can be probed with mass spectrometry in various ways. By using chemical crosslinking to couple parts of the protein that are close in space, but far apart in sequence, information about the overall structure can be inferred. By following the exchange of amide protons with deuterium from the solvent, it is possible to probe the solvent accessibility of various parts of the protein. Another interesting avenue in protein structural studies is laser-induced covalent labeling. In this technique, solvent-exposed sites of the protein are modified by hydroxyl radicals. Its combination with rapid mixing has been used in protein folding studies.
The FDA defines a biomarker as, “A characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention”. Mass spectrometry enables large-scale discovery of candidates for biomarkers.
In what is now commonly referred to as proteogenomics, peptides identified with mass spectrometry are used for improving gene annotations (for example, gene start sites) and protein annotations. Parallel analysis of the genome and the proteome facilitates discovery of post-translational modifications and proteolytic events, especially when comparing multiple species.