Protein–protein interaction prediction

From Wikipedia, the free encyclopedia

Protein–protein interaction prediction is a field combining bioinformatics and structural biology in an attempt to identify and catalog physical interactions between pairs or groups of proteins. Understanding protein–protein interactions is important for investigating intracellular signaling pathways, modelling the structures of protein complexes, and gaining insight into various biochemical processes. Physical interactions between pairs of proteins can be inferred from a variety of experimental techniques, including yeast two-hybrid systems, protein-fragment complementation assays (PCA), affinity purification/mass spectrometry, protein microarrays, fluorescence resonance energy transfer (FRET), and microscale thermophoresis (MST). Efforts to experimentally determine the interactomes of numerous species are ongoing, and a number of computational methods for interaction prediction have been developed in recent years.

Methods

Proteins that interact are more likely to co-evolve;[1][2][3][4] it is therefore possible to make inferences about interactions between pairs of proteins based on their phylogenetic distances. It has also been observed in some cases that pairs of interacting proteins have fused orthologues in other organisms. In addition, a number of bound protein complexes have been structurally solved and can be used to identify the residues that mediate the interaction, so that similar motifs can be located in other organisms.

Phylogenetic profiling

Phylogenetic profiling[5] finds pairs of protein families with similar patterns of presence or absence across large numbers of species. This method identifies pairs likely to act in the same biological process, but a shared profile does not necessarily imply physical interaction.
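
The comparison of presence/absence patterns can be illustrated with a minimal sketch. The family names, the six-genome toy profiles, the Jaccard similarity measure and the 0.8 threshold are all assumptions made for illustration; the cited work does not prescribe them.

```python
from itertools import combinations

def jaccard(profile_a, profile_b):
    """Jaccard similarity between two binary presence/absence profiles."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0

def similar_profile_pairs(profiles, threshold=0.8):
    """Return (family_1, family_2, score) for pairs scoring at or above threshold."""
    hits = []
    for (name_a, prof_a), (name_b, prof_b) in combinations(sorted(profiles.items()), 2):
        score = jaccard(prof_a, prof_b)
        if score >= threshold:
            hits.append((name_a, name_b, score))
    return hits

# Toy data (hypothetical): each position is one genome, 1 = family present.
profiles = {
    "famA": [1, 1, 0, 1, 0, 1],
    "famB": [1, 1, 0, 1, 0, 1],
    "famC": [0, 1, 1, 0, 1, 0],
}
pairs = similar_profile_pairs(profiles)
print(pairs)
```

Here only famA and famB share a profile, so they are the only pair reported as candidates for a common biological process.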

Prediction of co-evolved protein pairs based on similar phylogenetic trees

This method[6] uses a sequence search tool such as BLAST to find homologues of each protein in a pair, then builds multiple sequence alignments with an alignment tool such as Clustal. From these alignments, a phylogenetic distance matrix is calculated for each protein in the hypothesized interacting pair. If the two matrices are sufficiently similar (as measured by their Pearson correlation coefficient), the proteins are deemed likely to interact.
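
The final comparison step can be sketched as follows, assuming the two distance matrices have already been computed over the same ordered set of species. The matrices are made-up toy values, and the function names are hypothetical rather than taken from any published tool.

```python
import math

def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / math.sqrt(var_x * var_y)

def tree_similarity(dist_a, dist_b):
    """Correlate the pairwise distances of two protein families."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))

# Toy matrices over the same four species, in the same order.
dist_a = [[0.0, 0.1, 0.4, 0.5],
          [0.1, 0.0, 0.3, 0.6],
          [0.4, 0.3, 0.0, 0.2],
          [0.5, 0.6, 0.2, 0.0]]
dist_b = [[2 * d for d in row] for row in dist_a]   # same tree shape, faster rate
dist_c = [[0.0, 0.5, 0.2, 0.1],
          [0.5, 0.0, 0.6, 0.3],
          [0.2, 0.6, 0.0, 0.4],
          [0.1, 0.3, 0.4, 0.0]]                     # unrelated tree shape

r_ab = tree_similarity(dist_a, dist_b)
r_ac = tree_similarity(dist_a, dist_c)
print(round(r_ab, 3), round(r_ac, 3))
```

Because the correlation is insensitive to a uniform scaling of the distances, families evolving at different overall rates can still score highly; the cut-off for "sufficiently similar" is left to the user in this sketch.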

Identification of homologous interacting pairs

This method[7] searches a database of known complex structures for a complex formed by homologues of the two query sequences. The domains are identified by sequence searches against domain databases such as Pfam using BLAST. If more than one complex of Pfam domains is identified, the query sequences are aligned, using the hidden Markov model tool HMMER, to the closest identified homologues whose structures are known. The alignments are then analysed to check whether the contact residues of the known complex are conserved.
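
The last step, checking whether contact residues are conserved, might look like the sketch below. It assumes a pairwise alignment is already available as two gapped strings and scores conservation by simple identity; both the toy sequences and the identity criterion are illustrative assumptions, and published tools may use more elaborate scoring.

```python
def contact_conservation(aligned_query, aligned_template, contact_positions):
    """Fraction of template contact residues that are identical in the query.

    contact_positions holds 0-based indices into the ungapped template sequence.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    contacts = set(contact_positions)
    template_index = -1
    seen = 0
    matched = 0
    for q_res, t_res in zip(aligned_query, aligned_template):
        if t_res == "-":
            continue                      # insertion in the query: no template residue
        template_index += 1
        if template_index in contacts:
            seen += 1
            if q_res == t_res:            # a gap in the query never matches
                matched += 1
    return matched / seen if seen else 0.0

# Toy alignment (hypothetical sequences); template contacts at K1, Y4, I5, K7.
aligned_template = "MKT-AYIAK"
aligned_query    = "MRTGAYLAK"
score = contact_conservation(aligned_query, aligned_template, [1, 4, 5, 7])
print(score)
```

In this example two of the four contact residues are preserved in the query, giving a conservation score of 0.5.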

Identification of structural patterns

This method[8][9] builds a library of known protein–protein interfaces from the PDB, where an interface is defined as a pair of polypeptide fragments separated by less than a distance threshold slightly larger than the van der Waals radii of the atoms involved. The interfaces in the library are then clustered by structural alignment, and redundant sequences are eliminated. Residues that occur at a given position with high frequency (generally >50%) are considered hotspots.[10] The library is then used to identify potential interactions between pairs of targets, provided that they have known structures (i.e. are present in the PDB).
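
The frequency criterion for hotspots can be sketched as below, assuming the interfaces in one cluster have already been structurally aligned and are represented as equal-length strings with "-" for gaps. The toy cluster and the function name are hypothetical.

```python
from collections import Counter

def conserved_positions(aligned_interfaces, min_frequency=0.5):
    """Map alignment position -> residue for columns dominated by one residue.

    A residue qualifies when it appears in more than min_frequency of the
    cluster members at that position.
    """
    n_members = len(aligned_interfaces)
    hotspots = {}
    for position, column in enumerate(zip(*aligned_interfaces)):
        counts = Counter(res for res in column if res != "-")
        if not counts:
            continue                      # column contains only gaps
        residue, count = counts.most_common(1)[0]
        if count / n_members > min_frequency:
            hotspots[position] = residue
    return hotspots

# Toy cluster of four aligned interface fragments (hypothetical).
cluster = ["LWYG", "LWFA", "IWYS", "LWY-"]
hotspots = conserved_positions(cluster)
print(hotspots)
```

The first three positions are dominated by a single residue and are flagged, while the variable fourth position is not.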

Bayesian network modelling

Bayesian methods[11] integrate data from a wide variety of sources, including both experimental results and prior computational predictions, and use these features to assess the likelihood that a particular potential protein interaction is a true positive result. These methods are useful because experimental procedures, particularly yeast two-hybrid experiments, are extremely noisy and produce many false positives, while the previously mentioned computational methods can provide only circumstantial evidence that a particular pair of proteins might interact.
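
One simple way to combine evidence of this kind is a naive Bayes calculation, in which each evidence source contributes a likelihood ratio and the sources are treated as conditionally independent. The sketch below assumes that independence; the prior odds and the likelihood ratios are invented numbers, not values from the cited study.

```python
import math

def posterior_probability(prior_odds, likelihood_ratios):
    """Combine independent evidence: posterior odds = prior odds * product of LRs."""
    log_odds = math.log(prior_odds) + sum(math.log(lr) for lr in likelihood_ratios)
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)

# Hypothetical inputs: a rare-event prior and three evidence sources.
# Each ratio is P(evidence | interacting) / P(evidence | not interacting),
# which in practice would be estimated from gold-standard positive and
# negative sets.
prior_odds = 1 / 600
evidence = {
    "yeast_two_hybrid_hit": 30.0,
    "coexpression": 5.0,
    "shared_function": 8.0,
}
probability = posterior_probability(prior_odds, evidence.values())
print(round(probability, 3))
```

No single source overcomes the low prior on its own here, but together the three raise the posterior odds to 2:1 in favour of interaction.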

3D template-based protein complex modelling

This method[12][13][14][15] makes use of known protein complex structures to predict, as well as structurally model, interactions between query protein sequences. The prediction process generally starts by employing a sequence-based method (e.g. Interolog) to search for protein complex structures that are homologous to the query sequences. These known complex structures are then used as templates to structurally model the interaction between the query sequences. This method has the advantage of not only inferring protein interactions but also suggesting models of how the proteins interact structurally, which can provide some insight into the atomic-level mechanism of the interaction. On the other hand, its ability to make a prediction is limited by the relatively small number of known protein complex structures.
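
The template search can be illustrated with a minimal sketch, assuming a homology search has already returned, for each query, a list of (complex, chain) hits. The complex identifiers below are made-up placeholders, and the function is hypothetical rather than part of any of the cited servers.

```python
def find_template_complexes(hits_a, hits_b):
    """Pair up homology hits that fall on different chains of the same complex.

    Each hit is a (complex_id, chain_id) tuple. Returns a sorted list of
    (complex_id, chain_for_a, chain_for_b) candidate templates.
    """
    templates = []
    for complex_a, chain_a in hits_a:
        for complex_b, chain_b in hits_b:
            if complex_a == complex_b and chain_a != chain_b:
                templates.append((complex_a, chain_a, chain_b))
    return sorted(templates)

# Hypothetical search results for two query proteins.
hits_a = [("cplx1", "A"), ("cplx2", "B")]
hits_b = [("cplx1", "B"), ("cplx3", "A"), ("cplx2", "B")]
templates = find_template_complexes(hits_a, hits_b)
print(templates)
```

Only cplx1 qualifies as a template: both queries have homologues in it on distinct chains, whereas in cplx2 they map to the same chain. An empty result reflects the limitation noted above, namely that no suitable complex structure is known.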

Supervised learning problem

The problem of PPI prediction can be framed as a supervised learning problem. In this paradigm, known protein interactions serve as labelled training examples for estimating a function that predicts whether two proteins interact, given data about them (e.g., expression levels of each gene in different experimental conditions, location information, phylogenetic profile, etc.).
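
As a minimal sketch of this framing, the example below fits a logistic regression classifier by gradient descent to a handful of protein pairs. The choice of classifier, the three per-pair features (expression correlation, shared cellular location, profile similarity), and every data value are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, learning_rate=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent."""
    n_features = len(features[0])
    n_samples = len(features)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias

def predict(weights, bias, x):
    """Estimated probability that the protein pair described by x interacts."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical training pairs: [expression correlation, same location, profile similarity]
features = [
    [0.9, 1, 0.8], [0.8, 1, 0.9], [0.7, 1, 0.7],   # known interacting pairs
    [0.1, 0, 0.2], [0.2, 0, 0.1], [0.3, 1, 0.2],   # assumed non-interacting pairs
]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(features, labels)
p_likely = predict(weights, bias, [0.85, 1, 0.85])
p_unlikely = predict(weights, bias, [0.15, 0, 0.15])
print(round(p_likely, 2), round(p_unlikely, 2))
```

In practice the choice of negative examples matters a great deal, since pairs that are merely untested are not the same as pairs known not to interact.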

Relationship to docking methods

The field of protein–protein interaction prediction is closely related to the field of protein–protein docking, which attempts to use geometric and steric considerations to fit two proteins of known structure into a bound complex. Docking is a useful mode of inquiry when both proteins in the pair have known structures and are known (or at least strongly suspected) to interact. Because so many proteins lack experimentally determined structures, however, sequence-based interaction prediction methods are especially useful in conjunction with experimental studies of an organism's interactome.

References

  1. Dandekar T., Snel B., Huynen M. and Bork P. (1998) "Conservation of gene order: a fingerprint of proteins that physically interact." Trends Biochem. Sci., 23, 324-328.
  2. Enright A.J., Iliopoulos I., Kyrpides N.C. and Ouzounis C.A. (1999) "Protein interaction maps for complete genomes based on gene fusion events." Nature, 402, 86-90.
  3. Marcotte E.M., Pellegrini M., Ng H.L., Rice D.W., Yeates T.O. and Eisenberg D. (1999) "Detecting protein function and protein-protein interactions from genome sequences." Science, 285, 751-753.
  4. Pazos F. and Valencia A. (2001) "Similarity of phylogenetic trees as indicator of protein-protein interaction." Protein Engineering, 14 (9), 609-614.
  5. Pellegrini M., Marcotte E.M., Thompson M.J., Eisenberg D. and Yeates T.O. (1999) "Assigning protein functions by comparative genome analysis: protein phylogenetic profiles." Proc. Natl. Acad. Sci. U.S.A., 96, 4285-4288.
  6. Tan S.H., Zhang Z. and Ng S.K. (2004) "ADVICE: Automated Detection and Validation of Interaction by Co-Evolution." Nucleic Acids Research, 32 (Web Server issue), W69-72.
  7. Aloy P. and Russell R.B. (2003) "InterPreTS: protein Interaction Prediction through Tertiary Structure." Bioinformatics, 19 (1), 161-162.
  8. Aytuna A.S., Keskin O. and Gursoy A. (2005) "Prediction of protein-protein interactions by combining structure and sequence conservation in protein interfaces." Bioinformatics, 21 (12), 2850-2855.
  9. Ogmen U., Keskin O., Aytuna A.S., Nussinov R. and Gursoy A. (2005) "PRISM: protein interactions by structural matching." Nucleic Acids Research, 33 (Web Server issue), W331-336.
  10. Keskin O., Ma B. and Nussinov R. (2004) "Hot regions in protein-protein interactions: The organization and contribution of structurally conserved hot spot residues." J. Mol. Biol., 345, 1281-1294.
  11. Jansen R., Yu H., Greenbaum D., Kluger Y., Krogan N.J., Chung S., Emili A., Snyder M., Greenblatt J.F. and Gerstein M. (2003) "A Bayesian networks approach for predicting protein-protein interactions from genomic data." Science, 302 (5644), 449-453.
  12. Aloy P. and Russell R.B. (2003) "InterPreTS: protein Interaction Prediction through Tertiary Structure." Bioinformatics, 19 (1), 161-162.
  13. Chen Y.C., Lo Y.S., Hsu W.C. and Yang J.M. (2007) "3D-partner: a web server to infer interacting partners and binding models." Nucleic Acids Research, 35 (Web Server issue), 561-7.
  14. Fukuhara N. and Kawabata T. (2008) "HOMCOS: a server to predict interacting protein pairs and interacting sites by homology modeling of complex structures." Nucleic Acids Research, 36 (S2), 185-.
  15. Kittichotirat W., Guerquin M., Bumgarner R.E. and Samudrala R. (2009) "Protinfo PPC: a web server for atomic level prediction of protein complexes." Nucleic Acids Research, 37 (Web Server issue), 519-25.

