Phyre / Phyre2

From Wikipedia, the free encyclopedia
Phyre2
Developer(s)
  • Lawrence Kelley
  • Bob MacCallum
  • Benjamin Jefferys
  • Alex Herbert
  • Riccardo Bennett-Lovsey
  • Michael Sternberg
Stable release 2.0 / 23 February 2011
Written in
Available in English
Type Bioinformatics tool for protein structure prediction
License Creative Commons Attribution-NonCommercial-2.0
Website www.sbg.bio.ic.ac.uk/phyre2

Phyre and Phyre2 (Protein Homology/AnalogY Recognition Engine; pronounced as 'fire') are web-based services for protein structure prediction that are free for non-commercial use.[1][2][3] Phyre is among the most popular methods for protein structure prediction, having been cited over 1500 times.[4] Like other remote homology recognition techniques (see protein threading), it can regularly generate reliable protein models when other widely used methods such as PSI-BLAST cannot. Phyre2 was designed (with funding from the BBSRC)[citation needed] to provide a user-friendly interface for people who are not experts in protein structure prediction methods.

Description

The Phyre and Phyre2 servers predict the three-dimensional structure of a protein sequence using the principles and techniques of homology modeling. Because the structure of a protein is more conserved in evolution than its amino acid sequence, a protein sequence of interest (the target) can be modeled with reasonable accuracy on a very distantly related sequence of known structure (the template), provided that the relationship between target and template can be discerned through sequence alignment. Currently the most powerful and accurate methods for detecting and aligning remotely related sequences rely on profiles or hidden Markov models (HMMs). These profiles/HMMs capture the mutational propensity of each position in an amino acid sequence based on observed mutations in related sequences and can be thought of as an 'evolutionary fingerprint' of a particular protein.
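The idea of a profile as an "evolutionary fingerprint" can be illustrated with a minimal Python sketch that turns a small multiple sequence alignment into a position-specific scoring matrix of log-odds scores. It assumes a uniform background frequency of 1/20 per amino acid and a flat additive pseudocount; PSI-BLAST and HMM builders use empirical background frequencies, sequence weighting and more elaborate pseudocount schemes, so this is not the procedure Phyre itself uses. The function name `build_pssm` is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Turn a multiple sequence alignment into per-column log-odds scores.

    alignment: equal-length aligned sequences, with '-' marking gaps.
    Assumption: uniform background frequency (1/20) and a flat pseudocount;
    real profile builders use empirical backgrounds and sequence weighting.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        # Count residues observed in this column, ignoring gaps.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        # log2(observed frequency / background): positive means "favoured here".
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


msa = ["ACDW", "ACEW", "GCDW"]
pssm = build_pssm(msa)
print(round(pssm[1]["C"], 2), round(pssm[1]["A"], 2))
```

Columns where related sequences agree (the fully conserved C and W columns in the toy alignment) give a strongly positive score to the conserved residue, while residues never seen in a column score below zero.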

Typically, the amino acid sequences of a representative set of all known three-dimensional protein structures are compiled, and these sequences are processed by scanning them against a large protein sequence database. The result is a database of profiles or HMMs, one for each known 3D structure. A user sequence of interest is processed in the same way to form a profile/HMM. This user profile is then scanned against the database of profiles using profile-profile or HMM-HMM alignment techniques. These alignments can also take into account patterns of predicted or known secondary structure elements and can be scored using various statistical models. See protein structure prediction for more information.
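Scanning a user profile against a library of template profiles can be sketched as a local (Smith-Waterman) alignment in which each cell compares two profile columns rather than two residues. The sketch below is a generic illustration, not Phyre's or HHsearch's scoring: it assumes a log-odds column score against a uniform background, an assumed score shift of -0.1, a linear gap penalty of 2.0, and it ignores secondary structure. `smoothed_profile`, `scan_library` and the template names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def smoothed_profile(sequence, match=0.81):
    """Hypothetical helper: a crude profile with one column per residue.

    The observed residue gets probability `match`; the remainder is spread
    evenly over the other 19 amino acids.
    """
    other = (1.0 - match) / (len(AMINO_ACIDS) - 1)
    return [[match if aa == res else other for aa in AMINO_ACIDS] for res in sequence]


def column_score(p, q, shift=-0.1):
    """Log-odds similarity of two profile columns against a uniform background.

    The small negative shift (an assumed value) makes unrelated columns
    score below zero so that local alignments do not run on indefinitely.
    """
    overlap = sum(a * b for a, b in zip(p, q))
    return math.log2(max(len(AMINO_ACIDS) * overlap, 1e-12)) + shift


def local_alignment_score(query, template, gap=2.0):
    """Smith-Waterman local alignment of two profiles with a linear gap penalty."""
    prev = [0.0] * (len(template) + 1)
    best = 0.0
    for i in range(1, len(query) + 1):
        row = [0.0] * (len(template) + 1)
        for j in range(1, len(template) + 1):
            row[j] = max(0.0,
                         prev[j - 1] + column_score(query[i - 1], template[j - 1]),
                         prev[j] - gap,      # gap in the template
                         row[j - 1] - gap)   # gap in the query
            best = max(best, row[j])
        prev = row
    return best


def scan_library(query, library):
    """Score a query profile against every template profile, best first."""
    return sorted(((local_alignment_score(query, prof), name)
                   for name, prof in library.items()), reverse=True)


library = {
    "template_A": smoothed_profile("MKVLAAGIVG"),
    "template_B": smoothed_profile("WWPPHHWWPP"),
}
print(scan_library(smoothed_profile("KVLAAGI"), library))
```

Because the alignment is local, a template that contains the query region inside a longer sequence scores as well as an exact match, which is the behaviour wanted when a user domain matches only part of a template.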

The first Phyre server was released in June 2005 and uses a profile-profile alignment algorithm based on each protein's position-specific scoring matrix.[5] The Phyre2 server was publicly released in February 2011 as a replacement for the original Phyre server. Among other improvements, it provides extra functionality over Phyre, a more advanced interface and a fully updated fold library, and it uses the HHpred/HHsearch package for homology detection.

Standard usage

After pasting a protein amino acid sequence into the Phyre or Phyre2 submission form, a user will typically wait between 30 minutes and several hours (depending on factors such as sequence length, number of homologous sequences, and frequency and length of insertions and deletions) for a prediction to complete. An email containing summary information and the predicted structure in PDB format is sent to the user, together with a link to a web page of results. The Phyre2 results screen is divided into three main sections, described below.

Secondary structure and disorder prediction

Example Phyre2 output for secondary structure and disorder prediction

The user-submitted protein sequence is first scanned against a large sequence database using PSI-BLAST. The profile generated by PSI-BLAST is then processed by the neural network secondary structure prediction program PSIPRED[6] and the protein disorder predictor DISOPRED.[7] The predicted alpha-helices, beta-strands and disordered regions are shown graphically together with a color-coded confidence bar.
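The graphical summary groups consecutive residues with the same predicted state into helices and strands. The following sketch shows that grouping step only. It assumes the prediction is already available as one string of per-residue states (H for helix, E for strand, C for coil) plus a parallel string of 0-9 confidence digits; how such strings are extracted from PSIPRED output is not shown, and the function name and minimum segment length are hypothetical.

```python
from itertools import groupby


def ss_segments(states, confidence, min_length=3):
    """Collapse per-residue predictions into helix/strand segments.

    states: one character per residue, 'H' (helix), 'E' (strand) or 'C' (coil).
    confidence: a parallel string of digits 0-9 (9 = most confident).
    Returns (state, first, last, mean_confidence) with 1-based inclusive positions.
    """
    if len(states) != len(confidence):
        raise ValueError("states and confidence must have the same length")
    segments = []
    pos = 0
    for state, run in groupby(states):
        length = sum(1 for _ in run)
        if state in "HE" and length >= min_length:
            digits = [int(c) for c in confidence[pos:pos + length]]
            segments.append((state, pos + 1, pos + length, sum(digits) / length))
        pos += length
    return segments


print(ss_segments("CCHHHHHCCEEEEC", "98899987667898"))
```

The mean confidence per segment is one simple way to summarise a color-coded confidence bar; a display could equally keep the per-residue values.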

Domain analysis

Example Phyre2 output showing multiple domains and pop-up model viewer

Many proteins contain multiple protein domains. Phyre2 provides a table of template matches color-coded by confidence and indicating the region of the user sequence matched. This can aid in the determination of the domain composition of a protein.
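One way a table of template matches can hint at domain composition is to keep the most confident hits that cover different regions of the user sequence. The sketch below is a hypothetical greedy heuristic for illustration, not the analysis Phyre2 performs: the 90% confidence cut-off, the 10-residue overlap tolerance and the template identifiers are all assumptions.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    template: str       # hypothetical template identifier
    start: int          # first matched residue of the user sequence (1-based)
    end: int            # last matched residue (inclusive)
    confidence: float   # percent, 0-100


def overlap(a: Hit, b: Hit) -> int:
    """Number of user-sequence residues covered by both hits."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def suggest_domains(hits: List[Hit], min_confidence: float = 90.0,
                    max_overlap: int = 10) -> List[Hit]:
    """Greedily keep the most confident hits that cover distinct regions."""
    chosen: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.confidence, reverse=True):
        if hit.confidence < min_confidence:
            break  # remaining hits are even less confident
        if all(overlap(hit, kept) <= max_overlap for kept in chosen):
            chosen.append(hit)
    return sorted(chosen, key=lambda h: h.start)


hits = [Hit("t1", 1, 120, 99.8), Hit("t2", 5, 118, 97.0),
        Hit("t3", 130, 250, 95.2), Hit("t4", 100, 260, 60.0)]
for h in suggest_domains(hits):
    print(h.template, h.start, h.end)
```

Here the two confident hits on residues 1-120 and 130-250 would suggest a two-domain protein; the redundant hit t2 and the low-confidence hit t4 are discarded.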

Detailed template information

Example Phyre2 detailed template information table

The main results table in Phyre2 provides confidence estimates, images and links to the predicted three-dimensional models, and information derived from either the Structural Classification of Proteins (SCOP) database or the Protein Data Bank (PDB), depending on the source of the detected template. For each match, a link takes the user to a detailed view of the alignment between the user sequence and the sequence of known three-dimensional structure.

Alignment view

Example Phyre2 detailed view of the alignment between a user sequence and a known protein structure.

The detailed alignment view lets a user examine individual aligned residues and matches between predicted and known secondary structure elements, and toggle information on patterns of sequence conservation and secondary structure confidence. In addition, Jmol is used to provide interactive 3D viewing of the protein model.

Improvements in Phyre2

Phyre2 uses a fold library that is updated weekly as new structures are solved. It uses a more up-to-date interface and offers additional functionality over the Phyre server as described below.

Additional functionality

Batch processing

The batch processing feature permits users to submit more than one sequence to Phyre2 by uploading a file of sequences in FASTA format. By default, users have a limit of 100 sequences in a batch. This limit can be raised by contacting the administrator. Batch jobs are processed in the background on free computing power as it becomes available. Thus, batch jobs will often take longer than individually submitted jobs, but this is necessary to allow a fair distribution of computing resources to all Phyre2 users.
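Before uploading a batch, a user might check locally that the FASTA file parses cleanly and stays within the default 100-sequence limit. The sketch below is a hypothetical pre-check: `parse_fasta`, `check_batch` and `DEFAULT_BATCH_LIMIT` are illustrative names that mirror the limit described above and are not part of any Phyre2 interface.

```python
DEFAULT_BATCH_LIMIT = 100  # default per-batch limit described above


def parse_fasta(text):
    """Parse FASTA text into (header, sequence) pairs; multi-line sequences are joined."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_batch(text, limit=DEFAULT_BATCH_LIMIT):
    """Return the parsed records, or raise if the batch is empty or too large."""
    records = parse_fasta(text)
    if not records:
        raise ValueError("no sequences found")
    if len(records) > limit:
        raise ValueError(f"{len(records)} sequences exceed the batch limit of {limit}")
    return records


print(check_batch(">seq1 test\nMKT\nAYIA\n>seq2\nGGSS\n"))
```

A user whose limit has been raised by the administrator would simply pass a larger `limit`.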

One-to-one threading

One-to-one threading allows users to upload both a sequence to be modelled and the template on which to model it. Users sometimes have a protein sequence that they wish to model on a specific template of their choice. This may be, for example, because the template is a newly solved structure that is not yet in the Phyre2 database, or because additional biological information indicates that the chosen template would produce a more accurate model than the one(s) automatically chosen by Phyre2.

BackPhyre

Rather than predicting the 3D structure of a protein sequence, users often have a solved structure and want to determine whether a related structure exists in a genome of interest. In Phyre2, an uploaded protein structure can be converted into a hidden Markov model and then scanned against a set of genomes (more than 20 genomes as of March 2011). This functionality is called "BackPhyre" to indicate that Phyre2 is being used in reverse.

Phyrealarm

Sometimes Phyre2 cannot detect any confident matches to known structures. However, the fold library grows by about 40-100 new structures each week, so even if no suitable template exists this week, one may well appear in the coming weeks. Phyrealarm allows users to submit a protein sequence to be scanned automatically against the new entries added to the fold library each week. If a confident hit is detected, the user is notified automatically by email together with the results of the Phyre2 search. Users can also set the alignment coverage and match confidence required to trigger an email alert.

3DLigandSite

Phyre2 is coupled to the 3DLigandSite[8] server for protein binding site prediction. 3DLigandSite was one of the top-performing servers for binding site prediction in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments CASP8 and CASP9. Confident models produced by Phyre2 (confidence >90%) are automatically submitted to 3DLigandSite.

Transmembrane topology prediction

The program memsat_svm[9] is used to predict the presence and topology of any transmembrane helices present in the user protein sequence.
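memsat_svm uses support vector machines on evolutionary information and is not reproduced here. As a much simpler, swapped-in illustration of how candidate transmembrane helices can be flagged from sequence alone, the classic Kyte-Doolittle hydropathy scan averages residue hydropathy over 19-residue windows and reports stretches whose average reaches 1.6, the window and threshold commonly quoted for membrane-spanning segments. Unlike memsat_svm it does not predict topology (which side of the membrane each loop lies on), and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_regions(sequence, window=19, threshold=1.6):
    """Merge all windows whose mean hydropathy >= threshold into regions.

    Returns (first, last) residue positions, 1-based and inclusive.
    Unknown residue letters contribute 0.0 to the window mean.
    """
    seq = sequence.upper()
    regions = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in segment) / window
        if mean < threshold:
            continue
        first, last = start + 1, start + window
        if regions and first <= regions[-1][1] + 1:
            regions[-1] = (regions[-1][0], last)  # extend the current region
        else:
            regions.append((first, last))
    return regions


print(hydrophobic_regions("D" * 20 + "L" * 25 + "D" * 20))
```

The reported region is wider than the hydrophobic stretch itself because a window only needs a majority of hydrophobic residues to pass; dedicated predictors refine such boundaries.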

Multi-template modelling

Phyre2 permits users to choose 'Intensive' modelling from the main submission screen. This mode:

  • Examines the list of hits and applies heuristics in order to select templates that maximise sequence coverage and confidence.
  • Constructs models for each selected template.
  • Uses these models to provide pairwise distance constraints that are input to the ab initio and multi-template modelling tool Poing.[10]
  • Poing synthesises the user protein in the context of these distance constraints, modelled by springs. Regions for which there is no template information are modelled by the ab initio simplified physics model of Poing.
  • The complete model generated by Poing is combined with the original templates as input to MODELLER.
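The spring idea in the steps above can be sketched generically: each template-derived distance between residues i and j becomes a harmonic spring with energy k(d_ij - d0_ij)^2, and residue coordinates are moved downhill on the total energy. This is only an illustration under stated assumptions (force constant k = 1, plain gradient descent with step size 0.01, hypothetical function names); Poing's actual force field, its ab initio treatment of unconstrained regions and the final MODELLER step are not shown.

```python
import math


def spring_energy_and_grad(coords, restraints, k=1.0):
    """Harmonic restraint energy E = sum k * (d_ij - d0_ij)^2 and its gradient.

    coords: list of [x, y, z] positions, one per residue.
    restraints: list of (i, j, d0) target distances, e.g. taken from templates.
    """
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for i, j, d0 in restraints:
        diff = [coords[i][a] - coords[j][a] for a in range(3)]
        d = math.sqrt(sum(c * c for c in diff)) or 1e-9  # avoid dividing by zero
        stretch = d - d0
        energy += k * stretch ** 2
        coef = 2.0 * k * stretch / d  # dE/dd times dd/dx, folded together
        for a in range(3):
            grad[i][a] += coef * diff[a]
            grad[j][a] -= coef * diff[a]
    return energy, grad


def relax(coords, restraints, steps=2000, step_size=0.01, k=1.0):
    """Plain gradient descent on the spring energy; returns new coordinates."""
    coords = [list(c) for c in coords]
    for _ in range(steps):
        _, grad = spring_energy_and_grad(coords, restraints, k)
        for c, g in zip(coords, grad):
            for a in range(3):
                c[a] -= step_size * g[a]
    return coords


restraints = [(0, 1, 3.8), (1, 2, 3.8), (0, 2, 6.0)]
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]
model = relax(start, restraints, steps=5000)
print([round(math.dist(model[i], model[j]), 2) for i, j, _ in restraints])
```

Springs only constrain distances, so the relaxed model can be freely rotated or translated; overall position is fixed later when the result is combined with the templates.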

Applications and performance

Applications of Phyre and Phyre2 include protein structure prediction, function prediction, domain prediction, domain boundary prediction, evolutionary classification of proteins, guiding site-directed mutagenesis and solving protein crystal structures by molecular replacement. In the CASP8 blind protein structure prediction experiment, Phyre_de_novo (the predecessor of Phyre2) was ranked 4th out of 71 automatic structure prediction servers. In CASP9, Phyre2 was ranked 5th on all template-based modelling (TBM) targets and 2nd on the more difficult TBM/FM (free modelling) targets out of the 79 participating servers.

History

Phyre and Phyre2 are the successors to the 3D-PSSM[11] protein structure prediction system, which has over 1400 citations to date.[12] 3D-PSSM was designed and developed by Lawrence Kelley[13] and Bob MacCallum[14] in the Biomolecular Modelling Laboratory[15] at Cancer Research UK. Phyre and Phyre2 were developed by Lawrence Kelley in the Structural Bioinformatics Group,[16] Imperial College London. Components of the Phyre and Phyre2 systems were developed by Benjamin Jefferys,[17] Alex Herbert,[18] and Riccardo Bennett-Lovsey.[19] Research and development of both servers was supervised by Michael Sternberg.

References

  1. ^ Lawrence Kelley, Riccardo Bennett-Lovsey, Alex Herbert, Kieran Fleming. "Phyre: Protein Homology/analogY Recognition Engine". Structural Bioinformatics Group, Imperial College, London. Retrieved 22 April 2011. 
  2. ^ Lawrence Kelley, Benjamin Jefferys. "Phyre2: Protein Homology/analogY Recognition Engine V 2.0". Structural Bioinformatics Group, Imperial College, London. Retrieved 22 April 2011. 
  3. ^ Kelley, L. A.; Sternberg, M. J. E. (2009). "Protein structure prediction on the Web: A case study using the Phyre server". Nature Protocols 4 (3): 363. doi:10.1038/nprot.2009.2. 
  4. ^ Number of results returned from a search on Google Scholar.
  5. ^ Bennett-Lovsey, R. M.; Herbert, A. D.; Sternberg, M. J. E.; Kelley, L. A. (2007). "Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre". Proteins: Structure, Function, and Bioinformatics 70 (3): 611. doi:10.1002/prot.21688. 
  6. ^ McGuffin, L. J.; Bryson, K.; Jones, D. T. (2000). "The PSIPRED protein structure prediction server". Bioinformatics 16 (4): 404. doi:10.1093/bioinformatics/16.4.404. 
  7. ^ Jones, D. T.; Ward, J. J. (2003). "Prediction of disordered regions in proteins from position specific score matrices". Proteins: Structure, Function, and Genetics 53: 573. doi:10.1002/prot.10528. 
  8. ^ Wass, M. N.; Kelley, L. A.; Sternberg, M. J. E. (2010). "3DLigandSite: Predicting ligand-binding sites using similar structures". Nucleic Acids Research 38: W469. doi:10.1093/nar/gkq406. 
  9. ^ Jones, D. T. (2007). "Improving the accuracy of transmembrane protein topology prediction using evolutionary information". Bioinformatics 23 (5): 538. doi:10.1093/bioinformatics/btl677. 
  10. ^ Jefferys, B. R.; Kelley, L. A.; Sternberg, M. J. E. (2010). "Protein Folding Requires Crowd Control in a Simulated Cell". Journal of Molecular Biology 397 (5): 1329. doi:10.1016/j.jmb.2010.01.074. 
  11. ^ Kelley, L. A.; MacCallum, R. M.; Sternberg, M. J. E. (2000). "Enhanced genome annotation using structural profiles in the program 3D-PSSM". Journal of Molecular Biology 299 (2): 501. doi:10.1006/jmbi.2000.3741. 
  12. ^ Number of results returned from a search on Google Scholar.
  13. ^ Dr. Lawrence Kelley
  14. ^ Dr. Bob MacCallum
  15. ^ Biomolecular Modelling laboratory
  16. ^ Structural Bioinformatics Group
  17. ^ Dr. Benjamin Jefferys
  18. ^ Dr. Alex Herbert
  19. ^ Dr. Riccardo Bennett-Lovsey