Protein subcellular localization prediction

From Wikipedia, the free encyclopedia

Revision as of 12:31, 19 June 2012

Protein subcellular localization prediction involves the computational prediction of where a protein resides in a cell. Prediction of protein subcellular localization is an important component of bioinformatics-based prediction of protein function and genome annotation, and it can aid the identification of drug targets.

Most eukaryotic proteins are encoded in the nuclear genome and synthesized in the cytosol, but many need further sorting before they reach their final destination. In prokaryotes, proteins are synthesized in the cytoplasm, and some must be targeted to other locations, such as a cell membrane or the extracellular environment. Proteins must be localized to the appropriate subcellular compartment to perform their intended function.

Experimentally determining the subcellular localization of a protein is laborious and time-consuming. New approaches in computer science, together with growing datasets of proteins of known localization, now allow computational tools to provide fast and accurate localization predictions for many organisms. As a result, subcellular localization prediction has become one of the problems that bioinformatics addresses successfully. Many protein subcellular localization prediction methods now exceed the accuracy of some high-throughput laboratory methods for identifying protein subcellular localization.[1]
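Accuracy comparisons such as the one cited above are commonly reported as per-location precision and recall, computed against proteins whose localization has been determined experimentally. The following minimal Python sketch computes both from paired lists of true and predicted single-location labels. The function name and the labels in the usage example are invented for illustration and are not benchmark data.

```python
from collections import Counter


def per_class_precision_recall(true_labels, predicted_labels):
    """Return {location: (precision, recall)} for single-label predictions.

    precision = TP / (TP + FP), recall = TP / (TP + FN); a ratio whose
    denominator is zero is reported as 0.0.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted location was wrong
            fn[truth] += 1  # the true location was missed
    result = {}
    for loc in set(true_labels) | set(predicted_labels):
        p_den = tp[loc] + fp[loc]
        r_den = tp[loc] + fn[loc]
        precision = tp[loc] / p_den if p_den else 0.0
        recall = tp[loc] / r_den if r_den else 0.0
        result[loc] = (precision, recall)
    return result


if __name__ == "__main__":
    # Invented toy labels, not real benchmark data.
    truth = ["cytoplasm", "periplasm", "outer membrane", "cytoplasm"]
    pred = ["cytoplasm", "cytoplasm", "outer membrane", "cytoplasm"]
    for loc, (p, r) in sorted(per_class_precision_recall(truth, pred).items()):
        print(f"{loc}: precision={p:.2f} recall={r:.2f}")
```

In the toy example, the misassigned periplasmic protein lowers cytoplasm precision to 0.67 and gives periplasm a recall of 0.00.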

In particular, some recently developed predictors[2] can handle proteins that may exist simultaneously in, or move between, two or more subcellular locations.
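One simple way for a predictor to handle proteins with more than one location is to score every candidate location and report all locations whose score passes a cutoff, rather than only the top-scoring one. The sketch below shows only that decision step. The scores, location names and the 0.5 threshold are hypothetical, and this is not the scoring scheme of Cell-PLoc or any other cited tool.

```python
def predict_locations(scores, threshold=0.5):
    """Return every location whose score is at least `threshold`.

    If no location reaches the threshold, fall back to the single
    highest-scoring location so that each protein gets at least one label.
    Locations are returned in decreasing order of score.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    selected = [loc for loc, s in ranked if s >= threshold]
    return selected if selected else [ranked[0][0]]


if __name__ == "__main__":
    # Hypothetical scores for a protein that shuttles between two compartments.
    scores = {"nucleus": 0.81, "cytoplasm": 0.64, "mitochondrion": 0.07}
    print(predict_locations(scores))  # ['nucleus', 'cytoplasm']
```

The fallback to the best-scoring location is a design choice of this sketch; a real multi-label predictor may instead report "no confident location".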

Methods

Several computational tools for predicting the subcellular localization of a protein are publicly available, a few of which are listed below. The development of protein subcellular location prediction has been summarized in two comprehensive review articles.[3][4]

Predictors have also been specialized for proteins from different organisms: some for eukaryotic proteins,[5] some for human proteins,[6] and some for plant proteins.[7] Methods for predicting bacterial protein localization, and their accuracy, have recently been reviewed.[8]

  • Cell-PLoc: A package of web-servers for predicting subcellular localization of proteins in various organisms.[2]
  • BaCelLo: Prediction of eukaryotic protein subcellular localization. Unlike other methods, the predictions are balanced among different classes and all the localizations that are predicted are considered as equiprobable, to avoid mispredictions.[9]
  • CELLO: CELLO uses a two-level Support Vector Machine system to assign localizations to both prokaryotic and eukaryotic proteins.[10][11]
  • ClubSub-P: A database of cluster-based subcellular localization (SCL) predictions for Archaea and Gram-negative bacteria.[12]
  • Euk-mPLoc 2.0: Predicting the subcellular localization of eukaryotic proteins with both single and multiple sites.[13]
  • CoBaltDB: A platform that provides easy access to the results of multiple localization tools and supports the prediction of prokaryotic protein localization.[14]
  • HSLpred: Predicts the subcellular localization of human proteins by combining composition-based SVM models with PSI-BLAST similarity searches.[15]
  • KnowPredsite: A knowledge-based approach to predict the localization site(s) of both single-localized and multi-localized proteins for all eukaryotes.[16]
  • LOCtree: Prediction based on mimicking the cellular sorting mechanism using a hierarchical implementation of support vector machines. LOCtree is a comprehensive predictor incorporating predictions based on PROSITE/PFAM signatures as well as SwissProt keywords.[17]
  • MultiLoc: An SVM-based prediction engine for a wide range of subcellular locations.[18]
  • PSORT: The first widely used method for protein subcellular localization prediction, developed under the leadership of Kenta Nakai.[19] Researchers are now also encouraged to use other PSORT programs, such as WoLF PSORT and PSORTb, for predictions in particular types of organisms (see below). PSORT's prediction performance is lower than that of more recently developed predictors.
  • PSORTb: Prediction of bacterial protein localization.[20][21]
  • MetaLocGramN: A meta-predictor of subcellular localization for proteins of Gram-negative bacteria (submitted).[22]
  • PredictNLS: Prediction of nuclear localization signals.[23]
  • Proteome Analyst: Prediction of protein localization for both prokaryotes and eukaryotes using a text mining approach.[24]
  • SecretomeP: Prediction of eukaryotic proteins that are secreted via a non-traditional secretory mechanism.[25]
  • SherLoc: An SVM-based predictor combining MultiLoc with text-based features derived from PubMed abstracts.[26]
  • TargetP: Prediction of N-terminal sorting signals.[27]
  • WoLF PSORT: An updated version of PSORT/PSORT II for the prediction of eukaryotic sequences.[28]
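CELLO represents sequences by their n-peptide compositions, and HSLpred uses amino acid composition and residue order, before classifying with support vector machines. As a minimal sketch of that kind of feature, assuming the 20 standard amino acids and overlapping windows, the hypothetical function below turns a sequence into a normalized n-peptide composition vector (20 values for n=1, 400 for n=2). Skipping windows that contain non-standard letters is a choice made for this sketch; the exact encodings used by CELLO and HSLpred may differ.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def npeptide_composition(sequence, n=1):
    """Return the normalized frequency of every n-peptide, in a fixed order.

    The vector has 20**n entries (20 for n=1, 400 for n=2). Overlapping
    windows of length n are counted; windows containing a letter outside
    AMINO_ACIDS (e.g. X, B, U) are skipped.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    index = {k: i for i, k in enumerate(keys)}
    counts = [0] * len(keys)
    seq = sequence.upper()
    for start in range(len(seq) - n + 1):
        i = index.get(seq[start:start + n])
        if i is not None:
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(keys)


if __name__ == "__main__":
    seq = "MKKLLPTAAAGLLLLAAQPAMA"  # toy sequence for illustration
    aac = npeptide_composition(seq, n=1)
    dpc = npeptide_composition(seq, n=2)
    print(len(aac), len(dpc))  # 20 400
    print(round(sum(aac), 6))  # 1.0
```

Vectors like these would then be passed to a classifier such as an SVM; training one is outside the scope of this sketch.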
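TargetP predicts N-terminal sorting signals, and classical secretory signal peptides are known to contain a stretch of hydrophobic residues near the N-terminus. The sketch below is only a crude illustration of that idea and is not how TargetP works (TargetP relies on trained neural networks): it slides a window of Kyte-Doolittle hydropathy values over the first residues and flags sequences with a strongly hydrophobic stretch. The function names, the region length (30), the window size (8) and the cutoff (2.0) are arbitrary assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_nterminal_hydropathy(sequence, region=30, window=8):
    """Highest mean Kyte-Doolittle score over any `window`-residue stretch
    within the first `region` residues. Unknown letters score 0.0.
    Returns None if the N-terminal region is shorter than `window`.
    """
    head = sequence.upper()[:region]
    if len(head) < window:
        return None
    values = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in head]
    return max(sum(values[i:i + window]) / window
               for i in range(len(values) - window + 1))


def looks_like_signal_peptide(sequence, cutoff=2.0):
    """Crude flag: True if a strongly hydrophobic N-terminal stretch exists.
    The cutoff is an illustrative assumption, not a validated threshold.
    """
    score = max_nterminal_hydropathy(sequence)
    return score is not None and score >= cutoff


if __name__ == "__main__":
    # Invented sequences: a hydrophobic N-terminus and a charged one.
    print(looks_like_signal_peptide("M" + "L" * 12 + "DEKR" * 5))  # True
    print(looks_like_signal_peptide("M" + "DEKR" * 8))  # False
```

A single hydrophobicity rule also fires on N-terminal transmembrane segments, which is one reason dedicated tools use richer, trained models.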
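PredictNLS looks for nuclear localization signals (NLSs). A very simplified version of such a search is a regular-expression scan for the often-cited consensus of the classical monopartite NLS, K(K/R)X(K/R). This single pattern and the function name are assumptions made for illustration. PredictNLS draws on a collection of known signals (see the NLSdb reference), so its results will differ. The lookahead in the pattern lets overlapping matches be reported.

```python
import re

# Simplified consensus for a classical monopartite NLS: K, then K or R,
# then any residue, then K or R. This single pattern is an illustrative
# assumption; it is not the motif set used by PredictNLS.
MONOPARTITE_NLS = re.compile(r"(?=(K[KR].[KR]))")


def find_nls_candidates(sequence):
    """Return (1-based start, motif) for every match, overlaps included."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in MONOPARTITE_NLS.finditer(seq)]


if __name__ == "__main__":
    # Embeds PKKKRKV, the NLS of SV40 large T antigen, in an invented sequence.
    protein = "MAPKKKRKVEDP"
    print(find_nls_candidates(protein))  # [(4, 'KKKR'), (5, 'KKRK')]
```

Short basic motifs like this occur by chance in many non-nuclear proteins, so a pattern match is at most weak evidence of nuclear import.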
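MetaLocGramN is a meta-predictor, and CoBaltDB collects the output of several localization tools, so both start from multiple predictions for the same protein. The sketch below combines such predictions by weighted majority vote. This is a stand-in technique chosen for illustration; MetaLocGramN's actual combination scheme is not described here, and the function name, tool names, weights and locations are hypothetical.

```python
from collections import defaultdict


def consensus_location(predictions, weights=None):
    """Combine single-label predictions from several tools by weighted vote.

    predictions: {tool_name: predicted_location}; a tool that gave no
    prediction may map to None and is ignored.
    weights: optional {tool_name: weight}; missing tools default to 1.0.
    Returns (location, share of the total vote), or (None, 0.0) if no tool
    made a prediction. Ties are broken alphabetically for determinism.
    """
    weights = weights or {}
    votes = defaultdict(float)
    for tool, location in predictions.items():
        if location is not None:
            votes[location] += weights.get(tool, 1.0)
    if not votes:
        return None, 0.0
    total = sum(votes.values())
    best = min(votes, key=lambda loc: (-votes[loc], loc))
    return best, votes[best] / total


if __name__ == "__main__":
    # Hypothetical outputs of three predictors for one Gram-negative protein.
    calls = {"tool_a": "periplasm", "tool_b": "periplasm", "tool_c": "cytoplasm"}
    print(consensus_location(calls))
```

In the example, periplasm wins with two thirds of the vote; the returned share gives a rough measure of how strongly the tools agree.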

Application

Determining subcellular localization is important for understanding protein function and is a critical step in genome annotation.

Knowledge of the subcellular localization of a protein can significantly improve target identification during the drug discovery process. For example, secreted proteins and plasma membrane proteins are easily accessible by drug molecules due to their localization in the extracellular space or on the cell surface.

Bacterial cell surface and secreted proteins are also of interest for their potential as vaccine candidates or as diagnostic targets.

Aberrant subcellular localization of proteins has been observed in cells affected by several diseases, such as cancer and Alzheimer’s disease.

Secreted proteins from some archaea that can survive in unusual environments have important industrial applications.

References

  1. ^ Rey S, Gardy JL, Brinkman FS (2005). "Assessing the precision of high-throughput computational and laboratory approaches for the genome-wide identification of protein subcellular localization in bacteria". BMC Genomics. 6: 162. doi:10.1186/1471-2164-6-162. PMC 1314894. PMID 16288665.
  2. ^ a b Chou KC, Shen HB (2008). "Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms". Nat Protoc. 3 (2): 153–62. doi:10.1038/nprot.2007.494. PMID 18274516. Updated version (2010): "Cell-PLoc 2.0: An improved package of web-servers for predicting subcellular localization of proteins in various organisms". Natural Science. 2: 1090–1103.
  3. ^ Nakai K (2000). "Protein sorting signals and prediction of subcellular localization". Adv. Protein Chem. 54: 277–344. PMID 10829231.
  4. ^ Chou KC, Shen HB (2007). "Recent progress in protein subcellular location prediction". Anal. Biochem. 370 (1): 1–16. doi:10.1016/j.ab.2007.07.006. PMID 17698024.
  5. ^ Chou KC, Wu ZC, Xiao X (2011). "iLoc-Euk: A Multi-Label Classifier for Predicting the Subcellular Localization of Singleplex and Multiplex Eukaryotic Proteins". PLoS ONE. 6: e18258.
  6. ^ Shen HB, Chou KC (2009). "A top-down approach to enhance the power of predicting human protein subcellular localization: Hum-mPLoc 2.0". Anal. Biochem. 394 (2): 269–74. doi:10.1016/j.ab.2009.07.046. PMID 19651102.
  7. ^ Chou KC, Shen HB (2010). "Plant-mPLoc: a top-down strategy to augment the power for predicting plant protein subcellular localization". PLoS ONE. 5 (6): e11335. doi:10.1371/journal.pone.0011335. PMC 2893129. PMID 20596258.
  8. ^ Gardy JL, Brinkman FS (2006). "Methods for predicting bacterial protein subcellular localization". Nat. Rev. Microbiol. 4 (10): 741–51. doi:10.1038/nrmicro1494. PMID 16964270.
  9. ^ Pierleoni A, Martelli PL, Fariselli P, Casadio R (2006). "BaCelLo: a balanced subcellular localization predictor". Bioinformatics. 22 (14): e408–16. doi:10.1093/bioinformatics/btl222. PMID 16873501.
  10. ^ Yu CS, Lin CJ, Hwang JK (2004). "Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions". Protein Sci. 13 (5): 1402–6. doi:10.1110/ps.03479604. PMC 2286765. PMID 15096640.
  11. ^ Yu CS, Chen YC, Lu CH, Hwang JK (2006). "Prediction of protein subcellular localization". Proteins. 64 (3): 643–51. doi:10.1002/prot.21018. PMID 16752418.
  12. ^ Paramasivam N, Linke D (2011). "ClubSub-P is a database of cluster-based subcellular localization (SCL) predictions for Archaea and Gram negative bacteria". Frontiers in Microbiology. 2. doi:10.3389/fmicb.2011.00218. PMID 22073040.
  13. ^ Chou KC, Shen HB (2010). "A new method for predicting the subcellular localization of eukaryotic proteins with both single and multiple sites: Euk-mPLoc 2.0". PLoS ONE. 5 (4): e9931. doi:10.1371/journal.pone.0009931. PMC 2848569. PMID 20368981.
  14. ^ Goudenège D, Avner S, Lucchetti-Miganeh C, Barloy-Hubler F (2010). "CoBaltDB: Complete bacterial and archaeal orfeomes subcellular localization database and associated resources". BMC Microbiol. 10: 88. doi:10.1186/1471-2180-10-88. PMC 2850352. PMID 20331850.
  15. ^ Garg A, Bhasin M, Raghava GP (2005). "Support vector machine-based method for subcellular localization of human proteins using amino acid compositions, their order, and similarity search". J. Biol. Chem. 280 (15): 14427–32. doi:10.1074/jbc.M411789200. PMID 15647269.
  16. ^ Lin HN, Chen CT, Sung TY, Ho SY, Hsu WL (2009). "Protein subcellular localization prediction of eukaryotes using a knowledge-based approach". BMC Bioinformatics. 10 (Suppl 15): S8. doi:10.1186/1471-2105-10-S15-S8. PMID 19958518.
  17. ^ Nair R, Rost B (2005). "Mimicking cellular sorting improves prediction of subcellular localization". J. Mol. Biol. 348 (1): 85–100. doi:10.1016/j.jmb.2005.02.025. PMID 15808855.
  18. ^ Höglund A, Dönnes P, Blum T, Adolph HW, Kohlbacher O (2006). "MultiLoc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs and amino acid composition". Bioinformatics. 22 (10): 1158–65. doi:10.1093/bioinformatics/btl002. PMID 16428265.
  19. ^ Nakai K, Kanehisa M (1991). "Expert system for predicting protein localization sites in gram-negative bacteria". Proteins. 11 (2): 95–110. doi:10.1002/prot.340110203. PMID 1946347.
  20. ^ Gardy JL, Spencer C, Wang K, Ester M, Tusnády GE, Simon I, Hua S, deFays K, Lambert C, Nakai K, Brinkman FS (2003). "PSORT-B: Improving protein subcellular localization prediction for Gram-negative bacteria". Nucleic Acids Res. 31 (13): 3613–7. doi:10.1093/nar/gkg602. PMC 169008. PMID 12824378.
  21. ^ Gardy JL, Laird MR, Chen F, Rey S, Walsh CJ, Ester M, Brinkman FS (2005). "PSORTb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis". Bioinformatics. 21 (5): 617–23. doi:10.1093/bioinformatics/bti057. PMID 15501914.
  22. ^ Magnus M, Pawlowski M, Bujnicki JM (2012). "MetaLocGramN: a meta-predictor of protein subcellular localization for Gram-negative bacteria". BBA - Proteins and Proteomics.
  23. ^ Nair R, Carter P, Rost B (2003). "NLSdb: database of nuclear localization signals". Nucleic Acids Res. 31 (1): 397–9. doi:10.1093/nar/gkg001. PMC 165448. PMID 12520032.
  24. ^ Lu Z, Szafron D, Greiner R, Lu P, Wishart DS, Poulin B, Anvik J, Macdonell C, Eisner R (2004). "Predicting subcellular localization of proteins using machine-learned classifiers". Bioinformatics. 20 (4): 547–56. doi:10.1093/bioinformatics/bth026. PMID 14990451.
  25. ^ Bendtsen JD, Jensen LJ, Blom N, Von Heijne G, Brunak S (2004). "Feature-based prediction of non-classical and leaderless protein secretion". Protein Eng. Des. Sel. 17 (4): 349–56. doi:10.1093/protein/gzh037. PMID 15115854.
  26. ^ Shatkay H, Höglund A, Brady S, Blum T, Dönnes P, Kohlbacher O (2007). "SherLoc: high-accuracy prediction of protein subcellular localization by integrating text and protein sequence data". Bioinformatics. 23 (11): 1410–7. doi:10.1093/bioinformatics/btm115. PMID 17392328.
  27. ^ Emanuelsson O, Nielsen H, Brunak S, von Heijne G (2000). "Predicting subcellular localization of proteins based on their N-terminal amino acid sequence". J. Mol. Biol. 300 (4): 1005–16. doi:10.1006/jmbi.2000.3903. PMID 10891285.
  28. ^ Horton P, Park KJ, Obayashi T, Fujita N, Harada H, Adams-Collier CJ, Nakai K (2007). "WoLF PSORT: protein localization predictor". Nucleic Acids Res. 35 (Web Server issue): W585–7. doi:10.1093/nar/gkm259. PMC 1933216. PMID 17517783.

Further reading

  • Bork P, Dandekar T, Diaz-Lazcoz Y, Eisenhaber F, Huynen M, Yuan Y (1998). "Predicting function: from genes to genomes and back". J. Mol. Biol. 283 (4): 707–25. doi:10.1006/jmbi.1998.2144. PMID 9790834.
  • Nakai K (2000). "Protein sorting signals and prediction of subcellular localization". Adv. Protein Chem. 54: 277–344. PMID 10829231.
  • Emanuelsson O (2002). "Predicting protein subcellular localisation from amino acid sequence information". Brief. Bioinformatics. 3 (4): 361–76. PMID 12511065.
  • Schneider G, Fechner U (2004). "Advances in the prediction of protein targeting signals". Proteomics. 4 (6): 1571–80. doi:10.1002/pmic.200300786. PMID 15174127.
  • Gardy JL, Brinkman FS (2006). "Methods for predicting bacterial protein subcellular localization". Nat. Rev. Microbiol. 4 (10): 741–51. doi:10.1038/nrmicro1494. PMID 16964270.
  • Chou KC, Shen HB (2007). "Recent progress in protein subcellular location prediction". Anal. Biochem. 370 (1): 1–16. doi:10.1016/j.ab.2007.07.006. PMID 17698024.

External links