Protein subcellular localization prediction


Protein subcellular localization prediction (or just protein localization prediction) involves the computational prediction of where a protein resides in a cell.

In general, a prediction method takes information about a protein, such as its amino acid sequence, as input and produces a predicted subcellular localization as output.
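
As a minimal sketch of this input/output contract, the Python below maps a sequence to a 20-dimensional amino acid composition vector and assigns the label of the nearest class centroid. The function names, the toy training sequences, and the nearest-centroid rule are illustrative assumptions, not the method of any tool discussed in this article; real predictors use far richer features and models.

```python
from collections import Counter
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """Return the fraction of each standard amino acid in the sequence."""
    counts = Counter(sequence.upper())
    # Non-standard symbols are ignored; "or 1" avoids division by zero.
    total = sum(counts[aa] for aa in AMINO_ACIDS) or 1
    return [counts[aa] / total for aa in AMINO_ACIDS]


def train_centroids(labelled_sequences):
    """Average the composition vectors of each localization class."""
    grouped = {}
    for sequence, label in labelled_sequences:
        grouped.setdefault(label, []).append(composition(sequence))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(sequence, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    features = composition(sequence)
    return min(centroids, key=lambda label: dist(features, centroids[label]))


# Made-up training data (hypothetical sequences, for illustration only).
TRAINING = [
    ("KKRKRSPKKKRKVEDKR", "nucleus"),
    ("PKKKRKVKRRSRKKEK", "nucleus"),
    ("LLVALLIAFLGVLLAVIL", "membrane"),
    ("IVLLAGFLLVIAALLGIV", "membrane"),
]
CENTROIDS = train_centroids(TRAINING)
print(predict("RKKRKSEKRKKPRK", CENTROIDS))
```

The same two-step shape (feature extraction, then classification) carries over when the composition vector is replaced by other features or the centroid rule by a trained model.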

Prediction of protein subcellular localization is an important component of bioinformatics-based prediction of protein function and genome annotation, and it can aid the identification of drug targets.


Most eukaryotic proteins are encoded in the nuclear genome and synthesized in the cytosol, but many need to be further sorted before they reach their final destination. In prokaryotes, proteins are synthesized in the cytoplasm, and some must be targeted to other locations such as the cell membrane or the extracellular environment. Proteins must be localized to the appropriate subcellular compartment to perform their function.

Experimentally determining the subcellular localization of a protein is a laborious and time-consuming task. Through the development of new approaches in computer science, coupled with growing datasets of proteins of known localization, computational tools can now provide fast and accurate localization predictions for many organisms. As a result, subcellular localization prediction has become one of the problems that bioinformatics addresses successfully.

Many protein subcellular localization prediction methods now exceed the accuracy of some high-throughput laboratory methods for the identification of protein subcellular localization.[1]

In particular, some predictors[2] have been developed to handle proteins that may exist in, or move between, two or more subcellular locations.
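
Handling such multi-location proteins amounts to multi-label prediction: each location is scored independently, and every location whose score clears a cutoff is reported. The sketch below assumes that per-location scores are already available; the function name, the scores, and the 0.5 threshold are hypothetical and are not taken from any specific predictor.

```python
def predict_locations(scores, threshold=0.5):
    """Return every location scoring at or above the threshold.

    Falls back to the single best-scoring location so that a protein
    is never left without a prediction.
    """
    if not scores:
        raise ValueError("scores must contain at least one location")
    selected = sorted(loc for loc, score in scores.items() if score >= threshold)
    if not selected:
        selected = [max(scores, key=scores.get)]
    return selected


# Hypothetical scores for a protein that shuttles between two compartments.
example_scores = {"nucleus": 0.81, "cytoplasm": 0.64, "mitochondrion": 0.07}
print(predict_locations(example_scores))
```

A single-label predictor corresponds to always taking only the top-scoring location, which is why it cannot represent proteins found in several compartments.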


Several computational tools for predicting the subcellular localization of a protein are publicly available, a few of which are listed below. Predictors can be specialized for proteins in different organisms. Some are specialized for eukaryotic proteins,[3] some for human proteins,[4] and some for plant proteins.[5]

Methods for predicting bacterial protein localization, and their accuracy, have been reviewed.[6]

The development of protein subcellular location prediction has been summarized in two comprehensive review articles.[7][8]


Protein localization prediction software
| Name | Description | References |
| --- | --- | --- |
| APSLAP | Prediction of the subcellular localization of apoptosis proteins. | [9] |
| Cell-PLoc | A package of web servers for predicting subcellular localization of proteins in various organisms. | [2] |
| BaCelLo | Prediction of eukaryotic protein subcellular localization. Unlike other methods, the predictions are balanced among the different classes, and all predicted localizations are considered equiprobable, to avoid mispredictions. | [10] |
| CELLO | Uses a two-level support vector machine system to assign localizations to both prokaryotic and eukaryotic proteins. | [11][12] |
| DualPred | Web server for predicting plant proteins dual-targeted to chloroplast and mitochondria. | [13] |
| ClubSub-P | A database of cluster-based subcellular localization (SCL) predictions for Archaea and Gram-negative bacteria. | [14] |
| Euk-mPLoc 2.0 | Prediction of the subcellular localization of eukaryotic proteins with both single and multiple sites. | [15] |
| CoBaltDB | A platform that provides access to the results of multiple localization tools and support for predicting prokaryotic protein localizations. | [16] |
| HSLpred | Predicts the subcellular localization of human proteins by combining composition-based SVM models with PSI-BLAST similarity searches. | [17] |
| KnowPredsite | A knowledge-based approach to predict the localization site(s) of both single-localized and multi-localized proteins for all eukaryotes. | [18] |
| LOCtree | Prediction based on mimicking the cellular sorting mechanism using a hierarchical implementation of support vector machines. LOCtree is a comprehensive predictor incorporating predictions based on PROSITE/Pfam signatures as well as SwissProt keywords. | [19] |
| LocTree2/3 | Subcellular localization prediction for all proteins in all domains of life. Predicts 3 classes for Archaea, 6 for Bacteria and 18 for Eukaryota. | [20][21] |
| MultiLoc | An SVM-based prediction engine for a wide range of subcellular locations. | [22] |
| PSORT | The first widely used method for protein subcellular localization prediction, developed under the leadership of Kenta Nakai. Researchers are now encouraged to also use other PSORT programs, such as WoLF PSORT and PSORTb, for certain types of organisms (see below). PSORT's prediction performance is lower than that of more recently developed predictors. | [23] |
| PSORTb | Prediction of bacterial protein localization. | [24][25] |
| MetaLocGramN | Meta-predictor of subcellular localization for proteins of Gram-negative bacteria. It is a gateway to a number of primary prediction methods of various types: signal peptide, beta-barrel, transmembrane helix and subcellular localization predictors. In the authors' benchmark, MetaLocGramN outperformed other SCL prediction methods, reaching an average Matthews correlation coefficient of 0.806, a 12% improvement over PSORTb3. MetaLocGramN can be run via SOAP. | [26] |
| PredictNLS | Prediction of nuclear localization signals. | [27] |
| Proteome Analyst | Prediction of protein localization for both prokaryotes and eukaryotes using a text mining approach. | [28] |
| SCLpred | Protein subcellular localization prediction by N-to-1 neural networks. | [29] |
| SecretomeP | Prediction of eukaryotic proteins that are secreted via a non-traditional secretory mechanism. | [30] |
| SherLoc | An SVM-based predictor combining MultiLoc with text-based features derived from PubMed abstracts. | [31] |
| SCLAP | An adaptive boosting method for predicting subchloroplast localization of plant proteins. | [32] |
| TargetP | Prediction of N-terminal sorting signals. | [33] |
| TMHMM | Prediction of transmembrane helices to identify transmembrane proteins. | |
| WoLF PSORT | An updated version of PSORT/PSORT II for the prediction of eukaryotic sequences. | [34] |
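
The Matthews correlation coefficient (MCC) quoted in the MetaLocGramN entry can be computed for each location from a binary confusion matrix and then averaged over locations. The sketch below shows that calculation on made-up labels; the helper names and the example data are assumptions, and the exact averaging scheme used in the cited benchmark is not specified in this article.

```python
from math import sqrt


def mcc(true_labels, predicted_labels, positive):
    """Binary MCC for one location, treating all other locations as negative."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is taken as 0 when the denominator is zero.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def average_mcc(true_labels, predicted_labels):
    """Unweighted mean of the per-location MCC values."""
    locations = sorted(set(true_labels))
    return sum(mcc(true_labels, predicted_labels, loc) for loc in locations) / len(locations)


# Hypothetical benchmark: six proteins, one periplasmic protein mispredicted.
truth = ["cytoplasm", "cytoplasm", "periplasm", "periplasm",
         "outer membrane", "outer membrane"]
guess = ["cytoplasm", "cytoplasm", "periplasm", "cytoplasm",
         "outer membrane", "outer membrane"]
print(round(average_mcc(truth, guess), 3))
```

Unlike plain accuracy, the MCC uses all four cells of the confusion matrix, so it stays informative when some locations are much rarer than others.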


Determining subcellular localization is important for understanding protein function and is a critical step in genome annotation.

Knowledge of the subcellular localization of a protein can significantly improve target identification during the drug discovery process. For example, secreted proteins and plasma membrane proteins are easily accessible by drug molecules due to their localization in the extracellular space or on the cell surface.

Bacterial cell surface and secreted proteins are also of interest for their potential as vaccine candidates or as diagnostic targets.

Aberrant subcellular localization of proteins has been observed in cells affected by several diseases, such as cancer and Alzheimer's disease.

Secreted proteins from some archaea that can survive in unusual environments have industrially important applications.


Curated protein subcellular locations can be searched in UniProtKB. Several databases of computationally predicted protein subcellular locations have been constructed by Min's group at Youngstown State University, including the fungal secretome and subcellular proteome knowledgebase (FunSecKB2), the plant secretome and subcellular proteome knowledgebase (PlantSecKB), MetazSecKB for humans and other animals, and ProtSecKB for protists. A few others also exist, such as the lactic acid bacterial secretome database. Although computational predictions contain some inaccuracies, these databases are useful resources for further characterizing protein subcellular locations.


  1. Rey S, Gardy JL, Brinkman FS (2005). "Assessing the precision of high-throughput computational and laboratory approaches for the genome-wide identification of protein subcellular localization in bacteria". BMC Genomics. 6: 162. doi:10.1186/1471-2164-6-162. PMC 1314894. PMID 16288665.
  2. Chou KC, Shen HB (2008). "Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms". Nature Protocols. 3 (2): 153–62. doi:10.1038/nprot.2007.494. PMID 18274516.
  3. Chou KC, Wu ZC, Xiao X (2011). "iLoc-Euk: a multi-label classifier for predicting the subcellular localization of singleplex and multiplex eukaryotic proteins". PLOS ONE. 6 (3): e18258. doi:10.1371/journal.pone.0018258. PMC 3068162. PMID 21483473.
  4. Shen HB, Chou KC (Nov 2009). "A top-down approach to enhance the power of predicting human protein subcellular localization: Hum-mPLoc 2.0". Analytical Biochemistry. 394 (2): 269–74. doi:10.1016/j.ab.2009.07.046. PMID 19651102.
  5. Chou KC, Shen HB (2010). "Plant-mPLoc: a top-down strategy to augment the power for predicting plant protein subcellular localization". PLOS ONE. 5 (6): e11335. doi:10.1371/journal.pone.0011335. PMC 2893129. PMID 20596258.
  6. Gardy JL, Brinkman FS (Oct 2006). "Methods for predicting bacterial protein subcellular localization". Nature Reviews. Microbiology. 4 (10): 741–51. doi:10.1038/nrmicro1494. PMID 16964270.
  7. Nakai K (2000). "Protein sorting signals and prediction of subcellular localization". Advances in Protein Chemistry. 54: 277–344. doi:10.1016/s0065-3233(00)54009-1. PMID 10829231.
  8. Chou KC, Shen HB (Nov 2007). "Recent progress in protein subcellular location prediction". Analytical Biochemistry. 370 (1): 1–16. doi:10.1016/j.ab.2007.07.006. PMID 17698024.
  9. Saravanan V, Lakshmi PT (Dec 2013). "APSLAP: an adaptive boosting technique for predicting subcellular localization of apoptosis protein". Acta Biotheoretica. 61 (4): 481–97. doi:10.1007/s10441-013-9197-1. PMID 23982307.
  10. Pierleoni A, Martelli PL, Fariselli P, Casadio R (Jul 2006). "BaCelLo: a balanced subcellular localization predictor". Bioinformatics. 22 (14): e408–16. doi:10.1093/bioinformatics/btl222. PMID 16873501.
  11. Yu CS, Lin CJ, Hwang JK (May 2004). "Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions". Protein Science. 13 (5): 1402–6. doi:10.1110/ps.03479604. PMC 2286765. PMID 15096640.
  12. Yu CS, Chen YC, Lu CH, Hwang JK (Aug 2006). "Prediction of protein subcellular localization". Proteins. 64 (3): 643–51. doi:10.1002/prot.21018. PMID 16752418.
  13. Vijayakumar S (2015). "Dualpred: A Webserver for Predicting Plant Proteins Dual-Targeted to Chloroplast and Mitochondria Using Split Protein-Relatedness-Measure Feature". Current Bioinformatics. 10 (3): 323–331. doi:10.2174/1574893609666140226000041.
  14. Paramasivam N, Linke D (2011). "ClubSub-P: Cluster-Based Subcellular Localization Prediction for Gram-Negative Bacteria and Archaea". Frontiers in Microbiology. 2: 218. doi:10.3389/fmicb.2011.00218. PMC 3210502. PMID 22073040.
  15. Chou KC, Shen HB (2010). "A new method for predicting the subcellular localization of eukaryotic proteins with both single and multiple sites: Euk-mPLoc 2.0". PLOS ONE. 5 (4): e9931. doi:10.1371/journal.pone.0009931. PMC 2848569. PMID 20368981.
  16. Goudenège D, Avner S, Lucchetti-Miganeh C, Barloy-Hubler F (2010). "CoBaltDB: Complete bacterial and archaeal orfeomes subcellular localization database and associated resources". BMC Microbiology. 10: 88. doi:10.1186/1471-2180-10-88. PMC 2850352. PMID 20331850.
  17. Garg A, Bhasin M, Raghava GP (Apr 2005). "Support vector machine-based method for subcellular localization of human proteins using amino acid compositions, their order, and similarity search". The Journal of Biological Chemistry. 280 (15): 14427–32. doi:10.1074/jbc.M411789200. PMID 15647269.
  18. Lin HN, Chen CT, Sung TY, Ho SY, Hsu WL (December 2009). "Protein subcellular localization prediction of eukaryotes using a knowledge-based approach". BMC Bioinformatics. 10 Suppl 15: S8. doi:10.1186/1471-2105-10-S15-S8. PMC 2788359. PMID 19958518.
  19. Nair R, Rost B (Apr 2005). "Mimicking cellular sorting improves prediction of subcellular localization". Journal of Molecular Biology. 348 (1): 85–100. doi:10.1016/j.jmb.2005.02.025. PMID 15808855.
  20. Goldberg T, Hamp T, Rost B (Sep 2012). "LocTree2 predicts localization for all domains of life". Bioinformatics. 28 (18): i458–i465. doi:10.1093/bioinformatics/bts390. PMC 3436817. PMID 22962467.
  21. Goldberg T, Hecht M, Hamp T, Karl T, Yachdav G, Ahmed N, Altermann U, Angerer P, Ansorge S, Balasz K, Bernhofer M, Betz A, Cizmadija L, Do KT, Gerke J, Greil R, Joerdens V, Hastreiter M, Hembach K, Herzog M, Kalemanov M, Kluge M, Meier A, Nasir H, Neumaier U, Prade V, Reeb J, Sorokoumov A, Troshani I, Vorberg S, Waldraff S, Zierer J, Nielsen H, Rost B (Jul 2014). "LocTree3 prediction of localization". Nucleic Acids Research. 42 (Web Server issue): W350–5. doi:10.1093/nar/gku396. PMID 24848019.
  22. Höglund A, Dönnes P, Blum T, Adolph HW, Kohlbacher O (May 2006). "MultiLoc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs and amino acid composition". Bioinformatics. 22 (10): 1158–65. doi:10.1093/bioinformatics/btl002. PMID 16428265.
  23. Nakai K, Kanehisa M (1991). "Expert system for predicting protein localization sites in gram-negative bacteria". Proteins. 11 (2): 95–110. doi:10.1002/prot.340110203. PMID 1946347.
  24. Gardy JL, Spencer C, Wang K, Ester M, Tusnády GE, Simon I, Hua S, deFays K, Lambert C, Nakai K, Brinkman FS (Jul 2003). "PSORT-B: Improving protein subcellular localization prediction for Gram-negative bacteria". Nucleic Acids Research. 31 (13): 3613–7. doi:10.1093/nar/gkg602. PMC 169008. PMID 12824378.
  25. Gardy JL, Laird MR, Chen F, Rey S, Walsh CJ, Ester M, Brinkman FS (Mar 2005). "PSORTb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis". Bioinformatics. 21 (5): 617–23. doi:10.1093/bioinformatics/bti057. PMID 15501914.
  26. Magnus M, Pawlowski M, Bujnicki JM (Dec 2012). "MetaLocGramN: A meta-predictor of protein subcellular localization for Gram-negative bacteria". Biochimica et Biophysica Acta. 1824 (12): 1425–33. doi:10.1016/j.bbapap.2012.05.018. PMID 22705560.
  27. Nair R, Carter P, Rost B (Jan 2003). "NLSdb: database of nuclear localization signals". Nucleic Acids Research. 31 (1): 397–9. doi:10.1093/nar/gkg001. PMC 165448. PMID 12520032.
  28. Lu Z, Szafron D, Greiner R, Lu P, Wishart DS, Poulin B, Anvik J, Macdonell C, Eisner R (Mar 2004). "Predicting subcellular localization of proteins using machine-learned classifiers". Bioinformatics. 20 (4): 547–56. doi:10.1093/bioinformatics/btg447. PMID 14990451.
  29. Mooney C, Wang YH, Pollastri G (Oct 2011). "SCLpred: protein subcellular localization prediction by N-to-1 neural networks". Bioinformatics. 27 (20): 2812–9. doi:10.1093/bioinformatics/btr494. PMID 21873639.
  30. Bendtsen JD, Jensen LJ, Blom N, Von Heijne G, Brunak S (Apr 2004). "Feature-based prediction of non-classical and leaderless protein secretion". Protein Engineering, Design & Selection. 17 (4): 349–56. doi:10.1093/protein/gzh037. PMID 15115854.
  31. Shatkay H, Höglund A, Brady S, Blum T, Dönnes P, Kohlbacher O (Jun 2007). "SherLoc: high-accuracy prediction of protein subcellular localization by integrating text and protein sequence data". Bioinformatics. 23 (11): 1410–7. doi:10.1093/bioinformatics/btm115. PMID 17392328.
  32. Saravanan V, Lakshmi PT (Feb 2013). "SCLAP: an adaptive boosting method for predicting subchloroplast localization of plant proteins". Omics. 17 (2): 106–15. doi:10.1089/omi.2012.0070. PMID 23289782.
  33. Emanuelsson O, Nielsen H, Brunak S, von Heijne G (Jul 2000). "Predicting subcellular localization of proteins based on their N-terminal amino acid sequence". Journal of Molecular Biology. 300 (4): 1005–16. doi:10.1006/jmbi.2000.3903. PMID 10891285.
  34. Horton P, Park KJ, Obayashi T, Fujita N, Harada H, Adams-Collier CJ, Nakai K (Jul 2007). "WoLF PSORT: protein localization predictor". Nucleic Acids Research. 35 (Web Server issue): W585–7. doi:10.1093/nar/gkm259. PMC 1933216. PMID 17517783.

Further reading

  • Bork P, Dandekar T, Diaz-Lazcoz Y, Eisenhaber F, Huynen M, Yuan Y (Nov 1998). "Predicting function: from genes to genomes and back". Journal of Molecular Biology. 283 (4): 707–25. doi:10.1006/jmbi.1998.2144. PMID 9790834. 
  • Nakai K (2000). "Protein sorting signals and prediction of subcellular localization". Advances in Protein Chemistry. 54: 277–344. doi:10.1016/s0065-3233(00)54009-1. PMID 10829231. 
  • Emanuelsson O (Dec 2002). "Predicting protein subcellular localisation from amino acid sequence information". Briefings in Bioinformatics. 3 (4): 361–76. doi:10.1093/bib/3.4.361. PMID 12511065. 
  • Schneider G, Fechner U (Jun 2004). "Advances in the prediction of protein targeting signals". Proteomics. 4 (6): 1571–80. doi:10.1002/pmic.200300786. PMID 15174127. 
  • Gardy JL, Brinkman FS (Oct 2006). "Methods for predicting bacterial protein subcellular localization". Nature Reviews. Microbiology. 4 (10): 741–51. doi:10.1038/nrmicro1494. PMID 16964270. 
  • Chou KC, Shen HB (Nov 2007). "Recent progress in protein subcellular location prediction". Analytical Biochemistry. 370 (1): 1–16. doi:10.1016/j.ab.2007.07.006. PMID 17698024. 
  • Lum G, Meinken J, Orr J, Frazier S, Min XJ (2014). "PlantSecKB: the plant secretome and subcellular proteome knowledgebase". Computational Molecular Biology. 4 (1): 1–17. doi:10.5376/cmb.2014.04.0001.