Proteome Analyst

Description: For predicting protein subcellular localizations
Data types: Data input: protein sequence in FASTA format. Data output: localization predictions in tab-delimited format.
Research center: University of Alberta
Laboratory: David S. Wishart
Primary citation: [1]
Release date: 2004
Data release: Last updated on 2014
Curation policy: Manually curated

Proteome Analyst (PA) is a freely available web server and online toolkit for predicting protein subcellular localization, that is, where in a cell a protein resides.[1][2] In proteomics, accurately predicting a protein's subcellular localization is an important step in the large-scale study of proteins. This computational task is known as protein subcellular localization prediction. Over the last decade, more than a dozen web servers and computer programs have been developed to address it, and Proteome Analyst is among the better-performing of these tools. Proteome Analyst makes predictions for both prokaryotic and eukaryotic proteins using a text-mining approach.[1][3] It was originally developed by the Proteome Analyst Research Group at the University of Alberta, was first released in March 2004, and was last updated in January 2014.

Input/Output and Method

Users submit requests to the Proteome Analyst web server by selecting an organism type and uploading a text file containing protein sequences in FASTA format. Proteome Analyst then uses BLAST to search the UniProt database for similar proteins that carry subcellular localization annotations. A machine-learned classifier analyzes the annotation text fields of the most similar proteins found in this search to make the final subcellular localization predictions. Users can view and download the results, or ask Proteome Analyst to explain its predictions.
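The first steps of the workflow described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Proteome Analyst's code: it parses FASTA input and turns the annotation text of similar proteins into a bag-of-words feature set. The BLAST search against UniProt is not reproduced, so the homolog annotations are passed in as a hypothetical pre-computed list, and the function names `parse_fasta` and `homolog_features` and the tokenization rule are illustrative choices.

```python
import re
from collections import Counter


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping record ID to sequence."""
    records = {}
    current_id = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header line closes the previous record, if any
            if current_id is not None:
                records[current_id] = "".join(chunks)
            # Use the first whitespace-delimited token after '>' as the ID
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            chunks = []
        else:
            if current_id is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def homolog_features(annotations):
    """Build a bag-of-words Counter from the annotation text of similar proteins.

    The tokenization (lowercase alphanumeric words) is a hypothetical choice
    for illustration, not Proteome Analyst's actual feature extraction.
    """
    features = Counter()
    for text in annotations:
        features.update(re.findall(r"[a-z0-9]+", text.lower()))
    return features


if __name__ == "__main__":
    query = ">query1 example protein\nMKTAYIAKQR\nQISFVKSHFS\n"
    print(parse_fasta(query))  # {'query1': 'MKTAYIAKQRQISFVKSHFS'}
    # Hypothetical annotations of the top hits; a real pipeline would obtain
    # these from a BLAST search against UniProt, which is not shown here.
    hits = ["Mitochondrion inner membrane", "Mitochondrion; transit peptide"]
    print(homolog_features(hits))
```

Multi-line sequences are joined per record, so the resulting feature counts can be fed to any text classifier.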


Proteome Analyst consists of more than 30,000 lines of Java code and can be deployed on a computer cluster to speed up processing by using multiple CPUs. The initial release used a naïve Bayes classifier to make its predictions, while the current version uses support vector machine (SVM) classifiers. Proteome Analyst supports subcellular localization predictions for five organism types: three eukaryotic (animal, plant and fungal) and two prokaryotic (Gram-positive and Gram-negative bacteria).
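The idea behind the original naïve Bayes approach can be illustrated with a minimal multinomial naïve Bayes text classifier over annotation-text tokens. This is a sketch only: the class name `NaiveBayesLocalizer`, the Laplace smoothing constant, the tokenization and the toy training examples are all hypothetical and not taken from Proteome Analyst, and the SVM classifiers used by the current version are not shown.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase alphanumeric tokens (hypothetical tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesLocalizer:
    """Multinomial naive Bayes over annotation-text tokens (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant
        self.class_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        """log P(label) + sum of log P(token | label), i.e. the log-posterior
        up to a constant shared by all labels."""
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # ignore tokens never seen during training
                score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical toy training data, not Proteome Analyst's training set
    texts = [
        "mitochondrion inner membrane transit peptide",
        "nucleus dna binding transcription",
        "secreted signal peptide extracellular",
    ]
    labels = ["mitochondrion", "nucleus", "extracellular"]
    model = NaiveBayesLocalizer().fit(texts, labels)
    print(model.predict("transcription factor nucleus"))  # nucleus
```

Working in log space avoids numerical underflow when many token probabilities are multiplied, and add-alpha smoothing keeps a token unseen for one class from zeroing out that class entirely.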

References


  1. Lu, Zhiyong; Duane Szafron; Russell Greiner; Paul Lu; David S. Wishart; Brett Poulin; John Anvik; Cam Macdonell; Roman Eisner (2004). "Predicting Subcellular Localization of Proteins using Machine-Learned Classifiers". Bioinformatics. 20 (4): 547–556. doi:10.1093/bioinformatics/btg447. PMID 14990451.
  2. Szafron, Duane; Paul Lu; Russell Greiner; David S. Wishart; Brett Poulin; Roman Eisner; Zhiyong Lu; John Anvik; Cam Macdonell; Alona Fyshe; David Meeuwis (2004). "Proteome Analyst: Custom Predictions with Explanations in a Web-based Tool for High-throughput Proteome Annotations". Nucleic Acids Res. 32 (Web Server issue): W365–W371. doi:10.1093/nar/gkh485. PMC 441623. PMID 15215412.
  3. Fyshe, Alona; Yifeng Liu; Duane Szafron; Russell Greiner; Paul Lu (2008). "Improving Subcellular Localization Prediction using Text Classification and the Gene Ontology". Bioinformatics. 24 (21): 2512–2517. doi:10.1093/bioinformatics/btn463. PMID 18728042.