Proteomics Identifications Database
PRIDE (PRoteomics IDEntifications database) is a public data repository for mass spectrometry (MS)-based proteomics data, maintained by the Proteomics Services Team at the European Bioinformatics Institute.
PRIDE stores three kinds of information: peptide and protein identifications derived from MS or MS/MS experiments, MS and MS/MS mass spectra as peak lists, and all associated metadata. Peptide sequences are captured as part of the identifications.
By September 2010, PRIDE contained more than 13,000 experiments, 4 million protein identifications, 20 million peptide identifications and more than 104 million spectra. A typical PRIDE dataset or project contains more than one experiment (each experiment corresponds to an accession number or MS run). As mass spectrometry is increasingly used to capture details of post-translational modifications, PRIDE also stores modification data for peptides that were chemically modified.
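The relationship between projects, experiments, spectra, identifications and modifications described above can be illustrated with a small data model. This is a minimal sketch only: the class names, field names and accessions (such as `EXP_1` and `PROT_A`) are hypothetical and are not taken from the PRIDE schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Spectrum:
    """An MS or MS/MS spectrum stored as a peak list of (m/z, intensity) pairs."""
    spectrum_id: int
    ms_level: int
    peaks: List[Tuple[float, float]]


@dataclass
class Modification:
    position: int  # 1-based residue index within the peptide (assumed convention)
    name: str


@dataclass
class PeptideIdentification:
    sequence: str      # the peptide sequence is part of the identification
    spectrum_id: int   # spectrum that supports this identification
    modifications: List[Modification] = field(default_factory=list)


@dataclass
class ProteinIdentification:
    accession: str
    peptides: List[PeptideIdentification] = field(default_factory=list)


@dataclass
class Experiment:
    accession: str            # hypothetical accession format
    metadata: Dict[str, str]  # free-form metadata for this sketch
    spectra: List[Spectrum] = field(default_factory=list)
    proteins: List[ProteinIdentification] = field(default_factory=list)


@dataclass
class Project:
    title: str
    experiments: List[Experiment] = field(default_factory=list)


def summarize(project: Project) -> Dict[str, int]:
    """Count experiments, identifications and spectra in one project."""
    counts = {"experiments": 0, "proteins": 0, "peptides": 0,
              "modified_peptides": 0, "spectra": 0}
    for experiment in project.experiments:
        counts["experiments"] += 1
        counts["spectra"] += len(experiment.spectra)
        for protein in experiment.proteins:
            counts["proteins"] += 1
            for peptide in protein.peptides:
                counts["peptides"] += 1
                if peptide.modifications:
                    counts["modified_peptides"] += 1
    return counts


# A made-up project with two experiments, used only to exercise the model.
demo_project = Project(
    title="Demo project",
    experiments=[
        Experiment(
            accession="EXP_1",
            metadata={"species": "human"},
            spectra=[Spectrum(1, 2, [(100.5, 20.0)]), Spectrum(2, 2, [(200.25, 35.5)])],
            proteins=[ProteinIdentification("PROT_A", [
                PeptideIdentification("PEPTIDEK", 1),
                PeptideIdentification("SAMPLER", 2, [Modification(1, "Phospho")]),
            ])],
        ),
        Experiment(
            accession="EXP_2",
            metadata={"species": "mouse"},
            spectra=[Spectrum(1, 2, [(150.0, 10.0)])],
            proteins=[ProteinIdentification("PROT_B", [PeptideIdentification("ELVISK", 1)])],
        ),
    ],
)

demo_counts = summarize(demo_project)
```

Nesting identifications inside experiments, and experiments inside a project, mirrors the description above, in which a dataset usually groups several experiments and each peptide identification carries its own sequence and any modifications.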
Originally designed by Lennart Martens in 2003 during a stay at the European Bioinformatics Institute as a Marie Curie fellow of the European Commission, PRIDE was established as a production service in 2005. Several other proteomics databases have been established over the past few years, such as GPMDB, PeptideAtlas, Proteinpedia and the NCBI Peptidome. Like the NCBI Peptidome, PRIDE is a true structured data repository: it stores the original experimental data as submitted by researchers and does not assume any editorial control over it. NCBI Peptidome has since been discontinued, and all Peptidome data has been transferred to PRIDE. In total, PRIDE contains data from about 60 species. The largest share comes from human samples, followed by the fruit fly Drosophila melanogaster and mouse.
Formats and the submission process
Since detailed proteomics data currently cannot be curated from the existing literature, PRIDE data comes solely from submissions by academic researchers.
PRIDE is a standards-compliant public repository, meaning that its own XML-based data exchange format for submissions, PRIDE XML, was built around the Proteomics Standards Initiative mzData standard for mass spectrometry. PRIDE is committed to implementing relevant new Proteomics Standards Initiative standards as soon as possible.
As there are many different types of mass spectrometry instruments and software formats currently on the market, wet-lab scientists without a strong bioinformatics background or informatics support had problems converting their data to PRIDE XML. PRIDE Converter was developed to tackle this situation. It is a tool, written in the Java programming language, that converts 15 different input mass spectrometry data formats into PRIDE XML via a wizard-like graphical user interface. It is freely available and is open source under the permissive Apache License.
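The general idea of converting a vendor- or software-specific peak list into an XML document can be sketched as follows. This is an illustration only and does not reproduce PRIDE Converter or the PRIDE XML schema: the two-column text input, the function `peak_list_to_xml` and the element names `spectrum` and `peak` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def peak_list_to_xml(text: str, spectrum_id: str) -> ET.Element:
    """Turn a whitespace-separated 'm/z intensity' peak list into an XML element.

    Blank lines and lines starting with '#' are skipped.
    """
    spectrum = ET.Element("spectrum", {"id": spectrum_id})
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        mz, intensity = line.split()
        # float() validates the numbers; str() writes them back as attributes.
        ET.SubElement(spectrum, "peak", {
            "mz": str(float(mz)),
            "intensity": str(float(intensity)),
        })
    return spectrum


# Made-up input in a hypothetical two-column text format.
SAMPLE_PEAK_LIST = """# m/z intensity
100.5 20
200.25 35.5
"""

element = peak_list_to_xml(SAMPLE_PEAK_LIST, "s1")
xml_text = ET.tostring(element, encoding="unicode")
```

A real converter would need one such reader per input format plus the metadata collected through the user interface. The sketch covers only the step of turning a peak list into XML.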
Browsing, searching & data mining PRIDE
Currently, data can be queried from PRIDE via the PRIDE web interface, and through the stand-alone Java client PRIDE Inspector.
Additionally, complex queries can be built with the PRIDE BioMart, which is based on BioMart, a query-oriented data management system. Key features of PRIDE are the extensive use of controlled vocabularies (CVs) and ontologies for flexible yet context-sensitive annotation of data, and the ability to perform intelligent queries on these annotations.
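One way such annotation-based queries can work is to expand a query term to all of its more specific child terms before matching, so that a search for a broad term also finds data annotated with narrower ones. The sketch below assumes a tiny made-up vocabulary. The term identifiers (`DEMO:0001` and so on), the experiment accessions and the function names are hypothetical, and the code does not describe how PRIDE or BioMart implement their queries.

```python
from typing import Dict, List, Set

# Hypothetical controlled vocabulary: child term -> parent term (is_a links).
IS_A: Dict[str, str] = {
    "DEMO:0002": "DEMO:0001",  # mammal    is_a organism
    "DEMO:0003": "DEMO:0002",  # human     is_a mammal
    "DEMO:0004": "DEMO:0002",  # mouse     is_a mammal
    "DEMO:0005": "DEMO:0001",  # fruit fly is_a organism
}

# Hypothetical experiments, each annotated with a set of CV terms.
ANNOTATIONS: Dict[str, Set[str]] = {
    "EXP_1": {"DEMO:0003"},
    "EXP_2": {"DEMO:0005"},
    "EXP_3": {"DEMO:0004"},
}


def descendants(term: str, parent_of: Dict[str, str]) -> Set[str]:
    """Return the term itself plus every term that reaches it via is_a links."""
    result = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in parent_of.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def query(annotations: Dict[str, Set[str]], term: str,
          parent_of: Dict[str, str]) -> List[str]:
    """Find experiments annotated with the term or any more specific term."""
    wanted = descendants(term, parent_of)
    return sorted(exp for exp, terms in annotations.items() if terms & wanted)


mammal_hits = query(ANNOTATIONS, "DEMO:0002", IS_A)
organism_hits = query(ANNOTATIONS, "DEMO:0001", IS_A)
```

Because annotations use vocabulary terms rather than free text, a query for the broader "mammal" term in this sketch matches both the human and the mouse experiment without either being labeled "mammal" directly.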