A biocurator is a professional scientist who curates, collects, annotates, and validates information that is disseminated by biological and model organism databases. The role of a biocurator encompasses quality control of primary biological research data intended for publication, extracting and organizing data from original scientific literature, and describing the data with standard annotation protocols and vocabularies that enable powerful queries and interoperability between biological databases. Biocurators communicate with researchers to ensure the accuracy of curated information and to foster data exchanges with research laboratories.
Curation and annotation
In genome annotation, for example, biocurators commonly employ shared biomedical ontologies and take part in their creation and development. These ontologies are structured, controlled vocabularies that encompass many biological and medical knowledge domains, such as the Open Biomedical Ontologies found in the OBO Foundry. The domains covered include genomics and proteomics, anatomy, animal and plant development, biochemistry, metabolic pathways, taxonomic classification, and mutant phenotypes.
Biocurators enforce the consistent use of gene nomenclature guidelines and participate in the genetic nomenclature committees of various model organisms, often in collaboration with the HUGO Gene Nomenclature Committee (HGNC). They also enforce other nomenclature guidelines, such as those provided by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB), one example of which is the Enzyme Commission (EC) number.
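As a small illustration of what checking a nomenclature convention can look like in practice, the sketch below validates the written form of a fully specified EC number such as "EC 1.1.1.1". The function name `parse_ec_number` and the pattern are illustrative assumptions, not part of any IUBMB tool; partial numbers (for example "EC 1.1.1.-") are deliberately out of scope, and the check is purely syntactic.

```python
import re

# Assumed written form: "EC " followed by four dot-separated integer fields.
# Partial or preliminary EC numbers are not handled in this sketch.
EC_PATTERN = re.compile(r"^EC (\d+)\.(\d+)\.(\d+)\.(\d+)$")


def parse_ec_number(text):
    """Return the four levels of an EC number as a tuple of ints, or None."""
    match = EC_PATTERN.match(text.strip())
    if match is None:
        return None
    levels = tuple(int(part) for part in match.groups())
    # The first field is the top-level enzyme class; classes 1 to 7 are assumed.
    if not 1 <= levels[0] <= 7:
        return None
    return levels
```

A syntactic check of this kind only catches malformed identifiers; confirming that a number is actually assigned to an enzyme would still require a lookup against the official enzyme list.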
International Society for Biocuration (ISB)
The International Society for Biocuration (ISB) was founded in 2008; the non-profit organisation "promotes the field of biocuration and provides a forum for information exchange through meetings and workshops." International Biocurator Conferences have been held in Pacific Grove, California (2005); San José, California (2007); Berlin, Germany (2009); Tokyo, Japan (2010); Washington, DC (2012); and Cambridge, UK (2013).
Biocurators and Wikipedia
There is some overlap between the work of biocurators and Wikipedia, with boundaries between scientific databases and Wikipedia becoming increasingly blurred. Databases such as Rfam and the Protein Data Bank, for example, make heavy use of Wikipedia and its editors to curate information.
Text mining-assisted curation
There has also been recent interest in using natural language processing and text mining technologies to enable a more systematic extraction of candidate information for manual literature curation. To that end, the main literature curation stages of a ‘canonical’ biocuration workflow have been examined. Various specialized systems have attempted to apply text mining techniques to these stages, from the initial detection of curation-relevant articles (triage) to the extraction of annotations and entity relationships.
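The triage stage can be illustrated with a deliberately simple sketch: each abstract is scored by the fraction of its words that appear in a curation vocabulary, and only abstracts above a threshold are passed on to curators. The vocabulary `CURATION_TERMS`, the function names, and the threshold are all hypothetical; the specialized systems mentioned above typically rely on trained classifiers rather than keyword matching.

```python
import re

# Hypothetical vocabulary of curation-relevant terms (an assumption for this sketch).
CURATION_TERMS = {"gene", "protein", "mutant", "phenotype", "binds", "interaction"}


def triage_score(abstract, terms=CURATION_TERMS):
    """Fraction of lowercase word tokens in the abstract that are curation terms."""
    tokens = re.findall(r"[a-z]+", abstract.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in terms)
    return hits / len(tokens)


def triage(abstracts, threshold=0.1):
    """Return (score, doc_id) pairs at or above the threshold, highest score first."""
    scored = [(triage_score(text), doc_id) for doc_id, text in abstracts.items()]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


if __name__ == "__main__":
    example = {
        "doc1": "The mutant gene alters the phenotype",
        "doc2": "We thank the funding agency for support",
    }
    print(triage(example))
```

In this toy example only `doc1` passes triage, since three of its six words are in the vocabulary while `doc2` contains none.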
Expert curation vs. community curation
Traditionally, biological knowledge has been aggregated through expert curation, conducted manually by dedicated experts. However, with the burgeoning volume of biological data and an increasingly diverse, information-dense published literature, expert curation has become more laborious and time-consuming, and increasingly lags behind knowledge creation.
Community curation, which harnesses community intelligence for knowledge curation, shows great promise in dealing with the flood of biological knowledge. To exploit the full potential of the scientific community for knowledge curation, multiple biological wikis (bio-wikis) have been built to date.
To increase community curation in bio-wikis, AuthorReward, an extension to MediaWiki, was developed to reward community curation efforts. AuthorReward provides bio-wikis with an authorship metric: it quantifies researchers’ contributions by factoring in both edit quantity and edit quality, and it yields automated, explicit authorship according to those quantitative contributions.
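The idea of an authorship metric that weighs both quantity and quality can be sketched as follows. This is not AuthorReward's actual formula, which is not given here; it is a minimal illustration that assumes each edit is described by the number of characters added (quantity) and a hypothetical `survival` value between 0 and 1 indicating how much of that text remains in the current page version (a stand-in for quality).

```python
from collections import defaultdict


def authorship_shares(edits):
    """Compute each author's share of credit for a wiki page.

    edits: iterable of (author, chars_added, survival) tuples, where survival
    is an assumed quality proxy in [0, 1]. Returns a dict mapping author to
    share of total credit, ordered from largest to smallest contribution.
    """
    scores = defaultdict(float)
    for author, chars_added, survival in edits:
        # Quantity weighted by quality: only surviving text earns credit.
        scores[author] += chars_added * survival
    total = sum(scores.values())
    if total == 0:
        return {}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return {author: score / total for author, score in ranked}


if __name__ == "__main__":
    history = [("alice", 100, 0.5), ("bob", 50, 1.0), ("alice", 100, 0.5)]
    print(authorship_shares(history))
```

Ordering the result by share mirrors the notion of explicit authorship: the author list for a page could be read directly off the ranked contributions.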
RiceWiki, a wiki-based database for community curation of rice genes, serves as a live demonstration of AuthorReward, available at http://ricewiki.big.ac.cn/index.php/Os01g0883800. AuthorReward itself is freely available at http://cbb.big.ac.cn/software.
Another community-based approach to analyzing biological data is Systems Biology Verification (SBV) IMPROVER. Biological networks with a structured syntax are a powerful way of representing biological information generated from high-density data; however, they can become unwieldy to manage as their size and complexity increase. SBV IMPROVER presents a crowd-verification approach for the visualization and expansion of biological networks.
References
- Bateman, A. (2010). "Curators of the world unite: The International Society of Biocuration". Bioinformatics 26 (8): 991. doi:10.1093/bioinformatics/btq101. PMID 20305270.
- Salimi, N.; Vita, R. (2006). "The Biocurator: Connecting and Enhancing Scientific Data". PLoS Computational Biology 2 (10): e125. doi:10.1371/journal.pcbi.0020125. PMC 1626147. PMID 17069454.
- Bourne, P. E.; McEntyre, J. (2006). "Biocurators: Contributors to the World of Science". PLoS Computational Biology 2 (10): e142. doi:10.1371/journal.pcbi.0020142. PMC 1626157. PMID 17411327.
- Burge, S.; Attwood, T. K.; Bateman, A.; Berardini, T. Z.; Cherry, M.; O'Donovan, C.; Xenarios, L.; Gaudet, P. (2012). "Biocurators and Biocuration: Surveying the 21st century challenges". Database 2012: bar059. doi:10.1093/database/bar059. PMC 3308150. PMID 22434828.
- Wodak, S. J.; Mietchen, D.; Collings, A. M.; Russell, R. B.; Bourne, P. E. (2012). "Topic Pages: PLoS Computational Biology Meets Wikipedia". PLoS Computational Biology 8 (3): e1002446. doi:10.1371/journal.pcbi.1002446. PMC 3315447. PMID 22479174.
- Finn, R. D.; Gardner, P. P.; Bateman, A. (2011). "Making your database available through Wikipedia: The pros and cons". Nucleic Acids Research 40 (Database issue): D9–12. doi:10.1093/nar/gkr1195. PMC 3245093. PMID 22144683.
- Page, R. D. M. (2011). "Linking NCBI to Wikipedia: A wiki-based approach". PLoS Currents 3: RRN1228. doi:10.1371/currents.RRN1228. PMC 3080707. PMID 21516242.
- Gardner, P. P.; Daub, J.; Tate, J.; Moore, B. L.; Osuch, I. H.; Griffiths-Jones, S.; Finn, R. D.; Nawrocki, E. P.; Kolbe, D. L.; Eddy, S. R.; Bateman, A. (2010). "Rfam: Wikipedia, clans and the "decimal" release". Nucleic Acids Research 39 (Database issue): D141–D145. doi:10.1093/nar/gkq1129. PMC 3013711. PMID 21062808.
- Daub, J.; Gardner, P. P.; Tate, J.; Ramsköld, D.; Manske, M.; Scott, W. G.; Weinberg, Z.; Griffiths-Jones, S.; Bateman, A. (2008). "The RNA WikiProject: Community annotation of RNA families". RNA 14 (12): 2462–2464. doi:10.1261/rna.1200508. PMC 2590952. PMID 18945806.
- Burkhardt, K.; Schneider, B.; Ory, J. (2006). "A Biocurator Perspective: Annotation at the Research Collaboratory for Structural Bioinformatics Protein Data Bank". PLoS Computational Biology 2 (10): e99. doi:10.1371/journal.pcbi.0020099. PMC 1626146. PMID 17069453.
- Logan, D. W.; Sandal, M.; Gardner, P. P.; Manske, M.; Bateman, A. (2010). "Ten Simple Rules for Editing Wikipedia". PLoS Computational Biology 6 (9): e1000941. doi:10.1371/journal.pcbi.1000941. PMC 2947980. PMID 20941386.
- Butler, D. (2008). "Publish in Wikipedia or perish: Journal to require authors to post in the free online encyclopaedia". Nature. doi:10.1038/news.2008.1312.
- Hirschman, L.; Burns, G. A.; Krallinger, M.; Arighi, C.; Cohen, K. B.; Valencia, A.; Wu, C. H.; Chatr-Aryamontri, A.; Dowell, K. G.; Huala, E.; Lourenço, A.; Nash, R.; Veuthey, A. L.; Wiegers, T.; Winter, A. G. (2012). "Text mining for the biocuration workflow". Database 2012: bas020. doi:10.1093/database/bas020. PMC 3328793. PMID 22513129.
- Dai, L.; Tian, M.; Wu, J.; Xiao, J.; Wang, X.; Townsend, J. P.; Zhang, Z. (2013). "AuthorReward: Increasing community curation in biological knowledge wikis through automated authorship quantification". Bioinformatics 29 (14): 1837–1839. doi:10.1093/bioinformatics/btt284. PMC 3702255. PMID 23732274.
- Stolovitzky, G.; Ansari, S.; Binder, J.; Boue, S.; Di Fabio, A.; Hayes, W.; Hoeng, J.; Iskandar, A.; Kleiman, R.; Norel, R.; O'Neel, B.; Peitsch, M. C.; Poussin, C.; Pratt, D.; Rhrissorrakrai, K.; Schlage, W. K.; Talikka, M. (2013). "On Crowd-verification of Biological Networks". Bioinformatics and Biology Insights. doi:10.4137/BBI.S12932.