Critical Assessment of Functional Annotation
The Critical Assessment of Functional Annotation (CAFA) is an experiment designed to provide a large-scale assessment of computational methods dedicated to predicting protein function. Participating algorithms are evaluated by their ability to predict Gene Ontology (GO) terms in its three categories: Molecular Function, Biological Process, and Cellular Component.
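GO terms form a directed acyclic graph in which more specific terms imply more general ones, so a predicted term also implies all of its ancestors. The sketch below assumes a toy fragment of the Molecular Function ontology stored as a child-to-parents map and propagates a method's scores upward, giving each ancestor the highest score of any predicted descendant. This is one common convention applied before scoring; the data layout and the function names `ancestors` and `propagate` are illustrative, not part of any official CAFA tool.

```python
from typing import Dict, Set

# Toy fragment of the GO Molecular Function hierarchy: child -> set of parents.
# Real GO has many more terms and relation types; this is for illustration only.
PARENTS: Dict[str, Set[str]] = {
    "GO:0003674": set(),           # molecular_function (root)
    "GO:0003824": {"GO:0003674"},  # catalytic activity
    "GO:0005488": {"GO:0003674"},  # binding
    "GO:0016787": {"GO:0003824"},  # hydrolase activity
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every ancestor of `term` (the term itself is excluded)."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def propagate(scores: Dict[str, float], parents: Dict[str, Set[str]]) -> Dict[str, float]:
    """Give each ancestor the maximum score of any predicted descendant."""
    out = dict(scores)
    for term, s in scores.items():
        for a in ancestors(term, parents):
            out[a] = max(out.get(a, 0.0), s)
    return out


pred = {"GO:0016787": 0.8, "GO:0005488": 0.3}
propagated = propagate(pred, PARENTS)
print(sorted(propagated.items()))
# [('GO:0003674', 0.8), ('GO:0003824', 0.8), ('GO:0005488', 0.3), ('GO:0016787', 0.8)]
```

Taking the maximum keeps the propagated scores consistent: a general term is never scored lower than a more specific term that implies it.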
The experiment consists of two tracks: (i) the eukaryotic track and (ii) the prokaryotic track. In each track, the organizers provide a set of target proteins. Participants must submit their predictions by the submission deadline, after which the predictions are assessed using a set of specific metrics.
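One metric used to score CAFA submissions is the protein-centric maximum F-measure (Fmax), computed separately for each ontology. For each decision threshold τ, precision is averaged over the proteins that have at least one term scored ≥ τ, recall is averaged over all benchmark proteins, and Fmax is the best harmonic mean of the two across thresholds. The sketch below assumes predictions and annotations have already been propagated up the ontology; the team names, protein IDs, and term IDs are placeholders, and ranking by Fmax alone is a simplification of how results are reported.

```python
from typing import Dict, Set


def fmax(predictions: Dict[str, Dict[str, float]],
         truth: Dict[str, Set[str]],
         steps: int = 100) -> float:
    """Protein-centric maximum F-measure over thresholds 1/steps, 2/steps, ..., 1.

    predictions: protein -> {GO term: score in [0, 1]}
    truth:       protein -> non-empty set of experimentally annotated GO terms
    """
    n = len(truth)  # recall is averaged over every benchmark protein
    best = 0.0
    for k in range(1, steps + 1):
        tau = k / steps
        precision_sum, covered, recall_sum = 0.0, 0, 0.0
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {t for t, s in scores.items() if s >= tau}
            hits = len(predicted & true_terms)
            if predicted:  # precision only over proteins with >= 1 prediction at tau
                covered += 1
                precision_sum += hits / len(predicted)
            recall_sum += hits / len(true_terms)
        if covered == 0:
            continue  # nothing predicted at this threshold, precision undefined
        pr, rc = precision_sum / covered, recall_sum / n
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


# Placeholder benchmark and two hypothetical submissions.
truth = {"P1": {"term_a", "term_b"}, "P2": {"term_c"}}
teams = {
    "team_x": {"P1": {"term_a": 0.9, "term_x": 0.4}, "P2": {"term_c": 0.6}},
    "team_y": {"P1": {"term_b": 0.5}},
}
ranking = sorted(teams, key=lambda name: fmax(teams[name], truth), reverse=True)
print(ranking)  # ['team_x', 'team_y']
```

Averaging precision only over covered proteins means a method is not penalized in precision for abstaining on a target, but it still loses recall for that target.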
The genome of an organism may contain hundreds to tens of thousands of genes, which encode hundreds of thousands of different protein sequences. Because genome sequencing is now relatively cheap, gene and protein sequences can be determined quickly and inexpensively. Thousands of species have been sequenced so far, yet many of their proteins are not well characterized. Experimentally determining the role of a protein in the cell is expensive and time-consuming, and even when functional assays are performed, they are unlikely to provide complete insight into a protein's function. It has therefore become important to annotate proteins functionally with computational tools. Several computational methods can infer protein function from a variety of biological and evolutionary data, but there is significant room for improvement. Accurate prediction of protein function can have lasting implications for biomedical and pharmaceutical research.
The CAFA experiment is designed to provide an unbiased assessment of computational methods, to stimulate research in computational function prediction, and to provide insight into the overall state of the art in function prediction.
The experiment consists of three phases:
- Prediction phase: ~4 months
Organizers provide the community with protein sequences whose function is unknown or incompletely known, and set a deadline for submitting predictions
- Target accumulation: 6–12 months
After all predictions are stored, the experiment enters a waiting period during which experimentally determined protein functions are expected to accumulate in public databases
- Analysis phase: 1 month
Predictors are ranked according to their performance. The results are publicly shared in scientific meetings and published after peer review.
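The waiting period is what produces the benchmark: a target can be used for assessment if it had no experimental GO annotation when predictions were collected but acquired one during target accumulation. The sketch below assumes the annotations are available as two snapshots (at the deadline and after accumulation) mapping each protein to (GO term, evidence code) pairs. The helper names are hypothetical, and treating exactly the GO experimental evidence codes listed here as "experimental" is an assumption; the organizers' accepted set may differ.

```python
from typing import Dict, Iterable, Set, Tuple

# GO experimental evidence codes. Using exactly this set is an assumption
# for illustration; the organizers' accepted set may differ.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}

Annotation = Tuple[str, str]  # (GO term, evidence code)


def experimental_terms(annotations: Set[Annotation]) -> Set[str]:
    """Keep only GO terms supported by an experimental evidence code."""
    return {term for term, code in annotations if code in EXPERIMENTAL}


def select_benchmark(targets: Iterable[str],
                     before: Dict[str, Set[Annotation]],
                     after: Dict[str, Set[Annotation]]) -> Dict[str, Set[str]]:
    """Targets with no experimental annotation at the deadline that gained one later."""
    benchmark: Dict[str, Set[str]] = {}
    for protein in targets:
        old = experimental_terms(before.get(protein, set()))
        new = experimental_terms(after.get(protein, set()))
        if not old and new:
            benchmark[protein] = new
    return benchmark


# Hypothetical snapshots; IEA (electronic annotation) is not experimental.
before = {"T1": {("GO:0005488", "IEA")}, "T2": {("GO:0003824", "IDA")}}
after = {
    "T1": {("GO:0005488", "IEA"), ("GO:0016787", "IMP")},
    "T2": {("GO:0003824", "IDA"), ("GO:0016787", "IDA")},
    "T3": {("GO:0005488", "IEA")},
}
print(select_benchmark(["T1", "T2", "T3"], before, after))  # {'T1': {'GO:0016787'}}
```

Here T2 is excluded because it was already experimentally characterized at the deadline, and T3 because it gained no experimental evidence.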
The CAFA experiment is conducted by the Automated Function Prediction Special Interest Group (AFP/SIG). AFP/SIG meetings were held alongside the Intelligent Systems for Molecular Biology (ISMB) conference in 2005, 2006, 2008, 2011, and 2012.
The first CAFA experiment was organized between fall 2010 and spring 2012. The organizers provided the community with 48,000 sequences and asked participants to predict Gene Ontology annotations for each of them. Of those 48,000 proteins, 866 were experimentally annotated during the target accumulation phase and served as the benchmark for assessment. The results showed that the function prediction algorithms of the time performed significantly better than simple domain assignment or straightforward use of the BLAST package. However, they also revealed that accurately predicting a protein's biological function remains an open and challenging problem.
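The BLAST comparison refers to a baseline that transfers annotations from similar, already annotated sequences. A minimal sketch, assuming BLAST has already been run and its hits for one target parsed into (subject ID, percent identity) pairs: each GO term receives the highest sequence identity among hits annotated with that term. This is close in spirit to a simple BLAST baseline, but the exact filtering used by the organizers (E-value cutoffs, evidence codes) is not reproduced, and the function name `blast_baseline` and all IDs are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def blast_baseline(hits: List[Tuple[str, float]],
                   db_annotations: Dict[str, Set[str]]) -> Dict[str, float]:
    """Score each GO term by the best percent identity among hits carrying it.

    hits:           (subject ID, percent identity) pairs for one target,
                    assumed to be parsed from an existing BLAST run
    db_annotations: subject ID -> experimentally annotated GO terms
    """
    scores: Dict[str, float] = {}
    for subject, pct_identity in hits:
        score = pct_identity / 100.0  # map identity to a score in [0, 1]
        for term in db_annotations.get(subject, set()):
            scores[term] = max(scores.get(term, 0.0), score)
    return scores


# Hypothetical hits for one target against two annotated database proteins.
hits = [("Q1", 85.0), ("Q2", 40.0)]
db = {"Q1": {"GO:0016787"}, "Q2": {"GO:0016787", "GO:0005488"}}
print(blast_baseline(hits, db))  # {'GO:0016787': 0.85, 'GO:0005488': 0.4}
```

Because the output is a term-to-score map, it can be propagated and scored with the same procedure as any submitted method, which is what makes it useful as a reference point.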
The second CAFA experiment (CAFA2) began in 2013. Starting in August, interested parties could download more than 100,000 target sequences from 27 species. Registered teams were challenged to annotate the sequences with Gene Ontology terms, with an additional challenge to annotate human sequences with Human Phenotype Ontology terms. The submission deadline was January 15, 2014, and the assessment of predictions was scheduled to take place in June 2014.