This article has multiple issues. Please help improve it or discuss these issues on the talk page. (Learn how and when to remove these template messages)(Learn how and when to remove this template message)
Big Mechanism is a $45 million DARPA research program, begun in 2014, aimed at developing software that will read cancer research papers, integrate them into a cancer model and frame new hypotheses by the end of 2017 through the automated collection of big data and integrating across various disciplines such as knowledge-based NLP, curation and ontology, systems and mathematical biology by reading research abstracts and papers to extract pieces of causal mechanisms.
The program focuses on mutations in the Ras gene family, which underlie some one-third of human cancers. Currently, a rough road map shows interaction sequences among proteins affecting cell replication and death. However, the causal relations are poorly understood.
The program is to occur in three stages. The first is to read literature and convert it into formal representations. Second is to integrate the knowledge into computational models. Third is to produce experimentally testable explanations and predictions. Research teams are developing four separate systems targeting all three tasks.
In February 2015 an evaluation meeting reviewed progress on the first stage. Multiple tasks were considered. One was extraction of experimental procedure details and evaluating statements such as “we demonstrate” and “we suggest.” Another worked to map sentence meaning and relationships. The best machine-reading system extracted 40% of relevant information from a small corpus and correctly determined how each passage related to the model.
The second stage is to become active in summer 2015, when members attempt to produce a single reference model. The third stage is the most challenging, because the artificial intelligence community has had limited success at developing hypothesis generators. Molecular biology may be more amenable, because most domain knowledge is technical and available in written form.