National Centre for Text Mining

From Wikipedia, the free encyclopedia

The National Centre for Text Mining (NaCTeM) [1] is a publicly funded text mining (TM) centre. It was established to provide support, advice, and information on TM technologies and to disseminate information from the larger TM community, while also providing tailored services and tools in response to the requirements of the United Kingdom academic community.

The software tools and services that NaCTeM supplies allow researchers to apply text mining techniques to problems in their own areas of interest; examples of these tools are listed below. In addition to providing services, the Centre takes part in and contributes to text mining research, both nationally and internationally, in initiatives such as Europe PubMed Central.

The Centre is located in the Manchester Institute of Biotechnology and is run by the University of Manchester School of Computer Science. NaCTeM contributes expertise in information extraction, natural language processing, and parallel and distributed data mining systems for biomedical and clinical applications.

Services

TerMine is a domain-independent method for automatic term recognition that can be used to locate the most important terms in a document and rank them automatically.[2]
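
TerMine is based on the C-value method of Frantzi, Ananiadou and Mima (reference 2). In that measure a multi-word candidate term a with |a| words and corpus frequency f(a) scores log2|a| · f(a) if it never occurs inside a longer candidate; otherwise its frequency is discounted by the mean frequency of the longer candidates that contain it. The Python sketch below illustrates only that scoring step. It is not TerMine's actual code: the function name c_value is hypothetical, the candidate list and frequencies are assumed to come from an earlier linguistic filter (part-of-speech patterns and a stop list), and the NC-value context weighting is omitted.

```python
import math


def c_value(candidates: dict[tuple[str, ...], int]) -> dict[tuple[str, ...], float]:
    """Score multi-word candidate terms with the C-value measure (sketch).

    `candidates` maps each candidate term (a tuple of words) to its total
    frequency in the corpus. Candidate extraction is assumed to have been
    done already; single-word terms score 0 because log2(1) == 0.
    """

    def contains(longer: tuple[str, ...], shorter: tuple[str, ...]) -> bool:
        # True if `shorter` appears as a contiguous word sequence in `longer`
        n = len(shorter)
        return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))

    scores: dict[tuple[str, ...], float] = {}
    for a, f_a in candidates.items():
        # T_a: the longer candidate terms in which `a` is nested
        nesting = [b for b in candidates if len(b) > len(a) and contains(b, a)]
        weight = math.log2(len(a))
        if not nesting:
            scores[a] = weight * f_a
        else:
            # Discount by the mean frequency of the longer candidates
            mean_nested = sum(candidates[b] for b in nesting) / len(nesting)
            scores[a] = weight * (f_a - mean_nested)
    return scores


if __name__ == "__main__":
    # Toy frequencies, invented purely for illustration
    counts = {
        ("adenoid", "cystic", "basal", "cell", "carcinoma"): 3,
        ("cystic", "basal", "cell", "carcinoma"): 8,
        ("basal", "cell", "carcinoma"): 120,
    }
    ranked = sorted(c_value(counts).items(), key=lambda kv: kv[1], reverse=True)
    for term, score in ranked:
        print(f"{score:8.2f}  {' '.join(term)}")
```

With these toy counts the shorter, frequently reused term "basal cell carcinoma" is ranked first, even though it is nested inside the other two candidates.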

AcroMine finds all known expanded forms of acronyms as they have appeared in MEDLINE entries; conversely, it can find the acronyms that have previously been used for a given expanded form in MEDLINE. It also disambiguates acronyms.[3]

Medie is a semantic search engine for retrieving sentences that describe biomedical correlations from MEDLINE abstracts.

Facta+ is a MEDLINE search engine for finding associations between biomedical concepts.[4]

KLEIO is a faceted semantic information retrieval system based on MEDLINE.

Info-PubMed provides information about, and graphical representations of, biomedical interactions extracted from MEDLINE using deep semantic parsing. It is supplemented by a term dictionary of over 200,000 protein and gene names, together with identification of disease types and organisms.

Resources

BioLexicon is a large-scale terminological resource for the biomedical domain.

GENIA is a collection of reference materials for developing biomedical text mining systems.

References

  1. Ananiadou S (2007). "The National Centre for Text Mining: A Vision for the Future". Ariadne (53).
  2. Frantzi K, Ananiadou S, Mima H (2007). "Automatic recognition of multi-word terms". International Journal of Digital Libraries 3 (2): 117–132.
  3. Okazaki N, Ananiadou S (2006). "Building an abbreviation dictionary using a term recognition approach". Bioinformatics 22 (24): 3089–95. doi:10.1093/bioinformatics/btl534. PMID 17050571.
  4. Tsuruoka Y, Tsujii J, Ananiadou S (2008). "FACTA: a text search engine for finding associated biomedical concepts". Bioinformatics 24 (21): 2559–60. doi:10.1093/bioinformatics/btn469. PMC 2572701. PMID 18772154.

External links