Entity linking

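In natural language processing, entity linking, also known as named entity disambiguation (NED), named entity recognition and disambiguation (NERD) or named entity normalization (NEN),[1] is the task of determining the identity of entities mentioned in text. It is distinct from named entity recognition (NER): NER identifies the occurrences of names in text and assigns them a coarse class, whereas entity linking determines what each name refers to. For example, in the sentence "Paris is the capital of France", the task is to determine that "Paris" refers to the city rather than to a person of that name.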

Entity linking requires a knowledge base of entities to which names can be linked. A popular choice for open-domain text is Wikipedia,[1][2] in which case the process may be called wikification (as in the Wikify! program, an early entity linking system).[3] In a closed-domain setting, a knowledge base may be induced automatically from training text.[4]

Any entity linking algorithm must cope with the inherent ambiguity of names: the same name can refer to different entities. Various approaches to this problem have been tried. In the seminal approach of Milne and Witten, supervised learning is applied, with the anchor texts of Wikipedia's own links serving as training data.[5] Kulkarni et al. exploited the property that topically coherent documents tend to refer to entities of strongly related types.[6] Training data can also be collected automatically, based on unambiguous synonyms.[7]
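
The idea of using anchor texts as evidence can be illustrated with a minimal sketch. It is not the published method of Milne and Witten or of Kulkarni et al.: it simply counts how often an anchor text links to each article (a prior sometimes called "commonness") and adds a crude context-overlap bonus as a stand-in for topical coherence. The data in ANCHOR_LINKS and ENTITY_CONTEXT, the function names and the context_weight parameter are all hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (anchor text, linked article) pairs, of the kind
# that could be harvested from hyperlinks inside Wikipedia articles.
ANCHOR_LINKS = [
    ("Paris", "Paris"),
    ("Paris", "Paris"),
    ("Paris", "Paris"),
    ("Paris", "Paris Hilton"),
    ("Hilton", "Paris Hilton"),
    ("Hilton", "Hilton Hotels"),
]

# Hypothetical context vocabulary for each knowledge-base entity.
ENTITY_CONTEXT = {
    "Paris": {"france", "capital", "seine", "city"},
    "Paris Hilton": {"celebrity", "heiress", "television", "hotel"},
    "Hilton Hotels": {"hotel", "chain", "hospitality"},
}


def build_anchor_index(links):
    """Map each lower-cased anchor text to a Counter of link targets."""
    index = defaultdict(Counter)
    for anchor, target in links:
        index[anchor.lower()][target] += 1
    return index


def commonness(index, mention):
    """Return the fraction of links with this anchor text per target entity."""
    counts = index.get(mention.lower(), Counter())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {entity: n / total for entity, n in counts.items()}


def link(index, mention, context_words, context_weight=1.0):
    """Pick the candidate with the best prior plus context-overlap score.

    Returns None when the mention never occurs as an anchor text.
    """
    priors = commonness(index, mention)
    if not priors:
        return None
    context = {word.lower() for word in context_words}

    def score(entity):
        overlap = len(context & ENTITY_CONTEXT.get(entity, set()))
        return priors[entity] + context_weight * overlap

    return max(priors, key=score)


if __name__ == "__main__":
    index = build_anchor_index(ANCHOR_LINKS)
    print(link(index, "Paris", ["the", "capital", "of", "France"]))
    print(link(index, "Paris", ["heiress", "television"]))
```

With no helpful context the sketch falls back on the most frequent link target, while matching context words can override that prior. A real system would learn how to weigh such signals rather than fix the weight by hand.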

Entity linking has been suggested as a way to automate the construction of a semantic web.[3] It has been used to improve the performance of information retrieval systems.[1]

A related problem is author name disambiguation in digital libraries, where the goal is to cluster and link mentions of the same author across both papers and citations.[8]
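
As a rough illustration of clustering author mentions, the sketch below merges records whose printed names are compatible (same surname and first initial) and that share at least one co-author. This is a simple heuristic built on a union-find structure, not the K-way spectral clustering method of the cited work. The RECORDS data, the name_key rule and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical records: (record id, author name as printed, co-author names).
RECORDS = [
    ("r1", "J. Smith", {"A. Kumar", "L. Chen"}),
    ("r2", "John Smith", {"L. Chen"}),
    ("r3", "J. Smith", {"M. Rossi"}),
    ("r4", "Jane Smith", {"M. Rossi", "P. Novak"}),
]


def name_key(name):
    """Reduce a name to (surname, first initial): 'John Smith' -> ('smith', 'j')."""
    parts = name.replace(".", " ").split()
    return parts[-1].lower(), parts[0][0].lower()


def cluster_authors(records):
    """Group record ids that plausibly refer to the same author."""
    parent = {rid: rid for rid, _, _ in records}

    def find(x):
        # Follow parent pointers to the cluster root, halving paths on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (id_a, name_a, co_a), (id_b, name_b, co_b) in combinations(records, 2):
        # Merge only when the names are compatible AND a co-author is shared.
        if name_key(name_a) == name_key(name_b) and co_a & co_b:
            parent[find(id_a)] = find(id_b)

    clusters = {}
    for rid, _, _ in records:
        clusters.setdefault(find(rid), set()).add(rid)
    return sorted(sorted(cluster) for cluster in clusters.values())


if __name__ == "__main__":
    print(cluster_authors(RECORDS))
```

In this toy data the two "J. Smith" records end up in different clusters, one with "John Smith" and one with "Jane Smith", which shows why the printed name alone is not enough to identify an author.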

Entity linking evaluation campaigns are organized by the U.S. National Institute of Standards and Technology (NIST) in the context of the Knowledge Base Population task of the Text Analysis Conference.

References

  1. M. A. Khalid, V. Jijkoun and M. de Rijke (2008). The impact of named entity normalization on information retrieval for question answering. Proc. ECIR.
  2. Xianpei Han, Le Sun and Jun Zhao (2011). Collective entity linking in web text: a graph-based method. Proc. SIGIR.
  3. Rada Mihalcea and Andras Csomai (2007). Wikify! Linking Documents to Encyclopedic Knowledge. Proc. CIKM.
  4. Aaron M. Cohen (2005). Unsupervised gene/protein named entity normalization using automatically extracted dictionaries. Proc. ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics, pp. 17–24.
  5. David Milne and Ian H. Witten (2008). Learning to link with Wikipedia. Proc. CIKM.
  6. Sayali Kulkarni, Amit Singh, Ganesh Ramakrishnan and Soumen Chakrabarti (2009). Collective annotation of Wikipedia entities in web text. Proc. 15th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (KDD). doi:10.1145/1557019.1557073. ISBN 9781605584959.
  7. Wei Zhang, Jian Su and Chew Lim Tan (2010). Entity Linking Leveraging Automatically Generated Annotation. Proc. 23rd International Conference on Computational Linguistics (Coling 2010).
  8. Hui Han, Hongyuan Zha and C. Lee Giles (2005). Name disambiguation in author citations using a K-way spectral clustering method. Proc. ACM/IEEE Joint Conference on Digital Libraries (JCDL 2005), pp. 334–343.