
Collocation extraction

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 86.85.60.94 (talk) at 20:23, 23 October 2016 (External links: Spelling error fixed). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Collocation extraction is the task of extracting collocations automatically from a corpus using a computer.

Within the area of corpus linguistics, collocation is defined as a sequence of words or terms which co-occur more often than would be expected by chance. 'Crystal clear', 'middle management', 'nuclear family', and 'cosmetic surgery' are examples of collocated pairs of words. Some words are often found together because they make up a compound noun, for example 'riding boots' or 'motor cyclist'.
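The "more often than would be expected by chance" criterion can be made concrete by comparing each word pair's observed count with the count expected if the two words occurred independently. The sketch below makes several assumptions. It uses whitespace-tokenized text, considers only adjacent pairs (bigrams), and estimates word probabilities by relative frequency. `observed_vs_expected` is a hypothetical helper, not a library function.

```python
from collections import Counter


def observed_vs_expected(tokens):
    """Map each adjacent pair (w1, w2) to (observed count, expected count).

    The expected count assumes the two words occur independently, so
    P(w1 w2) = P(w1) * P(w2), with probabilities estimated as relative
    frequencies over the token list.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    n_positions = max(n - 1, 0)  # number of adjacent-pair positions
    result = {}
    for (w1, w2), observed in bigrams.items():
        expected = (unigrams[w1] / n) * (unigrams[w2] / n) * n_positions
        result[(w1, w2)] = (observed, expected)
    return result


tokens = "the water was crystal clear and the sky was crystal clear".split()
counts = observed_vs_expected(tokens)
print(counts[("crystal", "clear")])  # observed 2, expected 40/121 (about 0.33)
```

In this toy text, 'crystal clear' occurs about six times as often as independence would predict. A real extractor would run over a much larger corpus and apply one of the scoring formulas discussed below rather than the raw ratio.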

The traditional method of collocation extraction is to compute a score for every candidate word pair from statistical quantities, such as the corpus frequencies of each word and of the pair, and then to rank the pairs by that score. Proposed scoring formulas include mutual information, the t-test, the z-test, the chi-squared test, and the likelihood ratio.[1]
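A minimal sketch of these scores for a single word pair is given below. It assumes that four counts are already available: `c12` (occurrences of the pair), `c1` and `c2` (occurrences of each word), and `n` (number of bigram positions in the corpus). All function names are hypothetical. The t statistic approximates the sample variance by the sample mean, as in Manning and Schütze's treatment. The likelihood ratio is computed in the equivalent G² form from the 2×2 contingency table. The z statistic shown is a one-sample test for a proportion using the null-hypothesis variance, which is only one of several variants in use. Every function assumes positive counts.

```python
import math


def contingency(c12, c1, c2, n):
    """2x2 table of bigram counts: rows w1 / not-w1, columns w2 / not-w2."""
    o11 = c12                # w1 followed by w2
    o12 = c1 - c12           # w1 followed by some other word
    o21 = c2 - c12           # some other word followed by w2
    o22 = n - c1 - c2 + c12  # neither word in its slot
    return o11, o12, o21, o22


def pmi(c12, c1, c2, n):
    """Pointwise mutual information in bits, from relative-frequency estimates."""
    return math.log2((c12 / n) / ((c1 / n) * (c2 / n)))


def t_score(c12, c1, c2, n):
    """t statistic, approximating the sample variance by the sample mean."""
    x_bar = c12 / n           # observed probability of the pair
    mu = (c1 / n) * (c2 / n)  # probability expected under independence
    return (x_bar - mu) / math.sqrt(x_bar / n)


def z_score(c12, c1, c2, n):
    """One-sample z-test for a proportion, using the null-hypothesis variance."""
    mu = (c1 / n) * (c2 / n)
    expected = n * mu
    return (c12 - expected) / math.sqrt(expected * (1 - mu))


def chi_squared(c12, c1, c2, n):
    """Pearson's chi-squared statistic for the 2x2 table (no continuity correction)."""
    o11, o12, o21, o22 = contingency(c12, c1, c2, n)
    numerator = n * (o11 * o22 - o12 * o21) ** 2
    denominator = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    return numerator / denominator


def log_likelihood_ratio(c12, c1, c2, n):
    """-2 log(lambda) for independence, computed as G^2 = 2 * sum(O * ln(O / E))."""
    table = contingency(c12, c1, c2, n)
    row = (table[0] + table[1], table[2] + table[3])
    col = (table[0] + table[2], table[1] + table[3])
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[2 * i + j]
            expected = row[i] * col[j] / n
            if observed > 0:  # the term 0 * ln(0) is taken as 0
                g2 += observed * math.log(observed / expected)
    return 2 * g2
```

To extract collocations, each candidate pair would be scored with one of these functions and the list sorted in descending order. Mutual information is known to overrate pairs of rare words, so it is usually combined with a minimum frequency threshold.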

References

  1. Manning, C. D.; Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press. ISBN 978-0-262-13360-9.