Link grammar

From Wikipedia, the free encyclopedia

Link grammar (LG) is a theory of syntax by Davy Temperley and Daniel Sleator which builds relations between pairs of words, rather than constructing constituents in a tree-like hierarchy. There are two basic parameters: directionality and distance. Link grammar is similar to dependency grammar, but dependency grammar includes a head-dependent relationship, whereas link grammar makes the head-dependent relationship optional (links need not indicate direction). Colored Multiplanar Link Grammar (CMLG) is an extension of LG allowing crossing relations between pairs of words.[1] The relationship between words is indicated with link types, making link grammar closely related to certain categorial grammars.

For example, in a subject–verb–object language like English, the verb would look left to form a subject link, and right to form an object link. Nouns would look right to complete the subject link, or left to complete the object link.

In a subject–object–verb language like Persian, the verb would look left to form an object link, and a more distant left to form a subject link. Nouns would look to the right for both subject and object links.

Syntax

Rightward links are represented with a +, and leftward links with a −. Optional links are enclosed in curly brackets {}. Undesirable (costly) links are enclosed in one or more pairs of square brackets []. Multiple links are joined either by a conjunction & or a disjunction or. Each rule ends with a semicolon ;.

The square brackets effectively give different parses a (floating-point) cost, making the most likely parse "cheap" and the less likely parses "expensive". Costs are additive, and thus behave like the logarithm of the probability (since log-likelihoods are additive), or equivalently, somewhat like the entropy (since entropies are additive). This makes link grammar compatible with machine learning techniques such as hidden Markov models and the Viterbi algorithm, because the link costs correspond to the link weights in Markov networks or Bayesian networks.
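The correspondence between additive costs and log-probabilities can be sketched numerically; the probabilities below are invented purely for illustration and are not drawn from any actual dictionary:

```python
import math

# Toy sketch: two independent links with invented probabilities.
# Treating cost as a negative log-probability makes costs additive:
# summing the costs of the links multiplies their probabilities.
p_link1, p_link2 = 0.5, 0.25

cost1 = -math.log(p_link1)
cost2 = -math.log(p_link2)

total_cost = cost1 + cost2          # the parse's total cost
joint_prob = math.exp(-total_cost)  # back to a probability

print(round(joint_prob, 3))  # 0.125, i.e. 0.5 * 0.25
```

Under this reading, preferring the lowest-cost parse is the same as preferring the most probable one.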

Newer versions include an extension to this syntax that allows the head and dependent words to be indicated, thus enabling a more traditional dependency-grammar-like parse to be generated. Another extension simplifies the specification of parse rules for languages that have little or no restriction on word order, such as Lithuanian. There are also extensions to make it easier to support languages with concatenative morphologies.

The Link Grammar link types can be understood as types in the sense of type theory. In effect, Link Grammar can be used to model the internal language of certain (non-symmetric) compact closed categories, such as pregroup grammars. In this sense, Link Grammar appears to be isomorphic or homomorphic to some categorial grammars.

Examples

Example 1

A basic rule file for an SVO language might look like:

<determiner>:      D+;
<noun-subject>:    {D−} & S+;
<noun-object>:     {D−} & O−;
<verb>:            S− & {O+};

Thus the English sentence, “The boy painted a picture” would appear as:

           +-----O-----+
 +-D-+--S--+     +--D--+
 |   |     |     |     |
The boy painted  a  picture
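The connector-matching discipline behind this parse can be sketched in a few lines of Python. This is a toy illustration of the rules above, not the actual parser: each word lists its connectors per Example 1, and a proposed linkage is checked by pairing each rightward connector with a matching leftward one on a later word:

```python
# Toy lexicon from Example 1: ("+", label) looks right, ("-", label) looks left.
lexicon = {
    "the":     [("+", "D")],
    "a":       [("+", "D")],
    "boy":     [("-", "D"), ("+", "S")],
    "picture": [("-", "D"), ("-", "O")],
    "painted": [("-", "S"), ("+", "O")],
}

def links_satisfied(sentence, links):
    """Check that each link (i, j, label) pairs a rightward connector on
    the earlier word with a matching leftward connector on the later word."""
    for (i, j, label) in links:
        if i >= j:
            return False                              # links must point rightward
        if ("+", label) not in lexicon[sentence[i]]:
            return False                              # left word must look right
        if ("-", label) not in lexicon[sentence[j]]:
            return False                              # right word must look left
    return True

sentence = ["the", "boy", "painted", "a", "picture"]
links = [(0, 1, "D"), (1, 2, "S"), (3, 4, "D"), (2, 4, "O")]
print(links_satisfied(sentence, links))  # True
```

The real parser does considerably more (it searches for linkages, enforces planarity, and requires that every connector be used), but the matching of + against − with equal labels is the core idea.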

Example 2

Conversely, a rule file for a null-subject SOV language might consist of the following links:

<noun-subject>:    S+;
<noun-object>:     O+;
<verb>:            {O−} & {S−};

And a simple Persian sentence, man nAn xordam (من نان خوردم) 'I ate bread' would look like:

 +-----S-----+
 |     +--O--+
 |     |     |
man   nAn xordam

Example 3 (Morphology)

In many languages with a concatenative morphology, the stem plays no grammatical role; the grammar is determined by the suffixes. Thus, in Russian, the sentence «вверху плыли редкие облачка» ('scattered little clouds floated up above') might have the parse:

    +------------Wd-----------+---------------SIp---------------+
    |         +-------EI------+              +--------Api-------+
    |         |      +--LLCZD-+       +-LLAQZ+         +--LLCAO-+
    |         |      |        |       |      |         |        |
LEFT-WALL вверху.e плы.= =ли.vnndpp ре.= =дкие.api облачк.= =а.ndnpi 

The subscripts, such as '.vnndpp', indicate the grammatical category. The primary links (Wd, EI, SIp and Api) connect the suffixes to one another, since, in principle, other stems could appear in their place without altering the structure of the sentence. The Api link marks the adjective; SIp denotes subject–verb inversion; EI marks a modifier. The Wd link is used to indicate the head noun; the head verb is not indicated in this sentence. The LLXXX links serve only to attach stems to their suffixes.
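The idea that agreement rides on the suffixes alone can be sketched as follows. The feature tags below are invented for illustration and do not reproduce the parser's actual subscript system:

```python
# Toy sketch: in a suffix-driven grammar, agreement is checked on the
# suffix alone; the stem is treated as interchangeable. Features invented.
suffix_features = {
    "=ли":   {"pos": "verb", "number": "pl"},  # past-tense plural verb ending
    "=дкие": {"pos": "adj",  "number": "pl"},  # plural adjective ending
    "=а":    {"pos": "noun", "number": "pl"},  # plural noun ending
}

def agree(suffix_a, suffix_b):
    """Adjective-noun (or subject-verb) number agreement depends only on
    the suffixes, never on the stems they happen to be attached to."""
    return suffix_features[suffix_a]["number"] == suffix_features[suffix_b]["number"]

print(agree("=дкие", "=а"))  # True: plural adjective matches plural noun
```

Swapping in a different stem before either suffix would leave this check, like the Api link in the parse above, entirely unchanged.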

Example 4 (Phonology)

The link-grammar can also indicate phonological agreement between neighboring words. For example:

                     +---------Ost--------+
    +------>WV------>+   +------Ds**x-----+
    +----Wd---+-Ss*b-+   +--PHv-+----A----+
    |         |      |   |      |         |
LEFT-WALL that.j-p is.v an abstract.a concept.n 

Here, the connector 'PH' is used to constrain the determiners that can appear before the word 'abstract'. It effectively makes it costly to use the determiner 'a' in this sentence, while the link to 'an' becomes cheap. The other links are roughly as in the previous examples: S denoting the subject, O denoting the object, and D denoting the determiner. The 'WV' link indicates the head verb, and the 'W' link indicates the head noun. The lower-case letters following the upper-case link types serve to refine the type; for example, Ds can only connect to a singular noun, Ss only to a singular subject, and Os only to a singular object. The lower-case v in PHv denotes 'vowel'; the lower-case d in Wd denotes a declarative sentence.
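The cost-based effect of the PHv connector can be sketched as a small cost function. The function name and the cost values are invented for illustration; the real parser derives such costs from its dictionary:

```python
# Toy sketch of the PHv idea: 'an' prefers vowel-initial words, 'a' prefers
# consonant-initial ones; a mismatched pairing is not forbidden, just costly.
# Vowel-initial is approximated here by spelling, not actual pronunciation.
def det_cost(determiner, next_word):
    vowel_initial = next_word[0].lower() in "aeiou"
    if determiner == "an":
        return 0.0 if vowel_initial else 1.0
    if determiner == "a":
        return 1.0 if vowel_initial else 0.0
    return 0.0  # other determiners carry no phonological preference

print(det_cost("an", "abstract"))  # 0.0 -> cheap, preferred
print(det_cost("a", "abstract"))   # 1.0 -> costly, dispreferred
```

As with the square-bracket costs described earlier, the parser would simply prefer the linkage whose total cost is lower, so 'an abstract concept' wins over 'a abstract concept'.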

Implementations

The link grammar syntax parser is a library for natural language processing written in C. It is available under the LGPL license. The parser[2] is an ongoing project. Recent versions include improved sentence coverage; Russian, Persian and Arabic language support; prototypes for German, Hebrew, Lithuanian and Turkish; and programming APIs for Python, Java, Common Lisp, AutoIt and OCaml, with third-party bindings for Perl[3] and Ruby.[4]

A current major undertaking is a project to learn the grammar and morphology of new languages, using unsupervised learning algorithms.[5][6]

The link-parser program along with rules and word lists for English may be found in standard Linux distributions, e.g., as a Debian package, although many of these are years out of date.[7]

Applications

AbiWord checks grammar using Link Grammar

AbiWord,[8] a free word processor, uses Link Grammar for on-the-fly grammar checking. Words that cannot be linked anywhere are underlined in green.

The semantic relationship extractor RelEx,[9] layered on top of the Link Grammar library, generates a dependency grammar output by making explicit the semantic relationships between words in a sentence. Its output can be classified as being at a level between that of SSyntR and DSyntR of Meaning-Text Theory. It also provides framing/grounding, anaphora resolution, head-word identification, lexical chunking, part-of-speech identification, and tagging, including entity, date, money, gender, etc. tagging. It includes a compatibility mode to generate dependency output compatible with the Stanford parser,[10] and Penn Treebank[11]-compatible POS tagging.

Link Grammar has also been employed for information extraction from biomedical texts[12][13] and from news articles describing events,[14] as well as in experimental machine translation systems from English to German and Turkish.

The Link Grammar link dictionary is used to generate and verify the syntactic correctness of two different natural language generation systems: NLGen[15] and NLGen2.[16] It is also used as a part of the NLP pipeline in the OpenCog AI project.

Notes

  1. ^ Anssi Yli-Jyrä and Matti Nykänen (2004). "A Hierarchy of Mildly Context-Sensitive Dependency Grammars". In Gerhard Jäger, Paola Monachesi, Gerald Penn and Shuly Wintner (eds.), Proceedings of the 9th Conference on Formal Grammar 2004 "FGNancy". Pre-Proceedings. pp. 151–165. 
  2. ^ AbiWord — Link Grammar Parser
  3. ^ Lingua-LinkParser (Perl interfaces)
  4. ^ Ruby Link Parser interfaces
  5. ^ OpenCog Language Learning
  6. ^ Learning Language from a Large (Unannotated) Corpus
  7. ^ Debian - Package Search Results - link-grammar
  8. ^ AbiWord — Link Grammar Parser
  9. ^ RelEx Dependency Relationship Extractor
  10. ^ The Stanford Parser: A statistical parser
  11. ^ The Penn Treebank Project
  12. ^ Jing Ding, Daniel Berleant, Jun Xu, Andy W. Fulmer (November 2003). "Extracting biochemical interactions from MEDLINE using a link grammar parser". Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence. pp. 467–471. ISBN 0-7695-2038-3. 
  13. ^ Sampo Pyysalo, Tapio Salakoski, Sophie Aubin and Adeline Nazarenko, "Lexical Adaptation of Link Grammar to the Biomedical Sublanguage: a Comparative Evaluation of Three Approaches", BMC Bioinformatics 7(Suppl 3):S2 (2006).
  14. ^ Harsha V. Madhyastha, N. Balakrishnan, K. R. Ramakrishnan (2003). "Event Information Extraction Using Link Grammar". 13th International WorkShop on Research Issues in Data Engineering: Multi-lingual Information Management (RIDE'03). p. 16. doi:10.1109/RIDE.2003.1249841. 
  15. ^ Ruiting Lian et al., "Sentence generation for artificial brains: a glocal similarity matching approach", Neurocomputing (Elsevier) (2009, submitted for publication).
  16. ^ Blake Lemoine, NLGen2: A Linguistically Plausible, General Purpose Natural Language Generation System (2009)
