Mildly context-sensitive language
In formal grammar theory, mildly context-sensitive languages are a class of formal languages which can be efficiently parsed, but still possess enough context sensitivity to allow the parsing of natural languages. The concept was first introduced by Aravind Joshi in 1985.
Contents |
[edit] Definition
Mild context-sensitivity is defined in terms of sets of languages. A set of languages is mildly context-sensitive if and only if
- it contains all context-free languages,
- it admits limited cross-serial dependencies,
- all the languages are parsable in polynomial time, and
- all the languages have constant growth; this means that the distribution of string lengths should be linear rather than supralinear. This is often guaranteed by proving a pumping lemma for some class of mildly context-sensitive languages.
[edit] Formalisms
Some attempts at creating mildly context-sensitive language formalisms include:
- Linear context-free rewriting systems developed by D. J. Weir
- Minimalist grammars of Edward P. Stabler, Alain Lecomte, Christian Retoré, etc.
- Multicomponent tree-adjoining grammars (defined in [1]).
- Multiple context free grammars, developed in.[2]
- Simple range concatenation grammars, developed by Boullier, 2000.
-
- Head grammars of Carl Pollard
- Combinatory categorial grammars developed by Mark Steedman
- Linear indexed grammars defined by Gerald Gazdar
- Tree-adjoining grammars developed by Aravind Joshi
The first grouping of these grammar classes define one set of languages, while the second grouping defines another strictly smaller class.[3] The larger of the two classes may be parsed by thread automatons, while the other, smaller one may be parsed by embedded pushdown automatons.
[edit] Control Language Hierarchy
A more precisely defined hierarchy of languages that correspond to the mildly context-sensitive class was defined by David J. Weir. Based on the work of Nabil A. Khabbaz, Weir's Control Language Hierarchy is a containment hierarchy of countable set of language classes where the Level-1 is defined as context-free, and Level-2 is the class of tree-adjoining and the other three grammars.
Following are some of the properties of Level-k languages in the hierarchy:
- Level-k languages are properly contained in the Level-(k + 1) language class
- Level-k languages can be parsed in
time - Level-k contains the language
, but not 
- Level-k contains the language
, but not 
Those properties correspond well (at least for small k > 1) to the conditions of mildly context-sensitive languages imposed by Joshi, and as k gets bigger, the language class becomes, in a sense, less mildly context-sensitive.
[edit] See also
[edit] Notes
- ^ Joshi, et. al, 1991
- ^ T., Kasami; M. Seki, and H. Fuji (1988). "Generalized context-free grammars, multiple context-free grammars, and head grammars". Technical Report, Department of Information and Computer Science (Osaka, Japan: Osaka University).
- ^ see, e.g., Joshi 1991
[edit] Further reading
- Joshi, Aravind; Vijay-Shanker, K.; Weir, David (1991), "The Convergence of Mildly Context-Sensitive Grammar Formalisms", in Sells, Peter; Shieber, Stuart; Wasow, Thomas, Foundational Issues in Natural Language Processing, Cambridge, MA: MIT Press, pp. 31–81, ISBN 0-262-19303-5
- Vijay-Shanker, K.; Weir, David (1994), "The Equivalence of Four Extensions of Context-Free Grammars", Mathematical Systems Theory 27 (6): 511–546, doi:10.1007/BF01191624, ISSN 0025-5661
- Villemonte de La Clergerie, Eric (2002), "Parsing Mildly Context-Sensitive Languages with Thread Automata", Proceedings of the 19th international conference on Computational linguistics 1, pp. 1–7
- Weir, D. J. (1992), "A geometric hierarchy beyond context-free languages", Theoretical computer science 104 (2): 235–261, doi:10.1016/0304-3975(92)90124-X.
[edit] External links
- Parsing Beyond Context-Free Grammar, by Laura Kallmeyer
- Seminar on tree-adjoining grammars and mildly context-sensitive languages and formalisms, by Laura Kallmeyer
- Mildly Context-Sensitive Grammars, by Aravind Joshi
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
time
, but not 
, but not 