Mildly context-sensitive language
In formal language theory, a class of languages is mildly context-sensitive if it contains all context-free languages, can describe cross-serial dependencies, contains only polynomial languages, and if its languages are of constant growth.[1] The concept was introduced by Aravind Joshi in 1985 as a characterization of the type of grammar formalism needed for dealing with natural languages. Mild context-sensitivity occupies a middle ground between context-freeness, which is too limited to describe all phenomena present in natural languages, and full context sensitivity, which is too general to reveal anything about the class of natural languages in particular. A variety of formalisms are known to generate language classes which are mildly context-sensitive.
Definition
Mild context-sensitivity is defined in terms of sets of languages. A set of languages is mildly context-sensitive if and only if
- it contains all context-free languages,
- it admits limited cross-serial dependencies,[clarification needed]
- all the languages are parsable in polynomial time, and
- all the languages have constant growth; this means that the distribution of string lengths should be linear rather than supralinear. This is often guaranteed by proving a pumping lemma for the set of languages in question.
Formalisms
The notion of mild context-sensitivity does not designate a single class of languages, but applies to any language class meeting the criteria in the definition.[2] Two such classes are notable, each being generated by several equivalent formalisms. The smaller of the two classes is a proper subset of the larger class.[3]
The smaller language class is generated by the following formalisms:
- Head grammars (HG) of Carl Pollard
- Combinatory categorial grammars (CCG) developed by Mark Steedman
- Linear indexed grammars (LIG) defined by Gerald Gazdar
- Tree-adjoining grammars (TAG) developed by Aravind Joshi
- Embedded pushdown automata (EPDA) of Vijay-Shanker
The larger language class is generated by the following formalisms:
- Linear context-free rewriting systems (LCFRS) developed by D. J. Weir
- Minimalist grammars (MG) of Edward P. Stabler, Alain Lecomte, Christian Retoré, etc.
- Multicomponent tree-adjoining grammars (MCTAG) defined in [4]
- Multiple context free grammars (MCFG), developed in [5]
- Simple range concatenation grammars (SRCG), developed by Boullier, 2000
The larger class is a subset of the class of languages generated by thread automata, but whether this inclusion is proper is not known.[6]
Control Language Hierarchy
A more precisely defined hierarchy of languages that correspond to the mildly context-sensitive class was defined by David J. Weir.[7] Based on the work of Nabil A. Khabbaz,[8][9] Weir's Control Language Hierarchy is a containment hierarchy of countable set of language classes[clarify] where the Level-1 is defined as context-free, and Level-2 is the class of tree-adjoining and the other three grammars.
Following are some of the properties of Level-k languages in the hierarchy:
- Level-k languages are properly contained in the Level-(k + 1) language class
- Level-k languages can be parsed in time
- Level-k contains the language , but not
- Level-k contains the language , but not
Those properties correspond well (at least for small k > 1) to the conditions of mildly context-sensitive languages imposed by Joshi, and as k gets bigger, the language class becomes, in a sense, less mildly context-sensitive.
See also
Notes
- ^ Kallmeyer 2010, p. 23.
- ^ Kallmeyer 2010, p. 23.
- ^ Kallmeyer 2010, p. 215-6.
- ^ Joshi, et. al, 1991
- ^ T., Kasami; M. Seki; H. Fuji (1988). "Generalized context-free grammars, multiple context-free grammars, and head grammars". Technical Report, Department of Information and Computer Science. Osaka, Japan: Osaka University.
- ^ Kallmeyer 2010, p. 216.
- ^ Weir, D. J. (1992), "A geometric hierarchy beyond context-free languages" (PDF), Theoretical computer science, 104 (2): 235–261, doi:10.1016/0304-3975(92)90124-X.
- ^ Nabil Anton Khabbaz (1972). Generalized context-free languages (Ph.D.). University of Iowa.
- ^ Nabil Anton Khabbaz (1974). "A geometric hierarchy of languages" (PDF). J. Comput. System Sci. 8: 142–157. doi:10.1016/s0022-0000(74)80052-8.
Further reading
- Joshi, Aravind; Vijay-Shanker, K.; Weir, David (1991), "The Convergence of Mildly Context-Sensitive Grammar Formalisms", in Sells, Peter; Shieber, Stuart; Wasow, Thomas (eds.), Foundational Issues in Natural Language Processing, Cambridge, MA: MIT Press, pp. 31–81, ISBN 0-262-19303-5
- Kallmeyer, Laura (2010). Parsing Beyond Context-Free Grammars. Springer.
- Vijay-Shanker, K.; Weir, David (1994), "The Equivalence of Four Extensions of Context-Free Grammars", Mathematical Systems Theory, 27 (6): 511–546, doi:10.1007/BF01191624, ISSN 0025-5661
- Villemonte de La Clergerie, Eric (2002), "Parsing Mildly Context-Sensitive Languages with Thread Automata", Proceedings of the 19th international conference on Computational linguistics (PDF), vol. 1, pp. 1–7
External links
- Parsing Beyond Context-Free Grammar, by Laura Kallmeyer
- Seminar on tree-adjoining grammars and mildly context-sensitive languages and formalisms, by Laura Kallmeyer
- Mildly Context-Sensitive Grammars, by Aravind Joshi
The tree adjoining language class and the unnamed one immediately above it belong to the mildly context-sensitive language classes, see above.