Mildly context-sensitive language

In formal language theory, a class of languages is mildly context-sensitive if it contains all context-free languages, can describe cross-serial dependencies, contains only polynomial languages, and if its languages are of constant growth.^[1] The concept was introduced by Aravind Joshi in 1985 as a characterization of the type of grammar formalism needed for dealing with natural languages. Mild context-sensitivity occupies a middle ground between context-freeness, which is too limited to describe all phenomena present in natural languages, and full context sensitivity, which is too general to reveal anything about the class of natural languages in particular. A variety of formalisms are known to generate language classes which are mildly context-sensitive.

Definition

Mild context-sensitivity is defined in terms of sets of languages. A set of languages is mildly context-sensitive if and only if

it contains all context-free languages,
it admits limited cross-serial dependencies,^{[clarification needed]}
all the languages are parsable in polynomial time, and
all the languages have constant growth; this means that the distribution of string lengths should be linear rather than supralinear. This is often guaranteed by proving a pumping lemma for the set of languages in question.

Formalisms

The notion of mild context-sensitivity does not designate a single class of languages, but applies to any language class meeting the criteria in the definition.^[2] Two such classes are notable, each being generated by several equivalent formalisms. The smaller of the two classes is a proper subset of the larger class.^[3]

The smaller language class is generated by the following formalisms:

Head grammars (HG) of Carl Pollard
Combinatory categorial grammars (CCG) developed by Mark Steedman
Linear indexed grammars (LIG) defined by Gerald Gazdar
Tree-adjoining grammars (TAG) developed by Aravind Joshi
Embedded pushdown automata (EPDA) of Vijay-Shanker

The larger language class is generated by the following formalisms:

Linear context-free rewriting systems (LCFRS) developed by D. J. Weir
Minimalist grammars (MG) of Edward P. Stabler, Alain Lecomte, Christian Retoré, etc.
Multicomponent tree-adjoining grammars (MCTAG) defined in ^[4]
Multiple context free grammars (MCFG), developed in ^[5]
Simple range concatenation grammars (SRCG), developed by Boullier, 2000

The larger class is a subset of the class of languages generated by thread automata, but whether this inclusion is proper is not known.^[6]

Control Language Hierarchy

A more precisely defined hierarchy of languages that correspond to the mildly context-sensitive class was defined by David J. Weir.^[7] Based on the work of Nabil A. Khabbaz,^[8]^[9] Weir's Control Language Hierarchy is a containment hierarchy of countable set of language classes^[clarify] where the Level-1 is defined as context-free, and Level-2 is the class of tree-adjoining and the other three grammars.

Following are some of the properties of Level-k languages in the hierarchy:

Level-k languages are properly contained in the Level-(k + 1) language class
Level-k languages can be parsed in $O(n^{3\cdot 2^{k-1}})$ time
Level-k contains the language $\{a_{1}^{n}\dotso a_{2^{k}}^{n}|n\geq 0\}$ , but not $\{a_{1}^{n}\dotso a_{2^{k+1}}^{n}|n\geq 0\}$
Level-k contains the language $\{w^{2^{k-1}}|w\in \{a,b\}^{*}\}$ , but not $\{w^{2^{k-1}+1}|w\in \{a,b\}^{*}\}$

Those properties correspond well (at least for small k > 1) to the conditions of mildly context-sensitive languages imposed by Joshi, and as k gets bigger, the language class becomes, in a sense, less mildly context-sensitive.

Notes

^ Kallmeyer 2010, p. 23.
^ Kallmeyer 2010, p. 23.
^ Kallmeyer 2010, p. 215-6.
^ Joshi, et. al, 1991
^ T., Kasami; M. Seki; H. Fuji (1988). "Generalized context-free grammars, multiple context-free grammars, and head grammars". Technical Report, Department of Information and Computer Science. Osaka, Japan: Osaka University.
^ Kallmeyer 2010, p. 216.
^ Weir, D. J. (1992), "A geometric hierarchy beyond context-free languages" (PDF), Theoretical computer science, 104 (2): 235–261, doi:10.1016/0304-3975(92)90124-X.
^ Nabil Anton Khabbaz (1972). Generalized context-free languages (Ph.D.). University of Iowa.
^ Nabil Anton Khabbaz (1974). "A geometric hierarchy of languages" (PDF). J. Comput. System Sci. 8: 142–157. doi:10.1016/s0022-0000(74)80052-8.

External links

The tree adjoining language class and the unnamed one immediately above it belong to the mildly context-sensitive language classes, see above.

[1] Kallmeyer 2010, p. 23.

[2] Kallmeyer 2010, p. 23.

[3] Kallmeyer 2010, p. 215-6.

[4] Joshi, et. al, 1991

[5] T., Kasami; M. Seki; H. Fuji (1988). "Generalized context-free grammars, multiple context-free grammars, and head grammars". Technical Report, Department of Information and Computer Science. Osaka, Japan: Osaka University.

[6] Kallmeyer 2010, p. 216.

[7] Weir, D. J. (1992), "A geometric hierarchy beyond context-free languages" (PDF), Theoretical computer science, 104 (2): 235–261, doi:10.1016/0304-3975(92)90124-X.

[8] Nabil Anton Khabbaz (1972). Generalized context-free languages (Ph.D.). University of Iowa.

[9] Nabil Anton Khabbaz (1974). "A geometric hierarchy of languages" (PDF). J. Comput. System Sci. 8: 142–157. doi:10.1016/s0022-0000(74)80052-8.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]