Mildly context-sensitive language

In formal grammar theory, mildly context-sensitive languages are a class of formal languages which can be efficiently parsed, but still possess enough context sensitivity to allow the parsing of natural languages. The concept was first introduced by Aravind Joshi in 1985.

Definition

Mild context-sensitivity is defined in terms of sets of languages. A set of languages is mildly context-sensitive if and only if

1. it contains all context-free languages,
2. it admits limited cross-serial dependencies,
3. all the languages are parsable in polynomial time, and
4. all the languages have constant growth; this means that the distribution of string lengths should be linear rather than supralinear. This is often guaranteed by proving a pumping lemma for some class of mildly context-sensitive languages.

Formalisms

Some attempts at creating mildly context-sensitive language formalisms include:

-

The first grouping of these grammar classes define one set of languages, while the second grouping defines another strictly smaller class.[3] The larger of the two classes may be parsed by thread automatons, while the other, smaller one may be parsed by embedded pushdown automatons.

Control Language Hierarchy

A more precisely defined hierarchy of languages that correspond to the mildly context-sensitive class was defined by David J. Weir. Based on the work of Nabil A. Khabbaz, Weir's Control Language Hierarchy is a containment hierarchy of countable set of language classes where the Level-1 is defined as context-free, and Level-2 is the class of tree-adjoining and the other three grammars.

Following are some of the properties of Level-k languages in the hierarchy:

• Level-k languages are properly contained in the Level-(k + 1) language class
• Level-k languages can be parsed in $O(n^{3\cdot2^{k-1}})$ time
• Level-k contains the language $\{a_1^n \dotso a_{2^k}^n|n\geq0\}$, but not $\{a_1^n \dotso a_{2^{k+1}}^n|n\geq0\}$
• Level-k contains the language $\{w^{2^{k-1}}|w\in\{a,b\}^*\}$, but not $\{w^{2^{k-1}+1}|w\in\{a,b\}^*\}$

Those properties correspond well (at least for small k > 1) to the conditions of mildly context-sensitive languages imposed by Joshi, and as k gets bigger, the language class becomes, in a sense, less mildly context-sensitive.