Mildly context-sensitive language

From Wikipedia, the free encyclopedia

In formal language theory, a class of languages is mildly context-sensitive if it contains all context-free languages, can describe cross-serial dependencies, contains only languages that can be parsed in polynomial time, and contains only languages of constant growth.[1] The concept was introduced by Aravind Joshi in 1985 as a characterization of the type of grammar formalism needed for dealing with natural languages. Mild context-sensitivity occupies a middle ground between context-freeness, which is too limited to describe all phenomena present in natural languages, and full context-sensitivity, which is too general to reveal anything about the class of natural languages in particular. A variety of formalisms are known to generate language classes that are mildly context-sensitive.
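A standard toy illustration of cross-serial dependencies is the language {a^n b^m c^n d^m | n, m ≥ 0}, which is not context-free: the a's are matched with the c's and the b's with the d's, so the two matching relations cross. The Python sketch below is only a direct membership test for that one language, written as an illustration; it is not a parser for any of the grammar formalisms discussed in this article, and the function name `is_cross_serial` is hypothetical.

```python
import re


def is_cross_serial(s: str) -> bool:
    """Return True iff s is in {a^n b^m c^n d^m | n, m >= 0}."""
    # Split the string into its four letter blocks, in order.
    match = re.fullmatch(r"(a*)(b*)(c*)(d*)", s)
    if match is None:
        return False
    a, b, c, d = (len(block) for block in match.groups())
    # The a's pair with the c's and the b's pair with the d's; these two
    # dependency arcs cross, which a context-free grammar cannot express.
    return a == c and b == d


print(is_cross_serial("aabccd"))  # True  (n = 2, m = 1)
print(is_cross_serial("aabcd"))   # False (two a's but one c)
```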

Definition

Mild context-sensitivity is defined in terms of sets of languages. A set of languages is mildly context-sensitive if and only if

  1. it contains all context-free languages,
  2. it admits limited cross-serial dependencies,
  3. all the languages are parsable in polynomial time, and
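  4. all the languages have constant growth: if the distinct lengths of the strings in a language are listed in increasing order, consecutive lengths differ by at most a constant that depends only on the language, so string lengths grow linearly rather than supralinearly. This is often guaranteed by proving a pumping lemma for the set of languages in question.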
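Constant growth is a property of an infinite language, so it cannot be verified from finitely many strings, but listing the gaps between consecutive string lengths up to a cutoff gives a feel for it. In the hypothetical helper `length_gaps` below, the lengths of {a^n b^n c^n | n ≥ 0} (0, 3, 6, and so on) keep a fixed gap of 3, whereas the lengths of {a^(2^n) | n ≥ 0} (1, 2, 4, 8, and so on) have gaps that keep doubling, the usual example of a language without constant growth.

```python
def length_gaps(lengths, limit):
    """Differences between consecutive distinct string lengths up to `limit`.

    This only inspects a finite sample; it illustrates constant growth
    but cannot prove it for an infinite language.
    """
    ordered = sorted({n for n in lengths if n <= limit})
    return [longer - shorter for shorter, longer in zip(ordered, ordered[1:])]


# Lengths of strings in {a^n b^n c^n | n >= 0}: 0, 3, 6, ...
abc_lengths = (3 * n for n in range(100))
# Lengths of strings in {a^(2^n) | n >= 0}: 1, 2, 4, 8, ...
doubling_lengths = (2 ** n for n in range(20))

print(max(length_gaps(abc_lengths, 200)))   # 3: the gaps stay bounded
print(length_gaps(doubling_lengths, 200))   # [1, 2, 4, 8, 16, 32, 64]
```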

Formalisms

The notion of mild context-sensitivity does not designate a single class of languages, but applies to any language class meeting the criteria in the definition. Two such classes are notable, each being generated by several equivalent formalisms. The smaller of the two classes is a proper subset of the larger class.[2]

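The smaller language class is generated by the following formalisms:

  • tree-adjoining grammar (TAG)
  • combinatory categorial grammar (CCG)
  • linear indexed grammar (LIG)
  • head grammar (HG)[3]

The larger language class is generated by the following formalisms:

  • linear context-free rewriting system (LCFRS)
  • set-local multicomponent tree-adjoining grammar (MCTAG)
  • multiple context-free grammar (MCFG)[4]
  • minimalist grammar (MG)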

The larger class is a subset of the class of languages generated by thread automata, but whether this inclusion is proper is not known.[5]

Control Language Hierarchy

A more precisely defined hierarchy of language classes corresponding to the mildly context-sensitive class was introduced by David J. Weir.[6] Building on the work of Nabil A. Khabbaz,[7][8] Weir's Control Language Hierarchy is a containment hierarchy of countably many language classes, in which Level-1 is the class of context-free languages and Level-2 is the class generated by tree-adjoining grammars and the three other grammar formalisms equivalent to them.

Following are some of the properties of Level-k languages in the hierarchy:

  • Level-k languages are properly contained in the Level-(k + 1) language class
  • Level-k languages can be parsed in O(n^{3\cdot2^{k-1}}) time
  • Level-k contains the language \{a_1^n \dotso a_{2^k}^n|n\geq0\}, but not \{a_1^n \dotso a_{2^k+1}^n|n\geq0\}
  • Level-k contains the language \{w^{2^{k-1}}|w\in\{a,b\}^*\}, but not \{w^{2^{k-1}+1}|w\in\{a,b\}^*\}
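The example languages in these properties can be made concrete with a small sketch. Here `counting_bound(k)` and `copying_bound(k)` (hypothetical names) simply evaluate 2^k and 2^{k-1}, the sizes that appear in the example languages, and the two membership tests recognize those example languages directly, using the letters a, b, c, and so on to stand in for a_1, a_2, a_3. The tests say nothing about how a Level-k grammar would generate these languages; they only make explicit which strings the languages contain.

```python
import re
from string import ascii_lowercase


def counting_bound(k: int) -> int:
    """m = 2**k: Level-k contains {a_1^n ... a_m^n | n >= 0}."""
    return 2 ** k


def copying_bound(k: int) -> int:
    """r = 2**(k-1): Level-k contains {w^r | w in {a,b}*}."""
    return 2 ** (k - 1)


def in_counting_language(s: str, m: int) -> bool:
    """Is s in {a_1^n ... a_m^n | n >= 0}? Letters 'a', 'b', ... stand for a_1, a_2, ..."""
    if not 1 <= m <= len(ascii_lowercase):
        raise ValueError("m must be between 1 and 26 in this sketch")
    # One capture group per letter block, in order: (a*)(b*)(c*)...
    pattern = "".join(f"({ch}*)" for ch in ascii_lowercase[:m])
    match = re.fullmatch(pattern, s)
    # Every block must have the same length n.
    return match is not None and len({len(g) for g in match.groups()}) == 1


def in_copy_language(s: str, r: int) -> bool:
    """Is s in {w^r | w in {a,b}*}, i.e. r consecutive copies of one word w?"""
    if r < 1:
        raise ValueError("r must be positive")
    if set(s) - {"a", "b"} or len(s) % r != 0:
        return False
    w = s[: len(s) // r]
    return w * r == s


# Level-2 (the tree-adjoining class): four counted blocks, two copies.
print(counting_bound(2), copying_bound(2))                  # 4 2
print(in_counting_language("aabbccdd", counting_bound(2)))  # True
print(in_copy_language("abbabb", copying_bound(2)))         # True
```

For example, at Level-2 the sketch gives m = 4 and r = 2, matching the statement that the tree-adjoining class contains {a^n b^n c^n d^n} and the copy language {ww}.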

These properties match Joshi's conditions for mild context-sensitivity well, at least for small k > 1. As k grows, however, the exponent of the parsing-time bound doubles with each level, so the language classes become, in a sense, less mildly context-sensitive.
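Evaluating the bounds from the property list for the first few levels shows how quickly they grow (Level-1 is the context-free class and Level-2 the tree-adjoining class). The table is only a direct arithmetic evaluation of the formulas O(n^{3\cdot2^{k-1}}), 2^k and 2^{k-1} stated in this section, not an additional result.

```latex
\begin{array}{c|c|c|c}
k & \text{parsing time } O(n^{3\cdot 2^{k-1}}) & m = 2^k \text{ in } \{a_1^n \dotso a_m^n\} & r = 2^{k-1} \text{ in } \{w^r\} \\
\hline
1 & O(n^{3})  & 2  & 1 \\
2 & O(n^{6})  & 4  & 2 \\
3 & O(n^{12}) & 8  & 4 \\
4 & O(n^{24}) & 16 & 8
\end{array}
```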

Notes

  1. ^ Kallmeyer 2010, p. 23.
  2. ^ Kallmeyer 2010, pp. 215–16.
  3. ^ Joshi et al. 1991.
  4. ^ Kasami, T.; Seki, H.; Fujii, M. (1988). "Generalized context-free grammars, multiple context-free grammars, and head grammars". Technical Report. Osaka, Japan: Department of Information and Computer Science, Osaka University.
  5. ^ Kallmeyer 2010, p. 216.
  6. ^ Weir, D. J. (1992). "A geometric hierarchy beyond context-free languages". Theoretical Computer Science 104 (2): 235–261. doi:10.1016/0304-3975(92)90124-X.
  7. ^ Nabil Anton Khabbaz (1972). Generalized context-free languages (Ph.D.). University of Iowa. 
  8. ^ Nabil Anton Khabbaz (1974). "A geometric hierarchy of languages". J. Comput. System Sci. 8: 142–157. 
