Mildly context-sensitive grammar formalisms

In computational linguistics, the term mildly context-sensitive grammar formalisms refers to several grammar formalisms that have been developed with the ambition to provide adequate descriptions of the syntactic structure of natural language. Every such formalism defines a class of mildly context-sensitive grammars (the grammars that can be specified in the formalism), and therefore also a class of mildly context-sensitive languages (the formal languages generated by the grammars).

Background

By 1985, several researchers in descriptive and mathematical linguistics had provided evidence against the hypothesis that the syntactic structure of natural language can be adequately described by context-free grammars.[1][2] At the same time, the step to the next level of the Chomsky hierarchy, to context-sensitive grammars, appeared both unnecessary and undesirable. In an attempt to pinpoint the exact formal power required for the adequate description of natural language syntax, Aravind Joshi characterized ‘grammars (and associated languages) that are only slightly more powerful than context-free grammars (context-free languages)’.[3] He called these grammars mildly context-sensitive grammars and the associated languages mildly context-sensitive languages.

Joshi’s characterization of mildly context-sensitive grammars was biased toward his work on tree-adjoining grammar (TAG). However, together with his students K. Vijay-Shanker and David Weir, Joshi soon discovered that TAGs are equivalent, in terms of the generated languages, to the independently introduced head grammar (HG).[4] This was followed by two similar equivalence results, for linear indexed grammar (LIG)[5] and combinatory categorial grammar (CCG)[6], which showed that the notion of mild context-sensitivity is a very general one and not tied to a specific formalism.

The TAG-equivalent formalisms were generalized by the introduction of linear context-free rewriting systems (LCFRS).[7][8] These grammars define an infinite hierarchy of languages in between the context-free and the context-sensitive languages, with the languages generated by the TAG-equivalent formalisms at the lower end of the hierarchy. Because of this, LCFRS is often regarded as the most general formalism for specifying mildly context-sensitive grammars. However, several authors have noted that some of the characteristic properties of the TAG-equivalent formalisms are not preserved by LCFRS,[9] and that, conversely, there are languages that have the characteristic properties of mild context-sensitivity but are not generated by LCFRS.[10]

Characterization

There is no generally accepted formal definition of mild context-sensitivity; Joshi[3] only provides a ‘rough characterization’. According to this, a class of mildly context-sensitive grammars has the following properties:

  1. limited cross-serial dependencies
  2. constant growth
  3. polynomial parsing

In addition to these, it is understood that every class of mildly context-sensitive grammars should be able to generate all context-free languages.

Cross-serial dependencies

The term cross-serial dependencies refers to certain characteristic word-ordering patterns, in particular the verb–argument patterns observed in subordinate clauses in Dutch[1] and Swiss German[2]. These are the very patterns that can be used to argue against the context-freeness of natural language; the requirement that mildly context-sensitive grammars model cross-serial dependencies therefore means that these grammars must be strictly more powerful than context-free grammars.

Some authors identify the ability to model cross-serial dependencies with the ability to generate the copy language

  { ww | w ∈ {a, b}* },

the set of all strings that consist of two identical halves. This language is not context-free, which can be shown using the pumping lemma.
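
As an illustration (not part of Joshi’s characterization), membership in the copy language is trivial to decide procedurally; the point of the example is that no context-free grammar generates the language, not that it is hard to recognize. A minimal Python sketch, with illustrative names:

    def in_copy_language(s, alphabet={'a', 'b'}):
        # s is in { ww | w in {a,b}* } iff it has even length, uses only
        # symbols from the alphabet, and its two halves coincide.
        if len(s) % 2 != 0 or any(c not in alphabet for c in s):
            return False
        half = len(s) // 2
        return s[:half] == s[half:]

    assert in_copy_language('abab')      # w = 'ab'
    assert not in_copy_language('abba')  # a palindrome, not a copy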

Constant growth

A formal language is of constant growth if every string in the language is longer than the next shorter string in the language by at most a (language-specific) constant. For example, a language containing exactly the strings whose length is a power of two violates this property, since the gap between consecutive string lengths keeps doubling. Languages that violate constant growth are often considered to be beyond human capacity, although some authors have argued that certain phenomena in natural language do show a growth that cannot be bounded by a constant.[11]
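
A minimal Python sketch of this definition, checking string-length gaps over a finite sample (a finite sample can refute constant growth but never establish it; the constant c and all names are illustrative):

    def violates_constant_growth(lengths, c):
        # lengths: lengths of strings observed in the language.
        # True if two consecutive observed lengths differ by more than c.
        ordered = sorted(set(lengths))
        return any(b - a > c for a, b in zip(ordered, ordered[1:]))

    copy_lengths = [2 * n for n in range(10)]    # copy language: 0, 2, 4, ...
    power_lengths = [2 ** n for n in range(10)]  # powers of two: 1, 2, 4, 8, ...
    assert not violates_constant_growth(copy_lengths, c=2)
    assert violates_constant_growth(power_lengths, c=2)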

Most mildly context-sensitive grammar formalisms (in particular, LCFRS) actually satisfy a stronger property than constant growth called semilinearity.[7] A language is semilinear if its image under the Parikh mapping (the mapping that ‘forgets’ the relative positions of the symbols in a string, effectively treating it as a bag of words) is a semilinear set of vectors – equivalently, if it has the same Parikh image as some regular language. All semilinear languages are of constant growth, but not every language with constant growth is semilinear.[10]
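
The Parikh mapping itself is straightforward to compute; a minimal Python sketch using only the standard library:

    from collections import Counter

    def parikh_image(s, alphabet='ab'):
        # Map a string to its vector of symbol counts, discarding order.
        counts = Counter(s)
        return tuple(counts[c] for c in alphabet)

    # 'abab' and 'aabb' differ in symbol order but not in symbol counts,
    # so they have the same Parikh image:
    assert parikh_image('abab') == parikh_image('aabb') == (2, 2)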

Polynomial parsing

A grammar formalism is said to have polynomial parsing if its membership problem can be solved in deterministic polynomial time. This is the problem of deciding, given a grammar G written in the formalism and a string w, whether w is generated by G – that is, whether w is ‘grammatical’ according to G. The time complexity of this problem is measured in terms of the combined size of G and w. Note that for practical applications one is interested not only in the yes/no answer to the question whether a sentence is grammatical, but also in the syntactic structure that the grammar assigns to the sentence.
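
At the context-free lower end of the hierarchy, polynomial membership is classically witnessed by the CYK algorithm, which decides membership for a grammar in Chomsky normal form in time cubic in the length of the string; TAG and its equivalents admit analogous polynomial-time (O(n⁶)) algorithms. A minimal Python sketch of CYK, with a hypothetical toy grammar for { aⁿbⁿ | n ≥ 1 }:

    def cyk(word, unary, binary, start='S'):
        # unary:  set of rules (A, a)    meaning  A -> a
        # binary: set of rules (A, B, C) meaning  A -> B C  (Chomsky normal form)
        n = len(word)
        if n == 0:
            return False  # the empty string needs a special case, omitted here
        # table[i][j] holds the nonterminals deriving word[i : i+j+1]
        table = [[set() for _ in range(n)] for _ in range(n)]
        for i, a in enumerate(word):
            table[i][0] = {A for (A, b) in unary if b == a}
        for span in range(2, n + 1):           # substring length
            for i in range(n - span + 1):      # start position
                for split in range(1, span):   # split point
                    left = table[i][split - 1]
                    right = table[i + split][span - split - 1]
                    for (A, B, C) in binary:
                        if B in left and C in right:
                            table[i][span - 1].add(A)
        return start in table[0][n - 1]

    # Toy CNF grammar: S -> A B | A T,  T -> S B,  A -> a,  B -> b
    unary = {('A', 'a'), ('B', 'b')}
    binary = {('S', 'A', 'B'), ('S', 'A', 'T'), ('T', 'S', 'B')}
    assert cyk('aabb', unary, binary)
    assert not cyk('abab', unary, binary)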

Formalisms

Over the years, a large number of grammar formalisms have been introduced that satisfy some or all of the characteristic properties put forth by Joshi.

TAG-equivalent formalisms

Formalisms that are weakly equivalent to tree-adjoining grammar[3] include head grammar,[12][13] linear indexed grammar,[14] and combinatory categorial grammar.[6]

LCFRS-equivalent formalisms

Formalisms that are weakly equivalent to LCFRS include multiple context-free grammar (MCFG)[15] and minimalist grammar.[16] Simple (non-erasing) range concatenation grammar is also weakly equivalent to LCFRS, while the unrestricted formalism is strictly more powerful.[17]

References

  1. Riny Huybregts. The Weak Inadequacy of Context-Free Phrase Structure Grammars. In Ger de Haan, Mieke Trommelen, and Wim Zonneveld, editors, Van periferie naar kern, pages 81–99. Foris, Dordrecht, The Netherlands, 1984.
  2. Stuart M. Shieber. Evidence Against the Context-Freeness of Natural Language. Linguistics and Philosophy, 8(3):333–343, 1985.
  3. Aravind K. Joshi. Tree Adjoining Grammars: How Much Context-Sensitivity Is Required to Provide Reasonable Structural Descriptions? In David R. Dowty, Lauri Karttunen, and Arnold M. Zwicky, editors, Natural Language Parsing, pages 206–250. Cambridge University Press, 1985.
  4. David J. Weir, K. Vijay-Shanker, and Aravind K. Joshi. The Relationship Between Tree Adjoining Grammars and Head Grammars. In Proceedings of the 24th Annual Meeting of the Association for Computational Linguistics (ACL), pages 67–74, New York, USA, 1986.
  5. K. Vijay-Shanker. A Study of Tree Adjoining Grammars. Ph.D. thesis, University of Pennsylvania, Philadelphia, USA, 1987.
  6. David J. Weir and Aravind K. Joshi. Combinatory Categorial Grammars: Generative Power and Relationship to Linear Context-Free Rewriting Systems. In Proceedings of the 26th Annual Meeting of the Association for Computational Linguistics (ACL), pages 278–285, Buffalo, USA, 1988.
  7. K. Vijay-Shanker, David J. Weir, and Aravind K. Joshi. Characterizing Structural Descriptions Produced by Various Grammatical Formalisms. In Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics (ACL), pages 104–111, Stanford, CA, USA, 1987.
  8. David J. Weir. Characterizing Mildly Context-Sensitive Grammar Formalisms. Ph.D. thesis, University of Pennsylvania, Philadelphia, USA, 1988.
  9. Makoto Kanazawa. The Pumping Lemma for Well-Nested Multiple Context-Free Languages. In Developments in Language Theory. 13th International Conference, DLT 2009, Stuttgart, Germany, June 30–July 3, 2009. Proceedings, volume 5583 of Lecture Notes in Computer Science, pages 312–325, 2009.
  10. Laura Kallmeyer. On Mildly Context-Sensitive Non-Linear Rewriting. Research on Language and Computation, 8(4):341–363, 2010.
  11. Jens Michaelis and Marcus Kracht. Semilinearity as a Syntactic Invariant. In Logical Aspects of Computational Linguistics. First International Conference, LACL 1996, Nancy, France, September 23–25, 1996. Selected Papers, volume 1328 of Lecture Notes in Computer Science, pages 329–345. Springer, 1997.
  12. Carl J. Pollard. Generalized Phrase Structure Grammars, Head Grammars, and Natural Language. Ph.D. thesis, Stanford University, 1984.
  13. Kelly Roach. Formal Properties of Head Grammars. In Alexis Manaster-Ramer, editor, Mathematics of Language, pages 293–347. John Benjamins, 1987.
  14. Gerald Gazdar. Applicability of Indexed Grammars to Natural Language. In Uwe Reyle and Christian Rohrer, editors, Natural Language Parsing and Linguistic Theories, pages 69–94. D. Reidel, 1987.
  15. Hiroyuki Seki, Takashi Matsumura, Mamoru Fujii, and Tadao Kasami. On Multiple Context-Free Grammars. Theoretical Computer Science, 88(2):191–229, 1991.
  16. Jens Michaelis. Derivational Minimalism Is Mildly Context-Sensitive. In Logical Aspects of Computational Linguistics, Third International Conference, LACL 1998, Grenoble, France, December 14–16, 1998, Selected Papers, volume 2014 of Lecture Notes in Computer Science, pages 179–198. Springer, 1998.
  17. Pierre Boullier. Range Concatenation Grammars. In Harry C. Bunt, John Carroll, and Giorgio Satta, editors, New Developments in Parsing Technology, volume 23 of Text, Speech and Language Technology, pages 269–289. Kluwer Academic Publishers, 2004.