Talk:Indexed grammar



I'd just like to note that the grammar as written cannot be correct:

  1. S -> aAfc
  2. aAfc -> a(aAgc)fc
  3. a(aAgc)fc -> a(a(aAgc)gc)fc
  4. a(a(aAgc)gc)fc -> a(a(abBc)gc)fc

And now the derivation halts, because the only nonterminal B has no production - it's separated from its index by the intervening terminal c. --Peter Farago (talk) 06:33, 14 March 2009 (UTC)

The description of indexed grammars is actually incomplete; this problem doesn't actually arise because the indexes aren't rewritten in that fashion. I'm working on fixing it now. --Augur (talk) 17:22, 17 February 2010 (UTC)

Questions on the example

I may not completely understand this formalism, but it appears to me that in the section "Example", the third rule should be:

instead of:

and the rightmost part of the proposed derivation also seems incorrect, it should be:

instead of:

—Preceding unsigned comment added by (talk) 10:18, 6 July 2010 (UTC)
I think anonymous is correct -- the example as currently stated is incorrect.-- (talk) 00:50, 25 December 2010 (UTC)
I agree. Fixed. Clément Pillias (talk) 23:21, 27 November 2011 (UTC)

Source does not correspond to citation

In section Linear indexed grammars, Gazdar's work "Applicability of Indexed Grammars to Natural Languages" is cited as proving "Membership in a linear indexed grammar can be decided in polynomial time." I quickly read this paper (accessible through Google Books [1]) and I have the impression this is not true. Mgalle (talk) 12:40, 14 September 2010 (UTC)


I think a formal definition of IGs is needed, right after the introductory paragraph -- definitely before the first example. Could someone who is actually familiar with IGs provide one? UKoch (talk) 16:24, 26 December 2010 (UTC)

Deciding if a string is in an indexed grammar is NP-complete?

The reference for the sentence in the introduction to this article, "The problem of determining whether a string is recognized by an indexed grammar is NP-complete.", is Hopcroft and Ullman, Introduction to Automata Theory, Languages, and Computation, 1979. However, I am looking at the (very brief) section on indexed languages in that textbook, and no such statement is made. Therefore I am removing this statement from the article. J. Finkelstein (talk) 21:46, 19 January 2012 (UTC)

Needs cleaning-up

This article has become a jumble of sections ... it should be more clearly structured. Definition, examples, then linear indexed grammars, equivalences etc.

One thing was missing, though: the formal definition (from Aho's paper) has now been provided. Please double-check! [ɯ:] (talk) 16:29, 27 December 2012 (UTC)

Apparently, the definition uses different terminology than the examples; it should be adapted. In the examples, it should be made clear which productions are "index productions" and which are ordinary ones. I cannot see how the 3rd rule in the first example, viz. "T[σf]→T[σ]a", fits in either scheme (index or ordinary production), as both allow only a simple nonterminal on the left-hand side, but not a stack "[σ]" of index symbols, let alone "[σf]". Maybe somebody could explain that. - Jochen Burghardt (talk) 20:18, 4 November 2013 (UTC)

Settings of Hopcroft+Ullman 1979 added for comparison purposes

The definition of Aho 1968 in the article differs (slightly?) from that in Hopcroft+Ullman 1979. I put the latter here (see below), in case an indexed-grammar expert might wish to include it in the article or use it in some other way. The a^n b^n c^n example in the article refers to Hopcroft+Ullman 1979, but the grammar's first rule isn't admitted by the definition below. Therefore, I also append the example from Hopcroft+Ullman 1979.

Definition by Hopcroft+Ullman 1979

Formally, an indexed grammar[1] is a 5-tuple G = ⟨N,T,F,P,S⟩ where

  • N is a set of variables or nonterminal symbols,
  • T is a set ("alphabet") of terminal symbols,
  • F is a set of so-called indices,
  • S ∈ N is the start symbol, and
  • P is a finite set of productions of the form
    1. A → α,
    2. A → Bf, or
    3. Af → α,

where A, B ∈ N are nonterminal symbols, f ∈ F is an index, and α ∈ (N ∪ T)* is a string of nonterminal and terminal symbols.

Derivations are similar to those in a context-free grammar except that a nonterminal symbol may be followed by a string ("stack") of indices. Terminal symbols may not be followed by indices. When a production like e.g. A → BC is applied, the index stack of A is attached to both B and C.

Formally, the relation ⇒ ("direct derivation") is defined on the set (NF* ∪ T)* of "sentential forms" as follows:

  1. If A → X1 ... Xn is a production of type 1, then β Aφ γ ⇒ β X1φ1 ... Xnφn γ, where φi is φ if Xi ∈ N is a nonterminal, and is the empty string ε if Xi ∈ T is a terminal symbol. That is, a copy of the rule's left hand side's index stack φ is attached to each nonterminal of the right hand side.
  2. If A → Bf is a production of type 2, then β Aφ γ ⇒ β Bfφ γ. That is, the right hand side's index stack is obtained from the left hand side's stack by pushing f onto it.
  3. If Af → X1 ... Xn is a production of type 3, then β Afφ γ ⇒ β X1φ1 ... Xnφn γ, where again φi is φ if Xi ∈ N is a nonterminal, and ε if Xi ∈ T is a terminal symbol. That is, the first index f is popped from the left hand side's stack, which is then distributed to each nonterminal of the right hand side.

As usual, the derivation relation ⇒* is defined as the reflexive transitive closure of direct derivation ⇒. The language L(G) = { w ∈ T* : S ⇒* w } is the set of all strings of terminal symbols derivable from the start symbol.
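The three clauses above are mechanical enough that they can be sketched in a few lines of Python (my own illustration for this talk page, not taken from Hopcroft+Ullman; I assume single uppercase letters for nonterminals and lowercase letters for terminals):

```python
# Sketch of the three derivation steps of an indexed grammar.
# A sentential form is a list: a terminal is a plain lowercase string,
# a nonterminal is a pair (A, stack), stack being a tuple of indices
# with the topmost index first.

def apply_type1(form, i, rhs):
    """A -> X1...Xn: copy A's index stack to every nonterminal in rhs."""
    _, stack = form[i]
    new = [(x, stack) if x.isupper() else x for x in rhs]
    return form[:i] + new + form[i + 1:]

def apply_type2(form, i, b, f):
    """A -> Bf: push the index f onto A's stack."""
    _, stack = form[i]
    return form[:i] + [(b, (f,) + stack)] + form[i + 1:]

def apply_type3(form, i, f, rhs):
    """Af -> X1...Xn: pop f off A's stack, distribute the rest."""
    _, stack = form[i]
    assert stack and stack[0] == f, "rule only applies if topmost index is f"
    new = [(x, stack[1:]) if x.isupper() else x for x in rhs]
    return form[:i] + new + form[i + 1:]
```

For instance, starting from [('S', ())], applying a type-2 rule S → Tg via apply_type2 yields [('T', ('g',))], i.e. T with g pushed onto its stack.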

Example by Hopcroft+Ullman 1979

The grammar G = ⟨ {S,T,A,B,C}, {a,b,c}, {f,g}, P, S ⟩ produces the language { a^n b^n c^n : n ≥ 1 }, where the production set P consists of

S → Tg        Af → aA        Ag → a
T → Tf        Bf → bB        Bg → b
T → ABC       Cf → cC        Cg → c

That language is known not to be context-free. An example derivation is S ⇒ Tg ⇒ Tfg ⇒ Afg Bfg Cfg ⇒ aAg Bfg Cfg ⇒ aAg bBg Cfg ⇒ aAg bBg cCg ⇒ aa bBg cCg ⇒ aa bb cCg ⇒ aa bb cc.
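To double-check the grammar, the derivation pattern above can be replayed mechanically for any n. Here is a small Python sketch (my own, purely for verification); it exploits that after S → Tg and n−1 applications of T → Tf, the rule T → ABC hands the same stack f...fg to all three branches, and each branch then emits one terminal per popped index (Xf → xX, then Xg → x):

```python
# Replay the a^n b^n c^n derivation of the grammar above:
# S -> Tg, then T -> Tf (n-1 times), then T -> ABC copies the
# index stack to A, B and C; each branch unwinds its copy.

def expand(stack, letter):
    """Unwind one branch: emit one terminal per index, stop after g."""
    out = ""
    for idx in stack:
        out += letter          # Xf -> xX and Xg -> x each emit one terminal
        if idx == "g":
            break              # Xg -> x leaves no nonterminal behind
    return out

def derive(n):
    """Terminal string derived for a given n >= 1."""
    stack = "f" * (n - 1) + "g"
    return "".join(expand(stack, x) for x in "abc")
```

derive(2) reproduces the result of the example derivation, aabbcc.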

  1. ^ John E. Hopcroft and Jeffrey D. Ullman (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley. ISBN 0-201-02988-X; here: Sect. 14.3, pp. 389-390. This section is omitted in the 2nd edition (2003).
Jochen Burghardt (talk) 14:40, 5 January 2014 (UTC)

Linear Indexed Grammars

My impression is that the definition of linear indexed grammars predates the work of Gazdar, although he may first have made the connection with mildly context-sensitive languages.

There is an article "Linear Indexed Languages" by Duske and Parchmann, Theoretical Computer Science 32 (1984), 47-60. They prove that the class of languages L_1 introduced by N.A. Khabbaz in "A Geometric Hierarchy of Languages", J. Comput. Syst. Sci. 8 (1974), 142-157, coincides precisely with what they call linear indexed languages: these are obtained by "controlling" a linear context-free grammar with a context-free language. Here a context-free grammar is linear if the right side of every production contains at most one variable.

What baffles me in the current Wikipedia article is the formulation "by requiring that AT MOST ONE nonterminal in each production be specified as receiving the stack, whereas in a normal indexed grammar, ALL nonterminals receive copies of the stack" (emphasis mine) by which linear indexed grammars are distinguished from ordinary indexed grammars. I would have expected that there is at most one variable present, which therefore has to receive the stack. But perhaps Gazdar defined a different notion of linear indexed grammar, where one can choose among several variables where to put the stack? Unfortunately, I presently don't have access to Gazdar's paper. — Preceding unsigned comment added by (talk • contribs) 4 January 2014‎

I was lucky to look at p.71 of Gazdar's 1988 paper at Google books, where he says:
"nonterminal symbols are indicated by upper case letters (A, B, C), (...) possibly empty strings of terminals and nonterminals by W, W1, W2, etc., indices by lower case italics letters (i, j, k), and stacks of indices by square brackets and periods ([], [..], [i,..]), where [i,..] is a stack whose topmost index is i, [] is an empty stack, and [..] is a possibly empty stack of indices. (...) In the standard formulation, an indexed grammar can contain rules of three different sorts:
  1. A[..] → W[..]
  2. A[..] → B[i,..]
  3. A[i,..] → W[..]
I shall refer to rules that have one or other of these three forms as H&U rules. The first type of rule simply copies the stack to all nonterminal daughters. The second type of rule pushes a new index onto the stack handed down to its unique nonterminal daughter. And the third type of rule pops an index off the stack and distributes what is left to its nonterminal daughters. (...) A compound symbol of the form A[..] means that the nonterminal A bears the stack [..]. A compound symbol of the form W[..] stands for a string of terminal and/or nonterminal symbols each nonterminal symbol of which bears the stack [..]. Terminal symbols cannot bear stacks."
"(...)" denotes an omission I made, while "[..]" is quoted from the paper. "H&U" refers to Hopcroft and Ullman (1979), in contrast to Aho (1968). Gazdar doesn't mention linear indexed grammars in his introduction, which I could read completely. Does that help? - Jochen Burghardt (talk) 20:38, 4 January 2014 (UTC)
The LIG of Gazdar and Vijay-Shanker is not the same notion as the same-named one from Duske and Parchmann. LIGs in the former sense are much more simply formed. Quoting from Kallmeyer's book: "An indexed grammar is called a linear indexed grammar (LIG) (Gazdar, 1988; Vijay-Shanker, 1987) if in a production A → α or Af → α, the stack of A is copied only to one non-terminal in α." That's all there is to Gazdar/Vijay-Shanker's LIG. JMP EAX (talk) 12:27, 17 August 2014 (UTC)

Fairly WP:RANDYesque citation request

The definition of mildly context-sensitive language contains the requirement that it must be parseable in PTIME. JMP EAX (talk) 12:08, 17 August 2014 (UTC)

As far as I remember, Gazdar 1988 didn't speak about mild context-sensitivity either (on my accessible pages), so both statements would need a citation. On second thought, maybe they both can be established from weak equivalence to TAGs, but that didn't come to my mind in Feb. 2014.
BTW: As far as I understood, "WP:RANDYesque" means something like "being a non-expert and proud of it"; if that is not too wrong, I would appreciate it if you stopped attributing that to me; I try to be neither a non-expert nor proud. - Jochen Burghardt (talk) 13:02, 17 August 2014 (UTC)

External links modified

Hello fellow Wikipedians,

I have just added archive links to one external link on Indexed grammar. Please take a moment to review my edit. If necessary, add {{cbignore}} after the link to keep me from modifying it. Alternatively, you can add {{nobots|deny=InternetArchiveBot}} to keep me off the page altogether. I made the following changes:

When you have finished reviewing my changes, please set the checked parameter below to true to let others know.

You may set the |checked=, on this template, to true or failed to let other editors know you reviewed the change. If you find any errors, please use the tools below to fix them or call an editor by setting |needhelp= to your help request.

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

If you are unable to use these tools, you may set |needhelp=<your help request> on this template to request help from an experienced user. Please include details about your problem, to help other editors.

Cheers.—cyberbot II Talk to my owner: Online 05:38, 23 February 2016 (UTC)