Indexed grammars are a generalization of context-free grammars in that nonterminals are equipped with lists of flags, or index symbols. The language produced by an indexed grammar is called an indexed language.
Formally, an indexed grammar is a 5-tuple where
- is a finite alphabet of variables or nonterminal symbols
- is a finite alphabet of terminal symbols
- is the finite set of so-called flags (each flag is itself a set of so-called index productions)
- is the finite set of productions
- is the sentence symbol
Direct derivations are as follows:
- A production from matches a nonterminal followed by its (possibly empty) string of flags . In context, , via , derives to , where if was a nonterminal and the empty word otherwise. The old flags of are therefore copied to each new nonterminal produced by
- An index production matches (the flag it comes from must match the first symbol following the nonterminal) and copies the remaining index string to each new nonterminal: derives to , where is the empty word when is a nonterminal and otherwise.
A string with a non-terminal symbol with some index symbols on its stack, can be denoted as (using the [ and ] as metasymbols to denote the stack). In an indexed grammar, the application of a production rule such as would rewrite the string by replacing with , which copies the stack symbols from to each non-terminal in its replacement — and in this case. In the process, a symbol can be pushed to, or popped from, the stack before the stack is copied to the introduced non-terminals, which would be specified in the rule for the rewriting operation.
In practice, stacks of indices can count and remember what rules were applied and in which order. For example, indexed grammars can describe the context-sensitive language of word triples :
The derivation of is then
Linear indexed grammars
Gerald Gazdar has defined a second class, the linear indexed grammars, by requiring that at most one nonterminal in each production be specified as receiving the stack, whereas in a normal indexed grammar, all nonterminals receive copies of the stack. This new class of grammars defines a strictly smaller class of languages, which is mildly context-sensitive. Membership in a linear indexed grammar can be decided in polynomial time.
The language is generable by an indexed grammar, but not by a linear indexed grammar, while is generable by a linear indexed grammar.
Letting denote an arbitrary collection of stack symbols, we can define a grammar for the language  as
To derive the string we have the steps .
Similarly, for : .
The linearly indexed languages are a subset of the indexed languages, and thus all LIGs can be recoded as IGs, making the LIGs strictly less powerful than the IGs. A conversion from a LIG to an IG is relatively simple. LIG rules in general look approximately like , modulo the push/pop part of a rewrite rule. The symbols and represent strings of terminal and/or non-terminal symbols, and any non-terminal symbol in either must have an empty stack, by the definition of a LIG. This is, of course, counter to how IGs are defined: in an IG, the non-terminals whose stacks are not being pushed to or popped from must have exactly the same stack as the rewritten non-terminal. Thus, somehow, we need to have non-terminals in and which, despite having non-empty stacks, behave as if they had empty stacks.
Let's consider the rule as an example case. In converting this to an IG, the replacement for must be some that behaves exactly like regardless of what is. To achieve this, we can simply have a pair of rules that takes any where is not empty, and pops symbols from the stack. Then, when the stack is empty, it can be rewritten as .
We can apply this in general to derive an IG from an LIG. So for example if the LIG for the language is as follows:
The sentential rule here is not an IG rule, but using the above conversion algorithm, we can define new rules for , changing the grammar to:
Each rule now fits the definition of an IG, in which all the non-terminals in the right hand side of a rewrite rule receive a copy of the rewritten symbol's stack. The indexed grammars are therefore able to describe all the languages that linearly indexed grammars can describe.
Vijay-Shanker and Weir (1994) demonstrates that Linear Indexed Grammars, Combinatory Categorial Grammars, Tree-adjoining Grammars, and Head Grammars are weakly equivalent formalisms, in that they all define the same string languages.
Distributed Index (DI) grammars
Another form of indexed grammars, introduced by Staudacher (1993), is the class of Distributed Index grammars (DIGs). What distinguishes DIGs from Aho's Indexed Grammars is the propagation of indexes. Unlike Aho's IGs, which distribute the whole symbol stack to all non-terminals during a rewrite operation, DIGs divide the stack into substacks and distributes the substacks to selected non-terminals.
The general rule schema for a binarily distributing rule of DIG is the form
Where , , and are arbitrary terminal strings. For a ternarily distributing string:
And so forth for higher numbers of non-terminals in the right hand side of the rewrite rule. In general, if there are non-terminals in the right hand side of a rewrite rule, the stack is partitioned ways and distributed amongst the new non-terminals. Notice that there is a special case where a partition is empty, which effectively makes the rule a LIG rule. The Distributed Index languages are therefore a superset of the Linearly Indexed languages.
- Aho, Alfred (1968). "Indexed grammars—an extension of context-free grammars". Journal of the ACM 15 (4): 647–671. doi:10.1145/321479.321488.
- Gazdar, Gerald (1988). "Applicability of Indexed Grammars to Natural Languages". In U. Reyle and C. Rohrer. Natural Language Parsing and Linguistic Theories. pp. 69–94.
- Hopcroft, John; Jeffrey Ullman (1979). Introduction to automata theory, languages, and computation. Addison-Wesley. p. 390. ISBN 0-201-02988-X.
- Vijay-Shanker, K. and Weir, David J. 1994. The Equivalence of Four Extensions of Context-Free Grammars. Mathematical Systems Theory 27(6): 511-546.
- Staudacher, Peter. 1993. New frontiers beyond context-freeness: DI-grammars and DI-automata. In Sixth Conference of the European Chapter of the Association for Computational Linguistics (EACL '93).