LL grammar

From Wikipedia, the free encyclopedia
Jump to: navigation, search

In formal language theory, an LL grammar is a formal grammar that can be parsed by an LL parser, which parses the input from Left to right, and constructs a Leftmost derivation of the sentence (hence LL, compared with LR parser that constructs a rightmost derivation). A language that has an LL grammar is known as an LL language. These form subsets of deterministic context-free grammars (DCFGs) and deterministic context-free languages (DCFLs), respectively. One says that a given grammar or language "is an LL grammar/language" or simply "is LL" to indicate that it is in this class.

LL parsers are table-based parsers, similar to LR parsers. LL grammars can alternatively be characterized as precisely those that can be parsed by a predictive parser – a recursive descent parser without backtracking – and these can be readily written by hand. This article is about the formal properties of LL grammars; for parsing, see LL parser or recursive descent parser.

Relation to LL parsers[edit]

There is a separate LL(k) parser for each natural number k (0, 1, 2, ...). A LL parser is called a LL(k) parser if it uses k tokens of lookahead when parsing a sentence. A LL(k) parser recognizes the languages generated by some ε-free LL(k) grammar. As allowing more tokens of lookahead makes the parser strictly more powerful, the languages that can be recognized with a LL(k) parser are a strict subset of the languages that can be recognized by a LL(k+n), n > 0 parser. This creates a strictly increasing sequence of sets: LL(0) ⊊ LL(1) ⊊ LL(2) ⊊ …. Since these are all DCFLs, a corollary is that for any fixed k, there are DCFLs that cannot be recognized by a LL(k) parser.[1][2]

An LL parser is called an LL(*) parser if it is not restricted to a finite k tokens of lookahead, but can make parsing decisions by recognizing whether the following tokens belong to a regular language (for example by use of a Deterministic Finite Automaton), and accordingly there are the set of LL(*) grammars and the set of LL(*) languages.[citation needed]

Although ε-free LL(k) grammars are considered for LL(k) parsers, allowing ε-rules increases the expressive power of the grammar: For every ε-free LL(k+1) grammar, there exists a LL(k) grammar with ε-rules that generates the same language.[3]

Relation to other grammar classes[edit]

Every LL(k) grammar is also a LR(k) grammar. It is also decidable if a given LR(k) grammar is also a LL(m) grammar for some m.[4] A ε-free LL(1) grammar is also a SLR(1) grammar. A LL(1) grammar with symbols that have both empty and non-empty derivations is also a LALR(1) grammar. A LL(1) grammar with symbols that have only the empty derivation may or may not be LALR(1).[5]

LL grammars cannot have rules left recursion. Removing left recursion from a context-free grammar is always possible. However, the resulting grammar may be bigger, require more lookahead tokens than preferred to be an LL grammar, or not be an LL grammar at all. LL(k) grammars that are ε-free can be transformed into Greibach normal form (which by definition do not have rules with left recursion) without increasing the lookeahead tokens.[6]

Simple deterministic languages[edit]

The class of languages having an ε-free LL(1) grammar equals the class of simple deterministic languages.[7] The latter languages are characterized by that subset of the Greibach normal form grammars in which each rule has the form Za Y1...Yn, for n≥0, and for which the pairs (Z, a) are distinct among the rules. This language class includes the regular sets with end-markers.[8] This language class is the most general one for which the equivalence problem was known in 1966 to be solvable.[9]

Applications[edit]

LL grammars, particularly LL(1) grammars, are of great practical interest, as they are easy to parse, either by LL parsers or by recursive descent parsers, and many computer languages are designed to be LL(1) for this reason. Languages based on grammars with a high value of k have traditionally been considered[citation needed] to be difficult to parse, although this is less true now given the availability and widespread use[citation needed] of parser generators supporting LL(k) grammars for arbitrary k.

See also[edit]

References[edit]

  1. ^ Rosenkrantz, D. J., Stearns, R. E. (1969). "Properties of Deterministic Top Down Grammars". Proceedings of the First Annual ACM Symposium on Theory of Computing (STOC). ACM. pp. 165–180. 
  2. ^ Rosenkrantz & Stearns (1970, p. 247)
  3. ^ Rosenkrantz & Stearns (1970, p. 242)
  4. ^ Rozenkratz & Stearns (1970, pp. 254–255)
  5. ^ Beaty (1982)
  6. ^ Rozenkratz & Stearns (1970, p. 242)
  7. ^ Rosenkrantz & Stearns (1970, p. 243)
  8. ^ Hopcroft & Ullman (1979, p. 229)
  9. ^ Korenjak, A.J., Hopcroft, J.E. (1966). "Simple deterministic languages". IEEE Conf. Rec. 7th Ann. Symp. on Switching and Automata Theory (SWAT). IEEE Pub. No. 16-C-40. pp. 36–46. doi:10.1109/SWAT.1966.22.