LALR parser

From Wikipedia, the free encyclopedia
Jump to: navigation, search

In computer science, an LALR parser (or look-ahead LR parser) is a type of LR parser based on a finite-state-automata concept.[dubious ] The data structure[dubious ] used by an LALR parser is a pushdown automaton (PDA). A deterministic PDA is a deterministic finite automaton (DFA) with the addition of a stack for a memory, indicating which states the parser has passed through to arrive at the current state. Because of the stack, a PDA can recognize grammars that would be impossible with a DFA; for example, a PDA can determine whether an expression has any unmatched parentheses, whereas an automaton with no stack would require an infinite number of states due to unlimited nesting of parentheses.

LALR parsers are driven by a parser table in a finite-state machine (FSM) format. An FSM is tedious enough for humans to construct by hand that it is more convenient to use a software tool called an LALR parser generator to generate a parser table automatically from a grammar in Backus–Naur Form which defines the syntax of the computer language the parser will process[1]. The parser table is often generated in source code format in a computer language (such as C++ or Java). When the parser (with parser table) is compiled and/or executed, it will recognize files written in the language defined by the BNF grammar.

LALR parsers are generated from LALR grammars, which are capable of defining a larger class of languages than SLR grammars, but not as large a class as LR grammars. Real computer languages can often be expressed as LALR(1) grammars, and in cases where a LALR(1) grammar is insufficient, usually an LALR(2) grammar is adequate. If the parser generator handles only LALR(1) grammars, then the LALR parser will have to interface with some hand-written code when it encounters the special LALR(2) situation in the input language.

Contents

[edit] History

LR parsing was invented by Donald Knuth in 1965 in a paper, "On the Translation of Languages from Left to Right". LR parsers have the potential of providing the highest parsing speed of all parsing algorithms. However, LR parsers were once considered impractical because algorithms to generate LR parsers of tractable size were not known until the mid 1970s.[2]

LALR parsing was invented by Frank DeRemer in 1969 in a paper, "Practical LR(k) Translators". LALR parsers offer the same high performance of LR parsers, and produce much smaller tables than the early LR parser generation algorithms of the late 1960s. For this reason, the LALR algorithm became popular, and LALR parsers are the ones most often generated by compiler-compilers such as yacc and GNU bison.

(Compiler-compilers such as Menhir and HYacc that generate true LR parser tables using Pager's algorithm have appeared in recent years but have not seen widespread adoption -- their chief advantage is that they do not create spurious reduce/reduce conflicts for unambiguous deterministic grammars.)

[edit] Generating LALR Parsers

Similar to an SLR parser generator, an LALR parser generator constructs the LR(0) state machine first and then computes the lookahead sets for all rules in the grammar, checking for ambiguity. An SLR parser generator computes the lookahead sets by examining the BNF grammar (these are called follow sets). However, an LALR parser generator computes the lookahead sets by examining the LR(0) state machine (these are called LALR lookahead sets, which are more accurate and less likely to cause a conflict/ambiguity to be reported by the parser generator).

Like SLR, LALR is a refinement to the technique for constructing LR(0) parse tables. While SLR uses follow sets to construct reduce actions, LALR uses lookahead sets, which are more specific because they take more of the parsing context into account. Follow sets are associated with a symbol, while lookahead sets are specific to an LR(0) item and a parser state.

Specifically, the follow set for a given LR(0) item I in a given parser state S contains all symbols that are allowed by the grammar to appear after I's left-hand-side nonterminal. In contrast, the lookahead set for item I in state S contains only those symbols that are allowed by the grammar to appear after I's right-hand-side has been parsed starting from state S. follow(I) is effectively the union of the lookahead sets for all LR(0) items with the same left-hand-side as I, regardless of parser states or right-hand-sides, therefore losing all context information. Because the lookahead set is specific to a particular parsing context, it can be more selective, therefore allowing finer distinctions than the follow set.

Unfortunately, computing LALR lookahead sets is much more complicated than SLR. Frank DeRemer and Tom Pennello wrote a paper about this, published in SIGPLAN Notices in 1979 and in TOPLAS in 1982, called "Efficient Computation Of LALR(1) Look-Ahead Sets".

[edit] Advantages

  1. An LALR parser can be automatically generated from an LALR grammar.
  2. An LALR grammar can be used to define many computer languages.
  3. An LALR parser is small.
  4. An LALR parser is fast (if the parsing algorithm uses a matrix parser-table format).
  5. An LALR parser is linear in speed (i.e. the speed is based on the size of the input text file only and not based on the size of the language being recognized).
  6. The LALR grammar provides valuable documentation of the language being recognized.
  7. Error recovery may already be built-in to the parser.
  8. Generalized error messages may already be built into the parser.
  9. Abstract-syntax-tree construction may already be built into the parser.
  10. Recognition of context-sensitive language constructs may already be built into the parser.

[edit] Disadvantages

  1. Software engineers are required to use an LALR parser generator, which may or may not be user friendly and may require some learning time.
  2. Implementing meaningful error messages in the parser may be very difficult or impossible.
  3. Understanding the parsing algorithm is often quite difficult.
  4. If an error occurs, it may be difficult to determine whether it's in the grammar or the parser code.
  5. If there is an error in the parser generator, this may be very difficult to fix.

[edit] References

  1. ^ Levine, John; Mason, Tony; Brown, Doug. (1992). Lex and Yacc.. O'Reilly Books,. ISBN 1565920007. 
  2. ^ Pager, D., A Practical General Method for Constructing LR(k) Parsers. Acta Informatica 7, 249 - 268 (1977)
  • Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools Addison--Wesley, 1986. (AKA The Dragon Book, describes the traditional techniques for building LALR(1) parsers.)
  • Frank DeRemer and Thomas Pennello. Efficient Computation of LALR(1) Look-Ahead Sets ACM Transactions on Programming Languages and Systems (TOPLAS) 4:4, pp. 615–649. 1982. (Describes a more efficient technique for building LALR(1) parsers.)
  • Richard Bornat Understanding and Writing Compilers, Macmillan, 1979. (Describes the principles of automated left-to-right parsing and how to construct the parser tables, what a follow set is, etc., in English, not mathematics – available freely from the author's page at [1].)

[edit] See also

[edit] External links

  • Parsing Simulator This simulator is used to generate parsing tables LALR and resolve the exercises of the book.
  • JS/CC JavaScript based implementation of a LALR(1) parser generator, which can be run in a web-browser or from the command-line.
  • LALR(1) tutorial A flash card like tutorial on LALR(1) parsing.
Personal tools
Namespaces
Variants
Actions
Navigation
Interaction
Toolbox
Print/export
Languages