LALR parser
In computer science, an LALR parser or Look-Ahead LR parser is a simplified version of a Canonical LR parser. It was invented by Frank DeRemer for his 1969 PhD. dissertation "Practical Translators for LR(k) languages" in order to address the practical difficulties of that time of implementing Canonical LR parsers. The simplification that takes place results in a parser with significantly reduced memory requirements but decreased language recognition power.[1] However, this power is enough for many real computer languages including Java.[2] The addition of some hand-written code, specific to the language being parsed, can improve the power of the LALR parser.
LALR parsers are automatically generated by compiler compilers such as Yacc and GNU Bison.
Contents |
History [edit]
In 1965, Donald Knuth invented the LR parser (Left to Right, Rightmost derivation). The LR parser can recognize any deterministic context-free language in linear-bounded time.[3] However, rightmost derivation has very large memory requirements and implementing an LR parser was impractical due to the limited memory of computers at that time. To address this shortcoming, in 1969, Frank DeRemer proposed two simplified versions of the LR parser, namely the Look-Ahead LR (LALR) and the Simple LR parser that had much lower memory requirements at the cost of less language recognition power with the LALR parser being the most powerful alternative.[1] Later, in 1977, memory optimizations for the LR parser were invented[4] but still the LR parser was less memory efficient than the simplified alternatives.
In 1979, Frank DeRemer and Tom Pennello announced a series of optimizations for the LALR parser that would further improve its memory efficiency.[5] The formal presentation of these optimizations was made in 1982.[6]
Overview [edit]
As with other types of LR parsers, an LALR parser is quite efficient at finding the single correct bottom-up parse in a single left-to-right scan over the input stream, because it doesn't need to use backtracking. Being a lookahead parser by definition, it always uses a lookahead, with LALR(1) being the most common case.
The LALR(1) parser uses production rules of the form
G -> S ⊥
S -> c A1 t
-> c A2 n
-> r A2 t
-> r A1 n
A1 -> a
A2 -> a
as is the case with any parser based on the LR(1) parser. The simplification that the LALR parser introduces, consists in merging rules that have identical kernel item sets, because during the LR(0) state-construction process the lookaheads are not known. This reduces the power of the parser because, as the following example depicts, not knowing the lookahead symbols can confuse the parser as to which grammar rule to pick next, resulting in reduce/reduce conflicts. Two states will be merged into one state and later the lookaheds will be found to be ambiguous. The one state with lookaheads is:
A1 -> a {t,n}
A2 -> a {t,n}
An LR(1) parser would have created two different states, neither of which is ambiguous. In a LALR parser the state sequence cannot be resolved because the parser encounters a duplicate rule, which is an error. The above grammar will be declared ambiguous by a LALR parser generator.
Implementation issues [edit]
Because the LALR parser performs a right derivation instead of the more intuitive left derivation, understanding how it works is quite difficult. This makes the process of finding a correct and efficient LALR grammar very demanding and time consuming.[citation needed] For the same reason error reporting can be quite hard because LALR parser errors cannot always be interpreted into messages meaningful for the end user.[citation needed] For this reason the recursive descent parser is sometimes preferred over the LALR parser. This parser requires more hand-written code because of less language recognition power. However, it does not have the special difficulties of the LALR parser because it performs left-derivation. Notable examples of this phenomenon are the C and C++ parsers of GCC. They started as LALR parsers but were later changed to recursive descent parsers.[7][8]
See also [edit]
- LR parser
- Parser generator
- LALR parser generator
- Comparison of parser generators
- Lookahead in parsing
- LL parser
References [edit]
- ^ a b Franklin L. DeRemer (1969). "Practical Translators for LR(k) languages". MIT, PhD Dissertation.
- ^ "Generate the Parser". Eclipse JDT Project. Retrieved 29 June 2012.
- ^ Knuth, Donald (July 1965). "On the Translation of Languages from Left to Right". Information and Control 8: 707 – 639. Retrieved 29 May 2011.
- ^ Pager, D. (1977), "A Practical General Method for Constructing LR(k) Parsers", Acta Informatica 7: 249–268
- ^ Frank DeRemer, Thomas Pennello (1979), "Efficient Computation of LALR(1) Look-Ahead Sets", Sigplan Notices - SIGPLAN, vol. 14, no. 8: 176–187
- ^ Frank DeRemer, Thomas Pennello (1982). "Efficient Computation of LALR(1) Look-Ahead Sets". TOPLAS vol. 4, no. 4. pp. 615–649.
- ^ GCC 3.4 Release Series Changes, New Features, and Fixes
- ^ GCC 4.1 Release Series Changes, New Features, and Fixes
External links [edit]
- Parsing Simulator This simulator is used to generate parsing tables LALR and resolve the exercises of the book.
- JS/CC JavaScript based implementation of a LALR(1) parser generator, which can be run in a web-browser or from the command-line.
- LALR(1) tutorial A flash card like tutorial on LALR(1) parsing.