Left recursion

From Wikipedia, the free encyclopedia
Jump to: navigation, search

In the formal language theory of computer science, left recursion is a special case of recursion.

In terms of context-free grammar, a non-terminal r is left-recursive if the left-most symbol in any of r’s productions (‘alternatives’) either immediately (direct/immediate left-recursive) or through some other non-terminal definitions (indirect/hidden left-recursive) rewrites to r again.

Definition[edit]

"A grammar is left-recursive if we can find some non-terminal A which will eventually derive a sentential form with itself as the left-symbol."[1]

Immediate left recursion[edit]

Immediate left recursion occurs in rules of the form

A \to A\alpha \mid \beta

where \alpha and \beta are sequences of nonterminals and terminals, and \beta doesn't start with A. For example, the rule

\mathit{Expr} \to \mathit{Expr} + \mathit{Term}

is immediately left-recursive. The recursive descent parser for this rule might look like:

function Expr()
{  
    Expr();  match('+');  Term();
}

and a recursive descent parser would fall into infinite recursion when trying to parse a grammar which contains this rule.

Indirect left recursion[edit]

Indirect left recursion in its simplest form could be defined as:

A \to B\alpha \mid C
B \to A\beta \mid D,

possibly giving the derivation A \Rightarrow B\alpha \Rightarrow A\beta\alpha \Rightarrow \ldots

More generally, for the nonterminals A_0, A_1, \ldots, A_n, indirect left recursion can be defined as being of the form:

A_0 \to A_1\alpha_1 \mid \ldots
A_1 \to A_2\alpha_2 \mid \ldots
\cdots
A_n \to A_0\alpha_{n+1} \mid \ldots

where \alpha_1, \alpha_2, \ldots, \alpha_n are sequences of nonterminals and terminals.

Accommodating left recursion in top-down parsing[edit]

A formal grammar that contains left recursion cannot be parsed by a LL(k)-parser or other naive recursive descent parser unless it is converted to a weakly equivalent right-recursive form. In contrast, left recursion is preferred for LALR parsers because it results in lower stack usage than right recursion. However, more sophisticated top-down parsers can implement general context-free grammars by use of curtailment. In 2006, Frost and Hafiz described an algorithm which accommodates ambiguous grammars with direct left-recursive production rules.[2] That algorithm was extended to a complete parsing algorithm to accommodate indirect as well as direct left-recursion in polynomial time, and to generate compact polynomial-size representations of the potentially exponential number of parse trees for highly ambiguous grammars by Frost, Hafiz and Callaghan in 2007.[3] The authors then implemented the algorithm as a set of parser combinators written in the Haskell programming language.[4]

Removing left recursion[edit]

Removing immediate left recursion[edit]

The general algorithm to remove immediate left recursion follows. Several improvements to this method have been made, including the ones described in "Removing Left Recursion from Context-Free Grammars", written by Robert C. Moore.[5] For each rule of the form

A \rightarrow A\alpha_1\,|\,\ldots\,|\,A\alpha_n\,|\,\beta_1\,|\,\ldots\,|\,\beta_m

where:

  • A is a left-recursive nonterminal
  • \alpha is a sequence of nonterminals and terminals that is not null (\alpha \ne \epsilon )
  • \beta is a sequence of nonterminals and terminals that does not start with A.

replace the A-production by the production:

A \rightarrow \beta_1A^\prime\, |\, \ldots\,  |\,  \beta_mA^\prime

And create a new nonterminal

A^\prime \rightarrow \epsilon\, |\, \alpha_1A^\prime\,  |\,  \ldots\, |\, \alpha_nA^\prime

This newly created symbol is often called the "tail", or the "rest".

As an example, consider the rule

Expr \rightarrow Expr\,+\,Expr\,|\,Int\,|\,String

This could be rewritten to avoid left recursion as

Expr \rightarrow Int\,ExprRest\,|\,String\,ExprRest

ExprRest \rightarrow \epsilon\,|\,+\,Expr\,ExprRest

The last rule happens to be equivalent to the slightly shorter form

ExprRest \rightarrow \epsilon\,|\,+\,Expr

Removing indirect left recursion[edit]

If the grammar has no \epsilon-productions (no productions of the form A \rightarrow \ldots | \epsilon | \ldots ) and is not cyclic (no derivations of the form A \Rightarrow  \ldots \Rightarrow A for any nonterminal A), this general algorithm may be applied to remove indirect left recursion :

Arrange the nonterminals in some (any) fixed order A_1, \ldots A_n.

for i = 1 to n {
for j = 1 to i – 1 {
  • let the current A_j productions be
A_j \rightarrow \delta_1 | \ldots | \delta_k
  • replace each production A_i \rightarrow A_j \gamma by
A_i \rightarrow \delta_1\gamma | \ldots | \delta_k\gamma
}
  • remove direct left recursion for A_i
}

Pitfalls[edit]

The above transformations remove left-recursion by creating a right-recursive grammar; but this changes the associativity of our rules. Left recursion makes left associativity; right recursion makes right associativity. Example: We start with a grammar:

Expr \rightarrow Expr\,+\,Term\,|\,Term

Term \rightarrow Term\,*\,Factor\,|\,Factor

Factor \rightarrow (Expr)\,|\,Int

After having applied standard transformations to remove left-recursion, we have the following grammar:

Expr \rightarrow Term\ Expr'

Expr' \rightarrow {} + Term\ Expr'\,|\,\epsilon

Term \rightarrow Factor\ Term'

Term' \rightarrow {} * Factor\ Term'\,|\,\epsilon

Factor \rightarrow (Expr)\,|\,Int

Parsing the string 'a + a + a' with the first grammar in an LALR parser (which can recognize left-recursive grammars) would have resulted in the parse tree:

                            Expr
                         /   |   \
                       Expr   +   Term
                     /  |  \        \
                   Expr  + Term      Factor
                    |      |         |
                   Term    Factor    Int
                    |      |
                  Factor    Int
                    |
                   Int  

This parse tree grows to the left, indicating that the '+' operator is left associative, representing (a + a) + a.

But now that we've changed the grammar, our parse tree looks like this:

                            Expr ---
                           /        \
                         Term      Expr' --
                          |       /  |     \ 
                        Factor   +  Term   Expr' ------
                          |          |      |  \       \
                         Int       Factor   +  Term   Expr'
                                     |           |      |
                                    Int        Factor   \epsilon
                                                 |
                                                Int

We can see that the tree grows to the right, representing a + (a + a). We have changed the associativity of our operator '+', it is now right-associative. While this isn't a problem for the associativity of integer addition, it would have a significantly different value if this were subtraction.

The problem is that normal arithmetic requires left associativity. Several solutions are: (a) rewrite the grammar to be left recursive, or (b) rewrite the grammar with more nonterminals to force the correct precedence/associativity, or (c) if using YACC or Bison, there are operator declarations, %left, %right and %nonassoc, which tell the parser generator which associativity to force.

See also[edit]

References[edit]

  1. ^ Notes on Formal Language Theory and Parsing, James Power, Department of Computer Science National University of Ireland, Maynooth Maynooth, Co. Kildare, Ireland.JPR02
  2. ^ Frost, R.; R. Hafiz (2006). "A New Top-Down Parsing Algorithm to Accommodate Ambiguity and Left Recursion in Polynomial Time.". ACM SIGPLAN Notices 41 (5): 46–54. doi:10.1145/1149982.1149988. 
  3. ^ Frost, R.; R. Hafiz and P. Callaghan (June 2007). "Modular and Efficient Top-Down Parsing for Ambiguous Left-Recursive Grammars.". 10th International Workshop on Parsing Technologies (IWPT), ACL-SIGPARSE (Prague): 109–120. 
  4. ^ Frost, R.; R. Hafiz and P. Callaghan (January 2008). "Parser Combinators for Ambiguous Left-Recursive Grammars". 10th International Symposium on Practical Aspects of Declarative Languages (PADL), ACM-SIGPLAN. Lecture Notes in Computer Science 4902 (2008): 167–181. doi:10.1007/978-3-540-77442-6_12. ISBN 978-3-540-77441-9. 
  5. ^ Moore, Robert C. (May 2000). "Removing Left Recursion from Context-Free Grammars". 6th Applied Natural Language Processing Conference: 249–255. 

External links[edit]