Talk:Recursive descent parser

From Wikipedia, the free encyclopedia
WikiProject Computer science (Rated Start-class, High-importance)
This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-Class on the project's quality scale.
This article has been rated as High-importance on the project's importance scale.
 

Untitled

This page really needs a discussion of the problem of left-recursive grammar rules. Such rules are extremely common, for example:

EXPR = INT | EXPR + EXPR | EXPR * EXPR;

A recursive descent parser, presented with this grammar, will loop forever on a malformed input.

Should it also mention that hand-written recursive descent parsers are usually based on EBNF grammars? These avoid left recursion by turning

    A = A b C | C

into

    A = C (b C)*

Paolo Bonzini, 09:34, 8 Mar 2006 (UTC)
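A minimal C sketch of the rewrite suggested above, applied to the EXPR rule. It assumes decimal integer literals with no whitespace, and it splits the rule into two levels so that * binds tighter than + (an assumption, since the original rule states no precedence). The function names are hypothetical. Each (b C)* repetition becomes a while loop, so no function calls itself before consuming input.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical sketch: EXPR = INT | EXPR + EXPR | EXPR * EXPR
 * rewritten in the style A = C (b C)* as
 *   expr = term { "+" term }
 *   term = INT  { "*" INT }
 * The repetitions are loops, so there is no left recursion. */

static const char *pos;   /* current position in the input string */
static bool ok;           /* cleared on the first syntax error */

static long parse_int(void) {
    if (!isdigit((unsigned char)*pos)) { ok = false; return 0; }
    long v = 0;
    while (isdigit((unsigned char)*pos))
        v = v * 10 + (*pos++ - '0');
    return v;
}

static long parse_term(void) {
    long v = parse_int();
    while (ok && *pos == '*') {   /* (b C)* with b = '*' */
        pos++;
        v *= parse_int();
    }
    return v;
}

static long parse_expr(void) {
    long v = parse_term();
    while (ok && *pos == '+') {   /* (b C)* with b = '+' */
        pos++;
        v += parse_term();
    }
    return v;
}

/* Returns true and stores the value in *out if the whole string is an EXPR. */
bool eval(const char *s, long *out) {
    pos = s;
    ok = true;
    long v = parse_expr();
    if (!ok || *pos != '\0') return false;
    *out = v;
    return true;
}
```

For example, eval("1+2*3", &v) stores 7, while "1+" is rejected because the loop finds no INT after the +.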

Recursive descent parser example

Dominus 04:44, 28 Aug 2003 (UTC)

The code on the page is wrong. Parse_E will parse the string NOTTRUETRUE which is not part of the language.

I don't think it does. Care to demonstrate? -- Jan Hidders 11:22, 28 Aug 2003 (UTC)
Maybe the problem is hidden behind the ifs, which should be else-ifs?
Ah, yes, of course. Hope I fixed that now. -- Jan Hidders 00:51, 4 Nov 2004 (UTC)
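To illustrate the if versus else-if point, here is a hedged C sketch using a hypothetical grammar E = "NOT" E | "TRUE" | "FALSE" over a NULL-terminated array of tokens. The article's original code is not reproduced here; all names are made up for the example.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical grammar: E = "NOT" E | "TRUE" | "FALSE"
 * The alternatives form an else-if chain, so exactly one is taken.
 * With independent if statements, "NOT" E could succeed and a following
 * if could then consume an extra "TRUE", as in NOT TRUE TRUE. */

static const char **tok;   /* current token in a NULL-terminated array */

static bool accept(const char *s) {
    if (*tok && strcmp(*tok, s) == 0) { tok++; return true; }
    return false;
}

static bool parse_E(void) {
    bool ok = true;
    if (accept("NOT")) {
        ok = parse_E();
    } else if (accept("TRUE")) {
        /* matched */
    } else if (accept("FALSE")) {
        /* matched */
    } else {
        ok = false;            /* no alternative matches */
    }
    return ok;
}

/* The whole token sequence must be one E, with nothing left over. */
bool parse(const char **tokens) {
    tok = tokens;
    return parse_E() && *tok == NULL;
}
```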

Does it accept 1*-1? I'm not sure it does

Does not look like it accepts this:
// ident := number * - number .

Correct.

Symbol symbols[7] = {ident,becomes,number,times,minus,number,period};
Should it?

The parser is based on the PL/0 grammar (PL/0 is an educational language used in many compiler construction courses), which is in turn based on Pascal. For reasons unknown to me, standard Pascal does not allow embedded unary expressions. For instance:
ident := -ident * number;
is allowed; however, expressions such as yours are not.
To allow this, you could add "-|+" to the factor production, as in:
factor = 
         ident 
       | ("+"|"-") factor
       | number 
       | "(" expression ")"
And change the parser appropriately.
Since the PL/0 grammar is so well known, I'm of the opinion that the grammar should be left as is. 68.219.72.181 13:47, 17 April 2006 (UTC)
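A sketch of what "change the parser appropriately" might look like, assuming a cut-down PL/0-style parser that reads from a fixed array of symbols instead of a lexer. Only the rules needed for ident := expression . are included, and the helpers cur() and parse_assignment() are hypothetical. It accepts the symbol sequence from the question.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed subset of the grammar:
 *   expression = ["+"|"-"] term {("+"|"-") term}
 *   term       = factor {("*"|"/") factor}
 *   factor     = ident | ("+"|"-") factor | number | "(" expression ")"
 * The ("+"|"-") factor alternative is the new one. */

typedef enum { ident, number, lparen, rparen, times, slash, plus, minus,
               becomes, period, eof } Symbol;

static const Symbol *sym;  /* current symbol */
static const Symbol *end;  /* one past the last symbol */
static bool failed;        /* set on the first syntax error */

static Symbol cur(void) { return sym < end ? *sym : eof; }

static bool accept(Symbol s) {
    if (cur() == s) { sym++; return true; }
    return false;
}

static void expect(Symbol s) {
    if (!accept(s)) failed = true;
}

static void expression(void);

static void factor(void) {
    if (accept(ident) || accept(number)) {
        /* leaf */
    } else if (accept(plus) || accept(minus)) {
        factor();                /* new: embedded unary sign */
    } else if (accept(lparen)) {
        expression();
        expect(rparen);
    } else {
        failed = true;
    }
}

static void term(void) {
    factor();
    while (accept(times) || accept(slash))
        factor();
}

static void expression(void) {
    if (cur() == plus || cur() == minus) sym++;   /* optional leading sign */
    term();
    while (accept(plus) || accept(minus))
        term();
}

/* ident ":=" expression "." over n symbols */
bool parse_assignment(const Symbol *s, size_t n) {
    sym = s;
    end = s + n;
    failed = false;
    expect(ident);
    expect(becomes);
    expression();
    expect(period);
    return !failed && sym == end;
}
```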

Question about See Also entry

There is a new See Also entry that I have a question about.

It points to a 'source code' site (nothing wrong with that), but it doesn't seem to add very much. It is essentially the example copied from this Wikipedia article (without any notice of the copying, by the way), plus another simple example.

Should this link be deleted? 67.34.42.125 12:27, 22 May 2006 (UTC)

PEGs

I have removed references to parsing expression grammars in this discussion of recursive descent parsing, as the content referring to PEGs, at least as formulated, tended to give the impression that PEGs are a widely adopted formalism. A parsing expression grammar, while analytic, is a formal grammar, whereas RD parsing is a parsing strategy (an algorithm). QTJ 04:36, 11 October 2006 (UTC)

Bug in the parser code

The code inside block(), after "while (accept(procsym))", seems to be nonsense, right? There are two nested while loops, and as far as I understand the grammar, only the outer one should be there. Catskineater (talk) 17:28, 30 March 2008 (UTC)
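For reference, a hedged sketch of the procedure-declaration loop with a single while, using a cut-down grammar block = { "procedure" ident ";" block ";" } statement. The const and var sections are omitted and statement is reduced to one identifier; both are simplifications for this example. Nested procedures are handled by the recursive call to block(), which is why a second loop should be unnecessary.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed subset: block = { "procedure" ident ";" block ";" } statement
 * statement is a stand-in that accepts a single identifier. */

typedef enum { ident, semicolon, procsym, eof } Symbol;

static const Symbol *sym, *end;   /* current symbol, one past the last */
static bool failed;               /* set on the first syntax error */

static Symbol cur(void) { return sym < end ? *sym : eof; }

static bool accept(Symbol s) {
    if (cur() == s) { sym++; return true; }
    return false;
}

static void expect(Symbol s) { if (!accept(s)) failed = true; }

static void statement(void) { expect(ident); }   /* simplified stand-in */

static void block(void) {
    while (!failed && accept(procsym)) {  /* one iteration per declaration */
        expect(ident);                    /* procedure name */
        expect(semicolon);
        block();                          /* body; may declare more procedures */
        expect(semicolon);
    }
    statement();
}

bool parse_program(const Symbol *s, size_t n) {
    sym = s;
    end = s + n;
    failed = false;
    block();
    return !failed && sym == end;
}
```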

SML example

I just wrote one of these in SML for an assignment. Should I post it here under Implementation in functional languages? bkhl (talk) 07:47, 13 May 2008 (UTC)

Recursive descent with backup

The "recursive descent with backup" discussion, while correct, troubles me. "X with backup", for any parsing algorithm X, will parse any context-free language. It will do so, except in a very few cases, at the expense of every virtue which the original algorithm X had.

Perhaps the "backup" discussion could be moved later in the article. I'd almost suggest deleting it, but the topic does come up: almost automatically, whenever algorithm X proves inadequate, backup will be suggested, and quite often it will be tried. My wishing it were not so does not help. So some kind of discussion of "backup" as an alternative probably needs to be kept.

--Jeffreykegler (talk) 03:23, 26 May 2010 (UTC)
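A minimal C sketch of what "backup" means in a recursive descent parser, on a toy grammar made up for this example: save the input position before trying an alternative and restore it if the alternative fails. It also shows the cost mentioned above, since the shared prefix is scanned again for each alternative.

```c
#include <stdbool.h>

/* Toy grammar for illustration: S = "a" "b" "x" | "a" "b" "y"
 * Both alternatives share the prefix "ab", so committing to the first
 * alternative after seeing "a" would reject "aby". */

static const char *pos;   /* current position in the input */

static bool match(char c) {
    if (*pos == c) { pos++; return true; }
    return false;
}

static bool alt1(void) { return match('a') && match('b') && match('x'); }
static bool alt2(void) { return match('a') && match('b') && match('y'); }

static bool parse_S(void) {
    const char *saved = pos;      /* backup point */
    if (alt1()) return true;
    pos = saved;                  /* undo whatever alt1 consumed */
    if (alt2()) return true;      /* "ab" is scanned a second time */
    pos = saved;
    return false;
}

bool parse(const char *s) {
    pos = s;
    return parse_S() && *pos == '\0';
}
```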

Expansion necessary

The example uses no token lookahead at all and thus is not really helpful for understanding the strengths of recursive descent parsers. There should at least be a bit of ambiguity in the grammar; otherwise the example is too trivial and, even worse, points the reader in the wrong direction. — Preceding unsigned comment added by 139.30.5.126 (talk) 18:33, 27 March 2012 (UTC)
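One way to give the example some lookahead, sketched in C with a toy grammar invented for this illustration: both alternatives of statement start with ident, so a single symbol of lookahead cannot choose between them, but peeking at the second symbol can, without backup. peek() and classify() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy grammar: statement = ident ":=" number | ident "(" ")" */

typedef enum { ident, becomes, number, lparen, rparen, eof } Symbol;
typedef enum { ST_ASSIGN, ST_CALL, ST_ERROR } Kind;

static const Symbol *sym, *end;   /* current symbol, one past the last */

/* peek(k): the k-th upcoming symbol (0 = current), not consumed */
static Symbol peek(size_t k) {
    return (size_t)(end - sym) > k ? sym[k] : eof;
}

static bool accept(Symbol s) {
    if (peek(0) == s) { sym++; return true; }
    return false;
}

static Kind statement(void) {
    if (peek(0) == ident && peek(1) == becomes) {       /* decided by 2nd symbol */
        sym += 2;
        return accept(number) ? ST_ASSIGN : ST_ERROR;
    } else if (peek(0) == ident && peek(1) == lparen) {
        sym += 2;
        return accept(rparen) ? ST_CALL : ST_ERROR;
    }
    return ST_ERROR;
}

Kind classify(const Symbol *s, size_t n) {
    sym = s;
    end = s + n;
    Kind k = statement();
    return sym == end ? k : ST_ERROR;
}
```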

The article doesn't explain what a Recursive descent parser is

Sorry to nag, but this article doesn't provide an answer to the simple question of what a Recursive descent parser is. I'm sure that the information is technically correct in a formal way, but an encyclopedia article really should provide some understanding of the subject to an interested reader. I suspect the authors are very talented people that are communicating among themselves, but to the average person this is gobbledygook. I've been programming since 1972, know FORTRAN, Pascal, C, assembly languages for the PDP-8, CDC 6400 and 6600, 8080 and 8086 Intel chips, Python, APL, PHP, and probably a few others, and I don't know what a recursive descent parser is after reading this article. If I'm lost, many others are also going to be left bewildered by the article. Kd4ttc (talk) 04:54, 22 November 2012 (UTC)

Another bug in example code

The grammar rules suggest that the following string should be accepted by this grammar:

begin foo := 5 ; end

That is, the grammar rules accept (in fact, require) a semicolon between the "5" number token and the "end" endsym token, whereas the implementation of the relevant portion of the "statement" function looks something like:

if (accept(beginsym))
{
    do
    {
        statement();
    }
    while (accept(semicolon));
    expect(endsym);
}

This only accepts strings of the form:

"begin" statement { ";" statement } "end"

Or, am I missing something? This example code is as old as the hills so it seems unlikely that such a bug could've been around for so long, but I guess it's possible.

Update: The version of this example parser at the URL http://www.princeton.edu/~achaney/tmve/wiki100k/docs/Recursive_descent_parser.html has a modified version of the grammar to reflect this. I'll update the article when I get the chance.

Update: I updated the grammar. It now matches the grammar as specified on the PL/0 page (without the | "?" ident | "!" expression rule) and corresponds to the C implementation given.

--Richardcook (talk) 15:45, 18 July 2014 (UTC)
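A hedged C sketch contrasting the two readings, using single characters as stand-in tokens ('b' for begin, 'e' for end, 's' for a statement, treated as non-empty for simplicity). The do/while loop quoted above corresponds to the separator form; with a non-empty statement, accepting begin foo := 5 ; end requires the terminator form (or a grammar in which statement may be empty).

```c
#include <stdbool.h>

/* Stand-in tokens: 'b' = begin, 'e' = end, 's' = statement, ';' = semicolon
 *   separator form:  "begin" statement { ";" statement } "end"
 *   terminator form: "begin" { statement ";" } "end"            */

static const char *pos;

static bool accept(char c) {
    if (*pos == c) { pos++; return true; }
    return false;
}

bool separated(const char *s) {     /* mirrors the quoted do/while loop */
    pos = s;
    if (!accept('b')) return false;
    do {
        if (!accept('s')) return false;
    } while (accept(';'));
    return accept('e') && *pos == '\0';
}

bool terminated(const char *s) {    /* every statement followed by ';' */
    pos = s;
    if (!accept('b')) return false;
    while (accept('s')) {
        if (!accept(';')) return false;
    }
    return accept('e') && *pos == '\0';
}
```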

Bad description

I know we have the accepted description. But it doesn't really describe most real world implementations.

The first implementations of these parsing languages were most likely the metacompilers developed by Val Schorre. These are a type of TDPL language.

The languages are very close to the example language used. $ is used for zero or more repetitions instead of { }, and ( ) is used for all grouping.

expr = term $(('+'/'-') term);
term = factor $(('*'/'/') factor);
factor = .number/.id/'('expr')';
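To make the "each rule is a function returning success or failure" point concrete, here is a rough hand translation of the three rules above into C: $ becomes a while loop and / becomes an ordered sequence of tests. This is an approximation for illustration, not the code a Schorre metacompiler actually generated; .number and .id are assumed to mean digits and letter-then-alphanumerics, and whitespace is not skipped.

```c
#include <ctype.h>
#include <stdbool.h>

static const char *pos;   /* current position in the input */

static bool lit(char c) {                 /* literal match, e.g. '+' */
    if (*pos == c) { pos++; return true; }
    return false;
}

static bool number(void) {                /* .number: one or more digits */
    if (!isdigit((unsigned char)*pos)) return false;
    while (isdigit((unsigned char)*pos)) pos++;
    return true;
}

static bool id(void) {                    /* .id: letter, then letters/digits */
    if (!isalpha((unsigned char)*pos)) return false;
    while (isalnum((unsigned char)*pos)) pos++;
    return true;
}

static bool expr(void);

static bool factor(void) {                /* factor = .number/.id/'('expr')'; */
    if (number()) return true;
    if (id()) return true;
    if (lit('(')) return expr() && lit(')');
    return false;
}

static bool term(void) {                  /* term = factor $(('*'/'/') factor); */
    if (!factor()) return false;
    while (lit('*') || lit('/')) {
        if (!factor()) return false;      /* operator without operand: error */
    }
    return true;
}

static bool expr(void) {                  /* expr = term $(('+'/'-') term); */
    if (!term()) return false;
    while (lit('+') || lit('-')) {
        if (!term()) return false;
    }
    return true;
}

bool recognize(const char *s) {
    pos = s;
    return expr() && *pos == '\0';
}
```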

These languages included transforming constructs not shown above.

Each rule, as in TDPL, is a function returning success or failure. The rules read the source program as an input character stream and output the transformed code.

These are not substitution rules. They are analytical rules, exactly the opposite of production rules.

Second generation Schorre metacompilers transformed the input into abstract syntax trees:

expr = term $(('+':ADD/'-':SUB) term! 2);
term = factor $(('*':MPY/'/':DIV) factor! 2);
factor = .number/.id/'('expr')';

The ':' operator created a tree node. The '!' operator created a tree from the most recently created node and the number of parsed constructs specified after the '!'. Nodes were pushed on a node stack when created and removed by the '!' operator, which also removed parse stack entries and pushed the newly created tree on the parse stack. Recognized id and number tokens were placed on the parse stack. Operating on the string "A + B", the rules would recognize A and B, placing them on the parse stack. Upon recognizing the +, the ':' operator would have created an ADD node and pushed it on the node stack. The !2 then removes the top node stack entry, ADD, and the top two parse stack entries, A and B, combines them into the tree ADD[A, B], and pushes that tree on the parse stack.
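A small C sketch of the two-stack mechanism just described, for expr = term $(('+':ADD/'-':SUB) term!2); with term simplified to a one-letter identifier. Trees are represented as strings such as ADD[A,B] purely for display; the stack sizes and function names are assumptions of the sketch, and inputs are assumed short enough to fit the fixed buffers.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define MAXS 16    /* stack depth (assumed) */
#define MAXT 128   /* max length of one tree string (assumed) */

static char parse_stack[MAXS][MAXT];  static int psp;  /* tokens and trees */
static const char *node_stack[MAXS];  static int nsp;  /* pending node names */
static const char *pos;

static bool term(void) {              /* recognized id goes on the parse stack */
    if (!isalpha((unsigned char)*pos)) return false;
    parse_stack[psp][0] = *pos++;
    parse_stack[psp][1] = '\0';
    psp++;
    return true;
}

static void make_node(const char *name) {   /* ':' pushes a node */
    node_stack[nsp++] = name;
}

static void make_tree(int n) {        /* '!n': pop node and n entries, push tree */
    char buf[MAXT];
    strcpy(buf, node_stack[--nsp]);
    strcat(buf, "[");
    for (int i = psp - n; i < psp; i++) {
        strcat(buf, parse_stack[i]);
        strcat(buf, i + 1 < psp ? "," : "]");
    }
    psp -= n;
    strcpy(parse_stack[psp++], buf);
}

static bool expr(void) {              /* expr = term $(('+':ADD/'-':SUB) term!2); */
    if (!term()) return false;
    for (;;) {
        if (*pos == '+') { pos++; make_node("ADD"); }
        else if (*pos == '-') { pos++; make_node("SUB"); }
        else break;
        if (!term()) return false;
        make_tree(2);
    }
    return true;
}

/* Parses s and returns the tree left on the parse stack, or NULL. */
const char *build(const char *s) {
    pos = s;
    psp = 0;
    nsp = 0;
    if (!expr() || *pos != '\0') return NULL;
    return parse_stack[0];
}
```

Because each !2 folds the previous tree into the next one, "A+B-C" yields SUB[ADD[A,B],C], a left-associative tree built without any left recursion.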

These transform by analysis the input into an abstract syntax tree. Steamerandy (talk) 20:04, 24 December 2014 (UTC)

The Schorre line of metacompilers implement a top-down parsing language. I have seen them classed, and have been classing them myself, as recursive descent. But in all the recursive descent examples I have seen, the parsers are hand-coded.

They are described as inferior to LR parsers.

In my experience there has been no need of undoing any previous action except in the case of an error in the input. Any parser has the same problem.

In my experience, Tree Meta and CWIC are more efficient than LR parsers. All the Schorre metacompilers provide at least one symbol of lookahead. The CWIC expr rules:

expr = term $(('+':ADD | '-':SUB) term!2);
term = factor $(('*':MPY | '/':DIV) factor!2);
factor = '(' expr ')' | number | id;

The above will parse an arithmetic expression and build an abstract syntax tree.

In CWIC, backtracking is limited only by bookkeeping storage. A higher-level rule would use a backtracking alternative to catch an error. The parsed structure would be released and the input reset to the backtrack point. A lookahead test would then be used to find a restart point, usually a negative match looking for a statement termination character.

skip = $(-';' .any) ';';

The above would skip over every character that is not a ';' and then match the ';'. The - operator negates the success or failure of the test that follows; in either case the parse is not advanced. The .any test matches any character. Literal string matches are normally not kept. The match '+' in the expr rule matches that character, leaving the parse on the character following it. There is a lot to these metacompilers, but the point I am trying to make is that they are programmed. They are grammar rules that have a deterministic translation. They are logic equations returning success or failure (true or false, if you will). A rule's operation, code-wise, is known. The way one writes the rules determines the order of alternative tests, etc.
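A C sketch of how the skip rule behaves, assuming the input is a NUL-terminated string: the negative test -';' checks the next character without advancing, .any consumes one character, and the final ';' is matched. Having .any fail at end of input is an added assumption so the loop cannot run past the string; recover() is a hypothetical wrapper.

```c
#include <stdbool.h>

/* skip = $(-';' .any) ';'; */

static const char *pos;   /* current position in the input */

static bool not_char(char c) { return *pos != c; }   /* '-' test: never advances */

static bool any(void) {                /* .any: consume one character */
    if (*pos == '\0') return false;    /* assumption: fail at end of input */
    pos++;
    return true;
}

static bool lit(char c) {              /* literal match */
    if (*pos == c) { pos++; return true; }
    return false;
}

static bool skip(void) {
    while (not_char(';') && any())
        ;                              /* $( ... ): repeat while both succeed */
    return lit(';');
}

/* Returns the offset just past the ';' that ends the bad statement, or -1. */
int recover(const char *s) {
    pos = s;
    return skip() ? (int)(pos - s) : -1;
}
```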

Because these systems transform the input directly into an abstract syntax tree, they eliminate generating a parse tree and then reprocessing it to obtain the abstract syntax tree.

A fuller description of Schorre Metacompilers can be found in my personal space: Schorre Meta languages Steamerandy (talk) 11:07, 8 September 2015 (UTC)