Added a disadvantages section and edited the advantages. This article needs balancing: scannerless parsing is a technique that makes sense in limited circumstances, usually when the language being parsed is very simple. Remember, there is a reason the lexer/parser distinction was made in the first place.
- lexer/parser distinction not necessary: actually, yes it is, depending on what your needs are. As I said above, there was a very good reason it was developed: combining the two functions (scanning and parsing) for more complex languages becomes messy and harder to understand, develop, and maintain. Moved this into the introduction and changed the wording to explain when this technique is appropriate (as opposed to implying it's a universal truth).
- no keywordification: keywords are often included as a feature, and having a separate lexer and parser doesn't mean you have to have keywords; scannerless parsing can do without them simply because it has fewer of the design constraints that make keywords attractive to implement in the first place. Also, many people would rightly consider keywords a feature, not a requirement; look up the early Fortran days to get an inkling why. As such, moved this info to the "token classification not required" advantage.
I've been observing this article for a while, and I've been dismayed at how poor the article still is. It contains a number of factual mistakes and does not really explain anything. I'm reluctant to improve the article myself though, because I'm one of the researchers publishing on the merits of scannerless parsing. I have a few problems with the article:
- There is no decent explanation of the scanning/parsing process. This article should explain how, in a traditional scanner/parser division, the scanner splits a character stream into tokens and the parser then consumes those tokens.
- The article does not give any actual examples of cases where scannerless parsing is useful. The current list of applications is not correct. In fact, scannerless parsing is mostly useful for languages with a complex, context-sensitive lexical syntax. Typically, these are languages that involve a mixture of different sublanguages. We've published a series of papers on this: "Concrete Syntax for Objects" (OOPSLA'04) and "Declarative, Formal, and Extensible Syntax Definition for AspectJ - A Case for Scannerless Generalized-LR Parsing". The second paper in particular illustrates how the traditional scanner/parser separation breaks down on languages with a complex context-sensitive lexical syntax.
- The 'required extensions' section is largely focussed on language extensions in SDF/SGLR. Some of these extensions are not related to scannerless parsing at all. In particular: preference attributes (more an aspect of GLR) and per-production transitions (related to the priorities mechanism, which is unrelated to scannerless parsing).
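To make the first point above concrete, here is a minimal sketch of the traditional division. It is only an illustration: the token names, the toy grammar, and the function names `scan` and `parse_sum` are all invented for this example and are not taken from the article or from the papers mentioned.

```python
import re

# Hypothetical token classes for a toy expression language (assumed for illustration).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Scanner: split the character stream into (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # whitespace is discarded by the scanner
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse_sum(tokens):
    """Parser: consume tokens for the grammar  sum ::= operand ('+' operand)*."""
    pos = 0

    def operand():
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0] in ("NUMBER", "IDENT"):
            lexeme = tokens[pos][1]
            pos += 1
            return lexeme
        raise SyntaxError("expected a number or identifier")

    tree = operand()
    while pos < len(tokens) and tokens[pos][0] == "PLUS":
        pos += 1  # consume the '+' token
        tree = ("+", tree, operand())
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing token")
    return tree
```

With these definitions, `scan("12 + x")` yields `[("NUMBER", "12"), ("PLUS", "+"), ("IDENT", "x")]`, and `parse_sum` never looks at individual characters. A scannerless parser would instead fold the character-level rules in `TOKEN_SPEC` into the grammar itself.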
I also think this article contains many errors. For example, the advantage 'Grammars are compositional' is not a property of scannerless parsers as such: it is theoretically possible to have an LL(1) parser that is scannerless, and LL(1) is not closed under composition. The third page of this paper describes it: http://www.springerlink.com/content/xugat38tyrxvtm9w/. So compositional grammars seem to be a feature of generalized parsers rather than of scannerless ones. The two are related only because most scannerless parsers are generalized.
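A small sketch of why LL(1) is not closed under composition, taking "composition" to mean the union of two grammars' productions (my reading, not necessarily the paper's definition). The checker below is deliberately simplified and assumes grammars with no epsilon productions and no left recursion, so only FIRST/FIRST conflicts need testing; the grammars and function names are made up for the example.

```python
def first_of(symbol, grammar):
    """FIRST set of a symbol (assumes no epsilon productions, no left recursion)."""
    if symbol not in grammar:  # anything without productions is a terminal
        return {symbol}
    result = set()
    for rhs in grammar[symbol]:
        result |= first_of(rhs[0], grammar)
    return result


def is_ll1(grammar):
    """True if no nonterminal has two alternatives with overlapping FIRST sets."""
    for alternatives in grammar.values():
        claimed = set()
        for rhs in alternatives:
            first = first_of(rhs[0], grammar)
            if claimed & first:  # FIRST/FIRST conflict
                return False
            claimed |= first
    return True


def compose(grammar_a, grammar_b):
    """Union of the productions of two grammars (hypothetical composition operator)."""
    merged = {}
    for grammar in (grammar_a, grammar_b):
        for nonterminal, alternatives in grammar.items():
            target = merged.setdefault(nonterminal, [])
            for rhs in alternatives:
                if rhs not in target:
                    target.append(rhs)
    return merged


# Two toy grammars, each LL(1) on its own.
grammar_a = {"S": [["a", "b"]]}
grammar_b = {"S": [["a", "c"]]}
combined = compose(grammar_a, grammar_b)  # S -> a b | a c
```

Here `is_ll1(grammar_a)` and `is_ll1(grammar_b)` both hold, but `is_ll1(combined)` does not, because both alternatives of `S` start with `a`. A generalized parser accepts the combined grammar unchanged, which is the sense in which compositionality belongs to generalized parsing.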
This page needs to back up its assertions with citations
I was quite gratified to discover this page, which broaches a topic I'd wondered about for ages but never saw discussed. However, I was chagrined to find that, while it is replete with links to examples of the concept's use, it contains many unsupported general assertions yet cites at best one external discussion of the theory. (And Visser's report does not pretend to cover the whole domain, just one solution.)
I personally consider some of the unsupported assertions to be plausible. However, even the bald assertions that I haven't singled out deserve to be backed up with citations.