ANTLR

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Joseph.redfern at 15:09, 18 July 2013 (Bumped version number & release data, mentioned that ANTLR 4.1 now targets C#, too.).

ANTLR
Original author(s): Terence Parr and others
Initial release: February 1992
Stable release: 4.1 (June 30, 2013) / 3.5 (January 5, 2013)
Written in: Java
Platform: Cross-platform
License: BSD License
Website: www.antlr.org

In computer-based language recognition, ANTLR (pronounced Antler), or ANother Tool for Language Recognition, is a parser generator that uses LL(*) parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), first developed in 1989, and is under active development. Its maintainer is Professor Terence Parr of the University of San Francisco.

ANTLR takes as input a grammar that specifies a language and generates as output source code for a recognizer for that language. While version 3 supported generating code in the programming languages Ada95, ActionScript, C, C#, Java, JavaScript, Objective-C, Perl, Python, Ruby, and Standard ML,[1] the current release (4.1) targets only Java and C#. A language is specified using a context-free grammar expressed in Extended Backus–Naur Form (EBNF).

ANTLR can generate lexers, parsers, tree parsers, and combined lexer-parsers. Parsers can automatically build abstract syntax trees, which can be further processed with tree parsers. ANTLR provides a single consistent notation for specifying lexers, parsers, and tree parsers. In contrast, other parser and lexer generators do not offer a single notation for all three, which makes ANTLR easier to use.

By default, ANTLR reads a grammar and generates a recognizer for the language defined by the grammar (that is, a program that reads an input stream and reports an error if the input does not conform to the syntax specified by the grammar). If there are no syntax errors, the default action is simply to exit without printing any message. To do something useful with the recognized input, actions can be attached to grammar elements in the grammar. These actions are written in the programming language in which the recognizer is generated. When the recognizer is generated, the actions are embedded in its source code at the appropriate points. In the case of a compiler, actions can be used to build and check symbol tables and to emit instructions in a target language.
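The idea of a recognizer with embedded actions can be illustrated with a small hand-written Java sketch. This is not output produced by ANTLR: the class name ExprRecognizer, the method names, and the toy grammar in the comment are all assumptions made for illustration. Each grammar rule becomes one method, a syntax error raises an exception, and the "actions" are the lines that compute a value while the input is being matched. Whitespace handling is omitted to keep the sketch short.

```java
// Hand-written illustration, NOT code generated by ANTLR.
// Toy grammar (hypothetical), in EBNF-like notation:
//   expr : term ('+' term)* ;
//   term : INT ('*' INT)* ;
//   INT  : ('0'..'9')+ ;
final class ExprRecognizer {
    private final String input;
    private int pos = 0;

    private ExprRecognizer(String input) {
        this.input = input;
    }

    /** Recognizes a complete expression; the embedded actions compute its value. */
    static int evaluate(String text) {
        ExprRecognizer recognizer = new ExprRecognizer(text);
        int value = recognizer.expr();
        if (recognizer.pos != text.length()) {
            throw recognizer.error("unexpected input");
        }
        return value;
    }

    // expr : term ('+' term)* ;
    private int expr() {
        int value = term();
        while (peek() == '+') {
            pos++;                 // match '+'
            value += term();       // action: accumulate the sum
        }
        return value;
    }

    // term : INT ('*' INT)* ;
    private int term() {
        int value = integer();
        while (peek() == '*') {
            pos++;                 // match '*'
            value *= integer();    // action: accumulate the product
        }
        return value;
    }

    // INT : ('0'..'9')+ ;
    private int integer() {
        int start = pos;
        while (Character.isDigit(peek())) {
            pos++;
        }
        if (start == pos) {
            throw error("expected a digit");
        }
        return Integer.parseInt(input.substring(start, pos));
    }

    // Returns the current character, or '\0' at end of input.
    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at position " + pos);
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2+3*4")); // prints 14
        try {
            evaluate("2+");                    // input does not conform to the grammar
        } catch (IllegalArgumentException e) {
            System.out.println("syntax error: " + e.getMessage());
        }
    }
}
```

Removing the lines marked as actions leaves a pure recognizer that only accepts or rejects its input, which corresponds to the default behavior described above.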

As well as lexers and parsers, ANTLR can be used to generate tree parsers. These are recognizers that process abstract syntax trees which can be automatically generated by parsers. These tree parsers are unique to ANTLR and greatly simplify the processing of abstract syntax trees.

ANTLR 3 is free software, published under a three-clause BSD License. Prior versions were released as public domain software.[2]

While ANTLR itself is free, the documentation necessary to use it is not. The ANTLR manual is a commercial book, The Definitive ANTLR Reference. Free documentation is limited to a handful of tutorials, code examples, and very basic API listings.

Several plugins have been developed for the Eclipse development environment to support the ANTLR grammar. There is ANTLR Studio, a proprietary product, as well as the ANTLR 2 and 3 plugins for Eclipse hosted on SourceForge.

ANTLR 4

ANTLR (ANother Tool for Language Recognition) is a parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build and walk parse trees.

ANTLR v4 handles direct left recursion in grammar rules and supports embedded actions and attributes. More information is available in The Definitive ANTLR 4 Reference.
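To see why left recursion needs special handling, consider a rule such as expr : expr '*' expr | expr ('+' | '-') expr | INT. A naive top-down parser method for expr would call itself before consuming any input and never terminate. One well-known remedy is precedence climbing, in which the recursion is replaced by a loop guarded by operator precedence. The Java sketch below is a hand-written illustration of that technique under assumed names (PrecedenceClimbing, precedenceOf, apply); it is not the code ANTLR emits and makes no claim about ANTLR's exact internal transformation.

```java
// Hand-written illustration of precedence climbing, NOT code generated by ANTLR.
// Left-recursive toy grammar (hypothetical) being handled:
//   expr : expr '*' expr
//        | expr ('+' | '-') expr
//        | INT ;
final class PrecedenceClimbing {
    private final String input;
    private int pos = 0;

    private PrecedenceClimbing(String input) {
        this.input = input;
    }

    static int evaluate(String text) {
        PrecedenceClimbing parser = new PrecedenceClimbing(text);
        int value = parser.expr(0);
        if (parser.pos != text.length()) {
            throw new IllegalArgumentException("unexpected input at position " + parser.pos);
        }
        return value;
    }

    // Parses an operand, then loops over operators whose precedence is at
    // least minPrecedence. The loop replaces the left-recursive call.
    private int expr(int minPrecedence) {
        int left = integer();
        while (true) {
            char op = peek();
            int precedence = precedenceOf(op);
            if (precedence < minPrecedence) {
                return left;       // not an operator, or it binds too loosely
            }
            pos++;                 // consume the operator
            // precedence + 1 makes the operators left-associative
            int right = expr(precedence + 1);
            left = apply(op, left, right);
        }
    }

    // Returns -1 for anything that is not a binary operator.
    private static int precedenceOf(char op) {
        switch (op) {
            case '*': return 2;
            case '+':
            case '-': return 1;
            default:  return -1;
        }
    }

    private static int apply(char op, int left, int right) {
        switch (op) {
            case '*': return left * right;
            case '+': return left + right;
            default:  return left - right;   // only '-' remains
        }
    }

    private int integer() {
        int start = pos;
        while (Character.isDigit(peek())) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("expected a digit at position " + pos);
        }
        return Integer.parseInt(input.substring(start, pos));
    }

    // Returns the current character, or '\0' at end of input.
    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2+3*4"));  // prints 14
        System.out.println(evaluate("10-4-3")); // prints 3 (left-associative)
    }
}
```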

The following passages are excerpted from the preface of the ANTLR v4 book:

ANTLR is a powerful parser generator that you can use to read, process, execute, or translate structured text or binary files. It’s widely used in academia and industry to build all sorts of languages, tools, and frameworks. Twitter search uses ANTLR for query parsing, with over 2 billion queries a day. The languages for Hive and Pig, the data warehouse and analysis systems for Hadoop, both use ANTLR. Lex Machina uses ANTLR for information extraction from legal texts. Oracle uses ANTLR within SQL Developer IDE and their migration tools. NetBeans IDE parses C++ with ANTLR. The HQL language in the Hibernate object-relational mapping framework is built with ANTLR.

Aside from these big-name, high-profile projects, you can build all sorts of useful tools like configuration file readers, legacy code converters, wiki markup renderers, and JSON parsers. I’ve built little tools for object-relational database mappings, describing 3D visualizations, injecting profiling code into Java source code, and have even done a simple DNA pattern matching example for a lecture.

From a formal language description called a grammar, ANTLR generates a parser for that language that can automatically build parse trees, which are data structures representing how a grammar matches the input. ANTLR also automatically generates tree walkers that you can use to visit the nodes of those trees to execute application-specific code.
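The relationship between a parse tree and a tree walker can be sketched in plain Java. Everything in this example is hypothetical and hand-written: the Node class, the Listener interface, and the walk method are simplified stand-ins chosen for illustration and are not ANTLR's runtime classes. The walker visits the tree depth-first and calls back into application code on entering and leaving each rule node.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hand-written illustration, NOT ANTLR's runtime API. All names are assumptions.
final class ParseTreeSketch {
    /** A parse-tree node: either a matched rule with children or a single token. */
    static final class Node {
        final String label;        // rule name, or token text for a token node
        final boolean isToken;
        final List<Node> children;

        private Node(String label, boolean isToken, List<Node> children) {
            this.label = label;
            this.isToken = isToken;
            this.children = children;
        }

        static Node rule(String name, Node... children) {
            return new Node(name, false, Arrays.asList(children));
        }

        static Node token(String text) {
            return new Node(text, true, Collections.<Node>emptyList());
        }
    }

    /** Callbacks that carry the application-specific code. */
    interface Listener {
        void enterRule(Node node);
        void exitRule(Node node);
        void visitToken(Node node);
    }

    /** Depth-first walk that fires the listener callbacks. */
    static void walk(Node node, Listener listener) {
        if (node.isToken) {
            listener.visitToken(node);
            return;
        }
        listener.enterRule(node);
        for (Node child : node.children) {
            walk(child, listener);
        }
        listener.exitRule(node);
    }

    /** Example application code: render the tree in a bracketed text form. */
    static String toBracketed(Node root) {
        final StringBuilder out = new StringBuilder();
        walk(root, new Listener() {
            @Override public void enterRule(Node node) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append('(').append(node.label);
            }
            @Override public void exitRule(Node node) {
                out.append(')');
            }
            @Override public void visitToken(Node node) {
                out.append(' ').append(node.label);
            }
        });
        return out.toString();
    }

    /** Tree for the input "1+2" under: expr : term ('+' term)* ; term : INT ; */
    static Node sampleTree() {
        return Node.rule("expr",
                Node.rule("term", Node.token("1")),
                Node.token("+"),
                Node.rule("term", Node.token("2")));
    }

    public static void main(String[] args) {
        System.out.println(toBracketed(sampleTree())); // prints (expr (term 1) + (term 2))
    }
}
```

Keeping the application code in a listener rather than in the grammar means the same tree can be processed by several independent passes without touching the grammar.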

There are thousands of ANTLR downloads a month and it is included on all Linux and OS X distributions. ANTLR is widely used because it's easy to understand, powerful, flexible, generates human-readable output, comes with complete source under the BSD license, and is actively supported.

ANTLR has contributed to the theory and practice of parsing including:

  • linear approximate lookahead
  • semantic and syntactic predicates
  • ANTLRWorks
  • tree parsing
  • LL(*)
  • Adaptive LL(*) in ANTLR v4 (paper coming soon)

Bibliography

  • Parr, Terence (May 17, 2007), The Definitive ANTLR Reference: Building Domain-Specific Languages (1st ed.), Pragmatic Bookshelf, p. 376, ISBN 0-9787392-5-6
  • Parr, Terence (December 2009), Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages (1st ed.), Pragmatic Bookshelf, p. 374, ISBN 978-1-934356-45-6
  • Parr, Terence (January 15, 2013), The Definitive ANTLR 4 Reference (1st ed.), Pragmatic Bookshelf, p. 328, ISBN 978-1-93435-699-9

Further reading

  • T. J. Parr, R. W. Quong, ANTLR: A Predicated-LL(k) Parser Generator, Software—Practice and Experience, Vol. 25(7), 789–810 (July 1995)
