Backus–Naur Form

Not to be confused with BCNF or Boyce–Codd normal form.

In computer science, BNF (Backus Normal Form or Backus–Naur Form) is one of the two[1] main notation techniques for context-free grammars; the other is the van Wijngaarden form. BNF is often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. Both notations are applied wherever exact descriptions of languages are needed: for instance, in official language specifications, in manuals, and in textbooks on programming language theory.

Many extensions and variants of the original Backus–Naur notation are used; some are exactly defined, including Extended Backus–Naur Form (EBNF) and Augmented Backus–Naur Form (ABNF).

History

The idea of describing the structure of language using rewriting rules can be traced back to at least the work of Pāṇini (who lived sometime between the 7th and 4th century BC).[2] His notation for describing Sanskrit word structure is equivalent in power to that of Backus and has many similar properties.

In Western society, grammar was long regarded as a subject for teaching, rather than scientific study; descriptions were informal and targeted at practical usage. In the first half of the 20th century, linguists such as Leonard Bloomfield and Zellig Harris started attempts to formalize the description of language, including phrase structure.

Meanwhile, string rewriting rules as formal, abstract systems were introduced and studied by mathematicians such as Axel Thue (in 1914), Emil Post (1920s–40s) and Alan Turing (1936). Noam Chomsky, teaching linguistics to students of information theory at MIT, combined linguistics and mathematics, by taking what is essentially Thue's formalism as the basis for the description of the syntax of natural language. He also introduced a clear distinction between generative rules (those of context-free grammars) and transformation rules (1956).[3][4]

John Backus, a programming language designer at IBM, proposed "metalinguistic formulas"[5][6] to describe the syntax of the new programming language IAL, known today as ALGOL 58 (1959). These formulas are the origin of the BNF notation, which is a notation for Chomsky's context-free grammars. Apparently, Backus was familiar with Chomsky's work.[7]

Further development of ALGOL led to ALGOL 60; in its report (1963), Peter Naur named Backus's notation Backus Normal Form, and simplified it to minimize the character set used. However, Donald Knuth argued that BNF should rather be read as Backus–Naur Form, as it is "not a normal form in the conventional sense",[8] unlike, for instance, Chomsky Normal Form. The name Pāṇini Backus form has also been suggested, on the grounds that the expansion Backus Normal Form may not be accurate and that Pāṇini had independently discovered a similar notation earlier.[9]

Introduction

A BNF specification is a set of derivation rules, written as

 <symbol> ::= __expression__

where <symbol> is a nonterminal, and the __expression__ consists of one or more sequences of symbols. Multiple sequences are separated by the vertical bar, '|', indicating a choice; each alternative is a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. Symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.

The '::=' means that the symbol on the left must be replaced with the expression on the right.
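
The definitions above can be illustrated with a minimal Python sketch. It is not part of any BNF tool: the toy grammar and the function names (nonterminals, terminals, replace_leftmost) are assumptions made for illustration only. A grammar is stored as a mapping from each left-side symbol to its list of alternatives, terminals are computed as the symbols that never appear on a left side, and one application of '::=' replaces the leftmost nonterminal with a chosen alternative.

```python
from typing import Dict, List, Set

Grammar = Dict[str, List[List[str]]]

# Hypothetical toy grammar (not from the article), equivalent to:
#   <greeting> ::= "hello" <name> | "hi" <name>
#   <name>     ::= "world" | "BNF"
GRAMMAR: Grammar = {
    "<greeting>": [['"hello"', "<name>"], ['"hi"', "<name>"]],
    "<name>": [['"world"'], ['"BNF"']],
}


def nonterminals(grammar: Grammar) -> Set[str]:
    """Symbols that appear on the left side of some rule."""
    return set(grammar)


def terminals(grammar: Grammar) -> Set[str]:
    """Symbols used on a right side that never appear on a left side."""
    used = {sym for alts in grammar.values() for alt in alts for sym in alt}
    return used - nonterminals(grammar)


def replace_leftmost(form: List[str], grammar: Grammar, choice: int) -> List[str]:
    """Apply '::=' once: replace the leftmost nonterminal in `form`
    with its alternative number `choice` (0-based)."""
    for i, sym in enumerate(form):
        if sym in grammar:
            return form[:i] + grammar[sym][choice] + form[i + 1:]
    return form  # no nonterminal left: the derivation is finished


step1 = replace_leftmost(["<greeting>"], GRAMMAR, 0)  # pick the first alternative
step2 = replace_leftmost(step1, GRAMMAR, 1)           # pick the second alternative
print(step1)
print(step2)
```

Each call performs a single substitution, so a full derivation is a sequence of such steps that ends when only terminals remain.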

Example

As an example, consider this possible BNF for a U.S. postal address:

 <postal-address> ::= <name-part> <street-address> <zip-part>
 
      <name-part> ::= <personal-part> <last-name> <opt-suffix-part> <EOL> 
                    | <personal-part> <name-part>
 
  <personal-part> ::= <first-name> | <initial> "."
 
 <street-address> ::= <house-num> <street-name> <opt-apt-num> <EOL>
 
       <zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>
 
<opt-suffix-part> ::= "Sr." | "Jr." | <roman-numeral> | ""
    <opt-apt-num> ::= <apt-num> | ""

This translates into English as:

  • A postal address consists of a name-part, followed by a street-address part, followed by a zip-code part.
  • A name-part consists of either: a personal-part followed by a last name followed by an optional suffix (Jr., Sr., or dynastic number) and end-of-line, or a personal part followed by a name part (this rule illustrates the use of recursion in BNFs, covering the case of people who use multiple first and middle names and/or initials).
  • A personal-part consists of either a first name or an initial followed by a dot.
  • A street address consists of a house number, followed by a street name, followed by an optional apartment specifier, followed by an end-of-line.
  • A zip-part consists of a town-name, followed by a comma, followed by a state code, followed by a ZIP-code followed by an end-of-line.
  • An opt-suffix-part consists of a suffix, such as "Sr.", "Jr." or a roman-numeral, or an empty string (i.e. nothing).
  • An opt-apt-num consists of an apartment number or an empty string (i.e. nothing).

Note that many things (such as the format of a first-name, apartment specifier, ZIP-code, and Roman numeral) are left unspecified here. If necessary, they may be described using additional BNF rules.
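
As a rough illustration of how these rules generate addresses, the following Python sketch expands <postal-address> by random derivation. The seven rules in POSTAL are transcribed from the grammar above; everything in PLACEHOLDERS (the sample names, streets, codes, and the choice of a newline for <EOL>) is invented here to stand in for the rules the example leaves unspecified, and the generate and render helpers are likewise assumptions of this sketch.

```python
import random

# Rules transcribed from the postal-address example; "" is the empty alternative.
POSTAL = {
    "<postal-address>": [["<name-part>", "<street-address>", "<zip-part>"]],
    "<name-part>": [
        ["<personal-part>", "<last-name>", "<opt-suffix-part>", "<EOL>"],
        ["<personal-part>", "<name-part>"],
    ],
    "<personal-part>": [["<first-name>"], ["<initial>", "."]],
    "<street-address>": [["<house-num>", "<street-name>", "<opt-apt-num>", "<EOL>"]],
    "<zip-part>": [["<town-name>", ",", "<state-code>", "<ZIP-code>", "<EOL>"]],
    "<opt-suffix-part>": [["Sr."], ["Jr."], ["<roman-numeral>"], [""]],
    "<opt-apt-num>": [["<apt-num>"], [""]],
}

# Hypothetical stand-ins for the symbols the example leaves unspecified.
PLACEHOLDERS = {
    "<first-name>": [["Ada"], ["Grace"]],
    "<initial>": [["A"], ["G"]],
    "<last-name>": [["Lovelace"], ["Hopper"]],
    "<roman-numeral>": [["II"], ["III"]],
    "<house-num>": [["12"], ["221"]],
    "<street-name>": [["Main Street"], ["Elm Avenue"]],
    "<apt-num>": [["Apt 4"], ["Apt 9"]],
    "<town-name>": [["Springfield"]],
    "<state-code>": [["IL"], ["MA"]],
    "<ZIP-code>": [["62704"], ["01101"]],
    "<EOL>": [["\n"]],
}

GRAMMAR = {**POSTAL, **PLACEHOLDERS}


def generate(symbol, rng):
    """Expand `symbol` into a list of terminal strings, choosing one
    alternative at random whenever a rule offers a choice."""
    if symbol not in GRAMMAR:
        return [symbol]  # a terminal stands for itself
    tokens = []
    for sym in rng.choice(GRAMMAR[symbol]):
        tokens.extend(generate(sym, rng))
    return tokens


def render(tokens):
    """Join terminals with spaces, then tidy spacing around punctuation
    and line ends. Empty strings contribute nothing."""
    text = " ".join(tok for tok in tokens if tok != "")
    for old, new in ((" \n", "\n"), ("\n ", "\n"), (" ,", ","), (" .", ".")):
        text = text.replace(old, new)
    return text


print(render(generate("<postal-address>", random.Random(0))), end="")
```

The recursive alternative of <name-part> is what allows the generated name line to contain several first names or initials before the last name.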

Further examples

BNF's syntax itself may be represented with a BNF like the following:

 <syntax>         ::= <rule> | <rule> <syntax>
 <rule>           ::= <opt-whitespace> "<" <rule-name> ">" <opt-whitespace> "::=" <opt-whitespace> <expression> <line-end>
 <opt-whitespace> ::= " " <opt-whitespace> | ""
 <expression>     ::= <list> | <list> "|" <expression>
 <line-end>       ::= <opt-whitespace> <EOL> | <line-end> <line-end>
 <list>           ::= <term> | <term> <opt-whitespace> <list>
 <term>           ::= <literal> | "<" <rule-name> ">"
 <literal>        ::= '"' <text> '"' | "'" <text> "'"

Note that "" is the empty string, i.e. no whitespace.

The original BNF did not use quotes as shown in the <literal> rule.

This assumes that no whitespace is necessary for proper interpretation of the rule. <EOL> represents the appropriate line-end specifier (in ASCII, carriage-return and/or line-feed, depending on the operating system). <rule-name> and <text> are to be substituted with a declared rule's name/label or literal text, respectively.

In the U.S. postal address example above, the entire block-quote is a syntax. Each line or unbroken grouping of lines is a rule; for example one rule begins with "<name-part> ::=". The other part of that rule (aside from a line-end) is an expression, which consists of two lists separated by a pipe "|". These two lists consist of terms (four terms and two terms, respectively, counting <EOL> as a term). Each term in this particular rule is a rule-name.
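
The decomposition of a rule into a rule-name, an expression, lists and terms can be sketched in Python as follows. This is only an illustrative reader for the quoted-literal style used in this section, not a complete BNF parser; the names TOKEN, tokenize and parse_rule are assumptions of this sketch, and a rule that spans several lines is simply passed in as one string.

```python
import re

# One token: '::=', '|', a <rule-name>, or a quoted literal in either quote style.
TOKEN = re.compile(r'''\s*(::=|\||<[^<>]+>|"[^"]*"|'[^']*')''')


def tokenize(text):
    """Split the text of one rule into tokens, skipping whitespace."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected text at offset {pos}: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_rule(text):
    """Return (rule_name, lists), where `lists` holds one list of terms
    per alternative of the expression."""
    tokens = tokenize(text)
    if len(tokens) < 3 or not tokens[0].startswith("<") or tokens[1] != "::=":
        raise SyntaxError("expected '<rule-name> ::= expression'")
    lists, current = [], []
    for tok in tokens[2:]:
        if tok == "|":          # a pipe closes the current list
            lists.append(current)
            current = []
        else:
            current.append(tok)
    lists.append(current)
    if any(not terms for terms in lists):
        raise SyntaxError("empty alternative; write \"\" for the empty string")
    return tokens[0], lists


rule = """<name-part> ::= <personal-part> <last-name> <opt-suffix-part> <EOL>
                    | <personal-part> <name-part>"""
name, lists = parse_rule(rule)
print(name)
for terms in lists:
    print(len(terms), terms)
```

Applied to the <name-part> rule, the sketch yields one list per alternative, which makes the rule, expression, list and term layers of the grammar above directly visible.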

Variants

There are many variants and extensions of BNF, generally either for the sake of simplicity and succinctness, or to adapt it to a specific application. One common feature of many variants is the use of regular expression repetition operators such as * and +. The Extended Backus–Naur Form (EBNF) is a common one. In fact the example above is not the pure form invented for the ALGOL 60 report. The bracket notation "[ ]" was introduced a few years later in IBM's PL/I definition but is now universally recognised. ABNF and RBNF are other extensions commonly used to describe Internet Engineering Task Force (IETF) protocols.

Parsing expression grammars build on the BNF and regular expression notations to form an alternative class of formal grammar, which is essentially analytic rather than generative in character.

Many BNF specifications found online today are intended to be human readable and are non-formal. These often include many of the following syntax rules and extensions:

  • Optional items enclosed in square brackets: [<item-x>].
  • Items repeating 0 or more times are enclosed in curly brackets or suffixed with an asterisk ('*'), such as "<word> ::= <letter> {<letter>}" or "<word> ::= <letter> <letter>*", respectively.
  • Items repeating 1 or more times are suffixed with an addition (plus) symbol ('+').
  • Terminals may appear in bold rather than italics, and nonterminals in plain text rather than angle brackets.
  • Alternative choices in a production are separated by the '|' symbol: <alternative-A> | <alternative-B>.
  • Where items are grouped, they are enclosed in simple parentheses.
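
The optional and repetition extensions listed above add convenience rather than expressive power: each can be rewritten into plain BNF by introducing a helper rule. The Python sketch below shows one way to do this under stated assumptions. The tuple encoding ("opt", ...), ("star", ...), ("plus", ...) and the generated helper names are inventions of this sketch, and grouping with parentheses is not handled.

```python
from itertools import count


def desugar(grammar):
    """Rewrite optional and repeated items into plain BNF rules.

    `grammar` maps a rule name to a list of alternatives. Each item of an
    alternative is either a symbol (str) or a pair (kind, items) with kind
    "opt" for [x], "star" for {x} or x*, and "plus" for x+.
    """
    plain = {}
    fresh = count(1)  # numbering for generated helper rules

    def lower(owner, item):
        if isinstance(item, str):
            return item
        kind, body = item
        helper = f"<{owner.strip('<>')}-{kind}-{next(fresh)}>"
        body = [lower(owner, sub) for sub in body]
        if kind == "opt":        # [x]       becomes  helper ::= x | ""
            plain[helper] = [body, [""]]
        elif kind == "star":     # {x} / x*  becomes  helper ::= x helper | ""
            plain[helper] = [body + [helper], [""]]
        elif kind == "plus":     # x+        becomes  helper ::= x helper | x
            plain[helper] = [body + [helper], body]
        else:
            raise ValueError(f"unknown extension: {kind!r}")
        return helper

    for name, alternatives in grammar.items():
        plain[name] = [[lower(name, item) for item in alt] for alt in alternatives]
    return plain


# <word> ::= <letter> {<letter>}
extended = {"<word>": [["<letter>", ("star", ["<letter>"])]]}
for name, alternatives in desugar(extended).items():
    print(name, "::=", " | ".join(" ".join(s or '""' for s in alt) for alt in alternatives))
```

The empty alternative is written as "" in the output, following the convention used for <opt-suffix-part> and <opt-apt-num> in the postal address example.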

See also

Software using BNF

  • ANTLR, a parser generator written in Java
  • BNF Converter (BNFC[10])
  • Coco/R, compiler generator accepting an attributed grammar in EBNF
  • DMS Software Reengineering Toolkit, program analysis and transformation system for arbitrary languages
  • GOLD BNF parser
  • GNU bison, GNU version of yacc
  • RPA BNF parser.[11] Online (PHP) demo parsing: JavaScript, XML
  • XACT X4MR System,[12] a rule-based expert system for programming language translation
  • XPL Analyzer, a tool which accepts simplified BNF for a language and produces a parser for that language in XPL; the parser may be integrated into the supplied SKELETON program, with which the language may be debugged[13] (a SHARE contributed program, preceded by A Compiler Generator, ISBN 978-0-13-155077-3)
  • Yacc, a parser generator (commonly used with the Lex lexical analyzer generator)
  • bnfparser2,[14] a universal syntax verification utility
  • bnf2xml, a tool that marks up input with XML tags using advanced BNF matching
  • JavaCC (Java Compiler Compiler), a parser generator for Java

References

This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later.

  1. ^ Grune, Dick (1999). Parsing Techniques: A Practical Guide. US: Springer. 
  2. ^ "Panini biography". School of Mathematics and Statistics, University of St Andrews, Scotland. Retrieved 2014-03-22. 
  3. ^ Chomsky, Noam (1956). "Three models for the description of language" (PDF). IRE Transactions on Information Theory 2 (2): 113–24. doi:10.1109/TIT.1956.1056813. 
  4. ^ Chomsky, Noam (1957). Syntactic Structures. The Hague: Mouton. 
  5. ^ Backus, J.W. (1959). "The syntax and semantics of the proposed international algebraic language of the Zurich ACM-GAMM Conference". Proceedings of the International Conference on Information Processing. UNESCO. pp. 125–132. 
  6. ^ Farrell, James A. (August 1995). "Compiler Basics". Extended Backus Naur Form. Archived from the original on 5 June 2011. Retrieved May 11, 2011. 
  7. ^ Fulton, III, Scott M. (20 March 2007). "John W. Backus (1924 - 2007)". betanews.com. BetaNews, Inc. Retrieved Jun 3, 2014. 
  8. ^ Knuth, Donald E. (1964). "Backus Normal Form vs. Backus Naur Form". Communications of the ACM 7 (12): 735–736. doi:10.1145/355588.365140. 
  9. ^ Ingerman, P. Z. (1967). ""Pāṇini Backus Form" suggested". Communications of the ACM 10 (3): 137. doi:10.1145/363162.363165. 
  10. ^ "BNFC", Language technology, SE: Chalmers 
  11. ^ "Online demo", RPatk 
  12. ^ "Tools", Act world, archived from the original on 2006-01-02 
  13. ^ If the target processor is System/360, or related, even up to z/System, and the target language is similar to PL/I (or, indeed, XPL), then the required code "emitters" may be adapted from XPL's "emitters" for System/360.
  14. ^ "BNF parser²", Source forge (project) 

External links

Language grammars