Jump to content

GNU Bison

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 93.97.48.95 (talk) at 21:35, 26 March 2012. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

GNU Bison
Developer(s)The GNU Project
Repository
Operating systemCross-platform
TypeParser generator
LicenseGPL (free software)
Websitewww.gnu.org/software/bison

GNU bison, commonly known as Bison, is a parser generator that is part of the GNU Project. Bison reads a specification of a context-free language, warns about any parsing ambiguities, and generates a parser (either in C, C++, or Java) which reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar. Bison[1] generates LALR parsers. Bison also supports “Generalized Left-to-right Rightmost” (GLR) parsers for grammars that are not LALR.

In POSIX mode, Bison is compatible with yacc, but also supports several improvements over this earlier program. flex, an automatic lexical analyser, is often used with Bison, to tokenise input data and provide Bison with tokens.

Bison is licensed as free software and is available in source code form. Earlier releases of bison used to stipulate that parts of its output were protected under the GPL, due to the inclusion of the yyparse() function from the original source code in the output. However, an exception was made, to allow other licenses to apply to the use of the output.[2]

The name is based on the joke that a bison "does the same thing as a yak".

A complete reentrant parser example

The following example shows how to use bison and flex to write a simple calculator program (only addition and multiplication) and a program for creating an abstract syntax tree. The next two files provide definition and implementation of the syntax tree functions.

/*
 * Expression.h
 * Definition of the structure used to build the syntax tree.
 */
#ifndef __EXPRESSION_H__
#define __EXPRESSION_H__

/**
 * @brief The operation type
 */
typedef enum tagEOperationType
{
    eVALUE,
    eMULTIPLY,
    ePLUS
}EOperationType;

/**
 * @brief The expression structure
 */
typedef struct tagSExpression
{
    EOperationType type;///< type of operation

    int value;///< valid only when type is eVALUE
    struct tagSExpression* left; ///< left side of the tree
    struct tagSExpression* right;///< right side of the tree
}SExpression;

/**
 * @brief It creates an identifier
 * @param value The number value
 * @return The expression or NULL in case of no memory
 */
SExpression* createNumber(int value);

/**
 * @brief It creates an operation
 * @param type The operation type
 * @param left The left operand
 * @param right The right operand
 * @return The expression or NULL in case of no memory
 */
SExpression* createOperation(
    EOperationType type,
    SExpression *left,
    SExpression *right);

/**
 * @brief Deletes a expression
 * @param b The expression
 */
void deleteExpression(SExpression *b);

#endif // __EXPRESSION_H__
/*
 * Expression.c
 * Implementation of functions used to build the syntax tree.
 */

#include "Expression.h"

#include <stdlib.h>

/**
 * @brief Allocates space for expression
 * @return The expression or NULL if not enough memory
 */
static SExpression* allocateExpression()
{
    SExpression* b = malloc(sizeof *b);

    if (b == NULL)
        return NULL;

    b->type = eVALUE;
    b->value = 0;

    b->left = NULL;
    b->right = NULL;

    return b;
}

SExpression* createNumber(int value)
{
    SExpression* b = allocateExpression();

    if (b == NULL)
        return NULL;

    b->type = eVALUE;
    b->value = value;

    return b;
}

SExpression *createOperation(
    EOperationType type,
    SExpression *left,
    SExpression *right)
{
    SExpression* b = allocateExpression();

    if (b == NULL)
        return NULL;

    b->type = type;
    b->left = left;
    b->right = right;

    return b;
}

void deleteExpression(SExpression *b)
{
    if (b == NULL)
        return;

    deleteExpression(b->left);
    deleteExpression(b->right);

    free(b);
}

Since the tokens are provided by flex we must provide the means to communicate between the parser and the lexer. This is accomplished by defining the YYSTYPE. More details on this can be found on the flex manual.[3]

/*
 * TypeParser.h
 * Definition of the structure used internally by the parser and lexer
 * to exchange data.
 */

#ifndef __TYPE_PARSER_H__
#define __TYPE_PARSER_H__

#include "Expression.h"

/**
 * @brief The structure used by flex and bison
 */
typedef union tagTypeParser
{
    SExpression *expression;
    int value;
}STypeParser;

// define the type for flex and bison
#define YYSTYPE STypeParser

#endif // __TYPE_PARSER_H__

Since in this sample we use the reentrant version of both flex and yacc we are forced to provide parameters for the yylex function, when called from yyparse.[3]

/*
 * ParserParam.h
 * Definitions of the parameters for the reentrant functions
 * of flex (yylex) and bison (yyparse)
 */

#ifndef __PARSERPARAM_H__
#define __PARSERPARAM_H__

#ifndef YY_NO_UNISTD_H
#define YY_NO_UNISTD_H 1
#endif // YY_NO_UNISTD_H

#include "TypeParser.h"
#include "Lexer.h"
#include "Expression.h"

/**
 * @brief structure given as argument to the reentrant 'yyparse' function.
 */
typedef struct tagSParserParam
{
    yyscan_t scanner;
    SExpression *expression;
}SParserParam;

// the parameter name (of the reentrant 'yyparse' function)
// data is a pointer to a 'SParserParam' structure
#define YYPARSE_PARAM    data

// the argument for the 'yylex' function
#define YYLEX_PARAM    ((SParserParam*)data)->scanner

#endif // __PARSERPARAM_H__

The tokens needed by the bison parser will be generated using flex.

%{

/*
 * Lexer.l file
 * To generate the lexical analyzer run: "flex --outfile=Lexer.c --header-file=Lexer.h Lexer.l"
 */

#include "TypeParser.h"
#include "Parser.h"

%}

%option reentrant noyywrap never-interactive nounistd
%option bison-bridge

LPAREN      "("
RPAREN      ")"
PLUS        "+"
MULTIPLY    "*"

NUMBER      [0-9]+
WS          [ \r\n\t]*

%%

{WS}            { /* Skip blanks. */ }
{NUMBER}        { sscanf(yytext,"%d",&yylval->value); return TOKEN_NUMBER; }

{MULTIPLY}      { return TOKEN_MULTIPLY; }
{PLUS}          { return TOKEN_PLUS; }
{LPAREN}        { return TOKEN_LPAREN; }
{RPAREN}        { return TOKEN_RPAREN; }
.               {  }

%%

int yyerror(const char *msg) {
    fprintf(stderr,"Error:%s\n",msg); return 0;
}

The bison file providing describing the grammar of the numerical expressions.

%{

/*
 * Parser.y file
 * To generate the parser run: "bison --defines=Parser.h Parser.y"
 */

#include "TypeParser.h"
#include "ParserParam.h"

%}

%define api.pure

%left '+' TOKEN_PLUS
%left '*' TOKEN_MULTIPLY

%token TOKEN_LPAREN
%token TOKEN_RPAREN
%token TOKEN_PLUS
%token TOKEN_MULTIPLY

%token <value> TOKEN_NUMBER

%type <expression> expr

%%

input:
    expr { ((SParserParam*)data)->expression = $1; }
    ;

expr:
      expr TOKEN_PLUS expr { $$ = createOperation( ePLUS, $1, $3 ); }
    | expr TOKEN_MULTIPLY expr { $$ = createOperation( eMULTIPLY, $1, $3 ); }
    | TOKEN_LPAREN expr TOKEN_RPAREN { $$ = $2; }
    | TOKEN_NUMBER { $$ = createNumber($1); }
;

%%

The code needed to obtain the syntax tree using the parser generated by bison and the scanner generated by flex is the following.

#include "ParserParam.h"
#include "Parser.h"
#include "Lexer.h"

#include <stdio.h>

int yyparse(void *param);

static int initParserParam(SParserParam* param)
{
    int ret = 0;

    ret = yylex_init(&param->scanner);
    param->expression = NULL;

    return ret;
}

static int destroyParserParam(SParserParam* param)
{
    return yylex_destroy(param->scanner);
}

SExpression *getAST(const char *expr)
{
    SParserParam p;
    YY_BUFFER_STATE state;

    if (initParserParam(&p))
    {
        // couldn't initialize
        return NULL;
    }

    state = yy_scan_string(expr, p.scanner);

    if (yyparse(&p))
    {
        // error parsing
        return NULL;
    }

    yy_delete_buffer(state, p.scanner);

    destroyParserParam(&p);

    return p.expression;
}

int evaluate(SExpression *e)
{
    switch (e->type)
    {
        case eVALUE:
            return e->value;
        case eMULTIPLY:
            return evaluate(e->left) * evaluate(e->right);
        case ePLUS:
            return evaluate(e->left) + evaluate(e->right);
        default:
            // shouldn't be here
            return 0;
    }
}

int main(void)
{
    SExpression *e = NULL;
    char test[]=" 4 + 2*10 + 3*( 5 + 1 )";
    int result = 0;

    e = getAST(test);

    result = evaluate(e);

    printf("Result of '%s' is %d\n", test, result);

    deleteExpression(e);

    return 0;
}

Issues

Reentrancy

Normally, Bison generates a parser which is not reentrant. In order to achieve reentrancy the declaration %define api.pure must be used. More details on Bison reentrancy can be found in the Bison manual.[4]

Using bison from other languages

Bison can only generate code for C, C++ and Java.[5] For using the bison generated parser from other languages a language binding tool such as SWIG can be used.

Where is it used?

Here is a non-comprehensive list of software built using Bison:

  • The Ruby Programming Language (YARV);
  • The PHP Programming Language (Zend Parser);
  • GCC started out using Bison, but switched to a hand-written parser for C++ in 2004[6];
  • The Go Programming Language (GC);
  • Bash shell uses a yacc grammar for parsing the command input. It is distributed with bison-generated files.

References

  1. ^ Levine, John (August 2009). flex & bison. O'Reilly Media. p. 304. ISBN 978-0-596-15597-1. {{cite book}}: Cite has empty unknown parameter: |coauthors= (help)
  2. ^ GNU Bison Manual: Conditions for Using Bison
  3. ^ a b GNU Bison Manual: C Scanners with Bison Parsers
  4. ^ GNU Bison Manual: A Pure (Reentrant) Parser
  5. ^ GNU Bison Manual: Bison Declaration Summary
  6. ^ GCC 3.4 Release Series Changes, New Features, and Fixes