Beaver - a LALR Parser Generator

Beaver

Introduction

What is Beaver

Beaver is a LALR(1) parser generator. It takes a context free grammar and converts it into a Java class that implements a parser for the language described by the grammar. Beaver accepts grammars expressed in the Extended Backus-Naur form(EBNF).

How Beaver Works

On the outside Beaver is not that different from any other parser generator. It reads the specification of the language for which a parser needs to be generated and produces a Java source file with a Java class that represents a parser for the language. It should be pointed out, though, that a language here should be viewed as a general term describing a linear form representing some structured information, not necessarily a programming language.

The inner workings of Beaver's parsing engine use some interesting techniques which make it really fast, probably as fast as a LARL parser can get:

An option to use "switch"-ing for an action routine code is also available. Though it's slower, it does not suffer from the size overhead created by delegates (inner classes). When space is an issue or when the top speed is not a priority, generating a parser that switch-es between reduce action routines is a viable option.

Beaver's compiler can be run from the command line or as an Ant task. It goes beyond that though, and provides a simple interface for starting it in other environments and by other means. This flexibility makes it easy to integrate Beaver with other applications and development tools.

The parser that Beaver generates represents only a part of the translation process -- it performs a syntactic analysis and assembles input tokens into structures as described by the production rules of the language specification. Those tokens that Beaver consumes as its input need to be produced and supplied by a scanner. The API that Beaver provides for integration with scanners makes it easy to use with such popular tools as JFlex and JLex.

In order to run a Beaver-generated parser, the scanner and parser need to be generated and compiled, and the resultant "generated" parser needs to be run with the Beaver-supplied runtime parser engine. Your generated parser source code represents only parsing tables and action routines that drive the parsing and translation process. After these two items, parsing tables and runtime parsing engine, are combined at runtime, they become a working parser.

Though it is possible to use a generated parser immediately as-is, typically a compiler writer will implement the real parser by extending the functionality which the generated parser provides. Most significantly, the generated parsers report all problems with the input source through System.err. In order to modify the error handling behavior, a parser needs to override certain methods to intercept error events.