Beaver - a LALR Parser Generator

Error Recovery

Recovery Method

To recover from a syntax error Beaver's parsing engine uses simple token stream repairs and a phrase-level error recovery.

How It Works

Whenever parser detects a syntax error, in other words enconters an unexpected terminal symbol, it reports it by calling overridable Events.syntaxError method and then attempts to perform a simple token stream repair. First, it removes an unexpected terminal from the stream. If that was unsuccessful, it inserts before erroneous token one by one terminals that are expected in the current state. If that also has not helped, erroneous terminal is replaced by one that is expected by parser. After each attempt to repair a token stream parser performs a dry-run to check whether a repair will make further parsing possible. During dry-run no action code associated with production rules is executed, i.e. parser simulates a future parsing. If during this simulation it was able to shift successfully 3 terminals, stream is considered repaired and syntax error recovered. Otherwise parser continues by starting phrase-level error recovery.

During phrase-level error recovery parser replaces some symbols around the erroneous terminal with error nonterminal and continues parsing. To find the start of the error phrase parser backtracks a little until it is finds a state where it is able to shift the error symbol. Symbols that were removed from the parsing stack by backtracking process match left context of the error phrase. These are removed and the error symbol is shifted. Now parser will try to match the right context of the error phrase. For this it attempts to parse ahead starting from the state with the error symbol on top of the parsing stack. Action routines are not executed during this "trial" parsing. While trying to parse ahead parser discards all terminals from the input stream that prevent successful parsing. These discarded terminals become a right context for the error phrase. The parsing ahead continues until parser is able to "shift" successfully 3 terminals or is able to reduce the grammar's goal. At this point parser considers syntax error recovered and returns to the regular parsing using those successfull terminals as the continuation of the input stream.

Important Facts

Expected tokens insertion and replacement during simple token stream repairs are performed only if parsing tables weren't compressed - only in that case they keep all lookahead terminals. Alternatively, for compressed parsing tables, an additional table that lists lookaheads for all states might be used. The latter is not implemented (yet?) though.

If the grammar has not defined how to match error phrases, in other words error symbol was not used on a right-hand side of any production, parser won't be able to recover from the error. Therefore any syntax error will be the last error found, unless simple token stream repair recovered from a syntax error first.

If parser cannot recover from a syntax error it raises Parser.Exception.

The value of the error symbol is null. This fact is probably irrelevant, but in case if you try to access error symbol value though an alias in an action routine you'll need to know this.

During backtracking parser will remove already reduced nonterminals from the parsing stack to find the left context of the error phrase. If action routines that were executed while those symbols were reduced do not have side-effects, i.e. they produced new value or a symbol, but have not changed anything else, then everything is fine. Shifted symbols will be discarded without any ill effects. Side-effects of action routines cannot be discarded, and further action routines of the generated parser may start behaving unpredictably.

Another source of problems during parsing is a scanner inability to match a part of input to a known terminal. Scanners raise Scanner.Exception then, which parser catches and reports via overridable scannerError method. Parser will continue requesting tokens from a scanner until the latter returns a token. Therefore compliant scanners need to be able to return at least an end-of-file token after an exception was raised.