To recover from a syntax error Beaver's parsing engine uses simple token stream repairs and a phrase-level error recovery.
Whenever parser detects a syntax error, in other words enconters an unexpected terminal symbol, it reports it by calling overridable
method and then attempts to perform a simple token stream repair. First, it removes an unexpected terminal from the stream. If that was unsuccessful, it inserts before
erroneous token one by one terminals that are expected in the current state. If that also has not helped, erroneous terminal is replaced by one that is expected by parser.
After each attempt to repair a token stream parser performs a dry-run to check whether a repair will make further parsing possible. During dry-run no action code associated
with production rules is executed, i.e. parser simulates a future parsing. If during this simulation it was able to shift successfully 3 terminals, stream is
considered repaired and syntax error recovered. Otherwise parser continues by starting phrase-level error recovery.
During phrase-level error recovery parser replaces some symbols around the erroneous terminal with
error nonterminal and continues parsing. To find the
start of the error phrase parser backtracks a little until it is finds a state where it is able to shift the
error symbol. Symbols that were removed from
the parsing stack by backtracking process match left context of the error phrase. These are removed and the
error symbol is shifted. Now parser will try to
match the right context of the error phrase. For this it attempts to parse ahead starting from the state with the
error symbol on top of the parsing stack.
Action routines are not executed during this "trial" parsing. While trying to parse ahead parser discards all terminals from the input stream that prevent successful
parsing. These discarded terminals become a right context for the error phrase. The parsing ahead continues until parser is able to "shift" successfully 3 terminals or
is able to reduce the grammar's goal. At this point parser considers syntax error recovered and returns to the regular parsing using those successfull terminals as the
continuation of the input stream.
Expected tokens insertion and replacement during simple token stream repairs are performed only if parsing tables weren't compressed - only in that case they keep all lookahead terminals. Alternatively, for compressed parsing tables, an additional table that lists lookaheads for all states might be used. The latter is not implemented (yet?) though.
If the grammar has not defined how to match error phrases, in other words
error symbol was not used on a right-hand side of any production, parser
won't be able to recover from the error. Therefore any syntax error will be the last error found, unless simple token stream repair recovered from a syntax error first.
If parser cannot recover from a syntax error it raises
The value of the error symbol is
null. This fact is probably irrelevant, but in case if you try to access error symbol value though an alias in
an action routine you'll need to know this.
During backtracking parser will remove already reduced nonterminals from the parsing stack to find the left context of the error phrase. If action routines that were executed while those symbols were reduced do not have side-effects, i.e. they produced new value or a symbol, but have not changed anything else, then everything is fine. Shifted symbols will be discarded without any ill effects. Side-effects of action routines cannot be discarded, and further action routines of the generated parser may start behaving unpredictably.
Another source of problems during parsing is a scanner inability to match a part of input to a known terminal. Scanners raise
Scanner.Exception then, which
parser catches and reports via overridable
scannerError method. Parser will continue requesting tokens from a scanner until the latter returns a token. Therefore
compliant scanners need to be able to return at least an end-of-file token after an exception was raised.