For historical reasons, Bison constructs LALR(1) parser tables by default. However, LALR does not possess the full language-recognition power of LR. As a result, the behavior of parsers employing LALR parser tables is often mysterious. We presented a simple example of this effect in Mysterious Conflicts.
As we also demonstrated in that example, the traditional approach to eliminating such mysterious behavior is to restructure the grammar. Unfortunately, doing so correctly is often difficult. Moreover, merely discovering that LALR causes mysterious behavior in your parser can be difficult as well.
Fortunately, Bison provides an easy way to eliminate the possibility of such mysterious behavior altogether. You simply need to activate a more powerful parser table construction algorithm by using the %define lr.type directive.
The directive takes the form %define lr.type type, which specifies the type of parser tables within the LR(1) family. The accepted values for type are:
lalr (default)
ielr
canonical-lr
(This feature is experimental. More user feedback will help to stabilize it.)
For example, to activate IELR, you might add the following directive to your grammar file:
%define lr.type ielr
For the example in Mysterious Conflicts, the mysterious conflict is then eliminated, so there is no need to invest time in comprehending the conflict or restructuring the grammar to fix it. If, during future development, the grammar evolves such that all mysterious behavior would have disappeared using just LALR, you need not fear that continuing to use IELR will result in unnecessarily large parser tables. That is, IELR generates LALR tables when LALR (using a deterministic parsing algorithm) is sufficient to support the full language-recognition power of LR. Thus, by enabling IELR at the start of grammar development, you can safely and completely eliminate the need to consider LALR’s shortcomings.
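To make this concrete, here is a sketch of a complete grammar file with IELR enabled. The rules are reproduced from memory of the Mysterious Conflicts example and should be checked against that section. The grammar is LR(1), but an identifier at the start of a param_spec and at the start of a return_spec lands in states that LALR merges.

```yacc
/* Sketch only: rules recalled from the Mysterious Conflicts section,
   not copied from it verbatim.  Verify against the manual. */

/* Request IELR tables.  With this line removed, Bison falls back to
   LALR and should report a reduce/reduce conflict between the "type"
   and "name" rules. */
%define lr.type ielr

%token ID

%%

def: param_spec return_spec ',';

param_spec:
  type
| name_list ':' type
;

return_spec:
  type
| name ':' type
;

type: ID;

name: ID;

name_list:
  name
| name ',' name_list
;
```

Running bison on this file with and without the %define line is a quick way to see whether a conflict comes from LALR state merging or from the grammar itself.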
While IELR is almost always preferable, there are circumstances where LALR or the canonical LR parser tables described by Knuth (see Knuth 1965) can be useful. Here we summarize the relative advantages of each parser table construction algorithm within Bison:
There are at least two scenarios where LALR can be worthwhile:
When employing GLR parsers (see GLR Parsers), if you do not resolve any conflicts statically (for example, with %left or %precedence), then the parser explores all potential parses of any given input. In this case, the choice of parser table construction algorithm is guaranteed not to alter the language accepted by the parser. LALR parser tables are the smallest parser tables Bison can currently construct, so they may then be preferable. Nevertheless, once you begin to resolve conflicts statically, GLR behaves more like a deterministic parser in the syntactic contexts where those conflicts appear, and so either IELR or canonical LR can then be helpful to avoid LALR’s mysterious behavior.
Occasionally during development, an especially malformed grammar with a major recurring flaw may severely impede the IELR or canonical LR parser table construction algorithm. LALR can be a quick way to construct parser tables in order to investigate such problems while ignoring the more subtle differences from IELR and canonical LR.
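The first of these two scenarios might be set up with declarations like the following. This is a minimal sketch of a grammar file's declarations section, not a complete grammar. It assumes that the rules which follow contain no %left, %right, %nonassoc, or %precedence declarations, so that every conflict is left for the GLR parser to explore at run time.

```yacc
/* Sketch: declarations for a GLR parser with no static conflict
   resolution.  Rules are omitted. */

/* Generate a GLR parser (see GLR Parsers). */
%glr-parser

/* LALR is already the default; it is spelled out here only to make
   the choice visible.  Under the assumption above, switching to ielr
   or canonical-lr would not change the accepted language, only the
   size of the tables. */
%define lr.type lalr
```

Once a precedence or associativity declaration is added to resolve a conflict statically, changing lalr to ielr is the safer choice, for the reasons given in the first scenario.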
IELR (Inadequacy Elimination LR) is a minimal LR algorithm. That is, given any grammar (LR or non-LR), parsers using IELR or canonical LR parser tables always accept exactly the same set of sentences. However, like LALR, IELR merges parser states during parser table construction so that the number of parser states is often an order of magnitude less than for canonical LR. More importantly, because canonical LR’s extra parser states may contain duplicate conflicts in the case of non-LR grammars, the number of conflicts for IELR is often an order of magnitude less as well. This effect can significantly reduce the complexity of developing a grammar.
While inefficient, canonical LR parser tables can be an interesting means to explore a grammar because they possess a property that IELR and LALR tables do not. That is, if %nonassoc is not used and default reductions are left disabled (see Default Reductions), then, for every left context of every canonical LR state, the set of tokens accepted by that state is guaranteed to be the exact set of tokens that is syntactically acceptable in that left context. It might then seem that an advantage of canonical LR parsers in production is that, under the above constraints, they are guaranteed to detect a syntax error as soon as possible without performing any unnecessary reductions. However, IELR parsers that use LAC are also able to achieve this behavior without sacrificing %nonassoc or default reductions. For details and a few caveats of LAC, see LAC.
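The two options compared above could be selected roughly as follows. This is a sketch of the declarations only. The parse.lac variable name and its full value are not given in this section and are taken from the LAC section, so check the spelling against your Bison version.

```yacc
/* Alternative A (exploration): canonical LR tables.  For the property
   described above to hold, do not use %nonassoc and do not enable
   additional default reductions.  Left commented out because a
   grammar file can select only one lr.type. */
/* %define lr.type canonical-lr */

/* Alternative B (production): IELR tables plus LAC.  This aims for
   the same early error detection while still allowing %nonassoc and
   default reductions. */
%define lr.type ielr
%define parse.lac full   /* assumed spelling; see LAC */
```

Alternative B keeps the tables close to LALR size, whereas Alternative A can produce many more parser states, as noted in the IELR summary above.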
For a more detailed exposition of the mysterious behavior in LALR parsers and the benefits of IELR, see Denny 2008 March, and Denny 2010 November.