Bison 3.0.4: LR Table Construction

5.8.1 LR Table Construction

For historical reasons, Bison constructs LALR(1) parser tables by default. However, LALR does not possess the full language-recognition power of LR. As a result, the behavior of parsers employing LALR parser tables is often mysterious. We presented a simple example of this effect in Mysterious Conflicts.

As we also demonstrated in that example, the traditional approach to eliminating such mysterious behavior is to restructure the grammar. Unfortunately, doing so correctly is often difficult. Moreover, merely discovering that LALR causes mysterious behavior in your parser can be difficult as well.

Fortunately, Bison provides an easy way to eliminate the possibility of such mysterious behavior altogether. You simply need to activate a more powerful parser table construction algorithm by using the %define lr.type directive.

Directive: %define lr.type type

Specify the type of parser tables within the LR(1) family. The accepted values for type are:

lalr (default)
ielr
canonical-lr

(This feature is experimental. More user feedback will help to stabilize it.)

For example, to activate IELR, you might add the following directive to your grammar file:

%define lr.type ielr

For the example in Mysterious Conflicts, the mysterious conflict is then eliminated, so there is no need to invest time in comprehending the conflict or restructuring the grammar to fix it. If, during future development, the grammar evolves such that all mysterious behavior would have disappeared using just LALR, you need not fear that continuing to use IELR will result in unnecessarily large parser tables. That is, IELR generates LALR tables when LALR (using a deterministic parsing algorithm) is sufficient to support the full language-recognition power of LR. Thus, by enabling IELR at the start of grammar development, you can safely and completely eliminate the need to consider LALR’s shortcomings.
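
To make the effect concrete, here is a minimal sketch of a complete grammar file with the directive in place. The grammar is modeled on the kind of grammar discussed in Mysterious Conflicts; the file name, the token name ID, and the exact rule layout are assumptions made for illustration. The scanner and error-reporting functions are omitted, so this is enough for Bison to build tables but not to link a working parser.

```yacc
/* mysterious.y (hypothetical file name): LR(1) but not LALR(1). */

/* Select the table construction algorithm. Changing "ielr" to "lalr"
   should make Bison report a reduce/reduce conflict for this grammar. */
%define lr.type ielr

%token ID

%%

def: param_spec return_spec ',';

param_spec:
  type
| name_list ':' type
;

return_spec:
  type
| name ':' type
;

/* After a single ID, the parser must choose between these two
   reductions. The right choice for a following ',' depends on whether
   the ID appeared in a param_spec or in a return_spec. */
type: ID;

name: ID;

name_list:
  name
| name ',' name_list
;
```

After an ID in a param_spec, a following ',' calls for reducing to name, whereas after an ID in a return_spec it calls for reducing to type. LALR merges the two states that hold these reductions, so the lookahead sets overlap on ','. IELR keeps the states apart only where merging would introduce such a conflict.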

While IELR is almost always preferable, there are circumstances where LALR or the canonical LR parser tables described by Knuth (see Knuth 1965) can be useful. Here we summarize the relative advantages of each parser table construction algorithm within Bison:

LALR
There are at least two scenarios where LALR can be worthwhile:
- GLR without static conflict resolution.
  When employing GLR parsers (see GLR Parsers), if you do not resolve any conflicts statically (for example, with %left or %precedence), then the parser explores all potential parses of any given input. In this case, the choice of parser table construction algorithm is guaranteed not to alter the language accepted by the parser. LALR parser tables are the smallest parser tables Bison can currently construct, so they may then be preferable. Nevertheless, once you begin to resolve conflicts statically, GLR behaves more like a deterministic parser in the syntactic contexts where those conflicts appear, and so either IELR or canonical LR can then be helpful to avoid LALR’s mysterious behavior.
- Malformed grammars.
  Occasionally during development, an especially malformed grammar with a major recurring flaw may severely impede the IELR or canonical LR parser table construction algorithm. LALR can be a quick way to construct parser tables in order to investigate such problems while ignoring the more subtle differences from IELR and canonical LR.
IELR
IELR (Inadequacy Elimination LR) is a minimal LR algorithm. That is, given any grammar (LR or non-LR), parsers using IELR or canonical LR parser tables always accept exactly the same set of sentences. However, like LALR, IELR merges parser states during parser table construction so that the number of parser states is often an order of magnitude smaller than for canonical LR. More importantly, because canonical LR’s extra parser states may contain duplicate conflicts in the case of non-LR grammars, the number of conflicts for IELR is often an order of magnitude smaller as well. This effect can significantly reduce the complexity of developing a grammar.
Canonical LR
While inefficient, canonical LR parser tables can be an interesting means to explore a grammar because they possess a property that IELR and LALR tables do not. That is, if %nonassoc is not used and default reductions are left disabled (see Default Reductions), then, for every left context of every canonical LR state, the set of tokens accepted by that state is guaranteed to be the exact set of tokens that is syntactically acceptable in that left context. It might then seem that an advantage of canonical LR parsers in production is that, under the above constraints, they are guaranteed to detect a syntax error as soon as possible without performing any unnecessary reductions. However, IELR parsers that use LAC are also able to achieve this behavior without sacrificing %nonassoc or default reductions. For details and a few caveats of LAC, see LAC.
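
The scenarios summarized above can be sketched as alternative sets of declarations. This is an illustrative sketch rather than a recommended configuration: the one-rule grammar and the token name ID are placeholders, and only one group of directives should be active at a time, since lr.type can be defined only once. The variables lr.default-reduction and parse.lac are described in Default Reductions and LAC; consult those sections for their caveats and for which parser types support them.

```yacc
/* Alternative settings; enable exactly one group. */

/* (a) GLR with no static conflict resolution: the accepted language
   does not depend on lr.type, so the smallest tables (LALR) suffice. */
/* %glr-parser */
/* %define lr.type lalr */

/* (b) Canonical LR for exploring a grammar: restrict default
   reductions to the accepting state and avoid %nonassoc. */
/* %define lr.type canonical-lr */
/* %define lr.default-reduction accepting */

/* (c) IELR with LAC: aims at early syntax error detection while
   still allowing %nonassoc and default reductions. */
%define lr.type ielr
%define parse.lac full

%token ID   /* placeholder token, assumed for this sketch */

%%

start: ID;  /* placeholder rule; substitute the real grammar */
```

Group (c) is left active because the text above presents IELR as almost always preferable; the other two groups are commented out so that the file remains acceptable to Bison as written.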

For a more detailed exposition of the mysterious behavior in LALR parsers and the benefits of IELR, see Denny 2008 March and Denny 2010 November.