Technically GLR works fine, but it annoys me that some decisions have to be postponed until run time. Consider this Skila code:

    abstract class...
    static def... // function
    static let... // field

and the grammar for it:

    class_mod = ε | class_mod ABSTRACT | class_mod STATIC | ...
    class = class_mod CLASS ...
    func_mod = ε | func_mod ABSTRACT | func_mod STATIC | ...
    func = func_mod DEF ...
    field = STATIC? LET

This creates a shift-reduce/reduce-reduce conflict, and when GLR parses `abstract override static ...` it forks on each keyword, waiting for the right moment to kill the incorrect parse trees. Could we do better? Yes: we know in advance that we are waiting for the `class`, `def` or `let` keyword. All we have to do is incorporate that knowledge into the generated parser, so that instead of forking the tree it checks the incoming data.
To achieve this we have to make just small modifications to the parser generator: along with the first sets and follow sets we add two new ones. The cover set for a given symbol is a bag of the symbols which appear on the right-hand side (RHS) of the productions for this symbol, directly or indirectly. For example, the cover set for `func` would contain, among other things, `DEF`, `func_mod` and `ABSTRACT`. The second new set is (surprise, surprise) the horizon set. It is very similar to the follow set, but unlike the follow set it does not mindlessly tell us everything that can show up right after a given symbol (for `class_mod` that would include `ABSTRACT`). The horizon set ignores reductions/expansions such as those shown in the first and the third line of the grammar and focuses solely on true reductions of the symbol, that is, reductions which cannot be recursively triggered again. So for `class_mod` the horizon set has only one symbol: `CLASS`.
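As a minimal sketch of the cover set definition above (not the generator's actual code), the sets can be grown until nothing changes. The dictionary `GRAMMAR` is a trimmed version of the grammar from this post with the elided parts left out, the name `cover_sets` is hypothetical, and a plain Python set stands in for the "bag".

```python
# Trimmed version of the grammar from the post; an empty list is the ε production.
# Anything that is not a key of GRAMMAR is treated as a terminal.
GRAMMAR = {
    "class_mod": [[], ["class_mod", "ABSTRACT"], ["class_mod", "STATIC"]],
    "class": [["class_mod", "CLASS"]],
    "func_mod": [[], ["func_mod", "ABSTRACT"], ["func_mod", "STATIC"]],
    "func": [["func_mod", "DEF"]],
    "field": [["LET"], ["STATIC", "LET"]],
}


def cover_sets(grammar):
    """For each non-terminal, collect every symbol reachable from its RHSs."""
    cover = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no set grows any more
        changed = False
        for lhs, productions in grammar.items():
            for rhs in productions:
                for sym in rhs:
                    # the symbol itself (direct) plus what it covers (indirect)
                    reachable = {sym} | cover.get(sym, set())
                    if not reachable <= cover[lhs]:
                        cover[lhs] |= reachable
                        changed = True
    return cover


cover = cover_sets(GRAMMAR)
print(sorted(cover["func"]))
```

Note that in this sketch `func_mod` ends up in its own cover set because it appears on the RHS of its own productions, which is what later lets the horizon computation recognise those productions as recursive.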
Having first sets, follow sets and cover sets, computing horizon sets is easy: ignore every production whose left-hand side (LHS) is covered by the given symbol, and for the other productions take the first symbols of what comes right after our symbol. If that does not yield enough data (for example, nothing follows our symbol in the production), take the rest from the follow set of the LHS of the current production. In other words, it is exactly the same algorithm as for the follow sets, only this time we narrow down the productions with the help of the cover sets, and because the follow sets are already computed we don't have to bother with recurrent computations; we just grab the needed data.
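The following sketch walks through that recipe for a single symbol of lookahead. It is an illustration under assumptions, not the uploaded implementation: the `unit` start rule, the `$` end marker and all function names are hypothetical, and "too little data" is read here as "the rest of the production can be empty".

```python
EOF = "$"  # assumed end-of-input marker
GRAMMAR = {
    "unit": [["class"], ["func"], ["field"]],  # hypothetical start rule
    "class_mod": [[], ["class_mod", "ABSTRACT"], ["class_mod", "STATIC"]],
    "class": [["class_mod", "CLASS"]],
    "func_mod": [[], ["func_mod", "ABSTRACT"], ["func_mod", "STATIC"]],
    "func": [["func_mod", "DEF"]],
    "field": [["LET"], ["STATIC", "LET"]],
}


def first_of(seq, first, nullable):
    """FIRST of a symbol sequence, plus a flag: can the whole sequence be empty?"""
    out = set()
    for sym in seq:
        if sym not in GRAMMAR:  # terminal
            return out | {sym}, False
        out |= first[sym]
        if sym not in nullable:
            return out, False
    return out, True


def after(rhs, i, lhs, first, nullable, follow):
    """Terminals that can come right after rhs[i] within this production."""
    found, can_be_empty = first_of(rhs[i + 1:], first, nullable)
    return found | follow[lhs] if can_be_empty else found


def grow(target, extra):
    """Add extra to target in place; report whether target actually grew."""
    if extra <= target:
        return False
    target |= extra
    return True


def analyse():
    """Compute nullable, first, follow and cover sets in one fixed-point loop."""
    nullable = set()
    first = {nt: set() for nt in GRAMMAR}
    cover = {nt: set() for nt in GRAMMAR}
    follow = {nt: set() for nt in GRAMMAR}
    follow["unit"].add(EOF)
    changed = True
    while changed:
        changed = False
        for lhs, productions in GRAMMAR.items():
            for rhs in productions:
                if all(sym in nullable for sym in rhs):
                    changed |= grow(nullable, {lhs})
                changed |= grow(first[lhs], first_of(rhs, first, nullable)[0])
                for i, sym in enumerate(rhs):
                    changed |= grow(cover[lhs], {sym} | cover.get(sym, set()))
                    if sym in GRAMMAR:
                        changed |= grow(
                            follow[sym], after(rhs, i, lhs, first, nullable, follow)
                        )
    return nullable, first, follow, cover


def horizon_sets(nullable, first, follow, cover):
    """Same walk as for follow sets, but skipping productions whose LHS is covered."""
    horizon = {nt: set() for nt in GRAMMAR}
    for lhs, productions in GRAMMAR.items():
        for rhs in productions:
            for i, sym in enumerate(rhs):
                if sym in GRAMMAR and lhs not in cover[sym]:
                    # single pass: follow sets are final, so no fixed point is needed
                    horizon[sym] |= after(rhs, i, lhs, first, nullable, follow)
    return horizon


nullable, first, follow, cover = analyse()
horizon = horizon_sets(nullable, first, follow, cover)
print(sorted(follow["class_mod"]), sorted(horizon["class_mod"]))
```

The only difference between the follow set loop and `horizon_sets` is the `lhs not in cover[sym]` filter and the fact that the latter runs once instead of iterating to a fixed point.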
For each non-terminal the horizon set is checked against its cover set: if the two overlap, the horizon set is scrapped as unusable. Further computations are performed only when all the horizon sets involved are usable (non-empty).
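A small sketch of that usability check, assuming the cover and horizon sets are already available as dictionaries of sets. The `list` entry is a made-up non-terminal, not part of the grammar in this post, added only to show a horizon that gets scrapped. The function names are hypothetical as well.

```python
def usable_horizons(cover, horizon):
    """Empty out every horizon set that shares a symbol with its own cover set."""
    return {
        nt: set() if hz & cover[nt] else set(hz)
        for nt, hz in horizon.items()
    }


def all_usable(horizons, involved):
    """True when every non-terminal taking part in a conflict kept its horizon."""
    return all(horizons[nt] for nt in involved)


cover = {
    "class_mod": {"class_mod", "ABSTRACT", "STATIC"},
    "func_mod": {"func_mod", "ABSTRACT", "STATIC"},
    "list": {"list", "ITEM", "COMMA"},  # hypothetical, for illustration only
}
horizon = {
    "class_mod": {"CLASS"},
    "func_mod": {"DEF"},
    "list": {"COMMA", "RPAREN"},  # COMMA is also covered, so this one is scrapped
}

checked = usable_horizons(cover, horizon)
print(all_usable(checked, ["class_mod", "func_mod"]))
```

The intuition behind the check in this sketch: if a horizon symbol can also appear inside the symbol itself, seeing it in the input says nothing about whether the symbol has ended.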
Armed with all four sets we are ready to resolve ambiguities. In case of a reduce-reduce conflict we check whether the horizons overlap; if they do not, we can use that data at run time. Shift-reduce resolution is a bit more elaborate: taking the production in the shift state, we process its RHS symbol after symbol, testing whether the cover set of the current symbol overlaps with the horizon of the reduction on the one hand, and with the LHS of the reduction on the other. (When the verification of the entire production is unsuccessful we make one final check: instead of the cover set of the current symbol we use the after-lookaheads.) As before, no overlap means the conflict is resolved, this time using not only the horizon for the reduction action, but also adding the non-overlapping symbol as a cut-off marker to avoid excessive reading of the input.
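The reduce-reduce half of this step is simple enough to sketch. Assuming the horizon sets are given as a dictionary, the hypothetical helper below either builds a table that maps each horizon terminal to the reduction it selects, or reports that the conflict cannot be resolved this way. The shift-reduce check is more involved and is not covered by this sketch.

```python
from itertools import combinations


def reduce_reduce_table(candidates, horizon):
    """Map horizon terminal -> non-terminal to reduce to, or None if unresolvable."""
    # every candidate needs a usable (non-empty) horizon
    if any(not horizon[nt] for nt in candidates):
        return None
    # horizons must be pairwise disjoint, otherwise a terminal would be ambiguous
    for a, b in combinations(candidates, 2):
        if horizon[a] & horizon[b]:
            return None
    return {terminal: nt for nt in candidates for terminal in horizon[nt]}


horizon = {"class_mod": {"CLASS"}, "func_mod": {"DEF"}}
table = reduce_reduce_table(["class_mod", "func_mod"], horizon)
print(table)
```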
In our example, to resolve the reduce-reduce conflict at run time `class_mod` and `func_mod` would run over the stream of input symbols until they find their horizons, `CLASS` or `DEF` respectively; whichever is found first, the corresponding reduction wins. However, because we have a shift production in the mix, we stop short on `LET` (this is the cut-off for both reduction actions), and if this happens the shift action wins.
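The run-time side of the example can be sketched as a plain scan over the upcoming tokens. The function name, the token spelling and the return values are assumptions made for illustration. The post also does not say what happens when neither a horizon nor the cut-off is found, so this sketch just returns `None` in that case.

```python
def resolve_at_runtime(tokens, reductions, cutoffs):
    """Scan ahead and pick an action.

    reductions: horizon terminal -> non-terminal to reduce to
    cutoffs:    terminals at which scanning stops and the shift wins
    """
    for token in tokens:
        if token in reductions:
            return ("reduce", reductions[token])
        if token in cutoffs:
            return ("shift", None)
    return None  # assumption: undecided, left to the caller


REDUCTIONS = {"CLASS": "class_mod", "DEF": "func_mod"}
CUTOFFS = {"LET"}

print(resolve_at_runtime(["ABSTRACT", "STATIC", "DEF"], REDUCTIONS, CUTOFFS))
print(resolve_at_runtime(["STATIC", "LET"], REDUCTIONS, CUTOFFS))
```

With these tables the scan skips over modifier keywords and commits as soon as it sees `CLASS`, `DEF` or `LET`, so the parser never has to fork for this conflict.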
The code is uploaded, so take a look if you are interested in implementation.