Two wrong approaches to optimization later, the GLR parser is ready for download.
In my case the last missing step was upgrading the LALR parser to MLR. I followed the great explanation given by David R. Tribble (of course, all errors in the implementation are mine). One thing that confused me was the term “lookahead”: a lot of people seem to use it for two different things, so I introduced the terms “after lookahead” and “next lookahead” to make clear which is which. The first is used when building a parser and denotes the terminals that can show up after the entire production is consumed (so it does not matter how much of the production has been processed). The “next lookahead” is used when parsing and denotes the terminals that can show up next in the input stream (for the reduce states of productions the two sets are identical).
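To make the distinction concrete, here is a minimal sketch in Python. It is not the code from my parser: the toy grammar, the `Item` class and the `next_lookahead` function are hypothetical names made up for illustration, and the sketch assumes a grammar without empty productions so that the first symbol of a right-hand side is enough to compute a first set.

```python
from dataclasses import dataclass

# Hypothetical toy grammar (no empty productions, by assumption).
GRAMMAR = {
    "S": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("id",), ("(", "E", ")")],
}


def is_terminal(symbol):
    return symbol not in GRAMMAR


def compute_first():
    """Fixed-point computation of FIRST for every nonterminal."""
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                head = rhs[0]  # enough here, because nothing derives empty
                new = {head} if is_terminal(head) else first[head]
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = compute_first()


@dataclass(frozen=True)
class Item:
    lhs: str
    rhs: tuple
    dot: int            # how much of the production is already processed
    after: frozenset    # "after lookahead": fixed for the whole production


def next_lookahead(item):
    """Terminals that can show up next in the input for this item."""
    if item.dot == len(item.rhs):
        # Reduce item: next lookahead and after lookahead coincide.
        return set(item.after)
    symbol = item.rhs[item.dot]
    return {symbol} if is_terminal(symbol) else set(FIRST[symbol])


after = frozenset({"$", "+"})
print(next_lookahead(Item("E", ("E", "+", "T"), 1, after)))  # shift item
print(next_lookahead(Item("E", ("E", "+", "T"), 3, after)))  # reduce item
```

Note that the after lookahead stays the same while the dot moves through the production; only the next lookahead changes, and the two meet once the dot reaches the end.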
Besides that, implementing the MLR parser was easier than the LALR one: you don’t have to compute follow sets, and there is less work with the NFA and DFA.
As for optimization, I went with pre-computing first sets for entire chunks of the symbols present in productions, and with stamping NFA states whenever their after-lookaheads get updated. That got me to the point where the lexer is once again the bottleneck of the entire processing.
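The first of those two optimizations can be sketched roughly like this. Again, this is an illustration in Python rather than my actual code: the grammar, the `EPS` marker and all function names are hypothetical, and I assume here that the "chunks" are the suffixes of each production's right-hand side, since those are what you need when computing after-lookaheads for new items.

```python
EPS = ""  # hypothetical marker meaning "this chunk can derive the empty string"

# Hypothetical toy grammar; A and B are nullable.
GRAMMAR = {
    "S": [("A", "B", "c")],
    "A": [("a",), ()],
    "B": [("b",), ()],
}


def chunk_first(chunk, first):
    """FIRST of a sequence of symbols, given FIRST of single nonterminals."""
    result = set()
    for symbol in chunk:
        if symbol not in GRAMMAR:          # terminal ends the scan
            result.add(symbol)
            return result
        result |= first[symbol] - {EPS}
        if EPS not in first[symbol]:       # non-nullable ends the scan
            return result
    result.add(EPS)                        # every symbol was nullable
    return result


def symbol_first():
    """Fixed-point computation of FIRST for every nonterminal."""
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                new = chunk_first(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def precompute_chunks():
    """Compute FIRST once for every suffix of every right-hand side."""
    first = symbol_first()
    table = {}
    for alternatives in GRAMMAR.values():
        for rhs in alternatives:
            for start in range(len(rhs) + 1):
                chunk = rhs[start:]
                if chunk not in table:
                    table[chunk] = frozenset(chunk_first(chunk, first))
    return table


CHUNK_FIRST = precompute_chunks()


def after_lookahead_for(chunk, after):
    """FIRST(chunk followed by `after`) as a lookup plus a set union."""
    cached = CHUNK_FIRST[chunk]
    result = set(cached - {EPS})
    if EPS in cached:
        result |= after                    # chunk can vanish, so `after` shows through
    return result


print(sorted(after_lookahead_for(("B", "c"), {"$"})))
print(sorted(after_lookahead_for((), {"$"})))
```

The point of the table is that building the parser asks for the same chunks over and over, so each request becomes a dictionary lookup instead of another walk over the symbols.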
I don’t know about you, but I think I deserve a Grand Imperial Porter. I will get another one after implementing multi-regex.