The idea of context scanning that I wrote about yesterday was like a virus of the mind: I just couldn’t settle down while the generator had a syntax I didn’t like.
The solution I described was pretty complex, especially for a forking parser, because a context switch might occur inside one of the forked parse trees. This translates to having one lexer (or lexer environment) per parse tree. There is another issue, with conflicts on lookaheads: the same lookahead can trigger a lexer state switch when it comes from one production and no switch when it comes from another.
It is hard to find any discussion about context scanning, because the internet is polluted with basic “how do I do X in Lex/Yacc” questions, and every search I tried was jammed. The only pearl I found was “Context-Sensitive Scanning in ANTLR” by the invaluable Scott Stanchfield.
In short: problems. And while I am not afraid of hard work, the clock is ticking and I needed a working solution fast. So I came up with a recipe that simply works in my case, hoping that it might work in others too.
The context for the lexer is just the sequence of previous tokens, that’s all. It allowed me to get rid of syntax like this:
/[A-Za-z_][A-Za-z_0-9]*/ -> ID { new IdSymbol($text) };
and introduce a more natural one (because it looks more like a pair):
/[A-Za-z_][A-Za-z_0-9]*/ -> ID, new IdSymbol($text);
It works because the comma character switches the lexer to the EXPRESSION state if and only if the two preceding tokens were a right arrow and an identifier.
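To make the rule concrete, here is a minimal sketch in Python. It is not the generator's actual code: the token names (REGEX, RARROW, COMMA, SEMI), the regular expressions, and the tokenize function are all my assumptions for illustration. The EXPRESSION state is also simplified to "take everything up to the next semicolon", which a real lexer could not get away with if the expression contained a semicolon inside a string.

```python
import re

# Hypothetical token set for a rule line such as
#   /[A-Za-z_][A-Za-z_0-9]*/ -> ID, new IdSymbol($text);
# The names and patterns are assumptions, not the generator's real grammar.
TOKEN_RULES = [
    ("REGEX", r"/(?:\\.|[^/\\\n])+/"),
    ("RARROW", r"->"),
    ("ID", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("COMMA", r","),
    ("SEMI", r";"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_RULES))


def tokenize(text):
    """Return (kind, lexeme) pairs; the lexer context is the token history."""
    tokens = []
    history = []  # kinds of the significant tokens seen so far
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        kind = match.lastgroup
        pos = match.end()
        if kind == "WS":
            continue  # whitespace is not part of the context
        # The context rule: a comma switches to EXPRESSION if and only if
        # the two preceding tokens were a right arrow and an identifier.
        switch = kind == "COMMA" and history[-2:] == ["RARROW", "ID"]
        tokens.append((kind, match.group()))
        history.append(kind)
        if switch:
            # Simplified EXPRESSION state: consume raw text up to the next ';'.
            end = text.find(";", pos)
            if end == -1:
                end = len(text)
            tokens.append(("EXPRESSION", text[pos:end].strip()))
            history.append("EXPRESSION")
            pos = end
    return tokens


if __name__ == "__main__":
    for token in tokenize("/[A-Za-z_][A-Za-z_0-9]*/ -> ID, new IdSymbol($text);"):
        print(token)
```

The point of the sketch is that the lexer needs no feedback from the parser: looking at its own last two tokens is enough to decide whether the comma is an ordinary separator or the start of an expression.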
If I could only solve other problems that quickly…