It is about time to think about NLT performance, and the results I saw in the profiler output really surprised me. I thought the parser was the culprit of the poor performance (I tested a regular LALR parser) but, apart from producing huge logs, I was mistaken. Running regular expressions against the input, i.e. the lexer, takes most of the time.
Another surprise is how a tiny line can be a major drag: converting an enum to int takes about 9% of the total time. This is actually due to a flaw in C#, namely the lack of an enum constraint on generic types. All I have is some struct (even though I know it is really an enum), so I have to cast the data to object and then back to int. In effect I unbox right after boxing, which should be a no-op, but it is not ¹.
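To make the problem concrete, here is a minimal sketch of that cast pattern; the helper name and the DayOfWeek example are mine for illustration, not NLT's actual code:

```csharp
using System;

static class EnumConverter
{
    // C# (before 7.3) offers no "where T : Enum" constraint, so the most
    // we can demand is "where T : struct". The compiler then rejects a
    // direct (int)value cast and forces a detour through object:
    // box the enum value, then unbox it as its underlying int.
    public static int ToInt<T>(T value) where T : struct
    {
        return (int)(object)value; // box + unbox, not optimized to a no-op
    }
}

class Program
{
    static void Main()
    {
        // DayOfWeek is an ordinary int-backed enum (Sunday = 0, ...).
        Console.WriteLine(EnumConverter.ToInt(DayOfWeek.Wednesday)); // prints 3
    }
}
```

Since C# 7.3 the constraint `where T : struct, Enum` is legal, which documents the intent, but the cast through object is still the portable way to get at the underlying value in a generic method.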
All in all, I managed to cut the running time to around 30% of the original. I will keep the enum handling as it is for now; as for regular expressions, I am thinking about writing my own engine capable of matching multiple regexes at the same time (the whole purpose of the lexer is to pick the longest match). Since that is a challenge not to be taken lightly (time-wise), I moved on to other issues.
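For illustration, here is a minimal sketch of the longest-match rule such a lexer follows; the token names and patterns are made up, and this naive version runs every regex at every position, which is exactly the per-position cost a combined engine would eliminate:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LongestMatchLexer
{
    // Hypothetical token table. The \G anchor pins each match to the
    // current position, so a rule either matches right here or not at all.
    static readonly (string Name, Regex Pattern)[] Rules =
    {
        ("NUM",   new Regex(@"\G[0-9]+")),
        ("ID",    new Regex(@"\G[a-zA-Z_][a-zA-Z0-9_]*")),
        ("PLUS",  new Regex(@"\G\+")),
        ("SPACE", new Regex(@"\G\s+")),
    };

    public static IEnumerable<(string Name, string Text)> Tokenize(string input)
    {
        int pos = 0;
        while (pos < input.Length)
        {
            string bestName = null, bestText = null;
            // Try every rule at the current position; the longest match
            // wins, ties going to the rule listed first.
            foreach (var (name, pattern) in Rules)
            {
                Match m = pattern.Match(input, pos);
                if (m.Success && (bestText == null || m.Length > bestText.Length))
                {
                    bestName = name;
                    bestText = m.Value;
                }
            }
            if (bestText == null)
                throw new Exception($"No token matches at position {pos}");
            if (bestName != "SPACE") // skip whitespace tokens
                yield return (bestName, bestText);
            pos += bestText.Length;
        }
    }
}
```

With these rules, `Tokenize("ab+12")` yields ID "ab", PLUS "+", NUM "12": ID wins at position 0 because its two-character match is longer than anything else on offer.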
I am not a fan of using ad-hoc symbols, but I see their purpose: it is easier to explain a production without introducing extra code. The NLT grammar looked too heavy for my taste, so I added support for literals used as terminals in parser productions. One can write:
expr Expr -> e1:expr "+" e2:expr ...
instead of introducing an appropriate terminal beforehand, like “