It is about time to think about NLT performance: the results I saw in the profiler output really surprised me. I thought the parser was the culprit of the poor performance (I tested the regular LALR parser), but apart from the huge logs it creates, I was mistaken. It is running the regular expressions against the input (the lexer) that takes most of the time.
Another surprise is how a tiny line can be a major drag: converting an enum to an int takes about 9% of the total time. This is actually due to flaws in C#, first of all the lack of an enum constraint on generic types. All I have is some struct (even though I know it is really an enum), so I have to cast the data to object and then back to int. In effect I have an unboxing right after a boxing, which should lead to a no-op, but it does not ¹.
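To illustrate the problem, here is a minimal sketch of the conversion in question (the helper class and its names are mine, not NLT's actual code):

using System;

enum SymbolId { Plus = 1, Minus = 2 }

static class EnumConv
{
    // There is no enum constraint on generics, so "struct" is the closest fit;
    // the conversion has to go through object, which boxes the value and then
    // immediately unboxes it, instead of being optimized away.
    public static int ToInt<T>(T value) where T : struct
    {
        return (int)(object)value; // box, then unbox: sadly not a no-op
    }

    static void Main()
    {
        Console.WriteLine(EnumConv.ToInt(SymbolId.Plus)); // prints 1
    }
}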
All in all, I managed to cut the running time down to around 30% of the original. I will keep the enum conversion as it is for now; as for the regular expressions, I am thinking about writing my own engine capable of matching multiple regexes at the same time (the whole purpose of the lexer is to pick the longest match). Since this is a challenge not to be taken lightly (time-wise), I moved on to other issues.
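For context, the baseline approach works roughly like the sketch below: run every terminal's regex at the current position and keep the longest match (the names and patterns are illustrative, not NLT's API). A multi-regex engine would do all of this in a single pass over the input instead.

using System;
using System.Text.RegularExpressions;

class LongestMatchLexer
{
    // Each terminal has a name and a pattern anchored at the scan position (\G).
    static readonly (string Name, Regex Pattern)[] Terminals =
    {
        ("NUM",  new Regex(@"\G[0-9]+")),
        ("ID",   new Regex(@"\G[a-zA-Z_]\w*")),
        ("PLUS", new Regex(@"\G\+")),
    };

    // Try every regex at the given position and return the longest match.
    static (string Name, string Text)? NextToken(string input, int pos)
    {
        (string Name, string Text)? best = null;
        foreach (var (name, pattern) in Terminals)
        {
            Match m = pattern.Match(input, pos);
            if (m.Success && (best == null || m.Length > best.Value.Text.Length))
                best = (name, m.Value);
        }
        return best;
    }

    static void Main()
    {
        string input = "foo+42";
        int pos = 0;
        while (pos < input.Length)
        {
            var tok = NextToken(input, pos);
            if (tok == null) break; // no terminal matches: lexing error
            Console.WriteLine($"{tok.Value.Name}: {tok.Value.Text}");
            pos += tok.Value.Text.Length;
        }
    }
}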
I am not a fan of using ad-hoc symbols, but I see their purpose: it is easier to explain some production without introducing extra code. The NLT grammar looked too heavy for my taste, so I added support for literals used as terminals in parser productions. One can write:
expr Expr -> e1:expr "+" e2:expr ...
instead of introducing an appropriate terminal like “PLUS” beforehand.
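For comparison, the older style spells out the terminal name in the production itself, roughly as follows (the declaration of the PLUS terminal itself is omitted here):

expr Expr -> e1:expr PLUS e2:expr ...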