Hard time with optimization

It is about time to think about NLT performance — the results I saw in profiler output really surprised me. I thought that parser is the culprit of poor performance (I tested regular LALR), but except for creating huge logs I was mistaken. Running regular expressions against input (lexer) takes most of the time.

Another surprise is how a tiny line can be a major drag — converting enum to int takes about 9% of total time. This is actually due to flaws in C# — firstly lack of enum constraint on generic types. All I have is some struct (despite I know it is an enum really) so I have to cast data to object and then back to int. In effect I have unboxing just after boxing, which should lead to no-op — but it does not ¹.

All in all, I managed to shave running time to around 30% of original performance — I will keep enum as it is now, for regular expressions I am thinking about writing my own engine capable of matching multiple regexes at the same time (the whole purpose of the lexer is to pick up the longest match). Since it is a challenge not to be taken lightly (time) I moved to other issues.

I am not a fan of using ad-hoc symbols, but I see its purpose — it is easier to explain some production without introducing extra code. NLT grammar looked too heavy for my taste, so I added support for literals used as terminals in parser productions. One can write:

expr Expr -> e1:expr "+" e2:expr
             ...  

instead of introducing beforehand appropriate terminal like “PLUS”.

¹ C# non-boxing conversion of generic enum to int?.

Tagged , , ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: