Tag Archives: boxing

All the things that can bite you…

The summer is rather slow when it comes to development (hopefully we have passed the 35°C zone), but I am making good progress with cleaning up the code anyway.

One thing that stopped me was the C# struct. Most of the time I work with classes, and although I know what a struct is, I simply don’t have much practice with it, so I was caught off guard.

I was implementing Option, which should work like Nullable but for all types. Since it should express strictly the notion of an option, it has to be a struct (otherwise you would have to deal with a missing option, an unset option, and a set option with a missing value, which is a tad too much). Implementing it was a no-brainer: thanks to C# properties, next to SetValue I added a nice Value setter which changes HasValue as well. Concise, easy to understand and… completely wrong.

I added it to Skila and, to my surprise, I called SetValue on one end only to observe that no value was ever set on the other end. What on earth? Well, a struct is copied whenever it is passed or returned by value, which means that if an Option is returned from a property getter or a method, what you get is a copy of the real data. I could call SetValue a million times and it would not change a thing in the stored data (only the local, temporary copy of the Option would be changed).
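Here is a minimal sketch of that trap. The names `MutableOption`, `Holder` and `Stored` are made up for illustration and are not the actual Skila code.

```csharp
using System;

// Hypothetical mutable option type, mirroring the design described above.
public struct MutableOption<T>
{
    private T _value;
    public bool HasValue { get; private set; }
    public T Value { get { return _value; } }

    public void SetValue(T newValue)
    {
        _value = newValue;
        HasValue = true;
    }
}

public class Holder
{
    private MutableOption<int> _stored;

    // The getter returns a copy of the struct, not a reference to the field.
    public MutableOption<int> Stored { get { return _stored; } }

    // Mutating the field from inside the class does change the stored data.
    public void SetStored(int v) { _stored.SetValue(v); }
}

public static class CopyDemo
{
    public static void Main()
    {
        var holder = new Holder();

        holder.Stored.SetValue(42);                 // mutates a temporary copy
        Console.WriteLine(holder.Stored.HasValue);  // False

        holder.SetStored(42);                       // mutates the real field
        Console.WriteLine(holder.Stored.HasValue);  // True
    }
}
```

Note that the compiler accepts the method call on the getter result without complaint, which is what makes the mistake so easy to miss.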

So the rule of thumb for a C# developer: by default, make your structs immutable. For a language or framework designer there is a lesson too: don’t make risky features implicit.
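One possible shape for an immutable option is sketched below. It assumes C# 7.2 or later for `readonly struct` (on older compilers the modifier can simply be dropped), and the names `Some`, `None` and `Settings` are illustrative assumptions, not the type I ended up with.

```csharp
using System;

// Hypothetical immutable option: its state is fixed at construction time.
public readonly struct Option<T>
{
    private readonly T _value;
    public bool HasValue { get; }

    private Option(T value)
    {
        _value = value;
        HasValue = true;
    }

    // default(Option<T>) has HasValue == false, so it doubles as "no value".
    public static Option<T> None => default(Option<T>);
    public static Option<T> Some(T value) => new Option<T>(value);

    public T Value
    {
        get
        {
            if (!HasValue)
                throw new InvalidOperationException("Option has no value.");
            return _value;
        }
    }
}

public class Settings
{
    public Option<int> Timeout { get; set; } = Option<int>.None;
}

public static class ImmutableDemo
{
    public static void Main()
    {
        var settings = new Settings();
        settings.Timeout = Option<int>.Some(30);      // replace the whole value
        Console.WriteLine(settings.Timeout.HasValue); // True
        Console.WriteLine(settings.Timeout.Value);    // 30
    }
}
```

With no mutating members, the only way to change the stored option is to assign a new one through the setter, so there is no temporary copy left to modify by accident.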

I’ve recently watched “Essential Truths Everyone Should Know about Performance in a Large Managed Codebase” by Dustin Campbell. It is a longer version of the old saying “the profiler is your friend”, but what struck me while watching it is that in C# you don’t see the bottlenecks. Boxing? Implicit. Unboxing? Implicit. Method wrapping (with boxing)? Implicit.

Add to this a framework which uses object too often, and a profiler becomes a necessity. And I thought implicit conversion constructors in C++ were bad. Boy, there you can at least make them explicit yourself.
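To illustrate how quietly boxing sneaks in, here is a small sketch (the class and method names are made up). Nothing in the source marks the allocations or the hidden cast.

```csharp
using System;
using System.Collections;

public static class BoxingDemo
{
    public static bool BoxesAreDistinct()
    {
        int n = 42;
        object a = n;   // implicit boxing: a heap object is allocated
        object b = n;   // a second, separate allocation
        return !object.ReferenceEquals(a, b);
    }

    public static int SumNonGeneric()
    {
        var list = new ArrayList();   // stores object, so every Add boxes the int
        for (int i = 0; i < 3; i++)
            list.Add(i);

        int sum = 0;
        foreach (int v in list)       // the compiler inserts a hidden unboxing cast
            sum += v;
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(BoxesAreDistinct()); // True
        Console.WriteLine(SumNonGeneric());    // 3
    }
}
```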


Hard time with optimization

It is about time to think about NLT performance, and the results I saw in the profiler output really surprised me. I thought that the parser was the culprit behind the poor performance (I tested regular LALR), but except for the creation of huge logs I was mistaken. Running regular expressions against the input (the lexer) takes most of the time.

Another surprise is how a tiny line can be a major drag: converting an enum to int takes about 9% of the total time. This is actually due to flaws in C#, first of all the lack of an enum constraint on generic types. All I have is some struct (even though I know it is really an enum), so I have to cast the data to object and then back to int. In effect I have unboxing right after boxing, which should amount to a no-op, but it does not ¹.
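The conversion in question looks roughly like the sketch below. The `Token` enum and the `ToInt` helper are hypothetical names, and the cast assumes the enum's underlying type is int.

```csharp
using System;

// Hypothetical enum standing in for the real symbol type.
public enum Token { Plus = 1, Minus = 2 }

public static class EnumConversion
{
    // TEnum is only known to be a struct here, so a direct (int) cast
    // does not compile. The detour through object boxes the value and
    // unboxes it again immediately.
    public static int ToInt<TEnum>(TEnum value) where TEnum : struct
    {
        return (int)(object)value;
    }

    public static void Main()
    {
        Console.WriteLine(ToInt(Token.Minus)); // 2
    }
}
```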

All in all, I managed to shave the running time down to around 30% of the original. I will keep the enum as it is for now; for regular expressions I am thinking about writing my own engine capable of matching multiple regexes at the same time (the whole purpose of the lexer is to pick the longest match). Since that is a challenge not to be taken lightly (time), I moved on to other issues.
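For reference, the longest-match rule can be sketched naively as "try every regex at the current position and keep the longest hit". This is only an illustration with made-up rules (`IF`, `IDENT`, `WS`), not the actual NLT lexer, and it assumes C# 7 or later for tuples. It also shows why the approach is costly: every rule runs at every position.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LongestMatch
{
    // Hypothetical token rules; \G anchors each match at the start position.
    private static readonly (string Name, Regex Pattern)[] Rules =
    {
        ("IF",    new Regex(@"\Gif")),
        ("IDENT", new Regex(@"\G[a-z]+")),
        ("WS",    new Regex(@"\G\s+")),
    };

    // Returns the rule with the longest match at pos; on a tie the earlier
    // rule wins. Name is null and Length is 0 when nothing matches.
    public static (string Name, int Length) Next(string input, int pos)
    {
        string bestName = null;
        int bestLength = 0;
        foreach (var rule in Rules)
        {
            Match m = rule.Pattern.Match(input, pos);
            if (m.Success && m.Length > bestLength)
            {
                bestName = rule.Name;
                bestLength = m.Length;
            }
        }
        return (bestName, bestLength);
    }

    public static void Main()
    {
        Console.WriteLine(Next("iffy x", 0)); // (IDENT, 4)
        Console.WriteLine(Next("if x", 0));   // (IF, 2)
        Console.WriteLine(Next("if x", 2));   // (WS, 1)
    }
}
```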

I am not a fan of ad-hoc symbols, but I see their purpose: it is easier to explain a production without introducing extra code. The NLT grammar looked too heavy for my taste, so I added support for literals used as terminals in parser productions. One can write:

expr Expr -> e1:expr "+" e2:expr
             ...  

instead of introducing an appropriate terminal such as “PLUS” beforehand.

¹ C# non-boxing conversion of generic enum to int?.
