Tag Archives: utf-8

Strings — performance hurts

I barely have time to write what happens with Skila but this one is so off I cannot resist — strings. Wide Unicode or UTF-8? The latter right, because they take less space so they require less fetches from the memory. How do we index them? By character or by byte? The former will not give us constant time thus by byte.

I know there are currently languages going this path — like Rust or Julia — but I don’t feel comfortable with it. For one, there is no abstraction layer here, the implementation details leak right into API. Secondly it forbids having common IString interface with UnicodeString because it would be indexed differently. Oh boy…

This choice even pushed me to a strange at first glance decision for having reverse methods lastIndexOf starting from exclusive index rather than inclusive. Oddly this is somewhat in sync with ranges (inclusive to exclusive).

UTF-8 string, where will you lead me…?

Tagged , ,