Strings — performance hurts

I barely have time to write about what is happening with Skila, but this one is so off I cannot resist: strings. Wide Unicode or UTF-8? The latter, right? UTF-8 strings take less space, so they require fewer fetches from memory. How do we index them, by character or by byte? Indexing by character will not give us constant-time access, because characters in UTF-8 have variable width, thus by byte.
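To make the trade-off concrete, here is a minimal sketch in Python (not Skila). It only illustrates the general idea: the helper `char_at` and the sample word are made up for this example and say nothing about how Skila actually implements strings.

```python
def char_at(data: bytes, char_index: int) -> str:
    """Return the char_index-th code point of UTF-8 data.

    Hypothetical helper: it has to scan from the start, so it is O(n).
    """
    seen = -1
    for pos, byte in enumerate(data):
        # Bytes of the form 10xxxxxx are continuation bytes;
        # anything else starts a new code point.
        if byte & 0xC0 != 0x80:
            seen += 1
            if seen == char_index:
                end = pos + 1
                while end < len(data) and data[end] & 0xC0 == 0x80:
                    end += 1
                return data[pos:end].decode("utf-8")
    raise IndexError(char_index)


text = "żółw"                       # 4 characters
utf8 = text.encode("utf-8")         # 2 + 2 + 2 + 1 = 7 bytes
utf32 = text.encode("utf-32-le")    # 4 * 4 = 16 bytes

# Indexing by byte is a plain array access, but it can land
# in the middle of a character (here: a continuation byte).
middle_of_char = utf8[1] & 0xC0 == 0x80

# Indexing by character is always meaningful, but needs a scan.
second_char = char_at(utf8, 1)
```

The UTF-8 buffer is less than half the size of the fixed-width one, which is the memory argument. The price is that `utf8[1]` is cheap but not a character, while `char_at(utf8, 1)` is a character but not cheap.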

I know there are languages currently going down this path, like Rust or Julia, but I don't feel comfortable with it. For one, there is no abstraction layer here; the implementation details leak right into the API. Secondly, it rules out a common IString interface shared with UnicodeString, because the two would be indexed differently. Oh boy…

This choice even pushed me to a decision that looks strange at first glance: reverse methods such as lastIndexOf start from an exclusive index rather than an inclusive one. Oddly enough, this is somewhat in sync with ranges (inclusive start, exclusive end).
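One plausible reading of why the exclusive bound fits byte indexing, sketched in Python: the function `last_index_of` below is a hypothetical stand-in, not Skila's actual method, and it simply wraps `bytes.rfind`, whose `end` argument is already exclusive.

```python
def last_index_of(data: bytes, needle: bytes, end: int) -> int:
    """Byte offset of the last occurrence of needle lying entirely
    before `end` (exclusive), or -1. Hypothetical stand-in."""
    return data.rfind(needle, 0, end)


def all_matches_backwards(data: bytes, needle: bytes) -> list[int]:
    """Collect non-overlapping matches from the end of the buffer."""
    hits = []
    end = len(data)  # the buffer length is a valid exclusive bound
    while True:
        hit = last_index_of(data, needle, end)
        if hit < 0:
            return hits
        hits.append(hit)
        # The hit is reused directly as the next exclusive bound.
        # An inclusive bound would need "the index of the previous
        # character", which in UTF-8 is not simply hit - 1.
        end = hit


data = "łąka, łąka".encode("utf-8")
offsets = all_matches_backwards(data, "ł".encode("utf-8"))
```

With an exclusive bound the caller never has to compute where the previous character starts, so the loop works on byte offsets alone.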

UTF-8 string, where will you lead me…?
