Tag Archives: variables

Do lexer generator — get appetite for more

As a little break from Skila, I decided to add the missing part to the NLT generator: a lexer. Ironically, adding it turned out to be more difficult than the parser part.

First, I was bitten by the lack of guaranteed tail-call optimization in C#: when scanning through a huge comment block, all I got was a StackOverflowException. Second, I couldn’t decipher the scanning history of my own program, which is bad. Once I fixed those weak spots, I started adding feature after feature, namely:

  • the lexer can return a stream of tokens for each matched text,
  • if the lexer detects that the Value of a matched token is not set, it uses the Text instead (and if that is not set either, the terminal code),
  • the lexer generator supports pseudo-variables like “$match”, “$value”, “$text”, and “$token” to make scanning rules easier to write.
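Two of these points can be sketched briefly. The code below is Python standing in for the C# that NLT generates, and every name in it (`skip_block_comment`, `Token`, `resolved_value`, `emit`) is hypothetical. The first function skips a block comment with a loop rather than recursion, so stack depth stays constant however long the comment is. The rest implements the Value fallback from the second bullet (Value, then Text, then the terminal code) and lets one match produce several tokens.

```python
from dataclasses import dataclass
from typing import Any, Optional


def skip_block_comment(text: str, pos: int) -> int:
    """Return the index just past the '*/' closing a comment opened at pos.

    Iterative on purpose: a recursive "consume one char, then recurse" scanner
    needs one stack frame per character and overflows on huge comments.
    """
    if not text.startswith("/*", pos):
        raise ValueError("no comment starts here")
    i = pos + 2
    while i < len(text) - 1:
        if text[i] == "*" and text[i + 1] == "/":
            return i + 2
        i += 1
    raise ValueError("unterminated comment")


@dataclass
class Token:
    terminal: int               # terminal code (hypothetical numbering)
    text: Optional[str] = None  # matched text, if the rule kept it
    value: Any = None           # value set by the rule action, if any


def resolved_value(tok: Token) -> Any:
    """Fall back from Value to Text to the terminal code, in that order."""
    if tok.value is not None:
        return tok.value
    if tok.text is not None:
        return tok.text
    return tok.terminal


def emit(tokens: list[Token]) -> list[Any]:
    """One match may yield a stream of tokens; resolve the value of each."""
    return [resolved_value(t) for t in tokens]
```

A million-character comment is skipped in a single call, with no growth in stack depth.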

Oh, wait, I haven’t said anything about the lexer generator yet. Two sections have been added to the grammar file: one defines the lexer states, the other holds the scanning rules.

The NLT generator lets you define string-based and regex-based rules:

"class" -> CLASS;

/[A-Za-z_][A-Za-z_0-9]*/ 
  -> ID { new IdSymbol($text) };

You can define simple-form rules (as above), where you just return a terminal and, optionally, a value given as a C# expression, or complex rules, where everything is expressed as a block of C# statements:

"*/" {
       $value = "Unmatched */";
       $token = TokenEnum.Error;
     };

This is still an experimental project, but it is not far from being a fully operational generator: the only missing part is translating the parser rules into an action table instead of creating the builder.

And yet I am tempted to hold off and add two more features to my agenda:

  • supporting pseudo-variables in parser rules,
  • making the lexer parser-driven.

The second idea might be overkill, but so far I don’t see a simpler solution. Consider this rule:

/[A-Za-z_][A-Za-z_0-9]*/ 
  -> ID { new IdSymbol($text) };

This is easy to scan, because once I see the opening brace I switch the lexer state to CODE and use different scanning rules. But as a user I don’t like this grammar syntax. What I would like to see is:

/[A-Za-z_][A-Za-z_0-9]*/ 
  -> ID, new IdSymbol($text);

No braces, and it is more obvious that the simple form requires an expression, not a block of code (statements). Not an easy task for the lexer, though: I cannot switch to the CODE state every time I see a comma. I could add extra logic to the scanning rules, but part of this work is already done elsewhere, and it is called the parser.

The parser knows much more: it knows which part of the tree it is working on, and here, after the comma, it can expect nothing but an expression, because the only form in which an identifier is followed by a comma is the terminal-and-value pair. So it might be possible to change the lexer workflow in two ways: first, scan on demand instead of scanning everything in one go; second, let the parser change the lexer state.
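A minimal Python sketch of those two changes, under stated assumptions: the lexer produces one token per `next()` call instead of scanning the whole file up front, and the parser may set its state. The `OnDemandLexer` and `parse_simple_rule` names, the `INITIAL`/`CODE` state names, and the simplified rule shape `NAME -> TERMINAL, expression;` (a plain name in place of the regex) are all hypothetical.

```python
import re


class OnDemandLexer:
    """Hypothetical lexer that scans one token per request."""

    TOKEN = re.compile(r"\s*(->|[,;]|[A-Za-z_][A-Za-z_0-9]*)")

    def __init__(self, text: str):
        self.text, self.pos, self.state = text, 0, "INITIAL"

    def next(self) -> str:
        if self.state == "CODE":
            # In CODE state everything up to ';' is one opaque C# expression,
            # so commas inside it (e.g. call arguments) are not tokens.
            end = self.text.index(";", self.pos)
            code = self.text[self.pos:end].strip()
            self.pos, self.state = end, "INITIAL"
            return code
        m = self.TOKEN.match(self.text, self.pos)
        if not m:
            raise SyntaxError(f"unexpected input at {self.pos}")
        self.pos = m.end()
        return m.group(1)


def parse_simple_rule(lexer: OnDemandLexer) -> tuple[str, str, str]:
    """Parse `NAME -> TERMINAL, expression;`, pulling tokens on demand."""
    name = lexer.next()
    if lexer.next() != "->":
        raise SyntaxError("expected ->")
    terminal = lexer.next()
    if lexer.next() != ",":
        raise SyntaxError("expected ,")
    # Only the parser knows an expression must follow the comma,
    # so it switches the lexer state itself.
    lexer.state = "CODE"
    value = lexer.next()
    if lexer.next() != ";":
        raise SyntaxError("expected ;")
    return name, terminal, value
```

The lexer never has to guess what a comma means: the comma after `ID` is an ordinary token, while the comma inside `Pair(a, b)` is swallowed as part of the expression because the parser switched the state first. This naive CODE scan would still be fooled by a `;` inside a string literal.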

It looks very promising, but I keep wondering whether it is the best approach to a problem like the one described above. Anyway, the new NLT package is ready for download.

struct/class — changing the course

After some consideration, I decided to steer away from the muddy waters of improving everything related to types and object management in one go. It does not take a genius to notice that this is a vast area, and I will have plenty of work merging C# value and reference types with C++ mutable/const attributes and with nullable/non-nullable pointers.

This is already a challenge, because with such a variety of features it is easy to end up with obscure syntax. So I changed my perspective: keep it simple, take one step at a time, scratch only if it itches.

“struct” is a value type in Skila, exactly as in C#. The annotation is richer, though:

var x Struct = new Struct();
const y Struct = new Struct();

The first line declares a variable, so you can change the data of “x”. The second line declares a constant; this works as in C++, so it is logical immutability, not bitwise immutability.

The same modifiers can be applied when passing a “struct” to a function:

def foo(var a Struct) ...
def bar(const b Struct) ...

By default, parameter data is assumed to be constant, so you can drop “const” in the second line.

The first line is more interesting: it says that the function can change the data and that those changes will be visible on the caller side. From the caller’s perspective this is a side effect, so it should not go unnoticed, and it doesn’t:

foo(!x);
bar(x);
bar(y);

These are valid calls. Just like with mutating methods, an exclamation mark is added to acknowledge that the “x” variable may be altered.
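The call-site rule can be expressed as a small check. This is a minimal Python sketch, assuming a compiler that knows each parameter’s mode and how each local was declared; `Arg` and `check_call` are hypothetical names. Rejecting `foo(!y)` for a constant `y` is an assumption that follows from `const` meaning the data cannot change; the text above does not show that case.

```python
from dataclasses import dataclass


@dataclass
class Arg:
    name: str     # the local variable passed, e.g. "x"
    marked: bool  # True if written with '!' at the call site, as in foo(!x)


def check_call(param_modes: list[str], args: list[Arg],
               declared: dict[str, str]) -> list[str]:
    """Return the error messages for one call (an empty list means OK).

    param_modes: "var" or "const" for each parameter; "const" is the default.
    declared: how each local was declared, "var" or "const".
    """
    if len(param_modes) != len(args):
        return ["wrong number of arguments"]
    errors = []
    for mode, arg in zip(param_modes, args):
        if mode != "var":
            continue  # const parameter: the caller's data cannot change
        if not arg.marked:
            errors.append(f"'{arg.name}' may be altered: write !{arg.name}")
        if declared.get(arg.name) != "var":
            errors.append(f"'{arg.name}' is a constant and cannot be altered")
    return errors
```

With `x` declared via `var` and `y` via `const`, `foo(!x)`, `bar(x)` and `bar(y)` pass, while `foo(x)` is reported for the missing `!`.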

I didn’t add the ability to pass a copy of the variable that can be changed only inside the function (pass by value). Time will tell whether it is needed. Now I am moving on to the C# “class”, which is a bit more problematic: the data can be constant, the reference can be constant, and the reference can be “null”. The syntax is just boiling over.

Initialization and undef

I like C# initialization because it is safe, and I like C++ initialization because not a single CPU cycle is wasted. Skila borrows the best from both worlds: all variables have to be initialized, explicitly, by you. After a while, relying on implicit initialization becomes a habit, and even if it doesn’t, it is too easy to forget about it. However, if for some reason you cannot or don’t want to initialize a variable, “assign” undef to it:

var a = 5; // OK
var b : Int; // error
var c : Int = undef; // OK
var d = undef; // error

Do you know why the last one is an error? The compiler has to know the variable’s type. You can have an (explicitly) uninitialized variable, but it still needs a type, and “undef” alone gives the compiler nothing to infer it from.
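The four declarations above follow two rules, which a checker could encode as in this minimal Python sketch. Representing a declaration as an optional type name plus an optional initializer string, and the name `check_declaration`, are assumptions for illustration.

```python
from typing import Optional


def check_declaration(type_name: Optional[str],
                      initializer: Optional[str]) -> Optional[str]:
    """Return an error message, or None if the declaration is accepted.

    type_name: the explicit type (e.g. "Int"), or None if omitted.
    initializer: source text of the initializer, "undef", or None if omitted.
    """
    if initializer is None:
        # Rule 1: every variable must be initialized explicitly.
        return "variable must be initialized (use undef to opt out explicitly)"
    if initializer == "undef" and type_name is None:
        # Rule 2: undef carries no type, so the type must be written out.
        return "cannot infer a type from undef; give the type explicitly"
    return None
```

Mapped onto the examples: `var a = 5` is `(None, "5")` and passes, `var b : Int` is `("Int", None)` and fails, `var c : Int = undef` passes, and `var d = undef` fails.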

“Sink” variable

Skila does not have lambdas (yet), but even now, with plain functions, it is useful to drop a value somewhere and indicate very clearly that it will not be used. Or even better, drop it in such a way that it cannot be used:

@ = my_function(5);

The function will be called and its value will be read (as Skila requires for every function), but it is impossible to use it in any way, because the sink variable (predefined, of type Object) is not readable. In C# terms, you can think of it as a pure setter (no getter).
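The pure-setter analogy can be modelled with a Python `property` that has a setter and no getter. This is only a runtime approximation, assuming nothing about how Skila implements the sink: a compiler can reject the read before the program runs, while Python can only raise `AttributeError` when it happens. `Sink` and `at` are hypothetical names.

```python
from typing import Any


class Sink:
    """Hypothetical model of '@': a slot you can write to but never read."""

    def _discard(self, value: Any) -> None:
        pass  # the value has been produced and read; it is simply dropped

    # A property with a setter and no getter: reading it raises AttributeError.
    value = property(None, _discard)


at = Sink()
```

Writing `at.value = my_function(5)` still calls the function, but any later attempt to read `at.value` fails.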

Speaking of lambdas: it will be possible to use the sink more than once in the same expression:

passlambda((@,expr,@) => expr);

So it works here like _ (underscore) in Scala.

Note: the choice of the “@” character is not set in stone, of course, but I had trouble reading a single underscore character when writing in Scala. Besides, the shape of “@” reminds me of a vortex.
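One plausible way for a compiler to accept `@` several times in a single parameter list is to rename each occurrence to a fresh name that user code cannot spell, before checking for duplicates. This is a minimal Python sketch of that idea. The function name `desugar_params` and the `@sinkN` naming scheme are hypothetical, not Skila’s design.

```python
def desugar_params(params: list[str]) -> list[str]:
    """Replace every '@' with a fresh name, then reject real duplicates."""
    fresh: list[str] = []
    counter = 0
    for p in params:
        if p == "@":
            # '@' cannot appear in identifiers, so no user code can refer to it.
            fresh.append(f"@sink{counter}")
            counter += 1
        else:
            fresh.append(p)
    if len(set(fresh)) != len(fresh):
        raise SyntaxError("duplicate parameter name")
    return fresh
```

For `(@,expr,@) => expr` this gives `["@sink0", "expr", "@sink1"]`. Python itself shows why the renaming step matters: `lambda _, expr, _: expr` is rejected with a “duplicate argument” `SyntaxError`, because there `_` is just an ordinary name.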

Pointers and references

Apart from in-place objects (like in C++, or structs in C#), Skila has pointers and references. Pointers behave like pointers in C++, while references are just like pointers with one exception: they cannot be null. In C++, once a reference is bound, it cannot be rebound, so it is more like an alias to an object. In Skila a reference can be reassigned, but one thing you cannot do is this:

my_ref = null.MyClass;

Note: nulls will be typed in Skila (there is no plain, untyped null).
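A minimal Python sketch of the assignment rule, assuming a checker that sees the target’s kind and declared type. `TypedNull`, `Slot` and `check_assign` are hypothetical names. Requiring a null’s type to match a pointer’s declared type exactly is an assumption, and this sketch ignores subtyping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TypedNull:
    type_name: str  # models `null.MyClass`


@dataclass
class Slot:
    kind: str       # "pointer" or "reference"
    type_name: str  # declared target type


def check_assign(target: Slot, value: object) -> Optional[str]:
    """Return an error message, or None if the assignment is allowed."""
    if isinstance(value, TypedNull):
        if target.kind == "reference":
            return "a reference cannot be null"
        if value.type_name != target.type_name:
            return f"null.{value.type_name} is not null.{target.type_name}"
    return None  # non-null values: references may be reassigned freely
```

Here `my_ref = null.MyClass` is rejected for a reference slot, while the same typed null is accepted by a pointer of type `MyClass`.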
