Monthly Archives: September 2013

Polishing the parser generator

The best way to get a reason for improvement is using your own program — currently I constantly switch between Skila and NLT parser generator, because every feature I add makes my work on Skila much easier.

I added types definitions, so no longer I have to cast the symbol objects in the code blocks:

types
  expression Expression;
  IDENTIFIER IdentifierSymbol;
end

With such definitions I can use symbol object “expression” and C# code will know it is of type “Expression”.

With sets and sequences I can reduce amount of rules significantly — sets:

fparam -> [ r:REF s:SINK e:EXCL ]? id:IDENTIFIER 
          { new FunctionParameter(...) };

Here we see an optional set (very similar notion to regular expressions) — identifier can be preceded by “PARAM_REF” or “SINK” or “EXCLAMATION” or (set above is optional) nothing.

Once I implemented set, sequence was an obvious addition:

field -> id:IDENTIFIER t:type (EQ ex:expression)?
         { new ClassField(...) };

Type can be followed by “ASSIGN” and “expression”. Unlike set it is everything (entire sequence in given order) or — in case of optional sequence, like here — nothing.

On Skila front I made some progress working on reference types (class) but finishing it will take some more time.

Advertisement
Tagged , , ,

Grammar generator is coming to NLT

For now not very powerful, but already capable of more than existing builder. OK, it is not really generator, it just rewrites the rules into builder method calls but since I focus solely on grammar syntax it is fine with me.

Let’s take a look at the core — production rules:

data_type ->
             STRUCT
             { DataTypeEnum.ValueType }
           | CLASS
             { DataTypeEnum.ReferenceType }
           ;

The syntax is way more compact than written in pure C#, on the other hand the file format is not supported by any IDE so you cannot refactor the grammar or the code (enclosed in braces).

Of course this “generator” is used to build parser for Skila — and now I can write productions with optional symbols:

deco_identifier -> sink:SINK? name:IDENTIFIER
    { new IdentifierDecorated((IdentifierSymbol)name)
      { WithSink = (sink!=null) } };

Probably not the best example (I am writing this entry right after testing and uploading NLT), but it shows you can squeeze two productions into one — it is up to user which form is more readable.

This ability is the main reason why I created parser generator for NLT — one option in production is no brainer, but take two and you will have four productions. The number of rules quickly explodes as you have more and more options inside production.

Behind the scene single rule as shown above is rewritten into two regular rules — the difference is it is done by computer, not human. And with generator it is easy to implement because all it does is writing text — there are no boundaries for it. Handling such case with builder (just C#) is doable but leads to huge amounts of code — each permutation of optional symbols has to be created in advance by hand and put inside builder.

Since further enhancement of the builder would be a really fruitless effort I decided to go with “generator”. Next I will try to add value constraints, groups, then integrate lexer patterns and as last step I will make it true generator.

Tagged , ,

Detection of optional production

Consider such productions:

class_mod := /* empty */ | VIRTUAL | ABSTRACT
data_type := STRUCT | CLASS
type := class_mod data_type name

The problem lies in the very first entry — the empty one. Naive parser detects a conflict between “class_mod” and “data_type” on both “STRUCT” and “CLASS” when “class_mod” is empty. See below for possible actions (there are lookaheads given in braces):

class_mod = . { STRUCT , CLASS }
data_type = . STRUCT { STRUCT }
data_type = . CLASS { CLASS }

On one hand “data_type” can be shifted (once), on the other — “class_mod” can be reduced (multiple times).

The forking parser can be forced to solve this by trying both moves, but it is obvious it is not efficient, and since I hit this problem several times already I added detection of such scenario. Namely, if the conflicting reduce move is empty one and there is only one shift move parser will reduce and then — without rechecking action table — it will shift.

It works nice (there is no need to fiddle with dependency table) and it does not ruin my testsuite, but because I feel there are some checks needed for reliable detection of optional production I added this feature only to forking parser.

Tagged

struct/class — changing the course

After some considerations I decided to steer away from muddy waters of improving everything related to types and objects management in one take. It does not take the genius to notice it is a vast area to cover and I will have plenty of work merging C# value and reference types with C++ mutable/const attributes with non-/nullable pointers.

This is already the challenge because with such variety of features it is easy to produce some obscure syntax. So I changed my perspective — keep it simple, make one step at a time, scratch only if it itches.

struct” is value type in Skila exactly like it is in C#. There is richer annotation though:

var x Struct = new Struct();
const y Struct = new Struct();

The first line declares a variable, thus you can change the data of “x”. The second line declares a constant — this works like in C++, so it is logical immutability, not the bitwise one.

The same modifier can be applied when passing “struct” to a function:

def foo(var a Struct) ...
def bar(const b Struct) ...

By default the parameter data is assumed to be constant so you can drop “const” in the second line.

The first line is more interesting — it tells you can change the data and those changes will be visible on caller side. So from caller perspective it is a side effect — it should not go unnoticed and it doesn’t:

foo(!x);
bar(x);
bar(y);

Those are valid calls — just like with mutating methods there is exclamation mark added as acknowledgment of alteration of “x” variable.

I didn’t add ability to pass a copy of the variable which could be changed just inside the function (pass by value). Time will tell if it is needed, now I move to C# “class” — it is a bit more problematic, because the data can be constant, the reference can be constant, and a reference can be “null” — the syntax is just boiling over.

Tagged , , , , , ,

Reference & dereference — between C++ and C#

What I like about reference/dereference approach of C# is clean syntax — you work with data (struct) and pointers to data (class), yet in no place so far I made explicit mark that I have pointer or that I dereference a pointer. However the world of data in C# is so much divided that it takes an extra effort to write a method coping with structures and classes in generic way.

I would like to have in Skila more of C# model than C++, but not entirely — the level of control C++ provides is desirable. And each time I think of novel approach I come up with the same conclusion, so here we go…

The basic type definition would be “struct” with the same meaning as in C++. The usage:

var a Foo = new Foo(); 
var b ^Foo = new ^Foo();
var c ^^Foo = new ^^Foo();
// and so on

a” would be an instance of type (struct) “Foo”, “b” would be a pointer to “Foo”, and “c” would be a pointer to pointer to “Foo”.

Skila would have “class” keyword but unlike C# it would not mean a pointer to data, but pointer by default. You could be able to derive “class” from “struct” or vice versa, and of course you would be able to override default behaviour. Let’s assume “Bar” was defined as “class”:

var v *Bar = new *Bar(); 
var w Bar = new Bar();
var x ^Bar = new ^Bar();
// and so on

This is equivalent of Foo example — in the first line “vis data (because of dereference operator — “*”). “w” is a pointer to “Bar” (default behaviour for class), and “x” is a pointer to pointer to “Bar”.

We need one final touch — automatic referencing and dereferencing. This means you can overload functions on types but not on reference depth (some price has to be paid). The advantage is you have to pay attention when implementing receiver (type declaration, a function) and on callee side you can rely on the simplest syntax. For example:

var b ^Foo = new Foo();
var v *Bar = new Bar(); 

But:

var b = new ^Foo();
var v = new *Bar(); 

And function declaration and call:

def CallMe(f ^^Foo) Void;

CallMe(new Foo());

So far I didn’t see a language like this, so I hope overloading on reference depth won’t be deadly needed.

Tagged , , , ,