Tag Archives: open problem

More auto-magic comes to NLT

What is a parser generator without even the simplest form of repetition? I had to add it to NLT as well:

namespaces -> ns:namespace_+
              { new Namespaces(currCoords(),ns) };

Behind the scenes NLT creates a list — List<T> — where “T” is either “object” or the type defined for the given symbol (here: “Namespace” for “ns”). So when I pass “ns” in code, all it takes is making the “Namespaces” class accept “IEnumerable<Namespace>”.
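
Roughly, the receiving side looks like this (a simplified C# sketch — the “Namespace” and “Coordinates” stubs and the members are illustrative, not the actual NLT/Skila classes):

using System.Collections.Generic;
using System.Linq;

// Illustrative stubs — the real types differ.
public class Namespace { public string Name = ""; }
public class Coordinates { }

public class Namespaces
{
    public Coordinates Coords { get; }
    public IReadOnlyList<Namespace> Elements { get; }

    // Accepting IEnumerable<Namespace> is all it takes to consume
    // the List<T> that NLT builds for the "+" repetition.
    public Namespaces(Coordinates coords, IEnumerable<Namespace> ns)
    {
        Coords = coords;
        Elements = ns.ToList();
    }
}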

When I tried to write the production for parsing parameters I ran into a problem:

// INCORRECT
params_ -> (COMMA p:param)*
           ...

Of course this is incorrect — it says that the list of parameters starts with a comma. The textbook fix would be to repeat the first element before the starred group:

// INVALID IN NLT
params_ -> p:param (COMMA p:param)*
           ...

I didn’t like it very much — not only would I have to make an exception to allow duplicate names, but the user would have to repeat some of the symbols. Redundancy is bad. So I came up with an idea:

params_ -> (COMMA- p:param)*
           { new Parameters(currCoords(),p) };

The minus character tells NLT to exclude such a symbol from the initial repetition (so the list does not start with a comma), and the rule stays compact. I don’t know if this syntax will scale up to other problems, but for now it works.

One thing I miss here is a local projection — the list of parameters is created from the raw parameters, but on-the-fly transformations are helpful for expressing more complex scenarios:

// SUGGESTED SYNTAX
params_ := (COMMA- p:param -> {p.ToUpperCase()})*
           { new Parameters(currCoords(),p) };

The main action is the same, but the rule contains a transformation for each parameter that is added to the list. I wrote it next to shuffle mode on my “to-do” list.

Another Skila-related issue emerged while working on NLT — there are a lot of constraints which are too complex to handle in parser rules, yet they do not entirely fit into semantic analysis either (consider checking that none of the elements inside a group are marked with a minus sign).

Let’s say I add another line to the action — after creating an object I check it, and if it is invalid I raise an exception. It works, but it also means that an error during the check stops all subsequent checks. NLT provides a mechanism for that — pass an extra argument, ParseControl, and set its state according to the error. This approach has two problems — it requires more code and it does not prevent the incorrect object from being used.

I solved the first issue in the simplest way — I introduced an “IParseControl” interface and implemented it in the desired classes. This way I can create an object, validate its state in the constructor, and set the state of the “IParseControl” part according to the error. The action code is as clean as before — it just creates an object.
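
Schematically it looks like this (a rough sketch only — the actual interface and the validation rules in NLT are richer):

using System.Linq;

// Rough sketch; the real interface members differ.
public interface IParseControl
{
    bool HasError { get; }
}

public class Parameters : IParseControl
{
    public bool HasError { get; private set; }

    public Parameters(string[] names)
    {
        // Validate in the constructor and record the outcome (made-up rule),
        // instead of threading a ParseControl argument through every action.
        if (names.Distinct().Count() != names.Length)
            HasError = true;
    }
}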

The remaining issue is how to prevent an invalid object from being used. Because it is interesting not only for parsing, I restate this problem in full:

How to create an object, validate it, and — if it is invalid — forbid using it except for passing a reference to it?

Maybe I am wrong, but in the C# world the solution is quite elaborate — let’s assume we are about to validate class “Foo” (a sketch follows the list):

  • make all constructors in “Foo” private,
  • make all fields in “Foo” private,
  • derive from “Foo” a new type “InvalidFoo” which overrides every property and every method with a single statement — throwing an exception,
  • add a static method “Create” to “Foo” — it creates an instance of “Foo” and validates it; if everything is OK it returns it, but if not, it returns an instance of “InvalidFoo” instead.
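
Put together, the whole dance looks roughly like this (a sketch with a made-up validation rule, just to show the shape):

using System;

public class Foo
{
    private int value;

    private Foo(int value) { this.value = value; }

    public virtual int Value => value;
    public virtual void Use() => Console.WriteLine(value);

    // The factory returns a working instance when validation passes,
    // otherwise a booby-trapped InvalidFoo.
    public static Foo Create(int value)
        => value >= 0 ? new Foo(value) : new InvalidFoo();

    private sealed class InvalidFoo : Foo
    {
        public InvalidFoo() : base(0) { }

        // Every member throws — the object can only be passed around.
        public override int Value
            => throw new InvalidOperationException("invalid Foo");
        public override void Use()
            => throw new InvalidOperationException("invalid Foo");
    }
}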

I don’t see a simpler solution in any language which does not support object validation natively. I have in mind something similar to Ruby’s taint checking. And so this issue earns an entry on the Skila “to-do” list.

Update 2013-11-06: while this may be interesting from a theoretical point of view, I am puzzled that I didn’t see a tree while standing in the forest. In NLT I simply don’t need that mechanism, because once a user action returns an invalid object, no other user action is executed. In other words, an invalid object won’t make its way into any other object. Passing a parse control object back and forth might look cleaner, but throwing an exception enforces tighter control and requires less code (1KB).


From goal-driven execution to macros

I keep reading about how the goal-driven execution model works, and while I really like the natural feel of a condition such as:

if (a < b < c)

yesterday I found a piece of code which put me off. Consider printing out a just-read line in Icon:

write(read())

When read hits EOF it fails, and thus the entire chain of expressions fails — here it means write won’t be called. Let’s say you would like to capture the content of the line nevertheless, so you do a little rewrite:

line := read()
write(line)

OK, we have two lines, but the code basically does the same thing… Wait, it does not. It is not even close. Now when read hits EOF, the chain of expressions ends at the assignment, so the assignment is omitted and line keeps its previous value. Since the second line is completely unrelated, it is executed with the old value of line.

Sure, I am surprised, and maybe part of the surprise is because I am not used to the way Icon works. But I think the transparency of adding temporary variables to keep intermediate results is a pretty fundamental concept, and Icon breaks it — something I cannot like.

From that point there are two approaches — give up, or design an improved goal-driven model (it might be the same path though).

As for giving up, I went back to my old idea of returning a tuple — a pair of the function outcome and an error:

if ((success:@,value) = dict["hello"]).success then
  // we have direct access to value
  // success is dropped

The syntax leaves a lot to be desired, so how about adding a when construct which could handle exactly such cases:

when value = dict["hello"] then
  // we have direct access to value

This would be syntactic sugar only — a pure internal rewrite to the if presented above.
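
In today’s C# the same rewrite can be spelled out by hand with a tuple-returning helper (the TryIndex helper below is hypothetical, purely for illustration):

using System;
using System.Collections.Generic;

static class Demo
{
    // Hypothetical helper returning a (success, value) pair,
    // mirroring the Skila tuple above.
    static (bool success, string value) TryIndex(
        Dictionary<string, string> dict, string key)
        => dict.TryGetValue(key, out var value)
            ? (true, value)
            : (false, "");

    static void Main()
    {
        var dict = new Dictionary<string, string> { ["hello"] = "world" };

        // What "when value = dict["hello"] then ..." would expand to:
        // test the success flag, use the value only inside the branch.
        var (success, value) = TryIndex(dict, "hello");
        if (success)
            Console.WriteLine(value);
    }
}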

And then it struck me — forget about Lisp homoiconicity, it’s the uniformity, stupid! Lispers have to struggle with ugly syntax:

(if (= 4 (+ 2 2)) (...

instead of clean “syntax for the masses”:

if 4 = 2+2 then...

but Lisp syntax is uniform, and because of that Lispers don’t have to wait for anyone to bring a new construct to the language. They add their when whenever they like — it blends in perfectly.

On the other hand, in C derivatives all you have is function-like macro expansion — you cannot add another for or while. Any attempt to bring Lisp-style macros to the C world would require allowing the user to alter the grammar of the language.

I didn’t solve anything with the goal-driven model — instead I added yet another puzzle for myself. What to sacrifice in the design to bring the power of macros to Skila?

If you find this topic interesting, here is more reading for you — The Nature of Lisp, Homoiconicity isn’t the point and Lisp: It’s Not About Macros, It’s About Read.


Constraints on parameter value

From now on I will use the term pointer for both non-null pointers and possibly-null pointers, just to avoid confusion with the mechanism I describe below.

You can declare a function parameter value with three types of constraints:

def foo(x MyClass)

is a shorter form of:

def foo(val x MyClass)

The argument is passed by value and you have a guarantee that you cannot change “x” inside “foo”. If you need speed and you know you work in a single-threaded environment, you can use a shortcut (both forms are shown):

def foo(ptr x MyClass)
def foo(ptr val x MyClass)

The argument is passed by pointer (fast), but since the compiler guarantees the data is not changed inside the function, the only worry is whether the callee keeps working correctly — with no copy made, the data could change under its feet in a multi-threaded environment, hence the single-threaded caveat.

The second type:

def foo(var x MyClass)

Execution is slow (the value is copied); you can change the value inside the function, but the alterations do not leak outside.

And the third one:

def foo(ref x MyClass)

Execution is fast, and not only can you change the value inside, but any change is also visible outside the function (the argument is passed by reference). However, because the caller is affected, you have to pass the argument consciously:

foo(ref x);
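
For comparison — a C# analogue, not Skila itself — C#’s ref parameters behave much the same way and also require the caller to opt in explicitly:

using System;

class Demo
{
    // "var"-like: the value is copied, changes do not leak outside.
    static void Bump(int x) { x += 1; }

    // "ref"-like: changes are visible outside, and the caller must
    // opt in explicitly with the ref keyword at the call site.
    static void BumpRef(ref int x) { x += 1; }

    static void Main()
    {
        int n = 1;
        Bump(n);
        Console.WriteLine(n);   // still 1
        BumpRef(ref n);
        Console.WriteLine(n);   // now 2
    }
}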

All the checking is done at compile time. However, one constraint is missing — declaring a function which takes a constant and forces the caller to keep it constant as well. Too academic?

Consider creating a dictionary (a collection that maps keys to values). For fast retrieval a dictionary usually orders the keys in some way — the exact algorithm is not important; all we know is that the keys are kept in some order, and the dictionary uses this fact to skip a good portion of the keys during lookup. If you create a dictionary, pass it initial data, and then change the keys on the caller’s side, the dictionary will not know about the change and it won’t reorder the keys. Thus on the next retrieval you can expect anything, which translates to “undefined behaviour”.
One way to solve this problem would be cloning all the keys, but this is inefficient and kills any attempt to optimize the algorithm (once you embed cloning in the dictionary you cannot get rid of it from the outside).
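
This is not a theoretical risk — the same thing happens today in C# when a key object is mutated after insertion (here with a deliberately mutable, made-up Key type):

using System;
using System.Collections.Generic;

// Deliberately bad key type whose hash depends on mutable state.
class Key
{
    public int Value;
    public override int GetHashCode() => Value;
    public override bool Equals(object o) => o is Key k && k.Value == Value;
}

class Program
{
    static void Main()
    {
        var key = new Key { Value = 1 };
        var dict = new Dictionary<Key, string> { [key] = "one" };

        // The caller still holds a reference to the key and mutates it...
        key.Value = 2;

        // ...and the dictionary, unaware of the change, cannot find it.
        Console.WriteLine(dict.ContainsKey(key));   // False
    }
}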

So far I see two ways to solve this — either add another constraint:

def foo(const x MyClass)

Meaning — I cannot change “x” inside, and I also require that “x” is constant outside. In other words, the value has to be constant in general, not just in the local scope. Since it is const both ways, it can be passed silently via pointer (fast); note that passing the data explicitly by pointer (“ptr val”) does not give a guarantee of true constness.

Or go the Ruby way and provide a “freeze” method that would lock the object against writing. I prefer the first approach (enforced at compilation), but I am not sure whether it is doable — it looks suspiciously tempting.

I like the idea of setting constraints on both sides, however even in this post I can already see the syntax getting too rich. Such constraints are much better when triggered automatically — for example, knowing for certain whether a call is done synchronously or not could let the compiler decide automatically how to pass the data efficiently, by pointer or by value.


Fragments — multi-syntax language?

As an avid reader of the old Joel on Software blog I am fully aware of the complaint practically any developer can make:

They [people who don’t Get Things Done] will say, “Spreadsheets are really just a special case of programming language,” and then go off for a week and write a thrilling, brilliant whitepaper about the theoretical computational linguistic attributes of a spreadsheet as a programming language. Smart, but not useful.

Quoted from: The Guerrilla Guide to Interviewing.

I would like to be useful, but after tinkering with the syntax for collections I realized that maybe it is the wrong path — one reason is that I have only a limited ASCII set, so it is hard to cover all the expressions I would like to have and make them look nice at the same time. The other is that if Skila is supposed to be a really multi-purpose language, can one syntax fit all the needs? Maybe another approach is required — one language, multiple syntaxes (one could think of the analogy of the Java world — one bytecode, multiple languages and thus multiple syntaxes).

I am not the first one with this idea — take for example TeX with its environments: once you start math mode, all the symbols are treated differently than in the main block. In a programming language it might pay off, because when writing general code you would like to be more self-explanatory:

1/Math.Pow(x,2.0)

But if you have a function which does only heavy mathematical computation, it could be more useful to make a switch:

#math
1/x^2
#end math

Who knows? It is an odd idea, but it could be the right vehicle if you would like to have the same semantics but a different syntax for a given domain — an inner DSL, you could say.


Collections — short form syntax

Thanks to “Programming Clojure” by Stuart Halloway and Aaron Bedra I finally found a syntax for all the basic collections.

Array (fixed size):
#[ 3, 4 ]

Vector (can be resized):
[ 3, 15 ]

Dictionary (map):
[ foo: 3, bar: 17 ]

Tuple:
( 3, 4 )

Named tuple (OK, it is exactly the same as a regular tuple, I am just showing the syntax):
( foo: 3, bar: 4 )

And the last one — list:
{ 3, 89 }

I think all of them make sense — the tuple syntax should be related to passing arguments to a function, and the list syntax can be associated with a statement block — like in C-like languages.
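
For reference, the C# counterparts of the same collections are considerably more verbose:

using System.Collections.Generic;

class CollectionsDemo
{
    static void Main()
    {
        int[] array = { 3, 4 };                           // fixed size
        var vector = new List<int> { 3, 15 };             // resizable
        var map = new Dictionary<string, int>
            { ["foo"] = 3, ["bar"] = 17 };
        var tuple = (3, 4);
        var namedTuple = (foo: 3, bar: 4);
        var list = new LinkedList<int>(new[] { 3, 89 });  // list
    }
}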

One thing that bothers me is spending braces just for one purpose. So maybe lists should be created with brackets after all:
~[ 3, 4 ]


Modern constructor — how does it look?

For sure the way constructors are written in C++ has to go — starting with naming the constructor not after the class, but simply “__construct” (like in PHP) or “init” (I would also like to add the ability to return a value, but that is a minor thing as well). The real problem is how to initialize object fields in a constructor — briefly, yet in an informative way?

Scala’s integrated constructor went too far — the code is very concise, true, but at the expense of readability. The entire body of a class becomes the constructor, and this means constructor parameters leak directly into all class methods.

C# is much better in this regard, I love its syntax:

new Person()
  {
    FirstName = "Joe",
    LastName = "Doe"
  };

The problem? It is not really a constructor (only the first line is). It is not just a matter of naming though; there are technical implications as well — “FirstName” and “LastName” cannot be read-only properties, for example. But the clarity is hard to beat.

So the question for Skila is really this — how to bring in the C# syntax and make it a real constructor? So far I have come up only with the idea of a macro constructor…

Mark the fields of the class which can be initialized by simple assignment and call the constructor as usual — all parameters for the marked fields are passed via a magical “ini” tuple and initialized in one assignment.

class Foo
  var FirstName <- String("John");
  var LastName <- String;
  var BirthYear Int;

  def init(birthYear Int, $ini)
  do
    this.BirthYear = birthYear;
    this.$ini = $ini;
  end
end

new Foo(1918, FirstName : "Joe",
              LastName : "Doe");

Some of the features of the macro constructor:

  • field pseudo-initialization serves as an indication of an optional argument, i.e. you could omit “FirstName” here when creating a “Foo” object; it is not field initialization like in C#,
  • when passing arguments via “ini” you have to name all of them — in other words “ini” expands to a list of forced named parameters, and thanks to that it can always be safely placed as the last (pseudo) parameter,
  • because the assignment in the constructor is explicit, you can decide when to perform it.
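
For comparison, a rough C# approximation of the same effect — not the macro constructor itself, just optional named arguments doing part of its job:

using System;

class Person
{
    public readonly string FirstName;
    public readonly string LastName;
    public readonly int BirthYear;

    // Required arguments first; the "ini"-like part becomes optional
    // parameters that callers pass by name.
    public Person(int birthYear, string firstName = "John", string lastName = "")
    {
        BirthYear = birthYear;
        FirstName = firstName;
        LastName = lastName;
    }
}

class Program
{
    static void Main()
    {
        // Usage mirrors the Skila example: named arguments for the "ini" part.
        var p = new Person(1918, firstName: "Joe", lastName: "Doe");
        Console.WriteLine($"{p.FirstName} {p.LastName}, {p.BirthYear}");
    }
}

Unlike the object-initializer syntax, this keeps the fields read-only — which is part of what the macro constructor is after.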

Is this the best that can be done in 2013? Do you have better ideas?


Function result — how to read it?

Lately I have spent quite some time thinking about the perfect “else if” syntax (don’t confuse it with “if-else”), and I wonder how much more time I will spend on this subject.

Let’s start with an easy one — in all C-like languages a function call combined with a condition looks like this:

if ((idx = str.indexOf("hello")) != -1)
  ...

This is ugly. Not only do we have to add an extra scope (not shown here) to avoid a variable leak, but we are using a magic number. Worse — what about values such as “-2”? They are valid for “idx” but invalid in the context of “indexOf”.

Before you even warm up, consider a more difficult case — reading values from a map (associative array, dictionary):

value = dict["hello"];

This won’t fly — if there is no such key we will get an exception. So maybe like this:

if (dict.ContainsKey("hello"))
  value = dict["hello"];

OK, this is safe, but now we hit the map twice. In C# the following is the best approach:

if (dict.TryGetValue("hello",out value))
  // we have valid value here

One can say — not so bad — but! First of all, you have to mark your “trying” functions in some way to distinguish them from “regular” functions. For me it is not appealing to have a bunch of “TryRead”, “TryConcat” and the like. Secondly, such syntax is inconsistent with the already existing notion:

result ← expression

I have nothing against switching the flow (to left-to-right); however, what bothers me here is the inconsistency. I wouldn’t like to see this happen in Skila.

What are the other options? “Options” indeed:

if (!(value = dict["hello"]).isEmpty())
  Console.WriteLine(value.get());

Such code is efficient (kind of), with the result kept on the left, but it is too elaborate. And every time you use value you have to call its “get” method (or introduce a temporary variable).
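
For reference, a minimal sketch of such an option wrapper in C# (hypothetical, not an existing library type; IsEmpty/Get correspond to the isEmpty()/get() calls above):

using System;

// Minimal option wrapper — for illustration only.
readonly struct Option<T>
{
    private readonly T value;
    private readonly bool hasValue;

    private Option(T value) { this.value = value; hasValue = true; }

    public static Option<T> Some(T value) => new Option<T>(value);
    public static Option<T> None() => default;

    public bool IsEmpty => !hasValue;

    // Every access to the payload has to go through Get().
    public T Get() => hasValue
        ? value
        : throw new InvalidOperationException("empty option");
}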

I was struggling with some mad ideas of a dual variable, partially and implicitly converted to “bool”, or solving the issue with tuples (Skila style):

if ((success:@,value) = dict["hello"]).success then
  // we have direct access to value
  // success is dropped

But the concept of failure from Icon struck me as really elegant. The map indexer could “return” failure in case of a missing key, and this would cause the “if” to fail.

I haven’t written even a “hello world” in Icon, so better read about Icon somewhere else — Wikipedia may be a good starting point.

At first glance it looks, oh, so charming — I am afraid though that this beauty might come at the cost of problematic maintenance, a workflow which is hard to follow, not to mention a complex implementation.

But this approach will be the first one I investigate. As usual, if you have better, or simply other, ideas worth considering, I am all ears.


Operator — what is it?

Either I am missing something or not much ink has been spent on the issue of picking an operator from a sequence of symbols. Sad to say, I am still guessing and improving my parser through a trial-and-error approach when it comes to shift/reduce operator selection. Consider this case:

5 + 4 | * 3

where the “|” character denotes the boundary between stack and input. Assuming “*” is defined with higher priority than “+”, it is easy to say we should shift. Sure, but how can the parser tell that “+” is the operator to consider? Why, for example, doesn’t “*” serve as the reduce operator as well? Or what if we have the same sequence but written as:

NUM + NUM | * NUM

and “NUM” is defined in the precedence table as well, thus leading to the problem of resolving priority between “NUM” and “*” instead of between “+” and “*”? The bigger the precedence table, the more real the issue becomes.

So many questions and so few answers…

And since I was just bitten by this, I solved the problem of choosing the right operator for reduce by considering the last global operator on the stack (within the symbols considered for reduction). A global operator is one defined without specifying which productions it applies to. An example? Usually “+” is defined as-is, without worrying about productions, thus I would call it a global operator. In contrast, take a look at these productions:

fq_identifier := IDENTIFIER
fq_identifier := fq_identifier COLON COLON IDENTIFIER
named_argument := IDENTIFIER COLON expression

The first two productions are for the C++-like syntax for accessing a static property of a class (simplified here), the last one is for passing a named expression to a function. As written, they cause a shift/reduce conflict on “COLON” — with an identifier on the stack and a colon in the input, the parser cannot tell whether to reduce the identifier to “fq_identifier” or keep it as the start of a “named_argument”. One solution would be to define a global operator “COLON”, however this is crude. The more precise way would be to define “COLON” locally — for “fq_identifier” and “named_argument” only. This translates to ignoring the “COLON” operator when searching the stack.

Will it work reliably? I don’t know. If not — I have another two refinements in my pocket:

  • defining sets of related operators explicitly — this way the user could separate “+” from “COLON”,
  • adding information for each operator about where it should be looked for — on the stack, in the input, or both.

But! I am not a big fan of reinventing the wheel. So if you — dear reader — know of any good resource on this subject, I would be grateful if you let me know about it.
