Operator — what is it?

Either I am missing something, or not much ink has been spent on the question of how to pick the operator out of a sequence of symbols. Sad to say, I am still guessing and improving my parser by trial and error when it comes to selecting the operator for shift/reduce resolution. Consider this case:

5 + 4 | * 3

where the “|” character marks the boundary between the stack and the input. Assuming “*” is defined with a higher priority than “+”, it is easy to say we should shift. Sure, but how can the parser tell that “+” is the operator to consider? Why, for example, should “*” not count as the reduce operator as well? Or what if we have the same sequence, but written as:

NUM + NUM | * NUM

and “NUM” is defined in the precedence table as well, so that the parser ends up resolving priority between “NUM” and “*” instead of between “+” and “*”? The bigger the precedence table, the more pressing the issue becomes.
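To make the problem concrete, here is a minimal Python sketch of a naive rule: take the right-most stack symbol that has any entry in the precedence table and compare it with the lookahead. The table contents, the numeric priorities, and the function names are assumptions made up for illustration; this is not how my parser is implemented. Associativity is ignored, and the lookahead is assumed to be in the table.

```python
# Hypothetical precedence table: a higher number binds tighter.
PRECEDENCE = {"+": 1, "*": 2}


def last_symbol_with_precedence(stack, table):
    """Return the right-most stack symbol that has a table entry, or None."""
    for symbol in reversed(stack):
        if symbol in table:
            return symbol
    return None


def decide(stack, lookahead, table):
    """Shift if the lookahead binds tighter than the chosen stack symbol."""
    chosen = last_symbol_with_precedence(stack, table)
    if chosen is None or table[lookahead] > table[chosen]:
        return "shift"
    return "reduce"  # equal priorities would need associativity; ignored here


stack = ["NUM", "+", "NUM"]

# Only "+" and "*" are in the table, so "+" is compared with "*".
print(decide(stack, "*", PRECEDENCE))

# Once NUM gets an entry too, the scan stops at NUM and "+" is never consulted.
with_num = dict(PRECEDENCE, NUM=3)
print(last_symbol_with_precedence(stack, with_num))
print(decide(stack, "*", with_num))
```

With the two-entry table the sketch shifts, as expected. With “NUM” added, the comparison silently becomes “NUM” versus “*” and the decision flips to reduce, which is exactly the kind of surprise described above.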

So many questions and so few answers…

And since I was just bitten by this, I solved the problem of choosing the right operator for reduce by taking the last global operator on the stack (among the symbols considered for reduction). A global operator is one defined without specifying which productions it applies to. An example? Usually “+” is defined as is, with no regard to productions, so I would call it a global operator. In contrast, take a look at these productions:

fq_identifier := IDENTIFIER
fq_identifier := fq_identifier COLON COLON IDENTIFIER
named_argument := IDENTIFIER COLON expression

The first two productions are for a C++-like syntax for accessing a static property of a class (simplified here); the last one is for passing a named expression to a function. As written, they cause a shift/reduce conflict on “COLON”. One solution would be to define “COLON” as a global operator, but that is crude. A more precise way is to define “COLON” locally, for “fq_identifier” and “named_argument” only. This translates to ignoring the “COLON” operator when searching the stack.
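The search for the last global operator can be sketched in Python roughly as follows. The `Operator` class, its fields, the priority numbers, and the sample symbol lists are all hypothetical names and values chosen for illustration; the only idea taken from the text is that an operator tied to specific productions is skipped during the stack search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    symbol: str
    priority: int
    # Productions the operator is restricted to; empty means "global".
    productions: frozenset = frozenset()

    @property
    def is_global(self):
        return not self.productions


# Hypothetical definitions: "+" and "*" are global, COLON is local.
OPERATORS = {
    "+": Operator("+", 1),
    "*": Operator("*", 2),
    "COLON": Operator("COLON", 3, frozenset({"fq_identifier", "named_argument"})),
}


def last_global_operator(reduction_symbols, operators):
    """Scan the symbols considered for reduction from right to left and
    return the first global operator found, or None if there is none."""
    for symbol in reversed(reduction_symbols):
        op = operators.get(symbol)
        if op is not None and op.is_global:
            return op
    return None


# "+" is found; NUM has no operator entry and is skipped.
print(last_global_operator(["NUM", "+", "NUM"], OPERATORS))

# COLON is local, so the search ignores it and finds nothing.
print(last_global_operator(["IDENTIFIER", "COLON", "expression"], OPERATORS))
```

Whether a local operator should still take part in the decision when the parser is inside one of its own productions is left open in this sketch.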

Will it work reliably? I don’t know. If not, I have two more refinements in my pocket:

  • defining sets of related operators explicitly, so that the user could separate “+” from “COLON”,
  • adding, for each operator, information about where it should be looked for: on the stack, in the input, or both.
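One possible shape for these two refinements, as a rough Python sketch: each operator carries a group name and a flag saying where it may be searched for. Everything here (the `Where` flag, the group names, the table layout, the priorities) is an assumption invented for the example, not a settled design.

```python
from enum import Flag, auto


class Where(Flag):
    STACK = auto()
    INPUT = auto()
    BOTH = STACK | INPUT


# Hypothetical table: symbol -> (group, priority, where to look for it).
OPERATOR_INFO = {
    "+": ("arithmetic", 1, Where.BOTH),
    "*": ("arithmetic", 2, Where.BOTH),
    "COLON": ("naming", 1, Where.INPUT),
}


def stack_operator(stack, lookahead, info):
    """Return the right-most stack symbol that belongs to the same group
    as the lookahead and is allowed to be searched for on the stack."""
    group = info[lookahead][0]
    for symbol in reversed(stack):
        entry = info.get(symbol)
        if entry and entry[0] == group and Where.STACK in entry[2]:
            return symbol
    return None


# COLON sits in a different group, so "*" is matched against "+".
print(stack_operator(["NUM", "+", "IDENTIFIER", "COLON", "NUM"], "*", OPERATOR_INFO))

# COLON is marked input-only here, so it is never picked up from the stack.
print(stack_operator(["IDENTIFIER", "COLON", "NUM"], "COLON", OPERATOR_INFO))
```

Grouping keeps unrelated operators from being compared at all, while the location flag decides on which side of the “|” boundary an operator is visible.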

But! I am not a big fan of reinventing the wheel. So if you, dear reader, know of any good resource on this subject, I would be grateful if you let me know.
