LALR(k>1) — FIRST sets

Update: Thanks to help by Hendrik Jan after 4 days of struggle I can see my mistake — problem solved.

Three days ago I started working on LALR(k>1) to save a weekend solely for Skila development — unfortunately I underestimated the problem and now not only I feel dizzy but also I don’t see the end of the work. All three days went on algorithm for computing FIRSTk sets.

Surprisingly I didn’t find too many sources explaining how to compute FIRSTk sets for k>1. There is old “The Theory of Parsing, Translation, and Compiling (vol.1)” by Alfred V. Aho, and Jeffrey D. Ullman (yes, that’s 1972), a page “Review of LR(k) and LALR(k) Parsing Theory” by Sean E. O’Connor and a paper “An Algorithm for Constructing a Semi-LL(2) Grammar’s Parsing Table” by Yoshida Keiichi, and Takeuchi Yoshiko.

I have hard time deciphering the last one, and I cannot use the former two because the algorithm presented there (the same one) is in my opinion incorrect. If you don’t have a copy of Aho&Ullman please refer to the mentioned page by Sean E. O’Connor because the notation is the same.

The algorithm uses helper Fi sets to iteratively compute FIRSTk sets. We have:

Fi(a) = { a }

for every terminal a and index i ≥ 0. For non terminals starting point is defined as:

F0(A) = { x | ∃ A := x α, |x| ≤ k }

x denotes a sequence (possibly empty) of terminals, α denotes a sequence of symbols (terminals or non terminals) — also possibly empty.

Two things worth noting — there has to be such production A := x α defined, a derived production is not enough, and second thing — x is a greedy sequence, if it can match k terminals, it does.

At each iteration we get from Fi(A) set no smaller Fi+1(A) set — i.e. each element from Fi(A) is present in Fi+1(A). The claim is when we reach to the point when Fi(A) is equal to Fi+1(A) we stop because Fi(A) is our desired FIRSTk(A) set.

And I don’t think this claim holds.

Since it says FIRSTk(A) = Fi(A) ⊆ Fi-1(A) ⊆ ... ⊆ F0(A) to support my doubt it is enough to find an element from F0(A) that does not belong to FIRSTk(A). Consider such productions:

A := a B
B := b

For k=2 what is the FIRSTk(A) set?

FIRSTk(A) = { a b }

Note, there is no other string, the FIRSTk(A) set contains just one element. And how does the F0(A) look like?

F0(A) = { a }

This element is not part of the FIRSTk(A) and as counter example it shows the algorithm is incorrect.

The real question is — where did I make a mistake? Because when it comes to finding errors by Mr.Nobody in well established publications by well know authors, let’s face it — Mr.Nobody is more often wrong than right. I will be grateful for your comments.

Solution: I misinterpreted the way x’es are obtained for A’s — they can be shorter than k if and only if α is empty.

Tagged , ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: