## LALR(k>1) — FIRST sets

Update: Thanks to help by Hendrik Jan after 4 days of struggle I can see my mistake — problem solved.

Three days ago I started working on LALR(k>1) to save a weekend solely for Skila development — unfortunately I underestimated the problem and now not only I feel dizzy but also I don’t see the end of the work. All three days went on algorithm for computing `FIRSTk` sets.

Surprisingly I didn’t find too many sources explaining how to compute `FIRSTk` sets for `k>1`. There is old “The Theory of Parsing, Translation, and Compiling (vol.1)” by Alfred V. Aho, and Jeffrey D. Ullman (yes, that’s 1972), a page “Review of LR(k) and LALR(k) Parsing Theory” by Sean E. O’Connor and a paper “An Algorithm for Constructing a Semi-LL(2) Grammar’s Parsing Table” by Yoshida Keiichi, and Takeuchi Yoshiko.

I have hard time deciphering the last one, and I cannot use the former two because the algorithm presented there (the same one) is in my opinion incorrect. If you don’t have a copy of Aho&Ullman please refer to the mentioned page by Sean E. O’Connor because the notation is the same.

The algorithm uses helper `Fi` sets to iteratively compute `FIRSTk` sets. We have:

`Fi(a) = { a }`

for every terminal `a` and index `i ≥ 0`. For non terminals starting point is defined as:

`F0(A) = { x | ∃ A := x α, |x| ≤ k }`

`x` denotes a sequence (possibly empty) of terminals, `α` denotes a sequence of symbols (terminals or non terminals) — also possibly empty.

Two things worth noting — there has to be such production `A := x α` defined, a derived production is not enough, and second thing — `x` is a greedy sequence, if it can match `k` terminals, it does.

At each iteration we get from `Fi(A)` set no smaller `Fi+1(A)` set — i.e. each element from `Fi(A)` is present in `Fi+1(A)`. The claim is when we reach to the point when `Fi(A)` is equal to `Fi+1(A)` we stop because `Fi(A)` is our desired `FIRSTk(A)` set.

And I don’t think this claim holds.

Since it says `FIRSTk(A) = Fi(A) ⊆ Fi-1(A) ⊆ ... ⊆ F0(A)` to support my doubt it is enough to find an element from `F0(A)` that does not belong to `FIRSTk(A)`. Consider such productions:

```A := a B B := b```

For `k=2` what is the `FIRSTk(A)` set?

`FIRSTk(A) = { a b } `

Note, there is no other string, the `FIRSTk(A)` set contains just one element. And how does the `F0(A)` look like?

`F0(A) = { a } `

This element is not part of the `FIRSTk(A)` and as counter example it shows the algorithm is incorrect.

The real question is — where did I make a mistake? Because when it comes to finding errors by Mr.Nobody in well established publications by well know authors, let’s face it — Mr.Nobody is more often wrong than right. I will be grateful for your comments.

Solution: I misinterpreted the way `x`’es are obtained for `A`’s — they can be shorter than `k` if and only if `α` is empty.

Tagged , ,