**Update:** Thanks to help by Hendrik Jan after 4 days of struggle I can see my mistake — problem solved.

Three days ago I started working on LALR(k>1) to save a weekend solely for Skila development — unfortunately I underestimated the problem and now not only I feel dizzy but also I don’t see the end of the work. All three days went on algorithm for computing *FIRST*_{k}

sets.

Surprisingly I didn’t find too many sources explaining how to compute *FIRST*_{k}

sets for *k>1*

. There is old “The Theory of Parsing, Translation, and Compiling (vol.1)” by Alfred V. Aho, and Jeffrey D. Ullman (yes, that’s 1972), a page “Review of LR(k) and LALR(k) Parsing Theory” by Sean E. O’Connor and a paper “An Algorithm for Constructing a Semi-LL(2) Grammar’s Parsing Table” by Yoshida Keiichi, and Takeuchi Yoshiko.

I have hard time deciphering the last one, and I cannot use the former two because the algorithm presented there (the same one) is in my opinion incorrect. If you don’t have a copy of Aho&Ullman please refer to the mentioned page by Sean E. O’Connor because the notation is the same.

The algorithm uses helper *F*_{i}

sets to iteratively compute *FIRST*_{k}

sets. We have:

*F*_{i}(a) = { a }

for every terminal *a*

and index *i ≥ 0*

. For non terminals starting point is defined as:

*F*_{0}(A) = { x | ∃ A := x α, |x| ≤ k }

*x*

denotes a sequence (possibly empty) of terminals, *α*

denotes a sequence of symbols (terminals or non terminals) — also possibly empty.

Two things worth noting — there has to be such production *A := x α*

defined, a derived production is not enough, and second thing — *x*

is a greedy sequence, if it can match *k*

terminals, it does.

At each iteration we get from *F*_{i}(A)

set no smaller *F*_{i+1}(A)

set — i.e. each element from *F*_{i}(A)

is present in *F*_{i+1}(A)

. The claim is when we reach to the point when *F*_{i}(A)

is equal to *F*_{i+1}(A)

we stop because *F*_{i}(A)

is our desired *FIRST*_{k}(A)

set.

And I don’t think this claim holds.

Since it says *FIRST*_{k}(A) = F_{i}(A) ⊆ F_{i-1}(A) ⊆ ... ⊆ F_{0}(A)

to support my doubt it is enough to find an element from *F*_{0}(A)

that does not belong to *FIRST*_{k}(A)

. Consider such productions:

*A := a B*

B := b

For *k=2*

what is the *FIRST*_{k}(A)

set?

*FIRST*_{k}(A) = { a b }

Note, there is no other string, the *FIRST*_{k}(A)

set contains just one element. And how does the *F*_{0}(A)

look like?

*F*_{0}(A) = { a }

This element is not part of the *FIRST*_{k}(A)

and as counter example it shows the algorithm is incorrect.

The real question is — where did **I** make a mistake? Because when it comes to finding errors by Mr.Nobody in well established publications by well know authors, let’s face it — Mr.Nobody is more often wrong than right. I will be grateful for your comments.

**Solution:** I misinterpreted the way *x*

’es are obtained for *A*

’s — they can be shorter than *k*

if and only if *α*

is empty.