**Update:** Thanks to help by Hendrik Jan after 4 days of struggle I can see my mistake — problem solved.

Three days ago I started working on LALR(k>1) to save a weekend solely for Skila development — unfortunately I underestimated the problem and now not only I feel dizzy but also I don’t see the end of the work. All three days went on algorithm for computing

sets.*FIRST _{k}*

Surprisingly I didn’t find too many sources explaining how to compute

sets for *FIRST _{k}*

*k>1*

. There is old “The Theory of Parsing, Translation, and Compiling (vol.1)” by Alfred V. Aho, and Jeffrey D. Ullman (yes, that’s 1972), a page “Review of LR(k) and LALR(k) Parsing Theory” by Sean E. O’Connor and a paper “An Algorithm for Constructing a Semi-LL(2) Grammar’s Parsing Table” by Yoshida Keiichi, and Takeuchi Yoshiko.I have hard time deciphering the last one, and I cannot use the former two because the algorithm presented there (the same one) is in my opinion incorrect. If you don’t have a copy of Aho&Ullman please refer to the mentioned page by Sean E. O’Connor because the notation is the same.

The algorithm uses helper

sets to iteratively compute *F _{i}*

*FIRST*_{k}

sets. We have:*F _{i}(a) = { a }*

for every terminal

and index *a*

. For non terminals starting point is defined as:*i ≥ 0*

*F _{0}(A) = { x | ∃ A := x α, |x| ≤ k }*

denotes a sequence (possibly empty) of terminals, *x*

denotes a sequence of symbols (terminals or non terminals) — also possibly empty. *α*

Two things worth noting — there has to be such production

defined, a derived production is not enough, and second thing — *A := x α*

is a greedy sequence, if it can match *x*

terminals, it does.*k*

At each iteration we get from

set no smaller *F _{i}(A)*

*F*_{i+1}(A)

set — i.e. each element from *F*_{i}(A)

is present in *F*_{i+1}(A)

. The claim is when we reach to the point when *F*_{i}(A)

is equal to *F*_{i+1}(A)

we stop because *F*_{i}(A)

is our desired *FIRST*_{k}(A)

set.And I don’t think this claim holds.

Since it says

to support my doubt it is enough to find an element from *FIRST _{k}(A) = F_{i}(A) ⊆ F_{i-1}(A) ⊆ ... ⊆ F_{0}(A)*

*F*_{0}(A)

that does not belong to *FIRST*_{k}(A)

. Consider such productions:*A := a B
B := b*

For

what is the *k=2*

set? *FIRST _{k}(A)*

*FIRST _{k}(A) = { a b } *

Note, there is no other string, the

set contains just one element. And how does the *FIRST _{k}(A)*

*F*_{0}(A)

look like?*F _{0}(A) = { a } *

This element is not part of the

and as counter example it shows the algorithm is incorrect. *FIRST _{k}(A)*

The real question is — where did **I** make a mistake? Because when it comes to finding errors by Mr.Nobody in well established publications by well know authors, let’s face it — Mr.Nobody is more often wrong than right. I will be grateful for your comments.

**Solution:** I misinterpreted the way

’es are obtained for *x*

’s — they can be shorter than *A*

if and only if *k*

is empty.*α*