Update: Thanks to help by Hendrik Jan after 4 days of struggle I can see my mistake — problem solved.
Three days ago I started working on LALR(k>1) to save a weekend solely for Skila development — unfortunately I underestimated the problem and now not only I feel dizzy but also I don’t see the end of the work. All three days went on algorithm for computing FIRSTk
sets.
Surprisingly I didn’t find too many sources explaining how to compute FIRSTk
sets for k>1
. There is old “The Theory of Parsing, Translation, and Compiling (vol.1)” by Alfred V. Aho, and Jeffrey D. Ullman (yes, that’s 1972), a page “Review of LR(k) and LALR(k) Parsing Theory” by Sean E. O’Connor and a paper “An Algorithm for Constructing a Semi-LL(2) Grammar’s Parsing Table” by Yoshida Keiichi, and Takeuchi Yoshiko.
I have hard time deciphering the last one, and I cannot use the former two because the algorithm presented there (the same one) is in my opinion incorrect. If you don’t have a copy of Aho&Ullman please refer to the mentioned page by Sean E. O’Connor because the notation is the same.
The algorithm uses helper Fi
sets to iteratively compute FIRSTk
sets. We have:
Fi(a) = { a }
for every terminal a
and index i ≥ 0
. For non terminals starting point is defined as:
F0(A) = { x | ∃ A := x α, |x| ≤ k }
x
denotes a sequence (possibly empty) of terminals, α
denotes a sequence of symbols (terminals or non terminals) — also possibly empty.
Two things worth noting — there has to be such production A := x α
defined, a derived production is not enough, and second thing — x
is a greedy sequence, if it can match k
terminals, it does.
At each iteration we get from Fi(A)
set no smaller Fi+1(A)
set — i.e. each element from Fi(A)
is present in Fi+1(A)
. The claim is when we reach to the point when Fi(A)
is equal to Fi+1(A)
we stop because Fi(A)
is our desired FIRSTk(A)
set.
And I don’t think this claim holds.
Since it says FIRSTk(A) = Fi(A) ⊆ Fi-1(A) ⊆ ... ⊆ F0(A)
to support my doubt it is enough to find an element from F0(A)
that does not belong to FIRSTk(A)
. Consider such productions:
A := a B
B := b
For k=2
what is the FIRSTk(A)
set?
FIRSTk(A) = { a b }
Note, there is no other string, the FIRSTk(A)
set contains just one element. And how does the F0(A)
look like?
F0(A) = { a }
This element is not part of the FIRSTk(A)
and as counter example it shows the algorithm is incorrect.
The real question is — where did I make a mistake? Because when it comes to finding errors by Mr.Nobody in well established publications by well know authors, let’s face it — Mr.Nobody is more often wrong than right. I will be grateful for your comments.
Solution: I misinterpreted the way x
’es are obtained for A
’s — they can be shorter than k
if and only if α
is empty.