Re: Problem with sablecc4.beta2
Hi Paco,
See below.
> Our problem is that we want to implement a COBOL grammar piece by piece.
> First we implement five productions, later we will include others, and
> so on. Thus, we need a token like "line" that matches almost every line
> (for the sections we don't want to parse yet), but we thought that the
> token "IDENTIFICATION" had a higher priority than the token 'line'. One
> more thing to note is that the token "line" doesn't appear in any
> production.
> Perhaps we must identify every keyword, sentence, identifier, etc. in
> the grammar instead of letting the parser scan 'lines' for some chunks
> of source.
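About the priority question: as far as I know, SableCC (like most lexer generators) first picks the longest possible match, and only uses declaration order to break ties between matches of the same length. A "line" token that matches the whole line "IDENTIFICATION DIVISION." is therefore longer than the "IDENTIFICATION" keyword, so it wins. The small Python simulation below illustrates that rule; it is not SableCC, and the two regular expressions are assumed stand-ins for your tokens.

```python
import re

# Assumed stand-ins for the two tokens (Python regexes, not SableCC syntax).
TOKENS = [
    ("identification", re.compile(r"IDENTIFICATION")),
    ("line", re.compile(r"[^\n]*\r?\n")),  # one physical line, including its newline
]

def next_token(text, pos):
    """Pick the longest match at pos; on a tie, the earlier-declared token wins."""
    best_name, best_len = None, -1
    for name, pattern in TOKENS:
        m = pattern.match(text, pos)
        if m and len(m.group()) > best_len:
            best_name, best_len = name, len(m.group())
    return best_name, text[pos:pos + best_len]

def tokenize(text):
    """Repeatedly apply the longest-match rule until the input is consumed."""
    pos, out = 0, []
    while pos < len(text):
        name, lexeme = next_token(text, pos)
        if name is None or not lexeme:
            raise ValueError(f"no token matches at offset {pos}")
        out.append((name, lexeme))
        pos += len(lexeme)
    return out

print(tokenize("IDENTIFICATION DIVISION.\n"))
```

Here the whole line comes out as a single "line" token; the keyword only wins when nothing longer matches, e.g. for the input "IDENTIFICATION" without a trailing newline.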
On a general note, what you are trying to do isn't the easiest project
for a first attempt at writing a parser. I don't know enough about COBOL
to say how easy or difficult it is to parse, yet I would suggest that
you look around on the internet for an existing grammar (or even a
parser) for it. At the very least, this could give you some ideas.
Now, you can tell SableCC 4 (but not SableCC 3) not to match lines that
contain the string "IDENTIFICATION":
Lexer
...
identification = 'IDENTIFICATION';
line = ((#x0000..#xffff)* cr? lf) Diff (Any* identification Any*);
...
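To make the meaning of "Diff" concrete: a lexeme is accepted by the "line" token only if it is in the left language and not in the right one. The Python sketch below emulates that difference with two assumed regular expressions (again, not SableCC itself; LINE is restricted to a single physical line for simplicity) and shows that, once lines containing IDENTIFICATION are excluded, the keyword wins and the rest of the line becomes a separate "line" token.

```python
import re

# Assumed stand-ins for the SableCC expressions above (not SableCC itself):
# LINE approximates ((#x0000..#xffff)* cr? lf), limited here to one physical line;
# CONTAINS_ID approximates (Any* identification Any*). (?s) lets '.' match newlines.
LINE = re.compile(r"[^\n]*\r?\n")
CONTAINS_ID = re.compile(r"(?s).*IDENTIFICATION.*")
KEYWORD = re.compile(r"IDENTIFICATION")

def in_line_diff(lexeme):
    """True if lexeme is in LINE but not in CONTAINS_ID, i.e. LINE Diff CONTAINS_ID."""
    return LINE.fullmatch(lexeme) is not None and CONTAINS_ID.fullmatch(lexeme) is None

def longest_line(text, pos):
    """Length of the longest prefix of text[pos:] accepted by the difference (0 if none)."""
    for end in range(len(text), pos, -1):
        if in_line_diff(text[pos:end]):
            return end - pos
    return 0

def tokenize(text):
    """Longest match between the keyword and the restricted line token; keyword wins ties."""
    pos, out = 0, []
    while pos < len(text):
        m = KEYWORD.match(text, pos)
        kw_len = len(m.group()) if m else 0
        line_len = longest_line(text, pos)
        if kw_len == 0 and line_len == 0:
            raise ValueError(f"no token matches at offset {pos}")
        if kw_len >= line_len:
            out.append(("identification", text[pos:pos + kw_len]))
            pos += kw_len
        else:
            out.append(("line", text[pos:pos + line_len]))
            pos += line_len
    return out

print(tokenize("IDENTIFICATION DIVISION.\n"))
```

The input is now split into the keyword "IDENTIFICATION" followed by the line token " DIVISION.\n", while ordinary lines such as "MOVE A TO B.\n" still come out as a single "line" token.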
Or, it may be more appropriate not to match lines that start with
"IDENTIFICATION", something like:
Lexer
...
identification = 'IDENTIFICATION';
line = ((#x0000..#xffff)* cr? lf) Diff (blank* identification Any*);
blank = ' ' | #9;
...
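The practical difference between the two exclusions shows up on lines that mention IDENTIFICATION without starting with it, such as a comment line. The Python sketch below compares both versions using assumed regex stand-ins for "Any* identification Any*" and "blank* identification Any*" (with blank as a space or tab, #9); the sample COBOL lines are made up for illustration.

```python
import re

# Assumed regex stand-ins for the two exclusion sets (not SableCC syntax).
# (?s) lets '.' also match the trailing newline of each lexeme.
LINE = re.compile(r"[^\n]*\r?\n")                          # one physical line
CONTAINS_ID = re.compile(r"(?s).*IDENTIFICATION.*")        # Any* identification Any*
STARTS_WITH_ID = re.compile(r"(?s)[ \t]*IDENTIFICATION.*")  # blank* identification Any*

def is_line(lexeme, excluded):
    """True if lexeme is a LINE that is not in the excluded set (LINE Diff excluded)."""
    return LINE.fullmatch(lexeme) is not None and excluded.fullmatch(lexeme) is None

# Hypothetical sample lines.
samples = [
    "       IDENTIFICATION DIVISION.\n",
    "      * SEE THE IDENTIFICATION DIVISION ABOVE\n",
    "       PROGRAM-ID. HELLO.\n",
]
for s in samples:
    print(repr(s), is_line(s, CONTAINS_ID), is_line(s, STARTS_WITH_ID))
```

Both versions reject the indented "IDENTIFICATION DIVISION." line and accept the PROGRAM-ID line, but only the "starts with" version still treats the comment line as an ordinary "line" token.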
That should help you get started.
Have fun!
Etienne
--
Etienne M. Gagnon, Ph.D.
SableCC: http://sablecc.org