
Re: Problem with sablecc4.beta2



Hi Paco,

See below.

Our problem is that we want to implement a COBOL grammar piece by piece. First we implement five productions, another day we will include some more, and so on. Thus, we need a token like "line" that matches almost every line (for the sections we don't want to parse, for now), but we thought that the token "IDENTIFICATION" had greater priority than the token "line". One more thing to note is that the token "line" doesn't appear as a production.

Perhaps we must identify in the grammar every keyword, sentence, identifier, etc., instead of letting the parser scan "lines" for some chunks of source.

On a general note, what you are trying to do isn't the easiest project for a first attempt at writing a parser. I don't know enough about COBOL to say how easy or difficult it is to parse, but I would suggest that you look around on the internet for an existing grammar (or even a parser) for it. At the least, this could give you some ideas.

Now, you can tell SableCC 4 (but not SableCC 3) not to match lines that contain the string "IDENTIFICATION":

Lexer

 ...
 identification = 'IDENTIFICATION';
 line = ((#x0000..#xffff)* cr? lf) Diff (Any* identification Any*);
 ...

Or, it may be more appropriate not to match lines that start with "IDENTIFICATION", something like:

Lexer

 ...
 identification = 'IDENTIFICATION';
 line = ((#x0000..#xffff)* cr? lf) Diff (blank* identification Any*);
 blank = ' ' | #9;
 ...
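
To make the effect of the Diff operator concrete, here is a rough Python sketch of the two language differences described above. It is only an analogy, not SableCC code: the function names are made up for illustration, and the sketch assumes that a "line" is a single physical line ending in an optional cr followed by lf, which is narrower than the (#x0000..#xffff)* expression in the grammar.

```python
import re

KEYWORD = "IDENTIFICATION"

# Assumption: one physical line, i.e. any characters except cr/lf,
# followed by an optional cr and a mandatory lf.
ANY_LINE = re.compile(r"[^\r\n]*\r?\n")


def is_line_without_keyword(text: str) -> bool:
    """Analogue of: line Diff (Any* identification Any*).

    Accepts a line only if the keyword appears nowhere in it.
    """
    return ANY_LINE.fullmatch(text) is not None and KEYWORD not in text


def is_line_not_starting_with_keyword(text: str) -> bool:
    """Analogue of: line Diff (blank* identification Any*).

    Accepts a line unless, after leading spaces and tabs,
    it begins with the keyword.
    """
    if ANY_LINE.fullmatch(text) is None:
        return False
    return not text.lstrip(" \t").startswith(KEYWORD)
```

With these definitions, a comment line such as "* see IDENTIFICATION" followed by a newline is rejected by the first predicate but accepted by the second, which is the practical difference between the two lexer variants.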

That should help you get started.

Have fun!

Etienne

--
Etienne M. Gagnon, Ph.D.
SableCC:                                            http://sablecc.org