
Re: Problem with sablecc4.beta2



Thanks for your fast response, and sorry that we are late (weekend, you know).

We are trying to validate nearly twenty thousand COBOL programs, and we need to parse them to perform a simple migration to another platform / language.

We like SableCC because of its clean generated classes, among other things (you did good work).

Our problem is that we want to implement a COBOL grammar piece by piece: first we implement five productions, another day we add a few more, and so on. Thus, we need a token like "line" that matches almost every line (for the sections we don't want to parse for now), but we thought that the token "IDENTIFICATION" had higher priority than the token 'line'. One more thing to note is that the token "line" doesn't appear in any production.

Perhaps we must identify every keyword, sentence, identifier, etc. in the grammar instead of letting the parser scan 'lines' for some chunks of source.

Probably we are a bit impatient and we should read your thesis ;-)

Thanks in advance.


On Fri, Jun 11, 2010 at 5:35 PM, Etienne M. Gagnon <egagnon@xxxxxxxxx> wrote:
Hi Paco,

SableCC seeks the longest string that matches a token definition and then uses precedence to select a token among those whose definitions are matched by this string.

Now, ' IDENTIFICATION\n' is longer than ' '. What were you expecting? Was it: ' ', 'IDENTIFICATION', and '\n' ? In other words, were you expecting 3 tokens?
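
To make the rule above concrete, here is a minimal sketch of "longest match first, precedence only on ties". It is not SableCC's generated lexer: the class LongestMatchDemo, the method nextToken, and the regular expressions (rough approximations of the three definitions in the grammar) are assumptions made for illustration. Placing tag_identification first in the tie-break order is also an assumption, since the grammar only declares line > ignorable.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not code generated by SableCC.
class LongestMatchDemo {

    // Token definitions in tie-break order (an earlier entry wins a length tie).
    // The regexes approximate the definitions in the grammar; their greedy
    // matching happens to yield the longest match for these three patterns.
    private static final Map<String, Pattern> TOKENS = new LinkedHashMap<>();
    static {
        TOKENS.put("tag_identification", Pattern.compile("IDENTIFICATION"));
        TOKENS.put("line", Pattern.compile("[\\s\\S]*\\r?\\n"));
        TOKENS.put("ignorable", Pattern.compile("[ \\r\\n\\t]+"));
    }

    // Returns "name:length" for the token chosen at pos, or null if none matches.
    static String nextToken(String input, int pos) {
        String bestName = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> entry : TOKENS.entrySet()) {
            Matcher m = entry.getValue().matcher(input);
            m.region(pos, input.length());
            // lookingAt() anchors the match at the start of the region.
            // A strictly longer match replaces the current best, so on
            // equal lengths the earlier (higher-priority) token is kept.
            if (m.lookingAt() && m.end() - pos > bestLength) {
                bestName = entry.getKey();
                bestLength = m.end() - pos;
            }
        }
        return bestName == null ? null : bestName + ":" + bestLength;
    }

    public static void main(String[] args) {
        // "line" spans all 17 characters; "ignorable" only the 2 leading spaces.
        System.out.println(nextToken("  IDENTIFICATION\n", 0)); // prints line:17
    }
}
```

In this sketch the keyword is selected only when no newline follows it, because "line" then fails to match at all; as soon as a newline is present, "line" is the longer match and precedence is never consulted.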

Could you tell us more about what you are trying to do (e.g. what the general language you're trying to parse looks like)?

Have fun!

Etienne

Paco Belloso wrote:
We have a problem with the new beta version of SableCC with the following rules (also attached to this message).
The parser throws a ParserException:

Exception in thread "main" language_testcase.ParserException: unexpected token '  IDENTIFICATION
 ' on line 1, pos 1
    at language_testcase.L_0.getTarget(Parser.java:151)
    at language_testcase.Parser.shift(Parser.java:51)
    at language_testcase.L_0.apply(Parser.java:132)
    at language_testcase.Parser.parse(Parser.java:22)
    at TestCase.main(TestCase.java:13)

But if we remove the token "line" and its references, the problem disappears.

Note that the same grammar, adapted to version 3.2 of SableCC, produces the same behaviour.

Which of our rules is wrong and causes this problem? What can we do to make SableCC produce the tokens "tag_identification" and "ignorable" instead of the "line" token?

-----------------------------------------------------------------------------------------------

The grammar rules:

Language testcase;

Lexer

   unicode_input_character = #0..#xffff;
   tb  = #x0009;
   lf  = #x000a;
   cr  = #x000d;
   sp  = ' ';

   tag_identification = 'IDENTIFICATION';
   line = (#x0000..#xffff)* cr? lf;
     
   ignorable = (sp | cr | lf | tb)+;

Token
   tag_identification,
   line;
       
Ignored
   ignorable;
     
Priority

   line > ignorable;

Parser

  source =
    identification;
   
  identification =
    tag_identification;




 The test case source file:

  IDENTIFICATION
 

The Java walker:

 
import java.io.FileReader;

import language_testcase.Node;
import language_testcase.Parser;
import language_testcase.Walker;

public class TestCase extends Walker
{
  public static void main(String args[]) throws Exception
  {
    TestCase tc = new TestCase();
    Parser parser = new Parser(new FileReader(args[0]));
    Node ast = parser.parse();
          
    ast.apply(tc);
        
  }
}



-- 
Etienne M. Gagnon, Ph.D.
SableCC:                                            http://sablecc.org

_______________________________________________
SableCC-Discussion mailing list
SableCC-Discussion@xxxxxxxxxxxxxxxxx
http://lists.sablecc.org/listinfo/sablecc-discussion