Re: Parsing sub-grammars
First, thanks for your suggestion.
On Tue 2006-10-17 at 10:28h, you wrote:
:
> SableCC uses the visitor design pattern to handle interpretation of
> a program. You can write two different walkers; one will correctly
> parse every production; the other will sub-class it and will
> over-ride only the illegal productions (illegal according to the
> "small" grammar) and produce some error. This way maximizes code
> re-use.
This doesn't work, because a sub-grammar doesn't necessarily define
a sub-language in the set-theoretic sense. For example, a valid Java
expression is not a valid Java class definition, so you cannot use
the grammar for Java class definitions to parse Java expressions,
even though the latter occur within the former.
Another problem with restricting a grammar ex post is that you can get
inappropriate error messages when the input is invalid. For example,
the parser may report "expected: FOO" for a token FOO that doesn't
occur at all in the restricted grammar.
It seems that SableCC 4.0 will have a "Start Productions"
section where you can specify several start productions (see
http://sablecc.org/lists/sablecc-user/2006-June/000389.html).
In the meantime I found a workaround that works as follows:
For each of the (sub-)grammars to be parsed, define a special token,
then add a new start production as follows:
  new_start =
    special_token_1 sub_grammar_1_start |
    special_token_2 sub_grammar_2_start |
    special_token_3 sub_grammar_3_start ;
Add an initial lexer state that recognizes only the special tokens and,
once one has been read, switches to the appropriate regular lexer
state. (This ensures that the special tokens don't interfere with the
subsequent lexing.)
Finally, write a FilterReader that inserts a special token at the
beginning of the character sequence being read. When invoking the
parser, wrap the source Reader in the appropriate FilterReader
according to the sub-grammar you want to parse.
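The FilterReader step could look roughly like the sketch below. It is
only an illustration, not code from the original workaround: the class
name PrefixReader, the helper drain, and the marker text "#expr#" are
all made up here, and the real marker must be whatever text your
special token matches in the initial lexer state.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper: delivers a fixed marker before the wrapped Reader's text. */
class PrefixReader extends FilterReader {
    private final String prefix; // text of the special token (assumed, see lead-in)
    private int pos = 0;         // number of prefix characters already delivered

    PrefixReader(Reader in, String prefix) {
        super(in);
        this.prefix = prefix;
    }

    @Override
    public int read() throws IOException {
        if (pos < prefix.length()) {
            return prefix.charAt(pos++);   // still inside the marker
        }
        return super.read();               // marker exhausted: pass through
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (pos < prefix.length()) {
            // Serve as much of the marker as fits; the caller reads again for the rest.
            int n = Math.min(len, prefix.length() - pos);
            prefix.getChars(pos, pos + n, buf, off);
            pos += n;
            return n;
        }
        return super.read(buf, off, len);
    }

    @Override
    public boolean ready() throws IOException {
        return pos < prefix.length() || super.ready();
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset across the marker boundary is not handled here
    }

    /** Demo helper: reads everything from r into a String using small bulk reads. */
    static String drain(Reader r) {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Reader source = new StringReader("a + b");
        // prints: #expr#a + b
        System.out.println(drain(new PrefixReader(source, "#expr#")));
    }
}
```

With a SableCC-generated parser, the wrapped reader would then be
handed to the lexer in place of the original one, typically via a
PushbackReader. This sketch does not override skip(), which in
FilterReader goes straight to the underlying Reader, so it assumes the
lexer only calls the read methods.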
-- Niklas Matthies