Re: Excluding sequences of character using SableCC



Hi Cameron,

There is no "elegant" solution without a subtraction operator on regular
expressions.  However, regular languages are closed under difference, so
any expression you can write using "-" has an equivalent (often
inelegant) regular expression that does not use it.

As an example of how to write such a regular expression, look at the
definition of C-like comments in the Java grammar on the web site.  It
is equivalent to the following regular expression:

 comment = '/*' (any* - ( any* '*/' any*)) '*/' ;
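
To make the equivalence concrete, here is a minimal sketch in Java (using
java.util.regex rather than SableCC syntax).  The class and method names
are hypothetical, and the subtraction-free pattern is the usual textbook
expansion, which is not necessarily the exact text of the Java grammar
on the web site.  It compares that pattern against a direct reading of
the subtraction form on every short string over a three-character
alphabet.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class CommentDifference {
    // Subtraction-free form: "/*", then text whose first "*/" is at the very end.
    // Assumed expansion (standard construction), written as a Java regex.
    static final Pattern EXPANDED =
        Pattern.compile("/\\*[^*]*\\*+([^*/][^*]*\\*+)*/");

    static boolean matchesExpanded(String s) {
        return EXPANDED.matcher(s).matches();
    }

    // Direct reading of: '/*' (any* - (any* '*/' any*)) '*/'
    static boolean matchesBySubtraction(String s) {
        if (s.length() < 4 || !s.startsWith("/*") || !s.endsWith("*/")) {
            return false;
        }
        // The body is whatever sits between the opening and closing delimiters.
        String body = s.substring(2, s.length() - 2);
        return !body.contains("*/");
    }

    // Counts strings of length 0..maxLen over {'/', '*', 'a'} on which
    // the two definitions disagree.
    static int countDisagreements(int maxLen) {
        char[] alphabet = {'/', '*', 'a'};
        int disagreements = 0;
        int total = 1; // number of strings of the current length (3^len)
        for (int len = 0; len <= maxLen; len++) {
            for (int code = 0; code < total; code++) {
                // Decode "code" as a base-3 number, one digit per character.
                StringBuilder sb = new StringBuilder();
                int c = code;
                for (int i = 0; i < len; i++) {
                    sb.append(alphabet[c % 3]);
                    c /= 3;
                }
                String s = sb.toString();
                if (matchesExpanded(s) != matchesBySubtraction(s)) {
                    disagreements++;
                }
            }
            total *= 3;
        }
        return disagreements;
    }
}
```

Calling CommentDifference.countDisagreements(8) checks all 9841 strings
of up to eight characters.  An exhaustive check over short strings is
evidence rather than a proof, but it is a cheap way to catch mistakes
when expanding a "-" by hand.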

Etienne

Cameron Ross wrote:
> I'm trying to derive a SableCC grammar from an existing EBNF specification.
> 
> The EBNF contains something like this:
> 
>    backslash= '\';
>    digit = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9';
>    numeral = digit, {digit};
>    char = digit | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' |
> 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' |
> 'V' | 'W' | 'X' | 'Y' | 'Z';
>    dotschar = '...', {char};
>    reserved = 'and' | 'or' | 'not';
>    sequence = (char {char | backslash}), - (reservedelement | numeral |
> dotschar);
> 
> I believe that the intention of the last statement is as follows:
>    - sequence should not be reserved
>    - sequence should not contain only digits
>    - sequence should not start with 3 dots
> 
> The SableCC examples I've seen only use the except-symbol (-) for
> excluding characters and character sets from other character sets.  How
> would sequence be defined in a SableCC grammar?

-- 
Etienne M. Gagnon, Ph.D.            http://www.info2.uqam.ca/~egagnon/
SableVM:                                       http://www.sablevm.org/
SableCC:                                       http://www.sablecc.org/