
Re: Line by Line Mode error messages



Hi Roger,

See below.

Pomeroy, Roger wrote:
I have an application that needs to read line by line from a console... so I was using the approach described previously in http://lists.sablecc.org/pipermail/sablecc-discussion/msg00316.html 
in which Rowan suggested:
Rowan Worth wrote:
  I think I have a general solution for 3.2:
1) read one line
2) try and parse it
3) if you got an AST then you are done
4) if you got an exception and the lexer is at EOF, read another line and go back to (2)
[...]
 
That approach has been working just fine, except for one problem... how to report errors?  If I get an error at the end of a line, it may be because of a "real" syntax error, or just because some construct (like an IF...ENDIF block) is incomplete.

No, there's no such ambiguity. Let me explain.

When an LALR parser eats a token (shifts a token, in parser theory), it is because this token (and all the previously eaten tokens) form a valid prefix of a sentence of the parsed language. In other words, there exists some suffix that would result in a valid parse. The parser reports an error as soon* as it reaches a token that would break the valid prefix assumption.
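To make the valid-prefix property concrete, here is a minimal sketch on a toy language of balanced parentheses. This is an assumption chosen for brevity: it is a hand-written recognizer, not an LALR parser, and the class and constant names are hypothetical. Because a prefix of this language is valid exactly when the nesting depth never goes negative, the recognizer can either name the first token that no suffix could ever repair, or, if every token was accepted, conclude that the only possible culprit is EOF.

```java
/**
 * Toy illustration (hypothetical names) of the valid-prefix property using
 * balanced parentheses. A prefix is valid exactly when the nesting depth
 * never goes negative, so the first bad token can be reported immediately.
 */
public class ValidPrefixDemo {

    /** Every token was accepted, but the sentence is unfinished (error "on EOF"). */
    static final int AT_EOF = -1;
    /** The whole input is a complete sentence. */
    static final int OK = -2;

    /** Index of the first token that breaks the valid prefix, or AT_EOF, or OK. */
    static int firstErrorToken(String tokens) {
        int depth = 0;
        for (int i = 0; i < tokens.length(); i++) {
            char t = tokens.charAt(i);
            if (t == '(') {
                depth++;
            } else if (t == ')' && depth > 0) {
                depth--;
            } else {
                // An unmatched ')' or an unknown character: no suffix can fix this.
                return i;
            }
        }
        // All tokens were accepted; only the end of input can be the culprit.
        return depth == 0 ? OK : AT_EOF;
    }

    public static void main(String[] args) {
        System.out.println(firstErrorToken("(()"));  // -1: valid prefix, more input may complete it
        System.out.println(firstErrorToken("())(")); // 2: the second ')' is a genuine error
        System.out.println(firstErrorToken("(())")); // -2: complete sentence
    }
}
```

Note that "(()" is never rejected before its end: every token extends a valid prefix, so the only complaint possible is at EOF.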

Given this knowledge, we can now see why Rowan's trick works:
  1. If a parse error is found and the culprit token is not EOF, then the error is not at the end of the line, and we should report it to the user.
  2. If a parse error is found and the culprit token is EOF, we know that all the input preceding this EOF forms a valid prefix. So, there is a chance that the next line can fix the problem. We cannot report an error to the user, as we do not know for sure that the next line won't fix the problem.
So, there you are. Simple, isn't it? Rowan's trick is really brilliant.
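The two cases above can be turned into a small driver loop. What follows is a minimal sketch under stated assumptions: the SableCC-generated parser is replaced by a hypothetical hand-written LL(1) recursive-descent parser (`ToyParser`) for a made-up grammar `stmt := PRINT id | IF id stmt* ENDIF`. Such a parser also has the valid-prefix property, so it fails on EOF exactly when the text read so far could still be completed. `ParseError.atEof`, `readAndParse` and `run` are likewise hypothetical names; with a real SableCC parser the equivalent test would presumably be whether the token carried by the parser exception is the generated EOF token (check the generated `ParserException` class).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class LineByLineDemo {

    /** Hypothetical parse error that remembers whether the culprit token was EOF. */
    static class ParseError extends Exception {
        final boolean atEof;
        ParseError(String token) { // token == null means EOF
            super(token == null ? "unexpected end of input" : "unexpected token '" + token + "'");
            this.atEof = token == null;
        }
    }

    /** Hypothetical toy LL(1) parser standing in for the generated LALR parser. */
    static class ToyParser {
        private static final Set<String> KEYWORDS = Set.of("IF", "ENDIF", "PRINT");
        private final List<String> tokens;
        private int pos = 0;

        ToyParser(String text) {
            String trimmed = text.trim();
            tokens = trimmed.isEmpty() ? List.<String>of() : Arrays.asList(trimmed.split("\\s+"));
        }

        private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

        /** program := stmt* ; returns a bracketed rendering of the AST. */
        String parseProgram() throws ParseError {
            List<String> stmts = new ArrayList<>();
            while (peek() != null) stmts.add(parseStmt());
            return String.join(" ", stmts);
        }

        private String parseStmt() throws ParseError {
            String t = peek();
            if ("PRINT".equals(t)) { pos++; return "(PRINT " + parseIdent() + ")"; }
            if ("IF".equals(t)) {
                pos++;
                StringBuilder sb = new StringBuilder("(IF " + parseIdent());
                // An unfinished IF body reaches parseStmt with t == null, i.e. an error on EOF.
                while (!"ENDIF".equals(peek())) sb.append(' ').append(parseStmt());
                pos++; // consume ENDIF
                return sb.append(')').toString();
            }
            throw new ParseError(t); // a stray ENDIF is reported on that token, not on EOF
        }

        private String parseIdent() throws ParseError {
            String t = peek();
            if (t == null || KEYWORDS.contains(t)) throw new ParseError(t);
            pos++;
            return t;
        }
    }

    /** Read a line, parse everything read so far, and only read more after an error on EOF. */
    static String readAndParse(BufferedReader in) throws IOException, ParseError {
        StringBuilder buffer = new StringBuilder();
        ParseError pending = null;
        String line;
        while ((line = in.readLine()) != null) {
            buffer.append(line).append('\n');
            try {
                return new ToyParser(buffer.toString()).parseProgram();
            } catch (ParseError e) {
                if (!e.atEof) throw e; // case 1: culprit is a real token, report it now
                pending = e;           // case 2: valid prefix so far, the next line may fix it
            }
        }
        if (pending != null) throw pending; // input is over: the unfinished construct is a real error
        return null;                         // no input at all
    }

    static String run(String input) {
        try {
            return readAndParse(new BufferedReader(new StringReader(input)));
        } catch (ParseError e) {
            return "error: " + e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("IF ready\nPRINT x\nENDIF\n")); // (IF ready (PRINT x))
        System.out.println(run("PRINT x ENDIF\n"));            // error: unexpected token 'ENDIF'
        System.out.println(run("IF ready\nPRINT x\n"));         // error: unexpected end of input
    }
}
```

Each retry simply re-parses the accumulated text from scratch, which keeps the sketch short. An unfinished IF is reported only when the input truly ends, because only then is it certain that no further line can complete it.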

*: Actually, an LALR parser reports the error when a shift of the erroneous token is attempted.

I don't want to put out lots of "spurious" error messages to the user. I was wondering if there is a reasonably easy way to know, when I get an error building the AST, what production it thought it was working on when it failed. (For example, if I knew that it was an IF statement that was ill-formed, I could ignore the error message because I know that IF statements require multiple lines.)

An incomplete IF will cause a parse error on the EOF token, not before (thanks to the valid prefix rule). You're safe here. :)
Have fun and let me know how it works out!

Etienne
-- 
Etienne M. Gagnon, Ph.D.
SableCC:                                            http://sablecc.org