Hi Roger,
See below.
Pomeroy, Roger wrote:
> in which Rowan suggested:
>
> Rowan Worth wrote:
>> I think I have a general solution for 3.2:
>> 1) read one line
>> 2) try and parse it
>> 3) if you got an AST then you are done
>> 4) if you got an exception and the lexer is at EOF, read another line
>>    and go back to (2)
> [...]
> That approach has been working just fine,
> except for one problem... how to report errors? If I get an error at
> the end of a line, it may be because of a "real" syntax error, or just
> because some construct (like an IF...ENDIF block) is incomplete.
No, there's no such ambiguity. Let me explain.
When an LALR parser eats a token (shifts it, in parser theory), it is
because this token and all the previously eaten tokens form a valid
prefix of a sentence of the parsed language. In other words, there
exists some suffix that would result in a valid parse. The parser
reports an error as soon* as it reaches a token that would break the
valid-prefix assumption.
Given this knowledge, we can now see that Rowan's trick works:
- If a parse error is found and the culprit token is not EOF, then
the error is not at the end of the line, and we should report it to the
user.
- If a parse error is found and the culprit token is EOF, we know
that everything that preceded this EOF forms a valid prefix. So, there
is a chance that the next line can fix the problem. We cannot report an
error to the user yet, as we do not know for sure that the next line
won't fix the problem.
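
To make the two cases concrete, here is a rough sketch of the loop in Python. It is not SableCC code: the parser is a hand-written toy for a made-up two-statement language (PRINT and IF...ENDIF), and the names `tokenize`, `parse`, `ParseError` and `read_statement` are invented for this illustration. The only thing it assumes from the real parser is that a parse error tells you which token it choked on.

```python
EOF = "<EOF>"  # toy end-of-input marker appended by tokenize()
KEYWORDS = {"PRINT", "IF", "ENDIF"}


class ParseError(Exception):
    """Parse failure that remembers the culprit token."""

    def __init__(self, token):
        super().__init__(f"unexpected token {token!r}")
        self.token = token


def tokenize(text):
    # Whitespace-separated words, followed by the EOF marker.
    return text.split() + [EOF]


def parse(tokens):
    """Parse exactly one statement; return its AST or raise ParseError."""
    pos = 0

    def operand():
        nonlocal pos
        tok = tokens[pos]
        if tok == EOF or tok in KEYWORDS:
            raise ParseError(tok)
        pos += 1
        return tok

    def statement():
        nonlocal pos
        tok = tokens[pos]
        if tok == "PRINT":
            pos += 1
            return ["PRINT", operand()]
        if tok == "IF":
            pos += 1
            cond = operand()
            body = []
            while tokens[pos] != "ENDIF":
                body.append(statement())  # fails on EOF if ENDIF is missing
            pos += 1
            return ["IF", cond, body]
        raise ParseError(tok)

    ast = statement()
    if tokens[pos] != EOF:
        raise ParseError(tokens[pos])  # trailing junk after the statement
    return ast


def read_statement(lines):
    """Feed lines to the parser one at a time until a statement is complete."""
    buffer = ""
    for line in lines:
        buffer += line + "\n"
        try:
            return parse(tokenize(buffer))
        except ParseError as err:
            if err.token != EOF:
                raise  # culprit is a real token: report it to the user
            # culprit is EOF: the buffer is a valid prefix, read another line
    raise ParseError(EOF)  # input ran out while a construct was still open
```

Feeding it the lines `IF c`, `PRINT x`, `ENDIF` one at a time produces a single AST after the third line and no error message before that. Feeding it `IF c` followed by `ENDIF ENDIF` raises on the second ENDIF, which is a real error because the culprit is not EOF.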
So, there you are. Simple, isn't it? Rowan's trick is really brilliant.
*: Strictly speaking, an LALR parser reports the error when a shift of
the erroneous token is attempted. It may perform a few reductions
first, but it never shifts the erroneous token.
> I don't want to put out lots of "spurious"
> error messages to the user. I was wondering if there is a reasonably
> easy way to know when I get an error building the AST what production
> it thought it was working on when it failed. (For example, if I knew
> that it was an IF statement that was ill-formed, I could ignore the
> error message because I know that IF statements require multiple lines)
An incomplete IF will cause a parse error on the EOF token, not before
(thanks to the valid prefix rule). You're safe here. :)
Have fun and let me know how it works out!
Etienne
--
Etienne M. Gagnon, Ph.D.
SableCC: http://sablecc.org