In the middle of writing my interpreter for ArabicBASIC, I was testing the expression grammar. I accidentally left out a parenthesis, which gave me an expression like “LET A = ((2 + 3) * 4”. Notice how the parentheses are mismatched: the final one after the 4 is missing. This should have provoked an error, but magically it still executed and calculated the correct result. I panicked a little, thinking that I had made a fundamental error in my grammar.
Even worse, syntax such as “LET H = ” actually ran and assigned null to “H”. Oh, what in the world did I do? However, I quickly grasped that the parser was somehow settling for a partial match: it consumed as much of the input as it could make sense of and then ran with it. Usually, this means one has forgotten to end the start rule with the “EOF” (end-of-file) token, per some unexpectedly polite people on Stack Overflow. But I hadn’t forgotten it:

So, why the heck did I not get an error message? I scrolled up to the top of the console’s long output, and indeed there was an error message there :-) But the interpreter didn’t bail out. Well, it’s a good thing I bought the official ANTLR4 book, “The Definitive ANTLR 4 Reference”, which mentions exactly this.
It turns out I have to implement an abstract method and throw a Java exception from it to force the parser to bail out upon encountering a syntax error. This is because ANTLR4’s runtime has automatic recovery mechanisms: by default it reports the error, tries to resynchronize, and keeps on parsing. Throwing a Java-level exception short-circuits all of that. Well, who would have thunk it.
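For the curious, here is a minimal sketch of what such a bail-out can look like. It assumes the method in question is syntaxError from ANTLR4's ANTLRErrorListener interface, reached here by extending BaseErrorListener. The class name ThrowingErrorListener is made up for illustration, and the real interpreter may well be wired differently.

```java
import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;
import org.antlr.v4.runtime.misc.ParseCancellationException;

// Hypothetical listener (the name is an assumption, not from the interpreter's code).
// Instead of printing the error and letting ANTLR recover, it throws immediately.
class ThrowingErrorListener extends BaseErrorListener {
    static final ThrowingErrorListener INSTANCE = new ThrowingErrorListener();

    @Override
    public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol,
                            int line, int charPositionInLine, String msg,
                            RecognitionException e) {
        // An unchecked exception propagates out of the parse call,
        // so nothing downstream ever sees a half-built parse tree.
        throw new ParseCancellationException(
                "line " + line + ":" + charPositionInLine + " " + msg);
    }
}
```

To use it, call removeErrorListeners() on both the generated lexer and parser (this drops the default console listener), then addErrorListener(ThrowingErrorListener.INSTANCE) on each, and catch ParseCancellationException around the call to the start rule. ANTLR4 also ships a BailErrorStrategy that can be installed with setErrorHandler() and serves a similar purpose, though it does not carry the friendly error message along.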