Monday 25 May 2015

CFML / Lucee: beware of "optional" semi-colons

G'day:
Not what I had in mind writing up today, but as often is the way... the easiest way to find bugs in CFML is to try to use it.

Railo (all the way back in 3.1! "RAILO-186", "What's new in Railo 3.1") and then Lucee have made claims that semi-colons are optional in their flavour of CFML. This would be cool if it was true, but given it's only kinda true, it kinda makes the feature unusable. I don't think a syntax rule that is kind of true is a valid one.

Here's an example of where it doesn't work:

param foo=1
param bar=2

CFML doesn't get much simpler than that, but this yields:



If one then goes and adds the semi-colons back in, the code works.
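For reference, this is the same pair of statements with the terminators put back, which is the form described above as working. Nothing here is new beyond the two semi-colons:

```cfml
// Same two statements as above, each explicitly terminated
param foo=1;
param bar=2;
```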

Basically the parser doesn't seem to work out where the statement ends, and tries to process the whole lot as a single statement. This is a bit of a logic flaw in the implementation. I think the logic for how the parser determines whether it's got a statement should be:

  • an unquoted semi-colon is definitely a statement terminator
  • a new line character might be a statement terminator, if what's been captured since the last statement terminator is a syntactically valid statement
  • if it's not (and only if it's not) then continue the statement onto the next line.

This would mean, though, that there would be some constructs that could not be split over multiple lines if a given line break might make a syntactically valid construct, eg:

param    name="foo"
        default=1;

So maybe the parser needs to be "greedy" first, and only if what it captures isn't compilable does it then - instead of just giving up - backtrack until it's got something compilable; or fail if it backtracks all the way to the beginning of the statement.
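To make the "greedy first, then backtrack" idea a bit more concrete, here is a rough CFScript sketch. It is only an illustration of the approach described above, not how Lucee's parser is actually written. The names splitStatements() and isValidStatement() are made up for the example, and isValidStatement is assumed to be a callback that answers "would this text compile as one statement?".

```cfml
// Hypothetical sketch only: splitStatements() and isValidStatement() are
// invented names, and this does not reflect Lucee's real parser internals.
function splitStatements(required array lines, required function isValidStatement) {
    var statements = [];
    var total = arrayLen(lines);
    var first = 1;

    while (first <= total) {
        var matched = false;

        // Greedy: try the longest run of lines first, then backtrack one line at a time
        for (var last = total; last >= first; last--) {
            var candidate = arrayToList(arraySlice(lines, first, last - first + 1), " ");

            if (isValidStatement(candidate)) {
                arrayAppend(statements, candidate);
                first = last + 1;   // the next statement starts after this one
                matched = true;
                break;
            }
        }

        // Backtracked all the way to the beginning of the statement: fail
        if (!matched) {
            throw(type="StatementParseException", message="No valid statement starts at line #first#");
        }
    }

    return statements;
}
```

With a validity check that rejects two param statements jammed together, the two-line example further up would come out as two statements, whereas the param split across name and default lines would stay as one. A real parser would presumably need to limit how far ahead it looks rather than starting from the end of the file each time; that part is glossed over here.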

I dunno... I'm not in the business of writing code parsers.

I can't help thinking, btw, that this is an artifact of adding this shitty "tags without angle brackets" syntax into CFScript. Had they just done the initial drop of implementing CFScript thoughtfully instead of hastily, then this would never have happened.

Still, as the truism goes: we are where we are.

I'm used to seeing parsing errors when I try to not use semi-colons (which, for some reason, I still do on occasion), and this would not be comment-worthy in and of itself.

Today though I fell victim to some rather odd behaviour wherein the parser didn't choke, but it compiled the code wrong. Which then worked... but not properly. This was easy to spot in my example as I was still on the first statement of my code, but this could lead to more complicated unexpected behaviour for people in other situations, so I thought I'd mention it.

[1,2,3].map(function(){
    dump(arguments)
    abort
})

What do you reckon this dumps out? The first iteration's arguments, right? Wrong. It dumps out this:



So the abort was being ignored. Initially I thought that - for some reason - abort was not allowed in the callback for an iteration method (which seemed dumb, but still... it'd not be the first dumb thing in CFML), so I ran the equivalent code on ColdFusion, and it worked fine (ie: it aborted after the first dump).

Then I tried it on Lucee 4.5 instead, in case it was a bug in Lucee 5. This time I got the tip-off from the result:


This sort of error message is a tip-off that Lucee's messed up its parsing, so I popped the semi-colons back in, and then the code worked fine. The thing is, on Lucee 5 without the error message, this becomes a bit harder to troubleshoot. I'm just lucky I've got other CFML installs available to compare and contrast results with. I really shouldn't have to do this though.
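For completeness, this is the map() example with the semi-colons popped back in, which is the version described above as working (ie: it aborts after the first dump):

```cfml
// Same callback as before, with every statement explicitly terminated
[1,2,3].map(function(){
    dump(arguments);
    abort;
});
```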

I'll raise some bugs for this lot shortly (one each for 4.5 and 5)... actually I rolled them into the one bug: LDEV-365.

One might think that this sort of thing is an edge case, but in the original code I was working on, I had only just started and had a total of three statements: two params and a map(). And the map() was pretty much what you saw above (I can never remember which order map() passes in the args to the callback, so always dump 'em out).

So in a file which is 11 lines long, I was affected by this issue three times.

In conclusion: the "optional semi-colon" thing is not fit for purpose, and should be avoided. Given it's been around for over five years, I can only assume no-one uses it anyhow, or the people that do are the unhelpful sorts who don't pipe up when they encounter issues.

Anyway, on with the code I was meaning to write today (for a different blog article, which won't now be ready today. Grumble).

Update:

I've also raised an issue (copying an already-existing Railo one) to be able to disable the optionality of semi-colons in Lucee: LDEV-369.

--
Adam