Sunday, 6 July 2014

Both ColdFusion and Railo make an incomplete job of validating email addresses

A question came up on StackOverflow just now "new to coldfusion. is there any function that defines Email is typed correctly" (how hard, btw, is it to give a coherent title to one's question? [furrow]).

My initial reaction was along the lines of "RTFM, and if you had, you'd see isValid() validates email addresses". However I also know that isValid() is very poorly implemented (see a google of " isvalid" by way of my findings/rantings on the matter).

So I decided to check how reliable it actually is.

The rules for what constitutes a valid email address a quite simply, and a laid out very clearly in various RFCs. Or if one can't be bothered doing that (although the engineers & Adobe and Railo should be bothered), then they're fairly comprehensively defined even on Wikipedia: "Email_address (Syntax)". I found that page by googling for it.

Anyway, I won't bore you by replicating the rules here, but I wrote some TestBox tests to test all the rules, and how isValid() supports them. The answer to this on both ColdFusion (11) and Railo (4.2) is "only to a moderate level". This is kinda what I expect of the Adobe ColdFusion Team (this is the level of quality they seem to aim for, generally, it seems), but am pretty let down by the Railo guys here.

I won't reproduce all the code here as it's long and boring, but it's on GitHub: "TestIsValidEmail.cfc".

Anyhow, the bottom lines are as follows:



Railo did slightly better, but ran substantially slower (running on same machine, same spec JVM).

I hasten to add that 50 of those tests were testing the validity of individual invalid characters, so that blows the failures out a bit (given they pretty much all failed). But both also failed on some fairly bog-standard stuff.

It's seriously as if it didn't occur to the engineers involved to actually implement the functionality as per the published specification; instead they simply made some shit up, based on what their own personal knowledge of what email addresses kinda look like. This is a bit slack, I'm afraid, chaps. And would just be so easy to do properly / thoroughly.

NB: there are rules around comments in email addresses that I only gave superficial coverage of, but I covered all the other rules at least superficially in these tests.

One new TestBox thing which I learned / noticed / twigged-to today... one can write test cases within loops. One cannot do that with MXUnit! eg:

describe("Tests for invalid ASCII characters", function(){
    for (var char in [" ", "(", ")", ",", ":", ";", "<", ">", "@", "[", "]"]){
        var testAddress = "";
        it("rejects [#char#]: [#testAddress#]", function(){
                isValid("email", testAddress)
        testAddress = '"example#char#example"';
        it("accepts [#char#] when quoted: [#testAddress#]", function(){ 
                isValid("email", testAddress)

Here I loop through a bunch of characters, and Here for each character perform two tests. This is possible because testBox uses function expressions for its tests, whereas MXUnit uses declared functions (which cannot be put in loops). This is not a slight on MXUnit at all, because it predates CFML's support for function expressions; it's more just that I'm so used to having to write individual test functions it did not occur to me that I could do this in TestBox until today. I guess there's nothing stopping one from writing one's test functions as function expressions in MXUnit either, that said. But this is the encouraged approach with TestBox.

Nice one, TestBox.

Anyway, this is not helping me prep for my presentation tomorrow. Give me a ballocking if you hear from me again today, OK?