Wednesday 28 August 2013

ColdFusion guessing at query column types

G'day:
I've been aware of this for a while, but I've only just got around to investigating it. ColdFusion guesses column types of queries when doing QoQ.

Consider this code:

thisUrlPath = CGI.script_name;
thisURlDir    = listDeleteAt(thisUrlPath, listLen(thisUrlPath ,"/"), "/") & "/";
remote = createObject("webservice", "http://#CGI.http_host##thisURlDir#Data.cfc?wsdl");

for (rows in [50,51]){
    try {
        records = remote.getRecords(rows);
        writeOutput("Test query has #records.recordCount# rows<br>");
        writeOutput("Is version column numeric: #isColumnNumeric(records, 'version')#<br>");
        lastId = new Query(dbtype="query", records=records, sql="SELECT MAX(id) AS maxId FROM records", maxrows=1).execute().getResult();
        writeOutput("last ID is: #lastId.maxId#<br>");
    }
    catch (any e){
        writeOutput("#e.type# exception: #e.message# #e.detail#<br>");
    }
    writeOutput("<hr>");
}

function isColumnNumeric(required query query, required string column){
    try {
        javacast("double[]", query[column]);
        return true;
    } catch (any e){
        return false;
    }
}

The remote service is thus:

component {

    remote query function getRecords(required numeric rows){
        var records = queryNew("id,version", "integer,varchar");    // I'm saying VARCHAR
        for (var i=1; i < rows; i++){
            queryAddRow(records);
            querySetCell(records, "id", i);
            querySetCell(records, "version", "9.0");                // could be numeric though
        }
        queryAddRow(records);
        querySetCell(records, "id", records.recordCount);
        querySetCell(records, "version", "9.0.1");                    // def not numeric

        return records;
    }

}

OK, so our remote service returns a recordset with id and version columns. The version column is varchar, and whilst some of the data could be considered numeric, it's not. And certainly not all the data is numeric.

The calling code takes this recordset and uses QoQ to get the maximum value in the id column. It does not reference the version column at all.

I have a loop that calls this code twice: one time testing with 50 rows in the query; one with 51.

And here is the output:

Test query has 50 rows
Is version column numeric: false
last ID is: 50

Test query has 51 rows
Is version column numeric: false
Expression exception: 9.0.1 must be interpretable as a valid number in the current locale.


So if there are 50 rows... CF correctly works out that whilst the data looks quite numeric... it's not, because the last row is clearly not numeric. However if there are 51 rows? No. Because it would seem that CF manually checks the first 50 rows and goes "yeah, close enough", and runs with that. Close enough? No, there is no "close enough" in computing. For pete's sake. What were they thinking? And in this case: what does it bloody matter? I'm not even using that column in my QoQ.

Also in the test code is a check to see whether the column data actually is numeric (or double, close enough). This tests the whole column. In one fell swoop. How come they can't do something like this?  NB: doing the try/catch approach is slightly grim, I know; but surely it's better than simply checking the first 50 rows?

Unsurprisingly, Railo is not quite so slipshod as ColdFusion, and it works properly.

All of this is down to the poor way CF serialises queries. Both via WDDX and JSON (I'm not sure which it uses by default when making web service calls? It doesn't matter) do not preserve the column types. This is the wrong thing to do given in CFML, query columns are one of the few places that data is (or can be ~) strongly-typed. That being the case: the data interchange mechanism needs to respect the type of the data as well.

Grumble.

And that's it for my lunch break.

--
Adam