Friday, 28 June 2013

ColdFusion vs JSON. How to make a developer cry

G'day:
In general on this blog, I've tried to steer clear of discussing what I'm currently doing at work, but I'm so pissed-off by all this, I have to share.

Right, so I'm in the unfortunate position of having to use ColdFusion 9's JSON support. Which is woeful.

Firstly, check this out...

// helper function for populating a whole query row in one hit
function queryAppendRow(required query query, required struct data){
    queryAddRow(query);
    for (var key in data){
        querySetCell(query, key, data[key]);
    }
}


// create a query with some sample data, including some null values
records = queryNew("id,strData,intData", "Integer,Varchar,Integer");
queryAppendRow(records, {id=1, strData="tahi", intData=1});    // no nulls
queryAppendRow(records, {id=2, intData=2});                    // null in string column
queryAppendRow(records, {id=3, strData="toru"});            // null in integer column

// serialise it, then deserialise it again
json = serializeJson(records, true);
deserialisedRecords = deserializeJson(json, false);

// what have we got?
writeDump([
    records,
    json,
    deserialisedRecords
]);

And this outputs:

array
1
query
IDINTDATASTRDATA
111tahi
222[empty string]
33[empty string]toru
2{"ROWCOUNT":3,"COLUMNS":["ID","STRDATA","INTDATA"],"DATA":{"ID":[1,2,3],"STRDATA":["tahi",null,"toru"],"INTDATA":[1,2,null]}}
3
query
IDINTDATASTRDATA
111tahi
222null
33nulltoru

The key thing to note here is the nulls in the initial data, and the string "null" in the deserialised data.

Now... before I start I will point out that this has been fixed in CF10 (3043777). But still... it being fixed in CF10 is no bloody help to me in CF9, and - seriously - WHAT THE F*** WERE ADOBE THINKING? It's bad enough that ColdFusion is stuck with this half-arsed null support and seemingly thinking "an empty string is close enough to a null so that'll be fine" in places. But deciding that using the word "null" in place of a null was something that should be done? You're having a laugh. Except no-one else is laughing. This isn't, btw, just an accidental glitch. This behaviour is documented. And in fact it's still documented - erroneously - as being the behaviour in CF10. The docs say this:

Usage
[...]
The DeserializeJSON function converts each JSON data type directly into the equivalent ColdFusion data type, as follows:
[...]
  • The JSON null value becomes the string null.
So they actually meant to do that. I despair.

We needed to fix this, because - for caching purposes - we need to (de)serialise query data reliably, so we came up with a function to modify the null entries in the JSON, so they would not get changed to "null" in the deserialisation process. It's important to observe that the serialisation is correct, it's the deserialisation that's broken. However it's easier to "fix" this by altering the JSON (which we can do in one hit), rather than having to re-parse the recordsets once they're converted back. Our UDF is as follows:

public string function cleanJson(required string json){
    return reReplaceNoCase(json, "([\[,:]\s*)null(?=\s*[\],\}])", '\1""', "ALL");
}

Side note: I'm glad I lifted this code from our codebase - I'd normally be hesitant due to copyright & NDA nonsense - because I spotted a bug in this as I did so. Now fixed. Heh.

This is not foolproof for all situations, but for our data it's good enough: we can be certain there'll almost never (if that's not an oxymoron) be any wayward sequences of "n-u-l-l" which is not actually a JSON null.

So what this means is instead of the serialized JSON being this:

{"ROWCOUNT":3,"COLUMNS":["ID","STRDATA","INTDATA"],"DATA":{"ID":[1,2,3],"STRDATA":["tahi",null,"toru"],"INTDATA":[1,2,null]}}

It ends up being this:

{"ROWCOUNT":3,"COLUMNS":["ID","STRDATA","INTDATA"],"DATA":{"ID":[1,2,3],"STRDATA":["tahi","","toru"],"INTDATA":[1,2,""]}}

That seemed fine. And was fine. Superficially. Until we needed to do a query of query on a numeric column.

Consider this:

<cfscript>
// same recordset as previously...


// serialise it, CLEAN it, then deserialise it again
json = serializeJson(records, true);
cleanedJson = cleanJson(json);
deserialisedRecords = deserializeJson(cleanedJson, false);
</cfscript>

<cfdump var="#records#">
<cfquery name="qoqRecords" dbtype="query">
    SELECT    *
    FROM    records
    WHERE    intData > 1
</cfquery>
<cfdump var="#qoqRecords#" label="qoqRecords">
<hr>

<cfdump var="#deserialisedRecords#">
<cftry>
    <cfquery name="qoqDeserialisedRecords" dbtype="query">
        SELECT    *
        FROM    deserialisedRecords
        WHERE    intData > 1
    </cfquery>
    <cfdump var="#qoqDeserialisedRecords#">
    <cfcatch>
        <cfdump var="#[cfcatch.message,cfcatch.detail]#">
    </cfcatch>
</cftry>

I've kinda given away the ending a bit there, by including the try/catch, but here's what gets output:


query
IDINTDATASTRDATA
111tahi
222[empty string]
33[empty string]toru
qoqRecords - query
RESULTSET
qoqRecords - query
IDINTDATASTRDATA
122[empty string]
CACHEDfalse
EXECUTIONTIME2
SQLSELECT * FROM records WHERE intData > 1

query
IDINTDATASTRDATA
111tahi
222[empty string]
33[empty string]toru
array
1Query Of Queries runtime error.
2Comparison exception while executing >.<br> Unsupported Type Comparison Exception: The > operator does not support comparison between the following types: <br> Left hand side expression type = "STRING".<br> Right hand side expression type = "LONG".<br>

This stands to reason, on reflection. Just 'cos the original and serialised/cleaned/deserialised queries look the same, doesn't mean they are. In the original query, "[empty string]" is just ColdFusion's misreporting of reality: that it's null. However in deserialisedRecords, as we know, it's actually an empty string.

Digression:

"ColdFusion's misreporting of reality [regarding null values]"? Well... yeah. Here's one of the more egregious examples of ColdFusion just failing miserably at dealing with nulls:

<cfoutput>
isNull(records.intData[3]): #isNull(records.intData[3])#<br><br>
</cfoutput>

<cfquery name="qoqRecords" dbtype="query">
    SELECT    *
    FROM    records
    WHERE    intData IS NULL
</cfquery>
<cfdump var="#qoqRecords#" label="qoqRecords">

Output:

isNull(records.intData[3]): NO

qoqRecords - query
RESULTSET
qoqRecords - query
IDINTDATASTRDATA
13[empty string]toru
CACHEDfalse
EXECUTIONTIME2
SQLSELECT * FROM records WHERE intData IS NULL

So come on, ColdFusion: is intData in the third record null or not? Depending on how one asks, the answer is either "no" or "yes". What is this: "Schrödinger's null?"

End of digression.

The ramification of this is that when ColdFusion comes to guess the column type of the intData column, it sees a string in there, so it goes: must be a string. And, accordingly, the error message is entirely right in that content: one cannot use > to compare strings and longs.

Hang on again... guess the column type? But I specified the column types when I created the query, remember:

records = queryNew("id,strData,intData", "Integer,Varchar,Integer");

This is lovely, but doesn't help once one has serialised and deserialised the query. Remember this is all we get having serialised the query:

{"ROWCOUNT":3,"COLUMNS":["ID","STRDATA","INTDATA"],"DATA":{"ID":[1,2,3],"STRDATA":["tahi",null,"toru"],"INTDATA":[1,2,null]}}

No data types there: we've lost the information. And to belabour the point, see this code:

<cfdump var="#getMetadata(records)#" label="getMetadata(records)">
<br>
<cfdump var="#getMetadata(deserialisedRecords)#" label="getMetadata(deserialisedRecords)">

getMetadata(records) - array
1
getMetadata(records) - struct
IsCaseSensitiveNO
Nameid
TypeNameINTEGER
2
getMetadata(records) - struct
IsCaseSensitiveNO
NamestrData
TypeNameVARCHAR
3
getMetadata(records) - struct
IsCaseSensitiveNO
NameintData
TypeNameINTEGER

getMetadata(deserialisedRecords) - array
1
getMetadata(deserialisedRecords) - struct
IsCaseSensitiveNO
NameID
2
getMetadata(deserialisedRecords) - struct
IsCaseSensitiveNO
NameSTRDATA
3
getMetadata(deserialisedRecords) - struct
IsCaseSensitiveNO
NameINTDATA

Obviously the type information has been lost. And - I've only just noticed this - the casing on the columns has been lost too, it seems. On one hand losing the casing on the column names is inconsequential. But on the other hand it's just slack-arse to not have done this properly.

I think there's a fault in CF's approach to serialising queries to JSON here. Given there's no such JSON type as "CFML Query", it's up to Adobe how they chose to serialise it, and they could well have serialised it with the data type as well. One could argue that CFML is "typeless" (which it isn't, but people have a habit of thinking this) so it shouldn't matter, but queries are typeful, and have been since at least CF5 when query of query was introduced.

So I think instead of this:

{
    "ROWCOUNT"    : 3,
    "COLUMNS"    : ["ID","STRDATA","INTDATA"],
    "DATA"        :{
        "ID"        : [1,2,3],
        "STRDATA"    : ["tahi",null,"toru"],
        "INTDATA"    : [1,2,null]
    }
}

They should have done this:

{
    "ROWCOUNT"    : 3,
    "COLUMNS"    : ["ID","STRDATA","INTDATA"],
    "TYPES"        : ["Integer","Varchar","Integer"],
    "DATA"        :{
        "ID"        : [1,2,3],
        "STRDATA"    : ["tahi",null,"toru"],
        "INTDATA"    : [1,2,null]
    }
}

I wonder if it's too late to add this stuff in? Anyway, perhaps; perhaps not. I suspect the horse has bolted on this one. And, anyway, I need this to work now, so I need to work around it.

What we have discovered is that WDDX deals with all this a bit better. Here's a WDDXed version of my earlier query of query code:

<cfwddx action="cfml2wddx" input="#records#" output="wddx">
<cfwddx action="wddx2cfml" input="#wddx#" output="deserialisedRecords">

<cfdump var="#records#">
<cfquery name="qoqRecords" dbtype="query">
    SELECT    *
    FROM    records
    WHERE    intData > 1
</cfquery>
<cfdump var="#qoqRecords#" label="qoqRecords">
<hr>

<cfdump var="#deserialisedRecords#">
<cfdump var="#getMetadata(deserialisedRecords)#">
<cftry>
    <cfquery name="qoqDeserialisedRecords" dbtype="query">
        SELECT    *
        FROM    deserialisedRecords
        WHERE    intData > 1
    </cfquery>
    <cfdump var="#qoqDeserialisedRecords#">
    <cfcatch>
        <cfdump var="#[cfcatch.message,cfcatch.detail]#">
    </cfcatch>
</cftry>

Obviously I've included spoilers again in this, given I've still got the try/catch in there.  Sigh. Here's the output (just the bottom half of it, for the sake of brevity):

query
IDINTDATASTRDATA
111tahi
222[empty string]
33[empty string]toru
array
1
struct
IsCaseSensitiveNO
Nameid
TypeNameDOUBLE
2
struct
IsCaseSensitiveNO
NamestrData
TypeNameVARCHAR
3
struct
IsCaseSensitiveNO
NameintData
TypeNameDOUBLE
array
1Query Of Queries runtime error.
2Comparison exception while executing >.<br> ''

This time CF has done a better job of guessing the column types: at least the numeric columns have a numeric type. But the bloody thing still errors.This is due to the NULL in that column... if I take the row with the null out, it all works fine.

So I bloody despair. We need a reliable way of serialising & deserialising query data, which really shouldn't be that hard... but I'm currently drawing a blank.


NB: none of any of this stuff is a problem on Railo. Yay for Railo. This is no help to me, unfortunately :-(



Has anyone else dealt with making JSON actually work reliably in CF9?

--
Adam