Sunday, 30 May 2021

CFML: weirdness with… <cfquery> and single-quote escaping

G'day:

I was looking around for a good (read: bad) example of one of those exercises in "building a dynamic SQL string inside a <cfquery>". Firstly, one should pretty much never do this. You are almost certainly tightly-coupling business logic with DB-tier logic, and that's an anti-pattern right there. And it also means you can't test it without hitting an actual database, which is rubbish. So just: don't do it. Build the SQL string first, testing all the logic as you go, and once the string is built, pass the completed thing to queryExecute (let's all just pretend <cfquery> doesn't exist: it's a horrific construct, and should have been left to die in the C20th where it belongs).

I found an excellent example of this in the CFWheels codebase.

<cfquery attributeCollection="#arguments.queryAttributes#">

    <cfset local.$wheels.pos = 0>

    <cfloop array="#arguments.sql#" index="local.$wheels.i">

        <cfset local.$wheels.pos = local.$wheels.pos + 1>

        <cfif IsStruct(local.$wheels.i)>
            <cfset local.$wheels.queryParamAttributes = $queryParams(local.$wheels.i)>

            <cfif
                NOT IsBinary(local.$wheels.i.value)
                AND local.$wheels.i.value IS "null"
                AND local.$wheels.pos GT 1
                AND (
                    Right(arguments.sql[local.$wheels.pos-1], 2) IS "IS"
                    OR
                    Right(arguments.sql[local.$wheels.pos-1], 6) IS "IS NOT"
                )
            >
                NULL
            <cfelseif StructKeyExists(local.$wheels.queryParamAttributes, "list")>
                <cfif arguments.parameterize>
                    (<cfqueryparam attributeCollection="#local.$wheels.queryParamAttributes#">)
                <cfelse>
                    (#PreserveSingleQuotes(local.$wheels.i.value)#)
                </cfif>
            <cfelse>
                <cfif arguments.parameterize>
                    <cfqueryparam attributeCollection="#local.$wheels.queryParamAttributes#">
                <cfelse>
                    #$quoteValue(str=local.$wheels.i.value, sqlType=local.$wheels.i.type)#
                </cfif> 
            </cfif>
        <cfelse>
            <cfset local.$wheels.i = Replace(PreserveSingleQuotes(local.$wheels.i), "[[comma]]", ",", "all")>
            #PreserveSingleQuotes(local.$wheels.i)#
        </cfif>
        #chr(13)##chr(10)#
    </cfloop>
    <cfif arguments.limit>
        LIMIT #arguments.limit#
        <cfif arguments.offset>
            #chr(13)##chr(10)#
            OFFSET #arguments.offset#
        </cfif>
    </cfif>
    <cfif Len(arguments.comment)>
        #arguments.comment#
    </cfif>
</cfquery>

(note that in the CFWheels codebase, that is all one single line of code, which I find to be… astonishing)

Also note that there are more "opportunities for improvement" there than just embedding very complicated string-building logic into a DB call. But it's a perfect example of what I'm talking about. And it's by no means the only example I've seen of this; it's all over the place in the CFML world. In its defence, at least the logic here is not really business logic; it's all SQL-string-building-logic. So it could be homed close to the DB call - maybe even a private method the same class - but it def doesn't belong inline like that.

Why? Because it's really complicated, ought to have been developed using TDD strategies, but to test this logic, one needs to actually hit a database, because all the logic is munged into the DB call! Ugh.

I thought to myself "but it's so easy to have built this in a purposeful string-builder method; I wonder why it wasn't done that way?". There's a coupla things that offer a challenge to this:

  • it's slinging the parameter values inline with <cfqueryparam> which is a bit "old skool", and also hasn't been necessary since CF11 (and its Railo / Lucee counterparts), provided one uses queryExecute which can correctly separate the SQL statement from the parameters for the statement.
  • It's got that preserveSingleQuotes in there.

preserveSingleQuotes is harder to deal with. Firstly a refresher. For some reason best known to someone at Allaire, all single quotes in any expressions within <cfquery> tags are automatically escaped. Whether appropriate to do so or not. To prevent this from happening, one needs to wrap the expression with a call to preserveSingleQuotes. But this only works if the call to preserveSingleQuotes is made within the <cfquery> tag. One cannot use preserveSingleQuotes outside that context (something not mentioned in any of the ColdFusion, Lucee or CFDocs, btw). This is such a daft thing for ColdFusion to have instigated. <cfquery> should just pass the string it's given to the DB. The end. If there's an expression within the SQL string that has single quotes that need to be escaped, then the code building the SQL string should do the escaping. Do it when it's needed. Don't do it unilaterally and then provide a mechanism to not do it. Blimey.

Anyway, that means that with all that string-building logic within that <cfquery>, it's not just a matter of lifting the string out into its own method, and building it in a (unit) testable fashion.

This got me looking at a way of stopping <cfquery> from escaping stuff, so that the preserveSingleQuotes call ain't needed.

Lucee provides a tag attribute on <cfquery> to disable it: psq, which if false, will not do the escaping. I do so wish they'd put more thought into the naming of that, but hey. Maybe escapeSingleQuotes or something. Not just a mostly meaningless TLA. Ah well: at least it's there and it works.

ColdFusion implements no such functionality, as far as I can tell.

During my googling as to how to address this, I came across one of my own old articles: "UnexpectedBuggy ColdFusion behaviour with CFML statements in <cfquery> tags". And within the grey box down at the bottom, there was mention of using UDFs within a <cfquery> tag varying the behaviour of quote-escaping. Then I thought… "hang on a sec…?" and tried an experiment.

Here's a baseline example showing the problem CFML causes for itself with quote escaping:

<cfscript>
numbers = queryNew("id,en,mi", "integer,varchar,varchar", [
    [1,"one","tahi"],
    [2,"two","rua"],
    [3,"three","toru"],
    [4,"four","wha"],
    [5,"five","rima"],
    [6,"six","ono"],
    [7,"seven","whitu"],
    [8,"eight","waru"],
    [9,"nine","iwa"],
    [10,"ten","tekau"]
])

sqlStatement = "
    SELECT *
    FROM numbers
    WHERE mi IN ('rua', 'toru', 'rima', 'whitu')
"
</cfscript>

<cftry>
    <cfquery name="primes" dbtype="query">#sqlStatement#</cfquery>
    <cfset dump =  primes>
    <cfcatch>
        <cfset dump = [cfcatch.message, cfcatch.detail, cfcatch.extendedinfo]>
    </cfcatch>
    <cffinally>
        <cfdump var="#dump#" label="no handling">
    </cffinally>
</cftry>

Here my SQL string has single quotes in it. It's supposed to have. They're legit. That's syntactically correct. But ColdFusion (and Lucee, but I'm showing this on ColdFusion) thinks it knows better, and messes it up:

It's "helpfully" escaped all the single quotes for me. Thus breaking the SQL statement. Cheers for that.

I can prevent this from happening by using preserveSingleQuotes:

<cfquery name="primes" dbtype="query">#preserveSingleQuotes(sqlStatement)#</cfquery>

Fine. And in reality, this could almost certainly be the way CFWheels handles the string building exercise. Build the string in a string builder method, then preserveSingleQuotes the whole lot.

But what's this thing about UDFs messing with stuff. This is the thing that kinda surprises me. Check this out:

<cfscript>
    function returnTheArgument(theArgument) {
        return theArgument
    }
</cfscript>

<!--- ... --->
<cfquery name="primes" dbtype="query">#returnTheArgument(sqlStatement)#</cfquery>
<!--- ... --->

This… works. My function doesn't do anything… but simply calling it on the SQL statement inline within the <cfquery> tags is enough to prevent the CFML engine from escaping any single quotes.

W … T.a. … F?

And this works on both CF and Lucee.


Anyway. That's "interesting". My conclusion as to how to solve the problem with the embedded preserveSingleQuotes thing is probably to just not embed it into the middle of things; just to wrap the whole statement with them. Which means all that nasty untestable logic can be lifted out of being inline there, and put in its own function that does the one thing. "Single Responsibility Principle" craziness, I know. But the road to this conclusion today was a winding one, that's for sure.

[mutters to self] I dunno what the CFML engines are thinking here. I really don't.

Righto.

--
Adam