Sunday 11 January 2015

CFML: design brain-fart in Application-specific DSN definitions on ColdFusion

G'day:
This article is just some analysis (and opinion, unsurprisingly) on the situation described in this bug ticket: "THIS.datasources changes ignored until CF restart".

In ColdFusion 11 (and Railo 4) one can specify data source definitions in Application.cfc, for example:

// Application.cfc
component {
    this.name = "myApp";

    this.datasources = {
        myDsn    = {
            database    = "dbName",
            host        = "localhost",
            port        = "3306",
            driver        = "MySQL5",
            username    = "dbUser",
            password    = "dbPassword"
        }
    };
    this.datasource    = "myDsn";
}

(That's for ColdFusion 11. Railo's syntax differs slightly: there's an example of it further down).



In the example above I define only one data source in this.datasources, but one can specify as many as one wants.

The issue Aaron raised in the bug ticket is that ColdFusion only pays attention to that setting on application start-up. It is otherwise ignored. And, believe it or not: this is by design. First things first: that's frickin' shithouse design, Adobe.

The rationale is that there is overhead in defining a data source (they've actually said this), so to avoid that overhead, they just ignore that setting after app start-up.

My question then is: what the hell is it doing in the Application.cfc pseudo-constructor then? That code is supposed to be run every request. Not "every request, except... err... this line here". That's f***ing stupid (I dunno why I'm self-censoring my usage of the word "f***" today, sorry).

"ignoring code" is not how to deal with this, Adobe. Here are some ways of dealing with this:

  • test the null hypothesis (which I will do shortly). Is there actually a performance issue here? I suspect there isn't.
  • Let us decide if the performance hit matters. It might not, in which case there is no problem. You've wasted development time by basically prematurely optimising this.
  • If we decide there's a performance issue to deal with, facilitate us dealing with it. It's reasonable to conclude that DSNs are an application-specific thing generally, so allow us to shift that code into onApplicationStart(). That's where application-lifetime-specific code should go. Adobe: the hint is in the name, guys.
  • If you've got a performance issue in your code: make your code faster. Don't code around it.
  • Document what you've done.

It beggars belief that the language "design" people at Adobe decided that "oh, yeah that line of code is slow... let's just not run it. But don't make it seem any different that any other line of code there, so people can't infer that's what will happen. And  for f***'s sake don't tell anyone". This is CFML; not Fight Club.

Code.

So here's some code that demonstrates this in action. Firstly I've set up two DBs, imaginatively called "db_dsn1" and "db_dsn2". They each have a user "db_user1" (and "db_user2"). The users have the same password. I have put data in a table - records - which is DB specific (you'll see my fantastic use of imagination for the data values shortly).

My application uses the one data source (called "dsn"), but I can optionally set that to use either/or of those DBs, via URL params. This is very contrived, I know, but it explicitly demonstrates the situation.

Here's Application.cfc:

// Application.cfc
component {

    param URL.dsnId = 1;
    param URL.appId = 1;
    param URL.logId = 1;

    variables.dsnHandler = variables[structKeyExists(server, "railo") ? "getDsnDetailsRailo":"getDsnDetailsCF"];

    request.startTicks = getTickCount();
    logIt("Pseudo-constructor start of request");

    this.name = "needsAppRestart_#URL.appId#";

    logIt("Pseudo-constructor before datasources set");
    this.datasources = {
        dsn    = variables.dsnHandler(URL.dsnId)
    };
    this.datasource    = "dsn";
    logIt("Pseudo-constructor after datasources set");
    logIt("========================================");



    function getDsnDetailsCF(id){
        logIt("#getFunctionCalledName()#() called");
        return {
            database    = "db_dsn#id#",
            host        = "localhost",
            port        = "3306",
            driver        = "MySQL5",
            username    = "user_dsn#id#",
            password    = "user_dsn#id#"
        };
    }

    function getDsnDetailsRailo(id){
        logIt("#getFunctionCalledName()#() called");
        return {
            class            = "org.gjt.mm.mysql.Driver",
            connectionString= "jdbc:mysql://localhost:3306/db_dsn#id#?useUnicode=true&characterEncoding=UTF-8&useLegacyDatetimeCode=true",
            username        = "user_dsn#id#",
            password        = "user_dsn#id#"
        };
    }

    function logIt(message){
        var requestDuration = getTickCount() - request.startTicks;
        writeLog(file="needsAppRestart_#URL.logId#", text="(#URL.appId#) #requestDuration#ms #message# (#URL.dsnId#)");
    }

}

There's a bit of monkeying around so that I can run this on both ColdFusion and Railo. And also some metrics-gathering code. But the important bits are around setting which DSN to use.

Firstly I hit the site using ?dsnid=1&appid=10&logid=10. The key bit is that I'm using the first database here.

And the output is:

query
RESULTSET
query
IDRECORD
11record 1 from db 1
22record 2 from db 1
CACHEDfalse
EXECUTIONTIME41
SQLSELECT * FROM records


Fine, as is expected. Now I switch to using the second DB (?dsnid=2&appid=10&logid=10), and the output is...

query
RESULTSET
query
IDRECORD
11record 1 from db 1
22record 2 from db 1
CACHEDfalse
EXECUTIONTIME21
SQLSELECT * FROM records

We're still hitting the first DB. It's not until I change application names (thus running the app-init code again) that I see the DSN change (?dsnid=2&appid=12&logid=10):

query
RESULTSET
query
IDRECORD
11record 1 from db 2
22record 2 from db 2
CACHEDfalse
EXECUTIONTIME30
SQLSELECT * FROM records

Contrast, btw, the same exercise using Railo:

?dsnid=1&appid=10&logid=10

Query
Template: queryExecute.cfm
Execution Time: 1.42 ms
Record Count: 2
Cached: No
Lazy: No
SQL:
SELECT * FROM records 
idrecord
11record 1 from db 1
22record 2 from db 1


?dsnid=2&appid=10&logid=10

Query
Template: queryExecute.cfm
Execution Time: 0.509 ms
Record Count: 2
Cached: No
Lazy: No
SQL:
SELECT * FROM records 
idrecord
11record 1 from db 2
22record 2 from db 2

So the data source is re-evaluated every request, as one would expect.

Let's have a look at those metrics I captured on ColdFusion:

(10) 0ms Pseudo-constructor start of request (1)
(10) 15ms Pseudo-constructor before datasources set (1)
(10) 15ms dsnHandler() called (1)
(10) 15ms Pseudo-constructor after datasources set (1)
(10) 15ms ======================================== (1)
(10) 0ms Pseudo-constructor start of request (2)
(10) 0ms Pseudo-constructor before datasources set (2)
(10) 0ms dsnHandler() called (2)
(10) 0ms Pseudo-constructor after datasources set (2)
(10) 0ms ======================================== (2)
(12) 0ms Pseudo-constructor start of request (2)
(12) 0ms Pseudo-constructor before datasources set (2)
(12) 0ms dsnHandler() called (2)
(12) 0ms Pseudo-constructor after datasources set (2)

(12) 0ms ======================================== (2)

One thing this demonstrates is that the entire line of code is not being skipped (Adobe have actually done this in the past... added an exception into the runtime to ignore some lines of code!). But the result is not being applied.

I ran a whole bunch of tests, as they execution times seemed to basically be subsumed in the background processing noise. But this is representative of the situation: sometimes the requests had a blip when loading the data source... sometimes they did not. Sometimes the second request was slower than the ones in which the data sources needed to load. Sometimes they were all just zeros. There was nothing in it.

Railo was much the same: more variation from test to test than there was request to request on the same test round.

Conclusions

  • Code should never be ignored by way of optimisation.
  • I'm not sure there's a need for optimisation here anyhow.
  • If something is ideally only run on application start-up, the code belongs in onApplicationStart().
  • If one finds one's self writing an exception into the language (eg: "run these lines, but not this specific line"), you're almost certainly not doing it right.
  • The current behaviour on ColdFusion is a design bug, and needs to be fixed. I would not consider it high priority though.

--
Adam