Monday, 16 December 2013

CFML: A rare mishit from Railo: function caching

G'day:
Railo are pretty much where it's at as far as enhancing CFML goes. Adobe - due to a criminally long dev cycle, and seemingly complete ignorance of web development trends & practices - just drag the chain as far as keeping CFML competitive with other web dev options goes; and I believe Adobe are singularly responsible for diluting the relevance of CFML as a language ("I've come to a conclusion"). Railo's increased confidence with leading CFML enhancement - rather than just following in Adobe's shadow - gives me faith that CFML isn't "dead" as far as being an option for new projects and new developer interest goes.

However Railo is not a panacea, and sometimes they balls things up. Here's a situation in which I think they have let themselves down.

A while back Railo added function-result-caching to CFML. This was - I think - due to a suggestion that bubbled out from the community. It's good to listen to the community, but I think this is an example where there's a difference between "hearing" and "listening". They should have heard what the community said, but they shouldn't necessarily have listened. The issue here - and obviously this is just my own opinion - is that the suggestion from the community was a bloody stupid one. But Railo implemented it.

Now I am not suggesting function-result-caching is not a desirable feature, but the way Railo chose to implement it was basically to take exactly what the community discussion suggested, without sanity-checking it or assessing how best a solution might be implemented in a programming language (rather than... for example... an application, or basically some implementation code), and to build a very situation-specific solution into CFML.


So what's all this function-result-caching carry on? Sometimes when running code in a given situation, we know the "question being asked" (read: the method being called) is generally going to have the same "answer" (read: "method result"). Take my workplace for example: we have a lot of info which is geographically-specific. So for a method which fetches some data about "Galway" - for example - odds-on the info is not going to change from one minute to the next. And if we have thousands of hits to the site from one minute to the next... there's simply no reason to re-query that information. It won't have changed: the previous answer will still be accurate. And, indeed, we can probably safely say that general information about Galway won't change that often. So if we only update the answer to the question once a day... that's fine. This is the theory behind having cachedwithin on <cfquery> and <cfstoredproc> tags.

Not everything is a database call though, and the same idea exists for any sort of code being run: it might take a while for the process to run, and the end result might reasonably be expected to be the same within a given period of time. This is fair enough. And deciding to deal with this in CFML was a laudable idea.

Anyway, to address this perceived method-result-caching requirement, Railo added the capability to a function definition to have a cachedwithin attribute:

function someFunction() cachedwithin=createTimespan(/* etc */) {
    // stuff that can take a wee while to run
}


Superficially, this seems like a good idea. However it kind of implements an anti-pattern: it couples how a function is intended to be used with the definition of the functionality. A function should be a "black box" as far as its calling environment goes: pass in the values it needs, and don't rely on it knowing that data in the calling code exists. Putting cachedwithin on the definition couples the function to its calling environment in much the same way: it assumes the calling environment doesn't care whether the result is actually current, and that a stale version could be OK. But this consideration is not the business of the function itself, it's the business of the environment in which it's being called.

If we were to compare the <cfquery cachedwithin> example I alluded to before, this approach seems legit: the <cfquery> tag has a cachedwithin attribute, and so does the function keyword. And I think this was how Railo were thinking when they did this. However it's not the same thing. A <cfquery> is not defining a reusable block of code, it's very much an inline "calling code" thing. So if one dictates a cachedwithin attribute in the <cfquery>, it's because as far as the calling code is concerned the result of this doesn't need to be fresh. A closer parallel to what Railo have implemented here would be <cfstoredproc>, if the cachedwithin value were part of the procedure code, not the <cfstoredproc> call. Because whilst <cfstoredproc> has a cachedwithin attribute, this - again - is very much a call-time thing, not a define-time thing. It's simply... wrong... to apply caching / staleness considerations to the definition of functionality. It should be coupled to the usage of functionality.

So what Railo ought to have done is this:

function someFunction() {
    // stuff that can take a wee while to run
}

result = someFunction() cachedwithin=createTimespan(/* etc */);

IE: the function definition doesn't dictate how it ought to be cached, but the call to the function - as per the calling code, which is what knows the kind of environment and requirement - dictates it. That'd be cool. That makes sense.

Fortunately... this is actually already really easy to do. There's already a memoize "design pattern" in which one uses a wrapper around a function call to "memoize" it, ie: cache the result of it.

// memoizeWithLifetime.cfm

// based on same-named function from Underscore.cfc (https://github.com/russplaysguitar/UnderscoreCF/)
public function function memoize(required function func, numeric cachedWithin=0, function hasher) {
    var memo = {};
    if (!structKeyExists(arguments, "hasher")) {
        arguments.hasher = function(hashArgs) {
            return hashArgs[1];
        };
    }
    return function() {
        var key = hasher(arguments);
        if (structKeyExists(memo, key)){
            if (dateDiff("s", memo[key].timestamp + cachedWithin, now()) < 0){
                return memo[key].result;
            }
            structDelete(memo, key);
        }
        memo[key] = {
            result = func(argumentCollection = arguments),
            timestamp = now()
        };
        return memo[key].result;
    };
}

This takes a function (be it inline or predefined), a timespan within which to cache the result, and an optional helper function to determine the uniqueness of the call (the hasher). It returns a cache-aware version of the passed-in function. Note that the default hasher only looks at the first argument, so a function taking more than one argument needs its own hasher. Provided the hasher correctly identifies varying argument combinations, the returned function will behave the same as the passed-in function, except that it caches the result for the specified period of time.

If one wants a version of the same function which has different caching criteria or duration: simply call memoize() again with a different cachedwithin argument, and/or hasher.
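To make that concrete, here's a minimal sketch. It assumes memoize() from memoizeWithLifetime.cfm is available; fetchCityInfo() and its arguments are hypothetical names made up for illustration. Each call to memoize() returns a separate wrapper with its own private cache, and the custom hasher builds the key from both arguments (the default one only considers the first).

```cfml
// sketch only: assumes memoizeWithLifetime.cfm defines memoize() as listed above
include "memoizeWithLifetime.cfm";

// hypothetical slow function, stands in for any expensive lookup
function fetchCityInfo(required string city, required string lang) {
    sleep(1000);
    return "Info for #city# in #lang#";
}

// hasher that keys the cache on both arguments, not just the first
bothArgsHasher = function(hashArgs) {
    return hashArgs.city & "|" & hashArgs.lang;
};

// same function, two caching policies: one for a day, one for ten seconds
cityInfoDaily = memoize(fetchCityInfo, createTimespan(1,0,0,0), bothArgsHasher);
cityInfoBrief = memoize(fetchCityInfo, createTimespan(0,0,0,10), bothArgsHasher);

// named arguments, so the hasher can refer to them by name
writeOutput(cityInfoDaily(city="Galway", lang="en") & "<br>");
writeOutput(cityInfoBrief(city="Galway", lang="ga") & "<br>");
```

The calling code picks whichever wrapper suits its tolerance for staleness; fetchCityInfo() itself knows nothing about caching.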

// memoizeDemoForBlog.cfm

include "memoizeWithLifetime.cfm";

// memoize the function
heavyLifting = memoize(
    function(required string label) {
        var msg = "Executed for #label# at: #ts()#<br>";
        sleep(1000);
        return msg;
    },
    createTimespan(0,0,0,5)
);


// demonstrate the memoisation
testCalls("Not Cached");
testCalls("Cached");
sleep(5000);
testCalls("Decached");


// helper functions
function testCalls(message){
    writeOutput("<h3>#message#</h3>");
    writeOutput("Called at: #ts()#<br>");
    writeOutput(heavyLifting(label="first"));
    sleep(1000);
    writeOutput("Called at: #ts()#<br>");
    writeOutput(heavyLifting(label="second"));
    sleep(1000);    
}
function ts(){
    return timeFormat(now(), "HH:MM:SS");
}

Output:

Not Cached

Called at: 08:53:45
Executed for first at: 08:53:45
Called at: 08:53:47
Executed for second at: 08:53:47

Cached

Called at: 08:53:49
Executed for first at: 08:53:45
Called at: 08:53:50
Executed for second at: 08:53:47

Decached

Called at: 08:53:56
Executed for first at: 08:53:56
Called at: 08:53:58
Executed for second at: 08:53:58


See how the cached versions of the function call reflect the result from the initial pass. However once we wait for the timeout to pass, the cache is flushed and the function is actually called afresh.

That seems like a lot of code, but the important bit is fairly minimal. Also note that calling this has nothing to do with "closure" or callbacks or anything highfalutin' like that. I've used an inline function expression there because I could, but memoize() just takes "a function". Any function. For example:

function someFunction(required string label) {
    var msg = "Executed for #label# at: #ts()#<br>";
    sleep(1000);
    return msg;
}


heavyLifting = memoize(
    someFunction,
    createTimespan(0,0,0,5)
);

This is how caching function results should be handled at language level (or, hey, just roll yer own. It's not exactly rocket science). So I question whether Railo's functionality was even required.

One other drawback to Railo's implementation really shocked me (in how shoddy the implementation is). Consider this function:

// complexArgs.cfm
function heavyLifting(required struct someArg) cachedwithin=createTimespan(0,0,0,5) {
    sleep(1000);
    return "Executed for #structKeyList(someArg)# at: #now()#<br>";
}

writeOutput(heavyLifting({key="value"}));

Output:

Railo 4.1.2.005 Error (application)
Message: only simple values are allowed as parameter for a function with cachedWithin
Stacktrace: The Error Occurred in
C:\Apps\railo-express-jre-win64\webapps\railo\www.scribble.local\shared\git\blogExamples\railo\cachedFunctions\complexArgs.cfm: line 8 
6: }
7: 
8: writeOutput(heavyLifting({key="value"}));
9: </cfscript>


Railo, are you bloody joking? If I use cachedwithin on a function definition, I can only use simple-value arguments? Doesn't this invalidate this functionality for almost all situations?

How embarrassing. Sorry to come down on you guys, but your standard of work is usually so high, I'm pretty gobsmacked by this.
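For contrast, the memoize() wrapper from earlier isn't stuck with that restriction, because the calling code decides how the cache key is built. A rough sketch, assuming memoizeWithLifetime.cfm as listed above: describeStruct() is a hypothetical stand-in for the function in complexArgs.cfm, and I'm assuming serializeJSON() produces the same string for equivalent structs, which isn't guaranteed if the keys come out in a different order.

```cfml
// sketch only: assumes memoizeWithLifetime.cfm defines memoize() as listed above
include "memoizeWithLifetime.cfm";

// hypothetical equivalent of heavyLifting() from complexArgs.cfm, minus cachedwithin
function describeStruct(required struct someArg) {
    sleep(1000);
    return "Executed for #structKeyList(someArg)# at: #now()#<br>";
}

heavyLifting = memoize(
    describeStruct,
    createTimespan(0,0,0,5),
    function(hashArgs) {
        // assumption: equivalent structs serialise to the same JSON string
        return hash(serializeJSON(hashArgs.someArg));
    }
);

// named argument, so the hasher can find the struct by name
writeOutput(heavyLifting(someArg={key="value"}));
writeOutput(heavyLifting(someArg={key="value"}));
```

If the serialisation assumption holds, the second call should come back from the cache rather than running describeStruct() again.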

I find it interesting to say this, but based on this and a coupla other things Railo have implemented at community request... I really think they need some sort of fail-safe apparatus in place so that stuff gets thought through a bit better before it's implemented. Micha seems to want to please everyone, but I really don't think "due diligence" is being performed sometimes as to whether a given community-floated feature actually has merit.

Equally I often see someone raise an issue on the forums, and Micha seems to have fixed it within a few minutes to a few hours. Which - whilst admirable - makes me wonder exactly what QA & UAT is going on here. There's simply not enough time in that timeframe to do the minimum amount of work required to actually release a feature, given there's more to a solution to an issue than simply writing the code.

In conclusion, this feature should never have gone into the language at all, and the way it's been implemented is just shoddy.

Perhaps Railo needs a "Language Council" or something that features need to go through before being given to the dev team to implement? And then a process to test that things have been done properly before they're released. It might slow stuff down a bit, but it would mean stuff like this doesn't pollute the language. Because it's in there now, so we're stuck with it.

:-|

--
Adam