Monday, 12 August 2013

"evalulate() is really slow". Is it now?

G'day:
I'm always the first person to screw up my nose when I see someone using evaluate() in any CFML written for versions of CF after CF5. But generally because it's an unnecessary function call as there's almost always native syntactical constructs which will do the job without dynamically evaluating a string. And also in the back of my mind there's that perennial notion that it's really slow. I'd not investigated this since I was using CFMX7, and never very rigorously. So I decided to have a look at it.

Firstly, there's a few common areas I see evaluate() being used. I think these were recommended approaches back in the pre-CFMX days, but they've been unnecessary since CF was revamped to run on Java.

Dynamic variable names

This is the most common one. It often stems from a form that has an iteration of form fields with numeric suffixes on the name attribute value, to kinda emulate an array. EG:

<input name="teamMember1">
<input name="teamMember3">
<input name="teamMember3">

And on the receiving end of the form post, we have this:

<cfloop index="i" from="1" to="3">
    <cfoutput>#evaluate("teamMember#i#")#<br></cfoutput>
</cfloop> 

This is shonky from the outset and the person has done themselves no favours by not scoping their variables. And had they scoped their variables, it might have occurred to them that evaluate() is simply not necessary here, and one can write much nicer code, thus:

form["teamMember#i#"]

Associative array notation will allow any variable to be accessed via a string containing its name, and obviously a string can be composed at runtime to contain dynamic values.

Working around the shortcomings of valueList()

valueList() is a bit of a rubbish function in its implementation as it is required to take a column reference, in the form query.columnName. This is the only place in CFML that this is the case, so - from the outset - alarm bells should have gone off with the implementors that this isn't a great approach. Furthermore, that is the only way of referencing a column that is permissible in valueList(). One cannot use query["columnName"] (which is also a reference to the entire column), nor any other sort of runtime expression. This causes problems if one is not in a position to hard-code the column name like that, for example if the column names are unknown at code-time, or indeed if even the query name is unknown at runtime. The latter is less often the case, granted. I have come across it though.

Firstly, the dynamic column name:

evaluate("valueList(query.#columnName#)");


That's what one has to do if one is using valueList(). But one does not need to use valueList() to effect this:

arrayToList(query[columnName]);

If one wants to emulate quotedValueList(); use listQualify() after arrayToList().

Secondly... what if the query name is a dynamic concern. One might think one can employ the previous tactic, of referencing the dynamically-named query thus:

valueList(variables[dynamicName].column);

This fails with:

Complex constructs are not supported with function valueList.

Use simple variable instead.

Which is a bit crap: it simply shouldn't matter how an expression is implemented, provided it is syntactically correct and returns the correct data type for the function taking it as an argument, it should work. What's particularly crap here is that this is a parser error (so done at compile time), not a runtime one, so CF has a built-in specific rule for valueList() to fail with expressions of this syntax. How about the compiler just lets the completely valid code through, and see if the value actually represents the desired data? Anyway,  I have digressed a bit.

Traditionally the fallback here is to do this:

evaluate("valueList(#dynamicName#.column)");

This is not necessary, because one can avoid using evaluate() by simply using a known variable:

knownVariable = variables[dynamicName];
valueList(knownVariable.column));

Still no need for evaluate().

Dynamically referencing scopes

This is a less common one, but I have seen this in the past. Sometimes the runtime part of a variable reference might be the scope, rather than the variable within the scope. I've seen this being done in the past (ages ago), but can't recall the situation. However here's a quick contrived example... a lightweight scope cache:

component {

    function init(scope){
        variables.scope = arguments.scope;
        "#variables.scope#.scopeCache" = {};

        return this;
    }

    function put(key, value){
        "#variables.scope#.scopeCache.#key#" = value;
    }

    function get(key){
        return evaluate("#variables.scope#.scopeCache.#key#");
    }

}

This could be used like this:

cache = new Cache("application");
cache.put(key="foo", value="bar"); // now cached in application scope
moo = cache.get(key="foo");

Output:

struct
applicationnamescopeCache022
scopecache
struct
foobar

So within the cache, we can create the variables without resorting to evaluate(), but fishing them back out again, we resort to using evaluate(). For example we can't do this:

theScope = "application";
theVariable = "foo";
moo = [theScope][theVariable];

But we can kinda do that. We can create a reference to scope easily enough, and then we can use a dynamic reference to that reference. Here's the non-evaluate() version of that code:

component {

    function init(scope){
        variables.scopeReferences = {
            request = request,
            session    = session,
            application = application
        };

        variables.scope = arguments.scope;

        variables.scopeReferences[variables.scope].scopeCache = {};

        return this;
    }

    function put(key, value){
        variables.scopeReferences[variables.scope].scopeCache[key] = value;
    }

    function get(key){
        return variables.scopeReferences[variables.scope].scopeCache[key];
    }

}

Here we set up some variables-scoped references to actual scopes (yeah, it breaks encapsulation... this is just example code), which means we can then use associative-array notation to reference the scopes dynamically at runtime.

Dynamically referencing object methods

Another situation in which one might think to use evaluate() in the past was to call a dynamically-named method on a CFC object, eg:

theMethodName = "theMethod";
theAnswer = someObject[theMethodName](someArg="someValue");


Railo supports that syntax, and bloody ColdFusion should too (people have lobbied for it, deaf ears have ensued from Adobe, thusfar). We've always had a tag-based way of effecting this, in <cfinvoke>:

<cfinvoke component="#someObject#" method="#theMethodName#" somearg="someValue" returnvariable="theAnswer">

However a fat lot of use that is in a CFScript block, which is where one's object-method-calling code generally ought to be.

ColdFusion 10 added a completely inelegant offering in lieu of the bracket-based syntax work as it should, in invoke():

theAnswer = invoke(theObject, theMethodName, {someArg="someValue"});

However prior to that, one might use evaluate():

theAnswer = evaluate('someObject.#theMethodName#(someArg="someValue")');

One can still get away without using evaluate() here, with a bit of messing around:

theObject.___someStaticMethodName = theObject[theMethodName]; // the ___ is just cos we want something unlikely to overwrite another method already in the object
theAnswer = theObject.___someStaticMethodName(someArg="someValue");

That's a bit more code that with evaluate(), but I still think it's a bit more aesthetically pleasing that parsing & executing CFML at runtime... because bear in mind that at runtime, the natural order of things is that we're running Java bytecode, not CFML. So it goes a bit against the grain to be running CFML as well. Still: that's mostly just aesthetics.

Performance

I think the reason most people acting avoid using evaluate() is because it runs so damned slowly. But... um... here's the thing. It doesn't. It runs marginally slower than most of the non-evaluate() approaches (but not all of them!), but we're talking on a scale of 10ths / 100ths of milliseconds difference per operation. IE: on my machine 10000 iterations of my test rig for comparing dynamic variable evaluation compared to using associative array syntax, the difference was about 100ms. so 0.01ms (10 microseconds) per iteration difference. Another way of looking at it - if you really like wringing your hands over performance - is the evaluate() version is twice as slow the associative array notation, but at the scale we're talking about it simply doesn't matter.

Most of the operations I tested were about the same sort of comparison, except surprisingly the approach of injecting a statically-named reference to a method representing a dynamically named one was actually "significantly" slower than using evaluate(). I have no idea why, but there you go.

Testing Performance

My test rig for this was to do the usual approach of "loop over things a lot to 'amplify' the time an operation takes" whilst performing the test operation. I used 10000 iterations.

As well as that looping, I also used JMeter to run that code using five test threads each running five separate users to run the test code. So each test round was 25 sets of 10000 iterations, running as simultaneously as JMeter, my computer, and my CF install would allow. I was not timing the requests themselves, I was only timing the code within the requests, so there was no warm-up or tear-down overhead in my measurements.

I did two comparison tests: one round using one of the evaluate() techniques listed above, and the other using the non-evaluate() technique.

I can't stress enough that if - to test the performance of some code - you need to do this "loop over it heaps to see how long a lot of iterations take" approach, because the performance differences between the control and the test code is neglible when only doing a single operation, then just stop testing. Save yerself some time, cos the answer is: "it doesn't matter". Don't contrive an artificial situation in which you can see a difference. If you can't see the difference in a normal situation, then there is no difference worth worrying about. Doing an actual load test is different, because code can behave differently when it's run more than once at a time. Load testing for performance is essential.

Code

Here's the code I used.

Dynamic variable names

Firstly using evaluate():

tally=0;
start=getTickCount();
for (i=1; i <= URL.iterations; i++){
    tally +=evaluate("application.variable#i#");
}
writeLog(file="#application.applicationname#_evaluate", text="Elapsed: #getTickCount()-start#ms");

Then using associative array notation:

tally=0;
start=getTickCount();
for (i=1; i <= URL.iterations; i++){
    tally += application["variable#i#"];
}
writeLog(file="#application.applicationname#_array", text="Elapsed: #getTickCount()-start#ms");


There's no surprises here.

valueList() with dynamic values

Using evaluate():

coloursInMaori      = makeQuery("colour", ["whero","karaka","kowhai","kakariki","kikorangi","tawatawa","mawhero"]);
coloursInEnglish    = makeQuery("colour", ["red","orange","yellow","green","blue","indigo","violet"]);

start=getTickCount();
for (i=1; i <= URL.iterations; i++){
    languageToUse = "English";
    rainbow = evaluate("valueList(coloursIn#languageToUse#.colour)");

    dynamicColumn = "id";
    englishColumnData = evaluate("valueList(coloursInEnglish.#dynamicColumn#)");
}
writeLog(file="#application.applicationname#_queryEvaluate", text="Elapsed: #getTickCount()-start#ms");

And avoiding it:
coloursInMaori      = makeQuery("colour", ["whero","karaka","kowhai","kakariki","kikorangi","tawatawa","mawhero"]);
coloursInEnglish    = makeQuery("colour", ["red","orange","yellow","green","blue","indigo","violet"]);

start=getTickCount();
for (i=1; i <= URL.iterations; i++){
    languageToUse = "English";
    queryToUse = variables["coloursIn#languageToUse#"];
    rainbow = valueList(queryToUse.colour);

    dynamicColumn = "id";
    englishColumnData = arrayToList(coloursInEnglish[dynamicColumn]);
}
writeLog(file="#application.applicationname#_queryReference", text="Elapsed: #getTickCount()-start#ms");

NB: makeQuery() is just a UDF which makes a query with cols ID and [as specified in the function call], with a row for each element in the array.

Dynamicially referencing scopes

This lot is a bit more complicated. We've got an Application.cfc which sets some things into the application scope (so as to not need to do so each test), a coupla implementations of that scope cache thing I mentioned above, and the two test scripts.

// Application.cfc
component {

    this.name                = "scopeCache021";
    this.sessionManagement    = true;

    function onRequestStart(){
        if (!structKeyExists(application, "scopeCaches")){
            application.scopeCaches = {
                evaluate    = new ScopeCacheEvaluate("application"),
                reference    = new ScopeCacheReference("application")
            };
        }
    }
}

I need to put that code in onRequestStart() because the reference-based cache CFC (see below) refers to the session scope, which doesn't exist yet in onApplicationStart().

// ScopeCacheEvaluate.cfc
component {

    function init(scope){
        variables.scope = arguments.scope;
        "#variables.scope#.scopeCache" = {};
        
        return this;
    }
    
    function put(key, value){
        "#variables.scope#.scopeCache.#key#" = value;
    }
    
    function get(key){
        return evaluate("#variables.scope#.scopeCache.#key#");
    }

}

// ScopeCacheReference.cfc
component {

    function init(scope){
        variables.scopeReferences = {
            request = request,
            session    = session,
            application = application
        };
        
        variables.scope = arguments.scope;

        variables.scopeReferences[variables.scope].scopeCache = {};
        
        return this;
    }
    
    function put(key, value){
        variables.scopeReferences[variables.scope].scopeCache[key] = value;
    }
    
    function get(key){
        return variables.scopeReferences[variables.scope].scopeCache[key];
    }

}

// usingEvaluate.cfm
for (i=1; i <= URL.iterations; i++){
    application.scopeCaches.evaluate.put(key="variables#i#", value="value #i#");
}
start=getTickCount();
for (i=1; i <= URL.iterations; i++){
    application.scopeCaches.evaluate.get(key="variables#i#");
}
writeLog(file="#application.applicationname#_evaluateCache", text="Elapsed: #getTickCount()-start#ms");

// usingScopeReferences.cfm
for (i=1; i <= URL.iterations; i++){
    application.scopeCaches.reference.put(key="variables#i#", value="value #i#");
}
start=getTickCount();
for (i=1; i <= URL.iterations; i++){
    application.scopeCaches.reference.get(key="variables#i#");
}
writeLog(file="#application.applicationname#_referenceCache", text="Elapsed: #getTickCount()-start#ms");

Object method tests

This time we have a Numbers.cfc which we are calling methods from, plus three tests: using evaluate(), using method reference injection, as well as a "control" which just uses <cfinvoke>.

// Numbers.cfc
component {

    variables.maori        = ["tahi", "rua", "toru", "wha"];
    variables.english    = ["one", "two", "three", "four"];

    public function inMaori(number){
        return variables.maori[number];
    }

    public function inEnglish(number){
        return variables.english[number];
    }

}

<!--- usingEvaluate.cfm --->
<cfparam name="URL.iterations" type="integer">
<cfset start=getTickCount()>
<cfloop index="i" from="1" to="#URL.iterations#">
    <cfset languageToUse = "Maori">
    <cfset evaluate("application.numbers.in#languageToUse#(number=1)")>

    <cfset languageToUse = "English">
    <cfset evaluate("application.numbers.in#languageToUse#(number=2)")>
</cfloop>
<cflog file="#application.applicationname#_evaluate" text="Elapsed: #getTickCount()-start#ms">

<!--- usingReferenceInjection.cfm --->
<cfparam name="URL.iterations" type="integer">
<cfset start=getTickCount()>
<cfloop index="i" from="1" to="#URL.iterations#">
    <cfset languageToUse = "Maori">
    <cfset application.numbers.__staticallyNamedMethod = application.numbers["in#languageToUse#"]>
    <cfset application.numbers.__staticallyNamedMethod(number=1)>

    <cfset languageToUse = "English">
    <cfset application.numbers.__staticallyNamedMethod = application.numbers["in#languageToUse#"]>
    <cfset application.numbers.__staticallyNamedMethod(number=2)>
</cfloop>
<cflog file="#application.applicationname#_reference" text="Elapsed: #getTickCount()-start#ms">

<!--- usingCfinvoke.cfm--->
<cfparam name="URL.iterations" type="integer">
<cfset start=getTickCount()>
<cfloop index="i" from="1" to="#URL.iterations#">
    <cfset languageToUse = "Maori">
    <cfinvoke component="#application.numbers#" method="in#languageToUse#" number="1">

    <cfset languageToUse = "English">
    <cfinvoke component="#application.numbers#" method="in#languageToUse#" number="2">
</cfloop>
<cflog file="#application.applicationname#_cfinvoke" text="Elapsed: #getTickCount()-start#ms">

I should do another test using invoke(), I guess, just to see how that fares.


That's it for the code. That's really just there in case anyone spots where my tests are unbalanced. I've tried to make each test analogous to the other test(s) it's being measured against.

Summary of results

Here's a sample test run's logged output:

"Severity","ThreadID","Date","Time","Application","Message"
"Information","catalina-exec-8","08/08/13","20:26:45",,"C:\apps\adobe\ColdFusion\10\cfusion\logs\evaluatePerformance010_array.log initialized"
"Information","catalina-exec-8","08/08/13","20:26:45","EVALUATEPERFORMANCE010","Elapsed: 98ms"
"Information","catalina-exec-13","08/08/13","20:26:45","EVALUATEPERFORMANCE010","Elapsed: 41ms"
[and continued in the same vein for 75 records]


And a summary of each of the tests...

Dynamic variable names



Stat\Testevaluate()Array
Min3221
25%3422
Mean3622
75%3923
Max19998

Here the Max values are definite outliers, as everything else is grouped around 30-odd or 20-odd milliseconds. On the face of it that's a dramatic "improvement" using associative array notation, but bear in mind that's over 10000 iterations. So an insignificant "real world" difference. If one is processing 10000 iterations of something on a regular basis, there are going to be more relevant things to be investigating than this difference.

valueList()

This test both dynamic query names and dynamic columns name.


Stat\Testevaluate()Array
Min6545
25%6847
Mean7048
75%72.550
Max10958

This is about the same difference. The values are higher across the board, I guess, because there were two tests being performed in this test rig, rather than the single one previously.

Dynamic scope referencing

These are the tests using those pared down "scope cached". We're using CFC method calls here, so that seems to blow out performance in general.

Stat\Testevaluate()Reference
Min12077
25%13988
Mean176107
75%280175.5
Max536438

But it's still a similar sort of performance comparison. Still bear in mind that whilst 176ms vs 107ms sounds like a lot compared to the other results so far, bear in mind that that equates to 0.0069ms per iteration difference. So: nothing.

Dynamic object methods

This time around we have three columns: evaluate(), inserting object references, and using <cfinvoke>.

Update:

I've redone the test and the results table to include invoke().
Stat\Testevaluate()Reference<cfinvoke>invoke()
Min1211719383
25%141181.2510290
Mean16024210898
75%232.5334.25126.25132
Max475731245216

This is the interesting result: using method-reference injection is slower than using evaluate(). I was surprised by this, so ran the tests repeatedly to see if there was a difference, but the results were generally the same. I'm not on the same machine currently as I did those tests, so I cannot add another column for invoke() as well, but I'll do that this evening (update: done... invoke is fractionally faster than <cfinvoke>, it seems. But inconsequentially so). Still: I don't think the difference is worth worrying about, but it's interesting that creating a function reference and inserting it into an object has any sort of overhead at all. And <cfinvoke> is most performant, but not really so much ahead of evaluate() in this case. I guess the overhead of calling methods on objects overwhelms the difference between doing if with/without evaluate().

Conclusion

I think using evaluate() is always a poor choice syntactically, because there's almost always an approach that uses more elegant code that has the benefit of being compiled before it's run. And I think even the more verbose "work-arounds" like with the scope cache or injecting methods into an object are tidier code.

But one consideration I will not be dwelling on from now on is that I will not give any thought to performance considerations. Because they just aren't there. The performance consideration did used to be there on older versions of ColdFusion. I wonder how long it's not been the case though? Certainly by ColdFusion 9, which I used to test at work yesterday, before gathering my more comprehensive results on CF10 last night. I've yet to try Railo.

You?

Have you got any other use cases in which you're using evaluate(), or have seen it in use? Or do you have your own opinions on this? Sing out!

Cheers.

--
Adam