Saturday 11 July 2015

CFML: Are collection iteration methods really significantly slower than looping statements?

G'day:
I've heard people say a coupla times that one should think twice before using CFML's new collection iteration methods (eg: .each(), .map() etc) should be treated with caution, because they are substantially slower than using looping statements (eg: a for() loop). Now I don't doubt that these iteration methods are doing more work that a loop statement, but I really was skeptical as to whether the difference really matters. I believe on should err towards writing the clearest code first, and worry about performance when it becomes something to worry about. And I not really think that the difference in performance here would be worth worry about, generally. And if there was a significant performance difference: it's more an indication that the language engineers have work to do, rather than simply dismissing a good coding tactic out of hand.



This came up again yesterday, and given I've never really tested it before, so I decided to give it a bash this morning whilst waiting to head for a dentist appointment.

Here's the code for my first test:

// compareWithNumericOperation.cfm

param name="URL.iterations" type="integer" default=0;

writeoutput("Running #URL.iterations# iterations<br><br>");

array = [];
for (i=1; i <= URL.iterations; i++){
    array.append(i);
}

include "numericOperationTagBasedTests.cfm";

sum=0;
start = getTickCount();
for (i=0; i <= URL.iterations; i++){
    sum = sum + i;
}
end=getTickCount();
writeOutput("for (general) took #end-start#ms<hr>");

sum=0;
start = getTickCount();
for (i in array){
    sum = sum + i;
}
end=getTickCount();
writeOutput("for (array) took #end-start#ms<hr>");

handler = function(i){
    sum = sum + i;
};

sum = 0;
start = getTickCount();
array.each(handler);
end=getTickCount();
writeOutput("each() with pre-defined handler took #end-start#ms<hr>");

sum = 0;
start = getTickCount();
array.each(function(i){
    sum = sum + i;
});
end=getTickCount();
writeOutput("each() with inline handler took #end-start#ms<hr>");

start = getTickCount();
array.reduce(function(sum, i){
    return sum + i;
}, 0);
end=getTickCount();
writeOutput("reduce() took #end-start#ms<hr>");


<!--- numericOperationTagBasedTests.cfm --->
<cfset sum = 0>
<cfset start = getTickCount()>
<cfloop index="i" from="1" to="#URL.iterations#">
    <cfset sum = sum + i>
</cfloop>
<cfset end = getTickCount()>
<cfoutput>cfloop (general) took #end-start#ms<hr></cfoutput>

<cfset sum = 0>
<cfset start = getTickCount()>
<cfloop index="i" array="#array#">
    <cfset sum = sum + i>
</cfloop>
<cfset end = getTickCount()>
<cfoutput>cfloop (array) took #end-start#ms<hr></cfoutput>


All I'm doing here is testing a few looping strategies to tally a sequence of numbers (for 1 to whatever one passes on the URL).

The tests I'm doing are:
  • indexed <cfloop>;
  • array <cfloop>;
  • indexed for() loop;
  • array for() loop;
  • .each() using a predefined callback;
  • .each() using an inline callback;
  • .reduce()

I have tried to keep the operation in the loop as simple as possible, so as to emphasise how long the loop itself takes.

Now I have said this before, but I'll say it again. Performance tests in which one just loops over some experimental code like a crack-addled banshee are not really much use to anyone. One needs to consider request concurrency, what other work a real-world application might be doing at the same time, and use actual code. The tests above are stupid. I also think that if one needs to loopreallyshitloadsoftimes to see a measurable/meaningful result, one already has one's answer: it doesn't matter.

Still... here goes.

Firstly, I'm running this on ColdFusion 11 (I'll get to Lucee later)... doing 100 iterations yields this to be a pointless exercise:

Running 100 iterations

cfloop (general) took 0ms


cfloop (array) took 0ms

for (general) took 1ms

for (array) took 0ms

each() with pre-defined handler took 2ms

each() with inline handler took 2ms

reduce() took 2ms


So it's the difference between 0-1ms and 2-4ms. So who reallly gives a shit?

But let's continue playing this silly game, and ramp it up to 1000 iterations to see if we get much divergence:

Running 1000 iterations

cfloop (general) took 0ms


cfloop (array) took 0ms

for (general) took 1ms

for (array) took 1ms

each() with pre-defined handler took 40ms

each() with inline handler took 39ms

reduce() took 41ms



OK, so the looping statements continue to not have any real processing impact. And now the iteration methods are taking some time to run. Good to know. The moral to the story so far is that if one needs to do like 1000 completely inconsequential loops, and one really has to have it all sorted out as fast as possible... use a for() loop. One still should not use a <cfloop> because one use <cfloop> in views, and I'd raise an eyebrow if you needed to do this sort of operation in a view.

On whim, I decided to make the operation slightly more complex, and use string operations (intrinsically slower than numeric operations) instead. My theory being that the looking overhead would disappear into the background noise. I'll not repeat all the code, but here's the central operation I'm performing in the loop:

string = string & repeatString(chr(64+i), 10);


I'm always cool with being wrong, and I was here. Here're the results:

Running 1000 iterations

cfloop (general) took 6ms


cfloop (array) took 6ms

for (general) took 6ms

for (array) took 5ms

each() with pre-defined handler took 50ms

each() with inline handler took 50ms

reduce() took 50ms



What I get from this is that there's a distinct looping overheard difference between statements and iteration methods, and this is more significant that the difference in a trivial operation and a slightly more complex process just pads this.

OK, so my conclusion on ColdFusion is that there's nothing to worry about here, for the sort of code one is likely to be running most of the time. But obviously bear in mind that in situations where speed is more important that your code being descriptive, then fall back to a for() loop. I think this will seldom be the case though. Just use the facilities CFML offers to make your code cleaner.

But what about Lucee. Here's the results of the 1000-iteration run on the numeric-based tests:

Running 1000 iterations

cfloop (general) took 2ms


cfloop (array) took 4ms

for (general) took 6ms

for (array) took 2ms

each() with pre-defined handler took 561ms

each() with inline handler took 587ms
reduce() took 534ms


That's a problem. Lucee is an order of magnitude slower than ColdFusion here. This is on Lucee 5, btw. I'll just run it on Lucee 4.5. Hold on...

Running 1000 iterations

cfloop (general) took 2ms


cfloop (array) took 4ms

for (general) took 4ms

for (array) took 5ms

each() with pre-defined handler took 4ms
each() with inline handler took 4ms
reduce() took 4ms


Phew. That's better. I guess there is an issue somewhere in Lucee 5 (which, remember, is still in beta). I'll flag this with their devs though (LDEV-427).

If I run this same test on Lucee 5 over 10000 iterations, it gets (proportionately) worse!

Running 10000 iterations

cfloop (general) took 5ms


cfloop (array) took 4ms

for (general) took 8ms

for (array) took 3ms
each() with pre-defined handler took 59233ms
each() with inline handler took 59723ms
reduce() took 61849ms


Something is very wrong there.

If we forget about Lucee 5, and focus on Lucee 4.5... this just goes to show that there really is nothing to worry about,performance-wise, here. Which is interesting as the person who has aired this concern most often is Igal, one of Lucee's own developers. I suspect he is - like a lot of devs tend to be - too hung-up on micro-optimisation, and not basing his conclusions on real-world situations or realistic degrees of concern.

One other odd thing about Lucee (both versions) is the emitted mark-up for those tests is about 4000 (!) lines long... mostly haemorrhaged whitespace. This is usually an issue with tag-based code, but it's a bit bloody weird to see it in script-based code. This is perhaps something else for them to look at (ie: WTF they are actually doing for all that whitespace to get in there... something is clearly not right). Now I'm not one to give a shit about wayward whitespace, but I think script emitting whitespace that isn't actually even there in the code is a cause for concern from a procedural perspective on their part (LDEV-428).

Anyway... don't worry about iteration performance on Lucee 4.5 or even ColdFusion 11. But let's hope they sort out their issues on Lucee 5.

Righto.

--
Adam