Wednesday 1 October 2014

Nano-optimsation of looping code. Fascinating but pointless

G'day:
It's PHP, Railo and ColdFusion today. A few days ago I was researching my "PHP: looking at some interesting "unexpected" behaviour with references" article which related to a StackOverflow question: "Assign by reference bug". One statement in one of the answers fascinated me:
(using reverse looping mostly because not enough people remember that's a thing you can do, and is more efficient than a forward loop =)
Interesting. And sounded like something I could get my teeth into and have a look at.

I didn't immediately get why this would be the case, but did some Googling and found a bunch of interesting articles / Stack Overflow questions/answers, and the reasoning made sense.

Consider a generic for loop:

for(initialisationStatement; conditionalExpression; iterationExpression)

EG:

for (i=1; i <= 10; i++)

  • The initialisationStatement is executed once at the beginning of the code block.
  • The conditionalExpression is evaluated at the beginning of every iteration. When it's false, processing skips to after the end of the block.
  • The iterationExpression is executed at the end of each iteration.
  • All three are optional.
  • The conditionalExpression needs to evaluate to a boolean value.

Let's consider one of these loops, over an array:

for ($i=0; $i < count($array); $i++)


  • $i=0 is run once
  • count($array) is run ten times
  • and its result is compared to $i ten times
  • $i is incremented ten times

Loosely speaking there's 31 operations there (some are weightier than others, I know).

Now consider this:

for ($size=count($array)-1; $size-- >= 0;)
  • $size is initialised once
  • count($array) is called once
  • $size-- is compared to a static value ten times

12 operations. With the weightiest one - count() - now being run once instead of ten times.

Oh, and yeah, the boolean check and the decrement could be considered two operations, but you get my point: there's still fewer operations. Also remember that as a boolean expression, it's lighter weight than the previous example as it's not comparing two variables.

This can be "improved" even further. Consider what happens when we do i--:
  • The current value of i is stored
  • i is decremented
  • The previously stored value is returned
If we used --i instead, it's this:
  • i is decremented
  • and returned

More relevant-ish reading on quirks of these sort of operators: No, the ++ operators are not thread safe.

If you have any sense, you'll be rolling your eyes about all this as the sort of optimisations we're talking about here are shrinkingly inconsequential, and require somewhat unorthodox code. So is it worth it?

Let's see. Here's some tests in CFML:

// baseline.cfm

URL.size = URL.size ?: 1

function getSize(size){
    return size
}

timeJob = function(message, job) {
    var start = getTickCount()
    job()
    var end = getTickCount()
    var duration = end - start
    echo("#message#: #duration#ms<br>")
}

echo("<h3>Running #size# iterations</h3>");

timeJob("Baseline (ascending)", function(){
    for (var i=1; i <= getSize(URL.size); i++){
        var b = i
    }
})

timeJob("Using variable for length (ascending)", function(){
    var iterations = getSize(URL.size)
    for (var i=1; i <= iterations; i++){
        var b = i
    }
})

timeJob("Post decrement (descending)", function(){
    for (var i=getSize(URL.size)+1; i--;){ // (n-1)-0
        var b = i
    }
})

timeJob("Pre decrement (descending)", function(){
    for (var i=getSize(URL.size)+1; --i;){
        var b = i
    }
})

timeJob("Pre decrement with additional operation (descending)", function(){
    for (var i=getSize(URL.size)+1; --i;){
        var b = i
        var c = i
    }
})

Notes:
  • There's a ColdFusion 11 version of this file here: baselineColdFusion.cfm.
  • I'm not using an array here, but a faux function call which gets the number of iterations to run. The reason being that the number of iterations I need to do to amplify the results until they were distinct was killing my PHP install (out of memory). Pleasingly, Railo coped OK, but I wanted a like-for-like test, so just ran with code which would work on both platforms
  • I have a function which times a job.
  • And a bunch of calls to it. All the jobs simply loop the number of times requested, doing a small amount of work.
  • Note that one of the loops iterates (n-1) to 0 instead of n to 1; the important thing in these loops is the number of iterations, not the index value.
  • Each job in turn refines the looping strategy, inline with my discussion above.
  • The last job has twice as much work to do within each iteration.
Here's the equivalent PHP code:

// baseline.php

$size = isset($_GET["size"]) ? $_GET["size"] : 1;

function getSize($size){
    return $size;
}

$timeJob = function($message, $job) {
    $start = microtime(true);
    $job();
    $end = microtime(true);
    $duration = (int)(($end - $start) * 1000);
    echo sprintf("%s: %sms<br>", $message, $duration);
};

echo "<h3>Running $size iterations</h3>";

$timeJob("Baseline (ascending)", function() use ($size){
    for ($i=1; $i <= getSize($size); $i++){
        $b = $i;
    }
});

$timeJob("Using variable for length (ascending)", function() use ($size){
    $iterations = getSize($size);
    for ($i=1; $i <= $iterations; $i++){
        $b = $i;
    }
});

$timeJob("Post decrement (descending)", function() use ($size){
    for ($i=getSize($size); $i--;){ // (n-1)-0
        $b = $i;
    }
});

$timeJob("Pre decrement (descending)", function() use ($size){
    for ($i=getSize($size)+1; --$i;){
        $b = $i;
    }
});

$timeJob("Pre decrement with additional operation (descending)", function() use ($size){
    for ($i=getSize($size)+1; --$i;){
        $b = $i;
        $c = $i;
    }
});

OK, so how does this lot fare? I had to run 100000 iterations to get results I could see, so the TL;DR of all this is "it's simply not worth worrying about". But there is a real difference. Check this out:

Railo:

Running 100000 iterations

Baseline (ascending): 120ms
Using variable for length (ascending): 51ms
Post decrement (descending): 57ms
Pre decrement (descending): 51ms
Pre decrement with additional operation (descending): 46ms


ColdFusion:

Running 100000 iterations

Baseline (ascending): 207ms
Using variable for length (ascending): 9ms
Post decrement (descending): 3ms
Pre decrement (descending): 3ms
Pre decrement with additional operation (descending): 3ms


PHP:

Running 100000 iterations

Baseline (ascending): 72ms
Using variable for length (ascending): 29ms
Post decrement (descending): 7ms
Pre decrement (descending): 7ms
Pre decrement with additional operation (descending): 8ms


All these results are a fair "standard" result, when checking about 20-30 runs for each platform.

Interestingly... ColdFusion is fastest. Also interestingly: Railo is overall a lot slower (comparatively). Well: Railo seems better at calling functions (the top result): but for basic looping operations, it's quite a bit slower.

But... come on... a few (or few tens) of milliseconds over 100000 iterations? Who the hell cares?

I included the last test in there - the one with the doubling-up of the operations - as I wondered whether any looping improvement would be lost in the noise of what the looped code was actually doing. In this example it was not really the case. Everything just ran fast. Obviously though, if the loop had heavier duty operations in it like DB hits or a bunch of function calls... a better place to make gains would be too optimise that code. It's simply not worth optimising the looping construct.

What I would attempt to do is to make the looping code clear in its intent. I'd generally never use one of these general purpose loops to - for example - loop over an array. If I was wanting to do something with each element, I'd call Array.each() (perhaps array_walk() in PHP); if I was wanting to update each element of an array, I'd call a map() function; if I wanted to derive a value from an array, I'd call reduce(). etc. Write code that best describes what you're doing. Don't piss about writing non-descriptive code because you think it might be a bit faster. Because - almost always - that would not matter. And I'd not even use a generic solution (for(value in array) in CFML or foreach($array as $key=>$value) in PHP); I'd use the functionality that best describes the operation.

Interesting stuff, but also a lesson in "what not to bother with", I reckon. It's kinda nice to be aware of the differences of the "inner" workings of things though, eh?

Righto.

--
Adam