Sunday 12 October 2014

CFML: prime number generator UDF, and overhead of using inline function expressions

G'day:
I continued to play around with faux generators in CFML after I finished "CFML: emulating generators with closure" yesterday. Whilst I didn't think the world would benefit from there being a palindrome generator on CFLib.org, it might benefit slightly by having one that generated primes. CFLib.org already has a function which returns an array of primes up to a threshold (GetPrimes()), but that's a slightly different requirement than being able to generate them one at a time. And I wanted to fiddle about anyhow, so that became my mission.

The actual code is not mention-worthy in and of itself, but a coupla things I noticed along the way probably are.


Warning:

There are two logic errors in the code samples here, but I don't have enough time @ the moment to fix it. The general discussion here is still sound, and the code works, but it could be improved. As the error is in all examples, it does not impact any of the conclusions I draw here. I'll update the code and post a new article once I have time.

Without thinking, my first take at writing the UDF used Railo-only syntax: no semicolons, and used the Array.every() method. Which I forgot ColdFusion hasn't implemented yet:

function createPrimeNumberSequence(){
    var primes = []
    var potential = 1
    return function(){
        while(true) {
            var upperThresholdToCheck = sqr(++potential)
            var potentialIsPrime = primes.every(function(prime){
                return potential mod prime != 0 || potential <= upperThresholdToCheck
            });
            if (potentialIsPrime) {
                primes.append(potential)
                return potential
            }
        }
    }
}

This works fine, but I cannot submit it to CFLib, as Ray only wants to support ColdFusion there (I disagree with this, but he's the boss, so be it). So I knocked together a more verbose version, but one that would run on ColdFusion 11 (and 10 as well, I think):

function createPrimeNumberSequence(){
    var primes = [];
    var potential = 1;
    return function(){
        while(true) {
            var upperThresholdToCheck = sqr(++potential);
            var potentialIsPrime = true;
            for (var prime in primes){
                if (potential mod prime == 0) {
                    potentialIsPrime = false;
                    break;
                }
                if (prime > upperThresholdToCheck){
                    break;
                }
            }
            if (potentialIsPrime) {
                primes.append(potential);
                return potential;
            }
        }
    };
}

This uses a for(value in array) loop rather than Array.every(). It's not too bad.

But...

... I went to check the performance of both, and was horrified at the results, which were along these lines:

Original using every() and inline handler

Execution time: 45723ms


Using for() loop

Execution time: 10ms



This was over 1000 iteration. OK. I kinda expected the every() version to be a bit slower, but not to this degree. Which - I think we can agree - is no in the realms of "inconsequential" like a lot of performance tests seem to be.

But then it was time to get on with other things in my Saturday afternoon, so I decided a) I had my blog article topic for Sunday sorted out; b) I need to put some metrics in to work out why the difference was so huge.

I had an epiphany this morning (I awoke with the thought "oh f***... you dick", which is an annoying way to wake up on a Sunday), and decided it was likely to be that I'm redeclaring the callback for the every() call every iteration, which was not very smart (and completely unnecessary), so I did another refactoring:

function createPrimeNumberSequence(){
    var primes = [];
    var potential = 1;
    var upperThresholdToCheck = false;

    var everyHandler = function(prime){
        return potential mod prime != 0 || potential <= upperThresholdToCheck;
    }

    return function(){
        while(true) {
            upperThresholdToCheck = sqr(++potential);
            var potentialIsPrime = primes.every(everyHandler);
            if (potentialIsPrime) {
                primes.append(potential);
                return potential;
            }
        }
    };
}

Notice how I've moved the declaration of the every() callback out of the loop. This means the handler only gets created once, and just gets called multiple times. So how does this change things?


Original using every() and inline handler

Execution time: 45723ms


Using for() loop

Execution time: 10ms


Original using every() and predefined handler

Execution time: 768ms



768ms is much better, but still a way over just using the for() loop.

One last thing I noticed is the exit conditions for the for() version are separate, whereas with the every() versions they're combined. This is clearly now in the realms of micro-optimisation, but it is doing unnecessary work, so I thought to refactor that to be more like-for-like too:

function createPrimeNumberSequence4(){
    var primes = [];
    var potential = 1;
    var upperThresholdToCheck = false;

    var everyHandler = function(prime){
        if (potential mod prime != 0) return true;
        if (potential <= upperThresholdToCheck) return true;
        return false;
    }

    return function(){
        while(true) {
            upperThresholdToCheck = sqr(++potential);
            var potentialIsPrime = primes.every(everyHandler);
            if (potentialIsPrime) {
                primes.append(potential);
                return potential;
            }
        }
    };
}

Now we have this:


Original using every() and inline handler

Execution time: 45723ms


Using for() loop

Execution time: 10ms


Original using every() and predefined handler

Execution time: 768ms


Original using every() and predefined handler and separate exit conditions

Execution time: 655ms



So there's not much in it. And, TBH, sometimes the latter two are reversed, so I'd say there's no real difference that rises above the background noise of general processing.

My conclusion here is that I'd say these callback based iteration functions are the way one should go in low-scale sort of situations, but where there's a lot of processing going on, the fact they're slower than just a general loop does begin to matter.

I'm gonna submit that second version to CFLib (and I still have the keys to the kingdom over there, so I'm going to approve it too ;-), and will link back here once I'm done (done: createPrimeNumberSequence()).

Righto.

--
Adam