Friday, 29 March 2013

Unexpectedly performance differences between listFind() and arrayFind() in ColdFusion

G'day:
This just cropped up whilst I was doing some research for work (yeah... on Good Friday. Working remotely is being a right pain, so I'm well behind on my stretch for the week, so playing catch-up today and tomorrow).

The main website I work on is a multi-lingual one, represented in 11 different languages. On the whole all language sites are treated equally, but in some situations there are exceptions (I'm wary about discussing work stuff on my blog, so I'm going to stay vague here).


We have a Language object which has methods like isGermanicLanguage() (that's completely fictitious: we do not have a method called that, it's just an example), and we call that to determine whether - in a given situation - we need to react differently, eg:

if (Language.isGermanicLanguage(currentLanguage)){
    // do this
} else {
    // do that
}

To know whether a language is germanic or not, we maintain a list within Language.cfc which contains the germanic languages, eg:

component {

    variables.germanicLanguages = "12,345,6,78"; // these are languageIds

    public boolean function isGermanicLanguage(required numeric languageId){
        return listFind(variables.germanicLanguages);
    }

}

Simple stuff.

Anyway, at the moment I'm scoping out some work that will be changing the way the exception lists are maintained, adding some more, and generally revisiting this code.

I detest lists. They just seem like a really crappy way of creating a sort of faux compound data structure. So when doing this I thought... hmmm... maybe we should use arrays there instead? I mean there's not much in it, but if I could not use a list, that must be a good thing.

Tangentially a thought popped into my mind... hmmm... I wonder how an arrayFind() will perform compared to a listFind()? Note: performance is not a real world consideration here because these lists are all very short, so either a list look-up or an array look-up is going to take an inconsequential amount of time, and there are better ways to improve performance than looking in places like this. So this is not a consideration for the work I'm doing, but I wanted to know the answer.

I also suspected that the performance of a listFind() would drop off greatly as the lists got longer, and that the arrayFind() would probably have fairly static performance. Just cos string handling is slow.

So I concocted some test code which did a find in lists / arrays of varying lengths, and looked at the results.

param name="URL.size" type="integer" default=0;

a = [];
l = "";
for (i=1; i <= URL.size; i++){
    element = createUuid();
    arrayAppend(a, element);
    l = listAppend(l, element);
}

which = randRange(1, URL.size);
value = a[which];

startTime = getTickCount();
listResult = listFind(l, value);
listTime = getTickCount() - startTime;

startTime = getTickCount();
arrayResult = arrayFind(a, value);
arrayTime = getTickCount() - startTime;

writeDump({
    which    = which,
    value    = value,
    results    = {
        list    = listResult,
        array    = arrayResult
    },
    times    = {
        list    = listTime,
        array    = arrayTime
    }
});

This makes a list & array of a specified size, with each element being a UUID. I then pick a random element to do a find on, and then go find it from each of the list and then the array. And report on the results.

For 10000 elements, I get this sort of thing:

struct
RESULTS
struct
ARRAY4926
LIST4926
TIMES
struct
ARRAY134
LIST1
VALUE0B4D2FA0-215D-AD32-CE2AAFA8220BF7A3
WHICH4926

For larger or smaller lists, the order of magnitude of the difference was about equivalent.

I was rather surprised to see that arrayFind() was such a poor performer (comparatively). I altered the code to use the native Vector indexOf() method by way of comparison, adding this in, and outputting it in the dump:

startTime = getTickCount();
vectorResult = a.indexOf(value) + 1;
vectorTime = getTickCount() - startTime;

And I get this:

struct
RESULTS
struct
ARRAY4600
LIST4600
VECTOR4600
TIMES
struct
ARRAY119
LIST2
VECTOR1
VALUE0BC2C005-215D-AD32-CE05E2C74336366F
WHICH4600

Given a ColdFusion Array is a subclass of Vector, I expected the arrayFind() operation to be roughly the same as the indexOf() operation. But seemingly not.

So that's odd. I can't help but think that arrayFind() is doing something slightly wrong in its inner workings.

The bottom line is that for lists of non-stupid lengths, there's not really a consideration here at all: looking through 100 elements takes single-milliseconds for the array and less-than-one milliseconds for the list and Vector options, and one would be a loony to worry about that.

But that was today's mostly pointless investigation from me.



OH: I ran all this on Railo, and everything ran orders of magnitude faster. On the 10k-element test, all three options ran in 0-1ms every time I ran them. Also even as I ramped the list/array sizes up, there was no real difference in performance between the two. So good work Railo guys there! They're definitely doing something more cleverly than their Adobe compadres are.

And lastly... building the list and array took way longer than doing the find operation in them. But that's most likely the time it takes to generate a UUID.

--
Adam