Monday 2 November 2015

ColdFusion 2016: ordered/sorted structs

G'day:
One interesting new feature of ColdFusion 2016 is the ability to affect key ordering in structs, in two different ways. (This, btw, is something that has been exposed in the public domain, and I have made sure I am not breaking any NDA in writing this up).

Traditional structs do not have a sense of ordering in their keys. This is by design: a struct is an unordered collection. When iterating over a struct by key, obviously the keys will be iterated in a specific order, but there is no guarantee what that ordering will be as the specification does not... well... specify one, so it's down to the implementation. ColdFusion might expose the keys in alphabetical order; Lucee might do it in order of creation; some other CFML implementation might do it some other way. If one wants to control the order of how keys are accessed, one needs to extract the keys and sort them, then iterate over that. EG:

categoryTallies = {
    "gold" = 2,
    "premium" = 5309,
    "1st" = 37,
    "second" = 68113,
    "general" = 491
}; 

categoryTallies.each(function(category,tally){
    writeOutput("Category: #category#; tally: #tally#<br>");
});

writeOutput("<hr>");

sorted = categoryTallies.sort(function(category1, category2){
    return sgn(category1 - category2); 
}).each(function(category,tally){
    writeOutput("Category: #category#; tally: #categoryTallies[category]#<br>");
});

Looking at the first part of this output in ColdFusion gives this:

Category: gold; tally: 2
Category: 1st; tally: 37
Category: premium; tally: 5309
Category: second; tally: 68113
Category: general; tally: 491


And pleasingly Lucee returns a different order, demonstrating my point:

Category: second; tally: 68113
Category: premium; tally: 5309
Category: general; tally: 491
Category: gold; tally: 2
Category: 1st; tally: 37


So ColdFusion's struct implementation returns the keys in the order they were added; Lucee... dunno. I dunno what order that it. But that's fine, I don't need to know because it's not relevant.

Looking further into the code, I get myself an array of keys sorted by tally, and then use those as the basis for an iteration outputing the same info, but now sorted:

Category: gold; tally: 2
Category: 1st; tally: 37
Category: general; tally: 491
Category: premium; tally: 5309
Category: second; tally: 68113


BTW, I was initially confused as to why that code only worked on ColdFusion, and not Lucee. I had not realised that structSort() (and therefore Struct.sort()) doesn't work the way its array equivalent does, and doesn't take a callback for the comparator. That's a bit rubbish that the implementation wasn't uniform across all the collection iteration methods. However it seems that in ColdFusion 2016, it's now been implemented. I guess this is part of this new struct implementation.



So now ColdFusion has added in two new types of struct: ordered and sorted. Firstly I'll get ordered structs out of the way as they're not very interesting. An ordered struct preserves the order in which its keys were added, eg:

normal = structNew();
normal.first = 1;
normal.second = 2;
normal.third = 3;
normal.fourth = 4;
normal.fifth = 5;    

normal.each(function(k,v){
    writeOutput("#k#: #v#<br>");
});
writeOutput("<hr>");

ordered = structNew("ordered");
ordered.first = 1;
ordered.second = 2;
ordered.third = 3;
ordered.fourth = 4;
ordered.fifth = 5;    

ordered.each(function(k,v){
    writeOutput("#k#: #v#<br>");
});


The first example is a baseline, and demonstrates that a normal CFML struct does not preserve the order that its keys were set:

THIRD: 3
SECOND: 2
FIFTH: 5
FIRST: 1
FOURTH: 4


The second part of the listing shows that using an ordered struct that the key order is preserved:


FIRST: 1
SECOND: 2
THIRD: 3
FOURTH: 4
FIFTH: 5


I dunno how useful that is, but there's precedent in other languages to have ordered collections, so fair enough.

There is no literal syntax for this construct yet - ie: the equivalent of {key=value} for normal structs. This might be in the pipeline.

Sorted structs are slightly more interesting. A baseline example is:

sorted = structNew("sorted");
sorted.azure = "blue";
sorted.adze = "tool";
sorted.adam = "dork";
sorted.alabama = 3;

writeDump(sorted);

And this outputs:


The default order is alphabetical on key.

There's an alternative to sort it descending:

sorted = structNew("sorted", "desc");

Output:


(the explicit value for sorting ascending is "asc", btw)

Not interesting.

One can also supply a callback which is used as a sorting comparator:

sorted = structNew("sorted", function(e1,e2){
    return e1.len() - e2.len();
});

Now this should sort the struct's keys based on key length. And it kinda does:


Yikes! I've lost one of my keys. I revised my code to just do this:

sorted = structNew("sorted", function(e1,e2){
    return compare(e1,e2);
});

And that did return all the keys (and in alpha order, as expected), so I guess there's a glitch when the comparator function returns zero (eg in my first example where the key lengths are the same). I will raise a bug for this with Adobe (but it'll be on the pre-release bugbase, so I cannot post a link to it here, sorry).

Furthermore, if you pay attention to the comparator function you'll perhaps spot something less than ideal... here's what's passed into the callback:


Just the key name. This is kinda useless. It's entirely reasonable for one to want to order the struct by the values of the elements too. Indeed I think it's really only useful to sort on values, rather than keys (because other than alphabetical... what order is one really going to want, if the key is all that's sortable?)

Another thing that I'm left to wonder is is there any real point in having both ordered and sorted structs? ordering and sorting in this context is pretty much the same thing, and really insertion-order, alphabetical-order-ascending and alphabetical-order descending are all just ways of ordering things. So why do we need two variations? All we need is three method signatures:

orderedByInsertion = structNew("ordered");
orderedByAlpha = structNew("ordered", "asc"); // or "desc"
orderedByComparator = structNew("ordered, function);


We don't need two separate implementations here. I reckon they should get rid of one.

The last thing I will observe is that this tests shows something interesting:


param URL.iterations=1;

timeIt("Populate standard struct", function(){
    standardStruct = structNew();
    for (var i=1; i <= URL.iterations; i++){
        standardStruct[i] = i;
    }
    writeOutput("Added #standardStruct.count()# keys to standardStruct<br>");
});

timeIt("Populate sorted struct", function(){
    sortedStruct = structNew("sorted", function(e1, e2){
        return sgn(e1 - e2);
    });
    for (var i=1; i <= URL.iterations; i++){
        sortedStruct[i] = i;
    }
    writeOutput("Added #sortedStruct.count()# keys to sortedStruct<br>");
});

timeIt("Get keys from standard struct", function(){
    var keys = standardStruct.keyArray();
    writeOutput("#keys.len()# keys extracted from standardStruct<br>");
});

timeIt("Get keys from sorted struct", function(){
    var keys = sortedStruct.keyArray();
    writeOutput("#keys.len()# keys extracted from sortedStruct<br>");
});


function timeIt(label, task){
    writeOutput("<h3>#label#</h3>");
    var startTime = getTickCount();
    task();
    var endTime = getTickCount();
    var executionTime = endTime - startTime;
    writeOutput("Execution time: #executionTime#ms<hr>");
};

This code times how long it takes to populate a normal struct, and a sorted struct. And if we run it over a lot of iterations, we see this:


Populate standard struct

Added 10000 keys to standardStruct
Execution time: 17ms



Populate sorted struct

Added 10000 keys to sortedStruct
Execution time: 223ms


Get keys from standard struct

10000 keys extracted from standardStruct
Execution time: 3ms

Get keys from sorted struct

10000 keys extracted from sortedStruct
Execution time: 20ms


It seems to me that the sorting is being done as the struct is populated. This strikes me as the wrong time to do it, and incurs a lot of extra work. It doesn't matter what order the keys are in until one iterates over them. It's only when one comes to use adjacent keys that it's important to have them in the prescribed order. Whereas sorting them on every key insertion is n-1 times more work than is necessary (where n is the number of keys in the struct). I think the sorting should be deferred until it comes time to iterate over the struct. That said, it seems to run pretty quickly, so there's no real performance overhead. I think this, though, is why only the key is exposed to the comparator function: if the value was also considered, sorting would also need to be done if a value is changed. But, still, only when it comes time to iterate over said struct.

So I'm unconvinced by this feature. I think as it stands, it's not very useful, and the implementation is lacking. But let's bear in mind that we're only in the alpha, so things can still be improved.

Righto.

--
Adam