Wednesday, 26 March 2014

Using a sort iteration function to implement a custom ranking

G'day:
It occurred to me last night that - in all my examples of using a callback-based sort on a collection - I am always just using the natural ordering of a field; eg I sort an array of structs on a key in the struct which needs either alphabetical or numeric sorting. EG:

numbers = [
    {maori="wha", digit=4},
    {maori="toru", digit=3},
    {maori="rua", digit=2},
    {maori="tahi", digit=1}
]

dump(numbers.sort(function(v1,v2){
    return sgn(v1.digit - v2.digit)
}))


This is selling the functionality short, somewhat.

Here's an example of using a callback sort to effect sorting on a custom sequence: not numeric or alphabetical, but any custom sequence one might want. I achieve this by using closure to create a custom sort callback, then using the callback in the sort operation:

function customRankings(required array options, required string field, boolean throwOnNoMatch=false){
    return function(v1,v2){
        var v1Place = options.findNoCase(v1[field])
        var v2Place = options.findNoCase(v2[field])

        if (v1Place && v2Place){
            return sgn(v1Place - v2Place)
        }
        if (throwOnNoMatch){
            throw(type="IllegalValueException", message="Incorrect value in sort field", detail="Sort field must contain only #options.toString()#")
        }

        if (v1Place){
            return -1
        }
        
        if (v2Place){
            return 1
        }

        if (isNumeric(v1[field]) && isNumeric(v2[field])) {
            return sgn(v1[field] - v2[field])
        }

        return compareNoCase(v1[field], v2[field])
    }
}

medalSorter = customRankings(["gold","silver","bronze"], "place")

raceResults = [
    {lane=1, name="Angus", place="silver"},
    {lane=2, name="Barbara", place="gold"},
    {lane=3, name="Colin", place="5th"},
    {lane=4, name="Ethel", place="4th"},
    {lane=5, name="Faisal", place="bronze"}
]

dump(raceResults.sort(medalSorter))

There are three sections to this:
  1. a function which returns a sort callback based on the rankings provided;
  2. creating a sorter for the given rankings;
  3. sorting data by the sorter.
Looking closer at the function:

function customRankings(required array options, required string field, boolean throwOnNoMatch=false){
    return function(v1,v2){
        var v1Place = options.findNoCase(v1[field])
        var v2Place = options.findNoCase(v2[field])

        if (v1Place && v2Place){
            return sgn(v1Place - v2Place)
        }
        if (throwOnNoMatch){
            throw(type="IllegalValueException", message="Incorrect value in sort field", detail="Sort field must contain only #options.toString()#")
        }

        if (v1Place){
            return -1
        }
        
        if (v2Place){
            return 1
        }

        if (isNumeric(v1[field]) && isNumeric(v2[field])) {
            return sgn(v1[field] - v2[field])
        }

        return compareNoCase(v1[field], v2[field])
    }
}

It takes up to three arguments:
  1. an array of options to sort on. The order in the array represents highest to lowest (eg: ["gold","silver","bronze"]).
  2. A field in each structure to use for sorting. This example only works on an array of structs, obviously.
  3. Optionally whether to allow any value in that field, or to throw an error if there's a different value.
 The logic then is fairly simple:
  • find the values in the array of options
  • if they're both found, sort on those two as per the normal sorting rules
  • if they're not both found, and if we're flagged to raise an error if this happens, do so
  • otherwise if either item contains one of the sort options, it is ranked higher than the other value
  • finally - so neither value is one of the options - either treat them numerically if possible, otherwise as strings, and sort on that.
Once we've done that, it's just a matter of calling that function to create a custom sorter, then using it.

We use closure in customRankings() to bind references to the options and field values when creating medalSorter(). Note that we only need to specify those when we call customRankings()... they're enclosed within medalSort() at that point, so medalSorter() "remembers" what the options and field were. Hence it's specifically a medalSorter() not simply some generic sorter. So it pretty much binds the input data and the code to process the data together in one. Also note that the function returned also takes two arguments: v1 and v2 which are what the callback passed to an array sort will receive.

Let's revisit the actual sort() call:

raceResults.sort(medalSorter)

sort() takes a function. Often we see the function being passed inline as a function expression, eg:

numbers.sort(function(v1,v2){
    return sgn(v2.digit - v1.digit)
})

But sort() doesn't take a function expression, per se, it just takes a function. A function expression returns a function, but we can use an already-existing function if we want. In fact we could also have done without creating the medalSorter() function at all, and simply do this:

raceResults.sort(customRankings(["gold","silver","bronze"], "place"))

Here the value passed to sort() is neither an existing function nor a function expression, but we call a function which returns a function. So sort() still receives the function it needs.

One thing that's not immediately clear with my example above is that customRankings() can be reused to provide more custom sorters. We can build on that code above:

raceResults = [
    {lane=1, name="Angus", place="silver"},
    {lane=2, name="Barbara", place="gold"},
    {lane=3, name="Colin", place="5th"},
    {lane=4, name="Ethel", place="4th"},
    {lane=5, name="Faisal", place="bronze"}
]
medalSorter = customRankings(["gold","silver","bronze"], "place")
dump(raceResults.sort(medalSorter))

rugbyRankings = [
    {country="Australia", ranking="3rd"},
    {country="England", ranking="4th"},
    {country="New Zealand", ranking="1st"},
    {country="South Africa", ranking="2nd"}
]
ordinalSorter = customRankings(["1st","2nd","3rd","4th"], "ranking")
dump(rugbyRankings.sort(ordinalSorter))

So here we're using customRankings() a second time to create another custom sorter. The code for customRankings() might be a bit long, but once it's there: using it is a simple one liner. Which is pretty cool, I reckon.

Oh, and the output for all this is:

Array
1
Struct
LANE
number2
NAME
stringBarbara
PLACE
stringgold
2
Struct
LANE
number1
NAME
stringAngus
PLACE
stringsilver
3
Struct
LANE
number5
NAME
stringFaisal
PLACE
stringbronze
4
Struct
LANE
number4
NAME
stringEthel
PLACE
string4th
5
Struct
LANE
number3
NAME
stringColin
PLACE
string5th


Array
1
Struct
COUNTRY
stringNew Zealand
RANKING
string1st
2
Struct
COUNTRY
stringSouth Africa
RANKING
string2nd
3
Struct
COUNTRY
stringAustralia
RANKING
string3rd
4
Struct
COUNTRY
stringEngland
RANKING
string4th

I hope this goes some way to better demonstrate how cool these iteration functions are. And - for a change - actually using closure with them too.

--
Adam