Saturday 28 September 2013

arrayEach() could stand some improvement

G'day:
I'm going to lobby to get Saturday renamed "Sidetrackday". I sat down to write up a bunch of stuff on Ruby, and so far have written some CFML, some JS, some PHP and some Ruby... but not the Ruby I was intending to do. Oh well.

I've touched in the past on my opinion that ColdFusion 10's new arrayEach() function is rather poorly/superficially implemented (here too), but today decided to check out how other languages deal with the same thing, and try to conclude what's the best approach for CFML to take.

So what's the story with arrayEach()?

Digression:

Those docs, btw, are frickin' useless:
  • Where's the code example?
  • Where's mention of the arguments passed to the callback? How many? What are they?
  • See also "other closure functions". Should that be a hyperlink, or is it just meant to be a pointless exercise of stating the obvious?
  • Category "Closure functions". Again: no link. And there isn't a "Closure Functions" category anywhere else in the docs.
  • The function argument doesn't need to be inline. It just needs to be a function reference.
There's more wrong with those docs than there is right. That's quite an achievement.

End of digression.

So in lieu of the docs explaining this, what does it do? Here's a baseline demonstration:

rainbow = ["Whero","Karaka","Kowhai","Kakariki","Kikorangi","Tawatawa","Mawhero"];

arrayEach(
    rainbow,
    function(v){
        writeOutput(v & "<br>");
    }
);

This outputs:

Whero
Karaka
Kowhai
Kakariki
Kikorangi
Tawatawa
Mawhero


Basically what arrayEach() does is loop over the passed-in array and pass the value of each element to the callback function. The callback function receives one argument: that value. That's all it receives. It can return a value if it likes, but nothing is expecting the value, so for all intents and purposes, the callback's method signature is:

void function callback(Any value)

Superficially that's fine. However it's a very isolated-element-centric approach. What I mean is that as far as the callback goes, the value it receives - for all intents and purposes - is not an element of an array, it's simply a value. You cannot act on that value in its context in the array. What? Well, let's say I want to transform the array's elements. A simple example would be to upper-case the elements of an array of strings. Maybe like this:

rainbow = ["Whero","Karaka","Kowhai","Kakariki","Kikorangi","Tawatawa","Mawhero"];

arrayEach(
    rainbow,
    function(v){
        return ucase(v);
    }
);

writeOutput(rainbow.toString());

Well... no. Remember how I said that nothing is listening for a return value? Well it's not. So you're returning your upper-cased element back into the ether. v is a disconnected value here: the callback doesn't get it as part of the array, it just gets it as a value.
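To make the problem concrete, here's a rough JavaScript model of the contract described above. `cf10ArrayEach()` is a hypothetical stand-in written purely for illustration, not anything ColdFusion ships: it hands the callback only the element's value and throws away whatever the callback returns.

```javascript
// Hypothetical model of ColdFusion 10's arrayEach() contract:
// the callback receives the value only, and its return value is ignored.
function cf10ArrayEach(array, callback) {
  for (let i = 0; i < array.length; i++) {
    callback(array[i]); // return value discarded: nothing is listening
  }
}

const rainbow = ["Whero", "Karaka", "Kowhai", "Kakariki", "Kikorangi", "Tawatawa", "Mawhero"];

cf10ArrayEach(rainbow, function (v) {
  return v.toUpperCase(); // off into the ether
});

console.log(rainbow.join(", ")); // still mixed-case: the array is untouched
```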



Because complex objects are passed by reference value, one can leverage this to alter values in the array in place:

rainbow = [{colour="Whero"},{colour="Karaka"},{colour="Kowhai"},{colour="Kakariki"},{colour="Kikorangi"},{colour="Tawatawa"},{colour="Mawhero"}];

arrayeach(
    rainbow,
    function(v){
        v.colour = ucase(v.colour);
    }
);
writeDump(rainbow);

This outputs:

array
1: struct {COLOUR: WHERO}
2: struct {COLOUR: KARAKA}
3: struct {COLOUR: KOWHAI}
4: struct {COLOUR: KAKARIKI}
5: struct {COLOUR: KIKORANGI}
6: struct {COLOUR: TAWATAWA}
7: struct {COLOUR: MAWHERO}

This is not a "feature" of arrayEach(), though. It's just a feature of structs being passed by reference values, not by actual value. So the ref passed to the callback refers to the same struct as the ref in the array does.
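JavaScript draws the same distinction. The short sketch below makes the reference-versus-value point explicit: mutating the object the parameter refers to shows up in the array, but reassigning the parameter itself does not.

```javascript
// Objects are passed as a copy of the reference: the array slot and the
// callback's parameter both point at the same object.
const rainbow = [{ colour: "Whero" }, { colour: "Karaka" }];

rainbow.forEach(function (v) {
  v.colour = v.colour.toUpperCase(); // mutates the shared object: visible via the array
});

rainbow.forEach(function (v) {
  v = { colour: "replaced" }; // rebinds the local parameter only: the array is unaffected
});

console.log(JSON.stringify(rainbow)); // [{"colour":"WHERO"},{"colour":"KARAKA"}]
```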

To modify the array's values from within the arrayEach() callback, one needs to "break encapsulation" and reference the calling code directly:

rainbow = ["Whero","Karaka","Kowhai","Kakariki","Kikorangi","Tawatawa","Mawhero"];
arrayeach(
    rainbow,
    function(v){
        param i=0;
        rainbow[++i] = ucase(v);
    }
);
writeOutput(rainbow.toString() & "<br>");
writeOutput(i); 

Result:
[WHERO, KARAKA, KOWHAI, KAKARIKI, KIKORANGI, TAWATAWA, MAWHERO]
7

But it's a bit grim that we need to break out into the calling code with the direct references to rainbow and i (NB: i is not VARed, and this technique would not work if it was).

So hopefully you can see the shortcomings here. I think it's fairly common to want to update the array as one loops over it. It's not the only thing one wants to do with the construct, but it's a fairly common use case that CFML has kinda closed the door on.

So how do other languages effect the same thing?

JavaScript

There are two relevant options in JavaScript we can look at (there are more ways to skin this particular cat, I know, but these are the most relevant / analogous to CFML's situation).

forEach()

JavaScript's forEach() is a direct analogue of the CFML version, except that its callback was better thought out. The callback's signature is:

function(value, index, array)

So in JS, one implements pretty much the same code as the CFML example immediately above, but one doesn't need to break out into the calling code as JS provides all the relevant information right to the callback itself:

rainbow = ["Whero","Karaka","Kowhai","Kakariki","Kikorangi","Tawatawa","Mawhero"];

rainbow.forEach(
    function(v,i,a){
        a[i] = v.toUpperCase();
    }
);
console.log(rainbow);

Yields the familiar:

["WHERO", "KARAKA", "KOWHAI", "KAKARIKI", "KIKORANGI", "TAWATAWA", "MAWHERO"]

So that's quite nice and all-encapsulated. As this is very close to how CFML has implemented this functionality, I think CFML's arrayEach() should be fixed to also pass in these same arguments. It should be an easy fix, too.
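Here's a rough JavaScript sketch of what that fixed arrayEach() might look like. The `arrayEach()` below is a hypothetical helper, not ColdFusion's implementation, and its indexes are zero-based as in JS. A CFML version would presumably use 1-based indexes.

```javascript
// Hypothetical sketch of the proposed arrayEach(): the callback receives
// the value, its index and the array itself, mirroring JavaScript's forEach().
function arrayEach(array, callback) {
  for (let i = 0; i < array.length; i++) {
    callback(array[i], i, array);
  }
}

const rainbow = ["Whero", "Karaka", "Kowhai"];

// With the index and the array to hand, the callback can update the array
// without reaching out to variables in the calling code.
arrayEach(rainbow, function (v, i, a) {
  a[i] = v.toUpperCase();
});

console.log(rainbow.join(", ")); // WHERO, KARAKA, KOWHAI
```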

map()

JavaScript also provides what is increasingly the industry-standard way of transforming collections: map().

rainbow = ["Whero","Karaka","Kowhai","Kakariki","Kikorangi","Tawatawa","Mawhero"];

RAINBOW = rainbow.map(
    function(v,i,a){
        return v.toUpperCase();
    }
);
console.log(RAINBOW);

This outputs the same-ole.

Note that the callback takes the same three arguments: value, index and array, but this time returns a value. This value is then used to populate a new array sequentially.
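Because map() builds its result from the callback's return values, the source array is left untouched. That is the main behavioural difference from the forEach() approach above, and it's easy to confirm:

```javascript
const rainbow = ["Whero", "Karaka", "Kowhai"];

// map() builds and returns a brand new array from the callback's return values...
const upperRainbow = rainbow.map(function (v) {
  return v.toUpperCase();
});

// ...leaving the original array as it was.
console.log(rainbow.join(", "));       // Whero, Karaka, Kowhai
console.log(upperRainbow.join(", "));  // WHERO, KARAKA, KOWHAI
console.log(rainbow === upperRainbow); // false
```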

One thing to note here is that JavaScript has the gumption to understand that arrays can be sparse - have index positions with no elements in them - and the callback is not called in that situation:

rainbow = [];
rainbow[1] = "Whero"; // yeah yeah: JS arrays start at zero. Bite me.
// no [2]
rainbow[3] = "Kowhai";
// no [4]
rainbow[5] = "Kikorangi";
// no [6]
rainbow[7] = "Mawhero";

callCounter = 0;
rainbow.forEach(
    function(v,i,a){
        callCounter++;
        console.log("Called for [" + i + "]: " + v);
    }
);
console.log("callback was called: " + callCounter + " times");

This outputs:
Called for [1]: Whero
Called for [3]: Kowhai
Called for [5]: Kikorangi
Called for [7]: Mawhero
callback was called: 4 times


ColdFusion, as is typical, really doesn't deal with sparse arrays properly. Here's the CFML equivalent (using arrayEach(), as CFML has no map() function, but it's analogous):
rainbow = [];
rainbow[1] = "Whero";
// no [2]
rainbow[3] = "Kowhai";
// no [4]
rainbow[5] = "Kikorangi";
// no [6]
rainbow[7] = "Mawhero";

callCounter = 0;
arrayEach(
    rainbow,
    function(v){
        callCounter++;
        if (structKeyExists(arguments, "v")){
            writeOutput(v & "<br>");
        }
    }
);
writeOutput("callback was called: " & callCounter & " times");

Output:
Whero
Kowhai
Kikorangi
Mawhero
callback was called: 7 times


Note that if I didn't have the structKeyExists() check in there, the CFML would have errored, because CF calls the callback for the missing elements and passes a null as the v argument value. I can't help but think CF is missing a trick here, and it should behave like JavaScript does.
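For illustration, here's a sketch of an each-style function that skips holes the way JavaScript's forEach() does. `sparseAwareEach()` is a hypothetical name. The key is the `in` operator, which is false for a hole but true for a slot that was explicitly set to undefined. That distinction is roughly what the CFML version blurs by handing the callback a null.

```javascript
// Hypothetical sketch: only invoke the callback for indexes that actually
// hold an element, which is what Array.prototype.forEach() does.
function sparseAwareEach(array, callback) {
  for (let i = 0; i < array.length; i++) {
    if (i in array) { // false for holes
      callback(array[i], i, array);
    }
  }
}

const rainbow = [];
rainbow[1] = "Whero";
rainbow[3] = "Kowhai";
rainbow[5] = "Kikorangi";
rainbow[7] = "Mawhero";

let callCounter = 0;
const visited = [];
sparseAwareEach(rainbow, function (v, i) {
  callCounter++;
  visited.push(i);
  console.log("Called for [" + i + "]: " + v);
});
console.log("callback was called: " + callCounter + " times"); // 4 times
```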

Ruby

Next I had a look at Ruby, which offers both the each() and map() approaches too:

each()

each() behaves the same as we'd expect:

rainbow = ["Whero","Karaka","Kowhai","Kakariki","Kikorangi","Tawatawa","Mawhero"]

rainbow.each do |v|
    v.replace(v.upcase)
end

puts rainbow

(I'll spare you the output from now on...)

The thing to note here is that whilst only the value is passed into the block, we can use replace() to replace the contents of the string the variable refers to. This means the reference stays the same, but the value the reference points to is changed. Reassigning the variable, by contrast, would break the connection between the value in the array and its corresponding value in the callback.

Ruby also has a separate method, each_index(). We didn't need the index here, so I didn't bother with that, but here's an example using it:

rainbow.each_index do |i|
    rainbow[i] = rainbow[i].upcase
end

Here we only get the index of the array, so we break out and force the array element at that index to have an upper-cased version of itself.

Both of these approaches would make people in the Ruby community frown, I think.

map()

The preferred approach to changing an array in Ruby is to use map():

newRainbow = rainbow.map do |v|
    v.upcase
end

Remember that in Ruby all statements are expressions, and the last expression in a block is returned from said block. So I don't need a return statement to return v.upcase: it's implicit.

This returns a new array. If for some reason one wanted to alter the array in place, use map!() instead:

rainbow.map! do |v|
    v.upcase
end

Here the map! expression doesn't get assigned to a new array, it updates rainbow itself.
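JavaScript has no built-in equivalent of map!(). The hypothetical `mapInPlace()` helper below approximates the idea: it overwrites each element with the callback's return value and hands back the same array object, much as Ruby's map! returns the receiver.

```javascript
// Hypothetical helper approximating Ruby's map!: each element is replaced
// with the callback's return value, and the same array object is returned.
function mapInPlace(array, callback) {
  for (let i = 0; i < array.length; i++) {
    array[i] = callback(array[i], i, array);
  }
  return array;
}

const rainbow = ["Whero", "Karaka", "Kowhai"];
const result = mapInPlace(rainbow, (v) => v.toUpperCase());

console.log(rainbow.join(", ")); // WHERO, KARAKA, KOWHAI
console.log(result === rainbow); // true: no new array was created
```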

Sparse arrays

How does Ruby handle sparse arrays? Seamlessly:

rainbow = []
rainbow[0] = "Whero"
rainbow[2] = "Kowhai"
rainbow[4] = "Kikorangi"
rainbow[6] = "Mawhero"

callCounter = 0
rainbow.each do |v|
        callCounter += 1
        puts("Called for #{v}")
end
puts("callback was called: #{callCounter} times")

Output:
Called for Whero
Called for
Called for Kowhai
Called for
Called for Kikorangi
Called for
Called for Mawhero
callback was called: 7 times


Note how this still calls the callback when the array element doesn't exist, but Ruby at least knows how to elegantly deal with null values.
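The two behaviours can be compared side by side in JavaScript. This is a sketch of the difference, not a claim about how Ruby is implemented. Ruby pads the gaps with nil when you assign past the end of an array, whereas JS leaves genuine holes. However, Array.from() produces a padded copy in which each hole becomes an explicit undefined.

```javascript
const rainbow = [];
rainbow[0] = "Whero";
rainbow[2] = "Kowhai";
rainbow[4] = "Kikorangi";
rainbow[6] = "Mawhero";

// forEach() skips the holes...
let sparseCalls = 0;
rainbow.forEach(() => sparseCalls++);

// ...but Array.from() turns each hole into an explicit undefined element,
// which is closer to Ruby padding the gaps with nil.
let denseCalls = 0;
Array.from(rainbow).forEach((v) => {
  denseCalls++;
  console.log("Called for " + (v ?? "")); // blank for the gaps, like Ruby's "#{nil}"
});

console.log(sparseCalls); // 4
console.log(denseCalls);  // 7
```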

PHP

PHP is probably closer in intent to CFML, and it's the only other language I know (kinda... but only a bit...), so I had a look at that too.

foreach()

PHP has a foreach() construct which is more like CFML's for (element in array) construct, but is slightly more functional:

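$rainbow = ["Whero","Karaka","Kowhai","Kakariki","Kikorangi","Tawatawa","Mawhero"];
foreach ($rainbow as &$v) {
    $v = strtoupper($v);
}
unset($v); // break the reference left pointing at the last element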

Notice here we are specifically saying that $v is a reference, by using the & operator. This means we can give $v a new value, and this will intrinsically be reflected back in the array as well. CFML doesn't have this concept, although I have suggested perhaps it should.

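PHP doesn't have an each()-style function by that name which takes a callback (the closest analogue is array_walk(), whose callback receives the value, optionally by reference, and the key), although it does have a map() implementation.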

array_map()

PHP... stop with the underscores!! Bleah! So anyway, PHP has an array_map() function, thus:

$newRainbow = array_map(
    function($v) {
        return strtoupper($v);
    },
    $rainbow
);

array_map() also takes additional array arguments, and the callback will receive an element from each:

<?php
$maoriNumbers = ["Tahi","Rua","Toru","Wha","Rima","Ono","Whitu","Waru","Iwa","Tekau"];
$russianNumbers = ["один","два","три","четыре","пять","шесть","семь","восемь","девять","десять"];
$digits = [1,2,3,4,5,6,7,8,9,10];

function translation($ma, $i, $ru){
    return "$ma ($i) in Russian is $ru<br>";
}

$translation = array_map("translation", $maoriNumbers, $digits, $russianNumbers);

foreach($translation as $number) {
    echo $number;
}
?>

And this results in:

Tahi (1) in Russian is один
Rua (2) in Russian is два
Toru (3) in Russian is три
Wha (4) in Russian is четыре
Rima (5) in Russian is пять
Ono (6) in Russian is шесть
Whitu (7) in Russian is семь
Waru (8) in Russian is восемь
Iwa (9) in Russian is девять
Tekau (10) in Russian is десять


I think that's quite a good feature, actually! One thing I don't like here is that to pass an existing function as a callback, one seems to need to reference it by a string containing its name. That's weird.
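Here's a rough JavaScript sketch of the multi-array idea for comparison. JS's own map() only walks a single array, so `mapMany()` is a hypothetical helper written for illustration. It also shows that in JS an existing named function is simply passed by reference, with no string name required.

```javascript
// Hypothetical helper, loosely modelled on PHP's multi-array array_map():
// the callback receives one element from each array, position by position.
// Shorter arrays contribute undefined once they run out (PHP pads with null).
function mapMany(callback, ...arrays) {
  const length = Math.max(0, ...arrays.map((a) => a.length));
  const result = [];
  for (let i = 0; i < length; i++) {
    result.push(callback(...arrays.map((a) => a[i])));
  }
  return result;
}

// An ordinary named function...
function translation(ma, i, ru) {
  return `${ma} (${i}) in Russian is ${ru}`;
}

const maoriNumbers = ["Tahi", "Rua", "Toru"];
const digits = [1, 2, 3];
const russianNumbers = ["один", "два", "три"];

// ...passed by reference: no string containing its name is needed.
const translations = mapMany(translation, maoriNumbers, digits, russianNumbers);
translations.forEach((t) => console.log(t));
```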

Railo

Railo deserves a special mention here because, as well as arrayEach(), it also implements an each() method on the array class, so one can do this:

rainbow.each(
    function(v){
        writeOutput(v & "<br>");
    }
);

Sadly, one cannot do this, which I think one ought to be able to:

["Whero","Karaka","Kowhai","Kakariki","Kikorangi","Tawatawa","Mawhero"].each(
    function(v){
        writeOutput(v & "<br>");
    }
);

This just gives an error:

Railo 4.1.1.004 Error (template)
Message: Missing [;] or [line feed] after expression
Stacktrace: The Error Occurred in
C:\webroots\railo-express-4.1.x-jre-win64\webapps\www\shared\git\blogExamples\arrays\arrayEachSux\cfml\railo.cfm: line 2 
1: <cfscript>
2: ["Whero","Karaka","Kowhai","Kakariki","Kikorangi","Tawatawa","Mawhero"].each(
3: function(v){
4: writeOutput(v & "<br>");
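By way of contrast, JavaScript is perfectly happy to call a method straight off an array literal, which is the behaviour I'd expect Railo to support too:

```javascript
// Calling a method directly on an array literal works fine in JavaScript.
const seen = []; // this semicolon matters: without it, the next line would be
                 // parsed as an index lookup on [] and forEach would throw
["Whero", "Karaka", "Kowhai"].forEach(function (v) {
  seen.push(v);
  console.log(v);
});
```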

Conclusion

Well I think CFML should do three things here:

arrayEach()

This should be adjusted so that the callback receives the index and the array as well, just like JavaScript does.

arrayMap()

Implement this like JavaScript has, except also allow for the PHP idea of being able to map multiple arrays together in one hit: that'd be cool.

As methods

I think Railo is really onto something here with offering functions as methods as well. It's more in keeping with the general OO direction the CFML language has been heading in for over ten years now. Well: the language itself hasn't made any inroads into OO, but it facilitates OO in our dev code, with components and methods. It's time CFML itself started to catch up!

Ruby

If I'm honest, I don't actually think Ruby brings anything especially interesting to the game here. Well, that replace() method is quite cool, but it has nothing to do with array iteration, per se.

And that's more than enough on that. Not the blog article I meant to write today: I was gonna put my own spin on Cutter's recent posting "What's Wrong With ColdFusion?", but that can wait for another day or so now. And I was gonna do a Ruby course on CodeSchool, and continue writing the previous one up. Oh well.

Blimey. It's taken six hours to write this!!!

All the code herein is on github, btw.

--
Adam