Tuesday 28 November 2017

That array_map quandary implemented in other languages

G'day:
A coupla days ago I bleated about array_map [having] a dumb implementation. I had what I thought was an obvious application for array_map in PHP, but it couldn't really accommodate me due to array_map not exposing the array's keys to the callback, and then messing up the keys in the mapped array if one passes array_map more than one array to process.

I needed to remap this:

[
    "2008-11-08" => "Jacinda",
    "1990-10-27" => "Bill",
    "2014-09-20" => "James",
    "1979-05-24" => "Winston"
]

To this:

array(4) {
  '2008-11-08' =>
  class IndexedPerson#3 (2) {
    public $date =>
    string(10) "2008-11-08"
    public $name =>
    string(7) "Jacinda"
  }
  '1990-10-27' =>
  class IndexedPerson#4 (2) {
    public $date =>
    string(10) "1990-10-27"
    public $name =>
    string(4) "Bill"
  }
  '2014-09-20' =>
  class IndexedPerson#5 (2) {
    public $date =>
    string(10) "2014-09-20"
    public $name =>
    string(5) "James"
  }
  '1979-05-24' =>
  class IndexedPerson#6 (2) {
    public $date =>
    string(10) "1979-05-24"
    public $name =>
    string(7) "Winston"
  }
}

Note how the remapped object also contains the original key value. That was the sticking point. Go read the article for more detail and more whining.

OK so my expectations of PHP's array higher order functions are based  on  my experience with JS's and CFML's equivalents. Both of which receive the key as well as the value in all callbacks. I decided to see how other languages achieve the same end, and I'll pop the codee in here for shits 'n' giggles.


CFML

Given most of my history is as a CFML dev, that one was easy.

peopleData = ["2008-11-08" = "Jacinda", "1990-10-27" = "Bill", "2014-09-20" = "James", "1979-05-24" = "Winston"]

people = peopleData.map((date, name) => new IndexedPerson(date, name))

people.each((date, person) => echo("#date# => #person#<br>"))

Oh, this presupposes the IndexedPerson component. Due to a shortcoming of how CFML works, components must be declared in a file of their own:

component {

    function init(date, name) {
        this.date = date
        this.name = name
    }

    string function _toString() {
        return "{date:#this.date#; name: #this.name#}"
    }
}


But the key bit is the mapping operation:

people = peopleData.map((date, name) => new IndexedPerson(date, name))

Couldn't be simpler (NB: this is Lucee's CFML implementation, not ColdFusion's which does not yet support arrow functions).

The output is:


2008-11-08 => {date:2008-11-08; name: Jacinda}
1990-10-27 => {date:1990-10-27; name: Bill}
2014-09-20 => {date:2014-09-20; name: James}
1979-05-24 => {date:1979-05-24; name: Winston}

Also note that CFML doesn't have associative arrays, it has structs, so the keys are not ordered. This does not matter here. (Thanks to Zac for correcting me here: CFML does have ordered structs these days).


JS

The next language I turned to was JS as that's the I'm next most familiar with. One thing that hadn't occurred to me is that whilst JS's Array implementation has a map method, we need to use an object here as the keys are values not indexes. And whilst I knew Objects didn't have a map method, I didn't know what the equivalent might be.

Well it turns out that there's no real option to use a map here, so I needed to do a reduce on the object's entries, Still: it's pretty terse and obvious:

class IndexedPerson {
    constructor(date, name) {
        this.date = date
        this.name = name
    }
}

let peopleData = {"2008-11-08": "Jacinda", "1990-10-27": "Bill", "2014-09-20": "James", "1979-05-24": "Winston"}

let people = Object.entries(peopleData).reduce(function (people, personData) {
    people.set(personData[0], new IndexedPerson(personData[0], personData[1]))
    return people
}, new Map())

console.log(people)

This returns what we want:

Map {
  '2008-11-08' => IndexedPerson { date: '2008-11-08', name: 'Jacinda' },
  '1990-10-27' => IndexedPerson { date: '1990-10-27', name: 'Bill' },
  '2014-09-20' => IndexedPerson { date: '2014-09-20', name: 'James' },
  '1979-05-24' => IndexedPerson { date: '1979-05-24', name: 'Winston' } }

TBH I think this is a misuse of an object to contain basically an associative array / struct, but so be it. It's the closest analogy to the PHP requirement. I was able to at least return it as a Map, which I think is better. I tried to have the incoming personData as a map, but the Map prototype's equivalent of entries() used above is unhelpful in that it returns an Iterator, and the prototype for Iterator is a bit spartan.

I think it's slightly clumsy I need to access the entries value via array notation instead of some sort of name, but this is minor.

As with all my code, I welcome people showing me how I should actually be doing this. Post a comment. I'm looking at you Ryan Guill ;-)

Java

Next up was Java. Holy fuck what a morass of boilterplate nonsense I needed to perform this simple operation in Java. Deep breath...

import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

class IndexedPerson {
    String date;
    String name;
    
    public IndexedPerson(String date, String name) {
        this.date = date;
        this.name = name;
    }
    
    public String toString(){
        return String.format("{date: %s, name: %s}", this.date, this.name);
    }
}

class Collect {

    public static void main(String[] args) {

        HashMap<String,String> peopleData = loadData();

        HashMap<String, IndexedPerson> people = mapToPeople(peopleData);
            
        dumpIdents(people);
    }
    
    private static HashMap<String,String> loadData(){
        HashMap<String,String> peopleData = new HashMap<String,String>();
        
        peopleData.put("2008-11-08", "Jacinda");
        peopleData.put("1990-10-27", "Bill");
        peopleData.put("2014-09-20", "James");
        peopleData.put("1979-05-24", "Winston");
        
        return peopleData;
    }
    
    private static HashMap<String,IndexedPerson> mapToPeople(HashMap<String,String> peopleData) {
        HashMap<String, IndexedPerson> people = (HashMap<String, IndexedPerson>) peopleData.entrySet().stream()
            .collect(Collectors.toMap(
                e -> e.getKey(),
                e -> new IndexedPerson(e.getKey(), e.getValue())
            ));
            
        return people;
    }
    
    private static void dumpIdents(HashMap<String,IndexedPerson> people) {
        for (Map.Entry<String, IndexedPerson> entry : people.entrySet()) {
            System.out.println(String.format("%s => %s", entry.getKey(), entry.getValue()));
        }
    }
    
}

Result:
1979-05-24 => {date: 1979-05-24, name: Winston}
2014-09-20 => {date: 2014-09-20, name: James}
1990-10-27 => {date: 1990-10-27, name: Bill}
2008-11-08 => {date: 2008-11-08, name: Jacinda}

Most of that lot seems to be just messing around telling Java what types everything are. Bleah.

The interesting bit - my grasp of which is tenuous - is the Collectors.toMap. I have to admit I derived that from reading various Stack Overflow articles. But I got it working, and I know the general approach now, so that's good.

Too much code for such a simple thing though, eh?


Groovy

Groovy is my antidote to Java. Groovy makes this shit easy:

class IndexedPerson {
    String date
    String name

    IndexedPerson(String date, String name) {
        this.date = date;
        this.name = name;
    }

    String toString(){
        String.format("date: %s, name: %s", this.date, this.name)
    }
}

peopleData = ["2008-11-08": "Jacinda", "1990-10-27": "Bill", "2014-09-20": "James", "1979-05-24": "Winston"]

people = peopleData.collectEntries {date, name -> [date, new IndexedPerson(date, name)]}

people.each {date, person -> println String.format("%s => {%s}", date, person)}

Bear in mind that most of that is getting the class defined, and the output. The bit that does the mapping is just the one line in the middle. That's more like it.

Again, I don't know much about Groovy… I had to RTFM to find out how to do the collectEntries bit, but it was easy to find and easy to understand.

I really wish I had a job doing Groovy.

Oh yeah, for the sake of completeness, the output was thus:

2008-11-08 => {date: 2008-11-08, name: Jacinda}
1990-10-27 => {date: 1990-10-27, name: Bill}
2014-09-20 => {date: 2014-09-20, name: James}
1979-05-24 => {date: 1979-05-24, name: Winston}


Ruby

Ruby's version was pretty simple too as it turns out. No surprise there as Ruby's all about higher order functions and applying blocks to collections and stuff like that.

class IndexedPerson

    def initialize(date, name)
        @date = date
        @name = name
    end

    def inspect
        "{date:#{@date}; name: #{@name}}\n"
    end
end

peopleData = {"2008-11-08" => "Jacinda", "1990-10-27" => "Bill", "2014-09-20" => "James", "1979-05-24" => "Winston"}

people = peopleData.merge(peopleData) do |date, name|
    IndexedPerson.new(date, name)
end

puts people

Predictable output:

{"2008-11-08"=>{date:2008-11-08; name: Jacinda}
, "1990-10-27"=>{date:1990-10-27; name: Bill}
, "2014-09-20"=>{date:2014-09-20; name: James}
, "1979-05-24"=>{date:1979-05-24; name: Winston}
}

I wasn't too sure about all that block nonsense when I first started looking at Ruby, but I quite like it now. It's easy to read.


Python

My Python skills don't extend much beyond printing G'day World on the screen, but it was surprisingly easy to google-up how to do this. And I finally got to see what Python folk are on about with this "comprehensions" stuff, which I think is quite cool.

class IndexedPerson:
    def __init__(self, date, name):
        self.date = date
        self.name = name

    def __repr__(self):
        return "{{date: {date}, name: {name}}}".format(date=self.date, name=self.name)

people_data = {"2008-11-08": "Jacinda", "1990-10-27": "Bill", "2014-09-20": "James", "1979-05-24": "Winston"}

people = {date: IndexedPerson(date, name) for (date, name) in people_data.items()}

print("\n".join(['%s => %s' % (date, person) for (date, person) in people.items()]))


And now that I am all about Clean Code, I kinda get the "whitespace as indentation" thing too. It's clear enough if yer code is clean in the first place.

The output of this is identical to the Groovy one.

Only one more then I'll stop.

Clojure

I can only barely do G'day World in Clojure, so this took me a while to work out. I also find the Clojure docs to be pretty impentrable. I'm sure they're great if one already knows what one is doing, but I found them pretty inaccessible from the perspective of a n00b. It's like if the PHP docs were solely the user-added stuff at the bottom of each docs page. Most blog articles I saw about Clojure were pretty much just direct regurgitation of the docs, without much value-add, if I'm to be honest.

(defrecord IndexedPerson [date name])

(def people-data (array-map "2008-11-08" "Jacinda" "1990-10-27" "Bill" "2014-09-20" "James" "1979-05-24" "Winston"))

(def people
  (reduce-kv
    (fn [people date name] (conj people (array-map date (IndexedPerson. date name))))
    (array-map)
    people-data))

(print people)

The other thing with Clojure for me is that the code is so alien-looking to me that I can't work out how to indent stuff to make the code clearer. All the examples I've seen don't seem very clear, and the indentation doesn't help either, I think. I guess with more practise it would come to me.

It seems pretty powerful though, cos there's mot much code there to achieve the desired end-goal.

Output for this one:

{2008-11-08 #user.IndexedPerson{:date 2008-11-08, :name Jacinda},
1990-10-27 #user.IndexedPerson{:date 1990-10-27, :name Bill},
2014-09-20 #user.IndexedPerson{:date 2014-09-20, :name James},
1979-05-24 #user.IndexedPerson{:date 1979-05-24, :name Winston}}


Summary

This was actually a very interesting exercise for me, and I learned stuff about all the languages concerned. Even PHP and CFML.

I twitterised a comment regarding how pleasing I found each solution:


This was before I did the Clojure one, and I'd slot that in afer CFML and before JS, making the list:
  1. Python
  2. Ruby
  3. Groovy
  4. CFML
  5. Clojure
  6. JS
  7. PHP
  8. Java

Python's code looks nice and it was easy to find out what to do. Same with Ruby, just not quite so much. And, really same with Groovy. I could order those three any way. I think Python tips the scales slightly with the comprehensions.

CFML came out suprisingly well in this, as it's a bloody easy exercise to achieve with it.

Clojure's fine, just a pain in the arse to understand what's going on, and the code looks a mess to me. But it does a lot in little space.

JS was disappointing because it wasn't nearly so easy as I expected it to be.

PHP is a mess.

And - fuck me - Java. Jesus.

My occasional reader Barry O'Sullivan volunteered some input the other day:


Hopefully he's still up for this, and I'll add it to the list so we can have a look at that code too.

Like I said before, if you know a better or more interesting way to do this in any of the languages above, or any other languages, make a comment and post a link to a Gist (just don't put the code inline in the comment please; it will not render at all well).

I might have another one of these exercises to do soon with another puzzle a friend of mine had to recently endure in a job-interview-related coding test. We'll see.

Righto.

--
Adam

Sunday 26 November 2017

PHP: array_map has a dumb implementation

G'day:
This cropped up a few weeks back, but I've had no motivation to write anything recently so I've sat on it for a bit. It came back to bite me again the other day and my motivation seems to be returning so here we go.

I was working on some code the other day wherein I needed to query some stuff from one DB, and then for the records returned from that, look for records in an entirely different data repository for some "additional data". The best I could do was to get the IDs from the first result and do like a WHERE IN (:ids) of the other data source to get records that were the additional data for some/all of the IDs. There was no one-to-one match between the two data sources, so the second set of results might not have records for all IDs from the first one. It was convenient for me to return the second result not as an indexed array, but as an associative array keyed on the ID. This is easy enough in PHP, and is quite handy:

$peopleData = $conn
    ->query('SELECT date, name FROM mapexampledata')
    ->fetchAll(PDO::FETCH_ASSOC | PDO::FETCH_COLUMN | PDO::FETCH_UNIQUE);

var_dump($peopleData);

Result:

[
    "2008-11-08" => "Jacinda",
    "1990-10-27" => "Bill",
    "2014-09-20" => "James",
    "1979-05-24" => "Winston"
]

Note how the array key is the date column from the query, and the value of the array is the other column (or all the subsequent columns, if there were more). I've just indexed by date here to make it more clear that the index key is not just an integer sequence.

From this data, I needed to remap that to be an array of PersonalMilestone objects, rather than an array of raw data. Note that the object has both date and name values:

class PersonalMilestone {
    public $date;
    public $name;

    function __construct($date, $name) {
        $this->date = $date;
        $this->name = $name;
    }
}

(If it's not obvious... this domain model and data is completely unrelated to the work I was actually doing, it's just something simple I concocted for this blog article. The general gist of the issue is preserved though).

"Easy", I thought; "that's just an array_map operation". Oh hohohoho, how funny I am. I forgot I was using PHP.

So I wrote my tests (OK, I did not write the tests for this example for the blog. "IRL" I did though. Always do TDD, remember?), and then went to write the array_map implementation:

$peopleData = ["2008-11-08" => "Jacinda", "1990-10-27" => "Bill", "2014-09-20" => "James", "1979-05-24" => "Winston"];

$people = array_map(function ($name) {
    return new PersonalMilestone(
        $date, // dammit this won't work: array_map only gives me the value, not the key
        $name
    );
}, $peopleData);

F*** it. This just beggars my belief as the other languages I have done the same operation on passes both key and value into the map call back. What does PHP do instead? Well it allows you to pass multiple arrays to the array_map function, and the callback receives the value from each of those arrays. I dunno why they chose to implement that as even additional behaviour, let along the default (and only) behaviour. Sigh.

But I thought I could leverage this! I could pass the keys of my array in as a second array, and then I'll have both the values I need to create the PersonalMilestone object. Hurrah!

$peopleData = ["2008-11-08" => "Jacinda", "1990-10-27" => "Bill", "2014-09-20" => "James", "1979-05-24" => "Winston"];

$keys = array_keys($peopleData);

$people = array_map(function ($name, $date) {
    return new PersonalMilestone($date, $name);
}, $peopleData, $keys);

var_dump($people);

Result:

array(4) {
  [0]=>
  object(PersonalMilestone)#2 (2) {
    ["date"]=>
    string(10) "2008-11-08"
    ["name"]=>
    string(7) "Jacinda"
  }
  [1]=>
  object(PersonalMilestone)#3 (2) {
    ["date"]=>
    string(10) "1990-10-27"
    ["name"]=>
    string(4) "Bill"
  }
  [2]=>
  object(PersonalMilestone)#4 (2) {
    ["date"]=>
    string(10) "2014-09-20"
    ["name"]=>
    string(5) "James"
  }
  [3]=>
  object(PersonalMilestone)#5 (2) {
    ["date"]=>
    string(10) "1979-05-24"
    ["name"]=>
    string(7) "Winston"
  }
}

Looks OK right? Look again. The array has been changed into an indexed array. So this isn't a remapping, this is just "making a different array". F***'s sake, PHP.

Initially I thought this was because the two arrays I was remapping had different keys, and thought "aah... yeah... if I was daft I could see how I might implement the thing like this", so I changed the second array to be keyed by its own value (whilst shaking my head in disbelief).

$peopleData = ["2008-11-08" => "Jacinda", "1990-10-27" => "Bill", "2014-09-20" => "James", "1979-05-24" => "Winston"];
var_dump($peopleData);

$keys = array_keys($peopleData);
$keysIndexedByKeys = array_combine($keys, $keys);
var_dump($keysIndexedByKeys);

$people = array_map(function ($name, $date) {
    return new PersonalMilestone($date, $name);
}, $peopleData, $keys);

var_dump($people);

(and, yes, this is a lot of horsing around now, even had it worked which (spoiler) it didn't).

Result:
array(4) {
  ["2008-11-08"]=>
  string(7) "Jacinda"
  ["1990-10-27"]=>
  string(4) "Bill"
  ["2014-09-20"]=>
  string(5) "James"
  ["1979-05-24"]=>
  string(7) "Winston"
}
array(4) {
  ["2008-11-08"]=>
  string(10) "2008-11-08"
  ["1990-10-27"]=>
  string(10) "1990-10-27"
  ["2014-09-20"]=>
  string(10) "2014-09-20"
  ["1979-05-24"]=>
  string(10) "1979-05-24"
}
array(4) {
  [0]=>
  object(PersonalMilestone)#2 (2) {
    ["date"]=>
    string(10) "2008-11-08"
    ["name"]=>
    string(7) "Jacinda"
  }
  [1]=>
  object(PersonalMilestone)#3 (2) {
    ["date"]=>
    string(10) "1990-10-27"
    ["name"]=>
    string(4) "Bill"
  }
  [2]=>
  object(PersonalMilestone)#4 (2) {
    ["date"]=>
    string(10) "2014-09-20"
    ["name"]=>
    string(5) "James"
  }
  [3]=>
  object(PersonalMilestone)#5 (2) {
    ["date"]=>
    string(10) "1979-05-24"
    ["name"]=>
    string(7) "Winston"
  }
}

So the first array and the second array now have the same keys, but... the remapped array still doesn't. This is where we get to the subject line of this article: array_map has a dumb implementation.

I gave up on array_map. I really don't think it's fit for purpose beyond the most obviously simple applications. I just used a foreach loop instead:

$people = [];
foreach ($peopleData as $date => $name) {
    $people[$date] = new PersonalMilestone($date, $name);
}

var_dump($people);

And this works:

array(4) {
  ["2008-11-08"]=>
  object(PersonalMilestone)#1 (2) {
    ["date"]=>
    string(10) "2008-11-08"
    ["name"]=>
    string(7) "Jacinda"
  }
  ["1990-10-27"]=>
  object(PersonalMilestone)#2 (2) {
    ["date"]=>
    string(10) "1990-10-27"
    ["name"]=>
    string(4) "Bill"
  }
  ["2014-09-20"]=>
  object(PersonalMilestone)#3 (2) {
    ["date"]=>
    string(10) "2014-09-20"
    ["name"]=>
    string(5) "James"
  }
  ["1979-05-24"]=>
  object(PersonalMilestone)#4 (2) {
    ["date"]=>
    string(10) "1979-05-24"
    ["name"]=>
    string(7) "Winston"
  }
}

This is simple enough code, but I like having my code more descriptive if possible, so if I'm iterating over an array because I want to remap its values, then I prefer to use a mapping call, not a generic loop. But we need to be pragmatic about these things.

Oh... for the sake of completeness I did work out how to do it "properly" (where "properly" == "passes my tests" not "is good code"):

$keys = new ArrayIterator(array_keys($peopleData));

$people = array_map(function ($name) use ($keys) {
    $date = $keys->current();
    $keys->next();
    return new PersonalMilestone($date, $name);
}, $peopleData);

(I'll spare you the output... you get the idea by now).

What I've done here is to pass a separate iterator into the callback, and I manually iterate over that each iteration of the callback loop. This gives me the value for the date key for each element of the source array. But that's just shoehorning square peg into a round hole, so I would note that down as a curio, but not actually advocate its use.

I was left with the feeling "PHP, you're f***in' rubbish" (a feeling I get increasingly the more I use it), but I thought just cos I know other languages which do this in a more clear fashion doesn't mean that they're right and PHP is (unhelpfully) wrong. I decided to have a look at how a few other languages tackle the same issue, to see if PHP really does suck in this regard. However that will be the topic for a follow-up article, as this one is already long enough.

Righto.

--
Adam

Wednesday 22 November 2017

PHP: misuse of nullable arguments

G'day:
PHP 7.1 added nullable types to the PHP language. Here's the RFC. In short, one can specify that a return type or an argument type will accept null instead of an object (or primitive) of the specified type. This code demonstrates:

function f(int $x) : int {
    return $x;
}

try {
    $x = null;
    $y = f($x);
} catch (\TypeError $e) {
    echo $e->getMessage();
}

echo PHP_EOL . PHP_EOL;

function g(?int $x) : ?int {
    return $x;
}

$x = null;
$y = g($x);
var_dump($y);

Ouptut:

Argument 1 passed to f() must be of the type integer, null given, called in [...][...] on line 8

NULL

Why do we want this? Well if we look at a statically-typed language, it makes a bit of sense:

 public class NullableTest {
     
     public static void main(String[] args){
         Integer x = null;
         Integer y = f(x);
         System.out.println(y);
     }
          
     private static Integer f(Integer x){
         return x;
     }
 }

So x is an Integer... it's just not been given a value yet, but it'll pass the type-check on the method signature. (This just outputs null, btw).

I think the merits of this functionality in PHP is limited when it comes to an argument type. If we had that equivalent code in PHP, it's not that x is an int that just happens to not have been given a value; it's a null. Not the same. It's a trivial but key difference I think.

Semantics aside there are occasional times where it's handy and - more importantly - appropriate. Consider the setMethods method of PHPUnit's MockBuilder class. The way this works is that one can pass one of three values to the method:

  • null - no methods will be mocked-out;
  • an empty array - all methods will be mocked-out;
  • an array of method names - just those methods will be mocked-out.

[docs][implementation]

Here the null has meaning, and it's distinct in meaning from an empty array. Even then this only makes "sense" because for convenience the meaning of an empty array is inversed when compared to a partial array: a partial array means "only these ones", an empty array following that logic would mean "only these ones, which is none of them", but it has been given a special meaning (and IMO "special meaning" is something one ought to be hesitant about assigning to a value). Still: it's handy.

Had I been implementing PHPUnit, I probably would not have supported null there. I'd've perhaps had two methods: mockMethods(array subsetToMock) and mockAllMethods(), and if one doesn't specify either of those, then no methods are mocked. No values with special meaning, and clearly-named methods. Shrug. PHPUnit rocks, so this is not an indictment of it.


On the other hand, most of the usage I have seen of nullable arguments is this sort of thing:

function f(?int $x = 0){
    if (is_null($x)){
        $x = 0;
    }
    // rest of function, which requires $x to be an int
}

This is a misuse of the nullable functionality. This function actually requires the argument to be an integer. Null is not a meaningful value to it. So the function simply should not accept a null. It's completely legit for a function to specify what it needs, and require the calling code to actually respect that contract.

I think in the cases I have seen this being done that the dev concerned is also working on the calling code, and they "know" the calling code could quite likely have a null for the value they need to pass to this function, but instead of dealing with that in the calling code, they push the responsibility into the method. This ain't the way to do these things. A method shouldn't have to worry about the vagaries of the situation it's being used in. It should specify what it needs from the calling code to do its job, and that's it.

Another case I've seen is that the dev concerned didn't quite get the difference between "nullable" and having an argument default. They're two different things, and they're conflating the two. When an argument is nullable, one still actually needs to pass a value to it, null or otherwise:

function f(?int $x) : int {
    return $x;
}

try {
    $y = f();
} catch (\TypeError $e) {
    echo $e->getMessage();
}

Output:

Too few arguments to function f(), 0 passed in [...][...] on line 7 and exactly 1 expected

All "nullable" means is it can be null.

I really think the occasions where a nullable argument is legit is pretty rare, so if yer writing code or reviewing someone else's code and come across one being used... stop and think about what's going on, and whether the function is taking on a responsibility the calling code oughta be looking after.

Righto.

--
Adam

Saturday 19 August 2017

Mishmash: code review as a learning exercise, loops vs higher-order-functions and testing

G'day:
There's a few things going on in my head here, and none individually are worthy of an article, but perhaps if I dwell on a bunch of stuff something might shake out. I have no idea as I have not drafted this one in my head like I usually do, I'm just "let's start typing and see if I get anywhere". Eek.

OK so yesterday I was code reviewing one of my mates' work, and breezed past this code (this is not the exact code, but it's a replication of it):

public function getSomeThings($rowsToGet)
{
    $thingsValues = $this->dao->getThingsFromStorage($rowsToGet);
    
    $things = [];
    foreach ($thingsValues as $thingValues) {
        $things[] = $this->thingFactory->createFromRawData($thingValues);
    }

    return $things;
}

I just put a note in the pull request "array_map?".

Now I do a few things in a code review. Firstly I check for obvious flaws, unclean code, bad test coverage, typos, etc, and I will raise those as "tasks" in BitBucket: stuff that needs to be addressed before I'll press the "Approved" button. This is our standard approach to code review, everyone expects it, and it's fine. It works.

But secondly I'll just make observations about the code, ask questions, suggest alternative approaches we could have taken, etc. Just as food for thought, a discussion point, or a chance for some learning. This was the latter. I was not suggesting to my mate they needed to change their code to match my whim... the code is fine as it is. It was just "did you think of this?".

An interesting conversation ensued, which I will paraphrase here. Note that Fred and I do not work in the same physical office.

Fred (his name's not Fred): I'm not sure the benefit of swapping simple code for a more complex solution, unless there's some performance gain I'm unaware of.

Adam (my name is Adam): Hmmm... well array_map is one expression, and it reduces the complexity here to a single variable assignment statement(*). The loop version requires building the array by hand, and requires two assignments and the whole loop thing too. So I think the array_map approach is less complex innit?

return array_map(function ($thingValues) {
    return $this->thingFactory->createFromRawData($thingValues);
}, $thingsValues);

(*) OK it's two statements. Never let it be said I don't hyperbolise to better make my point.

One thing I did not get around to saying is that performance is not a consideration here. One of the calls is hitting an external DB, so any processing time differences are going to be eclipsed - by orders of magnitude - by the external call. And, yes, I know that if we'd be inclined to worry about micro-optimisations, then calling a higher-order function is slower than a loop. One should simply not care.

Adam (cont'ed): further more the array_map version only requires a single test, whereas the loop version needs three. That in itself demonstrates the increased complexity.

Fred: err... why's that then?

I'll break away from the conversation here.

The dev process here is that we identify the need to have a function to get stuff from the DB, and then model that and return it to the calling code. So doing TDD we write a test which uses a mocked-out DAO to return known data, and we test that the values in the resultant models are what we expected them to be based on the mocked DAO data.

Here's a very simplified, self-contained example:

First our test:

describe('comparing testing for foreach vs array_map', function() {
    given('sut', function () {
        return new ForeachVsArrayMap();
    });
    given('allObjects', function () {
        return [
            (object) ['id' => 1, 'mi' => 'tahi'],
            (object) ['id' => 2, 'mi' => 'rua'],
            (object) ['id' => 3, 'mi' => 'toru'],
        ];
    });
    describe('tests of foreach', function() {
        it('returns the numbers as objects', function () {
            $result = $this->sut->getObjects();
            expect($result)->toEqual($this->allObjects);
        });
    });
});

And now our implementation:

class ForeachVsArrayMap
{
    private $allData;

    public function __construct()
    {
        $this->allData = [
            ['id' => 1, 'mi' => 'tahi'],
            ['id' => 2, 'mi' => 'rua'],
            ['id' => 3, 'mi' => 'toru']
        ];
    }

    private function getNumbers($rows)
    {
        return array_slice($this->allData, 0, $rows);
    }

    public function getObjects($rows = 3)
    {
        $data = $this->getNumbers($rows);
        $numbers = [];
        foreach ($data as $row) {
            $numbers[] = (object) $row;
        }

        return $numbers;
    }
}

And that passes. That's our TDD side of things done.

However we also now have to do edge testing as well. We've got a loop in there, and all loops need tests for three things: zero iterations, one iteration, and more than one iterations. There are easily introduced logic errors around the plurality of loop iteration.

Here our TDD test has coverd the "more than one iterations" case, so we only need the other two:

it('handles one result', function () {
    $result = $this->sut->getObjectsUsingForeach(1);
    $expected = [$this->allObjects[0]];

    expect($result)->toEqual($expected);
});

it('handles zero results', function () {
    $result = $this->sut->getObjectsUsingForeach(0);
    $expected = [];

    expect($result)->toEqual($expected);
});

Pretty simple tests, but still: two additional tests.

Why don't we need these two tests when using array_map? Because array_map's iteration is all handled within its implementation. It's PHP's job to test that, not ours. So all we need to test is that the inline callback works. And we only need to test that once: when it's called, it does what we expect it to do. We can safely stipulate that array_map will always return an array with the same number of elements as the input value. PHP deals with that; we don't need to.

One good thing about TDD (and test coverage in general) is now we can safely swap in the array_map version, and the test coverage still passes, so we know - with a high degree of certainty - that our code is A-OK.

public function getObjects($rows = 3)
{
    return array_map(function ($row) {
        return (object) $row;
    }, $this->getNumbers($rows));
}

Coincidentally I had another conversation, in another code review, on Friday about the necessity of always testing loops for 0, 1, >1 iterations. "One can just look at the code and know it's fine". This statement didn't come up in the conversation, but I've heard it before.

Three weeks ago I was show-stopped by a bug in one of our libraries. Taking the analogy above, this was the code:

public function getObjectsUsingForeach($rows = 3)
{
    $data = $this->getNumbers($rows);

    foreach ($data as $row) {
        $numbers[] = (object) $row;
    }

    // do something else with $numbers here
    // ...
}

The code had no tests. Can you see what's wrong? The general expectation when calling a function to get data is that there's actually data to get, given the provided criteria. In the case I fell foul of, the value of $rows was zero. And this was legit. So if $rows is 0, what value does $numbers have after the loop? Well: $numbers was undefined.

Had the dev done their testing properly, then they'd've noticed their bug. They never initialised $numbers outside of the code in the loop, so there's a chance it might never get defined. In this case not only did the dev not do the edge-case testing, they didn't even do the baseline TDD testing either, but that's a different story.

I needed to down-tools and fix the library before I could continue my own work. And it was made very tricky by the fact the code in the library had no tests at all, and wasn't written in a clean-code fashion (so was basically untestable). But we are not allowed to work that way any more, so I needed to do a bunch of work the original developer ought to have done.

I have also fell foul of this brainfart too:

public function getObjectsUsingForeach($rows = 3)
{
    $data = $this->getNumbers($rows);

    foreach ($data as $row) {
        $numbers = [];
        $numbers[] = (object) $row;
    }

    // do something else with $numbers here
    // ...
}

This usually crops up when code that was previously used once suddenly needs to be used for multiple iterations. Note that the initialisation of $numbers is inisde the loop, so it will re-initialise every iteration. This would be caught by the "more than one iterations" test. This is rarer, and usually only crops up in completely untested code: it'd be a shit tester who tested a loop with only one iteration as the test, but I've seen it done. Saves time not having to mock so much data, you see? Berks.

So, anyhow... always test yer loops. And try to avoid loops if there are built-in functions that handle the iteration for you.

Also always do TDD, but also understand that there is more testing to do after the TDD cycle. TDD is not the end of the story there.



In the title of this article I mention "code review as a learning exercise". I greatly summarised the back and forth between Fred and myself here... there was an order of magnitude more discussion than there was code under discussion. this sort of interaction is gold. Some people get the idea that code review is kinda like a teacher marking homework. It's partly that, but that's only part. As programmers and coders our job is one of communication: communication of instructions to a computer, both to the computer itself, but also to our future colleagues and selves about the nature of the instructions. Programming languages are a means to communicate instructions to other humans. It's tangential the computer can also be made to understand them. Similarly reviewing each other's code should be a communications and collaboration process, and we absolutely should be having these discussions. Both Fred and I benefited from this conversation: a lot of what I typed here wasn't formalised in my brain before explaning it, but now I have a much clearly picture of what I do and why. And Fred knows too. This was not a waste of time, as we're both now better developers for it. And it might have seemed like a lot of back and forth in the code review, but it really only represents about 10min of discussion. And all this came from just a conversation point, not me saying "change the code".

If we were in the same office, we might have had the conversation via voice instead of typing in the code review, and some people might think that's better. In a way it is, but it also means that the details of the conversation don't get recorded, so benefit fewer people. That said, when programmers are not agreeing on things, often it is better to switch to voice instead of continue the discussion via text-based back-and-forth. It's just easier and more collaborative sometimes.

Right, that's that. A bit random, but didn't turn out so bad I think. And I seem to have found a point to make.

Righto.

--
Adam

Monday 31 July 2017

Yeah, you do want 100% test coverage

G'day:
I've read a number of blog articles recently that have been eshewing 100% test coverage. The rationale being that there's little difference between 90% and 100%, and that that last 10% can be quite tricky to attain as - we being human 'n' all - sometimes leave the worst to last. Obviously we should all be doing TDD so that situation would not arise, but - again - we're human so don't always do TDD for one reason or another ("can't be arsed" is the usual one).

Another rationale is that it's not a race to 100%. IE: the object of the exercise is delivering working software which adds business value; it's not "getting 100% test coverage".

The latter is true. But we still aim for 100% test coverage for basically one good reason: visibility.

First, look at this:



Not bad I guess. Not great.

Now look at this:



We've accidentally forgotten to cover something new. Can you see it (it's one of the provider methods).

On the other hand look at these two:



Somewhat easier to spot, yeah?

This is bloody handy. In this case I just commented-out one of my tests to get the report off 100%, but in the real world I'd see that and go "huh, WTF?", and perhaps discover I had a logic branch untested, or a test I thought was covering something wasn't actually doing so. Or I just forgot to put the @covers annotation on the test. Whatever the situation: it's handy for the shortfall to be so obvious.

What it also means is part of our pull-request process is running a Bamboo build which runs all our tests (plus some other code quality tools):

 Everything's green here, so it's OK for code review. If it was failing, I'd get a nudge from my reviewers going "erm... you sure this is ready to go?" And even if I thought it was, we can't merge anything whilst the build is failing anyhow. The issue needs to be addressed.

Now the key thing here is this:


We don't test everything, but we cover everything. Where "cover" might be "we don't need to test this". In this case a single variable assignment does not need testing. We trust PHP to follow simple instructions. The thing is: this still shows up as green in the test coverage report.

We do this in a few situations:

  • the method has not branching logic and is overwise pretty simple. Ignore it.
  • we're backfilling coverage on old code we need to maintain, but it's such a mess we simply don't have time to solve all the testing shortfalls right now, so we test what is urgent (the bit we're changing), but just mark as ignore the rest of it. Now that file is green, and we can see if it stops being green.
  • similar to the above but we know the code the code in question is  not long for this world. So we ignore it and stick a TODO cross referencing to the ticket that will be dealing with it. And the ticket must already be lined-up to be worked on "soon".

There might be other situations where we need to be pragmatic and skip some stuff, but this needs to be explained to the code reviewers, and get their agreement.

But being able to run a report that says "100%", and shows up anything that isn't 100% is gold.

Get yer coverage report green, but don't be afraid to be pragmatic and skip stuff, if there's an informed reason to do so.

Now this isn't to say one can just pepper ignore annotations around the place just to get the report green. That's daft. The object of the exercise is not to make some report green / say 100%. It's to get good code quality. This is just a tool to help identify the quality: it's still up to the devs to provide the quality. Don't forget that. It needs to be an informed, honest decision for one to ignore code.

Righto.

--
Adam

Sunday 23 July 2017

PHP: A function that returns two different things

G'day:
I'm gonna plagiarise one of me own answers to a Stack Overflow question here, as I think it's good generic advice.

The question was "PHP 7: Multiple function return types". The subject line sums it up really. The person here wants to return a value or false from a function. I see this a lot in PHP code. Indeed... PHP itself does it a lot.

My answer was as follows:

[...]

From a design perspective, having a function that potentially returns different types of result indicates a potential design flaw:

  • If you're returning your result, or otherwise false if something didn't go to plan; you should probably be throwing an exception instead. If the function processing didn't go according to plan; that's an exceptional situation: leverage the fact. I know PHP itself has a habit of returning false if things didn't work, but that's just indicative of poor design - for the same reason - in PHP.
  • If your function returns potentially different things, then it's quite possible it's doing more than one thing, which is bad design in your function. Functions should do one thing. If your function has an if/else with the true/false block handling different chunks of processing (as opposed to just handling exit situations), then this probably indicates you ought to have two functions, not one. Leave it to the calling code to decide which is the one to use.
  • If your function returns two different object types which then can be used in a similar fashion in the calling code (ie: there's no if this: do that; else do this other thing in the calling code), this could indicate you ought to be returning an interface, not a concrete implementation.
There will possibly be legit situations where returning different types is actually the best thing to do. If so: all good. But that's where the benefit of using a loosely typed language comes in: just don't specify the return type of the function. But… that said… the situation probably isn't legit, so work through the preceding design considerations first to determine if you really do have one of these real edge cases where returning different types is warranted.

I'll add two more considerations here.

There's another option which is kinda legit. In PHP 7.1 one can declare the return type to be nullable.

Consider a function which potentially returns a boolean:

function f($x) {
    if ($x) {
        return true;
    }
}

We can't declare that with a return type of bool as if our logic strays done that falsey branch, we get a fatal error:

function f($x) : bool {
    if ($x) {
        return true;
    }
}

f(1); // OK

f(0); // PHP Fatal error:  Uncaught TypeError: Return value of f() must be of the type boolean, none returned


But in 7.1, one can declare the return type as nullable:

function g($x) : ?bool {
    if ($x) {
        echo "returning true";
        return true;
    }

    echo "returning null";
    return null;
}

g(1); // returning true
g(0); // returning null

Now sometimes this is legit. One might be looking for something from some other resource (eg: a DB record) which quite legitimately might not exist. And there might not be a legit "null implementation" of the given class one is wanting to return: for example returning the object with some/all of its properties not set or something. In this case a null is OK. But remember in this case one is forcing more logic back on the calling code: checking for the null. So not necessarily a good approach. I'd consider whether this situation is better handled with an exception.

The last consideration is that one can still specify multiple return types in a type-hint annotation, eg:

/**
 * @param bool $b
 * @return A|B
 */
function f($b){
    if ($b) {
        return new A();
    }else {
        return new B();
    }
}


This doesn't actually enforce anything, so it's mostly a lie, just like all comments are, but at least one's IDE might be able to make sense of it, and give code assistance and autocomplete suggestions when writing one's code:



See how it offers both the method from an A and a B there.

Still: I really dislike these annotations as they're really just comments, and especially in this case they're lying: f can actually return anything it likes.

Right... must dash. I am supposed to be down @ my ex's place hanging out with my son in 10min. I'll press "send" and proof-read this later ;-)

Righto.

--
Adam

Saturday 8 July 2017

@dac_dev Twitter account

G'day:
Yes, this thing is still alive.

This morning I indicated on Twitter I was discontinuing my usage of the @dac_dev Twitter account. I'll just be using my @adam_cameron one from now on.

@NXRK made this observation just now:


Just to be clear: my twitter account is the @adam_cameron one. The @dac_dev one was created specifically for promoting this blog, and the "brand" that was my existence in the dev world, such as it is. Specifically the CFML dev world.

I don't want to promote that brand any more, or at least not such that I want to keep it separate from my "real" self. Or more that there's simply not enough usage of that account to warrant me having to keep it separate from my own account.

I won't be deleting the @dac_dev one, and I have set it to send me emails if anyone hits me up there, but I am not paying attention to it any more, nor do I intend to post anything to it.

As with anyone's account on Twitter, you are free to keep an eye on my mumblings on the @adam_cameron one. I'm less polite there (if that can be believed...), and it's definitely more "general" commentary. Rugby, cricket, specious social commentary, naive political observations and photos of my meals. Well: not the latter. I'm not a complete twat(*).

In a closely-related moved, I've chucked-in my participation on the CFML Slack channel too. I was not getting anything positive out of my membership there, and she shitness of some elements of the CFML community was doing my head in (if yer reading this, I'm almost certainly not including you in that shit demographic).

Righto.

--
Adam

(*) based on just that metric, I mean.

Saturday 13 May 2017

PHPUnit: using a mocked-method call log to track its activity

G'day:
This is just a quick PHPUnit one, inspired by a trick my team lead suggested to me. I think it's quite handy and works around what seems to be a shortcoming of how PHPUnit mocks stuff. I'm sure there's strong opinions held on this sort of thing somewhere as to how I'm not supposed to want to do what my requirement has me needing to do here. I do so look forward to being told about it.

Anyway, I'm writing a script to repair some bung data we have. The details are irrelevant, but it's fairly "important" data (well: not to me... but to someone, apparently...), and I want to make sure we can track when things go wrong. And they will go wrong. Firstly I'm talking to an external third-party service, and it's foolhardy to trust those; secondly I'm relying on some of our own internal services which... ah... well yeah they were written in a different era by people with an unusual understanding of how to write managable, testable, and reliable code. And in the course of doing this work I turned-up a coupla bugs and had to fix those, so I don't back our own code to be stable here. I don't mean that judgementally: all code bases have better and not so good code in them. Anyway...

The upshot of this is the script logs a bunch of stuff - said script will be run by a cron job - and part of my testing is to verify the right things gets logged at the right time. A lot of it is fairly perfunctory "belt and braces" sort of logging, say "Records 1000-1999 processed OK", and I don't really care about that; but there's other stuff where the script deals with unexpected circumstances and I want to make sure that's done right.

For the purposes of this test I am mocking out all the dependencies so the script doesn't actually do anything, but the mocks return "workable" mocked data sufficient to walk through the stages of the script. And the logger is one of these dependencies, and is also mocked.

The test I was stuck on was one where an unhandled exception was thrown, and the script basically shuts itself down; having first logged what went wrong.

Here's a very pared-back version of the script:

public function myMethod($times){
    try {
        $this->logger->logMessage("Starting off");
        for ($i=1; $i <= $times; $i++) {
            $this->logger->logMessage("Starting processing iteration $i");
            $this->helper->doThing();
            $this->logger->logMessage("Finished processing iteration $i");
        }
        $this->logger->logMessage("Finishing off");
    }
    catch (\Exception $e) {
        $this->logger->logMessage(sprintf("Something went wrong: %s", $e->getMessage()));
        throw $e;
    }
    finally {
        $this->logger->logMessage("All done");
    }
}

Here I've just got the one doThing dependency call in the loop, but it makes the point. And around that, and before and after the the loop I log a line to track progress. And I log a line if there's an unhandled exception. As I said the real script is somewhat more complex than this, but this is an adequate analogy.

I have another test which tests the ongoing logging of a successful run with 0, 1, >1 records, but this test is to make sure the script a) reports the exception; b) still shuts down OK.

PHPUnit's approach to checking a specific argument was passed to a particular mock call is to use the at matcher method when setting an expectation. My first pass at the test used this matcher:

function testMyMethodUsingAt()
{
    $exceptionMessage = "EXPECT_THIS";
    $this->expectException(\RuntimeException::class);
    $this->expectExceptionMessage($exceptionMessage);

    $testIterations = 6;
    $errorAt = $testIterations / 2;

    $this->helper
        ->expects($this->at($errorAt))
        ->method("doThing")
        ->willThrowException(new \RuntimeException($exceptionMessage));

    $this->logger
         ->expects($this->at(7))
         ->method("logMessage")
         ->with("Starting processing iteration 4");
    $this->logger
         ->expects($this->at(8))
         ->method("logMessage")
         ->with("Something went wrong: $exceptionMessage");
    $this->logger
         ->expects($this->at(9))
         ->method("logMessage")
         ->with("All done");

    $this->sut->myMethod($testIterations);
}

Here I worked out the the call history for info would be nine entries long, and the entries I wanted to confirm the exception-tracking logic was correct was the 7th, 8th and 9th ones.

But this is problematic for a number of ways:
  • it's not immediately clear what hat I pulled 7th, 8th and 9th from;
  • it's fragile because it relies on there being precisely seven earlier log entries for it to work. EG: if we were to add another log line into the method, that would break this test, which is rubbish;
  • semantically it's incorrect, as it's not the 7th, 8th and 9th entries I'm interested in; it's the last three. It'd be better to code it accordingly so the code was more readable to maintainers, later on.
  • Lastly the at approach to things in PHPUnit is just a bit crap because the index is based on the mock object, not the mock method, despite almost all use-cases I've had for this sort of thing have been wanting to check sequential calls to the mocked method, not just to any method in the mocked object. So I try to steer clear of at at the best of times.
Other mocking frameworks I've used expose the call-log for a mocked method, but I cannot find any way to get it from PHPUnit. I can see where the data is stored, but to get it out would be deeply "Heath Robinson", and would be even worse than the approach above.

I was stumped.

Then me boss said "just use a callback in the will expectation, and build the log yerself". Initially I just looked at him like he was a loony (he's used to this), but then it dawned on me what he was suggesting. And it was easy, and not altogether awful, either:

function testMyMethodUsingCallLog()
{
    $exceptionMessage = "EXPECT_THIS";
    $this->expectException(\RuntimeException::class);
    $this->expectExceptionMessage($exceptionMessage);

    $testIterations = 6;
    $errorAt = $testIterations / 2;

    $this->helper
        ->expects($this->at($errorAt))
        ->method("doThing")
        ->willThrowException(new \RuntimeException($exceptionMessage));

    $callLog = [];
    $this->logger
        ->method("logMessage")
        ->will($this->returnCallback(function ($message) use (&$callLog) {
            $callLog[] = $message;
        }));

    try {
        $this->sut->myMethod($testIterations);
    }
    catch (\Exception $e) {
        $this->assertSame("All done", array_pop($callLog));
        $this->assertSame("Something went wrong: $exceptionMessage", array_pop($callLog));
        $this->assertNotSame("Finishing off", array_pop($callLog));

        throw $e;
    }
}

The first key bit is capturing the call-log. This creates an array, and with each call to the mocked logging method, we capture what was logged. Note that these expectations don't enforce any rules here. They just capture what's going on.

Because this method call throws an exception, I have to try/catch it so that I can regain control once it's errored, and inspect the call log to see it's what I expect. My approach here is far more semantic: I check the last entry is the "Finishing off" one, the one before that is logging the correct exception, and - this is slightly different now - as a failsafe I check that the one before that is anything other than "All Done". Because if it was that value, it'd mean the logic flow did not branch at the right stage of things. Lastly I throw the exception I caught so PHPUnit's own exception-expectation-checking also validates that too.

One of the best things about this is that this test doesn't care about what goes on in the log prior to the stuff it's testing, it is able to focus on precisely what it cares about. This is much clearer code. An example of this would be to change this line in each test:

$testIterations = 6;

Change it to be any other number, and the first test breaks, whereas the second test keeps working. Win.

That's pretty cool.

I hasten to add I would probably not usually be focusing quite this closely on this sort of sequencing, but in this case we need to know things are working properly.

It's a shame PHPUnit doesn't just expose the call-log like this, as I can't be the first person who's been in this situation. But I guess the tools are there to DIY like I have here, so no harm done.

Righto.

--
Adam

Saturday 29 April 2017

CFML: Lucee 13 v 414 v 6 ColdFusion. We have a "winner"

G'day:
Argh. CFML again. I'm doing this because I was a bit of a meanie to Brad on the CFML Slack Channel, and promised I'd try to play nice with one of our current exercises set out by Ryan:

Todays challenge: post something you can do in CFML that you don’t think the majority of people know. Maybe its a little-known function, or some java integration, or a technique from another language that isn’t common in CFML - or even just something that you’ve learned from this slack group - if you didn’t know it before someone else probably doesn’t know it now. Doesn’t even have to be a good idea or something you’ve ever actually used but bonus points for useful stuff. Can apply to ACF [ed: Adobe ColdFusion, or could just  be ColdFusion or CF. Dunno why ppl say "ACF"] or Lucee but bonus points for something universal.
This is a great idea. And whilst I am trying to not use CFML any more due to it not being a good use of my time, I still try to help other people with it on the Slack channel, and CFML does do some good stuff.

I couldn't think of anything interesting that no-one else would know about, but I figured I'd show people a different approach to something. Just as food for thought.

In CFML, when hitting the DB one receives the data back not as an array of objects or an array of structs, but as a single "Query Object", which internally contains the data, and exposes access to said data by various functions, methods, and statements. This is fine which restricted to CFML code, but increasingly code these days needs to interchange with other systems, and sometimes it's a bit of a pain converting from a Query object to something more interchangeable like... an array of objects (or structs in CFML). There's no native method to do this, but it's easy enough to do with a reduction:

// fake a DB call
numbers = queryNew("id,en,mi", "integer,varchar,varchar", [
    [1,"one","tahi"],
    [2,"two","rua"],
    [3,"three","toru"],
    [4,"four","wha"]
]);

numbersAsArray = numbers.reduce(function(elements, row){
    return elements.append(row);
}, []);

writeDump({
    numbers=numbers,
    numbersAsArray=numbersAsArray
});

(run this yerself on trycf.com)

Here I'm faking the DB hit, but that can be chained together too:

numbersAsArray = queryExecute("SELECT * FROM numbers")
    .reduce(function(rows, row){
        return rows.append(row);
    }, []);

All good, and nice and compact. Not very inspiring though.

So as a next step I decided to add some more method calls in there. Let's say I only wanted numbers greater than 5, I wanted the keys in the structs to be different from the ones in the DB, and I wanted it sorted in reverse order. Obviously this is all easily done in the DB:

numbers = queryExecute("
    SELECT id AS value, en AS english, mi AS maori
    FROM numbers
    WHERE id > 5
    ORDER BY id DESC
");

But sometimes we have to play the hand we've been dealt, and we cannot change the recordset we're getting back.

Of course we could also do this with a query-of-query too, but CFML's internal SQL implementation offers... "challenges" of its own, so let's forget about that.

Anyway, I ended up with this:

numbersAsArray = queryExecute("SELECT * FROM numbers")
    .filter(function(row){
        return row.id > 5;
    })
    .map(function(row){
        return {value=row.id, english=row.en, maori=row.mi};
    }, queryNew("value,english,maori"))
    .reduce(function(rows=[], row){
        return rows.append(row);
    })
    .sort(function(e1,e2){
        return e2.value - e1.value;
    })
;

That's all straight forward. We retain only the rows we want with filter, we change the column names with map, and again convert the result to be an array of structs with reduce, then finally we re-order them with sort.

That's cool. And the result is...

YES

Huh? I mean literally... the value of numbersAsArray was "YES". Groan. For the uninitiated, the string "YES" is a boolean truthy value in CFML (CFML also has true and false, but it favours "YES" and "NO" for some common-sense-defying reason). And indeed some old-school CFML functions which should have been void functions instead return "YES". But the method versions should not: methods - where sensible - should return a result so they can be chained to the next method call. Like I'm trying to do here: the end of the chain should be the answer.

I could see how this would continue, so I decided to start keeping score of the bugs I found whilst undertaking this exercise.

ColdFusion: 1.

I pared my code back to get rid of any irrelevant bits for the sake of a SSCCE, and ran it on Lucee for comparison. I just got an error, but not one I expected.

I have an Application.cfc set up her to define my datasource, and I had this:

component {
    this.name = getCurrentTemplatePath().hash();
    this.datasource = "scratch_mysql";
}

Now I didn't need the DSN for this SSCCE, but the Application.cfc was still running obviously. And it seems Lucee does not implement the hash method:

Lucee 5.1.3.18 Error (expression)
MessageNo matching Method/Function for String.hash() found

Lucee joins the scoring:

ColdFusion 1 - 1 Lucee

I use the hash function instead of the method to name my application, and at least now Lucee gets to the code I want to run.

numbers = queryNew("id,en,mi", "integer,varchar,varchar", [
    [1,"one","tahi"],
    [2,"two","rua"],
    [3,"three","toru"],
    [4,"four","wha"]
]);

reversed = numbers.sort(function(n1,n2){
    return n2.id - n1.id;
});

writeDump([reversed,numbers]);

And the result:

Array
1
Query
Execution Time: 0 ms
Record Count: 4
Cached: No
Lazy: No 
idenmi
14fourwha
23threetoru
32tworua
41onetahi
2
Query
Execution Time: 0 ms
Record Count: 4
Cached: No
Lazy: No 
idenmi
14fourwha
23threetoru
32tworua
41onetahi

Hang on. Lucee's changed the initial query as well. If it's returning the result, then it should not also be changing the initial value. But I'm gonna say this is due to a sort of sideways compatibility with ColdFusion:

array
1YES
2
query
enidmi
1four4wha
2three3toru
3two2rua
4one1tahi

As it doesn't return the value from the method, it makes sense to act on the initial value itself.

But if Lucee's gonna copy ColdFusion (which it should be) then it should be copying it properly.

ColdFusion 1 - 2 Lucee

To mitigate this, I decide to duplicate the initial query first:

reversed = numbers.duplicate();
reversed.sort(function(n1,n2){
    return n2.id - n1.id;
});

This works fine on ColdFusion:

array
1
query
ENIDMI
1four4wha
2three3toru
3two2rua
4one1tahi
2
query
enidmi
1one1tahi
2two2rua
3three3toru
4four4wha

But breaks on Lucee:

 Error:
No matching Method/Function for Query.duplicate() found on line 9

Hmmm. Well I s'pose the duplicate method doesn't seem to be documented, but it was added in CF2016. This is getting in my way, so I'm still chalking it up to an incompat in Lucee:

ColdFusion 1 - 3 Lucee

(I probably should add a documentation bug with ColdFusion too, but that's a separate matter).

Anyway, that's mostly an aside. In my example what I am sorting is an intermediary value anyhow, so it doesn't matter that it gets sorted as well as being returned. For my purposes I am not using ColdFusion any more, just Lucee, as I'm specifically showing the method chaining thing, and we already know ColdFusion messes this up with how sort works.

So here we go, all done:

numbers = queryExecute("SELECT * FROM numbers")
    .map(function(row){
        return {value=row.id, english=row.en, maori=row.mi};
    }, queryNew("value,english,maori"))
    .filter(function(row){
        return row.value > 5;
    })
    .sort(function(e1,e2){
        return e2.value - e1.value;
    })
    .reduce(function(rows, row){
        return rows.append(row);
    }, [])
;

writeDump(numbers);

And the output:

Array
1
Struct
en
Empty:null
english
stringten
id
Empty:null
maori
stringtekau
mi
Empty:null
value
number10
2
Struct
en
Empty:null
english
stringnine
id
Empty:null
maori
stringiwa
mi
Empty:null
value
number9
3
Struct
en
Empty:null
english
stringeight
id
Empty:null
maori
stringwaru
mi
Empty:null
value
number8
4
Struct
en
Empty:null
english
stringseven
id
Empty:null
maori
stringwhitu
mi
Empty:null
value
number7
5
Struct
en
Empty:null
english
stringsix
id
Empty:null
maori
stringono
mi
Empty:null
value
number6

OK, now WTF is going on? Lucee hasn't remapped the columns properly. it's added the new ones, but it's also included the old ones. It ain't supposed to do that. Contrast ColdFusion & Lucee with some more simple code:

numbers = queryNew("id,en,mi", "integer,varchar,varchar", [
    [1,"one","tahi"],
    [2,"two","rua"],
    [3,"three","toru"],
    [4,"four","wha"]
]);
remapTemplate = queryNew("value,english,maori"); 

reMapped = numbers.map(function(row){
    return {value=row.id, english=row.en, maori=row.mi};
}, remapTemplate);

writeDump(reMapped);

ColdFusion:

query
ENGLISHMAORIVALUE
1onetahi1
2tworua2
3threetoru3
4fourwha4

Lucee:

Query
Execution Time: 0 ms
Record Count: 4
Cached: No
Lazy: No 
valueenglishmaoriidenmi
11onetahi
Empty:null
Empty:null
Empty:null
22tworua
Empty:null
Empty:null
Empty:null
33threetoru
Empty:null
Empty:null
Empty:null
44fourwha
Empty:null
Empty:null
Empty:null

Sigh. What's supposed to be returned by a map operation on a query is a new query with only the columns from that remapTemplate query. That's what it's for.

ColdFusion 1 - 4 Lucee

On a whim I decided to check what Lucee did to the remapTemplate:

Query
Execution Time: 0 ms
Record Count: 4
Cached: No
Lazy: No 
valueenglishmaoriidenmi
11onetahi
Empty:null
Empty:null
Empty:null
22tworua
Empty:null
Empty:null
Empty:null
33threetoru
Empty:null
Empty:null
Empty:null
44fourwha
Empty:null
Empty:null
Empty:null

ColdFusion 1 - 5 Lucee

This situation is slightly contrived as I don't care about that query anyhow. But what if I was using an extant query which had important data in it?

remapTemplate = queryNew("value,english,maori", "integer,varchar,varchar", [
    [5, "five", "rima"]
]);

So here I have a query with some data in it, and for whatever reason I want to remap the numbers query to have the same columns as this one. But obviously I don't want it otherwise messed with. Lucee mungs it though:

Query
Execution Time: 0 ms
Record Count: 5
Cached: No
Lazy: No 
valueenglishmaoriidenmi
15fiverima
21onetahi
Empty:null
Empty:null
Empty:null
32tworua
Empty:null
Empty:null
Empty:null
43threetoru
Empty:null
Empty:null
Empty:null
54fourwha
Empty:null
Empty:null
Empty:null

Not cool.

But wait. We're not done yet. Let's go back to some of my original code I was only running on ColdFusion:

numbersAsArray = queryExecute("SELECT id,mi FROM numbers LIMIT 4")
    .reduce(function(rows=[], row){
        return rows.append(row);
    })
;
writeDump(numbersAsArray);

This was part of the first example, I've just ditched the filter, map and sort: focusing on the reduce. On ColdFusion I get what I'd expect:

array
1
struct
ID1
MItahi
2
struct
ID2
MIrua
3
struct
ID3
MItoru
4
struct
ID4
MIwha

On Lucee I get this:

Lucee 5.1.3.18 Error (expression)
Messagecan't call method [append] on object, object is null
StacktraceThe Error Occurred in
queryReduceSimple.cfm: line 4 
2: numbersAsArray = queryExecute("SELECT id,mi FROM numbers LIMIT 4")
3: .reduce(function(rows=[], row){
4: return rows.append(row);
5: })
6: ;

Hmmm. What's wrong now? Oh. It's this:

reduce(function(rows=[], row)

Notice how I am giving a default value to the first argument there. This doesn't work in Lucee.

ColdFusion 1 - 6 Lucee

This is easy to work around, because reduce functions take an optional last argument which is the initial value for that first argument to the callback, so I can just re-adjust the code like this:

.reduce(function(rows , row){
    return rows.append(row);
}, [])

OK, at this point I give up. Neither implementation of CFML here - either ColdFusion's or Lucee's - is good enough to do what I want to do. Oddly: Lucee is far worse on this occasion than ColdFusion is. That's disappointing.

So currently the score is 1-6 to Lucee. How did I get to 4-13?

I decided to write some test cases with TestBox to demonstrate what ought to be happening. And with the case of duplicate, I tested all native data-types I can think of:

  • struct
  • array
  • query
  • string
  • double (I guess "numeric" in CFML)
  • datetime
  • boolean
  • XML

Lucee failed the whole lot, and ColdFusion failed on numerics and booleans. As this is undocumented behaviour this might seem a bit harsh, but I'm not counting documentation errors against ColdFusion in this case. Also there's no way I'd actually expect numerics and booleans to have a duplicate method... except for the fact that strings do. Now this isn't a method bubbling through from java.lang.String, nor is it some Java method of ColdFusion's string implementation (they're just java.lang.Strings). This is an actively-created CFML member function. So it seems to me that - I guess for the sake of completeness - they implemented for "every" data type... I mean it doesn't make a great deal of sense on a datetime either, really, does it? So the omission of it from numerics and booleans is a bug to me.

This leaves the score:

ColdFusion 3 - 13 Lucee

The last ColdFusion point was cos despite the fact that with the sort operation it makes sense to alter the initial object if the method doesn't return the sorted one... it just doesn't make sense that the sort method has been implemented that way. It should leave the original object alone and return a new sorted object.

ColdFusion 4 - 13 Lucee

My test cases are too long to reproduce here, but you can see 'em on Github: Tests.cfc.

Right so...

Ah FFS.

... I was about to say "right, so that's that: not a great experience coming up with something cool to show to the CFMLers about CFML. Cos shit just didn't work. I found 17 bugs instead".

But I just had a thought about how sort methods work in ColdFusion, trying to find examples of where sort methods return the sorted object, rather than doing an inline support. And I I've found more bugs with both ColdFusion and Lucee.

Here are the cases:

component extends="testbox.system.BaseSpec" {
    function run() {
        describe("Other sort tests", function(){
            it("is a baseline showing using BIFs as a callback", function(){
                var testString = "AbCd";
                var applyTo = function(object, operation){
                    return operation(object);
                };

                var result = applyTo(testString, ucase);
                
                expect(result).toBeWithCase("ABCD");
            });
            describe("using arrays", function(){
                it("can use a function expression calling compareNoCase as a string comparator when sorting", function(){
                    var arrayToSort = ["d","C","b","A"];
                    
                    arrayToSort.sort(function(e1,e2){
                        return compareNoCase(e1, e2);
                    });
                    
                    expect(arrayToSort).toBe(["A","b","C","d"]);
                });
                it("can use the compareNoCase BIF as a string comparator when sorting", function(){
                    var arrayToSort = ["d","C","b","A"];
                    
                    arrayToSort.sort(compareNoCase);
                    
                    expect(arrayToSort).toBe(["A","b","C","d"]);
                });
            });
            describe("using lists", function(){
                it("can use a function expression calling compareNoCase as a string comparator when sorting", function(){
                    var listToSort = "d,C,b,A";
                    
                    var sortedList = listToSort.listSort(function(e1,e2){
                        return compareNoCase(e1, e2);
                    });
                    
                    expect(sortedList).toBe("A,b,C,d");
                    expect(listToSort).toBe("d,C,b,A");
                });
                it("can use the compareNoCase BIF as a string comparator when sorting", function(){
                    var listToSort = "d,C,b,A";
                    
                    var sortedList = listToSort.listSort(compareNoCase);
                    
                    expect(sortedList).toBe("A,b,C,d");
                    expect(listToSort).toBe("d,C,b,A");
                });
            });
        });
    }
}

What I'm doing here is using CFML built-in functions as the callbacks for a sort operation. This should work, because the sort operation needs a comparator function which works exactly like compare / compareNoCase: returns <0, 0, >0 depending on whether the first argument is "less than", "equal to" or "greater than" the second object according to the sort rules. As far as strings go, the built-in functions compare and compareNoCase do this. So they should be usable as callbacks. Since I think CF2016 built-in-functions have been first-class functions, so should be usable wherever something expects a function as an argument.

The first test demonstrates this in action. I have a very contrived situation where I have a function applyTo, which takes an object and a function to apply to it. In the test I pass-in the built-in function ucase as the operation. This test passes fine on ColdFusion; fails on Lucee.

ColdFusion 4 - 14 Lucee

So after I've demonstrated the technique should work, I try to use compareNoCase as the comparator for an array sort. it just doesn't work: it does nothing on ColdFusion, and on Lucee it still just errors (not gonna count that against Lucee, as it's the same bug as in the baseline test).

ColdFusion 5 - 14 Lucee

Next I try to use it on a listSort. This time ColdFusion errors as well. So this is a different bug than the doesn't-do-anything one for arrays.

ColdFusion 6 - 14 Lucee

Here are the results for just this latter tranche of test cases:

ColdFusion:



Lucee:



Fuck me, I've giving up.

This has been the most shit experience I've ever had trying to get CFML to do something. I don't think any of this code is edge-case stuff. Those higher-order functions are perhaps not as commonly used as they ought to be by the CFMLers out there, but I'm just... trying to use them.

So... sorry Brad & Ryan... I tried to come up with something worth showing to the mob that's useful in CFML, but I've failed. And my gut reaction to this exercise is that CFML can go fuck itself, basically.

Righto.

--
Adam