Sunday 26 November 2017

PHP: array_map has a dumb implementation

G'day:
This cropped up a few weeks back, but I've had no motivation to write anything recently so I've sat on it for a bit. It came back to bite me again the other day and my motivation seems to be returning so here we go.

I was working on some code the other day wherein I needed to query some stuff from one DB, and then for the records returned from that, look for records in an entirely different data repository for some "additional data". The best I could do was to get the IDs from the first result and do like a WHERE IN (:ids) of the other data source to get records that were the additional data for some/all of the IDs. There was no one-to-one match between the two data sources, so the second set of results might not have records for all IDs from the first one. It was convenient for me to return the second result not as an indexed array, but as an associative array keyed on the ID. This is easy enough in PHP, and is quite handy:

$peopleData = $conn
    ->query('SELECT date, name FROM mapexampledata')
    ->fetchAll(PDO::FETCH_ASSOC | PDO::FETCH_COLUMN | PDO::FETCH_UNIQUE);

var_dump($peopleData);

Result:

[
    "2008-11-08" => "Jacinda",
    "1990-10-27" => "Bill",
    "2014-09-20" => "James",
    "1979-05-24" => "Winston"
]

Note how the array key is the date column from the query, and the value of the array is the other column (or all the subsequent columns, if there were more). I've just indexed by date here to make it more clear that the index key is not just an integer sequence.

From this data, I needed to remap that to be an array of PersonalMilestone objects, rather than an array of raw data. Note that the object has both date and name values:

class PersonalMilestone {
    public $date;
    public $name;

    function __construct($date, $name) {
        $this->date = $date;
        $this->name = $name;
    }
}

(If it's not obvious... this domain model and data is completely unrelated to the work I was actually doing, it's just something simple I concocted for this blog article. The general gist of the issue is preserved though).

"Easy", I thought; "that's just an array_map operation". Oh hohohoho, how funny I am. I forgot I was using PHP.

So I wrote my tests (OK, I did not write the tests for this example for the blog. "IRL" I did though. Always do TDD, remember?), and then went to write the array_map implementation:

$peopleData = ["2008-11-08" => "Jacinda", "1990-10-27" => "Bill", "2014-09-20" => "James", "1979-05-24" => "Winston"];

$people = array_map(function ($name) {
    return new PersonalMilestone(
        $date, // dammit this won't work: array_map only gives me the value, not the key
        $name
    );
}, $peopleData);

F*** it. This just beggars my belief as the other languages I have done the same operation on passes both key and value into the map call back. What does PHP do instead? Well it allows you to pass multiple arrays to the array_map function, and the callback receives the value from each of those arrays. I dunno why they chose to implement that as even additional behaviour, let along the default (and only) behaviour. Sigh.

But I thought I could leverage this! I could pass the keys of my array in as a second array, and then I'll have both the values I need to create the PersonalMilestone object. Hurrah!

$peopleData = ["2008-11-08" => "Jacinda", "1990-10-27" => "Bill", "2014-09-20" => "James", "1979-05-24" => "Winston"];

$keys = array_keys($peopleData);

$people = array_map(function ($name, $date) {
    return new PersonalMilestone($date, $name);
}, $peopleData, $keys);

var_dump($people);

Result:

array(4) {
  [0]=>
  object(PersonalMilestone)#2 (2) {
    ["date"]=>
    string(10) "2008-11-08"
    ["name"]=>
    string(7) "Jacinda"
  }
  [1]=>
  object(PersonalMilestone)#3 (2) {
    ["date"]=>
    string(10) "1990-10-27"
    ["name"]=>
    string(4) "Bill"
  }
  [2]=>
  object(PersonalMilestone)#4 (2) {
    ["date"]=>
    string(10) "2014-09-20"
    ["name"]=>
    string(5) "James"
  }
  [3]=>
  object(PersonalMilestone)#5 (2) {
    ["date"]=>
    string(10) "1979-05-24"
    ["name"]=>
    string(7) "Winston"
  }
}

Looks OK right? Look again. The array has been changed into an indexed array. So this isn't a remapping, this is just "making a different array". F***'s sake, PHP.

Initially I thought this was because the two arrays I was remapping had different keys, and thought "aah... yeah... if I was daft I could see how I might implement the thing like this", so I changed the second array to be keyed by its own value (whilst shaking my head in disbelief).

$peopleData = ["2008-11-08" => "Jacinda", "1990-10-27" => "Bill", "2014-09-20" => "James", "1979-05-24" => "Winston"];
var_dump($peopleData);

$keys = array_keys($peopleData);
$keysIndexedByKeys = array_combine($keys, $keys);
var_dump($keysIndexedByKeys);

$people = array_map(function ($name, $date) {
    return new PersonalMilestone($date, $name);
}, $peopleData, $keys);

var_dump($people);

(and, yes, this is a lot of horsing around now, even had it worked which (spoiler) it didn't).

Result:
array(4) {
  ["2008-11-08"]=>
  string(7) "Jacinda"
  ["1990-10-27"]=>
  string(4) "Bill"
  ["2014-09-20"]=>
  string(5) "James"
  ["1979-05-24"]=>
  string(7) "Winston"
}
array(4) {
  ["2008-11-08"]=>
  string(10) "2008-11-08"
  ["1990-10-27"]=>
  string(10) "1990-10-27"
  ["2014-09-20"]=>
  string(10) "2014-09-20"
  ["1979-05-24"]=>
  string(10) "1979-05-24"
}
array(4) {
  [0]=>
  object(PersonalMilestone)#2 (2) {
    ["date"]=>
    string(10) "2008-11-08"
    ["name"]=>
    string(7) "Jacinda"
  }
  [1]=>
  object(PersonalMilestone)#3 (2) {
    ["date"]=>
    string(10) "1990-10-27"
    ["name"]=>
    string(4) "Bill"
  }
  [2]=>
  object(PersonalMilestone)#4 (2) {
    ["date"]=>
    string(10) "2014-09-20"
    ["name"]=>
    string(5) "James"
  }
  [3]=>
  object(PersonalMilestone)#5 (2) {
    ["date"]=>
    string(10) "1979-05-24"
    ["name"]=>
    string(7) "Winston"
  }
}

So the first array and the second array now have the same keys, but... the remapped array still doesn't. This is where we get to the subject line of this article: array_map has a dumb implementation.

I gave up on array_map. I really don't think it's fit for purpose beyond the most obviously simple applications. I just used a foreach loop instead:

$people = [];
foreach ($peopleData as $date => $name) {
    $people[$date] = new PersonalMilestone($date, $name);
}

var_dump($people);

And this works:

array(4) {
  ["2008-11-08"]=>
  object(PersonalMilestone)#1 (2) {
    ["date"]=>
    string(10) "2008-11-08"
    ["name"]=>
    string(7) "Jacinda"
  }
  ["1990-10-27"]=>
  object(PersonalMilestone)#2 (2) {
    ["date"]=>
    string(10) "1990-10-27"
    ["name"]=>
    string(4) "Bill"
  }
  ["2014-09-20"]=>
  object(PersonalMilestone)#3 (2) {
    ["date"]=>
    string(10) "2014-09-20"
    ["name"]=>
    string(5) "James"
  }
  ["1979-05-24"]=>
  object(PersonalMilestone)#4 (2) {
    ["date"]=>
    string(10) "1979-05-24"
    ["name"]=>
    string(7) "Winston"
  }
}

This is simple enough code, but I like having my code more descriptive if possible, so if I'm iterating over an array because I want to remap its values, then I prefer to use a mapping call, not a generic loop. But we need to be pragmatic about these things.

Oh... for the sake of completeness I did work out how to do it "properly" (where "properly" == "passes my tests" not "is good code"):

$keys = new ArrayIterator(array_keys($peopleData));

$people = array_map(function ($name) use ($keys) {
    $date = $keys->current();
    $keys->next();
    return new PersonalMilestone($date, $name);
}, $peopleData);

(I'll spare you the output... you get the idea by now).

What I've done here is to pass a separate iterator into the callback, and I manually iterate over that each iteration of the callback loop. This gives me the value for the date key for each element of the source array. But that's just shoehorning square peg into a round hole, so I would note that down as a curio, but not actually advocate its use.

I was left with the feeling "PHP, you're f***in' rubbish" (a feeling I get increasingly the more I use it), but I thought just cos I know other languages which do this in a more clear fashion doesn't mean that they're right and PHP is (unhelpfully) wrong. I decided to have a look at how a few other languages tackle the same issue, to see if PHP really does suck in this regard. However that will be the topic for a follow-up article, as this one is already long enough.

Righto.

--
Adam