Showing posts with label PHP. Show all posts
Showing posts with label PHP. Show all posts

Monday, 7 May 2018

In defence of public properties

G'day:
Recently I was re-writing one of our cron job scripts. It's the sort of thing that runs at 2am, reads some stuff from the DB, analyses it, and emails some crap to interested parties. We usually try to avoid complete rewrites, but this code was written... erm... [cough insert some period of time ago when it was apparently OK to write really shit code with no code quality checks and no tests]... and it's unmaintainable, so we're biting the bullet and rewriting coherently whilst adding some business logic changes into it.

The details for the email - to, from, subject, data to build the body from - come from three different sources in total, before sending to the mailer service to be queued-up for sending. So I knocked together a simple class to aggregate that information together:

class SpecialInterestingReportEmail {

    public $from;
    public $to;
    public $subject;
    public $body;

    public static $bodyTemplate = "specialInterestingReportEmail.html.twig";
    
    function __construct($from, $to, $subject, $body) {
        $this->from = $from;
        $this->to = $to;
        $this->subject = $subject;
        $this->body = $body;
    }
}

I then have a factory method in an EmailMessageService (not entirely sure that's the best name for it, but [shrug]) which chucks all the bits and pieces together, and returns an email object:

function createSpecialInterestingReportEmail($special, $interesting) {
    $body = $this-twigService->render(
        SpecialInterestingReportEmail::bodyTemplate,
        [
            "special" => $special,
            "interesting" => $interesting
        ]
    )
    return new SpecialInterestingReportEmail(
        $this->addressesForReport->from,
        $this->addressesForReport->to,
        $this->reportSubject,
        $body
    );
}

I don't think this design is perfect, but in the context of what we're doing it's OK.

Notice:
All interaction I mention below with my colleagues is paraphrased, embellished, or outright changed to make my point. I am not quoting anyone, and take this as a work of fiction. It's also drawn from previous similar conversations I've had on this topic.

One of my code reviewers was horrified:

Code Reviewer: "You can't have public properties! That's bad OOP!"
Me: "Why?"
CR: "You're breaking encapsulation. You need to do it like this [proceeds to tutor me on how to write accessor methods, because, like, I needed a tutorial on design anti-patterns]".
Me: "Why? What are we gaining by making those properties private, just to force us to write a bunch of boilerplate code to then access them? And how is that a good change to make to this code that now exists and is fulfilling the requirement? This is code review... we've already done the planning for this. This class is just a single container to aggregate some data into a conceptual 'email message' object. It doesn't need any behaviour - it doesn't need any methods - it's just for moving data between services".
CR: "But you can't break encapsulation!"
Me: "Why not? Other than parroting dogma read from some OOP 101 book, why is this a problem? And how's it solved by accessor methods?"
CR: "Well someone could directly change the values to be wrong!"
Me: "Well... they probably shouldn't do that then. That'd be dumb. They could equally put the wrong values straight into the constructor too. It'd still be just as wrong. Look... we don't need to validate the data - that seems to be your concern here? - as it's just a script reading known-good values from the DB, and sending them to our email queue. The code's right there. There's no scope for the data to accidentally be wrongified. And if it was... the tests would pick it up anyhow".
CR: "But what if other code starts using this code?"
Me: "What... yer saying 'what it some other application starts using this cronjob as some sort of library?' Why would that happen? This is not a public API. It makes no pretence of being a public API. If anyone started using this as an API, they deserve everything they get".
CR: "But they might".
ME: "OK, let's say some of this code is gonna be needed somewhere else, and we need to make it into a library. At that point in time, we'd extract the relevant code, consider stuff like encapsulation, data hiding, providing a public interface etc. But that would be different code from this lot".
CR: "But you should write all code like it will be used that way".
Me: "Again: why? This code is not going to be used this way. It just isn't. And anyhow, what yer suggesting is YAGNI / coding-for-an-unknown-future anyhow. We don't gain anything for the purposes of the current requirement chucking pointless boilerplate code into these classes. That's not an improvement".

And it went on.

One problem I encounter fairly frequently - both at work and in the wild - is people who will read about some concept, and thenceforth That Concept Is Law. Someone has Said It, therefore it applies to every single situation thereafter. They don't ever seem to bother trying to reason why the "law" was stated in the first place, what about it makes it a good rule, and when perhaps it's not necessary.

I look at these things and think "will this benefit my current task?". Generally it does because these things don't acquire acceptance without scrutiny and sanity-checking. But sometimes, however, it doesn't make sense to follow the dogma.

In this situation: it does not help.

In a different situation, if I was writing a separate API which handled the Email object creation, and other apps were gonna use it, I'd've possibly tightened-up the property access. But only possibly. My position on such things is to be liberal with what one permits to be done with code. If all my theoretical accessor method was gonna achieve was to return a private value... I'd really consider just leaving it public instead, and allow direct access. Why not?

There's a risk that later I might need to better control access to those properties, but... I'd deal with that at the time: these things can be handled as/when. It's even easy enough to have transitionary code from direct-access to accessor-access using __get and __set. I know these are frowned-upon, but in transitionary situations: they're fine. So one could seamlessly patch the API for code already consuming it via direct access with that transitionary approach, and then in the next (or some subsequent ~) version, make the breaking change to require using accessors, eg:

v1.0 - direct access only.
v1.1 - add in transitionary code using __get and __set. Advise that direct access is deprecated and will be removed in the next release. Also add in accessor methods.
v2.0 - remove direct access.

It doesn't even need to be v2.0. Just "some later version". But for the sake of making sure the transitionary code is temporary, better to do sooner rather than later. The thing is that software versioning is there for a reason, so it's OK to only introduce necessary coding overhead when it's needed.

Another thing that occurred to me when I was thinking about this. Consider this code:

$email = [
    "from" => "from@example.com",
    "to" => "to@example.com",
    "subject" => "Example subject",
    "body" => "Example body"
];

$fromAddress = $email["from"];

Perfectly fine. So how come this code is not fine:

$email = new Email(
    "from@example.com",
    "to@example.com",
    "Example subject",
    "Example body"
);

$fromAddress = $email->from;

Why's it OK to access an array directly, but it's not - apparently - OK to give the collection of email bits a name (Email), and otherwise use it the same way?

I can't think of one.

Rules are great. Rules apply sensibly 95% of the time. But when one reads about a rule... don't just stop there... understand the rule. The rules are not there simply for the sake of enabling one to not then think about things. Always think about things.

Righto.

--
Adam

PS: completely happy to be schooled as to how I am completely wrong here. This is half the reason I wrote this.

Saturday, 5 May 2018

PHP: perpetual misuse of array type declarations

G'day:
This has been irking me enough recently to cause me to dust off this old blog.

For a while PHP has had limited argument type checking on function arguments, and this was improved in PHP 7. The docs cover it - Function arguments › Type declarations - so I'll not repeat them here. But we can have this code:

function takesArray(array $a){
    return $a;
}

var_dump(takesArray(["tahi", "rua", "toru", "wha"]));

Output:

array(4) {
  [0]=>
  string(4) "tahi"
  [1]=>
  string(3) "rua"
  [2]=>
  string(4) "toru"
  [3]=>
  string(3) "wha"
}

And if we try to pass something other than an array:

<?php
function takesArray(array $a){
    return $a;
}

try {
    $a = takesArray("NOT_AN_ARRAY");
} catch (\Throwable $t) {
    echo $t;
}

We get this:

TypeError:
Argument 1 passed to takesArray() must be of the type array, string given,
called in [...][...] on line 7 and defined in [...][...]:2
Stack trace:
#0 [...][...](7): takesArray('NOT_AN_ARRAY')
#1 {main}

Fair enough. No complaint there.

What I have an issue with is that now I see a lot of devs slapping "array" as an parameter type in every method signature that takes an array. What's wrong with that you probably ask? Well... in most situations they're doing this, it's not simply an array that is the requirement. It's generally one of these two situations:

function getFirstName(array $name){
    return $name['firstName'];
}

$firstName = getFirstName(["lastName" => "Cameron", "firstName" => "Adam"]);

echo $firstName;

OK, this is a really contrived example, but it's simple and demonstrates the point. Here the function takes an array, and returns the firstName from the array. But note that the requirement of the argument is not "it's an array", as stated. It very specifically needs to be "an associative array that has a key firstName". "array" is not correct here. It's not at all helpful. If one is needing to do this sort of thing, then create a class Name, and specify that. Because that's what you actually want here:

class Name {
    public $lastName;
    public $firstName;
    
    function __construct($lastName, $firstName) {
        $this->lastName = $lastName;
        $this->firstName = $firstName;
    }
}

function getFirstName(Name $name){
    return $name->firstName;
}

The second bogus usage of array as a type declaration is with indexed arrays. Consider this:

function filterOutEarlierDates(array $dates, int $year){
    $cutOff = new \DateTime("$year-01-01");

    $filtered = array_filter($dates, function (\DateTimeInterface $date) use ($cutOff) {
        $diff = $date->diff($cutOff);
        return $diff->invert === 1;
    });
    
    return array_values($filtered);
}

$dobs = [
    new \DateTime("1968-12-20"),
    new \DateTime("1970-02-17"),
    new \DateTime("2011-03-24"),
    new \DateTime("2016-08-17")
];
$cutOff = 2011;

$youngins = filterOutEarlierDates($dobs, $cutOff);

var_dump($youngins);


Here filterOutEarlierDates claims it takes an array. Well it does. Again this lacks the necessary precision for the requirement. It doesn't take an array. It takes - specifically - an array of DateTimeInterface objects. If you give it anything else than that, the code breaks. As type checking is specifically to guard against that, specifying array is simply neither correct or even helpful here.

Specifying array as an argument type check does have a place. When what the function needs actually is an array. And just an array. Any old array.

function array_some(array $array, callable $callback) {
    foreach ($array as $key => $value) {
        $result = $callback($value, $key);
        if ($result) {
            return true;
        }
    }
    return false;
}

$colours = ["whero", "karaka", "kowhai", "kakariki", "kikorangi", "poropango", "papura"];

$containsSixLetterWords = array_some($colours, function ($colour) {
   return strlen($colour) === 6; 
});

var_dump($containsSixLetterWords);

$numbers = [
    "one" => "tahi",
    "two" => "rua",
    "three" => "toru",
    "four" => "wha",
    "five" => "rima",
    "six" => "ono",
    "seven" => "whitu",
    "eight" => "ware",
    "nine" => "iwa",
    "ten" => "tekau"
];

$hasShortKeys = array_some($numbers, function ($number, $key) {
   return strlen($key) <= 3; 
});

var_dump($hasShortKeys);

Here's a PHP equivalent of JS's some method. It takes an array (any array) and a callback, and applies the callback to each array entry until the callback returns true, then it exits. This function correctly claims the first argument value needs to be an array. Without any other sort of qualification: it just needs to be an array.

Actually there's a similar issue with my type checking of the $callback parameter there. It can't be any callable: it needs to be a callable that takes a coupla arguments and returns a boolean. In C# I guess we'd use a Delegate for that, but there's no such construct in PHP. So I should probably settle for giving the argument a clear name ($callback probably isn't good enough here), and dispense with the type check.

But even then I question the merits of type-checking array here. What if instead of just an array, we had a collection object that implements \Iterator, eg:

class ColourCollection implements \Iterator {
    private $position = 0;
    private $colours;  

    public function __construct($colours) {
        $this->position = 0;
        $this->colours = $colours;
    }

    public function rewind() {
        $this->position = 0;
    }

    public function current() {
        return $this->colours[$this->position];
    }

    public function key() {
        return $this->position;
    }

    public function next() {
        ++$this->position;
    }

    public function valid() {
        return isset($this->colours[$this->position]);
    }
}

$rainbow = new ColourCollection($colours);

If we run this variation of the code, we get:

Fatal error:
Uncaught TypeError: Argument 1 passed to array_some() must be of the type array, object given,
 called in [...][...] on line 48 and defined in [...][...]:2
Stack trace:
#0 [...][...](48): array_some(Object(ColourCollection), Object(Closure))
#1 {main}
  thrown in [...][...] on line 2

If we take the type check out: the code works. So how is the array type check really helping us here? It's not.

To be honest I think I'm being a bit edge-case-y with that last example. I don't see a lot of objects that implement the iterator class: we just stick with arrays instead. But it is a legit consideration I guess.

I would use array as a type check in that array_some situation - where the argument actually just needs to be an array, with no special other qualities about it. But this is very rare. In other cases where the argument value needs to be "some specific type or array", then I think it's dead wrong to have an array type check there.

Righto.

--
Adam

Tuesday, 28 November 2017

That array_map quandary implemented in other languages

G'day:
A coupla days ago I bleated about array_map [having] a dumb implementation. I had what I thought was an obvious application for array_map in PHP, but it couldn't really accommodate me due to array_map not exposing the array's keys to the callback, and then messing up the keys in the mapped array if one passes array_map more than one array to process.

I needed to remap this:

[
    "2008-11-08" => "Jacinda",
    "1990-10-27" => "Bill",
    "2014-09-20" => "James",
    "1979-05-24" => "Winston"
]

To this:

array(4) {
  '2008-11-08' =>
  class IndexedPerson#3 (2) {
    public $date =>
    string(10) "2008-11-08"
    public $name =>
    string(7) "Jacinda"
  }
  '1990-10-27' =>
  class IndexedPerson#4 (2) {
    public $date =>
    string(10) "1990-10-27"
    public $name =>
    string(4) "Bill"
  }
  '2014-09-20' =>
  class IndexedPerson#5 (2) {
    public $date =>
    string(10) "2014-09-20"
    public $name =>
    string(5) "James"
  }
  '1979-05-24' =>
  class IndexedPerson#6 (2) {
    public $date =>
    string(10) "1979-05-24"
    public $name =>
    string(7) "Winston"
  }
}

Note how the remapped object also contains the original key value. That was the sticking point. Go read the article for more detail and more whining.

OK so my expectations of PHP's array higher order functions are based  on  my experience with JS's and CFML's equivalents. Both of which receive the key as well as the value in all callbacks. I decided to see how other languages achieve the same end, and I'll pop the codee in here for shits 'n' giggles.


CFML

Given most of my history is as a CFML dev, that one was easy.

peopleData = ["2008-11-08" = "Jacinda", "1990-10-27" = "Bill", "2014-09-20" = "James", "1979-05-24" = "Winston"]

people = peopleData.map((date, name) => new IndexedPerson(date, name))

people.each((date, person) => echo("#date# => #person#<br>"))

Oh, this presupposes the IndexedPerson component. Due to a shortcoming of how CFML works, components must be declared in a file of their own:

component {

    function init(date, name) {
        this.date = date
        this.name = name
    }

    string function _toString() {
        return "{date:#this.date#; name: #this.name#}"
    }
}


But the key bit is the mapping operation:

people = peopleData.map((date, name) => new IndexedPerson(date, name))

Couldn't be simpler (NB: this is Lucee's CFML implementation, not ColdFusion's which does not yet support arrow functions).

The output is:


2008-11-08 => {date:2008-11-08; name: Jacinda}
1990-10-27 => {date:1990-10-27; name: Bill}
2014-09-20 => {date:2014-09-20; name: James}
1979-05-24 => {date:1979-05-24; name: Winston}

Also note that CFML doesn't have associative arrays, it has structs, so the keys are not ordered. This does not matter here. (Thanks to Zac for correcting me here: CFML does have ordered structs these days).


JS

The next language I turned to was JS as that's the I'm next most familiar with. One thing that hadn't occurred to me is that whilst JS's Array implementation has a map method, we need to use an object here as the keys are values not indexes. And whilst I knew Objects didn't have a map method, I didn't know what the equivalent might be.

Well it turns out that there's no real option to use a map here, so I needed to do a reduce on the object's entries, Still: it's pretty terse and obvious:

class IndexedPerson {
    constructor(date, name) {
        this.date = date
        this.name = name
    }
}

let peopleData = {"2008-11-08": "Jacinda", "1990-10-27": "Bill", "2014-09-20": "James", "1979-05-24": "Winston"}

let people = Object.entries(peopleData).reduce(function (people, personData) {
    people.set(personData[0], new IndexedPerson(personData[0], personData[1]))
    return people
}, new Map())

console.log(people)

This returns what we want:

Map {
  '2008-11-08' => IndexedPerson { date: '2008-11-08', name: 'Jacinda' },
  '1990-10-27' => IndexedPerson { date: '1990-10-27', name: 'Bill' },
  '2014-09-20' => IndexedPerson { date: '2014-09-20', name: 'James' },
  '1979-05-24' => IndexedPerson { date: '1979-05-24', name: 'Winston' } }

TBH I think this is a misuse of an object to contain basically an associative array / struct, but so be it. It's the closest analogy to the PHP requirement. I was able to at least return it as a Map, which I think is better. I tried to have the incoming personData as a map, but the Map prototype's equivalent of entries() used above is unhelpful in that it returns an Iterator, and the prototype for Iterator is a bit spartan.

I think it's slightly clumsy I need to access the entries value via array notation instead of some sort of name, but this is minor.

As with all my code, I welcome people showing me how I should actually be doing this. Post a comment. I'm looking at you Ryan Guill ;-)

Java

Next up was Java. Holy fuck what a morass of boilterplate nonsense I needed to perform this simple operation in Java. Deep breath...

import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

class IndexedPerson {
    String date;
    String name;
    
    public IndexedPerson(String date, String name) {
        this.date = date;
        this.name = name;
    }
    
    public String toString(){
        return String.format("{date: %s, name: %s}", this.date, this.name);
    }
}

class Collect {

    public static void main(String[] args) {

        HashMap<String,String> peopleData = loadData();

        HashMap<String, IndexedPerson> people = mapToPeople(peopleData);
            
        dumpIdents(people);
    }
    
    private static HashMap<String,String> loadData(){
        HashMap<String,String> peopleData = new HashMap<String,String>();
        
        peopleData.put("2008-11-08", "Jacinda");
        peopleData.put("1990-10-27", "Bill");
        peopleData.put("2014-09-20", "James");
        peopleData.put("1979-05-24", "Winston");
        
        return peopleData;
    }
    
    private static HashMap<String,IndexedPerson> mapToPeople(HashMap<String,String> peopleData) {
        HashMap<String, IndexedPerson> people = (HashMap<String, IndexedPerson>) peopleData.entrySet().stream()
            .collect(Collectors.toMap(
                e -> e.getKey(),
                e -> new IndexedPerson(e.getKey(), e.getValue())
            ));
            
        return people;
    }
    
    private static void dumpIdents(HashMap<String,IndexedPerson> people) {
        for (Map.Entry<String, IndexedPerson> entry : people.entrySet()) {
            System.out.println(String.format("%s => %s", entry.getKey(), entry.getValue()));
        }
    }
    
}

Result:
1979-05-24 => {date: 1979-05-24, name: Winston}
2014-09-20 => {date: 2014-09-20, name: James}
1990-10-27 => {date: 1990-10-27, name: Bill}
2008-11-08 => {date: 2008-11-08, name: Jacinda}

Most of that lot seems to be just messing around telling Java what types everything are. Bleah.

The interesting bit - my grasp of which is tenuous - is the Collectors.toMap. I have to admit I derived that from reading various Stack Overflow articles. But I got it working, and I know the general approach now, so that's good.

Too much code for such a simple thing though, eh?


Groovy

Groovy is my antidote to Java. Groovy makes this shit easy:

class IndexedPerson {
    String date
    String name

    IndexedPerson(String date, String name) {
        this.date = date;
        this.name = name;
    }

    String toString(){
        String.format("date: %s, name: %s", this.date, this.name)
    }
}

peopleData = ["2008-11-08": "Jacinda", "1990-10-27": "Bill", "2014-09-20": "James", "1979-05-24": "Winston"]

people = peopleData.collectEntries {date, name -> [date, new IndexedPerson(date, name)]}

people.each {date, person -> println String.format("%s => {%s}", date, person)}

Bear in mind that most of that is getting the class defined, and the output. The bit that does the mapping is just the one line in the middle. That's more like it.

Again, I don't know much about Groovy… I had to RTFM to find out how to do the collectEntries bit, but it was easy to find and easy to understand.

I really wish I had a job doing Groovy.

Oh yeah, for the sake of completeness, the output was thus:

2008-11-08 => {date: 2008-11-08, name: Jacinda}
1990-10-27 => {date: 1990-10-27, name: Bill}
2014-09-20 => {date: 2014-09-20, name: James}
1979-05-24 => {date: 1979-05-24, name: Winston}


Ruby

Ruby's version was pretty simple too as it turns out. No surprise there as Ruby's all about higher order functions and applying blocks to collections and stuff like that.

class IndexedPerson

    def initialize(date, name)
        @date = date
        @name = name
    end

    def inspect
        "{date:#{@date}; name: #{@name}}\n"
    end
end

peopleData = {"2008-11-08" => "Jacinda", "1990-10-27" => "Bill", "2014-09-20" => "James", "1979-05-24" => "Winston"}

people = peopleData.merge(peopleData) do |date, name|
    IndexedPerson.new(date, name)
end

puts people

Predictable output:

{"2008-11-08"=>{date:2008-11-08; name: Jacinda}
, "1990-10-27"=>{date:1990-10-27; name: Bill}
, "2014-09-20"=>{date:2014-09-20; name: James}
, "1979-05-24"=>{date:1979-05-24; name: Winston}
}

I wasn't too sure about all that block nonsense when I first started looking at Ruby, but I quite like it now. It's easy to read.


Python

My Python skills don't extend much beyond printing G'day World on the screen, but it was surprisingly easy to google-up how to do this. And I finally got to see what Python folk are on about with this "comprehensions" stuff, which I think is quite cool.

class IndexedPerson:
    def __init__(self, date, name):
        self.date = date
        self.name = name

    def __repr__(self):
        return "{{date: {date}, name: {name}}}".format(date=self.date, name=self.name)

people_data = {"2008-11-08": "Jacinda", "1990-10-27": "Bill", "2014-09-20": "James", "1979-05-24": "Winston"}

people = {date: IndexedPerson(date, name) for (date, name) in people_data.items()}

print("\n".join(['%s => %s' % (date, person) for (date, person) in people.items()]))


And now that I am all about Clean Code, I kinda get the "whitespace as indentation" thing too. It's clear enough if yer code is clean in the first place.

The output of this is identical to the Groovy one.

Only one more then I'll stop.

Clojure

I can only barely do G'day World in Clojure, so this took me a while to work out. I also find the Clojure docs to be pretty impentrable. I'm sure they're great if one already knows what one is doing, but I found them pretty inaccessible from the perspective of a n00b. It's like if the PHP docs were solely the user-added stuff at the bottom of each docs page. Most blog articles I saw about Clojure were pretty much just direct regurgitation of the docs, without much value-add, if I'm to be honest.

(defrecord IndexedPerson [date name])

(def people-data (array-map "2008-11-08" "Jacinda" "1990-10-27" "Bill" "2014-09-20" "James" "1979-05-24" "Winston"))

(def people
  (reduce-kv
    (fn [people date name] (conj people (array-map date (IndexedPerson. date name))))
    (array-map)
    people-data))

(print people)

The other thing with Clojure for me is that the code is so alien-looking to me that I can't work out how to indent stuff to make the code clearer. All the examples I've seen don't seem very clear, and the indentation doesn't help either, I think. I guess with more practise it would come to me.

It seems pretty powerful though, cos there's mot much code there to achieve the desired end-goal.

Output for this one:

{2008-11-08 #user.IndexedPerson{:date 2008-11-08, :name Jacinda},
1990-10-27 #user.IndexedPerson{:date 1990-10-27, :name Bill},
2014-09-20 #user.IndexedPerson{:date 2014-09-20, :name James},
1979-05-24 #user.IndexedPerson{:date 1979-05-24, :name Winston}}


Summary

This was actually a very interesting exercise for me, and I learned stuff about all the languages concerned. Even PHP and CFML.

I twitterised a comment regarding how pleasing I found each solution:


This was before I did the Clojure one, and I'd slot that in afer CFML and before JS, making the list:
  1. Python
  2. Ruby
  3. Groovy
  4. CFML
  5. Clojure
  6. JS
  7. PHP
  8. Java

Python's code looks nice and it was easy to find out what to do. Same with Ruby, just not quite so much. And, really same with Groovy. I could order those three any way. I think Python tips the scales slightly with the comprehensions.

CFML came out suprisingly well in this, as it's a bloody easy exercise to achieve with it.

Clojure's fine, just a pain in the arse to understand what's going on, and the code looks a mess to me. But it does a lot in little space.

JS was disappointing because it wasn't nearly so easy as I expected it to be.

PHP is a mess.

And - fuck me - Java. Jesus.

My occasional reader Barry O'Sullivan volunteered some input the other day:


Hopefully he's still up for this, and I'll add it to the list so we can have a look at that code too.

Like I said before, if you know a better or more interesting way to do this in any of the languages above, or any other languages, make a comment and post a link to a Gist (just don't put the code inline in the comment please; it will not render at all well).

I might have another one of these exercises to do soon with another puzzle a friend of mine had to recently endure in a job-interview-related coding test. We'll see.

Righto.

--
Adam

Sunday, 26 November 2017

PHP: array_map has a dumb implementation

G'day:
This cropped up a few weeks back, but I've had no motivation to write anything recently so I've sat on it for a bit. It came back to bite me again the other day and my motivation seems to be returning so here we go.

I was working on some code the other day wherein I needed to query some stuff from one DB, and then for the records returned from that, look for records in an entirely different data repository for some "additional data". The best I could do was to get the IDs from the first result and do like a WHERE IN (:ids) of the other data source to get records that were the additional data for some/all of the IDs. There was no one-to-one match between the two data sources, so the second set of results might not have records for all IDs from the first one. It was convenient for me to return the second result not as an indexed array, but as an associative array keyed on the ID. This is easy enough in PHP, and is quite handy:

$peopleData = $conn
    ->query('SELECT date, name FROM mapexampledata')
    ->fetchAll(PDO::FETCH_ASSOC | PDO::FETCH_COLUMN | PDO::FETCH_UNIQUE);

var_dump($peopleData);

Result:

[
    "2008-11-08" => "Jacinda",
    "1990-10-27" => "Bill",
    "2014-09-20" => "James",
    "1979-05-24" => "Winston"
]

Note how the array key is the date column from the query, and the value of the array is the other column (or all the subsequent columns, if there were more). I've just indexed by date here to make it more clear that the index key is not just an integer sequence.

From this data, I needed to remap that to be an array of PersonalMilestone objects, rather than an array of raw data. Note that the object has both date and name values:

class PersonalMilestone {
    public $date;
    public $name;

    function __construct($date, $name) {
        $this->date = $date;
        $this->name = $name;
    }
}

(If it's not obvious... this domain model and data is completely unrelated to the work I was actually doing, it's just something simple I concocted for this blog article. The general gist of the issue is preserved though).

"Easy", I thought; "that's just an array_map operation". Oh hohohoho, how funny I am. I forgot I was using PHP.

So I wrote my tests (OK, I did not write the tests for this example for the blog. "IRL" I did though. Always do TDD, remember?), and then went to write the array_map implementation:

$peopleData = ["2008-11-08" => "Jacinda", "1990-10-27" => "Bill", "2014-09-20" => "James", "1979-05-24" => "Winston"];

$people = array_map(function ($name) {
    return new PersonalMilestone(
        $date, // dammit this won't work: array_map only gives me the value, not the key
        $name
    );
}, $peopleData);

F*** it. This just beggars my belief as the other languages I have done the same operation on passes both key and value into the map call back. What does PHP do instead? Well it allows you to pass multiple arrays to the array_map function, and the callback receives the value from each of those arrays. I dunno why they chose to implement that as even additional behaviour, let along the default (and only) behaviour. Sigh.

But I thought I could leverage this! I could pass the keys of my array in as a second array, and then I'll have both the values I need to create the PersonalMilestone object. Hurrah!

$peopleData = ["2008-11-08" => "Jacinda", "1990-10-27" => "Bill", "2014-09-20" => "James", "1979-05-24" => "Winston"];

$keys = array_keys($peopleData);

$people = array_map(function ($name, $date) {
    return new PersonalMilestone($date, $name);
}, $peopleData, $keys);

var_dump($people);

Result:

array(4) {
  [0]=>
  object(PersonalMilestone)#2 (2) {
    ["date"]=>
    string(10) "2008-11-08"
    ["name"]=>
    string(7) "Jacinda"
  }
  [1]=>
  object(PersonalMilestone)#3 (2) {
    ["date"]=>
    string(10) "1990-10-27"
    ["name"]=>
    string(4) "Bill"
  }
  [2]=>
  object(PersonalMilestone)#4 (2) {
    ["date"]=>
    string(10) "2014-09-20"
    ["name"]=>
    string(5) "James"
  }
  [3]=>
  object(PersonalMilestone)#5 (2) {
    ["date"]=>
    string(10) "1979-05-24"
    ["name"]=>
    string(7) "Winston"
  }
}

Looks OK right? Look again. The array has been changed into an indexed array. So this isn't a remapping, this is just "making a different array". F***'s sake, PHP.

Initially I thought this was because the two arrays I was remapping had different keys, and thought "aah... yeah... if I was daft I could see how I might implement the thing like this", so I changed the second array to be keyed by its own value (whilst shaking my head in disbelief).

$peopleData = ["2008-11-08" => "Jacinda", "1990-10-27" => "Bill", "2014-09-20" => "James", "1979-05-24" => "Winston"];
var_dump($peopleData);

$keys = array_keys($peopleData);
$keysIndexedByKeys = array_combine($keys, $keys);
var_dump($keysIndexedByKeys);

$people = array_map(function ($name, $date) {
    return new PersonalMilestone($date, $name);
}, $peopleData, $keys);

var_dump($people);

(and, yes, this is a lot of horsing around now, even had it worked which (spoiler) it didn't).

Result:
array(4) {
  ["2008-11-08"]=>
  string(7) "Jacinda"
  ["1990-10-27"]=>
  string(4) "Bill"
  ["2014-09-20"]=>
  string(5) "James"
  ["1979-05-24"]=>
  string(7) "Winston"
}
array(4) {
  ["2008-11-08"]=>
  string(10) "2008-11-08"
  ["1990-10-27"]=>
  string(10) "1990-10-27"
  ["2014-09-20"]=>
  string(10) "2014-09-20"
  ["1979-05-24"]=>
  string(10) "1979-05-24"
}
array(4) {
  [0]=>
  object(PersonalMilestone)#2 (2) {
    ["date"]=>
    string(10) "2008-11-08"
    ["name"]=>
    string(7) "Jacinda"
  }
  [1]=>
  object(PersonalMilestone)#3 (2) {
    ["date"]=>
    string(10) "1990-10-27"
    ["name"]=>
    string(4) "Bill"
  }
  [2]=>
  object(PersonalMilestone)#4 (2) {
    ["date"]=>
    string(10) "2014-09-20"
    ["name"]=>
    string(5) "James"
  }
  [3]=>
  object(PersonalMilestone)#5 (2) {
    ["date"]=>
    string(10) "1979-05-24"
    ["name"]=>
    string(7) "Winston"
  }
}

So the first array and the second array now have the same keys, but... the remapped array still doesn't. This is where we get to the subject line of this article: array_map has a dumb implementation.

I gave up on array_map. I really don't think it's fit for purpose beyond the most obviously simple applications. I just used a foreach loop instead:

$people = [];
foreach ($peopleData as $date => $name) {
    $people[$date] = new PersonalMilestone($date, $name);
}

var_dump($people);

And this works:

array(4) {
  ["2008-11-08"]=>
  object(PersonalMilestone)#1 (2) {
    ["date"]=>
    string(10) "2008-11-08"
    ["name"]=>
    string(7) "Jacinda"
  }
  ["1990-10-27"]=>
  object(PersonalMilestone)#2 (2) {
    ["date"]=>
    string(10) "1990-10-27"
    ["name"]=>
    string(4) "Bill"
  }
  ["2014-09-20"]=>
  object(PersonalMilestone)#3 (2) {
    ["date"]=>
    string(10) "2014-09-20"
    ["name"]=>
    string(5) "James"
  }
  ["1979-05-24"]=>
  object(PersonalMilestone)#4 (2) {
    ["date"]=>
    string(10) "1979-05-24"
    ["name"]=>
    string(7) "Winston"
  }
}

This is simple enough code, but I like having my code more descriptive if possible, so if I'm iterating over an array because I want to remap its values, then I prefer to use a mapping call, not a generic loop. But we need to be pragmatic about these things.

Oh... for the sake of completeness I did work out how to do it "properly" (where "properly" == "passes my tests" not "is good code"):

$keys = new ArrayIterator(array_keys($peopleData));

$people = array_map(function ($name) use ($keys) {
    $date = $keys->current();
    $keys->next();
    return new PersonalMilestone($date, $name);
}, $peopleData);

(I'll spare you the output... you get the idea by now).

What I've done here is to pass a separate iterator into the callback, and I manually iterate over that each iteration of the callback loop. This gives me the value for the date key for each element of the source array. But that's just shoehorning square peg into a round hole, so I would note that down as a curio, but not actually advocate its use.

I was left with the feeling "PHP, you're f***in' rubbish" (a feeling I get increasingly the more I use it), but I thought just cos I know other languages which do this in a more clear fashion doesn't mean that they're right and PHP is (unhelpfully) wrong. I decided to have a look at how a few other languages tackle the same issue, to see if PHP really does suck in this regard. However that will be the topic for a follow-up article, as this one is already long enough.

Righto.

--
Adam

Sunday, 23 July 2017

PHP: A function that returns two different things

G'day:
I'm gonna plagiarise one of me own answers to a Stack Overflow question here, as I think it's good generic advice.

The question was "PHP 7: Multiple function return types". The subject line sums it up really. The person here wants to return a value or false from a function. I see this a lot in PHP code. Indeed... PHP itself does it a lot.

My answer was as follows:

[...]

From a design perspective, having a function that potentially returns different types of result indicates a potential design flaw:

  • If you're returning your result, or otherwise false if something didn't go to plan; you should probably be throwing an exception instead. If the function processing didn't go according to plan; that's an exceptional situation: leverage the fact. I know PHP itself has a habit of returning false if things didn't work, but that's just indicative of poor design - for the same reason - in PHP.
  • If your function returns potentially different things, then it's quite possible it's doing more than one thing, which is bad design in your function. Functions should do one thing. If your function has an if/else with the true/false block handling different chunks of processing (as opposed to just handling exit situations), then this probably indicates you ought to have two functions, not one. Leave it to the calling code to decide which is the one to use.
  • If your function returns two different object types which then can be used in a similar fashion in the calling code (ie: there's no if this: do that; else do this other thing in the calling code), this could indicate you ought to be returning an interface, not a concrete implementation.
There will possibly be legit situations where returning different types is actually the best thing to do. If so: all good. But that's where the benefit of using a loosely typed language comes in: just don't specify the return type of the function. But… that said… the situation probably isn't legit, so work through the preceding design considerations first to determine if you really do have one of these real edge cases where returning different types is warranted.

I'll add two more considerations here.

There's another option which is kinda legit. In PHP 7.1 one can declare the return type to be nullable.

Consider a function which potentially returns a boolean:

function f($x) {
    if ($x) {
        return true;
    }
}

We can't declare that with a return type of bool as if our logic strays done that falsey branch, we get a fatal error:

function f($x) : bool {
    if ($x) {
        return true;
    }
}

f(1); // OK

f(0); // PHP Fatal error:  Uncaught TypeError: Return value of f() must be of the type boolean, none returned


But in 7.1, one can declare the return type as nullable:

function g($x) : ?bool {
    if ($x) {
        echo "returning true";
        return true;
    }

    echo "returning null";
    return null;
}

g(1); // returning true
g(0); // returning null

Now sometimes this is legit. One might be looking for something from some other resource (eg: a DB record) which quite legitimately might not exist. And there might not be a legit "null implementation" of the given class one is wanting to return: for example returning the object with some/all of its properties not set or something. In this case a null is OK. But remember in this case one is forcing more logic back on the calling code: checking for the null. So not necessarily a good approach. I'd consider whether this situation is better handled with an exception.

The last consideration is that one can still specify multiple return types in a type-hint annotation, eg:

/**
 * @param bool $b
 * @return A|B
 */
function f($b){
    if ($b) {
        return new A();
    }else {
        return new B();
    }
}


This doesn't actually enforce anything, so it's mostly a lie, just like all comments are, but at least one's IDE might be able to make sense of it, and give code assistance and autocomplete suggestions when writing one's code:



See how it offers both the method from an A and a B there.

Still: I really dislike these annotations as they're really just comments, and especially in this case they're lying: f can actually return anything it likes.

Right... must dash. I am supposed to be down @ my ex's place hanging out with my son in 10min. I'll press "send" and proof-read this later ;-)

Righto.

--
Adam

Wednesday, 8 February 2017

PHP: more array dereferencing weirdness

G'day:
(Firstly: I hasten to add that the "weirdness" is "weird to me". It might be completely predictable to everyone else in the PHP world).

This is a follow-on from my earlier article "PHP: accessing an undefined array element yields a notice. Or does it? WTF, PHP?". Now before I start: Kalle Sommer Nielsen has come back to me in a comment on that article and mentions it's expected behaviour, and explained it... but I don't quite follow yet. I'm hoping some ensuing experimentation will cast some light on the scene.

My next variation of this has me scratching my head more. Well: it's demonstrating the same thing, but perhaps more clearly.

<?php

echo PHP_EOL . "Test with array" . PHP_EOL;

$a = [1, 2, 3];
var_dump($a[0]);
var_dump($a[0]['id']);
var_dump($a[0][0]['id']);
var_dump($a[0]['id'][0]);

echo PHP_EOL . "Test with int" . PHP_EOL;
$i = 1;
var_dump($i['id']);
var_dump($i[0]['id']);
var_dump($i['id'][0]);

echo PHP_EOL;

echo PHP_EOL . "Test with string" . PHP_EOL;
$s = '1';
var_dump($s['id']);
var_dump($s[0]['id']);
var_dump($s['id'][0]);

This yields:


C:\temp>php test.php

Test with array
C:\temp\test.php:6:
int(1)
C:\temp\test.php:7:
NULL
C:\temp\test.php:8:
NULL
C:\temp\test.php:9:
NULL

Test with int
C:\temp\test.php:13:
NULL
C:\temp\test.php:14:
NULL
C:\temp\test.php:15:
NULL


Test with string
PHP Warning:  Illegal string offset 'id' in C:\temp\test.php on line 21
PHP Stack trace:
PHP   1. {main}() C:\temp\test.php:0

Warning: Illegal string offset 'id' in C:\temp\test.php on line 21

Call Stack:
    0.0002     356384   1. {main}() C:\temp\test.php:0

C:\temp\test.php:21:
string(1) "1"
PHP Warning:  Illegal string offset 'id' in C:\temp\test.php on line 22
PHP Stack trace:
PHP   1. {main}() C:\temp\test.php:0

Warning: Illegal string offset 'id' in C:\temp\test.php on line 22

Call Stack:
    0.0002     356384   1. {main}() C:\temp\test.php:0

C:\temp\test.php:22:
string(1) "1"
PHP Warning:  Illegal string offset 'id' in C:\temp\test.php on line 23
PHP Stack trace:
PHP   1. {main}() C:\temp\test.php:0

Warning: Illegal string offset 'id' in C:\temp\test.php on line 23

Call Stack:
    0.0002     356384   1. {main}() C:\temp\test.php:0

C:\temp\test.php:23:
string(1) "1"

C:\temp>

OK, so here I factor-out the top-level array from the mix. That test was always referring to the first element of the array, which is just the integer 1, so we might as well just test with that. Needless to say, there's the same results as with the first array element. But I still don't get it. What is using array notation to dereference an integer supposedly doing? It's a non-sensical operation, as far as I understand things. And if it is a nonsensical operation: it should error.

For completeness I also try a variation with a string "1". This does error like I'd expect. Well: it gives a warning and then returns an inappropriate (IMO) result anyhow. This is probably PHP doing it's combo of telling me I did it wrong but then going ahead and guessing what I meant anyhow. Grrr.

I'm gonna mess around with de-referencing integers some more when I get a moment, and try to work out WTF.

Righto.

--
Adam

Monday, 30 January 2017

PHP: accessing an undefined array element yields a notice. Or does it? WTF, PHP?

G'day:
So we encountered this today:

<?php
$array = [1, 2, 3];

var_dump($array['id']); // this yields a warning
var_dump($array[0]['id']); // none of these do
var_dump($array[0][0]['id']);
var_dump($array[0]['id'][0]);

As indicated, this yields results:

C:\temp>php shite.php
PHP Notice:  Undefined index: id in C:\temp\shite.php on line 4
PHP Stack trace:
PHP   1. {main}() C:\temp\shite.php:0

Notice: Undefined index: id in C:\temp\shite.php on line 4

Call Stack:
    0.0002     353472   1. {main}() C:\temp\shite.php:0

C:\temp\shite.php:4:
NULL
C:\temp\shite.php:5:
NULL
C:\temp\shite.php:6:
NULL
C:\temp\shite.php:7:
NULL

C:\temp>

I think this behaviour is bloody daft. Accessing something that doesn't exist should be more than a notice, and it shouldn't then go ahead and return "null" anyhow. And giving a notice and returning null is possibly the worst way of combining possible actions here. What am I supposed to do with that? Other than bury my head in my hands a weep.

Now this is partially documented on the Arrays page of the docs:

Note:
Attempting to access an array key which has not been defined is the same as accessing any other undefined variable: an E_NOTICE-level error message will be issued, and the result will be NULL.
Note:
Array dereferencing a scalar value which is not a string silently yields NULL, i.e. without issuing an error message.

That's shit, but... well: it's not that surprising (I mean... it's PHP, right?). But how come

$array['id']

incurs the warning, but this doesn't:

$array[0]['id']

This just seems to make fairly unhelpful behaviour just that much less helpful.

I'm sure there's a reason for this. Not a good one, I hasten to add, but no doubt there's a reason.

Anyone know what it is?

Update:

Thanks to Dave (comment below) for pointing out there's a bug been raised for this: 68110.


Righto.

--
Adam

Saturday, 14 January 2017

Incredibly: a reader asks me for help with PHP & Nginx

G'day:
Jesus fuck, you lot. Don't ask me shit about systems support / server application config / all that godawful shit that should be consigned to the Systems Support Team (sorry to my mates in this role, but I fucking hate it, and became a dev so I didn't have to do it any more).

Right so ages ago I wrote an article "PHP: getting PHP 5 and PHP 7 running side by side on the same machine (a better way)", and someone recently asked me how do do the same on Nginx. Well it's "slow news night" here in my life: I'm just at the pub in Galway passing time by getting pissed on Guinness (I'm on me fifth pint, and even I can tell the writing here is reflecting that) and writing blog articles, so I had a look at it. It's pretty easy, as it turns out.

Firstly: I am no expert on Nginx. I don't like web servers. I try not to ever have to use them. I know enough about IIS to know it's a pain in the arse but I can do what I need if I have to; and I know enough about Apache to get things working. I don't even know why Nginx exists. I'm guessing the reason is "to annoy Adam, cos it's just one more bloody thing he'll need to know about at some point". Sigh. I got PHP working on Nginx once before ("PHP: getting my dev environment running on Nginx instead of Apache"), and other than that have converted some rewrite rules from Apache to Nginx (man: does Nginx suck compared to Apache for those!).

This is a different laptop from the one I did that other exercise on, so whilst I had Nginx on here (for the rewrite exercise, which was work-related and this is my work laptop), I decided to start from scratch. I deleted what I had installed (where for Nginx "installed" means "unzipped").

Here's what I did:
  1. grabbed the latest Windows Nginx download from their site. I just googled "nginx windows download" to find that.
  2. Unzipped it. Relocated it to my apps dir: c:\apps\nginx
  3. Stopped Apache (which listens on port 80), and instead started Nginx (just run nginx.exe from the dir above). By default it listens on localhost and port 80.
  4. Browse to http://localhost/ and verified the Nginx default index.html page was served. OK, so it works as a baseline. Always test the baseline before proceeding with any customisations of things.
  5. Set-up a coupla test hosts in my hosts file (C:\windows\system32\drivers\etc\hosts... make sure to start yer text editor as an admin, otherwise you won't be able to save it):
    127.0.0.1 php5.nginx.local
    127.0.0.1 php7.nginx.local

    I foresaw that to run both PHP5 and PHP7, Nginx is gonna need to differentiate between which one wants to use, and using different host names seemed to make sense and be easy.
  6. I edited my nginx.conf file (in the conf subdir of the one above) to know about these two hosts:
    http {
        # [...]
    
        server {
            listen       8800;
            server_name  php5.nginx.local;
            root   c:/src/php/php.local/www;
    
            # [...]
        }
    
        server {
            listen       8800;
            server_name  php7.nginx.local;
            root   c:/src/php/php.local/www;
    
            # [...]
        }
    }
    

    Where I've elided stuff, it's the same as it was before. The chief considerations here are:
    • having a server for both PHP5 and PHP7;
    • having them listen on each of those two new hosts I set up;
    • setting the root to be where my PHP code is. This is the same for both in this case, as I want to serve exactly the same code via both 5 and 7;
    • oh and I'm listening on port 8800 as I don't want Nginx to interfere with my normal Apache install (I'll be sticking with Apache after this experiment, thanks).
    Note that this config will still not serve PHP, but it'll at least run.
  7. When one runs Nginx from the console it hogs the prompt, so to stop it one needs to run another console and call nginx -s stop. We need to do this to test the config changes. So I did that, and used the other console to start it again.
  8. I browse to each of http://php5.nginx.local:8800 and http://php7.nginx.local:8800 to test they were working. They were running, but giving a 403 cos I didn't have an index.html in that directory, nor did I have directory browsing switched on (which for my test code I do, as it makes finding stuff easier).
  9. I told Nginx to allow directory browsing for each of the server configs:
    location / {
        index  index.html index.htm;
        autoindex on;
    }
    

    Do not do this in production. Well: don't do anything I say in production.
  10. I cycled Nginx again and tested both hosts:

    Index of /


    ../
    code-coverage-reports/                             15-Nov-2016 13:33                   -
    community/                                         30-Jun-2016 11:47                   -
    experiment/                                        15-Nov-2016 08:47                   -
    library/                                           30-Jun-2016 11:47                   -
    stackoverflow/                                     30-Jun-2016 11:47                   -
    deleteme.php                                       30-Jun-2016 11:47                 233
    gdayWorld.html                                     30-Jun-2016 11:47                  11
    gdayWorld.php                                      30-Jun-2016 11:47                  50
    phpinfo.php                                        30-Jun-2016 11:47                  21
    utf8.html                                          21-Dec-2016 08:21                 133
    


    So that's all good except for me not having deleted that file that perhaps I meant to, a while back ;-)
  11. Next I just followed the instructions from me other blog article, getting PHP to listen out to traffic coming in from Nginx (the code below is in a batch file):
    start C:\bin\RunHiddenConsole.exe C:\apps\php\5\5\php-cgi.exe -b 127.0.0.1:8550 start C:\bin\RunHiddenConsole.exe C:\apps\php\7\1\php-cgi.exe -b 127.0.0.1:8710
    Note that each of them is listening on a different port.
  12. And then tell Nginx to pass PHP requests across to PHP:
    server {
        # [...]
        location ~ \.php$ {
            fastcgi_pass   127.0.0.1:8710;
            fastcgi_index  index.php;
            fastcgi_param  SCRIPT_FILENAME  $document_root$fastcgi_script_name;
            include        fastcgi_params;
        }
        # [...]
    }
    

    That's the one for PHP7, but the PHP5 one is the same except for the port.
  13. I restart Nginx again, and this time hit phpinfo.php on each host:
    {i'd show you proof, but BlogSpot is refusing to let me put two images inline in an <ol> list, it seems. Trust me... it works).
  14. Hurrah!
  15. The last thing I tried to do is to wrap my batch file that starts Nginx into a service, using RunAsService.exe, but this is my work laptop and it's locked-down too tight for me to run that. But... well... the batch file works.
So, anyway... that's it. That's what one needs to do to get Nginx serving two different versions of PHP. My will to live has been sapped by even having to think about this sort of shit, so I'm going back to my Guinness (two pint further ahead than I was when I started this exercise).

Sorry this one is a bit scrappy but... well... I'm drunk.

Righto.

--
Adam

PHP: 2000 words on elseif (and PHPMD)

G'day:
This is another one that just lives up to the definitive notion of a blog: a web log; a log of what I'm doing. But, like... web-based. A blog. What I basically mean is that I'm not sure I'm showing any insight here (look: I basically know I'm not), but it's what I looked at today, so it's what I'm writing about today.

We're trying to lift our game when it comes to our code "Cleanliness" (in the RC Martin "Clean Code" sense), and we've started more rigorously using some code review tools recently, such as PHPCS and PHPMD. They're both pretty good, and I recommend them to anyone using PHP. Almost everyone probably already is: I'm always late to these games.

One PHPMD rule that tends to irk me is the one that declares "[one's code] uses an else expression. Else is never necessary and you can simplify the code to work without else". Now: generally this is true. One can usually refactor one's code to stick it in a helper method and early return in the if logic, then just the rest of the code after the if block is what happens in the else case. This is good because it helps one's code be "closer to the left-edge"; ie: it's not indented because it's not in a logic block any more. The more one needs to indent one's code: the more it probably smells. It's got to the stage with our code now that a double nesting of logic-control statements almost gets the same reaction from code reviewers as when they see a comment in our code; "hey, gather round everyone: Adam's made a comment. Let's look at it!" (Adam shrinks into the carpet and starts getting his excuses ready. Later he refactors the code to be clearer, and the comment goes; the code reviewers are happy). The problem for me is that I'm not clever enough sometimes, and it takes me a coupla takes before I see how to get rid of a wayward else. At least now I usually pick it up and work it out during my "self code review" phase, and sort it out. And PHPMD helps with this.

Today I was reviewing a colleague's code, and noted they had not an else but an elseif. And I triumphantly(*) pinged them with a comment reminding them to run PHPMD before submitting their pull request.

(*) because I'm petty.

It turns out that they had run PHPMD, and it hadn't pulled 'em up. Initially (again: cos I'm petty) I didn't believe them and knocked together my own SSCCE and looked at it.

This demonstrates the problem:

function takeAChance(){
    $firstChance = (bool) rand(0,1);

    if ($firstChance) {
        echo "firstChance was true";
    } else {
        echo "firstChance was false";
    }
}

takeAChance();

And now running PHPMD over it:

C:\src\php\php.local\src\phpmd>phpmd else_block.php text phpmd.xml
C:\src\php\php.local\src\phpmd\else_block.php:8 The method takeAChance uses an else expression.
Else is never necessary and you can simplify the code to work without else.

C:\src\php\php.local\src\phpmd>


OK, fair enough. Oh btw, the solution to this is to early return:

function takeAChance(){
    $firstChance = (bool) rand(0,1);

    if ($firstChance) {
        echo "firstChance was true";
        return;
    }
    echo "firstChance was false";
}

(note: yeah yeah, in the real world I'd not be echoing from a function. I get that. Not the point here)

OK, now here's an SSCCE of the sort of thing we had in this pull request:

function takeAChance(){
    $firstChance = (bool) rand(0,1);
    $secondChance = (bool) rand(0,1);

    if ($firstChance) {
        echo "firstChance was true; don't care what secondChance was";
    }elseif($secondChance){
        echo "well at least secondChance was true";
    }
}

And running PHPMD:

C:\src\php\php.local\src\phpmd>phpmd elseif_block.php text phpmd.xml

C:\src\php\php.local\src\phpmd>

No problems. Grrr. This does not make sense to me. If we add an actual else in there:

function takeAChance(){
    $firstChance = (bool) rand(0,1);
    $secondChance = (bool) rand(0,1);

    if ($firstChance) {
        echo "firstChance was true; don't care what secondChance was";
    }elseif($secondChance){
        echo "well at least secondChance was true";
    }else{
        echo "both were false";
    }
}

And PHPMD says:
C:\src\php\php.local\src\phpmd>phpmd if_elseif_else_block.php text phpmd.xml
C:\src\php\php.local\src\phpmd\if_elseif_else_block.php:11
The method takeAChance uses an else expression.
Else is never necessary and you can simplify the code to work without else.

C:\src\php\php.local\src\phpmd>


And, yeah, it's true... but the elseif seems to me to be as much an issue here as an else. We can short-circuit both easily enough with early returns:

function takeAChance(){
    $firstChance = (bool) rand(0,1);
    $secondChance = (bool) rand(0,1);

    if ($firstChance) {
        echo "firstChance was true; don't care what secondChance was";
        return;
    }
    if($secondChance){
        echo "well at least secondChance was true";
        return;
    }
    echo "both were false";
}

Equally, to me a PHP elseif (note this is an actual command, it's not a concatenation of else if like one might expect), really is, functionally, a "single-statement else", just with the statement itself being an if statement. What I mean is remember that the if / elseif / else statement doesn't need to use blocks (ie: a bunch of statements within braces), it could be a single statement:

function takeAChance() {
    $firstChance = (bool) rand(0,1);

    if ($firstChance)
        echo "firstChance was true";
    else
        echo "firstChance was false";

}

(what an abomination: almost looks like Python. Bleah).

Anyway... so yeah, one doesn't need to use braces if there's only one statement to execute based on the result of the condition. Don't do this. But it's possible.

What I'm getting at is this:

if ($firstChance) {
    echo "firstChance was true; don't care what secondChance was";
}elseif($secondChance){
    echo "well at least secondChance was true";
}

Is basically like this:

if ($firstChance) {
    echo "firstChance was true; don't care what secondChance was";
}else{
    if($secondChance){
        echo "well at least secondChance was true";
    }
}

Just without the braces on the else:

if ($firstChance) {
    echo "firstChance was true; don't care what secondChance was";
}else
    if ($secondChance){
        echo "well at least secondChance was true";
    }

Compare this again to the elseif version:

if ($firstChance) {
    echo "firstChance was true; don't care what secondChance was";
}elseif ($secondChance){
    echo "well at least secondChance was true";
}

The only difference there is the whitespace between the else and the if. And logically there is no difference, as far as I can tell.

I tested all variations I could think of of elseifs, elses, braces, no braces, and the like (see the samples on GitHub), and ran PHPMD over them:

C:\src\php\php.local\src\phpmd\samples>phpmd . text phpmd.xml
C:\src\php\php.local\src\phpmd\samples\else_block.php:8 The method takeAChance uses an else expression. Else is never necessary and you can simplify the code to work without else.
C:\src\php\php.local\src\phpmd\samples\else_if_block.php:9      The method takeAChance uses an else expression. Else is never necessary and you can simplify the code to work without else.
C:\src\php\php.local\src\phpmd\samples\if_elseif_else_block.php:11      The method takeAChance uses an else expression. Else is never necessary and you can simplify the code to work without else.

C:\src\php\php.local\src\phpmd\samples>

Hmmmm... it's not a case of else clauses being a problem it seems. It's a problem with else blocks being a problem.

It's complaining about this:

if ($firstChance) {
    echo "firstChance was true";
} else {
    echo "firstChance was false";
}


But not about this:

if ($firstChance) {
    echo "firstChance was true";
} else 
    echo "firstChance was false";
 


The only difference being the lack of braces on the second one.

Now I guess this makes an element of sense. The "untidiness" of large logic blocks is that they are more of a pain to navigate across when maintaining the code. With an if/else statement one needs to pay attention to where the if is, where the else is, which code is in which clause, and where the whole thing ends. This is not a problem if one keeps the code clean so each block has only 1-2 statements in it, but we've all seen if/elses that go on for miles and miles worth of scrolling down (and, it's worth pointing out: that is shit code if it's written like that. Don't do it). It's exacerbated by further nested logic within that. Fair enough.

But with a non-block else case - which intrinsically can only be one statement - the problem of unclean block sizes can simply not occur. Also fair enough.

But in that case, the warning should not be able else, intrinsically; it should be about the size of the blocks. And the size rule should probably apply to the if block too! If a single-statement else is OK, then a single-statement else block should be OK too.

I think the solution to this is that any else clause should be flagged: be it a single statement, or be it a block. So intrinsically an elseif should be flagged too. I'll raise this with the PHPMD bods.

Whilst on this: they should probably swing the other way and actually flag any instances of single-statement logic clauses: ifs and loops should always have braces. That's a widely accepted truism in curly-brace-language programming. I realise PHPCS does pick up on these (I only realise it just now cos I just tested it), but if PHPMD is busying itself with this sort of thing too; it should do a thorough job. In my view brace-less control logic is mess.


Lastly on this, I'm slightly irked by this, from the PHPMD Clean Code Rules page:

ElseExpression

[...]

An if expression with an else branch is never necessary.

It is terribly pedantic of me, but… it's not an expression. PHP flow control statements are…statements, not expressions. An expression is a specific thing: it is some code that represents a value. In PHP if/else statement does not represent a value. One cannot do this:
$message = if ($firstChance) {
    "firstChance was true";
} else {
    "firstChance was false";
}

If you cannot have some code on the right-hand side of an assignment (or anywhere else a value might be expected), then it's not an expression. A counterpoint to this, one can do this in Ruby:

message = if firstChance
    "firstChance was true";
else
    "firstChance was false";
end

puts message

This is because "everything is an object" in Ruby, I guess: even statements. In this case it seems the value of the if/else statement is that of the last statement in the relevant clause that gets executed by the logic. I'm buggered if I know whether Ruby best practices would encourage that (and by that I mean: I really actually don't know!), but I dunno that I'd be doing that sort of thing. Still: it demonstrates an if statement that is also an expression. Which a PHP one is not.

This sort of thing wouldn't bug me if it was in a casual conversation, but in an official implementation, I think it's a bit… ah… well let's saynit could just be done properly. But then again I guess this is PHP. And "doing it right" doesn't seem to be something PHP concerns itself with too much. It seems to be more "just get it done".

All this aside, I think PHPMD is a bloody handy tool, and whether or not I agree with the minutiae of some of its analysis or its turn of phrase is minor. It's helped my code a lot in the last coupla weeks. And I think our code in general is better of for using tools like this.

So… yeah. That was what I was doing yesterday (it was Fri when I started, it's now Sat evening and I'm in the pub). not scintillating stuff, but the examination of what was going on and the general experimentation was useful (to me, anyhow), I think. And it's enabled me to write a 2000-word blog article which is basically about elseif expressionsclauses.

Righto.

--
Adam

Saturday, 7 January 2017

PHP: looking into usage of the ::class constant

G'day:
There's not much to this article. It's arisen from the fact there's a paucity of docs on this subject on the PHP website, and they're quite hard to google for (given Google still insists on ignoring punctuation in searches, for some daftly ignorant reason. So searching for "php ::class" just gets one results for "php class". Not the same thing. Similarly seaching for "php ::class constant" doesn't help.

For the record, the two relevant docs pages are:


Update:

Kalle Sommer Nielsen has been in touch via Twitter and has pointed me to the original RFC for ::class, too: Request for Comments: Class Name Resolution As Scalar Via "class" Keyword. Now to be honest, the content of that RFC would make for better documentation than the current docs furnish. Obviously it'd need a slight tweak to reword some of the questions as statements, and if there were any changes between RFC and what was adopted, those should be included too. Kalle also said he's tweaked the docs slightly too, which is cool. Nice work fella!

I include those there as much for me to be able to easily re-find them as anything else.

In truth there's not much to this ::class construct. I just wish the docs were easier to locate. When discussing the feature, it's nice to be able to point to some docs. Or now: a blog article ;-)

So what does it do?

All classes have a built-in constant (::class) that contains a string that is the class's name. Same as what the get_class function will return for an object, but it works on the actual class too (because it's a class constant). Here are some examples:

<?php

namespace me\adamcameron\cc;

use com\example\other\SomeClassToAlias as AliasedClass;
use com\example\other\SomeClassToAlias;
use com\example\other\SomeOtherClass;

require_once realpath(__DIR__ . '/../vendor/autoload.php');

echo "Using ::class" . PHP_EOL;
printf("SomeClass: %s%s", SomeClass::class, PHP_EOL);
printf("SomeOtherClass: %s%s", SomeOtherClass::class, PHP_EOL);
printf("SomeClassToAlias as AliasedClass: %s%s", AliasedClass::class, PHP_EOL);
printf("SomeClassToAlias: %s%s", SomeClassToAlias::class, PHP_EOL);
printf("PHPUnit_Framework_Exception: %s%s", \PHPUnit_Framework_Exception::class, PHP_EOL);

$someClass = new SomeClass();
$someOtherClass = new SomeOtherClass();
$someAliasedClass = new AliasedClass();
$someClassToAlias = new SomeClassToAlias();
$phpunitFrameworkException = new \PHPUnit_Framework_Exception();

echo PHP_EOL . "Using get_class() on instance" . PHP_EOL;
printf("SomeClass: %s%s", get_class($someClass), PHP_EOL);
printf("SomeOtherClass: %s%s", get_class($someOtherClass), PHP_EOL);
printf("AliasedClass: %s%s", get_class($someAliasedClass), PHP_EOL);
printf("SomeClassToAlias: %s%s", get_class($someClassToAlias), PHP_EOL);
printf("PHPUnit_Framework_Exception: %s%s", get_class($phpunitFrameworkException), PHP_EOL);

In the example all the classes in the me\adamcameron or com\example namespaces are just empty classes, eg:

<?php

namespace me\adamcameron\cc;

class SomeClass {}

I've also included a PHPUnit class there as PHPUnit is a lib I'm aware of that doesn't use namespaces (dunno why), and wanted to see how that behaved, in case there were quirks: seemingly not.

Here's the output:

C:\src\php\php.local\src\oo\classConstant\src>php testWithNamespace.php
Using ::class
SomeClass: me\adamcameron\cc\SomeClass
SomeOtherClass: com\example\other\SomeOtherClass
SomeClassToAlias as AliasedClass: com\example\other\SomeClassToAlias
SomeClassToAlias: com\example\other\SomeClassToAlias
PHPUnit_Framework_Exception: PHPUnit_Framework_Exception

Using get_class() on instance
SomeClass: me\adamcameron\cc\SomeClass
SomeOtherClass: com\example\other\SomeOtherClass
AliasedClass: com\example\other\SomeClassToAlias
SomeClassToAlias: com\example\other\SomeClassToAlias
PHPUnit_Framework_Exception: PHPUnit_Framework_Exception

C:\src\php\php.local\src\oo\classConstant\src>

So you see there are no surprises really: ::class just returns the fully-qualified path of the class concerned. One thing to not is that it does not pay any attention to aliasing, whether on the class itself or on an object of that aliased class. I guess an alias is just for clarity in source code, rather than any sort of "renaming" exercise.

Where does one use ::class constants?

Well my primary usage is when mocking, using PHPUnit. Instead of this:

$mockedThing = $this->getMockBuilder('path\to\class\being\mocked\MockThisClass')
    ->disableOriginalConstructor()
    ->setMethod(['someMethod'])
    ->getMock();

We just have this:

$mockedThing = $this->getMockBuilder(MockThisClass::class)
    ->disableOriginalConstructor()
    ->setMethod(['someMethod'])
    ->getMock();

(and PHPStorm even includes the use statement for me, automatically):

use path\to\class\being\mocked\MockThisClass;

It's good to keep all the class pathing references together at the top of the file.

One flaw in this constant is this behaviour:
<?php

require_once realpath(__DIR__ . '/../vendor/autoload.php');

echo "Using ::class" . PHP_EOL;
printf("SomeNonExistentClass: %s%s", SomeNonExistentClass::class, PHP_EOL);

Any sensible person would probably want an error to be thrown there. But... no. PHP does this:

C:\src\php\php.local\src\oo\classConstant\src>php testWithNonExistentClass.php
Using ::class
SomeNonExistentClass: SomeNonExistentClass

C:\src\php\php.local\src\oo\classConstant\src>

Groan. Why does PHP insist on being "helpful" like this. The class doesn't exist! Just say that. FFS.

Oh well... that sort of thing is almost to be expected of PHP, I guess. Sigh. Well this ::class constant mostly a well-implemented, if not glamourous, small feature in PHP. And now I seem to have documented ::class more than PHP itself has. Heh.

Speaking of documentation, I can't help but think this ::class constant should also be mentioned on a coupla other pages in the PHP docs:

It'd be a good fit for both of those pages, plus that's where Google lands you if you search for it.

That's it. I have to dash to go visit my son for a few hours. Thanks to my colleague Carlos for pointing this whole thing out to me, btw. Both the ::class constant construct itself, but also the observation to be made about aliases. Nice one, fella.

Righto.

--
Adam

PS: apologies for the blatant SEO keyword stuffing of ::class constant in this article ("oops... I did it again"). It was by design.

Wednesday, 4 January 2017

PHP: looking at the Chain of Responsibility pattern

G'day:
We're working on a proof of concept for some stuff at work at the moment, and I've been put on a code review of the first round of the code. Part of the code is dealing with conditionally getting some data from cache, or if it ain't there, getting it from the DB instead: a fairly common trope. Normally I'd fall back on the decorator pattern for this, indeed I've written about it in the past ("Using a decorator pattern to reduce inappropriate code complexity"). I didn't see this in play in this first tranche of code, so suggested its use. The authoring dev suggested they felt that the decorator pattern was not a good fit for this, so they were using the Chain of Responsibility pattern instead. Another of our devs - when prodded - tended to agree.

I gave this some thought, read-up on the Chain of Responsibility pattern (and some more on the Decorator Pattern), and wasn't quite seeing it.

Just to back up a bit... if yer like me and kinda knew about the Chain of Responsibility pattern, but wasn't intimately familiar with it: read that Wikipedia link above, and do some googling of yer own. But in short it's characterised by having a task that needs doing, and having a sequence (a chain) of agents that possibly can (or possibly cannot) resolve all or part of the task. The process is that the task is passed to the chain of agents, and each in turn decides for itself whether it can fulfil the task in whole or in part. If an agent can resolve the whole task it does so: returning the result. If a given agent cannot fulfil the whole task (either it cannot fulfil it at all, or can only do part of it), it does its work, then passes responsibility on to the next task in the chain. Each agent knows two things: how to do its part of the job, and that it can pass work on to the "next" agent. It does not (and should not!) know what the next agent does, nor should it rely on what the next agent does. It simply does its job and returns it or passes on incomplete work to the next agent in the chain. I can't be arsed with a diagram: go google one.

Now: for two thirds of our challenge, I think the Chain of Responsibility pattern is a good fit: for the "Is it in the cache? Yes [/No. Is it in the DB? Yes/No]" bit. No qualms there. However it sticks in my craw once we get to the step after getting from the DB: putting it in the cache for next time. That step does not belong in the chain as it's not performing the same sort of task as the other two: getting the data. Second to that it's hitting the cache before and after the DB step, which makes it more seem like a decoration to me (of sorts... I might get back to the "of sorts..." bit later). Even if one decides "get from cache" is one agent's job, and "get from DB, oh and put it in the cache" is another agent's job, that latter agent is doing a mishmash of "unrelated" work, and seems to be a bit tightly-coupled to me, as well as having a wodge of code (that's a technical term) doing two things, rather than two wodges of code doing one thing each.

That said, I don't like being so emphatic about "that's wrong" without being happy about my understanding of things, so figured I should try to make it work in a way that sits well with me in a proof of concept. That and it seemed like good material for a blog article.

In my proof of concept I am fetching some Person data, for example:

$person = $personService->getById(1);

The PersonService defers to a "RetrievalHandler" for this:

class PersonService {

    private $retrievalHandler;

    public function __construct(PersonRetrievalHandler $retrievalHandler) {
        $this->retrievalHandler = $retrievalHandler;
    }

    public function getById($id) : Person {
        $record = $this->retrievalHandler->getById($id);

        if (is_array($record)){
            $person = new Person($record['firstName'], $record['lastName']);

            return $person;
        }
        return new Person();
    }
}


This just finds and returns a Person object, implement thus:

class Person {

    public $firstName;
    public $lastName;

    public function __construct($firstName=null, $lastName=null) {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }
}

As noted, the service uses a the RetrievalHandler to retreive stuff. A handler is implementation of this:

abstract class PersonRetrievalHandler {

    protected $nextHandler;
    protected $logger;

    public function __construct() {
        $this->nextHandler = new class {
            function getById(){
                return null;
            }
        };
    }

    public abstract function getById(int $id) : array;

    public function setNextHandler(PersonRetrievalHandler $nextHandler) {
        $this->nextHandler = $nextHandler;
    }
}

All it's required to do here is know how to use getById, and also it can setNextHandler. I'll come back to the code in the constructor later. Don't worry about that for now.

I've got two implementations of this. One for getting the thing from a caching service:

class CachedPersonHandler extends PersonRetrievalHandler {

    private $cacheService;

    public function __construct(CacheService $cacheService){
        parent::__construct();
        $this->cacheService = $cacheService;
    }

    public function getById(int $id) : array {
        if ($this->cacheService->exists($id)) {
            return $cachedPerson = $this->cacheService->get($id);
        }

        return $this->nextHandler->getById($id);
    }
}


And one for getting it from a database repository:

class DatabasePersonHandler extends PersonRetrievalHandler {

    private $repository;

    public function __construct(PersonRepository $repository) {
        parent::__construct();
        $this->repository = $repository;
    }

    public function getById($id) : array {
        $record = $this->repository->getById($id);
        if (count($record)) {
            return $record;
        }

        return $this->nextHandler->getById($id);
    }
}


Note this is very woolly and not production-sound code. It's just to demonstrate the concept.

Important to note in both these situations is the agents don't themselves do any of the work; they just try to get the work done by something else, and return it if done, or pass the task to the next agent if the current agent itself couldn't fulfil the requirement.

I've created a stub caching service and repository:

class CacheService {

    private $cache = [];
    private $logger;

    public function __construct(LoggingService $logger)
    {
        $this->logger = $logger;
    }

    public function get($key) {
        $this->logger->info("Cache hit for $key");
        return $this->cache[$key];
    }

    public function exists(string $key) : bool {
        $exists = array_key_exists($key, $this->cache);

        $this->logger->info(sprintf("Cache %s for %d", $exists ? 'hit' : 'miss', $key));

        return $exists;
    }

    public function put(string $key, $value) {
        $this->cache[$key] = $value;
    }
}


class PersonRepository {

    private $data = [
        [],
        ['firstName' => 'Tui', 'lastName' => 'Tahi'],
        ['firstName' => 'Rewi', 'lastName' => 'Rua'],
        ['firstName' => 'Tama', 'lastName' => 'Toru'],
        ['firstName' => 'Whina', 'lastName' => 'Wha']
    ];
    private $logger;

    public function __construct(LoggingService $logger)
    {
        $this->logger = $logger;
    }

    public function getById($id) {
        if (array_key_exists($id, $this->data)){
            $this->logger->info("Database hit for $id");
            return $this->data[$id];
        }
        $this->logger->info("Database miss for $id");
        return [];
    }
}


These also log activity in a very naïve way:

class LoggingService {
    public function info(string $message) {
        echo $message . PHP_EOL;
    }
}


All of this is stitched together in a factory, as there's a lot of moving parts:

class PersonServiceFactory {

    public static function getPersonService()
    {
        $loggingService = new LoggingService();
        $cacheService = new CacheService($loggingService);
        $personRepository = new PersonRepository($loggingService);

        $cachedHandler = new CachedPersonHandler($cacheService);
        $databaseHandler = new DatabasePersonHandler($personRepository);

        $cachedHandler->setNextHandler($databaseHandler);

        $personService = new PersonService($cachedHandler);

        return $personService;
    }

}


Note how the only knowledge each agent has of another is what a given agent's next handler might be. Speaking of next handlers. let's go back to that code in the PersonRetrievalHandler's constructor:

public function __construct() {
    $this->nextHandler = new class {
        function getById(){
            return null;
        }
    };
}


You'll remember each getById implementation will either fulfil the task and return the result, or it'll defer to the next agent in the chain. This is just a mechanism to make this "work" even if the last agent in the chain calls return $this->nextHandler->getById($id). The base class will take the call and just return null. I was semi-pleased to be able to use a class literal here (or anonymous class as PHP wants to call them). So this means I do not need to set the next handler on the $databaseHandler: it'll just use the inbuilt one if it cannot fulfil the requirement itself.

Right so we can now run this to see what happens. Remember the expectation is that the requested data will be fetched from the cache as a priority, but if not found: get it from the DB. Note that I am not doing anything here about putting the data into the cache if it wasn't there in the first place. I'll get to that. Anyway, here's some tests:

$personService = PersonServiceFactory::getPersonService();
$person = $personService->getById(1);
var_dump($person);
$person = $personService->getById(1);
var_dump($person);
$person = $personService->getById(4);
var_dump($person);
$person = $personService->getById(5);
var_dump($person);

Here we're:
  • getting the same person twice to start with: ostensibly this'd demonstrate a cache-miss then a cache-hit;
  • getting a different person from the first (back to a cache miss);
  • getting a person who is neither in the cache nor the DB.

Results:

C:\src\php\php.local\src\chainOfResponsibility\public>php manualTest.php
Cache miss for 1
Database hit for 1
object(me\adamcameron\cor\model\Person)#9 (2) {
  ["firstName"]=>
  string(3) "Tui"
  ["lastName"]=>
  string(4) "Tahi"
}
Cache miss for 1
Database hit for 1
object(me\adamcameron\cor\model\Person)#10 (2) {
  ["firstName"]=>
  string(3) "Tui"
  ["lastName"]=>
  string(4) "Tahi"
}
Cache miss for 4
Database hit for 4
object(me\adamcameron\cor\model\Person)#9 (2) {
  ["firstName"]=>
  string(5) "Whina"
  ["lastName"]=>
  string(3) "Wha"
}
Cache miss for 5
Database miss for 5
object(me\adamcameron\cor\model\Person)#10 (2) {
  ["firstName"]=>  NULL
  ["lastName"]=>  NULL
}

C:\src\php\php.local\src\chainOfResponsibility\public>

That's all good:
  • the cache never finds anything (although it does look),
  • so passes the job on to the DB. The DB will return the data if found,
  • otherwise it'll fall back to the default agent which just returns null.


So far: so good. But why did I omit the "putting it into the cache" step. Well... using this model, it simply won't work. The rule is that if an agent can resolve the work, then it returns it. Otherwise it defers to the next agent. I figured I could implement a "pass-thru" agent which just caches the result found by the DB agent: job done. But if the DB agent finds the record - read: "it can fulfil the task" - then it doesn't pass the task on to the next agent. It considers the task fulfilled, so just returns its result. So the only time the "put it in cache" agent will be called is if the DB agent doesn't find anything. Precisely the wrong time to cache something. That aside, the agent only passes on the inputs (the $id) to the next agent, so there's no mechanism here for an agent to receive the result from the previous agent even if the result was there to pass.

What we'd need to do here is just use the chain of responsibility to get the data, and perhaps leave it to the PersonService to cache the result it receives. However this is a bit unbalanced in my view, as caching considerations of the same data are being handled at not only two different places in the code, but two different "levels" of the task at hand. That's a bit clumsy in my book.

Next I thought I could adjust the DB handler to unconditionally pass its result on to the next handler, rather than break out of the chain once done. But that's a bit crap as it makes the DB handler behave differently than the other handlers in the chain, which also implies it knows more about what's going on than it should: ie that there's a reason to always pass the result on, rather than return it if it's done the work. Not cool.

Update:

My colleague (a different one) suggested a third open: roll the cache-put into the CachedPersonHandler:

class CachedPersonHandler extends PersonRetrievalHandler {

    private $cacheService;

    public function __construct(CacheService $cacheService){
        parent::__construct();
        $this->cacheService = $cacheService;
    }

    public function getById(int $id) : array {
        if ($this->cacheService->exists($id)) {
            return $cachedPerson = $this->cacheService->get($id);
        }

        $person = $this->nextHandler->getById($id);
        $this->cacheService->put($id, $person);
        return $person;
    }
}

Well: indeed. Now it's basically a decorator: you've decorated a DB call with some caching. Contrast it with this from my earlier article on the Decorator Pattern:

class CachedUserRepository implements UserRepositoryInterface {

    private $repository;
    private $cacheService;

    public function __construct(UserRepositoryInterface $repository, CacheServiceInterface $cacheService) {
        $this->repository = $repository;
        $this->cacheService = $cacheService;
    }

    public function getById($id) {
        if ($this->cacheService->isCached($id)) {
            return $this->cacheService->get($id);
        }
        $object = $this->repository->getById($id);

        $this->cacheService->put($id, $object);
        return $object;
    }

}

You've arrived at the same conclusion I started with ;-)

I did try one other tactic. Instead of passing just the inputs to the agents, I had a look at passing the inputs and the outputs through the chain, and also always passing on to the next agent, irrespective of whether a given handler arrived at the result. This was very proof-of-concept-y. It's possibly a legit approach if the Chain of Responsibility was one in that a given agent was only supposed to fulfil part of the solution, relying on all links in the chain to arrive at the complete result. This is not what I'm doing here, but decided to leverage this approach to get the caching working "properly".


Intermission

Go have a pee or get some more popcorn or something.


Now I have an interface for a Handler:

interface Handler {

    public function perform($request, $response);

}


This just reflects a more generic solution than getById. More importantly note that unlike getById, perform takes a request and a response. For our examples the "request" would be the ID. And the response would be any result a given agent was able to come up with.

This is implemented in an abstract way by PersonRetrievalHandler:

abstract class PersonRetrievalHandler implements Handler {

    protected $nextHandler;
    protected $logger;

    public function __construct() {
        $this->nextHandler = $this->getPassThroughHandler();
    }

    public function setNextHandler(PersonRetrievalHandler $nextHandler) {
        $this->nextHandler = $nextHandler;
    }

    private function getPassThroughHandler(){
        return new class  {
            function perform($_, $response){
                return $response;
            }
        };
    }
}

It's still down to the individual handlers to implement the perform method though.

I've also refactored that class literal out into a helper method; and its perform method now returns whatever response it was passed, rather than returning null.

The perform method in CachedPersonHandler is thus:

public function perform($request, $response=null) {
    $id = $request;
    if ($this->cacheService->exists($id)) {
        $response = $this->cacheService->get($id);
    }

    return $this->nextHandler->perform($request, $response);
}


Note that it doesn't return the result itself, it just passes it on to the next agent. Otherwise it's doing the same thing.

And the DatabasePersonHandler has also been slightly refactored:

public function perform($request, $response=null) {
    if (is_null($response)){
        $id = $request;
        $response = $this->repository->getById($id);
    }

    return $this->nextHandler->perform($request, $response);
}


It infers that the cache didn't find anything by it not receiving anything in the response, and only if so does it hit the DB. Otherwise it just passes everything through to the next agent.

Now we have a third handler, CacheablePersonHandler:

class CacheablePersonHandler extends PersonRetrievalHandler {

    private $cacheService;

    public function __construct(CacheService $cacheService){
        parent::__construct();
        $this->cacheService = $cacheService;
    }

    public function perform($request, $response=null) {
        if (!is_null($response)) {
            $id = $request;
            $this->cacheService->put($id, $response);
        }

        return $this->nextHandler->perform($request, $response);
    }
}


This one makes no pretence about finding data, it just takes any received response and caches it, keyed on the request. And passes the response on. At this point remember the default handler will just return whatever response it receives.

We wire all this together in the factory again:

public static function getPersonService()
{
    $loggingService = new LoggingService();
    $cacheService = new CacheService($loggingService);
    $personRepository = new PersonRepository($loggingService);

    $cachedHandler = new CachedPersonHandler($cacheService);
    $databaseHandler = new DatabasePersonHandler($personRepository);
    $cacheableHandler = new CacheablePersonHandler($cacheService);

    $databaseHandler->setNextHandler($cacheableHandler);
    $cachedHandler->setNextHandler($databaseHandler);

    $personService = new PersonService($cachedHandler);

    return $personService;
}


This differs only in that it also creates the CacheablePersonHandler too, and stick it at the end of the chain.

Here's our revised test:

$personService = PersonServiceFactory::getPersonService();

foreach ([1,1,4,5,5] as $id){
    $person = $personService->getById($id);
    var_dump($person);
}


(it's the same, just without the tautological repeated duplication of the same stuff again).

And the result:

C:\src\php\php.local\src\chainOfResponsibility\public>php manualTest.php
Cache miss for 1
Database hit for 1
Cache put for 1
object(me\adamcameron\cor\model\Person)#8 (2) {
  ["firstName"]=>
  string(3) "Tui"
  ["lastName"]=>
  string(4) "Tahi"
}
Cache hit for 1
Cache put for 1
object(me\adamcameron\cor\model\Person)#11 (2) {
  ["firstName"]=>
  string(3) "Tui"
  ["lastName"]=>
  string(4) "Tahi"
}
Cache miss for 4
Database hit for 4
Cache put for 4
object(me\adamcameron\cor\model\Person)#8 (2) {
  ["firstName"]=>
  string(5) "Whina"
  ["lastName"]=>
  string(3) "Wha"
}
Cache miss for 5
Database miss for 5
object(me\adamcameron\cor\model\Person)#11 (2) {
  ["firstName"]=>
  NULL
  ["lastName"]=>
  NULL
}
Cache miss for 5
Database miss for 5
object(me\adamcameron\cor\model\Person)#8 (2) {
  ["firstName"]=>
  NULL
  ["lastName"]=>
  NULL
}

C:\src\php\php.local\src\chainOfResponsibility\public>

So it "works", in that DB hits are now going into the cache.

But it also re-caches cache hits too :-(

Sigh. I could work around this by doing an exists call in the latter handler too:

public function perform($request, $response=null) {
    if (!is_null($response)) {
        $id = $request;
        if (!$this->cacheService->exists($id)) {
            $this->cacheService->put($id, $response);
        }
    }

    return $this->nextHandler->perform($request, $response);
}


But it all seems a bit clumsy to me. And when something seems clumsy to me, I start wondering if this square peg is supposed to be going in this round hole. And my conclusion is generally: "no".

The problem here goes back to the "cache the data" part of this chain doesn't belong in the chain. It's not a congruent task compared to the data-fetch tasks. The "responsibility" in the "Chain of Responsibility" is that the responsibility is "fulfilling the task". The caching agent makes no attempt to fulfil the requirement of the chain, so it doesn't belong in the chain to me.

So where does it belong? Well it belongs as near as possible to when we know the database was hit, and we have the result. So we could put it in the database agent. But then we have this imbalance that the cache-look-up is part of the chain, and the cache-put is within part of the chain. I don't like that asymmetry. Also it means the DB agent is doing two things: getting data and caching it, which seems like it's doing too much to me: it needs to know how to do two different things. We could go down the route of having like a CachedPersonRepository, and that at least removes those two responsibilities from the agent, but it does just move them: now the repo needs to know about the DB and the caching.

What I'd end up doing is using a caching decorator around a DB-only repository. That's symmetrical, and all concerns are separated.

Hmmm.

Second Intermission

Go find a suitable blade so you can open a vein and end it all instead of having to read on… (and on…)


Oh, as a footnote to this another requirement I wanted to prove is that the chain links behaved "sensibly" if called without their adjacent agents. IE: the DatabasePersonAgent agent didn't need the CachedPersonAgent, nor did it need the CacheablePersonAgent.

Here's some examples:

Just the database handler:

$loggingService = new LoggingService();
$personRepository = new PersonRepository($loggingService);

$databaseHandler = new DatabasePersonHandler($personRepository);

$personService = new PersonService($databaseHandler);

foreach ([1,5] as $id){
    $person = $personService->getById($id);
    var_dump($person);
}


Result:

C:\src\php\php.local\src\chainOfResponsibility\public>php manualTestWithJustDatabaseHandler.php
Database hit for 1
object(me\adamcameron\cor\model\Person)#7 (2) {
  ["firstName"]=>
  string(3) "Tui"
  ["lastName"]=>
  string(4) "Tahi"
}
Database miss for 5
object(me\adamcameron\cor\model\Person)#8 (2) {
  ["firstName"]=>
  NULL
  ["lastName"]=>
  NULL
}

C:\src\php\php.local\src\chainOfResponsibility\public>

Yup: it's getting data from the DB, and doesn't require the other agents.

Now one with just the CachedPersonHandler:

$loggingService = new LoggingService();
$cachingService = new CacheService($loggingService);

$cachedHandler = new CachedPersonHandler($cachingService);

$personService = new PersonService($cachedHandler);

$cachingService->put(5, ['firstName'=>'Ria', 'lastName'=>'Rima']);

foreach ([1,5] as $id){
    $person = $personService->getById($id);
    var_dump($person);
}

Result:

C:\src\php\php.local\src\chainOfResponsibility\public>php manualTestWithJustCachedHandler.php
Cache put for 5
Cache miss for 1
object(me\adamcameron\cor\model\Person)#7 (2) {
  ["firstName"]=>
  NULL
  ["lastName"]=>
  NULL
}
Cache hit for 5
object(me\adamcameron\cor\model\Person)#8 (2) {
  ["firstName"]=>
  string(3) "Ria"
  ["lastName"]=>
  string(4) "Rima"
}

C:\src\php\php.local\src\chainOfResponsibility\public>

Here I'm priming the cache manually, just to make sure it'll find the cached item OK. And it does.

For the sake of completeness I also test with just the cache-put agent. It won't do anything useless (a warning flag in itself), but it should "work":

$loggingService = new LoggingService();
$cachingService = new CacheService($loggingService);

$cacheableHandler = new CacheablePersonHandler($cachingService);

$personService = new PersonService($cacheableHandler);

foreach ([1,5] as $id){
    $person = $personService->getById($id);
    var_dump($person);
}


Result:

C:\src\php\php.local\src\chainOfResponsibility\public>php manualTestWithJustCacheableHandler.php
object(me\adamcameron\cor\model\Person)#7 (2) {
  ["firstName"]=>
  NULL
  ["lastName"]=>
  NULL
}
object(me\adamcameron\cor\model\Person)#8 (2) {
  ["firstName"]=>
  NULL
  ["lastName"]=>
  NULL
}

C:\src\php\php.local\src\chainOfResponsibility\public>

This is all unhelpful to the stated goal, but it's doing what it's told and is self-contained. "Win".

In conclusion I think this has helped me get a handle on the Chain of Responsibility pattern, which is good. I still want to have a look at chains wherein the agents achieve partial fulfilment, and possibly some recursion as well. That'd be cool.

But I really don't think this is the right solution to the job at hand at work.

I'll leave you be now. Oh... as per usual: all the code is on Github, including the working application. The first iteration of the work is tagged separately.

Righto.

--
Adam