Monday, 24 April 2017

A cyclomatic complexity challenge

G'day:
This one came up briefly in a Friday afternoon messenger conversation between meself and my colleague in Dublin, Carlos. He's doing some code maintenance, and come across a function in one of our classes for which PHPMD's cyclomatic complexity analyser is complaining at us, and he was wondering what - if anything - do do about it.

We usually heed PHPMD's advice: variable names too long or too short; methods too long; classes with too many methods etc. But I think our general feel for cyclomatic complexity issues is to treat things with a grain of salt.

Now I'm not gonna explain cyclomatic complexity if you don't know about it: you can just read about it on Wikipedia: Cyclomatic complexity. In short it's an approach to giving a numeric score the complexity of some logic based on how many different logic branches it contains.

The code I present in this article offers a passing resemblance to the code in our code base, but I have tweaked it to make it more clear as a stand-alone example than it would be if I just ripped the code lock-stock out of its context. This captures the intent and the complexity (in that "cyclomatic" sense) of the original code, but that's about it. "all the names have been changed to protect the innocent" as it were.

Anyway, here's an analogy of the code in question:

public function update(int $id, array $parameters): void
{
    $this->propertyRepository->update($id, $parameters);

    if (array_key_exists('media', $parameters) && is_array($parameters['media'])) {
        $this->mediaRepository->update($id, $parameters['media']);
    }
    if (array_key_exists('metadata', $parameters) && is_array($parameters['metadata'])) {
        $this->metadataRepository->update($id, $parameters['metadata']);
    }
    if (array_key_exists('social', $parameters) && is_array($parameters['social'])) {
        $this->socialRepository->update($id, $parameters['social']);
    }
    if (array_key_exists('personnel', $parameters) && is_array($parameters['personnel'])) {
        $this->personnelRepository->update($id, $parameters['personnel']);
    }
    if (array_key_exists('availability', $parameters) && is_array($parameters['availability'])) {
        $this->availabilityRepo->update($id, $parameters['availability']);
    }
}

It's innocuous enough. This is part of PropertyService which services the business logic for a REST endpoint that handles updates to a Property. I'm in the online accommodation booking business, remember, so here a "Property" is like a hotel or something.

The raw data has been validated as "not completely wrong, missing or dangerous" at this point, but is still just in a raw array - a transformation of the incoming JSON payload from the request.

An update has a required component: the basic property info, plus a number of optional components depending on the nature of the source request. I've completely changed these here from the original, but I'm pretending a property can also comprise separate media, metadata, social, personnel and availability components to the business object, all of which have different storage implementations, marshalled by their respective repositories.

Rightly or wrongly this is a generic update endpoint which is the staging ground for everything else going on, so we've got all this conditional processing based on the existence of the relevant sub-components of the object itself. This is not so extraordinary.

But because we have five conditional bits, and each condition has a compound expression, this tips the scales of our cyclomatic complexity threshold. if we only had four of those conditions, we'd be fine. Now: no-one is saying that "four is OK; five is bad", but what we are saying is "well... four is OK; five needs further assessment". We don't wanna get too dogmatic about this stuff.

Anyway, here's PHPStorm whining at me:


(I have no idea what NPath complexity is, but I assume it's just a different way of measuring the same thing).

As I indicated, we had a quick chat and thought "do you know what? We're probably OK with this code". Granted the complexity is there, but the code is easy enough to follow. Because the conditions are "the same" each time a lot fo the perceived complexity is just metric, not reality. We also thought any tactic to reduce the perceived complexity might result in more real-world complexity.

I decided to see what I could do with it over the weekend, as a basis for this article. And, obviosuly, if I came up with a good solution we could use it.

First things first, I made sure we had 100% logical test coverage on the existsing code. This means we're in a safe place to refactor should we decide to. The tests are long and boring, so I'll not include them here. They're all pretty straight fwd anyhow. Oh, I'll repet the test cases from the data provider anyhow:

return [
    'no optionals' => ['parameters' => []],
    'media is simple' => ['parameters' => ['media' => self::$mediaParameter]],
    'media is array' => ['parameters' => ['media' => [self::$mediaParameter]]],
    'metadata is simple' => ['parameters' => ['metadata' => self::$metadataParameter]],
    'metadata is array' => ['parameters' => ['metadata' => [self::$metadataParameter]]],
    'social is simple' => ['parameters' => ['social' => self::$socialParameter]],
    'social is array' => ['parameters' => ['social' => [self::$socialParameter]]],
    'personnel is simple' => ['parameters' => ['personnel' => self::$personnelParameter]],
    'personnel is array' => ['parameters' => ['personnel' => [self::$personnelParameter]]],
    'availability is simple' => ['parameters' => ['availability' => self::$availabilityParameter]],
    'availability is array' => ['parameters' => ['availability' => [self::$availabilityParameter]]],
    'all optionals' => ['parameters' => $allParameters]
];

The tests just make sure the appropriate repo is called given the presence/absence of the optional part(s) in the inputs.

BTW I was mindful that the complexity of the tests is also part of the complexity of the code. That's a lot of test variations for one method.

Anyhow, the first thing I spotted - pretty much immediately - is the conditional logic could be simplified if we had a function which was "key exists in array and its value is itself an array". This straight-away halves the conditional expressions in the function:

public function update(int $id, array $parameters): void
{
    $this->propertyRepository->update($id, $parameters);

    if ($this->keyExistsAndIsArray('media', $parameters)) {
        $this->mediaRepository->update($id, $parameters['media']);
    }
    if ($this->keyExistsAndIsArray('metadata', $parameters)) {
        $this->metadataRepository->update($id, $parameters['metadata']);
    }
    if ($this->keyExistsAndIsArray('social', $parameters)) {
        $this->socialRepository->update($id, $parameters['social']);
    }
    if ($this->keyExistsAndIsArray('personnel', $parameters)) {
        $this->personnelRepository->update($id, $parameters['personnel']);
    }
    if ($this->keyExistsAndIsArray('availability', $parameters)) {
        $this->availabilityRepo->update($id, $parameters['availability']);
    }
}

private function keyExistsAndIsArray(string $key, array $array) : bool
{
    return array_key_exists($key, $array) && is_array($array[$key]);
}

Because I've got full test coverage, I know this solution is analogous to the original one. And the tests also cover the private method 100% too.  And it is slightly clearer. Win.

Next I took a closer look again, and each conditional block is pretty much the same too. Just a different key, and a different repo to send the optional parameters too. So I could do this in another single private function too:

private function updateOptional($id, $parameters, $optional, $repository)
{
    if (array_key_exists($optional, $parameters) && is_array($parameters[$optional])) {
        $repository->update($id, $parameters[$optional]);
    }
}

This is just a homogenised version of each if statement from the original method. The original method is now greatly simplified:

public function update(int $id, array $parameters): void
{
    $this->propertyRepository->update($id, $parameters);

    $this->updateOptional($id, $parameters, 'media', $this->mediaRepository);
    $this->updateOptional($id, $parameters, 'metadata', $this->metadataRepository);
    $this->updateOptional($id, $parameters, 'social', $this->socialRepository);
    $this->updateOptional($id, $parameters, 'personnel', $this->personnelRepository);
    $this->updateOptional($id, $parameters, 'availability', $this->availabilityRepo);
}

I'm not that cool with having a method that takes four arguments: one should aim for zero arguments, after all; and each additional argument beyond that is reducing yer code cleanliness, but in this case it's not part of the public interface and it's only used in one place, so I think it's OK. Also I've got rid of eight of the ten boolean expressions the original version had, and that is definitely a clean-code win.

And, of course, all the tests still run fine, and I've 100% coverage still.

Looking over the three solutions here, I do actually think this last one is more clear than the first, and we've not added any real complexity for complexity's sake. So I think I'll probably suggest this to the parties concerned. We won't apply it straight away as the first version of the code is written and released and working, but we'll bear it in mind should we come to maintain that code further, and perhaps as a technique to consider next time a similar situation comes up.

Also important to note that I'd not take this approach from the outset, if say we had only 2-3 conditions in that method. This really is just an "edge-case" improvement, and like I said I don't think it's worth diving in and changing unless we have reason to touch that code again. And like if we were still at the code review stage of the original code, I'd perhaps make this observation, but I'd not suggest we revisit the code before approving it. The code that stands is, after all, pretty much A-OK. It's important to not get too dogmatic about these things, and slow proceedings down just to make some OK code slightly more OK. "Good enough" is, well: good enough.

I've got another article brewing about the unit tests I wrote for this. I'll hope to tackle that during the week, but I've beer to drink this evening, so I won't be doing it then.

Righto.

--
Adam

Friday, 14 April 2017

I actively consider what I think ought to go into a controller

G'day:
This is one of these ones where I state the obvious (probably), and expose myself as a fraud in that it's taken me this long to reach this conclusion. I can only take solace in that I'm clearly in "good" company. Where "good" == "a lot".

Yesterday I was going a code review, and came across a controller method which made one of my eyebrows raise as it was about 40 lines long. Now this isn't intrinsically bad in context. I have seen (and still see!) controller methods that are hundreds and hundreds of lines long. In the past I've seen controllers tipping the 1000-line mark. Those examples are clearly just wrong. And the developers who wrote them and/or continue to maintain them should be shot. I'm not even sure I'm over-stating it with that sentence. Hmmm... I probably am I guess. But they should be squarely and actively admonished, anyhow.

So in the scheme of things, 40 lines ain't bad. Especially when we try to keep our code wrapped within 80-100chars, so a few of the statements were multi-line. There was about say a dozen statements in it altogether. Not so bad.

Before I go on - and not only because the dev concerned will be reading this article (once I send them the link... pretty sure they don't know this blog exists) - I hasten to add that this code is not by any means bad. I'm probably gonna go "close enough" on the code review, once one bit of the code is refactored at least into a private method. I am not having a go at the dev concerned here. They are just my muse for the day. And I happen to think they're pretty good at their job.

There was one bit of the controller that stood out to me and I went "nuh-uh, that's gotta go". There was a foreach loop just before the response was returned which performed a calculation. That's business logic and it has no place in a controller. Fortunately we have an appropriate service class which we can lift that out and re-home in, and slap some tests on it, and our code is a bit better and a bit more maintainable.

(NB: yeah... no tests... we have a policy of not testing controller methods, but that's with the caveat there's no logic in them. We followed the "don't test controller methods" bit, but not the "provided there's no logic in them" bit here).

However once I spot a problem, I immediately start looking for more, so I assessed the rest of the code there. The controller did seem too fat. I wanted to know if all the logic did actually belong in the controller, and my instinct was: probably not.

At the same time as writing this article I'm chatting on Slack about it, and me mate Mingo offered the perennial answer to the question "should this go in the controller?"

Isn't the answer always "No"?
~ mjhagen 2017

(yes, he cited the attribution too, so I'm including that ;-)

It's pleasingly pithy, and almost ubiquitously agreed with; but in a casually dismissive way that is seldom heeded, I think. The problem is it's alluding to a mindset one should adopt as a guideline, but clearly the answer isn't "no", so I think people dismiss it. If one says "one should never do x", and it's clear that "never" is slight hyperbole, people will acknowledge the hyperbole and stretch it back the other way as far as they see fit to accommodate their current whim.

With this in mind (retroactively... Mingo just mentioned that now, and the code review was yesterday, and this article was drafted in my head last night), I stopped to think about a rule of thumb as to when something ought to go in a controller.

Broadly speaking I feel there are only three types of logic that belong in a controller:

  1. a call to validate / sanitise all incoming arguments so they're fit for purpose for fulfilling the request;
  2. one (preferably) or more (and increasingly less preferably the more there are) calls to get data - using those incoming arguments - to build the response;
  3. a call to assemble that data for the response;
  4. returning the response.

Yeah that's 4. I was not counting the return statement. An example of this might be:

class UserController {
    function handleGet(rawArgs){
        args = validationService.validate(rawArgs)
        
        user = userService.get(args.id)
        publishedContent = contentService.getForId(args.id, args.startDate, args.endDate)
        twitterContent = twitterService.getForId(args.id, args.startDate, args.endDate)
        facebookContent = facebookService.getForId(args.id, args.startDate, args.endDate)
        
        response = {
            user = user,
            publishedContent = publishedContent,
            socialContent = {
                twitter = twitterContent,
                facebook = facebookContent
            }
        }
        return new Response(response)
    }
}

I think that's OK, even though there's four service calls. All the logic is busying itself with what it is to fulfil that request. The controller simply dictates what data needs returning, and gets that data. there is no business logic in there. It's all controller logic.

A pseudo-code example where we're adding business logic into this might be:

// ...
publishedContent = contentService.getForId(args.id, args.startDate, args.endDate)
topFiveArticles = publishedContent.slice(5)

// ...

response = {
    // ...
    articles = topFiveArticles,
    // ...
}

Here the slice call - even though it's a native array method - is still business logic. If the controller method was getTopFiveArticles or something, I'd go back to considering it controller logic. So it's a fine line.

Back to the code i was reviewing, it was doing 1&2, then it was going off-piste with that foreach loop, then doing 3&4. Ignoring the foreach, I was still itchy, but wasn't seeing why. It seemed like it was roughly analogous to my first example above.

So what's the problem? Why was the code still making me furrow my brow?

Well there was just too much code. That set-off a warning bell. I wanted to reduce the number of calls to various services (or repositories as is the case here: data from slightly different sources), but initially I couldn't really justify it. Then I looked closer.

The code in question was more like this (I've got rid of the date args to focus more):

class ContentController {
    function handleGet(rawArgs){
        args = validationService.validate(rawArgs)
        
        user = userService.get(args.id)
        publishedContent = contentService.getForId(args.id)
        
        twitterContent = twitterService.getForId(user.twitterId)
        facebookContent = facebookService.getForId(user.facebookId)
        
        response = {
            articles = publishedContent,
            socialContent = {
                twitter = twitterContent,
                facebook = facebookContent
            }
        }
        return new Response(response)
    }
}


I hasten to add the code I was reviewing is nothing to do with users and content and social media stuff, this is just an example which is analogous but would make more sense to people not in the context of our business.

The difference is subtle. The difference is that the decision that the IDs for the social content are stored in the user object is business logic. It's not controller logic to "know" that's how that data is stored. the user object here is not part of the data that composes the response; it's part of some business logic that dictates how to get the data.

And this does indeed break that second rule:

  • a call to validate / sanitise all incoming arguments so they're fit for purpose for fulfilling the request;
  • [...] calls to get data - using those incoming arguments - to build the response;

The call to get the social media content is not using those arguments. It's using a value from a separate call:
user = userService.get(args.id)
// ...
twitterContent = twitterService.getForId(user.twitterId)
facebookContent = facebookService.getForId(user.facebookId)


Another tell tale here is that user is not even part of the response. It's just used to get those "foreign keys".

My rule of thumb in those four points above is still correct. But one needs to pay special attention to the second point. Any time one's inclined to fetch data "indirectly", the indirection is business logic. It might not be a loop or some conditional or switch block or something more obvious that manipulates the data before it's returned, but that indirection is still business logic. And does not belong a the controller.

So how should that controller look? Move the content acquisition out into a service that has the explicit purpose of getting a user's content - from wherever - via some key:

class ContentController {
    function handleGet(rawArgs){
        args = validationService.validate(rawArgs)
        
        content = contentService.getById(args.id)
        
        response = {
            articles = content.published,
            socialContent = {
                twitter = content.twitter,
                facebook = content.facebook
            }
        }
        return new Response(response)
    }
}


This follows the second point to the letter, and there's now no business logic in the controller. Also the controller legitimately does not really need unit testing now, but the logic in the ContentService can be. Also the controller method is down to a size that would not raise any eyebrows.

This is not an earth-shattering conclusion by any measure, but I'm quite pleased to fine-tune my position here, and hopefully it can help drive discussion about this in code reviews in future.

It'd be great to hear anyone else's thoughts on this?

--
Adam

Friday, 24 March 2017

Help explain closure to the Adobe ColdFusion Team

G'day:
OK, this is a CFML-centric article. But it's also a call to provide code analogies in different languages to a CFML example, to show the ColdFusion Team they dunno what they're on about.

Here's some CFML code:

function doit() {
    var a = ["a", "b", "c"];
    var b = ["x", "y", "z"];
    var counter = 0;
    
    arrayEach(a, function(foo) {
        counter = 0;
        arrayEach(b, function(bar) {
            counter++;
            // in dump counter is always 0
            writeDump({
                counter: counter, 
                foo: foo, 
                bar: bar
            });
        });
    });
}
doit();

This leverages closure to reference the outer counter variable within the innermost function expression. Ergo, it's the same variable. So the expected output of this would be:


Note how the counter is declared in the main function, reset in the outer arrayEach handlers and incremented for each iteration of the inner arrayEach call. So it'll cycle through 1,2,3 three times when output.

That was run on Lucee. Running it on ColdFusion yields:


See how the counter is messed up: it's always zero. It should increment for each iteration of the inner function expression.

Adobe - being their typical selves - is claiming this is "by design":

https://tracker.adobe.com/#/view/CF-4197194:


That's nonsense.

Other languages behave predictably:

JavaScript:

function doit() {
    var a = ["a", "b", "c"];
    var b = ["x", "y", "z"];
    var counter = 0;
    
    a.forEach(function(foo) {
        counter = 0;
        b.forEach(function(bar) {
            counter++;
            // in dump counter is always 0
            console.log({
                counter: counter, 
                foo: foo, 
                bar: bar
            });
        });
    });
}
doit();
VM128:11 Object {counter: 1, foo: "a", bar: "x"}
VM128:11 Object {counter: 2, foo: "a", bar: "y"}
VM128:11 Object {counter: 3, foo: "a", bar: "z"}
VM128:11 Object {counter: 1, foo: "b", bar: "x"}
VM128:11 Object {counter: 2, foo: "b", bar: "y"}
VM128:11 Object {counter: 3, foo: "b", bar: "z"}
VM128:11 Object {counter: 1, foo: "c", bar: "x"}
VM128:11 Object {counter: 2, foo: "c", bar: "y"}
VM128:11 Object {counter: 3, foo: "c", bar: "z"}


And PHP (apologies for the rubbish way PHP does closure):

function doit() {
    $a = ["a", "b", "c"];
    $b = ["x", "y", "z"];
    $counter = 0;
    
    array_walk($a, function($foo) use (&$counter, $b) {
        $counter = 0;
        array_walk($b, function($bar) use (&$counter, $foo) {
            $counter++;
            // in dump counter is always 0
            var_dump([
                "counter" => $counter, 
                "foo" => $foo, 
                "bar" => $bar
            ]);
        });
    });
}
doit();


array(3) {
  ["counter"]=>
  int(1)
  ["foo"]=>
  string(1) "a"
  ["bar"]=>
  string(1) "x"
}
array(3) {
  ["counter"]=>
  int(2)
  ["foo"]=>
  string(1) "a"
  ["bar"]=>
  string(1) "y"
}
array(3) {
  ["counter"]=>
  int(3)
  ["foo"]=>
  string(1) "a"
  ["bar"]=>
  string(1) "z"
}
array(3) {
  ["counter"]=>
  int(1)
  ["foo"]=>
  string(1) "b"
  ["bar"]=>
  string(1) "x"
}
array(3) {
  ["counter"]=>
  int(2)
  ["foo"]=>
  string(1) "b"
  ["bar"]=>
  string(1) "y"
}
array(3) {
  ["counter"]=>
  int(3)
  ["foo"]=>
  string(1) "b"
  ["bar"]=>
  string(1) "z"
}
array(3) {
  ["counter"]=>
  int(1)
  ["foo"]=>
  string(1) "c"
  ["bar"]=>
  string(1) "x"
}
array(3) {
  ["counter"]=>
  int(2)
  ["foo"]=>
  string(1) "c"
  ["bar"]=>
  string(1) "y"
}
array(3) {
  ["counter"]=>
  int(3)
  ["foo"]=>
  string(1) "c"
  ["bar"]=>
  string(1) "z"
}


But I'm quite keen to know if there's any language that behaves as ColdFusion does in this example? If you've got a mo', could you knock out an equivalent of this code in [your language of choice], and share the results?

I'm 99.999% sure Adobe are just not quite getting closure, but wanna make sure I'm not missing anything.

Righto.

--
Adam

Wednesday, 22 March 2017

Code puzzle: find any match from an array of regex patterns

G'day:
I'm being slightly cheeky as I'm hijacking someone else's question / puzzle from the CFML Slack channel that we were discussing last night. Don't worry, that'll be the last mention of CFML in this article, and indeed the question itself was asking for a JS solution. But it's a generic code puzzle.

Let's say one has a string, and an array of regex patterns one wants to check for in said string. A number of solutions were offered, using a number of techniques. And it even branched away from JS into Clojure (thanks Sean).

How would you go about doing it?

The input data is:

patterns = ["Savvy", "Smart", "Sweet"]
testForPositive = "Super Savvy"
testForNegative = "Super Svelt"

And in these cases we want true and false to be returned, respectively.

Note that the patterns there need to be considered regular expressions, not simple strings. The example data is just using simple strings for the sake of clarity.

Initially I was gonna do something like this:

function matchesOne(str, patterns) {
    var onePattern = new RegExp(patterns.join("|"));
    return str.search(onePattern) > -1;
}

Here I'm joining all the patterns into one big pattern, and searching for that. I was slightly hesitant about this as it's loading the complexity into the regex operation, which I felt perhaps wasn't ideal. Also it's not very clever: if any of the individual patterns had a | in them: this'd break. Scratch that.

Ryan Guill seeded the idea of approaching it from the other direction. Using the array as an array, and using one of JS's array higher-order functions to do the trick. We saw examples using find and filter, but I reckoned some was the best option here:

function matchesOne(str, patterns) {
    return patterns.some(pattern => str.search(pattern) > -1);
}


I'm happy with that.

The ante was raised next: write a function which returned the position of the first match (not just a boolean as to whether there was a match or not).

To me this was a reduction operation: take an array and - given some algorithm applied to each element - return a final single value.

I tried, but I could not make it very clean. This is what I got:

function firstMatch(str, patterns) {
    return patterns.reduce(function(first, pattern){
        var thisMatch = str.search(pattern);
        return first == -1 ? thisMatch : thisMatch == -1 ? first : Math.min(first, thisMatch);
    }, -1);
}


I'm happy with the reduce approach, but I'm really not happy with the expression to identify the lowest match (or continue to return -1 if no match is found):

first == -1 ? thisMatch : thisMatch == -1 ? first : Math.min(first, thisMatch)


To be clear, if we expanded that out, the logic is:

if (first == -1) {
    return thisMatch;
}
if (thisMatch == -1) {
    return first;
}
return Math.min(first, thisMatch);

That's actually "cleaner", but I still think I'm missing a trick.

So the puzzle to solve here is: what's a clever (but also Clean - in the RC Martin sense) way returning the lowest non -1 value of first and thisMatch, or fall back to -1 if all else fails.

Using this code (or some approximation thereof):

function firstMatch(str, patterns) {
    return patterns.reduce(function(first, pattern){
        var thisMatch = str.search(pattern);
        // return result of your code here
    }, -1);
}

console.log(firstMatch("Super Savvy", ["Savvy", "Smart", "Sweet"]));
console.log(firstMatch("Super Svelt", ["Savvy", "Smart", "Sweet"]));


One should get 6 and -1 for the respective "tests".

Use any language you like.

Oh and hey... if you have any interesting solutions for the two functions themselves, feel free to post 'em too.


Righto.

--
Adam

Thursday, 9 March 2017

CFML: that survey... sit. rep.

G'day:
A coupla people have asked me what's happening with my feedback from that survey you all did for me (Survey: CFML usage and migration strategies).

I have to simply apologise for the delay. I started to look at it, but could not work out how best to present the data. Then I buggered off to NZ to sit in the sun watching cricket and drinking beer with me mates; and "CFML survey data analysis" was not high on my list of priorities. I did start looking at the data some more though.

So not much progress yet, but I'm now back in London and trying to refocus on IT stuff, so I'll try to start feeding back soon. -ish.

Again: apologies for the delay.

Righto.

--
Adam

PHPUnit: setMethods and how I've been writing too much code

G'day:
Seems our technical architect is becoming my PHP muse. Good on ya, Pete.

Anyway, he just sent out what I thought was a controversial email in which he said (paraphrase) "we're calling setMethods way too often: in general we don't need to call it". To contextualise, I mean this:

$dependency = $this
    ->getMockBuilder(MyDependency::class)
    ->setMethods(['someMethod'])
    ->getMock();

$dependency
    ->method('someMethod')
    ->willReturn('MOCKED RESULT');


See here I'm explicitly saying "mock out the someMethod method". Pete's contention is that that's generally unnecessary, cos getMockBuilder mocks 'em all by default anyhow. This goes against my understanding of how it works, and I've written bloody thousands of unit tests with PHPUnit so it surprised me I was getting it wrong. I kinda believed him cos he said he tested it, but I didn't believe him so much as to not go test it myself.

So I knocked out a quick service and dependency for it:

namespace me\adamcameron\myApp;

class MyService
{

    private $myDecorator;

    public function __construct(MyDecorator $myDecorator)
    {
        $this->myDecorator = $myDecorator;
    }

    public function decorateMessage($message)
    {
        $message = $this->myDecorator->addPrefix($message);
        $message = $this->myDecorator->addSuffix($message);
        return $message;
    }

}

namespace me\adamcameron\myApp;

class MyDecorator
{

    public function addPrefix($message)
    {
        return "(ACTUAL PREFIX) $message";
    }

    public function addSuffix($message)
    {
        return "$message (ACTUAL SUFFIX)";
    }

}

Nothing surprising there: the service takes that decorator as a constructor arg, and then when calling decorateMessage the decorator's methods are called.

In our tests of decorateMessage, we don't want to use the real decorator methods, we want to use mocks.

First I have a test the way I'd normally mock things: explicitly calling setMethods with the method names:

public function testWithExplicitMock()
{
    $decorator = $this
        ->getMockBuilder(MyDecorator::class)
        ->setMethods(['addPrefix', 'addSuffix'])
        ->getMock();

 $decorator
  ->method('addPrefix')
  ->willReturn('(MOCKED PREFIX)');

 $decorator
  ->method('addSuffix')
  ->willReturn('(MOCKED SUFFIX)');

 $service = new MyService($decorator);

 $result = $service->decorateMessage('TEST MESSAGE');

 $this->assertSame(
        '(MOCKED SUFFIX)',
        $result
    );
}

And this mocks out the method correctly. This is my baseline.

Secondly, I still call setMethods, but I give it an empty array:

public function testWithImplicitMock()
{
    $decorator = $this
        ->getMockBuilder(MyDecorator::class)
        ->setMethods([])
        ->getMock();

 $decorator
  ->method('addPrefix')
  ->willReturn('(MOCKED PREFIX)');

 $decorator
  ->method('addSuffix')
  ->willReturn('(MOCKED SUFFIX)');

 $service = new MyService($decorator);

 $result = $service->decorateMessage('TEST MESSAGE');

 $this->assertSame(
        '(MOCKED SUFFIX)',
        $result
    );
}

This also mocks out the method correctly. Passing an empty array mocks out all the methods in the mocked class.

Next I try passing null to setMethods:

public function testWithNull()
{
    $decorator = $this
        ->getMockBuilder(MyDecorator::class)
        ->setMethods(null)
        ->getMock();

    $decorator
        ->method('addPrefix')
        ->willReturn('(MOCKED PREFIX)');

    $decorator
        ->method('addSuffix')
        ->willReturn('(MOCKED SUFFIX)');

    $service = new MyService($decorator);

    $result = $service->decorateMessage('TEST MESSAGE');

    $this->assertSame(
        '(ACTUAL PREFIX) TEST MESSAGE (ACTUAL SUFFIX)',
        $result
    );
}

This creates the mocked object, but the methods are not mocked. So the result here is using the actual methods.

In the last example I mock one of the methods (addPrefix), but not the other (addSuffix):

public function testWithOneExplicitMock()
{
    $decorator = $this
        ->getMockBuilder(MyDecorator::class)
        ->setMethods(['addPrefix'])
        ->getMock();

    $decorator
        ->method('addPrefix')
        ->willReturn('(MOCKED PREFIX)');

    $service = new MyService($decorator);

    $result = $service->decorateMessage('TEST MESSAGE');

    $this->assertSame(
        '(MOCKED PREFIX) (ACTUAL SUFFIX)',
        $result
    );
}

Here we see the important bit: when one explicitly mocks one or a subset of methods, then the other methods are not mocked. Which is kinda obvious now I say it, but I guess we had been thinking one needed to call setMethods to be able to then mock the behaviour of the method. And, TBH, I don't think we thought through what was happening to the methods we were not mocking. I think we started calling setMethods all the time because we were slavishly following the PHPUnit docs, which always call it too, which is a less than ideal precedent to "recommend".

Generally speaking, one should want to mock all methods of a dependency, and it should really be the exception when explicitly not wanting to mock a method: ie, to leave it actually callable.

The win here is that we can now de-clutter our tests a bit. We really seldom need to call setMethods, and we sure want to make it clear when we do want to only mock some of a dependency's methods. So this will make our code cleaner and clearer, and easier to code review and maintain. Win!

Cheers Pete!

Righto.

--
Adam

Wednesday, 8 February 2017

PHP: more array dereferencing weirdness

G'day:
(Firstly: I hasten to add that the "weirdness" is "weird to me". It might be completely predictable to everyone else in the PHP world).

This is a follow-on from my earlier article "PHP: accessing an undefined array element yields a notice. Or does it? WTF, PHP?". Now before I start: Kalle Sommer Nielsen has come back to me in a comment on that article and mentions it's expected behaviour, and explained it... but I don't quite follow yet. I'm hoping some ensuing experimentation will cast some light on the scene.

My next variation of this has me scratching my head more. Well: it's demonstrating the same thing, but perhaps more clearly.

<?php

echo PHP_EOL . "Test with array" . PHP_EOL;

$a = [1, 2, 3];
var_dump($a[0]);
var_dump($a[0]['id']);
var_dump($a[0][0]['id']);
var_dump($a[0]['id'][0]);

echo PHP_EOL . "Test with int" . PHP_EOL;
$i = 1;
var_dump($i['id']);
var_dump($i[0]['id']);
var_dump($i['id'][0]);

echo PHP_EOL;

echo PHP_EOL . "Test with string" . PHP_EOL;
$s = '1';
var_dump($s['id']);
var_dump($s[0]['id']);
var_dump($s['id'][0]);

This yields:


C:\temp>php test.php

Test with array
C:\temp\test.php:6:
int(1)
C:\temp\test.php:7:
NULL
C:\temp\test.php:8:
NULL
C:\temp\test.php:9:
NULL

Test with int
C:\temp\test.php:13:
NULL
C:\temp\test.php:14:
NULL
C:\temp\test.php:15:
NULL


Test with string
PHP Warning:  Illegal string offset 'id' in C:\temp\test.php on line 21
PHP Stack trace:
PHP   1. {main}() C:\temp\test.php:0

Warning: Illegal string offset 'id' in C:\temp\test.php on line 21

Call Stack:
    0.0002     356384   1. {main}() C:\temp\test.php:0

C:\temp\test.php:21:
string(1) "1"
PHP Warning:  Illegal string offset 'id' in C:\temp\test.php on line 22
PHP Stack trace:
PHP   1. {main}() C:\temp\test.php:0

Warning: Illegal string offset 'id' in C:\temp\test.php on line 22

Call Stack:
    0.0002     356384   1. {main}() C:\temp\test.php:0

C:\temp\test.php:22:
string(1) "1"
PHP Warning:  Illegal string offset 'id' in C:\temp\test.php on line 23
PHP Stack trace:
PHP   1. {main}() C:\temp\test.php:0

Warning: Illegal string offset 'id' in C:\temp\test.php on line 23

Call Stack:
    0.0002     356384   1. {main}() C:\temp\test.php:0

C:\temp\test.php:23:
string(1) "1"

C:\temp>

OK, so here I factor-out the top-level array from the mix. That test was always referring to the first element of the array, which is just the integer 1, so we might as well just test with that. Needless to say, there's the same results as with the first array element. But I still don't get it. What is using array notation to dereference an integer supposedly doing? It's a non-sensical operation, as far as I understand things. And if it is a nonsensical operation: it should error.

For completeness I also try a variation with a string "1". This does error like I'd expect. Well: it gives a warning and then returns an inappropriate (IMO) result anyhow. This is probably PHP doing it's combo of telling me I did it wrong but then going ahead and guessing what I meant anyhow. Grrr.

I'm gonna mess around with de-referencing integers some more when I get a moment, and try to work out WTF.

Righto.

--
Adam