Monday, 24 April 2017

A cyclomatic complexity challenge

G'day:
This one came up briefly in a Friday afternoon messenger conversation between meself and my colleague in Dublin, Carlos. He's doing some code maintenance, and come across a function in one of our classes for which PHPMD's cyclomatic complexity analyser is complaining at us, and he was wondering what - if anything - do do about it.

We usually heed PHPMD's advice: variable names too long or too short; methods too long; classes with too many methods etc. But I think our general feel for cyclomatic complexity issues is to treat things with a grain of salt.

Now I'm not gonna explain cyclomatic complexity if you don't know about it: you can just read about it on Wikipedia: Cyclomatic complexity. In short it's an approach to giving a numeric score the complexity of some logic based on how many different logic branches it contains.

The code I present in this article offers a passing resemblance to the code in our code base, but I have tweaked it to make it more clear as a stand-alone example than it would be if I just ripped the code lock-stock out of its context. This captures the intent and the complexity (in that "cyclomatic" sense) of the original code, but that's about it. "all the names have been changed to protect the innocent" as it were.

Anyway, here's an analogy of the code in question:

public function update(int $id, array $parameters): void
{
    $this->propertyRepository->update($id, $parameters);

    if (array_key_exists('media', $parameters) && is_array($parameters['media'])) {
        $this->mediaRepository->update($id, $parameters['media']);
    }
    if (array_key_exists('metadata', $parameters) && is_array($parameters['metadata'])) {
        $this->metadataRepository->update($id, $parameters['metadata']);
    }
    if (array_key_exists('social', $parameters) && is_array($parameters['social'])) {
        $this->socialRepository->update($id, $parameters['social']);
    }
    if (array_key_exists('personnel', $parameters) && is_array($parameters['personnel'])) {
        $this->personnelRepository->update($id, $parameters['personnel']);
    }
    if (array_key_exists('availability', $parameters) && is_array($parameters['availability'])) {
        $this->availabilityRepo->update($id, $parameters['availability']);
    }
}

It's innocuous enough. This is part of PropertyService which services the business logic for a REST endpoint that handles updates to a Property. I'm in the online accommodation booking business, remember, so here a "Property" is like a hotel or something.

The raw data has been validated as "not completely wrong, missing or dangerous" at this point, but is still just in a raw array - a transformation of the incoming JSON payload from the request.

An update has a required component: the basic property info, plus a number of optional components depending on the nature of the source request. I've completely changed these here from the original, but I'm pretending a property can also comprise separate media, metadata, social, personnel and availability components to the business object, all of which have different storage implementations, marshalled by their respective repositories.

Rightly or wrongly this is a generic update endpoint which is the staging ground for everything else going on, so we've got all this conditional processing based on the existence of the relevant sub-components of the object itself. This is not so extraordinary.

But because we have five conditional bits, and each condition has a compound expression, this tips the scales of our cyclomatic complexity threshold. if we only had four of those conditions, we'd be fine. Now: no-one is saying that "four is OK; five is bad", but what we are saying is "well... four is OK; five needs further assessment". We don't wanna get too dogmatic about this stuff.

Anyway, here's PHPStorm whining at me:


(I have no idea what NPath complexity is, but I assume it's just a different way of measuring the same thing).

As I indicated, we had a quick chat and thought "do you know what? We're probably OK with this code". Granted the complexity is there, but the code is easy enough to follow. Because the conditions are "the same" each time a lot fo the perceived complexity is just metric, not reality. We also thought any tactic to reduce the perceived complexity might result in more real-world complexity.

I decided to see what I could do with it over the weekend, as a basis for this article. And, obviosuly, if I came up with a good solution we could use it.

First things first, I made sure we had 100% logical test coverage on the existsing code. This means we're in a safe place to refactor should we decide to. The tests are long and boring, so I'll not include them here. They're all pretty straight fwd anyhow. Oh, I'll repet the test cases from the data provider anyhow:

return [
    'no optionals' => ['parameters' => []],
    'media is simple' => ['parameters' => ['media' => self::$mediaParameter]],
    'media is array' => ['parameters' => ['media' => [self::$mediaParameter]]],
    'metadata is simple' => ['parameters' => ['metadata' => self::$metadataParameter]],
    'metadata is array' => ['parameters' => ['metadata' => [self::$metadataParameter]]],
    'social is simple' => ['parameters' => ['social' => self::$socialParameter]],
    'social is array' => ['parameters' => ['social' => [self::$socialParameter]]],
    'personnel is simple' => ['parameters' => ['personnel' => self::$personnelParameter]],
    'personnel is array' => ['parameters' => ['personnel' => [self::$personnelParameter]]],
    'availability is simple' => ['parameters' => ['availability' => self::$availabilityParameter]],
    'availability is array' => ['parameters' => ['availability' => [self::$availabilityParameter]]],
    'all optionals' => ['parameters' => $allParameters]
];

The tests just make sure the appropriate repo is called given the presence/absence of the optional part(s) in the inputs.

BTW I was mindful that the complexity of the tests is also part of the complexity of the code. That's a lot of test variations for one method.

Anyhow, the first thing I spotted - pretty much immediately - is the conditional logic could be simplified if we had a function which was "key exists in array and its value is itself an array". This straight-away halves the conditional expressions in the function:

public function update(int $id, array $parameters): void
{
    $this->propertyRepository->update($id, $parameters);

    if ($this->keyExistsAndIsArray('media', $parameters)) {
        $this->mediaRepository->update($id, $parameters['media']);
    }
    if ($this->keyExistsAndIsArray('metadata', $parameters)) {
        $this->metadataRepository->update($id, $parameters['metadata']);
    }
    if ($this->keyExistsAndIsArray('social', $parameters)) {
        $this->socialRepository->update($id, $parameters['social']);
    }
    if ($this->keyExistsAndIsArray('personnel', $parameters)) {
        $this->personnelRepository->update($id, $parameters['personnel']);
    }
    if ($this->keyExistsAndIsArray('availability', $parameters)) {
        $this->availabilityRepo->update($id, $parameters['availability']);
    }
}

private function keyExistsAndIsArray(string $key, array $array) : bool
{
    return array_key_exists($key, $array) && is_array($array[$key]);
}

Because I've got full test coverage, I know this solution is analogous to the original one. And the tests also cover the private method 100% too.  And it is slightly clearer. Win.

Next I took a closer look again, and each conditional block is pretty much the same too. Just a different key, and a different repo to send the optional parameters too. So I could do this in another single private function too:

private function updateOptional($id, $parameters, $optional, $repository)
{
    if (array_key_exists($optional, $parameters) && is_array($parameters[$optional])) {
        $repository->update($id, $parameters[$optional]);
    }
}

This is just a homogenised version of each if statement from the original method. The original method is now greatly simplified:

public function update(int $id, array $parameters): void
{
    $this->propertyRepository->update($id, $parameters);

    $this->updateOptional($id, $parameters, 'media', $this->mediaRepository);
    $this->updateOptional($id, $parameters, 'metadata', $this->metadataRepository);
    $this->updateOptional($id, $parameters, 'social', $this->socialRepository);
    $this->updateOptional($id, $parameters, 'personnel', $this->personnelRepository);
    $this->updateOptional($id, $parameters, 'availability', $this->availabilityRepo);
}

I'm not that cool with having a method that takes four arguments: one should aim for zero arguments, after all; and each additional argument beyond that is reducing yer code cleanliness, but in this case it's not part of the public interface and it's only used in one place, so I think it's OK. Also I've got rid of eight of the ten boolean expressions the original version had, and that is definitely a clean-code win.

And, of course, all the tests still run fine, and I've 100% coverage still.

Looking over the three solutions here, I do actually think this last one is more clear than the first, and we've not added any real complexity for complexity's sake. So I think I'll probably suggest this to the parties concerned. We won't apply it straight away as the first version of the code is written and released and working, but we'll bear it in mind should we come to maintain that code further, and perhaps as a technique to consider next time a similar situation comes up.

Also important to note that I'd not take this approach from the outset, if say we had only 2-3 conditions in that method. This really is just an "edge-case" improvement, and like I said I don't think it's worth diving in and changing unless we have reason to touch that code again. And like if we were still at the code review stage of the original code, I'd perhaps make this observation, but I'd not suggest we revisit the code before approving it. The code that stands is, after all, pretty much A-OK. It's important to not get too dogmatic about these things, and slow proceedings down just to make some OK code slightly more OK. "Good enough" is, well: good enough.

I've got another article brewing about the unit tests I wrote for this. I'll hope to tackle that during the week, but I've beer to drink this evening, so I won't be doing it then.

Righto.

--
Adam

Friday, 14 April 2017

I actively consider what I think ought to go into a controller

G'day:
This is one of these ones where I state the obvious (probably), and expose myself as a fraud in that it's taken me this long to reach this conclusion. I can only take solace in that I'm clearly in "good" company. Where "good" == "a lot".

Yesterday I was going a code review, and came across a controller method which made one of my eyebrows raise as it was about 40 lines long. Now this isn't intrinsically bad in context. I have seen (and still see!) controller methods that are hundreds and hundreds of lines long. In the past I've seen controllers tipping the 1000-line mark. Those examples are clearly just wrong. And the developers who wrote them and/or continue to maintain them should be shot. I'm not even sure I'm over-stating it with that sentence. Hmmm... I probably am I guess. But they should be squarely and actively admonished, anyhow.

So in the scheme of things, 40 lines ain't bad. Especially when we try to keep our code wrapped within 80-100chars, so a few of the statements were multi-line. There was about say a dozen statements in it altogether. Not so bad.

Before I go on - and not only because the dev concerned will be reading this article (once I send them the link... pretty sure they don't know this blog exists) - I hasten to add that this code is not by any means bad. I'm probably gonna go "close enough" on the code review, once one bit of the code is refactored at least into a private method. I am not having a go at the dev concerned here. They are just my muse for the day. And I happen to think they're pretty good at their job.

There was one bit of the controller that stood out to me and I went "nuh-uh, that's gotta go". There was a foreach loop just before the response was returned which performed a calculation. That's business logic and it has no place in a controller. Fortunately we have an appropriate service class which we can lift that out and re-home in, and slap some tests on it, and our code is a bit better and a bit more maintainable.

(NB: yeah... no tests... we have a policy of not testing controller methods, but that's with the caveat there's no logic in them. We followed the "don't test controller methods" bit, but not the "provided there's no logic in them" bit here).

However once I spot a problem, I immediately start looking for more, so I assessed the rest of the code there. The controller did seem too fat. I wanted to know if all the logic did actually belong in the controller, and my instinct was: probably not.

At the same time as writing this article I'm chatting on Slack about it, and me mate Mingo offered the perennial answer to the question "should this go in the controller?"

Isn't the answer always "No"?
~ mjhagen 2017

(yes, he cited the attribution too, so I'm including that ;-)

It's pleasingly pithy, and almost ubiquitously agreed with; but in a casually dismissive way that is seldom heeded, I think. The problem is it's alluding to a mindset one should adopt as a guideline, but clearly the answer isn't "no", so I think people dismiss it. If one says "one should never do x", and it's clear that "never" is slight hyperbole, people will acknowledge the hyperbole and stretch it back the other way as far as they see fit to accommodate their current whim.

With this in mind (retroactively... Mingo just mentioned that now, and the code review was yesterday, and this article was drafted in my head last night), I stopped to think about a rule of thumb as to when something ought to go in a controller.

Broadly speaking I feel there are only three types of logic that belong in a controller:

  1. a call to validate / sanitise all incoming arguments so they're fit for purpose for fulfilling the request;
  2. one (preferably) or more (and increasingly less preferably the more there are) calls to get data - using those incoming arguments - to build the response;
  3. a call to assemble that data for the response;
  4. returning the response.

Yeah that's 4. I was not counting the return statement. An example of this might be:

class UserController {
    function handleGet(rawArgs){
        args = validationService.validate(rawArgs)
        
        user = userService.get(args.id)
        publishedContent = contentService.getForId(args.id, args.startDate, args.endDate)
        twitterContent = twitterService.getForId(args.id, args.startDate, args.endDate)
        facebookContent = facebookService.getForId(args.id, args.startDate, args.endDate)
        
        response = {
            user = user,
            publishedContent = publishedContent,
            socialContent = {
                twitter = twitterContent,
                facebook = facebookContent
            }
        }
        return new Response(response)
    }
}

I think that's OK, even though there's four service calls. All the logic is busying itself with what it is to fulfil that request. The controller simply dictates what data needs returning, and gets that data. there is no business logic in there. It's all controller logic.

A pseudo-code example where we're adding business logic into this might be:

// ...
publishedContent = contentService.getForId(args.id, args.startDate, args.endDate)
topFiveArticles = publishedContent.slice(5)

// ...

response = {
    // ...
    articles = topFiveArticles,
    // ...
}

Here the slice call - even though it's a native array method - is still business logic. If the controller method was getTopFiveArticles or something, I'd go back to considering it controller logic. So it's a fine line.

Back to the code i was reviewing, it was doing 1&2, then it was going off-piste with that foreach loop, then doing 3&4. Ignoring the foreach, I was still itchy, but wasn't seeing why. It seemed like it was roughly analogous to my first example above.

So what's the problem? Why was the code still making me furrow my brow?

Well there was just too much code. That set-off a warning bell. I wanted to reduce the number of calls to various services (or repositories as is the case here: data from slightly different sources), but initially I couldn't really justify it. Then I looked closer.

The code in question was more like this (I've got rid of the date args to focus more):

class ContentController {
    function handleGet(rawArgs){
        args = validationService.validate(rawArgs)
        
        user = userService.get(args.id)
        publishedContent = contentService.getForId(args.id)
        
        twitterContent = twitterService.getForId(user.twitterId)
        facebookContent = facebookService.getForId(user.facebookId)
        
        response = {
            articles = publishedContent,
            socialContent = {
                twitter = twitterContent,
                facebook = facebookContent
            }
        }
        return new Response(response)
    }
}


I hasten to add the code I was reviewing is nothing to do with users and content and social media stuff, this is just an example which is analogous but would make more sense to people not in the context of our business.

The difference is subtle. The difference is that the decision that the IDs for the social content are stored in the user object is business logic. It's not controller logic to "know" that's how that data is stored. the user object here is not part of the data that composes the response; it's part of some business logic that dictates how to get the data.

And this does indeed break that second rule:

  • a call to validate / sanitise all incoming arguments so they're fit for purpose for fulfilling the request;
  • [...] calls to get data - using those incoming arguments - to build the response;

The call to get the social media content is not using those arguments. It's using a value from a separate call:
user = userService.get(args.id)
// ...
twitterContent = twitterService.getForId(user.twitterId)
facebookContent = facebookService.getForId(user.facebookId)


Another tell tale here is that user is not even part of the response. It's just used to get those "foreign keys".

My rule of thumb in those four points above is still correct. But one needs to pay special attention to the second point. Any time one's inclined to fetch data "indirectly", the indirection is business logic. It might not be a loop or some conditional or switch block or something more obvious that manipulates the data before it's returned, but that indirection is still business logic. And does not belong a the controller.

So how should that controller look? Move the content acquisition out into a service that has the explicit purpose of getting a user's content - from wherever - via some key:

class ContentController {
    function handleGet(rawArgs){
        args = validationService.validate(rawArgs)
        
        content = contentService.getById(args.id)
        
        response = {
            articles = content.published,
            socialContent = {
                twitter = content.twitter,
                facebook = content.facebook
            }
        }
        return new Response(response)
    }
}


This follows the second point to the letter, and there's now no business logic in the controller. Also the controller legitimately does not really need unit testing now, but the logic in the ContentService can be. Also the controller method is down to a size that would not raise any eyebrows.

This is not an earth-shattering conclusion by any measure, but I'm quite pleased to fine-tune my position here, and hopefully it can help drive discussion about this in code reviews in future.

It'd be great to hear anyone else's thoughts on this?

--
Adam