Monday 24 April 2017

A cyclomatic complexity challenge

G'day:
This one came up briefly in a Friday afternoon messenger conversation between meself and my colleague in Dublin, Carlos. He's doing some code maintenance, and come across a function in one of our classes for which PHPMD's cyclomatic complexity analyser is complaining at us, and he was wondering what - if anything - do do about it.

We usually heed PHPMD's advice: variable names too long or too short; methods too long; classes with too many methods etc. But I think our general feel for cyclomatic complexity issues is to treat things with a grain of salt.

Now I'm not gonna explain cyclomatic complexity if you don't know about it: you can just read about it on Wikipedia: Cyclomatic complexity. In short it's an approach to giving a numeric score the complexity of some logic based on how many different logic branches it contains.

The code I present in this article offers a passing resemblance to the code in our code base, but I have tweaked it to make it more clear as a stand-alone example than it would be if I just ripped the code lock-stock out of its context. This captures the intent and the complexity (in that "cyclomatic" sense) of the original code, but that's about it. "all the names have been changed to protect the innocent" as it were.

Anyway, here's an analogy of the code in question:

public function update(int $id, array $parameters): void
{
    $this->propertyRepository->update($id, $parameters);

    if (array_key_exists('media', $parameters) && is_array($parameters['media'])) {
        $this->mediaRepository->update($id, $parameters['media']);
    }
    if (array_key_exists('metadata', $parameters) && is_array($parameters['metadata'])) {
        $this->metadataRepository->update($id, $parameters['metadata']);
    }
    if (array_key_exists('social', $parameters) && is_array($parameters['social'])) {
        $this->socialRepository->update($id, $parameters['social']);
    }
    if (array_key_exists('personnel', $parameters) && is_array($parameters['personnel'])) {
        $this->personnelRepository->update($id, $parameters['personnel']);
    }
    if (array_key_exists('availability', $parameters) && is_array($parameters['availability'])) {
        $this->availabilityRepo->update($id, $parameters['availability']);
    }
}

It's innocuous enough. This is part of PropertyService which services the business logic for a REST endpoint that handles updates to a Property. I'm in the online accommodation booking business, remember, so here a "Property" is like a hotel or something.

The raw data has been validated as "not completely wrong, missing or dangerous" at this point, but is still just in a raw array - a transformation of the incoming JSON payload from the request.

An update has a required component: the basic property info, plus a number of optional components depending on the nature of the source request. I've completely changed these here from the original, but I'm pretending a property can also comprise separate media, metadata, social, personnel and availability components to the business object, all of which have different storage implementations, marshalled by their respective repositories.

Rightly or wrongly this is a generic update endpoint which is the staging ground for everything else going on, so we've got all this conditional processing based on the existence of the relevant sub-components of the object itself. This is not so extraordinary.

But because we have five conditional bits, and each condition has a compound expression, this tips the scales of our cyclomatic complexity threshold. if we only had four of those conditions, we'd be fine. Now: no-one is saying that "four is OK; five is bad", but what we are saying is "well... four is OK; five needs further assessment". We don't wanna get too dogmatic about this stuff.

Anyway, here's PHPStorm whining at me:


(I have no idea what NPath complexity is, but I assume it's just a different way of measuring the same thing).

As I indicated, we had a quick chat and thought "do you know what? We're probably OK with this code". Granted the complexity is there, but the code is easy enough to follow. Because the conditions are "the same" each time a lot fo the perceived complexity is just metric, not reality. We also thought any tactic to reduce the perceived complexity might result in more real-world complexity.

I decided to see what I could do with it over the weekend, as a basis for this article. And, obviosuly, if I came up with a good solution we could use it.

First things first, I made sure we had 100% logical test coverage on the existsing code. This means we're in a safe place to refactor should we decide to. The tests are long and boring, so I'll not include them here. They're all pretty straight fwd anyhow. Oh, I'll repet the test cases from the data provider anyhow:

return [
    'no optionals' => ['parameters' => []],
    'media is simple' => ['parameters' => ['media' => self::$mediaParameter]],
    'media is array' => ['parameters' => ['media' => [self::$mediaParameter]]],
    'metadata is simple' => ['parameters' => ['metadata' => self::$metadataParameter]],
    'metadata is array' => ['parameters' => ['metadata' => [self::$metadataParameter]]],
    'social is simple' => ['parameters' => ['social' => self::$socialParameter]],
    'social is array' => ['parameters' => ['social' => [self::$socialParameter]]],
    'personnel is simple' => ['parameters' => ['personnel' => self::$personnelParameter]],
    'personnel is array' => ['parameters' => ['personnel' => [self::$personnelParameter]]],
    'availability is simple' => ['parameters' => ['availability' => self::$availabilityParameter]],
    'availability is array' => ['parameters' => ['availability' => [self::$availabilityParameter]]],
    'all optionals' => ['parameters' => $allParameters]
];

The tests just make sure the appropriate repo is called given the presence/absence of the optional part(s) in the inputs.

BTW I was mindful that the complexity of the tests is also part of the complexity of the code. That's a lot of test variations for one method.

Anyhow, the first thing I spotted - pretty much immediately - is the conditional logic could be simplified if we had a function which was "key exists in array and its value is itself an array". This straight-away halves the conditional expressions in the function:

public function update(int $id, array $parameters): void
{
    $this->propertyRepository->update($id, $parameters);

    if ($this->keyExistsAndIsArray('media', $parameters)) {
        $this->mediaRepository->update($id, $parameters['media']);
    }
    if ($this->keyExistsAndIsArray('metadata', $parameters)) {
        $this->metadataRepository->update($id, $parameters['metadata']);
    }
    if ($this->keyExistsAndIsArray('social', $parameters)) {
        $this->socialRepository->update($id, $parameters['social']);
    }
    if ($this->keyExistsAndIsArray('personnel', $parameters)) {
        $this->personnelRepository->update($id, $parameters['personnel']);
    }
    if ($this->keyExistsAndIsArray('availability', $parameters)) {
        $this->availabilityRepo->update($id, $parameters['availability']);
    }
}

private function keyExistsAndIsArray(string $key, array $array) : bool
{
    return array_key_exists($key, $array) && is_array($array[$key]);
}

Because I've got full test coverage, I know this solution is analogous to the original one. And the tests also cover the private method 100% too.  And it is slightly clearer. Win.

Next I took a closer look again, and each conditional block is pretty much the same too. Just a different key, and a different repo to send the optional parameters too. So I could do this in another single private function too:

private function updateOptional($id, $parameters, $optional, $repository)
{
    if (array_key_exists($optional, $parameters) && is_array($parameters[$optional])) {
        $repository->update($id, $parameters[$optional]);
    }
}

This is just a homogenised version of each if statement from the original method. The original method is now greatly simplified:

public function update(int $id, array $parameters): void
{
    $this->propertyRepository->update($id, $parameters);

    $this->updateOptional($id, $parameters, 'media', $this->mediaRepository);
    $this->updateOptional($id, $parameters, 'metadata', $this->metadataRepository);
    $this->updateOptional($id, $parameters, 'social', $this->socialRepository);
    $this->updateOptional($id, $parameters, 'personnel', $this->personnelRepository);
    $this->updateOptional($id, $parameters, 'availability', $this->availabilityRepo);
}

I'm not that cool with having a method that takes four arguments: one should aim for zero arguments, after all; and each additional argument beyond that is reducing yer code cleanliness, but in this case it's not part of the public interface and it's only used in one place, so I think it's OK. Also I've got rid of eight of the ten boolean expressions the original version had, and that is definitely a clean-code win.

And, of course, all the tests still run fine, and I've 100% coverage still.

Looking over the three solutions here, I do actually think this last one is more clear than the first, and we've not added any real complexity for complexity's sake. So I think I'll probably suggest this to the parties concerned. We won't apply it straight away as the first version of the code is written and released and working, but we'll bear it in mind should we come to maintain that code further, and perhaps as a technique to consider next time a similar situation comes up.

Also important to note that I'd not take this approach from the outset, if say we had only 2-3 conditions in that method. This really is just an "edge-case" improvement, and like I said I don't think it's worth diving in and changing unless we have reason to touch that code again. And like if we were still at the code review stage of the original code, I'd perhaps make this observation, but I'd not suggest we revisit the code before approving it. The code that stands is, after all, pretty much A-OK. It's important to not get too dogmatic about these things, and slow proceedings down just to make some OK code slightly more OK. "Good enough" is, well: good enough.

I've got another article brewing about the unit tests I wrote for this. I'll hope to tackle that during the week, but I've beer to drink this evening, so I won't be doing it then.

Righto.

--
Adam