Showing posts with label Clean Code. Show all posts
Showing posts with label Clean Code. Show all posts

Monday, 24 May 2021

Code smells: a look at a switch statement

G'day:

There was a section in last week's Working Code Podcast: Book Club #1 Clean Code by "Uncle Bob" Martin (pt2) where the team were discussing switch statements being a code smell to avoid in OOP (this is at about the 28min mark; I can't find an audio stream of it that I can deep-link to though). I didn't think they quite nailed their understanding of it (sorry team, I don't mean that to sound patronising), so afterwards I asked Namesake if it might be useful if I wrote an article on switch as a code smell. He confirmed that it might've been more a case of mis-articulation than not getting it, but I ought to go ahead anyhow. So I decided to give it some thought.

Coincidentally, I happened to be looking at some of Adam's own code in his Semaphore project, and something I was looking at the test for was… a switch statement. So I decided to think about that.

I stress I said I'd think about it because I'm def on the learning curve with all this stuff, and whilst I've seen some really smell switch statements, and they're obvious, I can't say that I can reason through a good solution to every switch I see. This is an exercise in learning and thinking for me.

Here's the method with the switch in it:

private boolean function ruleMathIsTrue(required any userAttributeValue, required string operator, required any ruleValue){
    switch (arguments.operator){
        case '=':
        case '==':
            return arguments.userAttributeValue == arguments.ruleValue;
        case '!=':
            return arguments.userAttributeValue != arguments.ruleValue;
        case '<':
            return arguments.userAttributeValue < arguments.ruleValue;
        case '<=':
            return arguments.userAttributeValue <= arguments.ruleValue;
        case '>':
            return arguments.userAttributeValue > arguments.ruleValue;
        case '>=':
            return arguments.userAttributeValue >= arguments.ruleValue;
        case 'in':
            return arrayFindNoCase(arguments.ruleValue, arguments.userAttributeValue) != 0;
        default:
            return false;
    }
}

First up: this is not an egregious case at all. It's isolated in a private method rather than being dumped in the middle of some other logic, and that's excellent. The method is close enough to passing a check of the single-responsibility principle to me: it does combine both "which approach to take" with "and actually doing it", but it's a single - simple - expression each time, so that's cool.

What sticks out to me though is the repetition between the cases and the implementation:

They're mostly the same except the three edge-cases:

  • = needs to map to ==;
  • in, which needs a completely different sort of operation.
  • Instead of just throwing an exception if an unsupported operator is used, it just goes "aah… let's just be false" (and return false and throwing an exception are both equally edge-cases anyhow).

This makes me itchy.

One thing I will say for Adam's code, and that helps me in this refactoring exercise, is that he's got good testing of this method, so I am safe to refactor stuff, and when the tests pass I know I'm all good.


My first attempt at refactoring this takes the approach that a switch can often be re-implemented as a map: each case is a key; and the payload of the case is just some handler. This kinda makes the method into a factory method (kinda):

operationMap = {
    '=' : () => userAttributeValue == ruleValue,
    '==' : () => userAttributeValue == ruleValue,
    '!=' : () => userAttributeValue != ruleValue,
    '<' : () => userAttributeValue < ruleValue,
    '<=' : () => userAttributeValue <= ruleValue,
    '>' : () => userAttributeValue > ruleValue,
    '>=' : () => userAttributeValue >= ruleValue,
    'in' : () => ruleValue.findNoCase(userAttributeValue) != 0
};
return operationMap.keyExists(operator) ? operationMap[operator]() : false

OK so I have a map - lovely - but it's still got the duplication in it, and it might be slightly clever, but it's not really as clear as the switch.


Next I try to get rid of the duplication by dealing with each actual case in a specific way:

operator = operator == "=" ? "==" : operator;
supportedComparisonOperators = ["==","!=","<","<=",">",">="];
if (supportedComparisonOperators.find(operator)) {
    return evaluate("arguments.userAttributeValue #operator# arguments.ruleValue");
}
if (operator == "in") {
    return arrayFindNoCase(arguments.ruleValue, arguments.userAttributeValue);
}
return false;

This works, and gets rid of the duplication, but it's way less clear than the switch. And I was laughing at myself by the time I wrote this:

operator = operator == "=" ? "==" : operator

I realised I could get rid of most of the duplication even in the switch statement:

switch (arguments.operator){
    case "=":
        operator = "=="
    case '==':
    case '!=':
    case '<':
    case '<=':
    case '>':
    case '>=':
        return evaluate("arguments.userAttributeValue #operator# arguments.ruleValue");
    case 'in':
        return arrayFindNoCase(arguments.ruleValue, arguments.userAttributeValue) != 0;
    default:
        return false;
}

Plus I give myself bonus points for using evaluate in a non-rubbish situation. It's still a switch though, innit?


The last option I tried was a more actual polymorphic approach, but because I'm being lazy and CBA refactoring Adam's code to inject dependencies, and separate-out the factory from the implementations, it's not as nicely "single responsibility principle" as I'd like. Adam's method becomes this:

private boolean function ruleMathIsTrue(required any userAttributeValue, required string operator, required any ruleValue){
    return new BinaryOperatorComparisonEvaluator().evaluate(userAttributeValue, operator, ruleValue)
}

I've taken the responsibility for how to deal with the operators out of the FlagService class, and put it into its own class. All Adam's class needs to do now is to inject something that implements the equivalent of this BinaryOperatorComparisonEvaluator.evaluate interface, and stop caring about how to deal with it. Just ask it to deal with it.

The implementation of BinaryOperatorComparisonEvaluator is a hybrid of what we had earlier:

component {

    handlerMap = {
        '=' : (operand1, operand2) => compareUsingOperator(operand1, operand2, "=="),
        '==' : compareUsingOperator,
        '!=' : compareUsingOperator,
        '<' : compareUsingOperator,
        '<=' : compareUsingOperator,
        '>' : compareUsingOperator,
        '>=' : compareUsingOperator,
        'in' : inArray
    }

    function evaluate(operand1, operator, operand2) {
        return handlerMap.keyExists(operator) ? handlerMap[operator](operand1, operand2, operator) : false
    }

    private function compareUsingOperator(operand1, operand2, operator) {
        return evaluate("operand1 #operator# operand2")
    }

    private function inArray(operand1, operand2) {
        return operand2.findNoCase(operand1) > 0
    }
}

In a true polymorphic handling of this, instead of just mapping methods, the factory method / map would just give FlagService the correct object it needs to deal with the operator. But for the purposes of this exercise (and expedience), I'm hiding that away in the implementation of BinaryOperatorComparisonEvaluator itself. Just imagine compareUsingOperator and inArray are instances of specific classes, and you'll get the polymorphic idea. Even having the switch in here would be fine, because a factory method is one of the places where I think a switch is kinda legit.

One thing I do like about this handling is the "partial application" approach I'm taking to solve the = edge-case.

But do you know what? It's still not as clear as Adam's original switch. What I have enjoyed about this exercise is trying various different approaches to removing the smell, and all the things I tried had smells of their own, or - in the case of the last one - perhaps less smell, but the code just isn't as clear.


I'm hoping someone reading this goes "ah now, all you need to do is [this]" and comes up with a slicker solution.


I'm still going to look out for a different example of switch as a code smell. One of those situations where the switch is embedded in the middle of a block of code that then goes on to use the differing data each case prepares, and the code in each case being non-trivial. The extraction of those cases into separate methods in separate classes that all fulfil a relevant interface will make it clearer when to treat a switch as a smell, and solve it using polymorphism.

I think what we take from this is the knowledge that one ought not be too dogmatic about stamping out "smells" just cos some book says to. Definitely try the exercise (and definitely use TDD to write the first pass of your code so you can safely experiment with refactoring!), but if the end result ticks boxes for being "more pure", but it's at the same time less clear: know when to back out, and just run with the original. Minimum you'll be a better programmer for having taken yerself through the exercise.

Thanks to the Working Code Podcast crew for inspiring me to look at this, and particularly to Adam for letting me use his code as a discussion point.

Righto.

--
Adam

Monday, 19 April 2021

How (not to) apply a feature toggle in your code

G'day:

I've had these notes lying around for a while, but they were never interesting enough (to me) to flesh out into a full article. But feature toggling has been on my mind recently (OK, it's because of Working Code Podcast's coverage of same: "018: Feature Flags (Finally!)"), so I figured I'll see what I can do with it.

BTW you don't need to listen to the podcast first. It doesn't contextualise anything I say here (well maybe one thing down the bottom), it was just the catalyst for the decision to write this up. But go listen anyway. It's cool.

I see a lot of code where the dev seems to have though "OK, I need to alternate behaviour here based on one of the toggles our FeatureToggleService provides, which is based on the request the controller receives". They are looking at this code:

class SomeDao {

    SomeDao(connector){
        this.connector = connector
    }

    getRecords() {
        // lines of code
        // that get records
        // from this.connector
        // and does something with them
        // and then something else
        // and return them
    }
}

And the alternative behaviour is to get the records from somewhere else, process them slightly differently, but the end result is the same sorta data structure. They decide to do this:

class SomeDao {

    SomeDao(someConnector, someOtherConnector) {
        this.someConnector = someConnector
        this.someOtherConnector = someOtherConnector
    }

    getRecords(someFeatureFlag) {
        // lines of code
        if (someFeatureFlag) {
            // that get records the new way
            // from this.otherConnector
            // and does something different with them
        } else {
            // that get records the old way
            // from this.connector
            // and does something with them
        }
        // and then something else
        // and return them
    }
}    

So the code that calls that needs to know the feature flag value to pass to getRecords. So they update it:

class SomeRepository {

    getObjects(someFeatureFlag) {
        records = this.dao.getRecords(someFeatureFlag)
        
        // code to make objects from the record data
        
        return objects
    }
}

But yeah the repo method needs to know about the someFeatureFlag value now, so we update the service that calls it:

class SomeService {

    getBusinessObjects(someFeatureFlag) {
        // code that does some business logic before we can get the objects

        objects = this.repo.getRecords(someFeatureFlag)
        
        // other code that does some business logic after we've got the objects
        
        return stuffTheControllerNeeds
    }
}

And the controller that calls the service:

class SomeController {

    SomeController(featureFlagService) {
        this.featureFlagService = featureFlagService
    }

    fulfilSomeRequest(request) {
        // some request stuff

        someFeatureFlag = this.featureFlagService.getSomeFeatureFlag(basedOn=request)
        results = this.service.getBusinessObjects(someFeatureFlag)
        
        // use those results to make the response
        
        return response
    }
}

And then somewhere between the routing and the call to the controller method, there's an IoC container stitching everything together:

class SomeIocContainer {

    getSomeController() {
        return new SomeController(
            new SomeService(
                new SomeRepository(
                    new SomeDao(
                        new SomeConnector(),
                        new SomeOtherConnector()
                    )
                )
            ),
            new FeatureFlagService()
        )
    }
}

And we conclude we need to do that, because the controller is the only thing that knows about the request, and we need to the request to get the correct toggle value.

Hrm. But that's less than idea (no shit, Sherlock)… maybe we could give the DAO constructor the FeatureToggleService? We'd then still need to pass the request through the call stack. Deep breath:

class SomeIocContainer {

    getSomeController() {
        return new SomeController(
            new SomeService(
                new SomeRepository(
                    new SomeDao(
                        new SomeConnector(),
                        new SomeOtherConnector(),
                        new FeatureFlagService()
                    )
                )
            )
        )
    }
}

class SomeController {

    function fulfilSomeRequest(request) {
        // some request stuff

        results = service.getBusinessObjects(request)
        
        // use those results to make the response
        
        return response
    }
}

class SomeService {

    getBusinessObjects(request) {
        // code that does some business logic before we can get the objects

        objects = repo.getRecords(request)
        
        // other code that does some business logic after we've got the objects
        
        return stuffTheControllerNeeds
    }
}

class SomeRepository {

    getObjects(request) {
        records = dao.getRecords(request)
        
        // code to make objects from the record data
        
        return objects
    }
}

class SomeDao {

    SomeDao(someConnector, someOtherConnector, featureFlagService) {
        this.someConnector = someConnector
        this.someOtherConnector = someOtherConnector
        this.featureFlagService = featureFlagService
    }

    getRecords(request) {
        // lines of code
        if (this.featureFlagService.getSomeFeatureFlag(basedOn=request)) {
            // that get records the new way
            // from this.otherConnector
            // and does something different with them
        } else {
            // that get records the old way
            // from this.connector
            // and does something with them
        }
        // and then something else
        // and return them
    }
}

This is not an improvement. I'd go as far as to say it's worse because there's no way the request should ever leak out of its controller method.

And you might think I'm exaggerating here, but I have seen both those implementations in the past. I mean: I saw some today. And in the former - historical - cases the devs concerned sincerely went "it's not possible any other way: I need the toggle in the DAO, and the featureFlagService needs the request object to work out the toggle value".

I take the dev back to a pseudo-code version of their DAO method:

getRecords(something) {
    // lines of code
    if (something) {
        // that get records the new way
        // from this.otherConnector
        // and does something different with them
    } else { // something else
        // that get records the old way
        // from this.connector
        // and does something with them
    }
    // and then something else
    // and return them
}

The first red flag here is the else. Don't have elses in yer code. You don't need them, and they are a code smell. We extract that code:

getRecords(something) {
    // lines of code
    things = doTheThing(something)
    // and then something else
    // and return them
}

doTheThing(something){
    if (something) {
        // that get records the new way
        // from this.otherConnector
        // and does something different with them
        return them
    }
    // something else
    // that get records the old way
    // from this.connector
    // and does something with them
    return them
}

But this makes it more clear we're going the wrong way about this. doTheThing? How many things is it doing? Two possible things. And that is based on it doing a third thing: checking which of the two things it's supposed to do. So three things. How many things is any one bit of code supposed to do? One thing. And even if we refactor that doTheThing method some more:

doTheThing(something){
    if (something) {
        return getThemTheNewWay()
    }
    
    return getThemTheOldWay()
}

We've still got a DAO now that's doing things in two different ways; and it still needs to know how to make that decision. It's not the job of a DAO to check feature toggles. It's the job of a DAO to get data from storage. So even with a decent clean-code refactoring, it still smells from an architectural POV.

(Oh yeah and I happen to be using a DAO in this example, but the same applies to any sort of class. It's going to be its job to doTheThing, not make decisions about how to do thing based on a feature toggle).

How are we supposed to implement feature toggling in our code? In our object oriented code? Use OOP.

Use a factory method in the IOC container to get the correct DAO for the job:

class SomeIocContainer {

    getDao() {
        if (this.featureFlagService.getSomeFeatureFlag(basedOn=this.request)) {
            return SomeOtherDao(new SomeOtherConnector())
        }
        return SomeDao(new SomeConnector())
    }

    getSomeController() {
        return new SomeController(
            new SomeService(
                new SomeRepository(
                    getDao()
                )
            )
        )
    }
}

It is precisely the job of a factory method to make a call as to which implementation of a class to use. And it is precisely the job of an IOC container to compose services. Each are doing one job.

DAOs:

class SomeDao {

    SomeDao(connector){
        this.connector = connector
    }

    getRecords() {
        // lines of code
        records = getRecordsFromConnector()
        // and then something else
        // and return them
    }
    
    getRecordsFromConnector() {
        // that get records
        // from this.connector
        // and does something with them
    }
}

This is just a refactoring of the original method to pull some implementation into its own method. It has no idea about feature toggles, or new ways of doing thing.

class SomeOtherDao extends SomeDAO {
    
    getRecordsFromConnector() {
        // that get records the new way
        // from this.connector
        // and does something different with them
    }
}

This implementation only knows about the new way of doing things. Other than that it's the same DAO as the original. It also doesn't know anything about feature toggles.

One could put a case forward here that getRecordsFromConnector might be a member of a different class which is injected into the one original DAO. IE: favouring composition over inheritance. This is a case of "it depends", I think. And either way, that's just shifting the implementation: the mechanism is still the same.

Aside from the architectural improvements, when we come to remove the toggle, all we need to do is remove that one if statement, in an isolated and very simple factory method, nowhere near any implementation logic. That's it.

Bottom line, here are two things to think about when implementing your toggling:

  • If you are needing to refer to the toggle in more than one place - passing it around, or checking it more than once - you are probably doing something in a less than ideal fashion.
  • If you have the code that checks the toggle and the code that acts on the toggle in the same place, you are also probably not nailing it.

Maybe you are nailing it for the situation you are in. But you should definitely be asking yourself about your architecture and your OOP if you find yourself in either of those situations. And it's more than likely gonna be really easy to design the solution cleanly.


Oh and of course you need to bloody test your toggling. You're adding a thumping great decision into yer app. Feature toggling doesn't free you from testing, it increases your testing burden, because it's another moving part in your app. The clue is in the acceptance criteria: "it should do things this new way in this new situation" and "it should keep doing stuff the old way in the usual run of events". Those are your test cases. And by the very nature of the toggling approach, it's obviously a sitter for test automation rather than eyeballing it every time you do a release. Anyone who suggests feature toggling somehow makes their code more robust and requires less testing as a result (/me looks directly down camera at Ben) is having a laugh ;-)

And on that semi-in-joke note, that's enough out of me for today. Got tomorrow to get my brain ready for, and practising my line "just when I thought I was out…".

Righto.

--
Adam

Monday, 24 April 2017

A cyclomatic complexity challenge

G'day:
This one came up briefly in a Friday afternoon messenger conversation between meself and my colleague in Dublin, Carlos. He's doing some code maintenance, and come across a function in one of our classes for which PHPMD's cyclomatic complexity analyser is complaining at us, and he was wondering what - if anything - do do about it.

We usually heed PHPMD's advice: variable names too long or too short; methods too long; classes with too many methods etc. But I think our general feel for cyclomatic complexity issues is to treat things with a grain of salt.

Now I'm not gonna explain cyclomatic complexity if you don't know about it: you can just read about it on Wikipedia: Cyclomatic complexity. In short it's an approach to giving a numeric score the complexity of some logic based on how many different logic branches it contains.

The code I present in this article offers a passing resemblance to the code in our code base, but I have tweaked it to make it more clear as a stand-alone example than it would be if I just ripped the code lock-stock out of its context. This captures the intent and the complexity (in that "cyclomatic" sense) of the original code, but that's about it. "all the names have been changed to protect the innocent" as it were.

Anyway, here's an analogy of the code in question:

public function update(int $id, array $parameters): void
{
    $this->propertyRepository->update($id, $parameters);

    if (array_key_exists('media', $parameters) && is_array($parameters['media'])) {
        $this->mediaRepository->update($id, $parameters['media']);
    }
    if (array_key_exists('metadata', $parameters) && is_array($parameters['metadata'])) {
        $this->metadataRepository->update($id, $parameters['metadata']);
    }
    if (array_key_exists('social', $parameters) && is_array($parameters['social'])) {
        $this->socialRepository->update($id, $parameters['social']);
    }
    if (array_key_exists('personnel', $parameters) && is_array($parameters['personnel'])) {
        $this->personnelRepository->update($id, $parameters['personnel']);
    }
    if (array_key_exists('availability', $parameters) && is_array($parameters['availability'])) {
        $this->availabilityRepo->update($id, $parameters['availability']);
    }
}

It's innocuous enough. This is part of PropertyService which services the business logic for a REST endpoint that handles updates to a Property. I'm in the online accommodation booking business, remember, so here a "Property" is like a hotel or something.

The raw data has been validated as "not completely wrong, missing or dangerous" at this point, but is still just in a raw array - a transformation of the incoming JSON payload from the request.

An update has a required component: the basic property info, plus a number of optional components depending on the nature of the source request. I've completely changed these here from the original, but I'm pretending a property can also comprise separate media, metadata, social, personnel and availability components to the business object, all of which have different storage implementations, marshalled by their respective repositories.

Rightly or wrongly this is a generic update endpoint which is the staging ground for everything else going on, so we've got all this conditional processing based on the existence of the relevant sub-components of the object itself. This is not so extraordinary.

But because we have five conditional bits, and each condition has a compound expression, this tips the scales of our cyclomatic complexity threshold. if we only had four of those conditions, we'd be fine. Now: no-one is saying that "four is OK; five is bad", but what we are saying is "well... four is OK; five needs further assessment". We don't wanna get too dogmatic about this stuff.

Anyway, here's PHPStorm whining at me:


(I have no idea what NPath complexity is, but I assume it's just a different way of measuring the same thing).

As I indicated, we had a quick chat and thought "do you know what? We're probably OK with this code". Granted the complexity is there, but the code is easy enough to follow. Because the conditions are "the same" each time a lot fo the perceived complexity is just metric, not reality. We also thought any tactic to reduce the perceived complexity might result in more real-world complexity.

I decided to see what I could do with it over the weekend, as a basis for this article. And, obviosuly, if I came up with a good solution we could use it.

First things first, I made sure we had 100% logical test coverage on the existsing code. This means we're in a safe place to refactor should we decide to. The tests are long and boring, so I'll not include them here. They're all pretty straight fwd anyhow. Oh, I'll repet the test cases from the data provider anyhow:

return [
    'no optionals' => ['parameters' => []],
    'media is simple' => ['parameters' => ['media' => self::$mediaParameter]],
    'media is array' => ['parameters' => ['media' => [self::$mediaParameter]]],
    'metadata is simple' => ['parameters' => ['metadata' => self::$metadataParameter]],
    'metadata is array' => ['parameters' => ['metadata' => [self::$metadataParameter]]],
    'social is simple' => ['parameters' => ['social' => self::$socialParameter]],
    'social is array' => ['parameters' => ['social' => [self::$socialParameter]]],
    'personnel is simple' => ['parameters' => ['personnel' => self::$personnelParameter]],
    'personnel is array' => ['parameters' => ['personnel' => [self::$personnelParameter]]],
    'availability is simple' => ['parameters' => ['availability' => self::$availabilityParameter]],
    'availability is array' => ['parameters' => ['availability' => [self::$availabilityParameter]]],
    'all optionals' => ['parameters' => $allParameters]
];

The tests just make sure the appropriate repo is called given the presence/absence of the optional part(s) in the inputs.

BTW I was mindful that the complexity of the tests is also part of the complexity of the code. That's a lot of test variations for one method.

Anyhow, the first thing I spotted - pretty much immediately - is the conditional logic could be simplified if we had a function which was "key exists in array and its value is itself an array". This straight-away halves the conditional expressions in the function:

public function update(int $id, array $parameters): void
{
    $this->propertyRepository->update($id, $parameters);

    if ($this->keyExistsAndIsArray('media', $parameters)) {
        $this->mediaRepository->update($id, $parameters['media']);
    }
    if ($this->keyExistsAndIsArray('metadata', $parameters)) {
        $this->metadataRepository->update($id, $parameters['metadata']);
    }
    if ($this->keyExistsAndIsArray('social', $parameters)) {
        $this->socialRepository->update($id, $parameters['social']);
    }
    if ($this->keyExistsAndIsArray('personnel', $parameters)) {
        $this->personnelRepository->update($id, $parameters['personnel']);
    }
    if ($this->keyExistsAndIsArray('availability', $parameters)) {
        $this->availabilityRepo->update($id, $parameters['availability']);
    }
}

private function keyExistsAndIsArray(string $key, array $array) : bool
{
    return array_key_exists($key, $array) && is_array($array[$key]);
}

Because I've got full test coverage, I know this solution is analogous to the original one. And the tests also cover the private method 100% too.  And it is slightly clearer. Win.

Next I took a closer look again, and each conditional block is pretty much the same too. Just a different key, and a different repo to send the optional parameters too. So I could do this in another single private function too:

private function updateOptional($id, $parameters, $optional, $repository)
{
    if (array_key_exists($optional, $parameters) && is_array($parameters[$optional])) {
        $repository->update($id, $parameters[$optional]);
    }
}

This is just a homogenised version of each if statement from the original method. The original method is now greatly simplified:

public function update(int $id, array $parameters): void
{
    $this->propertyRepository->update($id, $parameters);

    $this->updateOptional($id, $parameters, 'media', $this->mediaRepository);
    $this->updateOptional($id, $parameters, 'metadata', $this->metadataRepository);
    $this->updateOptional($id, $parameters, 'social', $this->socialRepository);
    $this->updateOptional($id, $parameters, 'personnel', $this->personnelRepository);
    $this->updateOptional($id, $parameters, 'availability', $this->availabilityRepo);
}

I'm not that cool with having a method that takes four arguments: one should aim for zero arguments, after all; and each additional argument beyond that is reducing yer code cleanliness, but in this case it's not part of the public interface and it's only used in one place, so I think it's OK. Also I've got rid of eight of the ten boolean expressions the original version had, and that is definitely a clean-code win.

And, of course, all the tests still run fine, and I've 100% coverage still.

Looking over the three solutions here, I do actually think this last one is more clear than the first, and we've not added any real complexity for complexity's sake. So I think I'll probably suggest this to the parties concerned. We won't apply it straight away as the first version of the code is written and released and working, but we'll bear it in mind should we come to maintain that code further, and perhaps as a technique to consider next time a similar situation comes up.

Also important to note that I'd not take this approach from the outset, if say we had only 2-3 conditions in that method. This really is just an "edge-case" improvement, and like I said I don't think it's worth diving in and changing unless we have reason to touch that code again. And like if we were still at the code review stage of the original code, I'd perhaps make this observation, but I'd not suggest we revisit the code before approving it. The code that stands is, after all, pretty much A-OK. It's important to not get too dogmatic about these things, and slow proceedings down just to make some OK code slightly more OK. "Good enough" is, well: good enough.

I've got another article brewing about the unit tests I wrote for this. I'll hope to tackle that during the week, but I've beer to drink this evening, so I won't be doing it then.

Righto.

--
Adam

Thursday, 3 November 2016

PHP: different ways of defining one-off objects

G'day:
This one came from a Stack Overflow question I watched be raised, commented-on and summarily closed again because there are rules to follow, and by golly the community moderators on S/O looove following rules.

Anyway, the question got asked, but never answered.

'ere 'tis:

In PHP 7, what's the use of stdClass with the availability of anonymous classes?


As of PHP 7 we have access to anonymous classes. This means we could ditch stdClass when we need generic object as a return value... But which one would be cleaner?
  • What other differences are between those two guys?
  • Is there any performance drawback on creating an anonymous class? i.e. as each one has a different "class name", is there any additional impact?
  • Do both work "the same way" from the developer point of view, or are they actually two different beasts in some cases?
<?php
$std_obj = new stdClass(); //get_class($std_obj) == 'stdClass'
$anonymous = new class {}; //get_class($std_obj) == 'class@anonymous\0file.php0x758958a3s'

Good question. Well the bit about performance ain't, cos at this stage of one's exploration into these things - asking initial questions - one clearly doesn't have a performance issue to solve, so just don't worry about it.

OK, so I like messing around with this sort of thing, and I still need to learn a lot about PHP so I decided to refresh my memory of the various approaches here.

Firstly, the "anonymous classes" thing is a new feature in PHP 7, and allows one to model objects via a literal expression, much like how one can define a function via an expression (making a closure) rather than a statement. Here's an example:

$object = new class('Kate', 'Sheppard') {

    private $firstName;
    private $lastName;
    
    public function __construct($firstName, $lastName){
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }
    
    public function getFullName(){
        return "{$this->firstName} {$this->lastName}";
    }
};

See how go new class, and then proceed to define the class right there in the following block. Note this doesn't create a class... person1 does not contain a class; it creates an object. As evidenced by my subsequent usage of it:

echo $person1->getFullName();

Output:

Kate Sheppard

TBH, I dunno about the merits of this. In PHP one can already declare classes in the same file as other code, and more than one class in the same file. I guess it's handy for when one needs to define a one-off modelled object within another function or something? I dunno.

The other version mentioned was via stdclass, but let's not go there yet.

Now bear in mind the OP was asking about creating empty objects. One can do that with an anonymous class by just not giving the class definition any body:

$empty = new class(){};

Dumping that out gives:

object(class@anonymous)#1 (0) {
}

And with stdclass:

$empty = new stdClass();

Dumping that out gives:

object(stdClass)#1 (0) {
}

Not much to it. But I dunno if the intent in PHP is to use stdClass directly. The docs just say:

Created by typecasting to object.

It doesn't say "instantiate one of these to create an empty object". And that's where it'd say it, if that was the intent.

What it does allude to is this:

$empty = (object) [];

Dumping that out gives:

object(stdClass)#1 (0) {
}

(so same as with stdClass). To me the intended usage of stdClass is just one of inference. An array cast to an object needs to be of some type, so there's stdClass to accommodate that.

Also: how often does one want an empty object, and that's it? At least with casting an array to an object one can give it some properties off the bat:

$person = (object) [
    'firstName' => 'Kate,
    'lastName' => 'Sheppard'
];

One can add properties to the object afterwards with a new stdClass(), but that seems clunky.

Thinking about it, using new stdClass() seems about as right a way to do things as using array() to declare an array, as opposed to using a literal []. Bleah. Just use the literal.

But this all got me thinking though. Empty objects be damned. How can I implement the same as that anonymous class using these other syntaxes.

I managed to write some really crap code implementing analogues using stdclass and an object literal.

First the object literal, as it's not too awful:

$person2 = (object) [
    'firstName' => null,
    'lastName' => null,
    '__construct' => function ($firstName, $lastName) use (&$person2) {
        $person2->firstName = $firstName;
        $person2->lastName = $lastName;
    },
    'getFullName' => function () use (&$person2) {
        return "{$person2->firstName} {$person2->lastName}";
    }
];

It's not sooooo bad. Until one ones to try to use it. Check this out:

$person2->__construct('Emmeline', 'Pankhurst');
echo $person2->getFullName() . PHP_EOL;

This just gives:

PHP Fatal error:  Uncaught Error: Call to undefined method stdClass::__construct() in literalVsObjectCast.php:37
Stack trace:
#0 {main}
  thrown in literalVsObjectCast.php on line 37

Yikes. This turns out to be some weirdo order of operation things, and I can't quite remember what the story is, but it's down to how and what the function call is bound to the function, and the -> operator taking precedence over the () operator. Or something.

It's easily enough solved:

($person2->__construct)('Emmeline', 'Pankhurst');
echo ($person2->getFullName)() . PHP_EOL;

See how I've used parentheses there so the method calls are on the reference to the object/method. This works:

Emmeline Pankhurst

But it's unorthodox code, so I'll be avoiding that. Oh yeah, also notice the weirdness I had to do with the reference with the closure there. Plus having to reference the object's properties by this "external" reference. Still: good to be able to do it though.

We can go from "unorthodox" to "shit" now. There's the equiv using stdclass:

$person3 = new stdclass();
$person3->firstName = null;
$person3->initial = null;
$person3->lastName = null;
$person3->__construct = function ($firstName, $initial, $lastName) use (&$person3) {
    $person3->firstName = $firstName;
    $person3->initial = $initial;
    $person3->lastName = $lastName;
};
$person3->getFullName = function () use (&$person3) {
    return "{$person3->firstName} {$person3->initial}  {$person3->lastName}";
};

And again to run it, we need to use those extra parentheses:

($person3->__construct)('Susan', 'B', 'Anthony');
echo ($person3->getFullName)() . PHP_EOL;


Susan B Anthony

It's all a bit grim for words: the literal version was a bit awkward, but this as all the awkwardness as well as a lot of clutter (and, yes, this version also supports a middle initial, but that's not really the bulk of the bloat).

Another thing with these latter two versions is that - until the class literal version - firstName and lastName are public. I guess it's no big thing for throwaway objects, but it seems a bit "leaky" to me.

One of these JavaScript tricks we learn when trying to fit the oval peg into the round hole that is "doing OO in JS", is the Module Pattern: using an IIFE to emulate an object with private data. PHP 7 also added IIFEs, it already has closure, so we can do this in PHP too:

$person4 = (function($firstName, $lastName){
    return (object)[
        'getFullName' => function() use ($firstName, $lastName){
            return "$firstName $lastName";
        }
    ];
})('Vida', 'Goldstein');


And running it:

echo ($person4->getFullName)() . PHP_EOL;

Output:

Vida Goldstein

Now I reckon that's the tidiest syntax of all! Even beating the anonymous class version. But one is still stuck with those extra parentheses on the function call though. Pity.

In conclusion, I reckon this:

  • if you want an empty object... use (object)[]. Don't use a mechanism - anonymous classes - specifically intended to derive fully modelled objects, just to create empty objects.
  • If you want a throw-away object with a few public properties in it it: still use (object) []. It's tidier and less typing.
  • If you need some methods in there as well... go for the anonymous class literal; its usage syntax is more orthodox than using the IIFE.
  • But do remember PHP7 has IIFEs! They come in handy for things other than emulating JS design patterns.
  • I don't reckon one ought to directly use stdClass. It's clunky, leads to bloated code, and the docs kinda position it as a necessary implementation, not something that one's supposed to actively use.
  • But... generally speaking use a proper named class. These anonymous classes are not a replacement for them; they're for throw-away objects. You'll seldom need those.
  • Don't start an exercise worrying about performance. Write good clean code, using TDD, and worry about performance if it comes up. And differences like this are never gonna be where to make savings anyhow.

Thoughts? Obviously I'm just a relative n00b at PHP so I might be missing some important thing that seasoned PHPers all "get" but I've not come across yet. Lemme know.

Righto.

--
Adam

Thursday, 7 April 2016

Maintaining untested code, and mocking too much when backfilling test (with JavaScript & Jasmine)

G'day:
Here's another example of me doing something wrong. Well: there's a litany of wrongness in this exercise, and I'll show the whole thing from start to finish. But this all stems from me ballsing something up.

Caveat

Be warned: there's a lot of code in this one, but the main point is the volume of the code I found necessary to write to test and fix a tiny bug. You don't need to worry too much about the content of the code... the important part is the line number at the bottom of the file.

The background to this is that I was helping a former colleague with some CFML code ("just when I thought I was out..."), and it was whilst reconfiguring stuff, I got hit by a trivial bug... the resolution to which is the basis of this article. I could not use their actual code in the article here due to NDA reasons, but I also didn't want to be discussing CFML either. So I've undertaken to "refactor" the entire thing into JavaScript instead (I don't mean their code; I mean a version of their logic which I will discuss here). The JavaScript code closely resembles the actual code I was looking at, but obviously some of the approaches to doing things needed to change, and I've repurposed some of the elements of it to better suit what I'm discussing. The logic and "code distribution" is about a 95% match to the original case though, I think. Also I'm using Jasmine to do JavaScript testing in this example, and I was using a CFML-based solution for the original code. This is irrelevant to the situation as both work mostly the same way, and produce similar amounts of code for the equivalent testing and mocking tasks.

I will start by saying that the code is a product of its time and their coding policies at that time. I'm happy to say they've since revised their approach to things, and are fine-tuning it as they go. the code base - like all others - is an evolving beast. And indeed to be fair to them: it's a testament to their evolving coding process that I'm able to write this article.

Background

This code comes some a project which is built with an MVC framework: the details of which are irrelevant to this article. They also use a dependency injection container to handle their dependencies.

The code in this article is part of the TranslatorProvider, which creates a TranslationService, and loads it with translations originally stored in some (optionally cached) data repository. The implementation details of these are not relevant here. These translations are for on-screen labels and other text. The bit of code I'm focusing on in this article is the logic to load the translations from storage. They don't quite follow the service / repository model I have presented here, but it's the same beast by different names, and the service / repository model suits me better so that's how I've labelled things here.

My mate's organisation produces numerous small websites, all of which are multi-lingual, and all share this code.

New site

They're in the process of building a new site, which is where I came in. We'd been piggy-backing onto the translation dictionaries for one of the existing sites whilst we were protoyping. the onscreen-labels we need to translate follow the same general theme, so it's workable - during development - to use one site's translations for another site, to a point (some translations don't work, but that's OK). It took a while to prepare the dictionaries for the new site (a different team was doing that), but when they came back from the translator I switched the dictionary bundles over.

I did this, and the whole site broke. Switched the dictionary back to the older one: site worked again. Switch it forward: site broke. OK: there's something wrong with our translations. Oh, I switched the translations on another one of their test sites - one I knew in which all the work was sound (ie: not written by me) - and that site broke, Harrumph. What is it about some translation dictionaries that can break a site?

Bug

I dug into the translator code and quickly (well: it took a few hours) found the problem:

var translations;

// ...

for (var key in rawTranslations.primary){
    translations.primary = translations.primary || {};
    translations.primary[key] = rawTranslations.primary[key];
}

for (var key in rawTranslations.secondary){
    translations.secondary = translations.secondary || {};
    translations.secondary[key] = rawTranslations.secondary[key];
}

var ttl = config.ttl;

cacheService.put(cacheKeyPrimary, translations.primary, ttl);
cacheService.put(cacheKeySecondary, translations.secondary, ttl);


Can you see it?

What happens if - as it is in our case - one hasn't got any secondary translations? translations.secondary never gets created, so that last line will break.

The fix is simply to initialise that object outside of the loop:

translations = {primary : {}, secondary : {}};

Simple.

Well except for one thing. No test coverage. This code was written before they had a TDD policy, so it's got no tests at all. Note that this bug never would have happened had they used TDD when developing this code, because the very first test would have been with empty arrays. I made sure to point this out to them, but they rightfully pointed out when it was written and they now do have a full TDD policy. And that comes into play for my fix: I needed to use TDD for implementing my fix.

Back-fill

My policy with maintaining untested code using our new TDD approach is that I will not back-fill all missing tests. I will only implement the minimum testing needed for the case I'm fixing. if I have time I'll make shell tests with a "skip" in them so the missing coverage gets flagged in the test run, but I'll not bother too much other than that.

So in this case all I need to test is that if the secondary object is null, that null gets send to the cache. Simple.

Or is it?

Unclean code

Unfortunately this code was also written before the era of "write Clean Code", so the function itself is doing way too much. It's 50 lines long for a start, which is a red flag, but one can easily see different sections of processing in it too, so it violates the "functions should do one thing" rule. This becomes a problem because even to test my wee fix, I need to make sure the preceding 40-odd lines of code don't break or do the wrong thing before I get to my line of code. And as this code leverages a fair number of dependencies, there's a bunch of mocking work to do before I can even get to what I want to test. Note: the dependencies in the original code were handled differently in that they passed the DI container into this provider, and just plucked out whatever they need when they need it. I cannot bring myself to show code like that (even for a blog article), so I've broken them down into multiple dependencies with specific roles. The logic is the same, and it produces the same amount of code (both sample code and test code).

If the code was Clean and I was just looking at a function that map the source objects(s) into the target objects(s), then my test job would be easy. But no.

Let's look at what stands between me and where I can test the results of my line of code. I've highlighted what needs mocking:

if (!requestService.isTranslatorEnabled()){
    return;
}

var currentLocale = requestService.getLocale();

var cacheKeyPrimary = getCacheKey("primary", currentLocale);
var cacheKeySecondary = getCacheKey("secondary", currentLocale);

var okToGetFromCache = cacheService.isActive()
    && cacheService.exists(cacheKeyPrimary)
    && cacheService.exists(cacheKeySecondary);

var translations = {};
if (okToGetFromCache) {
    translations.primary = cacheService.get(cacheKeyPrimary);
    translations.secondary = cacheService.get(cacheKeySecondary);
}else{
    var rawTranslations = {
        primary : translationRepository.loadBundle("primary", currentLocale),
        secondary : translationRepository.loadBundle("secondary", currentLocale)
    };

    for (var key in rawTranslations.primary){
        translations.primary = translations.primary || {};
        translations.primary[key] = rawTranslations.primary[key];
    }

    for (var key in rawTranslations.secondary){
        translations.secondary = translations.secondary || {};
        translations.secondary[key] = rawTranslations.secondary[key];
    }

    var ttl = config.ttl;

    cacheService.put(cacheKeyPrimary, translations.primary, ttl);
    // [...]

  1. requestService.isTranslatorEnabled
  2. requestService.getLocale
  3. cacheService.isActive
  4. translationRepository.loadBundle
  5. config.ttl
  6. cacheService.put
But that's not all. The function has to complete without error. So I also need to mock this lot:

var translationService = translationFactory.getTranslator();
var arrayLoader = translationFactory.getArrayLoader();

translationService.load(arrayLoader.load(translations.primary), 'primary', currentLocale);
translationService.load(arrayLoader.load(translations.secondary), 'secondary', currentLocale);

  1. translationFactory.getTranslator
  2. translationFactory.getArrayLoader
  3. arrayLoader.load
  4. translationService.load

What a nightmare. If this code was written cleanly, I'd simply have to test a method that takes a coupla arguments and returns a value. Simple.

Monday, 28 March 2016

Another example of premature optimisation

G'day:
I initially posted this to Stack Overflow, but with a mind to releasing it here too. It's worth repeating.

The original question was along these lines. Given two approaches to making two decisions, which is the most performant:

function f(){

    // other code redacted

    if (firstBooleanValue && secondBooleanValue){
        return true;
    }
    return false;
}

Or:
function f(){

    // other code redacted

    if (firstBooleanValue){
        if (secondBooleanValue){
            return true;
        }
    }
    return false;
}

Someone else answered with a loop-de-loop amplification test to demonstrate [something, I don't care which way it came out, and nor should you], but they did caveat it with "it doesn't matter".

My answer was as follows:

I know @tom starts his answer by saying "it doesn't matter", but this cannot be stressed enough, so I'm going to elaborate further in a separate answer.

@piotr-gajdowski: just don't worry about this sort of thing. If one needs to loop thousands of times to see a difference, this implies it's a case of micro-optimisation. The differences here will be lost in the background noise of any other consideration going on at the same time, for example:
  • what else the server is doing at the time;
  • network latency variation;
  • how busy external dependencies like DBs are at the time.
So... it simply doesn't matter.

Always go for the code that you find clearest, and - more importantly - that other people will find the clearest. This generally means the simplest.

For your specific situation, given you given us "fake" code, it's impossible even to say given your example what that might be:
  • how complex are the two conditions?
  • Are they really just single variables?
  • Do the variables have explanatory names?
  • If they conditions are actually boolean expressions rather than variables... consider putting them in well-named intermediary variables.

These are the things you should be worried about.

Write your code as clearly as possible (this doesn't not necessarily mean as tersely as possible, btw). If at some point you find you have performance issues, then (and only then) start looking for optimisations. But don't start looking for this sort of optimisation. Start looking for things that will make a difference.

I think the best thing you could do with your code example would be to submit the real world code to Code Review, and then we can help you come up with the clearest approach to coding it. That's the question you should be asking here.
If all we had to go on was the example provided, then it's difficult to say what the clearest approach would be here. The variable names are nonsense in the posted code, so one cannot infer much from them.

If say we are talking about two variables, and they're clearly named, this would be a clear option:

function canAttendMeeting(){

    // other code redacted

    return hasFreeTime && isInOffice;
}

If however there's no intermediary descriptive variables hasFreeTime and isInOffice, then don't do this:

function canAttendMeeting(){

    // other code redacted

    return (meetingStart > dayBegin & meetingStart < datEnd & withinFreePeriod(meetingStart, meetingEnd)) && (person.locationAtTime(meetingStart, meetingEnd) == "office");
}

Stick 'em in intermediary variables.

If the logic to derive the second boolean expression is somehow predicated on the first condition condition being true, and short circuiting won't work, then I'd consider taking an early exit over a nested if:

function f(){

    // other code redacted

    if (not firstBooleanValue){
        return false;
    }
    if (secondBooleanValue){
        return true;
    }
    return false;
}

All things being equal, I attempt to keep my code as close to the left margin as possible. If I have to scan for the end of a logic block, rather than just see it at the same time as the top of the logic block, I've probably got a chance to simplify the logic. I am always a bit wary of of having a negative expression in a decision statement, so if poss I'd try to reverse the name of an intermediary variable:

function canAttendMeeting(){

    // other code redacted

    if (doesNotHaveFreeTime) {
        return false;
    }

    // other code only possible if the person has free time
    
    return isInOffice;
}

And there will be other variations which are more or less viable given the circumstances. But generally the consideration is code clarity, not micro-optimising a situation not yet known to need it. And in that case, one needs to consider things a bit more clearly than how to order if statements.

Righto.

--
Adam

Wednesday, 18 February 2015

Clean Code

G'day:
Not something by me, but just a heads-up to go read one of Gav's blog articles: "CommandBox - Commands can be Clean Code Too". Don't worry about the fact it's about *Box stuff, that's irrelevant. It's a good practical demonstration of applying Clean Code concepts to CFML code.

--
Adam

Monday, 1 December 2014

Hammers and screws (and when not to use a list in CFML)

G'day:
I hope Salted (/Ross) and Ryan don't mind me pinching our IRC conversation for a blog article, but... well... too bad if they do. It's done now. This is it.

Ross was asking "whats the most efficient way to remove the first two items in a list and get back the remainder", which made my eyes narrow and decide there's probably something to get to the bottom of here. Because he mentioned "lists". And these should - in general - be avoided in CFML.



After some back and forth with Ryan, Ross settled on:

listtoarray(foo," "), arrayslice(bar,3), arraytolist(bar," ")


(pseudo code, obviously).

Now... I think lists are overused in CFML, and - indeed - they are a bit rubbish. Having a string of data and then using one character (OK, or "some characters ~") from the string as a delimiter is seldom going to work, especially with the default delimiter of comma. Because in really a lot of situations, commas are actually part of the data. And same with any other char-based delimiter (except say characters specifically intended to be delimiters like characters 28-31 (file, group record and unit separators, respectively). I really don't think lists should generally be part of a solution to anything.

With this in mind, I decided to check what was going on. This is copy-and-pasted from the IRC log (spelling and formatting addressed, otherwise it's unabridged):

16:36 <adam_cameron> Salted: first question that should be asked... why is it a list in the first place? Do you have any control over it?
16:36 <Salted> no, it's returned from a postcode lookup service
16:36 <Salted> I'm fairly certain that the address string will always have the postcode first so [p1] [pt2] [line1]
16:37 <Salted> it's not a list its a string but I'm treating it as a space delimited list for these purposes...
16:37 <adam_cameron> so it's a string? Not a list per se?
16:37 <adam_cameron> ok
16:37 <Salted> indeed
16:37 <adam_cameron> So what you actually want is everything after the second space?
16:37 <Salted> essentially yes
16:37 <adam_cameron> Not a list with the first two elements dropped off
16:37 <Salted> in this instance what you've mentioned is one and the same, if I treat it as a list
16:38 <adam_cameron> well literally, yes. However once you start treating it as a list, you're thinking about it wrong
16:39 <Salted> I don't see how thats different from thinking of strings as char arrays, for example
16:40 <Salted> "my name is ross" is both a char array of 14 elements and a list of 4 elements
16:40 <adam_cameron> because you start trying to use list functions on it
16:40 <adam_cameron> Which makes your code shit


My point here - one that Ross doesn't entirely agree with - is that just because one can use a string as a list, doesn't mean - semantically - it is a list. And one should try to keep one's code as semantically clear as possible.

What Ross has here is not a space-separated list. Later in the conversation I extracted form him what the actual data was like, and this was a sample:

X1 2YZ Apartment 1903, 10 Random Street Suburbia, OURTOWN


What we have here is... a mess. It can be assessed in a number of ways:

  • a post code part followed by an address part
  • a post code followed by a street address followed by a city
  • a post code followed by the street part of the address, the suburb part of the address, and the city part of the address
But what it isn't is a space- (or comma-, or anything-) separated list. So one should not solve this problem using lists.

One should look at the actual requirement, and solve that.