Showing posts with label Design Patterns. Show all posts
Showing posts with label Design Patterns. Show all posts

Tuesday 20 September 2022

CFWheels: a recommendation for their dev team

 G'day

I'm going to pass this on to the CFWheels Dev Team, but it's applicable in anyone's code, so posting it here first.

A lot of the code in the CFWheels codebase hasn't really been "designed". It's been written as if the developer concerned just went "OK, I have this file open… I'll start typing code in this file…", rather than giving thought to the notions of design patterns or the SOLID principles and stuff like that that exist to keep codebases… usable.

It's legacy code, and we have all written code like this in the past, so let's not dwell too much on how the codebase got to where it is now. Let's just accept that it could be better than it is.

But It's 2022 now, and there's no real excuse for perpetuating coding practices like this.

An example in front of me ATM is the onApplicationStart method. It is 1000 lines long. That is ludicrous.

The design problem in this particular method (and this is the same for a lot of code in the CFWheels codebase) is that it confuses "the things I need to get done", with "how to do those things". It's onApplicationStart's job to… start the application. There are numerous steps to this, but instead of it just calling all the steps it needs to call to define "starting the application", it also has all the implementation of the steps inline in it as well. This is an anti-pattern.

One can even see what each of the steps are. There are comments identifying them. In fact the first two are done "right":

// Abort if called from incorrect file.
$abortInvalidRequest();
// Setup the CFWheels storage struct for the current request.
$initializeRequestScope();

(Except the comments here are pointless as they simply repeat what the method names already say clearly).

But then the wheels (sic) fall off:

if (StructKeyExists(application, "wheels")) {
    // Set or reset all settings but make sure to pass along the reload password between forced reloads with "reload=x".
    if (StructKeyExists(application.wheels, "reloadPassword")) {
        local.oldReloadPassword = application.wheels.reloadPassword;
    }
    // Check old environment for environment switch
    if (StructKeyExists(application.wheels, "allowEnvironmentSwitchViaUrl")) {
        local.allowEnvironmentSwitchViaUrl = application.wheels.allowEnvironmentSwitchViaUrl;
        local.oldEnvironment = application.wheels.environment;
    }
}
application.$wheels = {};
if (StructKeyExists(local, "oldReloadPassword")) {
    application.$wheels.reloadPassword = local.oldReloadPassword;
}

That could should not be inline. There should be a method call inline, called something like configureReloadOptions (or whatever this is doing… a case in point is that I can't really tell).

The thing is onApplicationStart should not be defining the implementation of this, something else should be doing that, and onApplicationStart should just know how to call it.

Why is this an issue?

Because for us, one of the sections of processing in onApplicationStart is wrong, and sufficiently wrong it's been flagged up as a pen vector.

If each part of the application-start-up sequence was in its own functions, then we could override the function with our own implementation to address the issue. And the change would be local to that one small function, so the testing surface area would be small. But instead I now have to hack a third-party codebase to solve this. It's 100s of lines into this method, and accordingly its test surface area is huge (and largely unreachable, I think).

This is basic Open/Closed Principle stuff. The O in SOLID.

My recommendation / rule of thumb is that whenever one is tempted to put a comment in that explains a block of code (or delimit it from other blocks), then that block of code should be in its own function. If one follows this rule of thumb, one seldom gets into the situation that one has Open/Closed Principle issues. It's not foolproof, but it's a good start.

Template Method design pattern

2023-01-31

As Ben points out in the comments, this is an example of the Template Method design pattern at work. So it's not like I am advocating some esoteric approach that I made up myself, it's an established design pattern we ought to be using.

Cheers for the heads-up Ben.

Also having a code design rule that functions should start to raise a flag if they are over 20 lines long. They are probably doing too much, or aren't factored well. If they are more than 50 lines? Stop. Recode them.

If a function has unrelated flow-control blocks in them? Also a red flag. Each block is likely more appropriate to be in their own functions.

A function like onApplicationStart that needs to carry-out a _lot_ of steps to consider the app to be "started" should look something like this:

function onApplicationStart() {
    doTheThing()
    doAnotherThing()
    doSomethingElse()
    doThisThingRelatingToStuff()
    doOtherStuff()
    doThis()
    doThat()
   // ...
}

It really shouldn't be implementing any stuff or things or this or that itself.

Lastly, someone drew my attention to this statement on the CFWheels home page:

Good Organization

Stop thinking about how to organize your code and deal with your business specific problems instead.

This possibly explains a lot of the way the CFWheels codebase got to be the way it is. It is really bad advice. One should always be thinking about how one's application ought to be designed. It's important.

NB: this is not just related to onApplicationStart. That's just what currently has me sunk. It relates to really a lot of the code in the codebase.

This is also something that can be tackled in a piecemeal fashion. If one has to maintain some code in a really long method: extract the block of code in question into its own function, and maintain (and test that).

Righto.

--
Adam

Monday 18 April 2022

CFML: Adding a LogBox logger to a CFWheels app via dependency injection

G'day:

This follows on from CFML: implementing dependency injection in a CFWheels web site. In that article I got the DI working, but only with a test scenario. For the sake of completeness, I'm gonna continue with the whole point of the exercise, which is getting a logger service into my model objects, via DI.


0.6 adding LogBox

I ran into some "issues" yesterday whilst trying to write this article. Documented here: "A day in the life of trying to write a blog article in the CFML ecosystem", and there's some file changes committed as 0.5.1. So that killed my motivation to continue this work once I'd waded through that. I appear to be back on track now though.

Right so I'll add a quick test to check LogBox is installed. It's a bit of a contrived no-brainer, but this is playing to my strengths. So be it:

describe("Tests the LogBox install", () => {
    it("is instantiable", () => {
        expect(() => getComponentMetadata("LogBox")).notToThrow(regex="^.*can't find component \[LogBox\].*$")
    })
})

If LogBox is where it's supposed to be: it'll pass. Initially I had a new LogBox() in there, but it needs some config to work, and that requires a bit of horsing around, so I'll deal with that next. For now: is it installed? Test sez "no":

Test sez… yes.

OK. That was unexpected. Why did that pass? I have checked that LogBox is not installed, so WTH??

After a coupla hours of horsing about looking at TestBox code, I worked out there's a logic… um… shortfall (shall I say) in its implementation of that regex param, which is a bit wayward. The code in question is this (from /system/Assertion.cfc):

 if (
    len( arguments.regex ) AND
    (
        !arrayLen( reMatchNoCase( arguments.regex, e.message ) )
        OR
        !arrayLen( reMatchNoCase( arguments.regex, e.detail ) )
    )
) {
    return this;
}

Basically this requires both of message and detail to not match the regex for it to be considered "the same" exception. This is a bit rigorous as it's really unlikely for this to be the case in the real world. I've raised it with Ortus (TESTBOX-349), but for now I'll just work around it. Oh yeah, there's a Lucee bug interfering with this too. When an exception does have the same message and details, Lucee ignores the details. I've not raised a bug for this yet: I'm waiting for them to fee-back as to whether I'm missing something. When there's a ticket, I'll cross-post it here.

Anyway, moving on, I'll just check for any exception, and that'll do:

expect(() => getComponentMetadata("LogBox")).notToThrow()

That was a lot more faff to get to this point than I expected. Anyway. Now I install LogBox:

root@60d2920f4c60:/var/www# box install logbox
√ | Installing package [forgebox:logbox]

0.7 wiring LogBox into the DependencyInjectionService

One of the reasons the previous step really didn't push the boat out with testing if LogBox was working, is that to actually create a working LogBox logger takes some messing about; and I wanted to separate that from the installation. And also to give me some time to come up with the next test case. I want to avoid this sort of thing:

Possibly © 9gag.com - I am using it without permission. If you are the copyright owner and wish me to remove this, please let me know.

I don't want to skip to a test that is "it can log stuff that happens in the Test model object". I guess it is part ofthe requiremnt that the logger is handled via dependency injection into the model, so we can first get it set up and ready to go in the DependencyInjectionService. I mean the whole thing here is about DI: the logger is just an example usage. I think the next step is legit.

I've never used LogBox before, so I am RTFMing all this as I type (docs: "Configuring LogBox"). It seems I need to pass a Config object to my LogBox object, then get the root logger from said object… and that's my logger. Al of that can go in a factory method in configureDependencies, adn I'll just put the resultant logger into the IoC container.

it("loads a logger", () => {
    diService = new DependencyInjectionService()

    logger = diService.getBean("Logger")

    expect(logger).toBeInstanceOf("logbox.system.logging.Logger")

    expect(() => logger.info("TEST")).notToThrow()
})

I'm grabbing a logger and logging a message with it. The expectation is simply that the act of logging doesn't error. For now.

First here's the most minimal config I seem to be able to get away with:

component {
    function configure() {
        variables.logBox = {
            appenders = {
                DummyAppender = {
                    class = "logbox.system.logging.appenders.DummyAppender"
                }
            }
        }
    }
}

The docs ("LogBox DSL ") seemed to indicate I only needed the logBox struct, but it errored when I used it unless I had at least one appender. I'm just using a dummy one for now because I'm testing config, not operations. And there's nothing to test there: it's all implementation, so I think it's fine to create that in the "green" phase of "red-green-refactor" from that test above (currently red). With TDD the red phase is just to do the minimum code to make the test pass. That doesn't mean it needs to be one line of code, or one function or whatever. If my code needed to call a method on this Config object: then I'd test that separately. But I'm happy that this is - well - config. It's just data.

Once we have that I can write my factory method on DependencyInjectionService:

private function configureDependencies() {
    variables.container.declareBean("DependencyInjectionService", "services.DependencyInjectionService")
    variables.container.declareBean("TestDependency", "services.TestDependency")

    variables.container.factoryBean("Logger", () => {
        config = new Config()
        logboxConfig = new LogBoxConfig(config)

        logbox = new LogBox(logboxConfig)
        logger = logbox.getRootLogger()

        return logger
    })
}

I got all that from the docs, and I have nothing to add: it's pretty straight forward. Let's see if the test passes:

Cool.

Now I need to get my Test model to inject the logger into itself, and verify I can use it:

it("logs calls", () => {
    test = model("Test").new()
    prepareMock(test)
    logger = test.$getProperty("logger")
    prepareMock(logger)
    logger.$("debug")

    test.getMessage()

    loggerCallLog = logger.$callLog()
    expect(loggerCallLog).toHaveKey("debug")
    expect(loggerCallLog.debug).toHaveLength(1)
    expect(loggerCallLog.debug[1]).toBe(["getMessage was called"])
})

Here I am mocking the logger's debug method, just so I can check it's being called, and with what. Having done this, I am now wondering about "don't mock what you don't own", but I suspect in this case I'm OK because whilst the nomenclature is all "mock", I'm actually just spying on the method that "I don't own". IE: it's LogBox's method, not my application's method. I'll have to think about that a bit.

And the implementation for this is way easier than the test:

// models/Test.cfc
private function setDependencies() {
    variables.dependency = variables.diService.getBean("TestDependency")
    variables.logger = variables.diService.getBean("Logger")
}

public function getMessage() {
    variables.logger.debug("getMessage was called")
    return variables.dependency.getMessage()
}

Just for the hell of it, I also wrote a functional test to check the append was getting the expected info:

it("logs via the correct appender", () => {
    test = model("Test").new()

    prepareMock(test)
    logger = test.$getProperty("logger")

    appenders = logger.getAppenders()
    expect(appenders).toHaveKey("DummyAppender", "Logger is not configured with the correct appender. Test aborted.")

    appender = logger.getAppenders().DummyAppender
    prepareMock(appender)
    appender.$("logMessage").$results(appender)

    test.getMessage()

    appenderCallLog = appender.$callLog()

    expect(appenderCallLog).toHaveKey("logMessage")
    expect(appenderCallLog.logMessage).toHaveLength(1)
    expect(appenderCallLog.logMessage[1]).toSatisfy((actual) => {
        expect(actual[1].getMessage()).toBe("getMessage was called")
        expect(actual[1].getSeverity()).toBe(logger.logLevels.DEBUG)
        expect(actual[1].getTimestamp()).toBeCloseTo(now(), 2, "s")
        return true
    }, "Log entry is not correct")
})

It's largely the same as the unit test, except it spies on the appender instead of the logger. There's no good reason for doing this, I was just messing around.


And that's it. I'm pushing that up to Git as 0.7

… and go to bed.

Righto.

--
Adam

Wednesday 14 July 2021

Switch refactoring on the road to polymorphism

G'day:

A while back I wrote an article "Code smells: a look at a switch statement". That deals with a specific case which was pretty tidy to start with, and wasn't completely clear as to whether the original or my refactoring was "better" (I think the initial switch was OK in this case).

Exceptions will always prove the rule as "they" say, and I'm gonna chalk that one up to a case of that. Also a good example of why sometimes it's not good to be dogmatic about rules.

However there a coupla general forms of switch statements that one can codify general refactoring advice for. Well: the main general rule is "switches are seldom necessary in your business logic code". If you find yourself writing one: stop to think if yer letting yerself down from an OOP perspective.

The other two situations I'm thinking of boil down to sticking to the Single Responsibility Principle: not munging more than one thing into a function basically. In the case of a switch, the two things are "what to do" and "how to do it". As well as when you fire a switch right into the middle of some other code, it's obvious it's breaking the SRP.

I'll start with that latter one.

Instead of this:

function f() {

    // code statements
    
    switch
    
    // code statements
}

Do this:

function f() {
    // code statements
    
    switchHandler()
    
    // code statements
}

function switchHandler() {
    switch
}

Note that switchHandler quite possibly doesn't even belong in the same class as the code using it.

In this case the first function is already probably doing more than one thing… it's got three sections, so that says "three things" to me. The whole thing should be looked at, but take the switch out completely. It is definitely and clearly doing at least one additional thing just by itself.

The second one is the other way around. Don't do this:

function f() {
    switch (expression) {

        case "A":
            // implementation of A
            break;

        case "B":
            // implementation of B
            break;

        case "C":
            // implementation of C
            break;

        default:
            // implementation of default
    }
}

Do this:

function f() {
    switch (expression) {

        case "A":
            caseAHandler()
            break;

        case "B":
            caseBHandler()
            break;

        case "C":
            caseCHandler()
            break;

        default:
            defaultHandler()
    }
}

function caseAHandler() {
    // implementation of A
}

// etc

Keep the "what to do" and the "how to do it" separate from each other: they are two different things (and the "how to do it" are each different things, so you have an 1+n degree of code smell going on there).

If you do these two things, a proper polymorphic handling of the situation - wherein you just use a type to deal with the logic - might start becoming more clear. Even if it doesn't straight away, this will make the refactoring easier when it does. You could use a factory to get the correct case handler, or if you know ahead of time what expression is, you can just pass in the correct handler in the first place. Or some other way of doing things. But even without taking it further, your code is less smelly, so is better.

Also note that if you have multiple switch statements with the same expression, then it's a clue that the whole lot should come out into its own class, and that a more OOP handling of the situation would simplify your code.

I only had 15min to write this this morning, so apologies it's a bit clipped-sounding. Must dash.

Righto.

--
Adam

Sunday 11 July 2021

Another thought on controllers and where the buck should stop

G'day:

I just wrote an article on what a controller ought to limit itself to: "What logic should be in a controller? (and a wee bit of testing commentary)". But then I read an old question from Mingo on the article that inspired that, and I have some thoughts on that too.

In most of my code examples around controller methods, I have this sort of method signature:

function handleGet(rawArgs)

What I mean by "raw args" is "all the stuff from an HTTP request, including: query string parameters and arguments (URL scope if yer a CFMLer), request body keys and values (form scope), general request metadata (CGI scope), cookies, and headers. Kind of like how Symfony would do it in the PHP world.

In CFML land things are seldom (never?) this organised. One seems to get a hotchpotch of things possibly put into a request context or something, or possibly - as in CFWheels - actually nothing(!!!) gets passed into the controller method; you just need to know the magical place to go look for them. And by "them" I mean a struct that has the form and URL scope munged into it. Ugh. Anyhow, there's all these elements of an HTTP request and the application lifecycle (application, session, request scopes) available to yer CFML code somehow.

And the controller is the only place one should ever access those. Your business logic should never be tightly-coupled the notion of "this stuff came from a specific sort of HTTP request", or from a specific stage in the application's lifecycle. Or even be aware of the ideas of HTTP requests and the like.

Your model tier should just get "values". Ideally by the time you are applying any actual application logic to them, the values will have been modelled into their own objects, not simply a bunch of primitive values.

I guess one could consider a controller to be similar in role to a repository class, just the other way around. A repository encapsulates the mapping between [for example] storage records - which it fetches - and collections of objects - which it returns - to the business-logic tier. A controller is slightly skinnier than that, it takes the values needed from that selection of HTTP request components (URL scope, form scope, what-have-you), and just passes them as independent values - distinct from how they arrived - to the model. I guess it's not a direct parallel because the controller doesn't do the "mapping" part that is intrinsic to a repository; it just removes the context from the values it receives, and leaves it up to the model tier to know what to do with them. But anyhow, there's a clear separation of concerns here, and the separation is that only the controller should ever deal with these things: CGI scope, the value returned from GetHttpRequestData(), form scope, URL scope, cookie scope, application scope, session scope, request scope.

Righto.

--
Adam

What logic should be in a controller? (and a wee bit of testing commentary)

G'day:

This topic has come up for me twice from different directtions in the last week or so, so I'm gonna dump some thoughts. I've actually discussed this before in "I actively consider what I think ought to go into a controller", and the conclusion I came to doesn't quite fit with how I'm writing code now so I'm gonna revise it a bit.

To start with though, I'll reiterate this question: "should this go in the controller?", and I'll repeat Mingo's answer that is still pithy but spot on:

Isn't the answer always "No"?
~ mjhagen 2017

This is a good starting point. 95% of the time if yer asking yerself that question, Mingo has answered it for you there.

The example of what I'd put in a controller from that article is along these lines:

class ContentController {
    function handleGet(rawArgs){
        args = validationService.validate(rawArgs)
        
        content = contentService.getById(args.id)
        
        response = {
            articles = content.published,
            socialContent = {
                twitter = content.twitter,
                facebook = content.facebook
            }
        }
        return new Response(response)
    }
}

It's not the worst code I've written, but I now think this is wrong. The problem lies with how I had a habit of abusing the Anaemic Domain Model pattern in the past, where I had a bunch of really skinny service classes, and used them to apply behaviour to behaviourless model objects that were just bags of data. Not great.

Looking at that code now, I see these lines:

args = validationService.validate(rawArgs)
content = contentService.getById(args.id)

And I think "nah, those don't belong there. They belong in the model too". I'm doing too much "asking" when I should be "telling" (see "TellDontAsk" by Martin Fowler).

Basically the model here should know what it is to be "valid", so just give it the raw data and let it crack on with it.

My generic controller method these days would be formed along these lines:

function handleRequest(rawRequestValues) {
    try {
        dataFromModel = someModel.getSomeDataFromThisLot(rawRequestValues)
        
        renderedResponse = viewService.renderView("someView", dataFromModel)
        
        return new HtmlResponse(renderedResponse)
        
    } catch (ClientException e) {
        return new ClientErrorResponse(e)
    }
}

Here we clearly have a separation of controller, model and view. It's the controller's job to marshal getting values to a model, and getting the values from that to a view, deal with any error responses that might arise due to those two, or return what came back from the view tier as the response. That's it.

There's an assumption that the framework will deal with any unhandled exceptions there as a controlled 5xx type response. Also there could well be more catch statements, if different types of exception could bubble out of the model, for instance a ValidationException which returns details of validation failures in its response; or a 404 response being returned if a UserNotFoundException came back from some business-logic validation or whatever. But that's the pattern.

The key here is that the only time I'm using a value created by the model is to pass it to the view. I do not pass it to anything else in the interim, like some other model call. That action is not controller logic. It's business logic that we get an x and then pass it to a y. It should be encapsulated in the model.

On the other hand if there was more than one piece of view data to be derived directly from the incoming request values, then that would to me still possibly be legit to be in the controller, eg this is OK:

dataFromModel = someModel.getSomeDataFromThisLot(rawRequestValues)
moreDataFromDifferentModel = someModelOther.getSomeDifferentDataFromThisLot(rawRequestValues)

This would not be OK:

dataFromModel = someModel.getSomeDataFromThisLot(rawRequestValues)
moreDataFromDifferentModel = someModelOther.getSomeDifferentDataFromThisLot(dataFromModel.someValue)

It's a small distinction. But the thing to focus on more than that small example is just to be thinking "no" when you ask yerself "does this belong in the controller?". You're more likely to be right than wrong.


How do we apply that pattern to the example in the old article? Like this I think:

// UserContentController.cfc
component {

    function init(ViewService viewService, UserContentFactory userContentFactory) {
        variables.viewService = arguments.viewService
        variables.userContentFactory = arguments.userContentFactory
    }

    function getContent(rawArgs) {
        try {
            userContent = userContentFactory.getUserContent().loadContentByFilters(rawArgs)
            
            renderedResponse = viewService.renderView("userContentView", userContent)
            
            return new HtmlResponse(renderedResponse)
            
        } catch (ValidationException, e) {
            return new ClientErrorResponse(400, e)
        } catch (UserNotFoundException e) {
            return new ClientErrorResponse(404, e)
        }
    }
}

(I've changed what the controller is returning so as to still integrate the view tier into the example).

I've done away with the controller handling the validation itself, and left that to the model. If things don't pan out: the model will let the controller know. That's it's job. And it's just the controller's job to do something about it. Note that in this case I don't really need both catches. I could just group the exceptions into one ClientException, probably. But I wanted to demonstrate two potential failures from the logic in loadContentByFilters.


What's with this factory I'm using? It's just one of my idiosyncrasies. I like my models' constructors to take actual valid property values, like this:

// UserContent.cfc
component accessors=true invokeImplicitAccessor=true {

    property publishedContent;
    property twitterContent;
    property facebookContent;

    function init(publishedContent, twitterContent, facebookContent) {
        variables.publishedContent = arguments.publishedContent
        variables.twitterContent = arguments.twitterContent
        variables.facebookContent = arguments.facebookContent
    }

Our UserContent represents the data that are those content items. However we've not been given the content items, we've just been given a means to get them. So we can't just create a new object in our controller and slap the incoming values into them. We need to have another method on the UserContent model that works with what the controller can pass it:

function loadContentByFilters(required struct filters) {
    validFilters = validationService.validate(filters, getValidationRules()) // @throws ValidationException
    
    user = userFactory.getById(validFilters.id) // @throws UserNotFoundException
    
    variables.publishedContent = contentService.getUserContent(validFilters)
    variables.twitterContent = twitterService.getUserContent(validFilters)
    variables.facebookContent = facebookService.getUserContent(validFilters)
}

And this demonstrates that to do that work, UserContent needs a bunch of dependencies.

I'm not going to pass these in the constructor because they aren't 100% needed for the operation of a UserContent object, and I want the constructor focusing on its data. So instead these need to be injected as properties:

// UserContent.cfc
component accessors=true invokeImplicitAccessor=true {

    property publishedContent;
    property twitterContent;
    property facebookContent;

    function init(publishedContent, twitterContent, facebookContent) {
        variables.publishedContent = arguments.publishedContent
        variables.twitterContent = arguments.twitterContent
        variables.facebookContent = arguments.facebookContent
    }
    
    function setValidationService(ValidationService validationService) {
        variables.validationService = arguments.validationService
    }
    
    function setUserFactory(UserFactory userFactory) {
        variables.userFactory = arguments.userFactory
    }
    
    function setContentService(UserContentService contentService) {
        variables.contentService = arguments.contentService
    }
    
    function setTwitterService(TwitterService twitterService) {
        variables.twitterService = arguments.twitterService
    }
    
    function setFacebookService(FacebookService facebookService) {
        variables.facebookService = arguments.facebookService
    }

That's all a bit of a mouthful every time we want a UserContent object that needs to use alternative loading methods to get its data, so we hide all that away in our dependency injection set-up, and use a factory to create the object, set its properties, and then return the object:

// UserContentFactory.cfc
component {

    function init(
        ValidationService validationService,
        UserFactory userFactory,
        UserContentService contentService,
        TwitterService twitterService,
        FacebookService facebookService
    ) {
        variables.validationService = arguments.validationService
        variables.userFactory = arguments.userFactory
        variables.contentService = arguments.contentService
        variables.twitterService = arguments.twitterService
        variables.facebookService = arguments.facebookService
    }

    function getUserContent() {
        userContent = new UserContent()
        userContent.setValidationService(validationService)
        userContent.setUserFactory(userFactory)
        userContent.setContentService(contentService)
        userContent.setTwitterService(twitterService)
        userContent.setFacebookService(facebookService)
        
        return userContent
    }
}

The controller just needs to be able to ask the factory for a UserContent object, and then call the method it needs, passing its raw values:

userContent = userContentFactory.getUserContent().loadContentByFilters(rawArgs)

You'll noticed I kept the validation separate from the UserContent model:

function loadContentByFilters(required struct filters) {
    validFilters = validationService.validate(filters, getValidationRules()) // @throws ValidationException

(And then there's also this private method with the rules):

private function getValidationRules() {
    return {
        id = [
            {required = true},
            {type = "integer"}
        ],
        startDate = [
            {required = true},
            {type = "date"},
            {
                range = {
                    max = now()
                }
            }
        ],
        endDate = [
            {required = true},
            {type = "date"},
            {
                range = {
                    max = now()
                }
            }
        ],
        collection = [
            {callback = (collection) => collection.startDate.compare(collection.endDate) < 0}
        ]
    }
}

Validation is fiddly and needs to be accurate, so I don't believe how to validate some values is the job of the UserContent class. I believe it's just perhaps its job to know "what it is to be valid". Hence that separation of concerns. I could see a case for that private method to be its own class, eg UserContentValidationRules or something. But for here, just a private method is OK. Wherever those rules are homed, and whatever the syntax of defining them is, we then pass those and the data to be validated to a specialist validation service that does the business. In this example the validation service itself throws an exception if the validation fails. In reality it'd more likely return a collection of rules violations, and it'd be up to the model making the call to throw the exception. That's implementation detail not so relevant to the code here.


There's probably more off-piste code in this an on-~, but I think it shows how to keep yer domain / business logic out of your controllers, which should be very very light, and simply marshall the incoming request values to the places that need them to be able to come up with a response. That's all a controller ought to do.


Oh before I go. There's an attitude from some testing quarters that one doesn't test one's controllers. I don't actually agree with that, but even if I did: that whole notion is predicated on controllers being very very simple, like I show above. If you pile all (or any of ~) yer logic into yer controller methods: you do actually need to test them! Even in this case I'd still be testing the flow control around the try/catch stuff. If I didn't have that, I'd probably almost be OK if someone didn't test it. Almost.

Righto.

--
Adam

Wednesday 4 January 2017

PHP: looking at the Chain of Responsibility pattern

G'day:
We're working on a proof of concept for some stuff at work at the moment, and I've been put on a code review of the first round of the code. Part of the code is dealing with conditionally getting some data from cache, or if it ain't there, getting it from the DB instead: a fairly common trope. Normally I'd fall back on the decorator pattern for this, indeed I've written about it in the past ("Using a decorator pattern to reduce inappropriate code complexity"). I didn't see this in play in this first tranche of code, so suggested its use. The authoring dev suggested they felt that the decorator pattern was not a good fit for this, so they were using the Chain of Responsibility pattern instead. Another of our devs - when prodded - tended to agree.

I gave this some thought, read-up on the Chain of Responsibility pattern (and some more on the Decorator Pattern), and wasn't quite seeing it.

Just to back up a bit... if yer like me and kinda knew about the Chain of Responsibility pattern, but wasn't intimately familiar with it: read that Wikipedia link above, and do some googling of yer own. But in short it's characterised by having a task that needs doing, and having a sequence (a chain) of agents that possibly can (or possibly cannot) resolve all or part of the task. The process is that the task is passed to the chain of agents, and each in turn decides for itself whether it can fulfil the task in whole or in part. If an agent can resolve the whole task it does so: returning the result. If a given agent cannot fulfil the whole task (either it cannot fulfil it at all, or can only do part of it), it does its work, then passes responsibility on to the next task in the chain. Each agent knows two things: how to do its part of the job, and that it can pass work on to the "next" agent. It does not (and should not!) know what the next agent does, nor should it rely on what the next agent does. It simply does its job and returns it or passes on incomplete work to the next agent in the chain. I can't be arsed with a diagram: go google one.

Now: for two thirds of our challenge, I think the Chain of Responsibility pattern is a good fit: for the "Is it in the cache? Yes [/No. Is it in the DB? Yes/No]" bit. No qualms there. However it sticks in my craw once we get to the step after getting from the DB: putting it in the cache for next time. That step does not belong in the chain as it's not performing the same sort of task as the other two: getting the data. Second to that it's hitting the cache before and after the DB step, which makes it more seem like a decoration to me (of sorts... I might get back to the "of sorts..." bit later). Even if one decides "get from cache" is one agent's job, and "get from DB, oh and put it in the cache" is another agent's job, that latter agent is doing a mishmash of "unrelated" work, and seems to be a bit tightly-coupled to me, as well as having a wodge of code (that's a technical term) doing two things, rather than two wodges of code doing one thing each.

That said, I don't like being so emphatic about "that's wrong" without being happy about my understanding of things, so figured I should try to make it work in a way that sits well with me in a proof of concept. That and it seemed like good material for a blog article.

In my proof of concept I am fetching some Person data, for example:

$person = $personService->getById(1);

The PersonService defers to a "RetrievalHandler" for this:

class PersonService {

    private $retrievalHandler;

    public function __construct(PersonRetrievalHandler $retrievalHandler) {
        $this->retrievalHandler = $retrievalHandler;
    }

    public function getById($id) : Person {
        $record = $this->retrievalHandler->getById($id);

        if (is_array($record)){
            $person = new Person($record['firstName'], $record['lastName']);

            return $person;
        }
        return new Person();
    }
}


This just finds and returns a Person object, implement thus:

class Person {

    public $firstName;
    public $lastName;

    public function __construct($firstName=null, $lastName=null) {
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }
}

As noted, the service uses a the RetrievalHandler to retreive stuff. A handler is implementation of this:

abstract class PersonRetrievalHandler {

    protected $nextHandler;
    protected $logger;

    public function __construct() {
        $this->nextHandler = new class {
            function getById(){
                return null;
            }
        };
    }

    public abstract function getById(int $id) : array;

    public function setNextHandler(PersonRetrievalHandler $nextHandler) {
        $this->nextHandler = $nextHandler;
    }
}

All it's required to do here is know how to use getById, and also it can setNextHandler. I'll come back to the code in the constructor later. Don't worry about that for now.

I've got two implementations of this. One for getting the thing from a caching service:

class CachedPersonHandler extends PersonRetrievalHandler {

    private $cacheService;

    public function __construct(CacheService $cacheService){
        parent::__construct();
        $this->cacheService = $cacheService;
    }

    public function getById(int $id) : array {
        if ($this->cacheService->exists($id)) {
            return $cachedPerson = $this->cacheService->get($id);
        }

        return $this->nextHandler->getById($id);
    }
}


And one for getting it from a database repository:

class DatabasePersonHandler extends PersonRetrievalHandler {

    private $repository;

    public function __construct(PersonRepository $repository) {
        parent::__construct();
        $this->repository = $repository;
    }

    public function getById($id) : array {
        $record = $this->repository->getById($id);
        if (count($record)) {
            return $record;
        }

        return $this->nextHandler->getById($id);
    }
}


Note this is very woolly and not production-sound code. It's just to demonstrate the concept.

Important to note in both these situations is the agents don't themselves do any of the work; they just try to get the work done by something else, and return it if done, or pass the task to the next agent if the current agent itself couldn't fulfil the requirement.

I've created a stub caching service and repository:

class CacheService {

    private $cache = [];
    private $logger;

    public function __construct(LoggingService $logger)
    {
        $this->logger = $logger;
    }

    public function get($key) {
        $this->logger->info("Cache hit for $key");
        return $this->cache[$key];
    }

    public function exists(string $key) : bool {
        $exists = array_key_exists($key, $this->cache);

        $this->logger->info(sprintf("Cache %s for %d", $exists ? 'hit' : 'miss', $key));

        return $exists;
    }

    public function put(string $key, $value) {
        $this->cache[$key] = $value;
    }
}


class PersonRepository {

    private $data = [
        [],
        ['firstName' => 'Tui', 'lastName' => 'Tahi'],
        ['firstName' => 'Rewi', 'lastName' => 'Rua'],
        ['firstName' => 'Tama', 'lastName' => 'Toru'],
        ['firstName' => 'Whina', 'lastName' => 'Wha']
    ];
    private $logger;

    public function __construct(LoggingService $logger)
    {
        $this->logger = $logger;
    }

    public function getById($id) {
        if (array_key_exists($id, $this->data)){
            $this->logger->info("Database hit for $id");
            return $this->data[$id];
        }
        $this->logger->info("Database miss for $id");
        return [];
    }
}


These also log activity in a very naïve way:

class LoggingService {
    public function info(string $message) {
        echo $message . PHP_EOL;
    }
}


All of this is stitched together in a factory, as there's a lot of moving parts:

class PersonServiceFactory {

    public static function getPersonService()
    {
        $loggingService = new LoggingService();
        $cacheService = new CacheService($loggingService);
        $personRepository = new PersonRepository($loggingService);

        $cachedHandler = new CachedPersonHandler($cacheService);
        $databaseHandler = new DatabasePersonHandler($personRepository);

        $cachedHandler->setNextHandler($databaseHandler);

        $personService = new PersonService($cachedHandler);

        return $personService;
    }

}


Note how the only knowledge each agent has of another is what a given agent's next handler might be. Speaking of next handlers. let's go back to that code in the PersonRetrievalHandler's constructor:

public function __construct() {
    $this->nextHandler = new class {
        function getById(){
            return null;
        }
    };
}


You'll remember each getById implementation will either fulfil the task and return the result, or it'll defer to the next agent in the chain. This is just a mechanism to make this "work" even if the last agent in the chain calls return $this->nextHandler->getById($id). The base class will take the call and just return null. I was semi-pleased to be able to use a class literal here (or anonymous class as PHP wants to call them). So this means I do not need to set the next handler on the $databaseHandler: it'll just use the inbuilt one if it cannot fulfil the requirement itself.

Right so we can now run this to see what happens. Remember the expectation is that the requested data will be fetched from the cache as a priority, but if not found: get it from the DB. Note that I am not doing anything here about putting the data into the cache if it wasn't there in the first place. I'll get to that. Anyway, here's some tests:

$personService = PersonServiceFactory::getPersonService();
$person = $personService->getById(1);
var_dump($person);
$person = $personService->getById(1);
var_dump($person);
$person = $personService->getById(4);
var_dump($person);
$person = $personService->getById(5);
var_dump($person);

Here we're:
  • getting the same person twice to start with: ostensibly this'd demonstrate a cache-miss then a cache-hit;
  • getting a different person from the first (back to a cache miss);
  • getting a person who is neither in the cache nor the DB.

Results:

C:\src\php\php.local\src\chainOfResponsibility\public>php manualTest.php
Cache miss for 1
Database hit for 1
object(me\adamcameron\cor\model\Person)#9 (2) {
  ["firstName"]=>
  string(3) "Tui"
  ["lastName"]=>
  string(4) "Tahi"
}
Cache miss for 1
Database hit for 1
object(me\adamcameron\cor\model\Person)#10 (2) {
  ["firstName"]=>
  string(3) "Tui"
  ["lastName"]=>
  string(4) "Tahi"
}
Cache miss for 4
Database hit for 4
object(me\adamcameron\cor\model\Person)#9 (2) {
  ["firstName"]=>
  string(5) "Whina"
  ["lastName"]=>
  string(3) "Wha"
}
Cache miss for 5
Database miss for 5
object(me\adamcameron\cor\model\Person)#10 (2) {
  ["firstName"]=>  NULL
  ["lastName"]=>  NULL
}

C:\src\php\php.local\src\chainOfResponsibility\public>

That's all good:
  • the cache never finds anything (although it does look),
  • so passes the job on to the DB. The DB will return the data if found,
  • otherwise it'll fall back to the default agent which just returns null.


So far: so good. But why did I omit the "putting it into the cache" step. Well... using this model, it simply won't work. The rule is that if an agent can resolve the work, then it returns it. Otherwise it defers to the next agent. I figured I could implement a "pass-thru" agent which just caches the result found by the DB agent: job done. But if the DB agent finds the record - read: "it can fulfil the task" - then it doesn't pass the task on to the next agent. It considers the task fulfilled, so just returns its result. So the only time the "put it in cache" agent will be called is if the DB agent doesn't find anything. Precisely the wrong time to cache something. That aside, the agent only passes on the inputs (the $id) to the next agent, so there's no mechanism here for an agent to receive the result from the previous agent even if the result was there to pass.

What we'd need to do here is just use the chain of responsibility to get the data, and perhaps leave it to the PersonService to cache the result it receives. However this is a bit unbalanced in my view, as caching considerations of the same data are being handled at not only two different places in the code, but two different "levels" of the task at hand. That's a bit clumsy in my book.

Next I thought I could adjust the DB handler to unconditionally pass its result on to the next handler, rather than break out of the chain once done. But that's a bit crap as it makes the DB handler behave differently than the other handlers in the chain, which also implies it knows more about what's going on than it should: ie that there's a reason to always pass the result on, rather than return it if it's done the work. Not cool.

Update:

My colleague (a different one) suggested a third open: roll the cache-put into the CachedPersonHandler:

class CachedPersonHandler extends PersonRetrievalHandler {

    private $cacheService;

    public function __construct(CacheService $cacheService){
        parent::__construct();
        $this->cacheService = $cacheService;
    }

    public function getById(int $id) : array {
        if ($this->cacheService->exists($id)) {
            return $cachedPerson = $this->cacheService->get($id);
        }

        $person = $this->nextHandler->getById($id);
        $this->cacheService->put($id, $person);
        return $person;
    }
}

Well: indeed. Now it's basically a decorator: you've decorated a DB call with some caching. Contrast it with this from my earlier article on the Decorator Pattern:

class CachedUserRepository implements UserRepositoryInterface {

    private $repository;
    private $cacheService;

    public function __construct(UserRepositoryInterface $repository, CacheServiceInterface $cacheService) {
        $this->repository = $repository;
        $this->cacheService = $cacheService;
    }

    public function getById($id) {
        if ($this->cacheService->isCached($id)) {
            return $this->cacheService->get($id);
        }
        $object = $this->repository->getById($id);

        $this->cacheService->put($id, $object);
        return $object;
    }

}

You've arrived at the same conclusion I started with ;-)

I did try one other tactic. Instead of passing just the inputs to the agents, I had a look at passing the inputs and the outputs through the chain, and also always passing on to the next agent, irrespective of whether a given handler arrived at the result. This was very proof-of-concept-y. It's possibly a legit approach if the Chain of Responsibility was one in that a given agent was only supposed to fulfil part of the solution, relying on all links in the chain to arrive at the complete result. This is not what I'm doing here, but decided to leverage this approach to get the caching working "properly".


Intermission

Go have a pee or get some more popcorn or something.


Now I have an interface for a Handler:

interface Handler {

    public function perform($request, $response);

}


This just reflects a more generic solution than getById. More importantly note that unlike getById, perform takes a request and a response. For our examples the "request" would be the ID. And the response would be any result a given agent was able to come up with.

This is implemented in an abstract way by PersonRetrievalHandler:

abstract class PersonRetrievalHandler implements Handler {

    protected $nextHandler;
    protected $logger;

    public function __construct() {
        $this->nextHandler = $this->getPassThroughHandler();
    }

    public function setNextHandler(PersonRetrievalHandler $nextHandler) {
        $this->nextHandler = $nextHandler;
    }

    private function getPassThroughHandler(){
        return new class  {
            function perform($_, $response){
                return $response;
            }
        };
    }
}

It's still down to the individual handlers to implement the perform method though.

I've also refactored that class literal out into a helper method; and its perform method now returns whatever response it was passed, rather than returning null.

The perform method in CachedPersonHandler is thus:

public function perform($request, $response=null) {
    $id = $request;
    if ($this->cacheService->exists($id)) {
        $response = $this->cacheService->get($id);
    }

    return $this->nextHandler->perform($request, $response);
}


Note that it doesn't return the result itself, it just passes it on to the next agent. Otherwise it's doing the same thing.

And the DatabasePersonHandler has also been slightly refactored:

public function perform($request, $response=null) {
    if (is_null($response)){
        $id = $request;
        $response = $this->repository->getById($id);
    }

    return $this->nextHandler->perform($request, $response);
}


It infers that the cache didn't find anything by it not receiving anything in the response, and only if so does it hit the DB. Otherwise it just passes everything through to the next agent.

Now we have a third handler, CacheablePersonHandler:

class CacheablePersonHandler extends PersonRetrievalHandler {

    private $cacheService;

    public function __construct(CacheService $cacheService){
        parent::__construct();
        $this->cacheService = $cacheService;
    }

    public function perform($request, $response=null) {
        if (!is_null($response)) {
            $id = $request;
            $this->cacheService->put($id, $response);
        }

        return $this->nextHandler->perform($request, $response);
    }
}


This one makes no pretence about finding data, it just takes any received response and caches it, keyed on the request. And passes the response on. At this point remember the default handler will just return whatever response it receives.

We wire all this together in the factory again:

public static function getPersonService()
{
    $loggingService = new LoggingService();
    $cacheService = new CacheService($loggingService);
    $personRepository = new PersonRepository($loggingService);

    $cachedHandler = new CachedPersonHandler($cacheService);
    $databaseHandler = new DatabasePersonHandler($personRepository);
    $cacheableHandler = new CacheablePersonHandler($cacheService);

    $databaseHandler->setNextHandler($cacheableHandler);
    $cachedHandler->setNextHandler($databaseHandler);

    $personService = new PersonService($cachedHandler);

    return $personService;
}


This differs only in that it also creates the CacheablePersonHandler too, and stick it at the end of the chain.

Here's our revised test:

$personService = PersonServiceFactory::getPersonService();

foreach ([1,1,4,5,5] as $id){
    $person = $personService->getById($id);
    var_dump($person);
}


(it's the same, just without the tautological repeated duplication of the same stuff again).

And the result:

C:\src\php\php.local\src\chainOfResponsibility\public>php manualTest.php
Cache miss for 1
Database hit for 1
Cache put for 1
object(me\adamcameron\cor\model\Person)#8 (2) {
  ["firstName"]=>
  string(3) "Tui"
  ["lastName"]=>
  string(4) "Tahi"
}
Cache hit for 1
Cache put for 1
object(me\adamcameron\cor\model\Person)#11 (2) {
  ["firstName"]=>
  string(3) "Tui"
  ["lastName"]=>
  string(4) "Tahi"
}
Cache miss for 4
Database hit for 4
Cache put for 4
object(me\adamcameron\cor\model\Person)#8 (2) {
  ["firstName"]=>
  string(5) "Whina"
  ["lastName"]=>
  string(3) "Wha"
}
Cache miss for 5
Database miss for 5
object(me\adamcameron\cor\model\Person)#11 (2) {
  ["firstName"]=>
  NULL
  ["lastName"]=>
  NULL
}
Cache miss for 5
Database miss for 5
object(me\adamcameron\cor\model\Person)#8 (2) {
  ["firstName"]=>
  NULL
  ["lastName"]=>
  NULL
}

C:\src\php\php.local\src\chainOfResponsibility\public>

So it "works", in that DB hits are now going into the cache.

But it also re-caches cache hits too :-(

Sigh. I could work around this by doing an exists call in the latter handler too:

public function perform($request, $response=null) {
    if (!is_null($response)) {
        $id = $request;
        if (!$this->cacheService->exists($id)) {
            $this->cacheService->put($id, $response);
        }
    }

    return $this->nextHandler->perform($request, $response);
}


But it all seems a bit clumsy to me. And when something seems clumsy to me, I start wondering if this square peg is supposed to be going in this round hole. And my conclusion is generally: "no".

The problem here goes back to the "cache the data" part of this chain doesn't belong in the chain. It's not a congruent task compared to the data-fetch tasks. The "responsibility" in the "Chain of Responsibility" is that the responsibility is "fulfilling the task". The caching agent makes no attempt to fulfil the requirement of the chain, so it doesn't belong in the chain to me.

So where does it belong? Well it belongs as near as possible to when we know the database was hit, and we have the result. So we could put it in the database agent. But then we have this imbalance that the cache-look-up is part of the chain, and the cache-put is within part of the chain. I don't like that asymmetry. Also it means the DB agent is doing two things: getting data and caching it, which seems like it's doing too much to me: it needs to know how to do two different things. We could go down the route of having like a CachedPersonRepository, and that at least removes those two responsibilities from the agent, but it does just move them: now the repo needs to know about the DB and the caching.

What I'd end up doing is using a caching decorator around a DB-only repository. That's symmetrical, and all concerns are separated.

Hmmm.

Second Intermission

Go find a suitable blade so you can open a vein and end it all instead of having to read on… (and on…)


Oh, as a footnote to this another requirement I wanted to prove is that the chain links behaved "sensibly" if called without their adjacent agents. IE: the DatabasePersonAgent agent didn't need the CachedPersonAgent, nor did it need the CacheablePersonAgent.

Here's some examples:

Just the database handler:

$loggingService = new LoggingService();
$personRepository = new PersonRepository($loggingService);

$databaseHandler = new DatabasePersonHandler($personRepository);

$personService = new PersonService($databaseHandler);

foreach ([1,5] as $id){
    $person = $personService->getById($id);
    var_dump($person);
}


Result:

C:\src\php\php.local\src\chainOfResponsibility\public>php manualTestWithJustDatabaseHandler.php
Database hit for 1
object(me\adamcameron\cor\model\Person)#7 (2) {
  ["firstName"]=>
  string(3) "Tui"
  ["lastName"]=>
  string(4) "Tahi"
}
Database miss for 5
object(me\adamcameron\cor\model\Person)#8 (2) {
  ["firstName"]=>
  NULL
  ["lastName"]=>
  NULL
}

C:\src\php\php.local\src\chainOfResponsibility\public>

Yup: it's getting data from the DB, and doesn't require the other agents.

Now one with just the CachedPersonHandler:

$loggingService = new LoggingService();
$cachingService = new CacheService($loggingService);

$cachedHandler = new CachedPersonHandler($cachingService);

$personService = new PersonService($cachedHandler);

$cachingService->put(5, ['firstName'=>'Ria', 'lastName'=>'Rima']);

foreach ([1,5] as $id){
    $person = $personService->getById($id);
    var_dump($person);
}

Result:

C:\src\php\php.local\src\chainOfResponsibility\public>php manualTestWithJustCachedHandler.php
Cache put for 5
Cache miss for 1
object(me\adamcameron\cor\model\Person)#7 (2) {
  ["firstName"]=>
  NULL
  ["lastName"]=>
  NULL
}
Cache hit for 5
object(me\adamcameron\cor\model\Person)#8 (2) {
  ["firstName"]=>
  string(3) "Ria"
  ["lastName"]=>
  string(4) "Rima"
}

C:\src\php\php.local\src\chainOfResponsibility\public>

Here I'm priming the cache manually, just to make sure it'll find the cached item OK. And it does.

For the sake of completeness I also test with just the cache-put agent. It won't do anything useless (a warning flag in itself), but it should "work":

$loggingService = new LoggingService();
$cachingService = new CacheService($loggingService);

$cacheableHandler = new CacheablePersonHandler($cachingService);

$personService = new PersonService($cacheableHandler);

foreach ([1,5] as $id){
    $person = $personService->getById($id);
    var_dump($person);
}


Result:

C:\src\php\php.local\src\chainOfResponsibility\public>php manualTestWithJustCacheableHandler.php
object(me\adamcameron\cor\model\Person)#7 (2) {
  ["firstName"]=>
  NULL
  ["lastName"]=>
  NULL
}
object(me\adamcameron\cor\model\Person)#8 (2) {
  ["firstName"]=>
  NULL
  ["lastName"]=>
  NULL
}

C:\src\php\php.local\src\chainOfResponsibility\public>

This is all unhelpful to the stated goal, but it's doing what it's told and is self-contained. "Win".

In conclusion I think this has helped me get a handle on the Chain of Responsibility pattern, which is good. I still want to have a look at chains wherein the agents achieve partial fulfilment, and possibly some recursion as well. That'd be cool.

But I really don't think this is the right solution to the job at hand at work.

I'll leave you be now. Oh... as per usual: all the code is on Github, including the working application. The first iteration of the work is tagged separately.

Righto.

--
Adam

Sunday 1 May 2016

PHP: Scaling the decorator pattern

G'day:
I while back I had a look at using the Decorator Pattern to simplify some code:


One again me mate Brian has come to the fore with a tweak to make this tactic an even more appealing prospect: leveraging PHP's "magic" __call method to simplify decorator code.

Let's have a look at one of the examples I used earlier:

class User {

    public $id;
    public $firstName;
    public $lastName;

    function __construct($id, $firstName, $lastName){
        $this->id = $id;
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }

}

class DataSource {

    function getById($id){
        return json_encode([
            'id' => $id,
            'firstName' => 'Zachary',
            'lastName' => 'Cameron Lynch',
        ]);
    }
}

class LoggerService {

    function logText($text){
        echo "LOGGED: $text" . PHP_EOL;
    }
}


class UserRepository {

    private $dataSource;

    public function __construct($dataSource) {
        $this->dataSource = $dataSource;
    }

    public function getById($id) {
        $userAsJson = $this->dataSource->getById($id);

        $rawUser = json_decode($userAsJson);
        $user = new User($rawUser->id, $rawUser->firstName, $rawUser->lastName);

        return $user;
    }

}

class LoggedUserRepository {

    private $repository;
    private $loggerService;

    public function __construct($repository, $loggerService) {
        $this->repository = $repository;
        $this->loggerService = $loggerService;
    }

    public function getById($id) {
        $this->loggerService->logText("$id requested");
        $object = $this->repository->getById($id);

        return $object;
    }

}

$dataSource = new DataSource();
$userRepository = new UserRepository($dataSource);

$loggerService = new LoggerService();
$loggedUserRepository = new LoggedUserRepository($userRepository, $loggerService);

$user = $loggedUserRepository->getById(5);
var_dump($user);

That looks like a chunk of code, but the User, DataSource and LoggerService are just dependencies. The code you really wannna look at are the two repo variations: the basic UserRepository, and the decorated LoggedUserRepository.

This code all works fine, and outputs:

C:\src>php baseline.php
LOGGED: 5 requested
object(User)#6 (3) {
  ["id"]=>
  int(5)
  ["firstName"]=>
  string(7) "Zachary"
  ["lastName"]=>
  string(13) "Cameron Lynch"
}

C:\src>

Fine.

But this is a very simple example: the repo has only one one method. So the decorator only needs to implement one method. But what if the repo has ten public methods? Suddenly the decorator is getting rather busy, as it needs to implement wrappers for those methods too. Yikes. This is made even worse if the decorator is only really interested in decorating one method... it still needs those other nine wrappers. And it'd be getting very boiler-plate-ish to have to implement all these "empty" wrapper methods.

Let's add a coupla more methods to our repo:

class UserRepository {

    private $dataSource;

    public function __construct($dataSource) {
        $this->dataSource = $dataSource;
    }

    public function getById($id) {
        $userAsJson = $this->dataSource->getById($id);

        $rawUser = json_decode($userAsJson);
        $user = new User($rawUser->id, $rawUser->firstName, $rawUser->lastName);

        return $user;
    }

    public function getByFilters($filters) {
        $usersAsJson = $this->dataSource->getByFilters($filters);

        $rawUsers = json_decode($usersAsJson);
        $users = array_map(function($rawUser){
            return new User($rawUser->id, $rawUser->firstName, $rawUser->lastName);
        }, $rawUsers);

        return $users;
    }

    public function create($firstName, $lastName){
        $rawNewUser = json_decode($this->dataSource->create($firstName, $lastName));
        return new User($rawNewUser->id, $rawNewUser->firstName, $rawNewUser->lastName);
    }
}

We have two new methods: getByFilters() and create(). Even if we don't want to log calls to those for some reason, we still need the LoggedUserRepository to implement "pass-through" methods for them:

class LoggedUserRepository {

    private $repository;
    private $loggerService;

    public function __construct($repository, $loggerService) {
        $this->repository = $repository;
        $this->loggerService = $loggerService;
    }

    public function getById($id) {
        $this->loggerService->logText("$id requested");
        $object = $this->repository->getById($id);

        return $object;
    }

    public function getByFilters($filters) {
        return $this->repository->getByFilters($filters);
    }

    public function create($firstName, $lastName) {
        return $this->repository->create($firstName, $lastName);
    }

}


See how we still need methods for getByFilters and create? Suck. I mean it's not a huge amount of code, but given the LoggedUserRepository doesn't actually wanna log those methods, they're out of place.

Sunday 3 April 2016

Decorator Pattern vs simple inheritance

G'day:
This is a sequel to my earlier article "Using a decorator pattern to reduce inappropriate code complexity". You'd better go breeze through that before reading this one, otherwise this one won't make too much sense.

One of my colleagues read the above article and said "yeah good... but I wonder why one would do that over simple inheritance?" Good question. Especially as he is my boss, so all his questions are good (like I said: he also reads this blog ;-). My prepared answer was that the decorator pattern was an implementation representation of an interface, and accordingly it can kinda reflect multiple inheritance (albeit by composition) where an inheritance-based approach only allows a single inheritance chain: and that will get clumsy quickly. Note that I am not suggesting that the decorator pattern actually reflects a multiple inheritance pattern, I'm just using it as a comparative metaphor. I think it's reasonable.

In the previous article I showed how to simplify and focus the implementations of a UserRepository and a LoggedUserRepository (it writes to a log as well as making a given repo call), and a CachedUserRepository (it caches the repo call). And, indeed from there it was easy to decorate the UserRepository with both logging and caching, to effect either a LoggedCachedUserReppository, or a CachedLoggedUserRepository. If you see the subtle difference there: it's what order the ancillary operations take place in. This was in lieu of having all the caching and logging code baked into the UserRepository, which I think is less-than-ideal design, makes testing harder, and also presupposed an implemenation (which had already been demonstrated - in our case - to be a bad supposition, hence the origin of this investigation).

Code-wise, I had this:

class UserRepository implements RepositoryInterface {

    public function getById($id) {
        return (object) [
            "id" => $id,
            "firstName" => "Number $id",
            "recordAccessed" => new \DateTime()
        ];
    }

}

And the LoggedRepository used to decorate a UserRepository with logging might be like this:

class LoggedRepository implements RepositoryInterface {

    private $repository;
    private $loggerService;

    public function __construct(RepositoryInterface $repository, LoggerServiceInterface $loggerService) {
        $this->repository = $repository;
        $this->loggerService = $loggerService;
    }

    public function getById($id) {
        $this->loggerService->logText("$id requested");
        $object = $this->repository->getById($id);

        return $object;
    }

}

The key points are that the UserRepository does all the data-getting, the LoggedRepository just decorates that with some logging. But each repo just plays to its strengths, and lets the other one get on with its own strength. That's the decorator pattern. The CachedRepository was analogous to the Logged~ one, just doing caching instead of logging. And to get both a logged & cached or a cached & logged UserRepository, it's just a matter of a further layer of decoration. Roughly like a matryoshka doll. Of sorts. Anyway: there's that whole other article about that.

So what about using inheritance instead? Let's swap out the repository type here to just a PersonRepository (the reason for this is solely cos all this code is in the same namespace, so I had already used the User~ metaphor). The PersonRespository is the same as the User one:

class PersonRepository implements RepositoryInterface {

    public function getById($id) {
        return (object) [
            "id" => $id,
            "firstName" => "Number $id",
            "recordAccessed" => new \DateTime()
        ];
    }

}

(I'm still just faking the actual data-fetch operation, as it's irrelevant here. But imagine getById() gets a user via its ID from some data store).

One could consider that a LoggedPersonRepository is a specialisation of UserRepository: it's for the same purpose, it just throws logging into the mix:

class LoggedPersonRepository extends PersonRepository {

    protected $loggerService;

    public function __construct(LoggerServiceInterface $loggerService) {
        $this->loggerService = $loggerService;
    }

    public function getById($id) {
        $this->loggerService->logText("$id requested");
        $object = parent::getById($id);

        return $object;
    }

}

One can accurately say a LoggedPersonRepository IS A PersonRepository, so the inheritance seems sound, and if one thinks through scenarios, I think it passes a Liskov substitution principle test as well. So the design is - thusfar - sound. And looking at the code, this is actually more simple than the decorator pattern version because there's no need for this LoggedPersonRepository to take a PersonRepository argument, as it is a PersonRepository.

And it's samesame for a CachedPersonRepository, but I'll not bore you with the implementation detail. I'm guessing you can see what I mean.

But what about when we do add caching to our repositories? We'd need two more classes:

class CachedPersonRepository extends PersonRepository {

    protected $cacheService;

    public function __construct(CacheServiceInterface $cacheService) {
        $this->cacheService = $cacheService;
    }

    public function getById($id) {
        if ($this->cacheService->isCached($id)) {
            return $this->cacheService->get($id);
        }
        $object = parent::getById($id);

        $this->cacheService->put($id, $object);
        return $object;
    }

}

And:
class CachedLoggedPersonRepository extends LoggedPersonRepository {

    protected $cacheService;

    public function __construct(LoggerServiceInterface $loggingService, CacheServiceInterface $cacheService) {
        parent::__construct($loggingService);
        $this-&gt;cacheService = $cacheService;
    }

    public function getById($id) {
        if ($this-&gt;cacheService-&gt;isCached($id)) {
            return $this-&gt;cacheService-&gt;get($id);
        }
        $object = parent::getById($id);

        $this-&gt;cacheService-&gt;put($id, $object);
        return $object;
    }

}

IE: CachedPersonRepository is an entirely different class from a CachedLoggedPersonRepository.  This is still sound inheritance, and stands up to the LSP test too; so from that side of things, the design is still "valid". This doesn't make it optimal though. Also if we wanted to just have the caching, or the caching just higher in the hierarchy than the logging? Two more classes again.

And say you then want to add an encryption layer? That's potentially another five different class variations:

  • EncrytedPersonRepository
  • EncryptedLoggedPersonRepository
  • EncryptedCachedPersonRepository
  • EncryptedCachedLoggedPersonRepository
  • EncryptedLoggedCachedPersonRepository
And if we want the encryption done at a different level than just the outer one... they you see how many more possible permutations there might potentially be a use case for. Now I'm not saying any one given system would need all these permutations (that'd be weird), but it does demonstrate that the approach doesn't scale so well. Using the decorator pattern, one needs just one class per task, for all variations of sequencing the operations. In this example one would need a PersonRepository, a LoggedPersonRepository, a CachedPersonRepository, and an EncryptedPersonRepository. Those four classes can be used to make a EncryptedLoggedCachedPersonRepository, or a CachedLoggedEncryptedPersonRepository, or any other implementation of a subset of those four notions.

I think my example of a LoggedCachedEncryptedPersonRepository is slightly egregious, and I don't want to use an appeal to extremes to make my case here. However I think - all things being equal - using the decorator pattern instead of inheritance to solve this sort of thing is going to be a more robust way of preempting potential scaling requirements; whilst still being a simple and recognised solution; and an easy way to facilitate a possible future situation, whilst not in any way actually going down the rabbit-hole of actually writing any code for that potential future.

All the code and sanity checks of the behaviour can be found in my github account: decorator.local.

Righto.

--
Adam