Wednesday 14 July 2021

Switch refactoring on the road to polymorphism

G'day:

A while back I wrote an article "Code smells: a look at a switch statement". That deals with a specific case which was pretty tidy to start with, and wasn't completely clear as to whether the original or my refactoring was "better" (I think the initial switch was OK in this case).

Exceptions will always prove the rule as "they" say, and I'm gonna chalk that one up to a case of that. Also a good example of why sometimes it's not good to be dogmatic about rules.

However there a coupla general forms of switch statements that one can codify general refactoring advice for. Well: the main general rule is "switches are seldom necessary in your business logic code". If you find yourself writing one: stop to think if yer letting yerself down from an OOP perspective.

The other two situations I'm thinking of boil down to sticking to the Single Responsibility Principle: not munging more than one thing into a function basically. In the case of a switch, the two things are "what to do" and "how to do it". As well as when you fire a switch right into the middle of some other code, it's obvious it's breaking the SRP.

I'll start with that latter one.

Instead of this:

function f() {

    // code statements
    
    switch
    
    // code statements
}

Do this:

function f() {
    // code statements
    
    switchHandler()
    
    // code statements
}

function switchHandler() {
    switch
}

Note that switchHandler quite possibly doesn't even belong in the same class as the code using it.

In this case the first function is already probably doing more than one thing… it's got three sections, so that says "three things" to me. The whole thing should be looked at, but take the switch out completely. It is definitely and clearly doing at least one additional thing just by itself.

The second one is the other way around. Don't do this:

function f() {
    switch (expression) {

        case "A":
            // implementation of A
            break;

        case "B":
            // implementation of B
            break;

        case "C":
            // implementation of C
            break;

        default:
            // implementation of default
    }
}

Do this:

function f() {
    switch (expression) {

        case "A":
            caseAHandler()
            break;

        case "B":
            caseBHandler()
            break;

        case "C":
            caseCHandler()
            break;

        default:
            defaultHandler()
    }
}

function caseAHandler() {
    // implementation of A
}

// etc

Keep the "what to do" and the "how to do it" separate from each other: they are two different things (and the "how to do it" are each different things, so you have an 1+n degree of code smell going on there).

If you do these two things, a proper polymorphic handling of the situation - wherein you just use a type to deal with the logic - might start becoming more clear. Even if it doesn't straight away, this will make the refactoring easier when it does. You could use a factory to get the correct case handler, or if you know ahead of time what expression is, you can just pass in the correct handler in the first place. Or some other way of doing things. But even without taking it further, your code is less smelly, so is better.

Also note that if you have multiple switch statements with the same expression, then it's a clue that the whole lot should come out into its own class, and that a more OOP handling of the situation would simplify your code.

I only had 15min to write this this morning, so apologies it's a bit clipped-sounding. Must dash.

Righto.

--
Adam

Monday 12 July 2021

Factories, static methods, Law of Demeter and never letting Sean see my code

G'day:

I'm absolutely kidding about one of those things, btw.

Right so yesterday I wrote a coupla articles about what code should and should not go into a controller:

Within that first one, I provided some sample code to demonstrate my point. Part of it was the usage of a factory class to provide some necessary wiring for my model class, which I then had as a dependency of my controller:

// UserContentFactory.cfc
component {

    function init(
        ValidationService validationService,
        UserFactory userFactory,
        UserContentService contentService,
        TwitterService twitterService,
        FacebookService facebookService
    ) {
        variables.validationService = arguments.validationService
        variables.userFactory = arguments.userFactory
        variables.contentService = arguments.contentService
        variables.twitterService = arguments.twitterService
        variables.facebookService = arguments.facebookService
    }

    function getUserContent() {
        userContent = new UserContent()
        userContent.setValidationService(validationService)
        userContent.setUserFactory(userFactory)
        userContent.setContentService(contentService)
        userContent.setTwitterService(twitterService)
        userContent.setFacebookService(facebookService)
        
        return userContent
    }
}
// UserContent.cfc
component accessors=true invokeImplicitAccessor=true {

    property publishedContent;
    property twitterContent;
    property facebookContent;

    function init(publishedContent, twitterContent, facebookContent) {
        variables.publishedContent = arguments.publishedContent
        variables.twitterContent = arguments.twitterContent
        variables.facebookContent = arguments.facebookContent
    }
    
    function setValidationService(ValidationService validationService) {
        variables.validationService = arguments.validationService
    }
    
    function setUserFactory(UserFactory userFactory) {
        variables.userFactory = arguments.userFactory
    }
    
    function setContentService(UserContentService contentService) {
        variables.contentService = arguments.contentService
    }
    
    function setTwitterService(TwitterService twitterService) {
        variables.twitterService = arguments.twitterService
    }
    
    function setFacebookService(FacebookService facebookService) {
        variables.facebookService = arguments.facebookService
    }
    
    function loadContentByFilters(required struct filters) {
        validFilters = validationService.validate(filters, getValidationRules()) // @throws ValidationException
        
        user = userFactory.getById(validFilters.id) // @throws UserNotFoundException
        
        publishedContent = contentService.getUserContent(validFilters)
        twitterContent = twitterService.getUserContent(validFilters)
        facebookContent = facebookService.getUserContent(validFilters)
    }
    
    private function getValidationRules() { 
    	// ... elided to save space
    }
}
// UserContentController.cfc
component {

    function init(ViewService viewService, UserContentFactory userContentFactory) {
        variables.viewService = arguments.viewService
        variables.userContentFactory = arguments.userContentFactory
    }

    function getContent(rawArgs) {
        try {
            userContent = userContentFactory.getUserContent().loadContentByFilters(rawArgs)
            
            renderedResponse = viewService.renderView("userContentView", userContent)
            
            return new HtmlResponse(renderedResponse)
            
        } catch (ValidationException, e) {
            return new ClientErrorResponse(400, e)
        } catch (UserNotFoundException e) {
            return new ClientErrorResponse(404, e)
        }
    }
}

(Note that all of this is untested basically "pseudo"-code. I've never run it, it's just a theoretic approach to the issue. By all means let me know if there's anything syntactically or logically wrong with it, but it's not code intended to be run)

loadContentByFilters relied on five external services to do its stuff, and I didn't want to pass those services to UserContent as constructor arguments because then they'd get mungled up with the data values I wanted the UserContent model's constructor to focus on, so I was using setter injection to configure UserContent with those, and have that factory class and method to deal with all that in the background (via whatever dependency injection / IOC container framework I would be using). By the time the controller received the factory it'd been configured, and it had been able to configure a UserContent object, so all the controller needed to do was to call it.

All simple enough.

Sean popped a comment against the article, I responded, and he came back to me again. At which point I decided I'd write this article instead of replying again:

Sean:

Since you're being so hardcore about drawing lines between the MVC parts of your code, I'm going to call this into question:

userContentFactory.getUserContent().loadContentByFilters(rawArgs)

There's an argument often made that you shouldn't have method chaining like this since it couples the caller to the structure of the chained method calls on another object. The argument goes that you should instead have something like:

userContentFactory.getUserContentByFilters(rawArgs)

Thoughts?


Adam:

Yeah. I actually started with that! But then I didn't think it was the business of the factory to go as far as the actual data loading. But… I was not entirely convinced either way, to be completely honest.

I'm also completely down with the Law of Demeter when it comes to chaining methods like that. I did think about that at the time too.

The conclusion I drew was that the factory was just a "convenience extraction", or a utility or something that is completely "in the service" of UserContent. It serves absolutely no purpose other than a mechanism (I misspelled that as "MEHchanism" at first, perhaps fittingly) to wire-together the dependencies that loadContentByFilters needs to do its job. I did not want the controller to have to know how to do that.

The second draft of the actual code had two separate statements:

userContent = userContentFactory.getUserContent()
userContent.loadContentByFilters(rawArgs)

Which is functionally the same, and has the same Law of Demeter considerations you raised, but maybe the thinking is clearer: useContent = userContentFactory.getUserContent() is just an analogy of useContent = new UserContent(). I just slapped it into one statement because that was more convenient (ahem) for the point I was trying to make.

A simpler scenario would be if I wanted to be able to create a UserContent object via a struct that contained name/value pairs matching the constructor params it needs, I'd simply have a (probably static) method on the UserContent class which was createFromStruct which would internally grab the bits and pieces out of the struct and return a new UserContent object, passing those values to its constructor. No need for any separate factory cos there's no external dependencies to have to deal with somehow.

I knew that bit of the code would cause consternation from some quarters... and was hoping one of the quarters would be your own ;-).


Sean:

I find it interesting that you might subsume a factory for createFromStruct() as a static method but not for loadByFilters()…?

Now that is a very good point.

Truth be told, I simply didn't think of it at the time I was writing the blog article, but I did recall having done this in the past by the time I was writing the reply to Sean's question.

But let me back up a bit. Law of Demeter. In summary it means it's OK to call methods on a dependency, but it gets decreasingly OK to call method on objects that the methods of the dependency return, and so on. This is poor form:

function myMethod() {
    return someDependency.someMethod().someMethodOfAnotherObject().wowWeAreNowALongWayAwayFromTheDependency()
}

This is because not only is your implementation tightly coupled to its dependency - but we kinda need to live with that - it's also tightly coupled to the objects that implement someMethodOfAnotherObject and wowWeAreNowALongWayAwayFromTheDependency. You should not need to do all that. SomeDependency should take care of it for you. It should not be exposing its inner workings like that.

BTW, this does not apply to objects that implement a fluent interface. This is fine:

function myMethod() {
    return obj.someMethod().someOtherMethodOfObj().andAnotherMethodOfObj()
}

All those methods are part of the API exposed by obj.

So that's what Sean pulled me up on:

userContent = userContentFactory.getUserContent().loadContentByFilters(rawArgs)

That's a bit of a LoD violation, and worth pointing out. I did reason through this (before Sean raised it, as I said), and I'm fine with it.

But Sean's nudging of me back towards static methods solves this.

We could implement UserContent like this:

// UserContent.cfc
component accessors=true invokeImplicitAccessor=true {

    property publishedContent;
    property twitterContent;
    property facebookContent;

    function init(publishedContent, twitterContent, facebookContent) {
        variables.publishedContent = arguments.publishedContent
        variables.twitterContent = arguments.twitterContent
        variables.facebookContent = arguments.facebookContent
    }
    
    static function setValidationService(ValidationService validationService) {
         static.validationService = arguments.validationService
     }
     // etc, all the methods to set the services are static methods, setting a static property
    
    static function loadContentByFilters(required struct filters) {
        validFilters = validationService.validate(filters, getValidationRules()) // @throws ValidationException
        
        user = userFactory.getById(validFilters.id) // @throws UserNotFoundException
        
        return new UserContent(
            contentService.getUserContent(validFilters)
            twitterService.getUserContent(validFilters)
            facebookService.getUserContent(validFilters)
        )
    }
    
    // irrelevant stuff snipped
}
  • One sets the dependency services via static methods;
  • loadContentByFilters is also a static method which does its stuff, and returns a new instance of itself with the data values set.
  • Not shown here is that when the IoC framework creates the dependency container, it runs the factory method then, to set the services into the UserContent class (NB: "class", not "object") once.

Now in the controller we have a coupla options:

userContent = UserContent::loadContentByFilters(rawArgs)

No need to pass the class in as a constructor argument or anything (just as well: one cannot pass classes around as arguments), on just calls the method on the class.

Inlining static method calls like this is not so good for testing, that said, as it kinda bypasses dependency injection just like having new UserContent() in there, so this is less than ideal.

However there's nothing to stop us passing in an uninintialised instance of UserContent into the controller as a dependency. The IoC handling of this would be along the lines of:

container.set("UserContent", (container) => createObject("UserContent")) //NB: *not* using `new`, cos that calls the constructor
container.set("UserContentController", (container) => new UserContentController(container.get("ViewService"), container.get("UserContent")))

Then we use the dot operator instead of the :: operator to call the static loadContentByFilters via the object reference to the class: userContent

userContent = userContent.loadContentByFilters(rawArgs)

Better yet, perhaps, would be to just make the constructor arguments optional, and just use new UserContent() ;-)

Note: only ColdFusion 2021 has static support, and only Lucee 5.3.8 and above allows using the dot operator to call a static method on an object.


No doubt Sean will find other points of order from all this, but that's great cos it gives me (/us) an opportunity to think about stuff some more.

Right now I'm thinking about another beer, so I'm gonna do that.

Tomorrow I need to have a prod of CFML's handling of static stuff. I've already spotted something that looks like a bug in ColdFusion (quelle frickin surprise), but I need to check some stuff first, and I CBA doing it now. Because beer.

Righto.

--
Adam

Sunday 11 July 2021

Another thought on controllers and where the buck should stop

G'day:

I just wrote an article on what a controller ought to limit itself to: "What logic should be in a controller? (and a wee bit of testing commentary)". But then I read an old question from Mingo on the article that inspired that, and I have some thoughts on that too.

In most of my code examples around controller methods, I have this sort of method signature:

function handleGet(rawArgs)

What I mean by "raw args" is "all the stuff from an HTTP request, including: query string parameters and arguments (URL scope if yer a CFMLer), request body keys and values (form scope), general request metadata (CGI scope), cookies, and headers. Kind of like how Symfony would do it in the PHP world.

In CFML land things are seldom (never?) this organised. One seems to get a hotchpotch of things possibly put into a request context or something, or possibly - as in CFWheels - actually nothing(!!!) gets passed into the controller method; you just need to know the magical place to go look for them. And by "them" I mean a struct that has the form and URL scope munged into it. Ugh. Anyhow, there's all these elements of an HTTP request and the application lifecycle (application, session, request scopes) available to yer CFML code somehow.

And the controller is the only place one should ever access those. Your business logic should never be tightly-coupled the notion of "this stuff came from a specific sort of HTTP request", or from a specific stage in the application's lifecycle. Or even be aware of the ideas of HTTP requests and the like.

Your model tier should just get "values". Ideally by the time you are applying any actual application logic to them, the values will have been modelled into their own objects, not simply a bunch of primitive values.

I guess one could consider a controller to be similar in role to a repository class, just the other way around. A repository encapsulates the mapping between [for example] storage records - which it fetches - and collections of objects - which it returns - to the business-logic tier. A controller is slightly skinnier than that, it takes the values needed from that selection of HTTP request components (URL scope, form scope, what-have-you), and just passes them as independent values - distinct from how they arrived - to the model. I guess it's not a direct parallel because the controller doesn't do the "mapping" part that is intrinsic to a repository; it just removes the context from the values it receives, and leaves it up to the model tier to know what to do with them. But anyhow, there's a clear separation of concerns here, and the separation is that only the controller should ever deal with these things: CGI scope, the value returned from GetHttpRequestData(), form scope, URL scope, cookie scope, application scope, session scope, request scope.

Righto.

--
Adam

What logic should be in a controller? (and a wee bit of testing commentary)

G'day:

This topic has come up for me twice from different directtions in the last week or so, so I'm gonna dump some thoughts. I've actually discussed this before in "I actively consider what I think ought to go into a controller", and the conclusion I came to doesn't quite fit with how I'm writing code now so I'm gonna revise it a bit.

To start with though, I'll reiterate this question: "should this go in the controller?", and I'll repeat Mingo's answer that is still pithy but spot on:

Isn't the answer always "No"?
~ mjhagen 2017

This is a good starting point. 95% of the time if yer asking yerself that question, Mingo has answered it for you there.

The example of what I'd put in a controller from that article is along these lines:

class ContentController {
    function handleGet(rawArgs){
        args = validationService.validate(rawArgs)
        
        content = contentService.getById(args.id)
        
        response = {
            articles = content.published,
            socialContent = {
                twitter = content.twitter,
                facebook = content.facebook
            }
        }
        return new Response(response)
    }
}

It's not the worst code I've written, but I now think this is wrong. The problem lies with how I had a habit of abusing the Anaemic Domain Model pattern in the past, where I had a bunch of really skinny service classes, and used them to apply behaviour to behaviourless model objects that were just bags of data. Not great.

Looking at that code now, I see these lines:

args = validationService.validate(rawArgs)
content = contentService.getById(args.id)

And I think "nah, those don't belong there. They belong in the model too". I'm doing too much "asking" when I should be "telling" (see "TellDontAsk" by Martin Fowler).

Basically the model here should know what it is to be "valid", so just give it the raw data and let it crack on with it.

My generic controller method these days would be formed along these lines:

function handleRequest(rawRequestValues) {
    try {
        dataFromModel = someModel.getSomeDataFromThisLot(rawRequestValues)
        
        renderedResponse = viewService.renderView("someView", dataFromModel)
        
        return new HtmlResponse(renderedResponse)
        
    } catch (ClientException e) {
        return new ClientErrorResponse(e)
    }
}

Here we clearly have a separation of controller, model and view. It's the controller's job to marshal getting values to a model, and getting the values from that to a view, deal with any error responses that might arise due to those two, or return what came back from the view tier as the response. That's it.

There's an assumption that the framework will deal with any unhandled exceptions there as a controlled 5xx type response. Also there could well be more catch statements, if different types of exception could bubble out of the model, for instance a ValidationException which returns details of validation failures in its response; or a 404 response being returned if a UserNotFoundException came back from some business-logic validation or whatever. But that's the pattern.

The key here is that the only time I'm using a value created by the model is to pass it to the view. I do not pass it to anything else in the interim, like some other model call. That action is not controller logic. It's business logic that we get an x and then pass it to a y. It should be encapsulated in the model.

On the other hand if there was more than one piece of view data to be derived directly from the incoming request values, then that would to me still possibly be legit to be in the controller, eg this is OK:

dataFromModel = someModel.getSomeDataFromThisLot(rawRequestValues)
moreDataFromDifferentModel = someModelOther.getSomeDifferentDataFromThisLot(rawRequestValues)

This would not be OK:

dataFromModel = someModel.getSomeDataFromThisLot(rawRequestValues)
moreDataFromDifferentModel = someModelOther.getSomeDifferentDataFromThisLot(dataFromModel.someValue)

It's a small distinction. But the thing to focus on more than that small example is just to be thinking "no" when you ask yerself "does this belong in the controller?". You're more likely to be right than wrong.


How do we apply that pattern to the example in the old article? Like this I think:

// UserContentController.cfc
component {

    function init(ViewService viewService, UserContentFactory userContentFactory) {
        variables.viewService = arguments.viewService
        variables.userContentFactory = arguments.userContentFactory
    }

    function getContent(rawArgs) {
        try {
            userContent = userContentFactory.getUserContent().loadContentByFilters(rawArgs)
            
            renderedResponse = viewService.renderView("userContentView", userContent)
            
            return new HtmlResponse(renderedResponse)
            
        } catch (ValidationException, e) {
            return new ClientErrorResponse(400, e)
        } catch (UserNotFoundException e) {
            return new ClientErrorResponse(404, e)
        }
    }
}

(I've changed what the controller is returning so as to still integrate the view tier into the example).

I've done away with the controller handling the validation itself, and left that to the model. If things don't pan out: the model will let the controller know. That's it's job. And it's just the controller's job to do something about it. Note that in this case I don't really need both catches. I could just group the exceptions into one ClientException, probably. But I wanted to demonstrate two potential failures from the logic in loadContentByFilters.


What's with this factory I'm using? It's just one of my idiosyncrasies. I like my models' constructors to take actual valid property values, like this:

// UserContent.cfc
component accessors=true invokeImplicitAccessor=true {

    property publishedContent;
    property twitterContent;
    property facebookContent;

    function init(publishedContent, twitterContent, facebookContent) {
        variables.publishedContent = arguments.publishedContent
        variables.twitterContent = arguments.twitterContent
        variables.facebookContent = arguments.facebookContent
    }

Our UserContent represents the data that are those content items. However we've not been given the content items, we've just been given a means to get them. So we can't just create a new object in our controller and slap the incoming values into them. We need to have another method on the UserContent model that works with what the controller can pass it:

function loadContentByFilters(required struct filters) {
    validFilters = validationService.validate(filters, getValidationRules()) // @throws ValidationException
    
    user = userFactory.getById(validFilters.id) // @throws UserNotFoundException
    
    variables.publishedContent = contentService.getUserContent(validFilters)
    variables.twitterContent = twitterService.getUserContent(validFilters)
    variables.facebookContent = facebookService.getUserContent(validFilters)
}

And this demonstrates that to do that work, UserContent needs a bunch of dependencies.

I'm not going to pass these in the constructor because they aren't 100% needed for the operation of a UserContent object, and I want the constructor focusing on its data. So instead these need to be injected as properties:

// UserContent.cfc
component accessors=true invokeImplicitAccessor=true {

    property publishedContent;
    property twitterContent;
    property facebookContent;

    function init(publishedContent, twitterContent, facebookContent) {
        variables.publishedContent = arguments.publishedContent
        variables.twitterContent = arguments.twitterContent
        variables.facebookContent = arguments.facebookContent
    }
    
    function setValidationService(ValidationService validationService) {
        variables.validationService = arguments.validationService
    }
    
    function setUserFactory(UserFactory userFactory) {
        variables.userFactory = arguments.userFactory
    }
    
    function setContentService(UserContentService contentService) {
        variables.contentService = arguments.contentService
    }
    
    function setTwitterService(TwitterService twitterService) {
        variables.twitterService = arguments.twitterService
    }
    
    function setFacebookService(FacebookService facebookService) {
        variables.facebookService = arguments.facebookService
    }

That's all a bit of a mouthful every time we want a UserContent object that needs to use alternative loading methods to get its data, so we hide all that away in our dependency injection set-up, and use a factory to create the object, set its properties, and then return the object:

// UserContentFactory.cfc
component {

    function init(
        ValidationService validationService,
        UserFactory userFactory,
        UserContentService contentService,
        TwitterService twitterService,
        FacebookService facebookService
    ) {
        variables.validationService = arguments.validationService
        variables.userFactory = arguments.userFactory
        variables.contentService = arguments.contentService
        variables.twitterService = arguments.twitterService
        variables.facebookService = arguments.facebookService
    }

    function getUserContent() {
        userContent = new UserContent()
        userContent.setValidationService(validationService)
        userContent.setUserFactory(userFactory)
        userContent.setContentService(contentService)
        userContent.setTwitterService(twitterService)
        userContent.setFacebookService(facebookService)
        
        return userContent
    }
}

The controller just needs to be able to ask the factory for a UserContent object, and then call the method it needs, passing its raw values:

userContent = userContentFactory.getUserContent().loadContentByFilters(rawArgs)

You'll noticed I kept the validation separate from the UserContent model:

function loadContentByFilters(required struct filters) {
    validFilters = validationService.validate(filters, getValidationRules()) // @throws ValidationException

(And then there's also this private method with the rules):

private function getValidationRules() {
    return {
        id = [
            {required = true},
            {type = "integer"}
        ],
        startDate = [
            {required = true},
            {type = "date"},
            {
                range = {
                    max = now()
                }
            }
        ],
        endDate = [
            {required = true},
            {type = "date"},
            {
                range = {
                    max = now()
                }
            }
        ],
        collection = [
            {callback = (collection) => collection.startDate.compare(collection.endDate) < 0}
        ]
    }
}

Validation is fiddly and needs to be accurate, so I don't believe how to validate some values is the job of the UserContent class. I believe it's just perhaps its job to know "what it is to be valid". Hence that separation of concerns. I could see a case for that private method to be its own class, eg UserContentValidationRules or something. But for here, just a private method is OK. Wherever those rules are homed, and whatever the syntax of defining them is, we then pass those and the data to be validated to a specialist validation service that does the business. In this example the validation service itself throws an exception if the validation fails. In reality it'd more likely return a collection of rules violations, and it'd be up to the model making the call to throw the exception. That's implementation detail not so relevant to the code here.


There's probably more off-piste code in this an on-~, but I think it shows how to keep yer domain / business logic out of your controllers, which should be very very light, and simply marshall the incoming request values to the places that need them to be able to come up with a response. That's all a controller ought to do.


Oh before I go. There's an attitude from some testing quarters that one doesn't test one's controllers. I don't actually agree with that, but even if I did: that whole notion is predicated on controllers being very very simple, like I show above. If you pile all (or any of ~) yer logic into yer controller methods: you do actually need to test them! Even in this case I'd still be testing the flow control around the try/catch stuff. If I didn't have that, I'd probably almost be OK if someone didn't test it. Almost.

Righto.

--
Adam

Wednesday 7 July 2021

TIL: two things about reduce operations

G'day:

When working through yesterday's article ("TDD, code puzzle, recursing reductions"), I learned two new things about how reduce operations work.

The first thing was that I was having problems with defaulting the initial value of my callback. I had something like this:

array = ["tahi", "rua", "toru", "wha"]

list = array.reduce((result="", value) => {
    return result.listAppend(value.ucase())
})
writeDump(list)

When I was running this on Lucee I was getting an error: can't call method [listAppend] on object, object is null. The callback was not receiving its default value.

After a while I clocked that I had Lucee's "Complete Null Support" switched on, and this was ruining things for everyone. Why?

Because the method signature for Lucee's Array.reduce has two forms:

Array.reduce(callback)

Array.reduce(callback, initialValue)

I was using the first form, but Lucee still needs to cater for the second form. What's happening is that when Lucee encounters the first form, it's saying "OK so no initialValue. But… null support is switched on, so I'll just pass null for that". So when the callback is called, the initial value is null, so the default in the function signature for the callback is "not needed". Harrumph. I don't like this much, but I understand why it is the way it is I s'pose.

The easy fix is to just use the second form all the time.


Brad Wood pointed out another pitfall when using the callback's default value for that first argument. Consider this variant of the code from above:

array = []

list = array.reduce((result="", value) => {
    return result.listAppend(value.ucase())
})
writeDump(list)

There's no elements in the array, so the callback is never called, so the initial value for the result is never set. So on Lucee with null support set, the result is null. On ColdFusion, and on Lucee with null support switched off, it errors along the lines of variable [list] doesn't exist. It's worthwhile remembering that the default value on the callback parameter is only for the callback, and it simply "coincidentally" works as a default value for the entire operation if the callback is used. To give the operation a default starting value, you need to set it on the reduce call, not the callback signature.

Cheers for pointing this out to me, Brad. It seems obvious once you mentioned it, but it had never occurred to me.

That's it. Two things I learned. Cool.

Righto.

--
Adam

Tuesday 6 July 2021

TDD, code puzzle, recursing reductions

G'day:

I couldn't think of a title for this one, so: that'll do.

A few days ago Tom King posted this on the CFML Slack channel:

I'm sure I've asked this before, but has anyone got a good function to take complex values and parse out to a curl style string? i.e, https://trycf.com/gist/0c2672ac4a02d4c52d1e3ee1c9a408e7/lucee5?theme=monokai
The code in the gist is thus:
original = {
    'payment_method_types' = ['card'],
    'line_items' = [
        {
            "price_data" = {
                "product_data" = {
                    "name" = "Something"
                },
                "currency" = 'gbp',
                "unit_amount" = 123
            },
            "quantity" = 1
        }
    ]
};

writeDump(var=original, label="Original");

wantedResult["payment_method_types[0]"] = 'card';
wantedResult["line_items[0][price_data][product_data][name]"] ='Something';
wantedResult["line_items[0][price_data][currency]"] = 'gbp';
wantedResult["line_items[0][price_data][unit_amount]"] = 123;
wantedResult["line_items[0][quantity]"] = 1;

writeDump(var=wantedResult, label="Want this:");

So he wants to flatten out the key-path to the values. Cool. I'll give that a bash. Note: we've actually already solved this, but I'm now going back over the evolution of my solution, following my TDD steps. This is more a demo of using TDD than it is solving the problem: don't worry about how much code their is in this article, it's mostly incremental repetition, or just largely copy-and-pasted-and-tweaked tests. I will admit that for my first take on this I just wrote the one test that compared my result to his final expectation, and it took me far more time to sort it out than it ought to. I did the whole thing again last night to prep the code for this article, and using more focused TDD I solved it faster. Obviously I was benefiting from already kinda remembering how I did it the first time around, but even the first time around I could see how to do it before I even wrote any code, I just pantsed around making mistakes the first time due to not focusing. TDD on my second attempt helped me focus.

Last things first: the final test

I've got the "client's" final requirement already, so I'll create a test for that now, and watch my code error because it doesn't exist (my code, that is). I'll link the test case description through to the current state of the code in Github. I'll tag each step.

it("fulfils the requirement of the puzzle", () => {
    input = {
        "payment_method_types" = ["card"],
        "line_items" = [
            {
                "price_data" = {
                    "product_data" = {
                        "name" = "Something"
                    },
                    "currency" = "gbp",
                    "unit_amount" = 123
                },
                "quantity" = 1
            }
        ]
    }
    expected = {
        "payment_method_types[0]" = "card",
        "line_items[0][price_data][product_data][name]" = "Something",
        "line_items[0][price_data][currency]" = "gbp",
        "line_items[0][price_data][unit_amount]" = 123,
        "line_items[0][quantity]" = 1
    }

    actual = flattenStruct(input)

    expect(actual).toBe(expected)
})

This errors with No matching function [flattenStruct] found, because I ain't even started yet! Note: for this exercise, I'm just building the function in the test class itself, given it's just one function, and it only needs to exist to fulfil this blogging/TDD exercise.

This test will keep erroring until I've finished. This is fine. I am not going to look at that test code again until I have finished the task though.

It handles simple value key/value pairs

I'm not going to be crazily pendantic with the TDD here. For example in this case I'm rolling in "creating the function and solving the first step of the challenge" into one step. I've already got a test failing due to the lack of the function existing, so I can watch it's progress for some of the micro-iterations I am missing out. So for this step I'm gonna iterate over the struct, and deal with the really easy cases: top level key/value pairs that are simple values and do not require any recursion to work. This is also the "bottom" level of the recursion I'll move on to, so know I already need it anyhow. Maybe making the decision that I need to use recursion to sort this out is pre-emptive, but I'm buggered if I know any other way of doing it. And taking recursion as a given, I always like to start with the lowest level solution: the bit that doesn't actually recurse.

Here's my test for this:

it("handles top-level key/values with simple values", () => {
    input = {
        "one" = "tahi",
        "two" = {"second" = "rua"},
        "three" = {
            "third" = {"thrice" = "toru"}
        }
    }
    expected = {
        "[one]" = "tahi"
    }

    actual = flattenStruct(input)

    expect(actual).toBe(expected)
})

And the implementation to make it pass:

function flattenStruct(required struct struct) {
    return struct.reduce((flattened, key, value) => {
        if (isSimpleValue(value)) {
            return {"[#key#]" = value}
        }
        return flattened
    }, {})
}

That's lovely, and the test passes, but there's a bug in it which I didn't notice the first time around.

It handles multiple top-level key/values with simple values

it("handles multiple top-level key/values with simple values", () => {
    input = {
        "one" = "tahi",
        "first" = "tuatahi",
        "two" = {"second" = "rua"},
        "three" = {
            "third" = {"thrice" = "toru"}
        }
    }
    expected = {
        "[one]" = "tahi",
        "[first]" = "tuatahi"
    }

    actual = flattenStruct(input)

    expect(actual).toBe(expected)
})

That fails with Expected [{[one]={tahi}, [first]={tuatahi}}] but received [{[one]={tahi}}]. It's cos I'm a dropkick and not appending each iteration to the previous:

return {"[#key#]" = value}

// should be

return flattened.append({"[#key#]" = value})

Once I fixed that: the two TDD tests are green. Obviously the initial one is still broken cos we ain't done.

It handles complex values

Now all I needed was to add some structs to the sample data for the test:

it("handles complex values", () => {
    input = {
        "one" = "tahi",
        "first" = "tuatahi",
        "two" = {"second" = "rua"},
        "three" = {
            "third" = {"thrice" = "toru"}
        }
    }
    expected = {
        "[one]" = "tahi",
        "[first]" = "tuatahi",
        "[two][second]" = "rua",
        "[three][third][thrice]" = "toru"
    }

    actual = flattenStruct(input)

    expect(actual).toBe(expected)
})

And write a quick implementation:

function flattenStruct(required struct struct) {
    flatten = (flattened, key, value, _, prefix="") => {
        var prefixedKey = "#prefix#[#key#]"
        if (isSimpleValue(value)) {
            return flattened.append({"#prefixedKey#" = value})
        }

        return flattened.append(value.reduce(
            function (flattened, key, value, _, prefix=prefixedKey) {
                return flattened.append(flatten(flattened, key, value, _, prefix))
            },
            {}
        ))
    }

    return struct.reduce(flatten, {})
}

This warrants some explanation:

  • We need to extract all the logic we currently have within the outer function into its own function. We need to do this because when we recurse, we need it to call itself, so it needs a name.
  • The recursion works by saying to each level of the recursion "the previous level had this key, so append your keys to this", so we implement this, and default the prefix to be an empty string to start with.
  • We're leveraging a nice feature of CFML here in that one can pass additional arguments to a function, and "it'll just work", so we pass that after all the other arguments. We're not using the fourth parameter to the reduce callback which contains the original collection being iterated over, so we reflect that by calling the parameter _ (this is a notional standard for this sort of thing).
  • When we get to the "bottom" of the recursion we will always have a simple value for the value (otherwise we're not done recursing), so we prepend its key with the parent key (which will be all the ancestor keys combined).

I also now needed to update my previous test case expectations to include the structs coming out in the result too. In hindsight, I messed up there: I should only have had test data for the case at hand, rather than having some more data that I knew was specifically going to collapse in a heap.

But it also has a bug. The first test completely broke now: Can't cast Complex Object Type Struct to StringUse Built-In-Function "serialize(Struct):String" to create a String from Struct

I find the situation here a bit frustrating. The signature for the callback function for Struct.reduce is this:

function(previous, key, value, collection)

And that's what I have in the code above.

However. I had overlooked that the signature for the callback function for Array.reduce is this:

function(previous, value, key, collection)

It had never occurred to me that key, value and value, key were reversed between the two. Sigh. So my single handling of complex values there doesn't work because I'm passing Array.reduce the key and the value in the wrong order.

It's reasonably easy to fix though.

It handles arrays

I've added an array into the test data:

it("handles arrays values", () => {
    input = {
        "one" = "tahi",
        "first" = "tuatahi",
        "two" = {"second" = "rua"},
        "three" = {
            "third" = {"thrice" = "toru"}
        },
        "four" = ["fourth", "quaternary", "wha", "tuawha"]
    }
    expected = {
        "[one]" = "tahi",
        "[first]" = "tuatahi",
        "[two][second]" = "rua",
        "[three][third][thrice]" = "toru",
        "[four][1]" = "fourth",
        "[four][2]" = "quaternary",
        "[four][3]" = "wha",
        "[four][4]" = "tuawha"
    }

    actual = flattenStruct(input)

    expect(actual).toBe(expected)
})

And added logic to check if the value is a struct or an array before recursing. The variation just being to expect the callback parameters to be in a different order:

function flattenStruct(required struct struct) {
    flatten = (flattened, key, value, _, prefix="") => {
        var prefixedKey = "#prefix#[#key#]"
        if (isSimpleValue(value)) {
            return flattened.append({"#prefixedKey#" = value})
        }

        if (isStruct(value)) {
            return flattened.append(value.reduce(
                function (flattened={}, key, value, _, prefix=prefixedKey) {
                    return flattened.append(flatten(flattened, key, value, _, prefix))
                },
                {}
            ))
        }
        if (isArray(value)) {
            return flattened.append(value.reduce(
                function (flattened={}, value, index, _, prefix=prefixedKey) {
                    return flattened.append(flatten(flattened, index, value, _, prefix))
                },
                {}
            ))
        }
    }

    return struct.reduce(flatten, {})
}

With this amendment all the TDD tests pass, but the original test - which I was kinda expecting to also pass now - wasn't. I had a coupla bugs in my implementation still, basically from me "not reading the requirements"

It qualifies the keys correctly and zero-indexes the arrays

Yes yes, very bad to fix two things at once, I know. I didn't notice that Tom's requirements were to present the flattened keys like this (taken from the previous test): four[0], whereas I'm doing the same one like this: [four][1]. So I'm not supposed to be putting the brackets around the top-level item, plus I need to present the arrays with a zero index. This is easily done, so I did both at once. I figure I'm kinda OK doing this cos I am starting with a failing test afterall. And I also added a more complex test case in (mixing arrays, structs, simple values some more) to make sure I had nailed it:

it("qualifies the keys correctly and zero-indexes the arrays", () => {
    input = {
        "one" = "tahi",
        "first" = "tuatahi",
        "two" = {"second" = "rua"},
        "three" = {
            "third" = {"thrice" = "toru"}
        },
        "four" = ["fourth", "quaternary", "wha", "tuawha"],
        "five" = [
            {"fifth" = "rima"},
            ["tuarima"],
            "quinary"
        ]
    }
    expected = {
        "one" = "tahi",
        "first" = "tuatahi",
        "two[second]" = "rua",
        "three[third][thrice]" = "toru",
        "four[0]" = "fourth",
        "four[1]" = "quaternary",
        "four[2]" = "wha",
        "four[3]" = "tuawha",
        "five[0][fifth]" = "rima",
        "five[1][0]" = "tuarima",
        "five[2]" = "quinary"
    }

    actual = flattenStruct(input)

    expect(actual).toBe(expected)
})

And the fix:

function flattenStruct(required struct struct) {
    flatten = (flattened, key, value, actual, prefix="") => {
        var offsetKey = isArray(actual) ? key - 1 : key
        var qualifiedKey = prefix.len() > 0 ? "[#offsetKey#]" : offsetKey
        var prefixedKey = "#prefix##qualifiedKey#"

        if (isSimpleValue(value)) {
            return flattened.append({"#prefixedKey#" = value})
        }

        if (isStruct(value)) {
            return flattened.append(value.reduce(
                function (flattened={}, key, value, actual, prefix=prefixedKey) {
                    return flattened.append(flatten(flattened, key, value, actual, prefix))
                },
                {}
            ))
        }
        if (isArray(value)) {
            return flattened.append(value.reduce(
                function (flattened={}, value, index, actual, prefix=prefixedKey) {
                    return flattened.append(flatten(flattened, index, value, actual, prefix))
                },
                {}
            ))
        }
    }

    return struct.reduce(flatten, {})
}

Quite simply:

  • If the data we are reducing is an array, we reduce the index by one;
  • and we only put the brackets on if we've already got a prefix value.
  • We're using that _ parameter we were not using before, so I've given it a name.

Now everything works:

Or does it?

It handles the collection being an arguments scope object

I handed this implementation (or something very similar) to Tom with a flourish, and a coupla hours later he was puzzled about what sort of object the arguments scope is. It behaves like both a struct and an array. For some things. And it's ambiguous in others. When Tom was passing an arguments scope to this function, it was breaking again:

The arguments scope passes an isArray check, but its reduce method handles it like a struct, so they key being passed is the parameter name, not the positional index like we'd be expecting with an array that it's just claimed to us that it is.

Let's have a test for this:

it("handles arguments objects", () => {
    input = {
        "one" = "tahi",
        "first" = "tuatahi",
        "two" = {"second" = "rua"},
        "three" = {
            "third" = {"thrice" = "toru"}
        },
        "four" = ["fourth", "quaternary", "wha", "tuawha"],
        "five" = [
            {"fifth" = "rima"},
            ["tuarima"],
            "quinary"
        ],
        "six" = ((sixth, senary) => arguments)("ono", "tuaono", [6])
    }
    expected = {
        "one" = "tahi",
        "first" = "tuatahi",
        "two[second]" = "rua",
        "three[third][thrice]" = "toru",
        "four[0]" = "fourth",
        "four[1]" = "quaternary",
        "four[2]" = "wha",
        "four[3]" = "tuawha",
        "five[0][fifth]" = "rima",
        "five[1][0]" = "tuarima",
        "five[2]" = "quinary",
        "six[sixth]" = "ono",
        "six[senary]" = "tuaono",
        "six[2][0]" = 6
    }

    actual = flattenStruct(input)

    expect(actual).toBe(expected)
})

Note I'm using CFML's IIFE feature to call an inline function expression there.

I am not entirely happy with my fix for this, but I've gaffer-taped it up:

function flattenStruct(required struct struct) {
    flatten = (flattened, key, value, actual, prefix="") => {
        var offsetKey = isArray(actual) && isNumeric(key) ? key - 1 : key
        var qualifiedKey = prefix.len() > 0 ? "[#offsetKey#]" : offsetKey

I'm just checking if the "key" is numeric before I decrement it.


There's perhaps one more case I should put in there. This will silently ignore anything other than simple values, arrays or structs. I had a mind to throw an exception at the end if we'd not already returned, but I didn't get around to it. This solved the need we had, so that's all good.

Hopefully you found another incremental TDD exercise useful, and also found the problem we were trying to solve an interesting one as well. I did.

Righto.

--
Adam

Thursday 1 July 2021

One last one! CFML higher-order functions compared to tag-based code: reduceRight

G'day:

I forgot one!

I've already discussed map, reduce, filter, sort, some, every and each, operations; but recently reduceRight was added to CFML (well: at least in ColdFusion it was; it's not in Lucee yet) as well.

I have to start my day job in 16min, so this will be quick.

reduceRight is the same as reduce, except it starts from the end of the collection, not the beginning:

colours = ["Whero","Karaka","Kowhai","Kakariki","Kikorangi","Poropango","Papura"]

coloursAsList = colours.reduce((all="", colour) => all.listAppend(colour))
coloursAsReversedList = colours.reduceRight((all="", colour) => all.listAppend(colour))

writeOutput("coloursAsList: #coloursAsList#<br>coloursAsReversedList: #coloursAsReversedList#<br>")
coloursAsList: Whero,Karaka,Kowhai,Kakariki,Kikorangi,Poropango,Papura
coloursAsReversedList: Papura,Poropango,Kikorangi,Kakariki,Kowhai,Karaka,Whero

Yes yes Mingo; one would not use reduce to convert an array of strings to a list. That is beside the point. But thanks for letting me know Lucee (but not ColdFusion) has an Array.reverse method, which would be a better way to reverse the list order here: colours.reverse().toList().

And the tags version, just a reversed counting loop does the trick here:

<cfset coloursAsReversedList = "">
<cfloop index="i" from="#arrayLen(colours)#" to="1" step="-1">
    <cfset coloursAsReversedList = listAppend(coloursAsReversedList, colours[i])>
</cfloop>

That's it. four minutes to get to work. Fortunately that's just a matter of switching desktops…

Righto.

--
Adam

CFML higher-order functions compared to tag-based code: some, every and each functions

G'day:

I'm gonna try to round out this short series today: there's not much to say about the some, every and each methods in the context of comparing their functionality to old-school tag-based code. As a reminder, I've already covered map, reduce, filter and sort operations.

some

some iterates over the collection, calling a callback on each element, and will exit as soon as the callback returns true for an element. An example might be checking if at least some class members passed (or failed) their test:

examResults = [
    {person="Alex", mark=75},
    {person="Billie", mark=52},
    {person="Charlie", mark=41},
    {person="Daryl", mark=29},
    {person="Evan", mark=53}
]

somePassed = examResults.some((result) => result.mark >= 50)

writeOutput("Some of the class passed the test? #somePassed#<br><hr>")


someFailed = examResults.some((result) => {
    writeOutput("Called for #result.person#, #result.mark#<br>")
    return result.mark < 50
})

In the second example there I show a difference between this iteration function and the others we've encountered so far. All the others always iterate through the entire collection, however some and every do not. They exit as soon as they can answer the question. So as soon as some gets a true it exits; as soon as every gets a false it exits. The output of this is:

Some of the class passed the test? true

Called for Alex, 75
Called for Billie, 52
Called for Charlie, 41

In this case it only got as far as the first false result from the callback (because, sadly, Charlie did not make the cut)

The tag-based version of this would be:

<cfset somePassed = false>
<cfloop array="#examResults#" item="result">
    <cfif result.mark GTE 50>
        <cfset somePassed = true>
        <cfbreak>
    </cfif>
</cfloop>
<cfoutput>Some of the class passed the test? #somePassed#<br><hr></cfoutput>

<cfset someFailed = false>
<cfloop array="#examResults#" item="result">
    <cfoutput>Called for #result.person#, #result.mark#<br></cfoutput>
    <cfif result.mark LT 50>
        <cfset someFailed = true>
        <cfbreak>
    </cfif>
</cfloop>

Again with the boilerplate code (ref from previous articles).

BTW, don't get carried away with these higher-order functions if there's another built-in function to do the job. Recently I checked if something was in an array by doing this:

colours = ["Whero","Karaka","Kowhai","Kakariki","Kikorangi","Poropango","Papura"]

containsGreen = colours.some((colour) => colour == "Kakariki")
writeOutput("It contains green: #containsGreen#<br>")

My boss gently pointed out I could just do this:

containsGreen = !!colours.find("Kakariki")

Use the simpler option where possible ;-)

every

every is the opposite of some: it exits as soon as the callback returns false. Our example here would be to check if everyone passed the exam:

everyonePassed = examResults.every((result) => {
    writeOutput("Called for #result.person#, #result.mark#<br>")
    return result.mark >= 50
})
writeOutput("Everyone passed the test? #everyonePassed#<br><hr>")
Called for Alex, 75
Called for Billie, 52
Called for Charlie, 41
Everyone passed the test? false

The tag-based equivalent is the usual "mostly boilerplate" thing:

<cfset everyonePassed = true>
<cfloop array="#examResults#" item="result">
    <cfoutput>Called for #result.person#, #result.mark#<br></cfoutput>
    <cfset personPassed =  result.mark GTE 50>
    <cfif NOT personPassed>
        <cfset everyonePassed = false>
        <cfbreak>
    </cfif>
</cfloop>
<cfoutput>Everyone passed the test? #everyonePassed#<br><hr></cfoutput>

each

Sometimes it's not a data transformation that one needs when iterating over a collection. If none of the other options do the trick, there's the generic each method:

examResults.each((result) => {
    writeOutput("Name: #result.person#, mark: #result.mark#<br>")
})

As a general rule never start solving an iteration task with each. Consider if one of the other more situation-specific methods are a better fit. It's seldom that each is the right answer.

And the tag equivalent is pretty much the same, because - really - all the tag version does is "each"; it's down to the inner code block to distinguish between the various iteration possibilities:

<cfloop array="#examResults#" item="result">
    <cfoutput>Name: #result.person#, mark: #result.mark#<br></cfoutput>
</cfloop>

OK that's it. Tag-based CFML versions of the more situation-descriptive and less boilerplate iteration higher-order functions. If you need anything else about them explained, let me know.

Righto.

--
Adam