Thursday, 4 November 2021

A question about the overhead of OOP in CFML

G'day:

A question cropped up on the CFML Slack channel the other day. My answer was fairly long-winded so I decided to post it here as well. I asked the original questioner, and they are OK with me reproducing their question.

Again, I have a question to experienced OOP cfml coders. From the clean code concept I know I should break code into smaller (er even its smallest ) pieces. Is there any possible reason to stop doing that at a certain level in CFML? Eg. for performance reasons? Eg. lets assume I have a component named Car.cfc. Should I always break a Car.cfc component into Wheel.cfc, Engine.cfc, CarBody.cfc accordingly? Does the createObject behave like include files that would come with a certain overhead because of physical file request? What is when I also break Engine.cfc into many little pieces (and Wheel.cfc also)?
Andreas @ CFML Slack Channel

Here's my answer. I've tidied up the English in some places, but have not changed any detail of what I said.


Eventually there will be a meaningful overhead cost of creating a lot of objects.

Note that your comment about "behave like include files that would come with a certain overhead because of physical file request" isn't really accurate because the files are only read once, and after that they're in memory. Same applies with includes, for that matter. The process is (sort of):

  • code calls for an object to be created
  • if implementation for object is not found in memory, its code is loaded from disk
  • object is created
  • object is used
  • object is at some point cleaned up once it's not being referenced any more (at the discretion of the garbage collector)

That second step is only performed if needed, and all things being equal, will only be needed once in the lifetime of yer app in the JVM.

So don't worry about file system overhead; it's not going to be significant here.
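To make that "only read once" point concrete, a common CFML pattern is to create long-lived services once and cache them, so per-request code never pays the creation cost at all. This is a hedged sketch only; the application name and CarService component are mine, purely for illustration:

```cfml
// Application.cfc (illustrative sketch; services.CarService is a made-up component)
component {
    this.name = "exampleApp";

    function onApplicationStart() {
        // Created once per application lifetime, not once per request
        application.carService = new services.CarService();
    }
}

// In request code: reuse the cached instance; no object-creation cost here
result = application.carService.doSomething();
```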

Creating objects does come at a cost, and neither CFML engine has traditionally been particularly efficient at doing so (Lucee is better I believe; and CF is not as slow as it used to be). This could be a consideration at some point.

However performance considerations like this shouldn't be worried about until they start becoming an issue.

Design your application in a way that best represents the data and behaviour of your business domain. Make it modular, employing a sense of reusability and following the Single Responsibility Principle.

Keep an eye on your JVM. Use FusionReactor or something similar. I suspect FR is the only game in town for CFML code, but there are general JVM profiling tools out there as well which will do as good a job, albeit in a Java-centric way. If you see performance spikes: sort them out.

Load test your application with real-world load. This doesn't mean looping over object-creation one million times and going "tada! It took x seconds". That tells you close to nothing and is not really a meaningful test. Use a load testing tool to load test your application, not your code. Back when I used to do such things, there was tooling that could re-run a web server log, so one could easily test with real-world traffic. This is important because concurrent traffic can expose locking bottlenecks and application slow-downs that single-threaded loops never will.

[I forgot to say this bit in my original answer]. Irrespective of the overhead of creating objects, it will be trivial - by orders of magnitude - compared to the overhead of a poorly-written DB query, bad indexing, bad locking of code, heavy (and possibly badly-designed) string processing, etc. There's stacks of things I'd be worrying about before I wondered if I was calling new Thing() too often.

[cont'd…]

That said, don't go crazy with decomposing your domain models. A Car doesn't intrinsically need to have a collection of Wheel objects. It might just need to know the number "4" (number of wheels). Wait until there is behaviour or data needed for the wheels, and make sure to keep those elements separate in your Car class. At some point if there's a non-trivial proportion of data and/or behaviour around the Wheel implementation, or you need another sort of Car that has Wheels with different (but similar) properties: then extract the Wheels into a separate class.
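As a sketch of that progression (the property and method names here are mine, not anything from the original question): start with Car just knowing a number, and only extract Wheel once wheels have data and behaviour of their own:

```cfml
// Before: Car just knows how many wheels it has; no Wheel class needed yet
// (each component below would live in its own .cfc file)
component displayname="Car" accessors="true" {
    property name="wheelCount" type="numeric" default="4";
}

// After: wheels now carry their own data and behaviour, so they earn a class
component displayname="Wheel" accessors="true" {
    property name="diameter" type="numeric";
    property name="pressure" type="numeric";

    boolean function needsInflating() {
        return getPressure() < 30;
    }
}

component displayname="Car" accessors="true" {
    property name="wheels" type="array"; // an array of Wheel objects
}
```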

Making sure you have all your domain requirements tested makes this sort of refactoring much safer, so one can continually engineer and improve one's domain design without worrying too much about breaking one's clients' requirements.


Andreas followed up with an observation about falling into the trap of using bad practices when writing one's code, due to not knowing what the good practices are (my wording, not his). To this, I've just responded:

I think doing stuff that is "bad practice" when you don't know any better is fine; it's when you do know better and still follow bad practice for invented reasons (fictitious deadlines, mythical complexity, general "CBA") that's an issue.

That said one can mitigate some of this by actively improving one's knowledge of what is good and bad practice. This is why I advocate all devs should - at a minimum - have read these books:

  • Clean Code - Martin
  • Head First Design Patterns (second ed) - various
  • Test-Driven Development by Example - Beck
  • Refactoring - Fowler

There's other ones like Code Complete (I couldn't wade through it I'm afraid) and The Pragmatic Programmer (am about 20% of the way through it and it's only slightly engaging me so far) which other people will recommend.

One should make sure one is comfortable with the testing framework of choice for one's environment, and with testing one's code. Whether done beforehand via TDD, or even just afterwards, testing is essential to writing good, stable, scalable, maintainable code.


I'm pretty sure there's not an original thought here, but hey: most of my writing is like that, eh? Anyway, there you go.

Righto.

--
Adam

Saturday, 30 October 2021

TDD: writing a micro testing framework, using the framework to test itself as I build it

G'day:

Being back in CFML land, I spend a lot of time using trycf.com to write sample code. Sometimes I just want to see how ColdFusion and Lucee behave differently in a given situation. Sometimes I'm "helping" someone on the CFML Slack channel, and want to give them a runnable example of what I think they might want to be doing. trycf.com is cool, but I usually want to write my example code as a series of tests demonstrating variants of its behaviour. I also generally want to TDD even sample code that I write, so I know I'm staying on-point and the code works without any surprises. trycf.com is a bit basic though: one can only really have one CFML script and run that. One cannot import other libraries like TestBox, so I can't write my code the way I want.

Quite often I write a little stub tester to help me, and I find myself doing this over and over again, and I never think to save these stubs across examples. Plus having to include the stubs along with the sample code I'm writing clutters things a bit, and also means my example code doesn't really stick to the Single Responsibility Principle.

I found out from Abram - owner of trycf.com - that one can specify a setupCodeGistId in the trycf.com URL, and it will load a CFML script from that and pre-include that before the scratch-pad code. This is bloody handy, and armed with this knowledge I decided I was gonna knock together a minimal single-file testing framework which I could include transparently in with a scratch-pad file, so that scratch-pad file could focus on the sample code I'm writing, and the testing of the same.

A slightly hare-brained idea I have had when doing this is to TDD the whole exercise… and using the test framework to test itself as it evolves. Obviously to start with the test will need to be separate pieces of code before the framework is usable, but beyond a point it should work enough for me to be able to use it to test the latter stages of its development. Well: we'll see, anyhow.

Another challenge I am setting for myself is that I'm gonna mimic the "syntax" of TestBox, so that the tests I write in a scratch-pad should be able to be lifted-out as-is and dumped into a TestBox test spec CFC. Obviously I'm not going to reimplement all of TestBox, just a subset of stuff to be able to do simple tests. I think I will need to implement the following functions:

  • void run() - well, like in TestBox all my tests will be implemented in a function called run, and then one runs that to execute the tests. This is just convention rather than needing any development.
  • void describe(required string label, required function testGroup) - this will output its label and call its testGroup.
  • void it(required string label, required function implementation) - this will output its label and call its implementation. I'm pretty sure the implementation of describe and it will be identical. I mean like I will use two references to the same function to implement this.
  • struct expect(required any actual) - this will take the actual value being tested, and return a struct containing a matcher to that actual value.
  • boolean toBe(required any expected) - this will take the value that the actual value passed to expect should be. This will be a key in the struct returned by expect (this will emulate the function-chaining TestBox uses with expect(x).toBe(y)).

If I create that minimalist implementation, then I will be able to write this much of a test suite in a trycf.com scratch pad:

function myFunctionToTest(x) {
    // etc
}

function run() {
    describe("Tests for myFunctionToTest" ,() => {
        it("tests some variant", () => {
            expect(myFunctionToTest("variant a")).toBe("something")
        })

        it("tests some other variant", () => {
            expect(myFunctionToTest("variant b")).toBe("something else")
        })
    })
}

run()

And everything in that run function will be compatible with TestBox.

I am going to show every single iteration of the process here, to demonstrate TDD in action. This means this article will be long, and it will be code-heavy. And it will have a lot of repetition. I'll try to keep my verbiage minimal, so if I think the iteration of the code speaks for itself, I possibly won't have anything to add.

Let's get on with it.


It runs the tests via a function "run"

<cfscript>
// tests    
try {
    run()
    writeOutput("OK")
} catch (any e) {
    writeOutput("run function not found")
}
</cfscript>

<cfscript>
// implementation    
    
</cfscript>

I am going to do this entire implementation in a trycf.com scratch-pad file. As per above, I have two code blocks: tests and implementation. For each step I will show you the test (or updates to existing tests), and then I will show you the implementation. We can take it as a given that the test will fail in the way I expect (which will be obvious from the code), unless I state otherwise. As a convention a test will output "OK" if it passed, or "Failure: " and some error explanation if not. Later these will be baked into the framework code, but for now, it's hand-cranked. This shows that you don't need any framework to design your code via TDD: it's a practice, it's not a piece of software.

When I run the code above, I get "Failure: run function not found". This is obviously because the run function doesn't even exist yet. Let's address that.

// implementation    
void function run(){
}

Results: OK

We have completed one iteration of red/green. There is nothing to refactor yet. Onto the next iteration.


It has a function describe

We already have some of our framework operational. We can put tests into that run function, and they'll be run: that's about all that function needs to do. Hey I didn't say this framework was gonna be complicated. In fact the aim is for it to be the exact opposite of complicated.

// tests
// ...

void function run(){
    try {
        describe()
        writeOutput("OK")
    } catch (any e) {
        writeOutput("Failure: describe function not found")
    }
}
// implementation
void function describe(){
}

The tests already helped me here actually. In my initial effort at implementing describe, I misspelt it as "desribe". Took me a few seconds to spot why the test failed. I presume, like me, you have found a function with a spelling mistake in it that has escaped into production. I was saved that here, by having to manually type the name of the function into the test before I did the first implementation.


describe takes a string parameter label which is displayed on a line by itself as a label for the test grouping

// tests
// ...
savecontent variable="testOutput" {
    describe("TEST_DESCRIPTION")
};
if (testOutput == "TEST_DESCRIPTION<br>") {
    writeOutput("OK<br>")
    return
}
writeOutput("Failure: label not output<br>")
// implementation
void function describe(required string label){
    writeOutput("#label#<br>")
}

This implementation passes its own test (good), but it makes the previous test break:

try {
    describe()
    writeOutput("OK")
} catch (any e) {
    writeOutput("Failure: describe function not found")
}

We require describe to take an argument now, so we need to update that test slightly:

try {
    describe("NOT_TESTED")
    writeOutput("OK")
} catch (any e) {
    writeOutput("Failure: describe function not found")
}

I make a point of being very clear when arguments etc that I need to pass are not part of what's being tested.

It's also worth noting that the test output is a bit of a mess at the moment:

OKNOT_TESTED
OKOK

For the sake of cosmetics, I've gone through and put <br> tags on all the test messages I'm outputting, and I've also slapped a cfsilent around that first describe test, as its output is just clutter. The full implementation is currently:

// tests    
try {
    cfsilent() {
        run()
    }
    writeOutput("OK<br>")
} catch (any e) {
    writeOutput("Failure: run function not found<br>")
}

void function run(){
    try {
        cfsilent(){describe("NOT_TESTED")}
        writeOutput("OK<br>")
    } catch (any e) {
        writeOutput("Failure: describe function not found<br>")
    }

    savecontent variable="testOutput" {
        describe("TEST_DESCRIPTION")
    };
    if (testOutput == "TEST_DESCRIPTION<br>") {
        writeOutput("OK<br>")
    }else{
    	writeOutput("Failure: label not output<br>")
    }
}

run()
</cfscript>

<cfscript>
// implementation
void function describe(required string label){
    writeOutput("#label#<br>")
}

And now the output is tidier:

OK
OK
OK

I actually did a double-take here, wondering why the TEST_DESCRIPTION message was not displaying for that second test: the one where I'm actually testing that that message displays. Of course it's because I've got the savecontent around it, so I'm capturing the output, not letting it output. Duh.


describe takes a callback parameter testGroup which is executed after the description is displayed

savecontent variable="testOutput" {
    describe("TEST_DESCRIPTION", () => {
        writeOutput("DESCRIBE_GROUP_OUTPUT")
    })
};
if (testOutput == "TEST_DESCRIPTION<br>DESCRIBE_GROUP_OUTPUT") {
    writeOutput("OK<br>")
}else{
    writeOutput("Failure: testGroup not executed<br>")
}
void function describe(required string label, required function testGroup){
    writeOutput("#label#<br>")
    testGroup()
}

This implementation change broke earlier tests that called describe, because they were not passing a testGroup argument. I've updated those to just sling an empty callback into those calls, eg:

describe("TEST_DESCRIPTION", () => {})

That's all I really need from describe. But I'll just do one last test to confirm that it can be nested OK (there's no reason why it couldn't be, but I'm gonna check anyway).


describe calls can be nested

savecontent variable="testOutput" {
    describe("OUTER_TEST_DESCRIPTION", () => {
        describe("INNER_TEST_DESCRIPTION", () => {
            writeOutput("DESCRIBE_GROUP_OUTPUT")
        })
    })
};
if (testOutput == "OUTER_TEST_DESCRIPTION<br>INNER_TEST_DESCRIPTION<br>DESCRIBE_GROUP_OUTPUT") {
    writeOutput("OK<br>")
}else{
    writeOutput("Failure: describe nested did not work<br>")
}

And this passes without any adjustment to the implementation, as I expected. Note that it's OK to write a test that just passes, if there's no implementation change needed to make it pass. One only needs a failing test before one makes an implementation change. Remember the tests are there to verify that an implementation change does what it's supposed to. This test here is just a belt-and-braces thing, and a way of being declarative about some functionality the system has.


The it function behaves the same way as the describe function

it will end up having more required functionality than describe needs, but there's no reason for them not to just be aliases of each other for the purposes of this micro-test-framework. As long as the describe alias still passes all its tests, it doesn't matter what extra functionality I put into it to accommodate the requirements of it.

savecontent variable="testOutput" {
    describe("TEST_DESCRIPTION", () => {
        it("TEST_CASE_DESCRIPTION", () => {
            writeOutput("TEST_CASE_RESULT")
        })
    })
};
if (testOutput == "TEST_DESCRIPTION<br>TEST_CASE_DESCRIPTION<br>TEST_CASE_RESULT") {
    writeOutput("OK<br>")
}else{
    writeOutput("Failure: the it function did not work<br>")
}
it = describe

it will not error-out if an exception occurs in its callback, instead reporting an error result, with the exception's message

Tests are intrinsically the sort of thing that might break, so we can't have the test run stopping just cos an exception occurs.

savecontent variable="testOutput" {
    describe("NOT_TESTED", () => {
        it("tests an exception", () => {
            throw "EXCEPTION_MESSAGE";
        })
    })
};
if (testOutput CONTAINS "tests an exception<br>Error: EXCEPTION_MESSAGE<br>") {
    writeOutput("OK<br>")
}else{
    writeOutput("Failure: the it function did not correctly report the test error<br>")
}
void function describe(required string label, required function testGroup) {
    try {
        writeOutput("#label#<br>")
        testGroup()
    } catch (any e) {
        writeOutput("Error: #e.message#<br>")
    }
}

At this point I decided I didn't like how I was just using describe to implement it, so I refactored:

void function runLabelledCallback(required string label, required function callback) {
    try {
        writeOutput("#label#<br>")
        callback()
    } catch (any e) {
        writeOutput("Error: #e.message#<br>")
    }
}

void function describe(required string label, required function testGroup) {
    runLabelledCallback(label, testGroup)
}

void function it(required string label, required function implementation) {
    runLabelledCallback(label, implementation)
}

Because everything is test-covered, I am completely safe to just make that change. Now I have the correct function signature on describe and it, and an appropriately "general" function to do the actual execution. And the tests all still pass, so that's cool. Reminder: refactoring doesn't need to start with a failing test. Intrinsically it's an activity that mustn't impact the behaviour of the code being refactored, otherwise it's not a refactor. Also note that one does not ever alter implementation and refactor at the same time. Follow red / green / refactor as separate steps.


it outputs "OK" if the test ran correctly

// it outputs OK if the test ran correctly
savecontent variable="testOutput" {
    describe("NOT_TESTED", () => {
        it("outputs OK if the test ran correctly", () => {
            // some test here... this actually starts to demonstrate an issue with the implementation, but we'll get to that
        })
    })
};
if (testOutput CONTAINS "outputs OK if the test ran correctly<br>OK<br>") {
    writeOutput("OK<br>")
}else{
    writeOutput("Failure: the it function did not correctly report the test error<br>")
}

The implementation of this demonstrated that describe and it can't share an implementation. I don't want describe calls outputting "OK" when their callback runs OK, and this was what started happening when I did my first pass of the implementation for this:

void function runLabelledCallback(required string label, required function callback) {
    try {
        writeOutput("#label#<br>")
        callback()
        writeOutput("OK<br>")
    } catch (any e) {
        writeOutput("Error: #e.message#<br>")
    }
}

Only it is a test, and only it is supposed to say "OK". This is an example of premature refactoring, and probably going overboard with DRY. The describe function wasn't complex, so we were gaining nothing by de-duping describe and it, especially as the latter's implementation has more complexity still to come.

I backed out my implementation and my failing test for a moment, and did another refactor to separate-out the two functions completely. I know I'm good when all my tests still pass.

void function describe(required string label, required function testGroup) {
    writeOutput("#label#<br>")
    testGroup()
}

void function it(required string label, required function implementation) {
    try {
        writeOutput("#label#<br>")
        implementation()
    } catch (any e) {
        writeOutput("Error: #e.message#<br>")
    }
}

I also needed to update the baseline it test to expect the "OK<br>". After that everything is green again, and I can bring my new test back (failure as expected), and implementation (all green again/still):

void function it(required string label, required function implementation) {
    try {
        writeOutput("#label#<br>")
        implementation()
        writeOutput("OK<br>")
    } catch (any e) {
        writeOutput("Error: #e.message#<br>")
    }
}

describe will not error-out if an exception occurs in its callback, instead reporting an error result, with the exception's message

Back to describe again. It too needs to not error-out if there's an issue with its callback, so we're going to put in some handling for that again too. This test is largely copy and paste from the equivalent it test:

savecontent variable="testOutput" {
    describe("TEST_DESCRIPTION", () => {
        throw "EXCEPTION_MESSAGE";
    })
};
if (testOutput CONTAINS "TEST_DESCRIPTION<br>Error: EXCEPTION_MESSAGE<br>") {
    writeOutput("OK<br>")
}else{
    writeOutput("Failure: the describe function did not correctly report the test error<br>")
}
void function describe(required string label, required function testGroup) {
    try {
        writeOutput("#label#<br>")
        testGroup()
    } catch (any e) {
        writeOutput("Error: #e.message#<br>")
    }
}

it outputs "Failed" if the test failed

This took some thought. How do I know a test "failed"? In TestBox when something like expect(true).toBeFalse() runs, it exits from the test immediately, and does not run any further expectations the test might have. Clearly it's throwing an exception from toBeFalse if the actual value (the one passed to expect) isn't false. So this is what I need to do. I have not written any assertions or expectations yet, so for now I'll just test with an actual exception. It can't be any old exception, because the test could break for any other reason too, so I need to differentiate between a test fail exception (the test has failed), and any other sort of exception (the test errored). I'll use a TestFailedException!

savecontent variable="testOutput" {
    describe("NOT_TESTED", () => {
        it("outputs failed if the test failed", () => {
            throw(type="TestFailedException");
        })
    })
};
if (testOutput.reFind("Failed<br>$")) {
    writeOutput("OK<br>")
}else{
    writeOutput("Failure: the it function did not correctly report the test failure<br>")
}
void function it(required string label, required function implementation) {
    try {
        writeOutput("#label#<br>")
        implementation()
        writeOutput("OK<br>")
    } catch (TestFailedException e) {
        writeOutput("Failed<br>")
    } catch (any e) {
        writeOutput("Error: #e.message#<br>")
    }
}

This is probably enough now to a) start using the framework for its own tests; b) start working on expect and toBe.


has a function expect

it("has a function expect", () => {
    expect()
})

Check. It. Out. I'm actually able to use the framework now. I run this and I get:

has a function expect
Error: Variable EXPECT is undefined.

And when I do the implementation:

function expect() {
}
has a function expect
OK

I think once I'm tidying up, I might look to move the test result to be on the same line as the label. We'll see. For now though: functionality.


expect returns a struct with a key toBe which is a function

it("expect returns a struct with a key toBe which is a function", () => {
    var result = expect()
    if (isNull(result) || isNull(result.toBe) || !isCustomFunction(local.result.toBe)) {
        throw(type="TestFailedException")
    }
})

I'd like to be able to just use !isCustomFunction(local?.result?.toBe) in that if statement there, but Lucee has a bug that prevents ?. from being used in a function expression (I am not making this up, go look at LDEV-3020). Anyway, the implementation for this for now is easy:

function expect() {
    return {toBe = () => {}}
}

toBe returns true if the actual and expected values are equal

it("toBe returns true if the actual and expected values are equal", () => {
    var actual = "TEST_VALUE"
    var expected = "TEST_VALUE"

    result = expect(actual).toBe(expected)
    if (isNull(result) || !result) {
        throw(type="TestFailedException")
    }
})

The implementation for this bit is deceptively easy:

function expect(required any actual) {
    return {toBe = (expected) => {
        return actual.equals(expected)
    }}
}

toBe throws a TestFailedException if the actual and expected values are not equal

it("toBe throws a TestFailedException if the actual and expected values are not equal", () => {
    var actual = "ACTUAL_VALUE"
    var expected = "EXPECTED_VALUE"

    try {
        expect(actual).toBe(expected)
    } catch (TestFailedException e) {
        return
    }
    throw(type="TestFailedException")
})
function expect(required any actual) {
    return {toBe = (expected) => {
        if (actual.equals(expected)) {
            return true
        }
        throw(type="TestFailedException")
    }}
}

And with that… I have a very basic testing framework. Obviously it could be improved to have more expectations (toBeTrue, toBeFalse, toBe{Type}), and could have some nice messages on the toBe function so a failure is more clear as to what went on, but for a minimum viable project, this is fine. I'm going to do a couple more tests / tweaks though.
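To give a flavour of that, extra matchers could just be more keys in the struct expect returns, alongside toBe. This is only a hedged sketch of what toBeTrue / toBeFalse might look like; it's not part of what I actually built:

```cfml
// Sketch only: extra matchers as additional keys in the struct expect returns
function expect(required any actual) {
    return {
        toBe = (expected) => {
            if (actual.equals(expected)) {
                return true
            }
            throw(type="TestFailedException")
        },
        toBeTrue = () => {
            if (isBoolean(actual) && actual) {
                return true
            }
            throw(type="TestFailedException")
        },
        toBeFalse = () => {
            if (isBoolean(actual) && !actual) {
                return true
            }
            throw(type="TestFailedException")
        }
    }
}
```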


toBe works with a variety of data types

var types = ["string", 0, 0.0, true, ["array"], {struct="struct"}, queryNew(""), xmlNew()]
types.each((type) => {
    it("works with #type.getClass().getName()#", (type) => {
        expect(type).toBe(type)
    })
})

The results here differ between the engines. On Lucee:

works with java.lang.String
OK
works with java.lang.Double
OK
works with java.lang.Double
OK
works with java.lang.Boolean
OK
works with lucee.runtime.type.ArrayImpl
OK
works with lucee.runtime.type.StructImpl
OK
works with lucee.runtime.type.QueryImpl
OK
works with lucee.runtime.text.xml.struct.XMLDocumentStruct
Failed

And ColdFusion:

works with java.lang.String
OK
works with java.lang.Integer
OK
works with coldfusion.runtime.CFDouble
Failed
works with coldfusion.runtime.CFBoolean
OK
works with coldfusion.runtime.Array
OK
works with coldfusion.runtime.Struct
OK
works with coldfusion.sql.QueryTable
OK
works with org.apache.xerces.dom.DocumentImpl
Failed

But the thing is, the framework itself actually works correctly on both platforms: if you compare the objects outside the context of the test framework, the results are the same. Apparently in ColdFusion 0.0 does not equal itself. I will be taking this up with Adobe, I think.

It's good to know that it works OK for structs, arrays and queries though.
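If you want to see that outside the framework, a quick check like this (my own throwaway code, not part of the exercise) shows where the difference lies:

```cfml
// Throwaway check: how each engine types and compares a decimal literal
x = 0.0
writeOutput(x.getClass().getName() & "<br>")  // java.lang.Double on Lucee; coldfusion.runtime.CFDouble on ColdFusion
writeOutput(x.equals(0.0))  // true on Lucee; reportedly false on ColdFusion, per the test results above
```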


The it function puts the test result on the same line as the test label, separated by a colon

As I mentioned above, I'm gonna take the <br> out from between the test message and the result, instead just colon-separating them:

// The it function puts the test result on the same line as the test label, separated by a colon
savecontent variable="testOutput" {
    describe("TEST_DESCRIPTION", () => {
        it("TEST_CASE_DESCRIPTION", () => {
            writeOutput("TEST_CASE_RESULT")
        })
    })
};
if (testOutput == "TEST_DESCRIPTION<br>TEST_CASE_DESCRIPTION: TEST_CASE_RESULTOK<br>") {
    writeOutput("OK<br>")
}else{
    writeOutput("Failure: the it function did not work<br>")
}
void function it(required string label, required function implementation) {
    try {
        writeOutput("#label#: ")
        implementation()
        writeOutput("OK<br>")
    } catch (TestFailedException e) {
        writeOutput("Failed<br>")
    } catch (any e) {
        writeOutput("Error: #e.message#<br>")
    }
}

That's just a change to that line before calling implementation()


Putting it to use

I can now save the implementation part of this to a gist, save another gist with my actual tests in it, and load that into trycf.com with the setupCodeGistId param pointing to my test framework Gist: https://trycf.com/gist/b8e6322c291ba6308bd82c3ee499dd0e?setupCodeGistId=816ce84fd991c2682df612dbaf1cad11. And… (sigh of relief, cos I'd not tried this until now) it all works.


Outro

This was a long one, but a lot of it was really simple code snippets, so hopefully it doesn't overflow the brain to read it. If you'd like me to clarify anything or find any bugs or I've messed something up or whatever, let me know. Also note that whilst it'll take you a while to read, and it took me bloody ages to write, if I was just doing the actual TDD exercise it's only about an hour's effort. The red/green/refactor cycle is very short.

Oh! Speaking of implementations. I never showed the final product. It's just this:

void function describe(required string label, required function testGroup) {
    try {
        writeOutput("#label#<br>")
        testGroup()
    } catch (any e) {
        writeOutput("Error: #e.message#<br>")
    }
}

void function it(required string label, required function implementation) {
    try {
        writeOutput("#label#: ")
        implementation()
        writeOutput("OK<br>")
    } catch (TestFailedException e) {
        writeOutput("Failed<br>")
    } catch (any e) {
        writeOutput("Error: #e.message#<br>")
    }
}

struct function expect(required any actual) {
    return {toBe = (expected) => {
        if (actual.equals(expected)) {
            return true
        }
        throw(type="TestFailedException")
    }}
}

That's a working testing framework. I quite like this :-)

All the code for the entire exercise - including all the test code - can be checked-out on trycf.com.

Righto.

--
Adam

Thursday, 21 October 2021

Unit testing back-fill question: how thorough to be?

G'day:

I'm extracting this from a thread I started on the CFML Slack channel (it's not a CFML-specific question, before you browse-away from here ;-), because it's a chunk of text/content that possibly warrants preservation beyond Slack's content lifetime, plus I think the audience here (such as it is, these days), are probably more experienced than the bods on the Slack channel.

I've found myself questioning my approach to a test case I'm having to backfill in our application. I'm keen to hear what people think.

I am testing a method like this:

function testMe() {
    // irrelevant lines of code here
    
    if (someService.isItSafe()) {
        someOtherService.doThingWeDontWantToHappen()

        things = yetAnotherService.getThings()
        for (stuff in things) {
            stuff.doWhenSafe()
            if (someCondition) {
                return false
            }
        }
        moarService.doThisOnlyWhenSafe()
        
        return true
    }
    
    return false
}

That implementation is fixed for now, so I can't refactor it to make it more testable just yet. I need to play with this specific hand I've been dealt.

The case I am currently implementing is for when it's not safe (someService.isItSafe() returns false to indicate this). When it's not safe, then a subprocess doesn't run (all the stuff in the if block), and the method returns false.

I've thrown-together this sort of test:

it("doesn't do [a description of the overall subprocess] if it's not safe to do so", () => {
    someService = mockbox.createMock("someService")
    someService.$("isItSafe").$results(false)

    someOtherService = mockbox.createMock("SomeOtherService")
    someOtherService.$(method="doThingWeDontWantToHappen", callback=() => fail("this should not be called if it's not safe"))
    
    yetAnotherService = mockbox.createMock("YetAnotherService")
    moarService = mockbox.createMock("MoarService")
    
    serviceToTest = new ServiceToTest(someService, someOtherService, yetAnotherService, moarService)
    
    result = serviceToTest.testMe()
    
    expect(result).toBeFalse()
})

In summary, I'm forcing isItSafe to return false, and I'm mocking the first part of the subprocess to fail if it's called. And obviously I'm also testing the return value.

That works. Exactly as written, the test passes. If I tweak it so that isItSafe returns true, then the test fails with "this should not be called if it's not safe".

However I'm left wondering if I should also mock yetAnotherService.getThings, Stuff.doWhenSafe and moarService.doThisOnlyWhenSafe to also fail if they are called. They are part of the sub-process that must not run as well (well I guess it's safe for getThings to run, but not the other two).
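For clarity, the belt-and-braces version I'm wondering about would just apply the same fail-on-call trick to the rest of the mocks, something like this:

```cfml
yetAnotherService = mockbox.createMock("YetAnotherService")
yetAnotherService.$(method="getThings", callback=() => fail("getThings should not be called if it's not safe"))

moarService = mockbox.createMock("MoarService")
moarService.$(method="doThisOnlyWhenSafe", callback=() => fail("doThisOnlyWhenSafe should not be called if it's not safe"))
```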

In a better world the subprocess code could all actually be in a method called doSubprocess, rather than inline in testMe and the answer would be to simply make sure that wasn't called. We will get there, but for now we need the testing in before we do the refactor to do so.

So the things I'm wondering about are:

  • Is it "OK" to only fail on the first part of the subprocess, or should I belt and braces the rest of it too?
  • Am I being too completist to think to fail each step, especially given they will be unreachable after the first fail?
  • I do not know the timeframe for the refactor. Ideally it's "soon", but it's not scheduled. We have just identified that this code is too critical to not have tests, so I'm backfilling the test cases. If it was "soon", I'd keep the test brief (as it is now); but as it might not be "soon" I wonder if I should go the extra mile.
  • I guess once this test is in place, if we do any more maintenance on testMe, we will look at the tests first (TDD...), and adjust them accordingly if merely checking doThingWeDontWantToHappen doesn't get called is no longer sufficient to test this case. But that's in an ideal world that we do not reside in yet, so I can't even be certain of that.

What do you think?

Righto.

--
Adam

Sunday, 5 September 2021

Testing: reasoning over "testing implementation detail" vs "testing features"

G'day:

I can't see how this article will be a very long one, as it only really dwells on one concept I had to reason through before I decided I wasn't talking horseshit.

I'm helping some community members with improving their understanding of testing, and I was reviewing some code that was in need of some test coverage. The code below is not the actual code - for obvious "plausible deniability" reasons - but it's pretty much the same situation and logic flow we were looking at, just with different business entities, and the basis for the rule has been "uncomplicated" for the sake of this article:

function validateEmail() {
    if ((isInstanceOf(this.orderItems, "ServicesCollection") || isInstanceOf(this.orderItems, "ProductsCollection")) && !isValid("email", this?.email) ) {
        var hasItemWithCost = this.orderItems.some((item) => item.cost > 0)
        if (hasItemWithCost) { // (*)
            this.addError("Valid email is required")
        }
    }
}

// (*) see AdamT's comment below. I was erroneously calling .len() on hasItemWithCost due to me not having an operational brain. Fixed.

This function is called as part of an object's validation before it's processed. It's also slightly contrived, and the only real bit is that it's a validation function. For this example the business rule is that if any items in the order have a cost associated with them; then the order also needs an email address. Say the receipt needs to be emailed or something. The real world code was different, but this is a fairly common-knowledge sort of parallel to the situation.

All the rest of the code contributing to the data being validated already existed; we've just got this new validation rule around the email not always being optional now.

We're not doing TDD here (yet!), so it's a matter of backfilling tests. The exercise was to identify what tests need to be written.

We picked two obvious ones:

  • make an orderItems collection with no costs in it, and the email shouldn't be checked;
  • make an orderItems collection with at least one item having a cost in it, and the email should be present and valid;

But then I figured that we've skipped a big chunk of the conditional logic: we're ignoring the check of whether the collection is one of a ServicesCollection or a ProductsCollection. NB: unfortunately the order can comprise collections of two disparate types, and there's no common interface we can check against instead. There's possibly some tech debt there.

I'm thinking we have three more cases here:

  • the collection is a ServicesCollection;
  • the collection is a ProductsCollection;
  • the collection is neither of those, somehow. This seems unlikely, but it seems it could be true.

Then I stopped and thought about whether by dissecting that if statement so closely, I am just chasing 100% line-of-code coverage. And worse: aren't the specifics of that part of the statement just implementation detail, rather than part of the actual requirement? The feature was "update the order system so that orders requiring a financial transaction require a valid email address". It doesn't say anything about what sort of collections we use to store the order.

Initially I concluded it was just implementation detail and there was no (test) case to answer to here. But it stuck in my craw that we were not testing that part of the conditional logic. I chalked that up to me being pedantic about making sure every line of code has coverage. Remember I generally use TDD when writing my code, so it's just weird to me to have conditional code with no specific tests, because the condition would only ever be written because a feature called for it.

But it kept gnawing at me.

Then the penny dropped (this was a few hours later, I have to admit).

It's not just testing implementation detail. The issue / confusion has arisen because we're not doing TDD.

If I rewind to when we were writing the function, and take a TDD approach, I'd be going "right, what's the first test case here?". The first test case - the reason why I want to type in isInstanceOf(this.orderItems, "ServicesCollection") - is because part of the stated requirement is implicit. When the client said "if any items in the order have a cost associated with them…" they were not explicit about "this applies to both orders for services as well as orders for products", because in the business's vernacular the distinction between types of orders isn't one that's generally considered. For a lot of requirements there's just the concept of "orders" (this is why there should be an interface for orders!). However the fully-stated requirement could more accurately be "if any items in an order for services have a cost associated with them…", followed by "likewise, if any items in an order for products have a cost associated with them…". It's this business information that informs our test cases, which would be better enumerated as:

  • it requires an email address if any items in a services order have an associated cost;
  • it requires an email address if any items in a products order have an associated cost;
  • it does not require an email address if the order is empty;

I had to chase up what the story was if the collection was neither of those types, and it turns out there wasn't a third type that was immune to the email address requirement; it's just - as per the nice clear test case! - that the order might be empty, but the object as a whole still needs validation.
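Sketched as TestBox cases, those might look something like this. NB: the Order constructor, the collection constructors' signatures and the getErrors method here are all hypothetical stand-ins for the real object's API (and the products variant would be the same with a ProductsCollection):

```cfml
it("requires an email address if any items in a services order have an associated cost", () => {
    order = new Order(orderItems = new ServicesCollection([{cost = 100}]), email = "")

    order.validateEmail()

    expect(order.getErrors()).toInclude("Valid email is required")
})

it("does not require an email address if the order is empty", () => {
    order = new Order(orderItems = new ServicesCollection([]), email = "")

    order.validateEmail()

    expect(order.getErrors()).toBeEmpty()
})
```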

There is still "implementation detail" here - the code checking the collection type is obviously the implementation of that part of the business requirement: all code is implementation detail after all. The muddiness here arose because we were backfilling tests, rather than using TDD. Had we used TDD I'd not have been questioning myself when I was deciding on my test cases. The test cases are driven by business requirements; the implementation is driven by the test cases. And, indeed, the test implementation has to busy itself with implementation detail too, because to implement those tests we need to pass in collections of the type the function is checking for.


As an aside, I was not happy with how busy that if condition was. It seemed to be doing too much to me, and mixing a coupla different things: type checks and a validity check. I re-reasoned the situation and concluded that the validation function has absolutely nothing to do if the email is there and is valid: this passes all cases right from the outset. So I will be recommending we separate that out into its own initial guard clause:

function validateEmail() {
    if (isValid("email", this?.email)) {
        return
    }
    if ((isInstanceOf(this.orderItems, "ServicesCollection") || isInstanceOf(this.orderItems, "ProductsCollection"))) {
        var hasItemsWithCost = this.orderItems.some((item) => item.cost > 0)
        if (hasItemsWithCost) {
            this.addError("Valid email is required")
        }
    }
}

I think this refactor makes things a bit easier to reason through what's going on. And note that this is a refactor, so we'll do it after we've got the tests in place and green.

I now wonder if this is something I should be looking out for more often? Is a compound if condition that is evaluating "asymmetrical" sub-conditions indicative of a code smell? Hrm… will need to keep an eye on this notion.

Now if only I could get that OrderCollection interface created, the code would be tidier-still, and closer to the business requirements.

Righto.

--
Adam

 

PS: I actually doubted myself again whilst writing this, but by the time I had written it all out, I'm happy that the line of thinking and the test cases are legit.

Tuesday, 24 August 2021

Test coverage: it's not about lines of code

G'day

A coupla days ago someone shared this Twitter status with me:

I'll just repeat the text here for google & copy-n-paste-ability:

My personal opinion: Having 100% test coverage doesn't make your code more effective. Tests should add value but at some point being overly redundant and testing absolutely every line of code is ineffective. Ex) Testing that a component renders doesn't IN MY OPINION add value.

Emma's position here is spot on, IMO. There were also a lot of "interesting" replies to it. Quelle surprise. Go read some of them, but then give up when they get too repetitive / unhelpful.

In a similar - but slightly contrary - context, I was reminded of my erstwhile article on this topic: "Yeah, you do want 100% test coverage".

But hang on: on one hand I say Emma is spot on. On the other hand I cross-post an old article that seems to contradict this. What gives?

The point of differentiation is "testing absolutely every line of code" vs "100% test coverage". Emma is talking about lines of code. I'm talking about test cases.

Don't worry about testing lines of code. That's intrinsically testing "implementation". However do care about your test cases: the variations that define the feature you are working on. You've been asked to develop a feature and its variations, and you don't know you've delivered the feature (or, in the case of automated testing: that it continues to be delivered in a working state) unless you test it.

Now, yeah, sure: features are made of lines of code, but don't worry about those. A trite example I posited to a mate yesterday is this: consider this to be the implementation of your feature:

doMyFeature() {
    // line 1
    // line 2
    // line 3
    // line 4
    // line 5
    // line 6
    // line 7
    // line 8
    // line 9
    // line 10
}

I challenge you to write a test for "My Feature", and not by implication happen to test all those lines of code. But it's the feature we care about, not the lines of code.

On the other hand, let's consider a variant:

doMyFeature() {
    // line 1
    // line 2
    // line 3
    // line 4
    // line 5
    if (someVariantOfMyFeature) {
        // line 6
    }
    // line 7
    // line 8
    // line 9
    // line 10
}

If you run your test coverage analysis and see that line 6 ain't tested, this is not a case of wringing one's hands about line 6 of the code not being tested; it's that yer not testing the variation of the feature. It's the feature coverage that's missing. Not the lines-of-code coverage.

Intrinsically your code coverage analysis tooling probably marks individual lines of code as green or red or whatever, but only if you look that closely. If you zoom out a bit, you'll see that the method the lines of code are in is either green or orange or red; and out further the class is likewise green / orange / red, and probably says something like "76% coverage". The tooling necessarily needs to work in lines-of-code because it's a dumb machine, and those are the units it can work on. You are the programmer: you don't need to focus on the lines-of-code. You have a brain, and what the report is saying is "you've not tested part of your feature", and it's saying "the bit of the feature that is represented by these lines of code". That bit. You're not testing it. Go test yer feature properly.

There are parts of the implementation of a feature I maybe won't test. Your DAOs and your other adapters for external services? Maybe not something I'd test during the TDD cycle of things, but it might still be prudent to chuck in an integration test for those. I mean they do need to work and continue to work. But I see this as possibly outside of feature testing. Maybe? Is it?

Also - now I won't repeat too much of that other article - there's a difference between "coverage" and "actually tested". Responsible devs can mark some code as "won't test" (eg in PHPUnit with a @codeCoverageIgnore), and if the reasons are legit then they're "covered" there.

Why I'm writing this fairly short article when it's kinda treading on ground I've already covered is because of some of the comments I saw in reply to Emma. Some devs are a bit shit, and a bit lazy, and they will see what Emma has written, and their take away will be "basically I don't need to test x because x is some lines of code, and we should not be focusing on lines of code for testing". That sounds daft, but I know people that have rationalised their way out of testing perfectly testable (and test-necessary) code due to "oh we should not test implementation details", or "you're just focusing on lines of code, that's wrong", and that sort of shit. Sorry pal, but a feature necessarily has an implementation, and that's made out of lines of code, so you will need to address that implementation in yer testing, which will innately report that those lines of code are "covered".

Also this notion of "100% coverage is too high, so maybe 80% is fine" (possibly using the Pareto Principle, possibly just pulling a number out of their arses). Serious question: which 80%? If you set that goal, then a lot of devs will go "aah, it's in that 20% we don't test". Or they'll test the easy 80%, and leave the hard 20% - the bit that needs testing - to be the 20% they don't test (as I said in the other article: cos they basically can't be arsed).

Sorry to be a bit of a downer when it comes to devs' level of responsibility / professionalism / what-have-you. There are stacks of devs out there who rock and just "get" what Emma (et al) is saying, and crack on with writing excellently-designed and well-tested features. But they probably didn't need Emma's messaging in the first place. I will just always think some more about advice like this, and wonder how it can be interpreted, and then be a contrarian and say "well actually…" (cringe, OK I hope I'm not actually doing that here?), and offer a different informed observation.

So my position remains: set your sights on 100% feature coverage. Use TDD, and your code will be better and leaner and you'll also likely end up with 100% LOC coverage too (but that doesn't matter; it's just a test-implementation detail ;-).

Righto.

--
Adam

Tuesday, 3 August 2021

CFML: static methods and properties

G'day:

Context

In Lucee 5.0 (so: ages ago) and ColdFusion 2021 (so: you know… just now), support for static properties and methods was added to CFML. This isn't a feature that you'd use very often, but it's essential to at least know about it. And it's bloody handy, and makes your code clearer sometimes. I had reason to be googling for docs / examples today, and what's out there ain't great so I thought I'd write my own "probably still not great" effort.

OK so that's the bar set suitably low. Let's get on with it.

What?

What are static properties and methods? I'm going to start from the other end of things. What are general properties and methods? They're properties and methods of an object; you know, one of those things one gets as the result of calling new MyComponent() (or from a createObject call if yer old skool). An object is a set of properties and methods that maintain a current state: values specific to that object.

son = new Person("Zachary")
dad = new Person("Adam")

We have two instances of the Person CFC: one for Zachary and one for me. The behaviour - ie: methods - of these Person objects is defined in Person.cfc, but when one calls an object method on a given object, it acts on the state of that specific object. So if one was to call son.getName(), one would get "Zachary"; if one was to call dad.getName(), one would get "Adam". The implementation of getName is the same for both son and dad - as defined by Person - but their implementation acts on the values associated with the object ("Zachary" and "Adam" respectively). You know all this stuff already, but it's just to contextualise things.

So: what these object-based properties and methods are to an object, static properties and methods are to the class (the CFC itself). To get that, one has to have clear in one's mind that objects - instances of a class - are not the same thing as the class. Also it's important to understand that a CFC is not simply a file of source code: it still gets loaded into memory before any objects are made, it is still a data structure in its own right, and it too can have its own properties, and methods to act on them.

We saw above examples of calling methods on an object. We can also call methods on the class itself, without needing an object. EG:

dog = Animal::createFromSpecies("canis familiaris")

Here we have an Animal class - and that is a reference to Animal.cfc, not an object created from Animal.cfc - and it has a factory method which we use to return a Dog object. Internally to createFromSpecies there might be some sort of look-up table that says "if they ask for a 'canis familiaris', return a new Dog object"; the implementation detail doesn't matter (and I'll get to that), the important bit is that we are calling the method directly on the Animal class, not on an object. We can also reference static properties - properties that relate to the class - via the same :: syntax. I'll show a decent example of that shortly.
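By way of illustration, a hypothetical sketch of that factory (the species look-up table and the Dog / Cat components are invented for the example):

```cfml
// Animal.cfc
component {

    static {
        // maps a species name to the component that implements it
        static.speciesMap = {
            "canis familiaris" = "Dog",
            "felis catus" = "Cat"
        }
    }

    public static function createFromSpecies(required string species) {
        return createObject("component", static.speciesMap[species]).init()
    }
}
```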

How?

Here's a completely contrived class that shows static syntax:

// Behaviour.cfc
component {

    static {
        static.defaultMyVarValue = "set in static constructor"
        static.myVar = static.defaultMyVarValue
    }

    static.myVar = "set in pseudo-constructor"

    public static function resetMyVar() {
        static.myVar = static.defaultMyVarValue
    }
}

There's a coupla relevant bits here.

The static constructor. This is to the class what the init method is to an object. When the class is first loaded, that static constructor is executed. It can only reference other static elements of the class. Obviously it can not access object properties or object methods, because when the static constructor is executed, it's executed on the class itself: there are no objects in play. The syntax of this looks a bit weird, and I don't know why this was picked instead of having:

public static function staticInit() {
    static.defaultMyVarValue = "set in static constructor"
    static.myVar = static.defaultMyVarValue
}

That's more clear I think, and doesn't require any new syntax. Oh well.

Note that when the class is first initialised the pseudo-constructor part of the CFC is not executed. That is only executed when a new object is created from the class.

A static method can only act on other static elements of the class, same as the static constructor, and for the same reason.

I demonstrate the behaviour of this code in its tests:

import testbox.system.BaseSpec
import cfmlInDocker.miscellaneous.static.Behaviour

component extends=BaseSpec {

    function run() {

        describe("Tests for Behaviour class", () => {
            beforeEach(() => {
                Behaviour::resetMyVar()
            })

            it("resets the test variable when resetMyVar is called", () => {
                behaviour = new Behaviour()
                expect(behaviour.myVar).toBe("set in pseudo-constructor")
                expect(Behaviour::myVar).toBe("set in pseudo-constructor")
                Behaviour::resetMyVar()
                expect(behaviour.myVar).toBe("set in static constructor")
                expect(Behaviour::myVar).toBe("set in static constructor")
            })

            it("doesn't use the pseudo-constructor for static values using class reference", () => {
                expect(Behaviour::myVar).toBe("set in static constructor")
            })

            it("does use the pseudo-constructor for static values using object reference", () => {
                behaviour = new Behaviour()
                expect(behaviour.myVar).toBe("set in pseudo-constructor")
            })
        })
    }
}

I need that resetMyVar method there just to make the testing easier. One thing to consider with static properties is that they belong to the class, and that class persists for the life of the JVM, so I want to make the initial state of the class for my tests the same each time. It's important to fully understand that when one sets a static property on a class, that property value will be there for all usages of that class property for the life of the JVM. So it persists across requests, sessions and even the lifetime of the application itself.

Why?

Properties

That Behaviour.cfc example was rubbish. It gives one no sense of why one might want to use static properties or methods. Here's a real world usage of static properties. I have a very cut down implementation of a Response class; like one might have in an MVC framework to return the value from a controller that represents what needs to be sent back to the client agent.

// Response.cfc
component {

    static {
        static.HTTP_OK = 200
        static.HTTP_NOT_FOUND = 404
    }

    public function init(required string content, numeric status=Response::HTTP_OK) {
        this.content = arguments.content
        this.status = arguments.status
    }

    public static function createFromStruct(required struct values) {
        return new Response(values.content, values.status)
    }
}

Here we are using static properties to expose some labelled values that calling code can use to create their response objects. One of its tests demonstrates this:

it("creates a 404 response", () => {
    testContent = "/bogus/path was not found"
    response = new Response(testContent, Response::HTTP_NOT_FOUND)

    expect(response.status).toBe(Response::HTTP_NOT_FOUND)
    expect(response.content).toBe(testContent)
})

Why not just use the literal 404 here? Well for common HTTP status codes like 200 and 404, I think they have a ubiquity such that we could probably get away with it. But what about a FORBIDDEN response. What status code is that? 403? Or is it 401? I can never remember, can you? So wouldn't this be more clear in the code:

return new Response("nuh-uh, sunshine", Response::HTTP_FORBIDDEN)

I think that's fair enough. But OK, why are they static? Why not just use this.HTTP_FORBIDDEN? Simply to show intended usage. Do HTTP status codes vary from object to object? Would one Response object have a HTTP_BAD_GATEWAY of 502, but another one have a HTTP_BAD_GATEWAY value of 702? No. These properties are not part of an object's state, which is what's implied by using the this scope (or variables scope). They are specific to the Response class; to the concept of what it is to be a Response.
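So in the Response.cfc above, one would simply add the constant to the static block (403 is indeed Forbidden; 401 is Unauthorized):

```cfml
static {
    static.HTTP_OK = 200
    static.HTTP_FORBIDDEN = 403
    static.HTTP_NOT_FOUND = 404
}
```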

Methods

That Response.cfc has a static method createFromStruct which I was going to use as an example here:

it("returns a Response object with expected values", () => {
    testValues = {
        content = "couldn't find that",
        status = Response::HTTP_NOT_FOUND
    }
    response = Response::createFromStruct(testValues)

    expect(response.status).toBe(testValues.status)
    expect(response.content).toBe(testValues.content)
})

But it occurred to me after I wrote it that in CFML one would not use this strategy: one would simply use response = new Response(argumentCollection=testValues). So I have a request class instead:

// Request.cfc
component {

    public function init(url, form) {
        this.url = arguments.url
        this.form = arguments.form
    }

    public static Request function createFromScopes() {
        return new Request(url, form)
    }
}

This is the other end of the analogy I started with an MVC controller. This is a trimmed down implementation of a Request object that one would pass to one's controller method; encapsulating all the elements of the request that was made. Here I'm only using URL and form values, but a real implementation would also include cookies, headers, CGI values etc too. All controller methods deal in the currency of "requests", so it makes sense to pass a Request object into them.

Here we have a basic constructor that takes each property individually. As a rule of thumb, my default constructor always just takes individual values for each property an object might have. And the constructor implementation just assigns those values to the properties, and that's it. Any "special" way of constructing an object (like the previous example where there aren't discrete values, but one struct containing everything), I use a separate method. In this case we have a separate "factory" method that instead of taking values, just "knows" that most of the time our request will comprise the actual URL scope, and the actual form scope. So it takes those accordingly.

That's all fairly obvious, but why is it static? Well, if it wasn't static, we'd need to start with already having an object. And to create the object, we need to give the constructor some values, and that… would defeat the purpose of having the factory method, plus would also push implementation detail of the Request back into the calling code, which is not where that code belongs. We could adjust the constructor to have all the parameters optional, and then chain a call to an object method, eg:

request = new Request().populateFromScopes() 

But I don't think that's semantically as good as having the factory method. But it's still fine, that said.
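For completeness, the calling code using the factory method would be something along these lines (the front-controller context and the handleGet method are made up for the example):

```cfml
// eg: in the front controller, before dispatching to the controller method
request = Request::createFromScopes()
response = controller.handleGet(request)
```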

My rationale for using static methods that way is that sometimes creating the object is harder than just calling its constructor, so wrap it all up in a factory method. Now static methods aren't only for factories like that. Sometimes one might need to implement some behaviour on static properties, and to do that, it generally makes sense to use a static method. One could use an object method, but then one needs to ask "is it the job of this object to be acting on properties belonging to its class?" The answer could very well be "no". So implement the code in the most appropriate way: a static method. But, again, there could be legit situations where it is the job of an object method to act upon static properties. Code in object methods can refer to static properties as much as they need to. Just consider who should be doing the work, and design one's implementation accordingly.

A note on syntax

I could not think of a better place to put this, further up.

I've shown how to call static methods and access static properties via the class: it's TheClassName::theStaticMethodName() (or TheClassName::theStaticPropertyName). However sometimes you have an object, and legit need to access static elements of it. In this situation use the dot operator like you usually would when accessing an object's behaviour: someObject.theStaticMethodName() (or someObject.theStaticPropertyName). As per above though, always give thought to whether you ought to be calling the method/property via the object or via the class. It doesn't matter functionally, but it'll make your code and your intent clearer if you use the most appropriate syntax in the given situation.

Outro

Bottom line it's pretty simple. Just as an object can have state and behaviour; so can the class the object came from. Simple as that. That's all static properties and methods are.

Righto.

--
Adam

 

Oh. There are official docs of sorts:

Wednesday, 14 July 2021

Switch refactoring on the road to polymorphism

G'day:

A while back I wrote an article "Code smells: a look at a switch statement". That deals with a specific case which was pretty tidy to start with, and wasn't completely clear as to whether the original or my refactoring was "better" (I think the initial switch was OK in this case).

Exceptions will always prove the rule as "they" say, and I'm gonna chalk that one up to a case of that. Also a good example of why sometimes it's not good to be dogmatic about rules.

However there are a coupla general forms of switch statements that one can codify general refactoring advice for. Well: the main general rule is "switches are seldom necessary in your business logic code". If you find yourself writing one: stop to think if yer letting yerself down from an OOP perspective.

The other two situations I'm thinking of boil down to sticking to the Single Responsibility Principle: basically, not munging more than one thing into a function. In the case of a switch, the two things are "what to do" and "how to do it". And when you fire a switch right into the middle of some other code, it's obviously breaking the SRP.

I'll start with that latter one.

Instead of this:

function f() {

    // code statements
    
    switch
    
    // code statements
}

Do this:

function f() {
    // code statements
    
    switchHandler()
    
    // code statements
}

function switchHandler() {
    switch
}

Note that switchHandler quite possibly doesn't even belong in the same class as the code using it.

In this case the first function is already probably doing more than one thing… it's got three sections, so that says "three things" to me. The whole thing should be looked at, but take the switch out completely. It is definitely and clearly doing at least one additional thing just by itself.

The second one is the other way around. Don't do this:

function f() {
    switch (expression) {

        case "A":
            // implementation of A
            break;

        case "B":
            // implementation of B
            break;

        case "C":
            // implementation of C
            break;

        default:
            // implementation of default
    }
}

Do this:

function f() {
    switch (expression) {

        case "A":
            caseAHandler()
            break;

        case "B":
            caseBHandler()
            break;

        case "C":
            caseCHandler()
            break;

        default:
            defaultHandler()
    }
}

function caseAHandler() {
    // implementation of A
}

// etc

Keep the "what to do" and the "how to do it" separate from each other: they are two different things (and the "how to do it" cases are each different things too, so you have a 1+n degree of code smell going on there).

If you do these two things, a proper polymorphic handling of the situation - wherein you just use a type to deal with the logic - might start becoming more clear. Even if it doesn't straight away, this will make the refactoring easier when it does. You could use a factory to get the correct case handler, or if you know ahead of time what expression is, you can just pass in the correct handler in the first place. Or some other way of doing things. But even without taking it further, your code is less smelly, so is better.
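To sketch where that polymorphic refactor might end up (all the names here are hypothetical, and each component would be its own file):

```cfml
// CaseAHandler.cfc: each case becomes its own class implementing a common interface
component implements="CaseHandler" {
    public void function handle() {
        // implementation of A
    }
}

// CaseHandlerFactory.cfc: maps the switch expression to the appropriate handler
component {
    public any function getHandler(required string expression) {
        var handlers = {A = "CaseAHandler", B = "CaseBHandler", C = "CaseCHandler"}
        var handlerName = handlers.keyExists(expression) ? handlers[expression] : "DefaultHandler"
        return createObject("component", handlerName)
    }
}

// and f() reduces to a simple dispatch:
function f() {
    caseHandlerFactory.getHandler(expression).handle()
}
```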

Also note that if you have multiple switch statements with the same expression, then it's a clue that the whole lot should come out into its own class, and that a more OOP handling of the situation would simplify your code.

I only had 15min to write this this morning, so apologies it's a bit clipped-sounding. Must dash.

Righto.

--
Adam