Showing posts with label TDD. Show all posts
Showing posts with label TDD. Show all posts

Saturday, 30 October 2021

TDD: writing a micro testing framework, using the framework to test itself as I build it

G'day:

Being back in CFML land, I spend a lot of time using trycf.com to write sample code. Sometimes I just want to see how ColdFusion and Lucee behave differently in a given situation. Sometimes I'm "helping" someone on the CFML Slack channel, and want to give them a runnable example of what I think they might want to be doing. trycf.com is cool, but I usually want to write my example code as a series of tests demonstrating variants of its behaviour. I also generally want to TDD even sample code that I write, so I know I'm staying on-point and the code works without any surprises. trycf.com is a bit basic, one can only really have one CFML script and run that. One cannot import other libraries like TestBox so I can write my code the way I want.

Quite often I write a little stub tester to help me, and I find myself doing this over and over again, and I never think to save these stubs across examples. Plus having to include the stubs along with the sample code I'm writing clutters things a bit, and also means my example code doesn't really stick to the Single Responsibility Principle.

I found out from Abram - owner of trycf.com - that one can specify a setupCodeGistId in the trycf.com URL, and it will load a CFML script from that and pre-include that before the scratch-pad code. This is bloody handy, and armed with this knowledge I decided I was gonna knock together a minimal single-file testing framework which I could include transparently in with a scratch-pad file, so that scratch-pad file could focus on the sample code I'm writing, and the testing of the same.

A slightly hare-brained idea I have had when doing this is to TDD the whole exercise… and using the test framework to test itself as it evolves. Obviously to start with the test will need to be separate pieces of code before the framework is usable, but beyond a point it should work enough for me to be able to use it to test the latter stages of its development. Well: we'll see, anyhow.

Another challenge I am setting for myself is that I'm gonna mimic the "syntax" of TestBox, so that the tests I write in a scratch-pad should be able to be lifted-out as-is and dumped into a TestBox test spec CFC. Obviously I'm not going to reimplement all of TestBox, just a subset of stuff to be able to do simple tests. I think I will need to implement the following functions:

  • void run() - well, like in TestBox all my tests will be implemented in a function called run, and then one runs that to execute the tests. This is just convention rather than needing any development.
  • void describe(required string label, required function testGroup) - this will output its label and call its testGroup.
  • void it(required string label, required function implementation) - this will output its label and call its implementation. I'm pretty sure the implementation of describe and it will be identical. I mean like I will use two references to the same function to implement this.
  • struct expect(required any actual) - this will take the actual value being tested, and return a struct containing a matcher to that actual value.
  • boolean toBe(required any expected) - this will take the value that the actual value passed to expect should be. This will be as a key in the struct returned by expect (this will emulate the function-chaining TestBox uses with expect(x).toBe(y).

If I create that minimalist implementation, then I will be able to write this much of a test suite in a trycf.com scratch pad:

function myFunctionToTest(x) {
    // etc
}

function run() {
    describe("Tests for myFunctionToTest" ,() => {
        it("tests some variant", () => {
            expect(myFunctionToTest("variant a")).toBe("something")
        })

        it("tests some other variant", () => {
            expect(myFunctionToTest("variant b")).toBe("something else")
        })
    })
}

run()

And everything in that run function will be compatible with TestBox.

I am going to show every single iteration of the process here, to demonstrate TDD in action. This means this article will be long, and it will be code-heavy. And it will have a lot of repetition. I'll try to keep my verbiage minimal, so if I think the iteration of the code speaks for itself, I possibly won't have anything to add.

Let's get on with it.


It runs the tests via a function "run"

<cfscript>
// tests    
try {
    run()
    writeOutput("OK")
} catch (any e) {
    writeOutput("run function not found")
}
</cfscript>

<cfscript>
// implementation    
    
</cfscript>

I am going to do this entire implementation in a trycf.com scratch-pad file. As per above, I have two code blocks: tests and implementation. For each step I will show you the test (or updates to existing tests), and then I will show you the implementation. We can take it as a given that the test will fail in the way I expect (which will be obvious from the code), unless I state otherwise. As a convention a test will output "OK" if it passed, or "Failure: " and some error explanation if not. Later these will be baked into the framework code, but for now, it's hand-cranked. This shows that you don't need any framework to design your code via TDD: it's a practice, it's not a piece of software.

When I run the code above, I get "Failure: run function not found". This is obviously because the run function doesn't even exist yet. Let's address that.

// implementation    
void function run(){
}

Results: OK

We have completed one iteration of red/green. There is nothing to refactor yet. Onto the next iteration.


It has a function describe

We already have some of our framework operational. We can put tests into that run function, and they'll be run: that's about all that function needs to do. Hey I didn't say this framework was gonna be complicated. In fact the aim is for it to be the exact opposite of complicated.

// tests
// ...

void function run(){
    try {
        describe()
        writeOutput("OK")
    } catch (any e) {
        writeOutput("Failure: describe function not found")
    }
}
// implementation
void function describe(){
}

The tests already helped me here actually. In my initial effort at implementing describe, I misspelt it as "desribe". Took me a few seconds to spot why the test failed. I presume, like me, you have found a function with a spelling mistake in it that has escaped into production. I was saved that here, by having to manually type the name of the function into the test before I did the first implementation.


describe takes a string parameter label which is displayed on a line by itself as a label for the test grouping

// tests
// ...
savecontent variable="testOutput" {
    describe("TEST_DESCRIPTION")
};
if (testOutput == "TEST_DESCRIPTION<br>") {
    writeOutput("OK<br>")
    return
}
writeOutput("Failure: label not output<br>")
// implementation
void function describe(required string label){
    writeOutput("#label#<br>")
}

This implementation passes its own test (good), but it makes the previous test break:

try {
    describe()
    writeOutput("OK")
} catch (any e) {
    writeOutput("Failure: describe function not found")
}

We require describe to take an argument now, so we need to update that test slightly:

try {
    describe("NOT_TESTED")
    writeOutput("OK")
} catch (any e) {
    writeOutput("Failure: describe function not found")
}

I make a point of being very clear when arguments etc I need to use are not part of the test.

It's also worth noting that the test output is a bit of a mess at the moment:

OKNOT_TESTED
OKOK

For the sake of cosmetics, I've gone through and put <br> tags on all the test messages I'm outputting, and I've also slapped a cfsilent around that first describe test, as its output is just clutter. The full implementation is currently:

// tests    
try {
    cfsilent() {
        run()
    }
    writeOutput("OK<br>")
} catch (any e) {
    writeOutput("Failure: run function not found<br>")
}

void function run(){
    try {
        cfsilent(){describe("NOT_TESTED")}
        writeOutput("OK<br>")
    } catch (any e) {
        writeOutput("Failure: describe function not found<br>")
    }

    savecontent variable="testOutput" {
        describe("TEST_DESCRIPTION")
    };
    if (testOutput == "TEST_DESCRIPTION<br>") {
        writeOutput("OK<br>")
    }else{
    	writeOutput("Failure: label not output<br>")
    }
}

run()
</cfscript>

<cfscript>
// implementation
void function describe(required string label){
    writeOutput("#label#<br>")
}

And now the output is tidier:

OK
OK
OK

I actually did a double-take here, wondering why the TEST_DESCRIPTION message was not displaying for that second test: the one where I'm actually testing that that message displays. Of course it's because I've got the savecontent around it, so I'm capturing the output, not letting it output. Duh.


describe takes a callback parameter testGroup which is is executed after the description is displayed

savecontent variable="testOutput" {
    describe("TEST_DESCRIPTION", () => {
        writeOutput("DESCRIBE_GROUP_OUTPUT")
    })
};
if (testOutput == "TEST_DESCRIPTION<br>DESCRIBE_GROUP_OUTPUT") {
    writeOutput("OK<br>")
}else{
    writeOutput("Failure: testGroup not executed<br>")
}
void function describe(required string label, required function testGroup){
    writeOutput("#label#<br>")
    testGroup()
}

This implementation change broke earlier tests that called describe, because they were not passing a testGroup argument. I've updated those to just slung an empty callback into those calls, eg:

describe("TEST_DESCRIPTION", () => {})

That's all I really need from describe, really. But I'll just do one last test to confirm that it can be nested OK (there's no reason why it couldn't be, but I'm gonna check anyways.


describe calls can be nested

savecontent variable="testOutput" {
    describe("OUTER_TEST_DESCRIPTION", () => {
        describe("INNER_TEST_DESCRIPTION", () => {
            writeOutput("DESCRIBE_GROUP_OUTPUT")
        })
    })
};
if (testOutput == "OUTER_TEST_DESCRIPTION<br>INNER_TEST_DESCRIPTION<br>DESCRIBE_GROUP_OUTPUT") {
    writeOutput("OK<br>")
}else{
    writeOutput("Failure: describe nested did not work<br>")
}

And this passes without any adjustment to the implementation, as I expected. Note that it's OK to write a test that just passes, if there's not then a need to make an implementation change to make it pass. One only needs a failing test before one makes an implementation change. Remember the tests are to test that the implementation change does what it's supposed to. This test here is just a belt and braces thing, and to be declarative about some functionality the system has.


The it function behaves the same way as the describe function

it will end up having more required functionality than describe needs, but there's no reason for them to not just be aliases of each other for the purposes of this micro-test-framework. As long as the describe alias still passes all its tests, it doesn't matter what extra functionality I put into it to accommodate the requirements of it.

savecontent variable="testOutput" {
    describe("TEST_DESCRIPTION", () => {
        it("TEST_CASE_DESCRIPTION", () => {
            writeOutput("TEST_CASE_RESULT")
        })
    })
};
if (testOutput == "TEST_DESCRIPTION<br>TEST_CASE_DESCRIPTION<br>TEST_CASE_RESULT") {
    writeOutput("OK<br>")
}else{
    writeOutput("Failure: the it function did not work<br>")
}
it = describe

it will not error-out if an exception occurs in its callback, instead reporting an error result, with the exception's message

Tests are intrinsically the sort of thing that might break, so we can't have the test run stopping just cos an exception occurs.

savecontent variable="testOutput" {
    describe("NOT_TESTED", () => {
        it("tests an exception", () => {
            throw "EXCEPTION_MESSAGE";
        })
    })
};
if (testOutput CONTAINS "tests an exception<br>Error: EXCEPTION_MESSAGE<br>") {
    writeOutput("OK<br>")
}else{
    writeOutput("Failure: the it function did not correctly report the test error<br>")
}
void function describe(required string label, required function testGroup) {
    try {
        writeOutput("#label#<br>")
        testGroup()
    } catch (any e) {
        writeOutput("Error: #e.message#<br>")
    }
}

At this point I decided I didn't like how I was just using describe to implement it, so I refactored:

void function runLabelledCallback(required string label, required function callback) {
    try {
        writeOutput("#label#<br>")
        callback()
    } catch (any e) {
        writeOutput("Error: #e.message#<br>")
    }
}

void function describe(required string label, required function testGroup) {
    runLabelledCallback(label, testGroup)
}

void function it(required string label, required function implementation) {
    runLabelledCallback(label, implementation)
}

Because everything is test-covered, I am completely safe to just make that change. Now I have the correct function signature on describe and it, and an appropriately "general" function to do the actual execution. And the tests all still pass, so that's cool. Reminder: refactoring doesn't need to start with a failing test. Intrinsically it's an activity that mustn't impact the behaviour of the code being refactored, otherwise it's not a refactor. Also note that one does not ever alter implementation and refactor at the same time. Follow red / green / refactor as separate steps.


it outputs "OK" if the test ran correctly

// <samp>it</samp> outputs OK if the test ran correctly
savecontent variable="testOutput" {
    describe("NOT_TESTED", () => {
        it("outputs OK if the test ran correctly", () => {
            // some test here... this actually starts to demonstrate an issue with the implementation, but we'll get to that
        })
    })
};
if (testOutput CONTAINS "outputs OK if the test ran correctly<br>OK<br>") {
    writeOutput("OK<br>")
}else{
    writeOutput("Failure: the it function did not correctly report the test error<br>")
}

The implementation of this demonstrated that describe and it can't share an implementation. I don't want describe calls outputting "OK" when their callback runs OK, and this was what started happening when I did my first pass of the implementation for this:

void function runLabelledCallback(required string label, required function callback) {
    try {
        writeOutput("#label#<br>")
        callback()
        writeOutput("OK<br>")
    } catch (any e) {
        writeOutput("Error: #e.message#<br>")
    }
}

Only it is a test, and only it is supposed to say "OK". This is an example of premature refactoring, and probably going overboard with DRY. The describe function wasn't complex, so we were gaining nothing by de-duping it from it, especially when it's implementation will have more complexity still to come.

I backed out my implementation and my failing test for a moment, and did another refactor to separate-out the two functions completely. I know I'm good when all my tests still pass.

void function describe(required string label, required function testGroup) {
    writeOutput("#label#<br>")
    testGroup()
}

void function it(required string label, required function implementation) {
    try {
        writeOutput("#label#<br>")
        implementation()
    } catch (any e) {
        writeOutput("Error: #e.message#<br>")
    }
}

I also needed to update the baseline it test to expect the "OK<br>". After that everything is green again, and I can bring my new test back (failure as expected), and implementation (all green again/still):

void function it(required string label, required function implementation) {
    try {
        writeOutput("#label#<br>")
        implementation()
        writeOutput("OK<br>")
    } catch (any e) {
        writeOutput("Error: #e.message#<br>")
    }
}

describe will not error-out if an exception occurs in its callback, instead reporting an error result, with the exception's message

Back to describe again. It too needs to not error-out if there's an issue with its callback, so we're going to put in some handling for that again too. This test is largely copy and paste from the equivalent it test:

savecontent variable="testOutput" {
    describe("TEST_DESCRIPTION", () => {
        throw "EXCEPTION_MESSAGE";
    })
};
if (testOutput CONTAINS "TEST_DESCRIPTION<br>Error: EXCEPTION_MESSAGE<br>") {
    writeOutput("OK<br>")
}else{
    writeOutput("Failure: the it function did not correctly report the test error<br>")
}
void function describe(required string label, required function testGroup) {
    try {
        writeOutput("#label#<br>")
        testGroup()
    } catch (any e) {
        writeOutput("Error: #e.message#<br>")
    }
}

it outputs "Failed" if the test failed

This took some thought. How do I know a test "failed"? In TestBox when something like expect(true).toBeFalse() runs, it exits from the test immediately, and does not run any further expectations the test might have. Clearly it's throwing an exception from toBeFalse if the actual value (the one passed to expect) isn't false. So this is what I need to do. I have not written any assertions or expectations yet, so for now I'll just test with an actual exception. It can't be any old exception, because the test could break for any other reason too, so I need to differentiate between a test fail exception (the test has failed), and any other sort of exception (the test errored). I'll use a TestFailedException!

savecontent variable="testOutput" {
    describe("NOT_TESTED", () => {
        it("outputs failed if the test failed", () => {
            throw(type="TestFailedException");
        })
    })
};
if (testOutput.reFind("Failed<br>$")) {
    writeOutput("OK<br>")
}else{
    writeOutput("Failure: the it function did not correctly report the test failure<br>")
}
void function it(required string label, required function implementation) {
    try {
        writeOutput("#label#<br>")
        implementation()
        writeOutput("OK<br>")
    } catch (TestFailedException e) {
        writeOutput("Failed<br>")
    } catch (any e) {
        writeOutput("Error: #e.message#<br>")
    }
}

This is probably enough now to a) start using the framework for its own tests; b) start working on expect and toBe.


has a function expect

it("has a function expect", () => {
    expect()
})

Check. It. Out. I'm actually able to use the framework now. I run this and I get:

has a function expect
Error: Variable EXPECT is undefined.

And when I do the implementation:

function expect() {
}
has a function expect
OK

I think once I'm tidying up, I might look to move the test result to be on the same line as the label. We'll see. For now though: functionality.


expect returns a struct with a key toBe which is a function

it("expect returns a struct with a key toBe which is a function", () => {
    var result = expect()
    if (isNull(result) || isNull(result.toBe) || !isCustomFunction(local.result.toBe)) {
        throw(type="TestFailedException")
    }
})

I'd like to be able to just use !isCustomFunction(local?.result?.toBe) in that if statement there, but Lucee has a bug that prevents ?. to be used in a function expression (I am not making this up, go look at LDEV-3020). Anyway, the implementation for this for now is easy:

function expect() {
    return {toBe = () => {}}
}

toBe returns true if the actual and expected values are equal

it("toBe returns true if the actual and expected values are equal", () => {
    var actual = "TEST_VALUE"
    var expected = "TEST_VALUE"

    result = expect(actual).toBe(expected)
    if (isNull(result) || !result) {
        throw(type="TestFailedException")
    }
})

The implementation for this bit is deceptively easy:

function expect(required any actual) {
    return {toBe = (expected) => {
        return actual.equals(expected)
    }}
}

toBe throws a TestFailedException if the actual and expected values are not equal

it("toBe throws a TestFailedException if the actual and expected values are not equal", () => {
    var actual = "ACTUAL_VALUE"
    var expected = "EXPECTED_VALUE"

    try {
        expect(actual).toBe(expected)
    } catch (TestFailedException e) {
        return
    }
    throw(type="TestFailedException")
})
function expect(required any actual) {
    return {toBe = (expected) => {
        if (actual.equals(expected)) {
            return true
        }
        throw(type="TestFailedException")
    }}
}

And with that… I have a very basic testing framework. Obviously it could be improved to have more expectations (toBeTrue, toBeFalse, toBe{Type}), and could have some nice messages on the toBe function so a failure is more clear as to what went on, but for a minimum viable project, this is fine. I'm going to do a couple more tests / tweaks though.


toBe works with a variety of data types

var types = ["string", 0, 0.0, true, ["array"], {struct="struct"}, queryNew(""), xmlNew()]
types.each((type) => {
    it("works with #type.getClass().getName()#", (type) => {
        expect(type).toBe(type)
    })
})

The results here differ between Lucee:

works with java.lang.String
OK
works with java.lang.Double
OK
works with java.lang.Double
OK
works with java.lang.Boolean
OK
works with lucee.runtime.type.ArrayImpl
OK
works with lucee.runtime.type.StructImpl
OK
works with lucee.runtime.type.QueryImpl
OK
works with lucee.runtime.text.xml.struct.XMLDocumentStruct
Failed

And ColdFusion:

works with java.lang.String
OK
works with java.lang.Integer
OK
works with coldfusion.runtime.CFDouble
Failed
works with coldfusion.runtime.CFBoolean
OK
works with coldfusion.runtime.Array
OK
works with coldfusion.runtime.Struct
OK
works with coldfusion.sql.QueryTable
OK
works with org.apache.xerces.dom.DocumentImpl
Failed

But the thing is the tests actually work on both platforms. If you compare the objects outside the context of the test framework, the results are the same. Apparently in ColdFusion 0.0 does not equal itself. I will be taking this up with Adobe, I think.

It's good to know that it works OK for structs, arrays and queries though.


The it function puts the test result on the same line as the test label, separated by a colon

As I mentioned above, I'd gonna take the <br> out from between the test message and the result, instead just colon-separating them:

// The <samp>it</samp> function puts the test result on the same line as the test label, separated by a colon
savecontent variable="testOutput" {
    describe("TEST_DESCRIPTION", () => {
        it("TEST_CASE_DESCRIPTION", () => {
            writeOutput("TEST_CASE_RESULT")
        })
    })
};
if (testOutput == "TEST_DESCRIPTION<br>TEST_CASE_DESCRIPTION: TEST_CASE_RESULTOK<br>") {
    writeOutput("OK<br>")
}else{
    writeOutput("Failure: the it function did not work<br>")
}
void function it(required string label, required function implementation) {
    try {
    	writeOutput("#label#<br>")
        writeOutput("#label#: ")
        implementation()
        writeOutput("OK<br>")
    } catch (TestFailedException e) {
        writeOutput("Failed<br>")
    } catch (any e) {
        writeOutput("Error: #e.message#<br>")
    }
}

That's just a change to that line before calling implementation()


Putting it to use

I can now save the implementation part of this to a gist, save another gist with my actual tests in it, and load that into trycf.com with the setupCodeGistId param pointing to my test framework Gist: https://trycf.com/gist/b8e6322c291ba6308bd82c3ee499dd0e?setupCodeGistId=816ce84fd991c2682df612dbaf1cad11. And… (sigh of relief, cos I'd not tried this until now) it all works.


Outro

This was a long one, but a lot of it was really simple code snippets, so hopefully it doesn't overflow the brain to read it. If you'd like me to clarify anything or find any bugs or I've messed something up or whatever, let me know. Also note that whilst it'll take you a while to read, and it took me bloody ages to write, if I was just doing the actual TDD exercise it's only about an hour's effort. The red/green/refactor cycle is very short.

Oh! Speaking of implementations. I never showed the final product. It's just this:

void function describe(required string label, required function testGroup) {
    try {
        writeOutput("#label#<br>")
        testGroup()
    } catch (any e) {
        writeOutput("Error: #e.message#<br>")
    }
}

void function it(required string label, required function implementation) {
    try {
        writeOutput("#label#: ")
        implementation()
        writeOutput("OK<br>")
    } catch (TestFailedException e) {
        writeOutput("Failed<br>")
    } catch (any e) {
        writeOutput("Error: #e.message#<br>")
    }
}

struct function expect(required any actual) {
    return {toBe = (expected) => {
        if (actual.equals(expected)) {
            return true
        }
        throw(type="TestFailedException")
    }}
}

That's a working testing framework. I quite like this :-)

All the code for the entire exercise - including all the test code - can be checked-out on trycf.com.

Righto.

--
Adam

Sunday, 5 September 2021

Testing: reasoning over "testing implementation detail" vs "testing features"

G'day:

I can't see how this article will be a very long one, as it only really dwells on one concept I had to reason through before I decided I wasn't talking horseshit.

I'm helping some community members with improving their undertanding of testing, and I was reviewing some code that was in need of some test coverage. The code below is not the actual code - for obvious "plausible deniability" reasons - but it's pretty much the same situation and logic flow we were looking at, just with different business entities and the basis for the rule has been "uncomplicated" for the sake of this article:

function validateEmail() {
    if ((isInstanceOf(this.orderItems, "ServicesCollection") || isInstanceOf(this.orderItems, "ProductsCollection")) && !isValid("email", this?.email) ) {
        var hasItemWithCost = this.orderItems.some((item) => item.cost > 0)
        if (hasItemWithCost) { // (*)
            this.addError("Valid email is required")
        }
    }
}

// (*) see AdamT's comment below. I was erroneously calling .len() on hasItemWithCost due to me not having an operational brain. Fixed.

This function is called as part of an object's validation before it's processed. It's also slightly contrived, and the only real bit is that it's a validation function. For this example the business rule is that if any items in the order have a cost associated with them; then the order also needs an email address. Say the receipt needs to be emailed or something. The real world code was different, but this is a fairly common-knowledge sort of parallel to the situation.

All the rest of the code contributing to the data being validated already existed, we're just got this new validation rule around the email not always being optional now.

We're not doing TDD here (yet!), so it's a matter of backfilling tests. The exercise was to identify what test need to be written.

We picked two obvious ones:

  • make an orderItems collection with no costs in it, and the email shouldn't be checked;
  • make an orderItems collection with at least one oneitem having a costs in it, and the email should be present and valid;

But then I figured that we've skipped a big chunk of the conditional logic: we're ignoring the check of whether the collection is one of a ServicesCollection or a ProductsCollection. NB: unfortunately the order can comprise collections of two disparate types, and there's no common interface we can check against instead. There's possibly some tech debt there.

I'm thinking we have three more cases here:

  • the collection is a ServicesCollection;
  • the collection is a ProductsCollection;
  • the collection is neither of those, somehow. This seems unlikely, but it seems it could be true.

Then I stopped and thought about whether by dissecting that if statement so closely, I am just chasing 100% line-of-code coverage. And worse: aren't the specifics of that part of the statement just implementation detail, rather than part of the actual requirement? The feature was "update the order system so that orders requiring a financial transaction require a valid email address". It doesn't say anything about what sort of collections we use to store the order.

Initially I concluded it was just implementation detail and there was no (test) case to answer to here. But it stuck in my craw that we were not testing that part of the conditional logic. I chalked that up to me being pedantic about making sure every line of code has coverage. Remember I generally use TDD when writing my code, so it's just weird to me to have conditional code with no specific tests, because the condition would only ever be written because a feature called for it.

But it kept gnawing at me.

Then the penny dropped (this was a few hours later, I have to admit).

It's not just testing implementation detail. The issue / confusion has arisen because we're not doing TDD.

If I rewind to when we were writing the function, and take a TDD approach, I'd be going "right, what's the first test case here?". The first test case - the reason why want to type in isInstanceOf(this.orderItems, "ServicesCollection") - is because part of the stated requirement is implicit. When the client said "if any items in the order have a cost associated with them…" they were not explicit about "this applies to both orders for services as well as orders for products", because in the business's vernacular the distinction between types of orders isn't one that's generally considered. For a lot of requirements there's just the concept of "orders" (this is why there should be an interface for orders!). However the fully-stated requirement could be made more accurately "if any items in an order for services have a cost associated with them…", followed by "likewise, if any items in an order for products have a cost associated with them…". It's this business information that inform our test cases, which would be better enumerated as:

  • it requires an email address if any items in a services order have an associated cost;
  • it requires an email address if any items in a products order have an associated cost;
  • it does not require an email address if the order is empty;

I had to chase up what the story was if the collection was neither of those types, and it turns out there wasn't a third type that was immune to the email address requirement, it's just - as per the nice clear test case! - the order might be empty, but the object as a whole still needs validation.

There is still "implementation detail" here - the code checking the collection type is obviously the implementation of that part of the business requirement: all code is implementation detail after all. The muddiness here arose because we were backfilling tests, rather than using TDD. Had we used TDD I'd not have been questioning myself when I was deciding on my test cases. The test cases are driven by business requirements; they implementation is driven by the test cases. And, indeed, the test implementation has to busy itself with implementation detail too, because to implement those tests we need to pass-in collections of the type the function is checking for.


As an aside, I was not happy with how busy that if condition was. It seemed to be doing too much to me, and mixing a coupla different things: type checks and a validity check. I re-reasoned the situation and concluded that the validation function has absolutely nothing to do if the email is there and is valid: this passes all cases right from the outset. So I will be recommending we separate that out into its own initial guard clause:

function validateEmail() {
    if (isValid("email", this.email)) {
        return
    }
    if ((isInstanceOf(this.orderItems, "ServicesCollection") || isInstanceOf(this.orderItems, "ProductsCollection"))) {
        var hasItemsWithCost = this.orderItems.some((item) => item.cost > 0)
        if (hasItemsWithCost.len()) {
            this.addError("Valid email is required")
        }
    }
}

I think this refactor makes things a bit easier to reason through what's going on. And note that this is a refactor, so we'll do it after we've got the tests in place and green.

I now wonder if this is something I should be looking out for more often? Is a compound if condition that is evaluating "asymmetrical" sub-conditions indicative of a code smell? Hrm… will need to keep an eye on this notion.

Now if only I could get that OrderCollection interface created, the code would be tidier-still, and closer to the business requirements.

Righto.

--
Adam

 

PS: I actually doubted myself again whilst writing this, but by the time I had written it all out, I'm happy that the line of thinking and the test cases are legit.

Tuesday, 24 August 2021

Test coverage: it's not about lines of code

G'day

A coupla days ago someone shared this Twitter status with me:

I'll just repeat the text here for google & copy-n-paste-ability:

My personal opinion: Having 100% test coverage doesn't make your code more effective. Tests should add value but at some point being overly redundant and testing absolutely every line of code is ineffective. Ex) Testing that a component renders doesn't IN MY OPINION add value.

Emma's position here is spot on, IMO. There were also a lot of "interesting" replies to it. Quelle surprise. Go read some of them, but then give up when they get too repetitive / unhelpful.

In a similar - but slightly contrary - context, I was reminded of my erstwhile article on this topic: "Yeah, you do want 100% test coverage".

But hang on: on one hand I say Emma is spot on. On the other hand I cross-post an old article that seems to contradict this. What gives?

The point of differentiation is "testing absolutely every line of code" vs "100% test coverage". Emma is talking about lines of code. I'm talking about test cases.

Don't worry about testing lines of code. That's intrinsically testing "implementation". However do care about your test cases: the variations that define the feature you are working on. you've been asked to develop a feature and its variations, and you don't know you've delivered the feature (or in the case of automated testing: continue to having been delivered in a working state) unless you test it.

Now, yeah, sure: features are made of lines of code, but don't worry about those. A trite example I posited to a mate yesterday is this: consider this to be the implementation of your feature:

doMyFeature() {
    // line 1
    // line 2
    // line 3
    // line 4
    // line 5
    // line 6
    // line 7
    // line 8
    // line 9
    // line 10
}

I challenge you to write a test for "My Feature", and not by implictation happen to test all those lines of code. But it's the feature we care about, not the lines of code.

On the other hand, let's consider a variant:

doMyFeature() {
    // line 1
    // line 2
    // line 3
    // line 4
    // line 5
    if (someVariantOfMyFeature) {
        // line 6
    }
    // line 7
    // line 8
    // line 9
    // line 10
}

If you run your test coverage analysis and see that line 6 ain't tested, this is not a case of wringing one's hands about line 6 of the code not being tested; it's that yer not testing the variation of the feature. It's the feature coverage that's missing. Not the lines-of-code coverage.

Intrinsically your code coverage analysis tooling probably marks individual lines of code as green or red or whatever, but only if you look that closely. If you zoom out a bit, you'll see that the method the lines of code are in is either green or orange or red; and out further the class is likewise green / orange / red, and probably says something like "76% coverage". The tooling necessarily needs to work in lines-of-code because its a dumb machine, and those are the units it can work on. You are the programmer don't need to focus on the lines-of-code. You have a brain, and what the report is saying is "you've not tested part of your feature", and it's saying "the bit of the feature that is represented by these lines of code". That bit. You're not testing it. Go test yer feature properly.

There's parts of the implementation of a feature I won't test maybe. Your DAOs and your other adapters for external services? Maybe not something I'd test during the TDD cycle of things, but it might still be prudent to chuck in an integration test for those. I mean they do need to work and continue to work. But I see this as possibly outside of feature testing. Maybe? Is it?

Also - now I won't repeat too much of that other article - there's a difference between "coverage" and "actually tested". Responsible devs can mark some code as "won't test" (eg in PHPUnit with a @codeCoverageIgnore), and if the reasons are legit then they're "covered" there.

Why I'm writing this fairly short article when it's kinda treading on ground I've already covered is because of some of the comments I saw in reply to Emma. Some devs are a bit shit, and a bit lazy, and they will see what Emma has written, and their take away will be "basically I don't need to test x because x is some lines of code, and we should not be focusing on lines of code for testing". That sounds daft, but I know people that have rationalised their way out of testing perfectly testable (and test-necessary) code due to "oh we should not test implementation details", or "you're just focusing on lines of code, that's wrong", and that sort of shit. Sorry pal, but a feature necessarily has an implementation, and that's made out of lines of code, so you will need to address that implementation in yer testing, which will innately report that those lines of code are "covered".

Also this notion of "100% coverage is too high, so maybe 80% is fine" (possibly using the Pareto Pinciple, poss just pulling a number of out their arses). Serious question: which 80%? If you set that goal, the a lot of devs will go "aah, it's in that 20% we don't test". Or they'll test the easy 80%, and leave the hard 20% - the bit that needs testing - to be the 20% they don't test (as I said in the other article: cos they basically can't be arsed).

Sorry to be a bit of a downer when it comes to devs' level of responsibility / professionalism / what-have-you. There are stack of devs out there who rock and just "get" what Emma (et al) is saying, and crack on with writing excellently-designed and well-tested features. But they probably didn't need Emma's messaging in the first place. I will just always think some more about advice like this, and wonder how it can be interpreted, and then be a contrarian and say "well actually…" (cringe, OK I hope I'm not actually doing that here?), and offer a different informed observation.

So my position remains: set your sights on 100% feature coverage. Use TDD, and your code will be better and leaner and you'll also likely end up with 100% LOC coverage too (but that doesn't matter; it's just a test-implementation detail ;-).

Righto.

--
Adam

Tuesday, 6 July 2021

TDD, code puzzle, recursing reductions

G'day:

I couldn't think of a title for this one, so: that'll do.

A few days ago Tom King posted this on the CFML Slack channel:

I'm sure I've asked this before, but has anyone got a good function to take complex values and parse out to a curl style string? i.e, https://trycf.com/gist/0c2672ac4a02d4c52d1e3ee1c9a408e7/lucee5?theme=monokai
The code in the gist is thus:
original = {
    'payment_method_types' = ['card'],
    'line_items' = [
        {
            "price_data" = {
                "product_data" = {
                    "name" = "Something"
                },
                "currency" = 'gbp',
                "unit_amount" = 123
            },
            "quantity" = 1
        }
    ]
};

writeDump(var=original, label="Original");

wantedResult["payment_method_types[0]"] = 'card';
wantedResult["line_items[0][price_data][product_data][name]"] ='Something';
wantedResult["line_items[0][price_data][currency]"] = 'gbp';
wantedResult["line_items[0][price_data][unit_amount]"] = 123;
wantedResult["line_items[0][quantity]"] = 1;

writeDump(var=wantedResult, label="Want this:");

So he wants to flatten out the key-path to the values. Cool. I'll give that a bash. Note: we've actually already solved this, but I'm now going back over the evolution of my solution, following my TDD steps. This is more a demo of using TDD than it is solving the problem: don't worry about how much code their is in this article, it's mostly incremental repetition, or just largely copy-and-pasted-and-tweaked tests. I will admit that for my first take on this I just wrote the one test that compared my result to his final expectation, and it took me far more time to sort it out than it ought to. I did the whole thing again last night to prep the code for this article, and using more focused TDD I solved it faster. Obviously I was benefiting from already kinda remembering how I did it the first time around, but even the first time around I could see how to do it before I even wrote any code, I just pantsed around making mistakes the first time due to not focusing. TDD on my second attempt helped me focus.

Last things first: the final test

I've got the "client's" final requirement already, so I'll create a test for that now, and watch my code error because it doesn't exist (my code, that is). I'll link the test case description through to the current state of the code in Github. I'll tag each step.

it("fulfils the requirement of the puzzle", () => {
    input = {
        "payment_method_types" = ["card"],
        "line_items" = [
            {
                "price_data" = {
                    "product_data" = {
                        "name" = "Something"
                    },
                    "currency" = "gbp",
                    "unit_amount" = 123
                },
                "quantity" = 1
            }
        ]
    }
    expected = {
        "payment_method_types[0]" = "card",
        "line_items[0][price_data][product_data][name]" = "Something",
        "line_items[0][price_data][currency]" = "gbp",
        "line_items[0][price_data][unit_amount]" = 123,
        "line_items[0][quantity]" = 1
    }

    actual = flattenStruct(input)

    expect(actual).toBe(expected)
})

This errors with No matching function [flattenStruct] found, because I ain't even started yet! Note: for this exercise, I'm just building the function in the test class itself, given it's just one function, and it only needs to exist to fulfil this blogging/TDD exercise.

This test will keep erroring until I've finished. This is fine. I am not going to look at that test code again until I have finished the task though.

It handles simple value key/value pairs

I'm not going to be crazily pendantic with the TDD here. For example in this case I'm rolling in "creating the function and solving the first step of the challenge" into one step. I've already got a test failing due to the lack of the function existing, so I can watch it's progress for some of the micro-iterations I am missing out. So for this step I'm gonna iterate over the struct, and deal with the really easy cases: top level key/value pairs that are simple values and do not require any recursion to work. This is also the "bottom" level of the recursion I'll move on to, so know I already need it anyhow. Maybe making the decision that I need to use recursion to sort this out is pre-emptive, but I'm buggered if I know any other way of doing it. And taking recursion as a given, I always like to start with the lowest level solution: the bit that doesn't actually recurse.

Here's my test for this:

it("handles top-level key/values with simple values", () => {
    input = {
        "one" = "tahi",
        "two" = {"second" = "rua"},
        "three" = {
            "third" = {"thrice" = "toru"}
        }
    }
    expected = {
        "[one]" = "tahi"
    }

    actual = flattenStruct(input)

    expect(actual).toBe(expected)
})

And the implementation to make it pass:

function flattenStruct(required struct struct) {
    return struct.reduce((flattened, key, value) => {
        if (isSimpleValue(value)) {
            return {"[#key#]" = value}
        }
        return flattened
    }, {})
}

That's lovely, and the test passes, but there's a bug in it which I didn't notice the first time around.

It handles multiple top-level key/values with simple values

it("handles multiple top-level key/values with simple values", () => {
    input = {
        "one" = "tahi",
        "first" = "tuatahi",
        "two" = {"second" = "rua"},
        "three" = {
            "third" = {"thrice" = "toru"}
        }
    }
    expected = {
        "[one]" = "tahi",
        "[first]" = "tuatahi"
    }

    actual = flattenStruct(input)

    expect(actual).toBe(expected)
})

That fails with Expected [{[one]={tahi}, [first]={tuatahi}}] but received [{[one]={tahi}}]. It's cos I'm a dropkick and not appending each iteration to the previous:

return {"[#key#]" = value}

// should be

return flattened.append({"[#key#]" = value})

Once I fixed that: the two TDD tests are green. Obviously the initial one is still broken cos we ain't done.

It handles complex values

Now all I needed was to add some structs to the sample data for the test:

it("handles complex values", () => {
    input = {
        "one" = "tahi",
        "first" = "tuatahi",
        "two" = {"second" = "rua"},
        "three" = {
            "third" = {"thrice" = "toru"}
        }
    }
    expected = {
        "[one]" = "tahi",
        "[first]" = "tuatahi",
        "[two][second]" = "rua",
        "[three][third][thrice]" = "toru"
    }

    actual = flattenStruct(input)

    expect(actual).toBe(expected)
})

And write a quick implementation:

function flattenStruct(required struct struct) {
    flatten = (flattened, key, value, _, prefix="") => {
        var prefixedKey = "#prefix#[#key#]"
        if (isSimpleValue(value)) {
            return flattened.append({"#prefixedKey#" = value})
        }

        return flattened.append(value.reduce(
            function (flattened, key, value, _, prefix=prefixedKey) {
                return flattened.append(flatten(flattened, key, value, _, prefix))
            },
            {}
        ))
    }

    return struct.reduce(flatten, {})
}

This warrants some explanation:

  • We need to extract all the logic we currently have within the outer function into its own function. We need to do this because when we recurse, we need it to call itself, so it needs a name.
  • The recursion works by saying to each level of the recursion "the previous level had this key, so append your keys to this", so we implement this, and default the prefix to be an empty string to start with.
  • We're leveraging a nice feature of CFML here in that one can pass additional arguments to a function, and "it'll just work", so we pass that after all the other arguments. We're not using the fourth parameter to the reduce callback which contains the original collection being iterated over, so we reflect that by calling the parameter _ (this is a notional standard for this sort of thing).
  • When we get to the "bottom" of the recursion we will always have a simple value for the value (otherwise we're not done recursing), so we prepend its key with the parent key (which will be all the ancestor keys combined).

I also now needed to update my previous test case expectations to include the structs coming out in the result too. In hindsight, I messed up there: I should only have had test data for the case at hand, rather than having some more data that I knew was specifically going to collapse in a heap.

But it also has a bug. The first test completely broke now: Can't cast Complex Object Type Struct to StringUse Built-In-Function "serialize(Struct):String" to create a String from Struct

I find the situation here a bit frustrating. The signature for the callback function for Struct.reduce is this:

function(previous, key, value, collection)

And that's what I have in the code above.

However. I had overlooked that the signature for the callback function for Array.reduce is this:

function(previous, value, key, collection)

It had never occurred to me that key, value and value, key were reversed between the two. Sigh. So my single handling of complex values there doesn't work because I'm passing Array.reduce the key and the value in the wrong order.

It's reasonably easy to fix though.

It handles arrays

I've added an array into the test data:

it("handles arrays values", () => {
    input = {
        "one" = "tahi",
        "first" = "tuatahi",
        "two" = {"second" = "rua"},
        "three" = {
            "third" = {"thrice" = "toru"}
        },
        "four" = ["fourth", "quaternary", "wha", "tuawha"]
    }
    expected = {
        "[one]" = "tahi",
        "[first]" = "tuatahi",
        "[two][second]" = "rua",
        "[three][third][thrice]" = "toru",
        "[four][1]" = "fourth",
        "[four][2]" = "quaternary",
        "[four][3]" = "wha",
        "[four][4]" = "tuawha"
    }

    actual = flattenStruct(input)

    expect(actual).toBe(expected)
})

And added logic to check if the value is a struct or an array before recursing. The variation just being to expect the callback parameters to be in a different order:

function flattenStruct(required struct struct) {
    flatten = (flattened, key, value, _, prefix="") => {
        var prefixedKey = "#prefix#[#key#]"
        if (isSimpleValue(value)) {
            return flattened.append({"#prefixedKey#" = value})
        }

        if (isStruct(value)) {
            return flattened.append(value.reduce(
                function (flattened={}, key, value, _, prefix=prefixedKey) {
                    return flattened.append(flatten(flattened, key, value, _, prefix))
                },
                {}
            ))
        }
        if (isArray(value)) {
            return flattened.append(value.reduce(
                function (flattened={}, value, index, _, prefix=prefixedKey) {
                    return flattened.append(flatten(flattened, index, value, _, prefix))
                },
                {}
            ))
        }
    }

    return struct.reduce(flatten, {})
}

With this amendment all the TDD tests pass, but the original test - which I was kinda expecting to also pass now - wasn't. I had a coupla bugs in my implementation still, basically from me "not reading the requirements"

It qualifies the keys correctly and zero-indexes the arrays

Yes yes, very bad to fix two things at once, I know. I didn't notice that Tom's requirements were to present the flattened keys like this (taken from the previous test): four[0], whereas I'm doing the same one like this: [four][1]. So I'm not supposed to be putting the brackets around the top-level item, plus I need to present the arrays with a zero index. This is easily done, so I did both at once. I figure I'm kinda OK doing this cos I am starting with a failing test afterall. And I also added a more complex test case in (mixing arrays, structs, simple values some more) to make sure I had nailed it:

it("qualifies the keys correctly and zero-indexes the arrays", () => {
    input = {
        "one" = "tahi",
        "first" = "tuatahi",
        "two" = {"second" = "rua"},
        "three" = {
            "third" = {"thrice" = "toru"}
        },
        "four" = ["fourth", "quaternary", "wha", "tuawha"],
        "five" = [
            {"fifth" = "rima"},
            ["tuarima"],
            "quinary"
        ]
    }
    expected = {
        "one" = "tahi",
        "first" = "tuatahi",
        "two[second]" = "rua",
        "three[third][thrice]" = "toru",
        "four[0]" = "fourth",
        "four[1]" = "quaternary",
        "four[2]" = "wha",
        "four[3]" = "tuawha",
        "five[0][fifth]" = "rima",
        "five[1][0]" = "tuarima",
        "five[2]" = "quinary"
    }

    actual = flattenStruct(input)

    expect(actual).toBe(expected)
})

And the fix:

function flattenStruct(required struct struct) {
    flatten = (flattened, key, value, actual, prefix="") => {
        var offsetKey = isArray(actual) ? key - 1 : key
        var qualifiedKey = prefix.len() > 0 ? "[#offsetKey#]" : offsetKey
        var prefixedKey = "#prefix##qualifiedKey#"

        if (isSimpleValue(value)) {
            return flattened.append({"#prefixedKey#" = value})
        }

        if (isStruct(value)) {
            return flattened.append(value.reduce(
                function (flattened={}, key, value, actual, prefix=prefixedKey) {
                    return flattened.append(flatten(flattened, key, value, actual, prefix))
                },
                {}
            ))
        }
        if (isArray(value)) {
            return flattened.append(value.reduce(
                function (flattened={}, value, index, actual, prefix=prefixedKey) {
                    return flattened.append(flatten(flattened, index, value, actual, prefix))
                },
                {}
            ))
        }
    }

    return struct.reduce(flatten, {})
}

Quite simply:

  • If the data we are reducing is an array, we reduce the index by one;
  • and we only put the brackets on if we've already got a prefix value.
  • We're using that _ parameter we were not using before, so I've given it a name.

Now everything works:

Or does it?

It handles the collection being an arguments scope object

I handed this implementation (or something very similar) to Tom with a flourish, and a coupla hours later he was puzzled about what sort of object the arguments scope is. It behaves like both a struct and an array. For some things. And it's ambiguous in others. When Tom was passing an arguments scope to this function, it was breaking again:

The arguments scope passes an isArray check, but its reduce method handles it like a struct, so they key being passed is the parameter name, not the positional index like we'd be expecting with an array that it's just claimed to us that it is.

Let's have a test for this:

it("handles arguments objects", () => {
    input = {
        "one" = "tahi",
        "first" = "tuatahi",
        "two" = {"second" = "rua"},
        "three" = {
            "third" = {"thrice" = "toru"}
        },
        "four" = ["fourth", "quaternary", "wha", "tuawha"],
        "five" = [
            {"fifth" = "rima"},
            ["tuarima"],
            "quinary"
        ],
        "six" = ((sixth, senary) => arguments)("ono", "tuaono", [6])
    }
    expected = {
        "one" = "tahi",
        "first" = "tuatahi",
        "two[second]" = "rua",
        "three[third][thrice]" = "toru",
        "four[0]" = "fourth",
        "four[1]" = "quaternary",
        "four[2]" = "wha",
        "four[3]" = "tuawha",
        "five[0][fifth]" = "rima",
        "five[1][0]" = "tuarima",
        "five[2]" = "quinary",
        "six[sixth]" = "ono",
        "six[senary]" = "tuaono",
        "six[2][0]" = 6
    }

    actual = flattenStruct(input)

    expect(actual).toBe(expected)
})

Note I'm using CFML's IIFE feature to call an inline function expression there.

I am not entirely happy with my fix for this, but I've gaffer-taped it up:

function flattenStruct(required struct struct) {
    flatten = (flattened, key, value, actual, prefix="") => {
        var offsetKey = isArray(actual) && isNumeric(key) ? key - 1 : key
        var qualifiedKey = prefix.len() > 0 ? "[#offsetKey#]" : offsetKey

I'm just checking if the "key" is numeric before I decrement it.


There's perhaps one more case I should put in there. This will silently ignore anything other than simple values, arrays or structs. I had a mind to throw an exception at the end if we'd not already returned, but I didn't get around to it. This solved the need we had, so that's all good.

Hopefully you found another incremental TDD exercise useful, and also found the problem we were trying to solve an interesting one as well. I did.

Righto.

--
Adam

Monday, 14 June 2021

Using TDD when adding new code to existing untestable code

G'day:

Just a note before I start. This article is based on a real-world situation for a company I used to work at. Because I'm basing this on proprietary information, I am not prepared (or permitted!) to share that. As such I have obfuscated some elements of the narrative, and have changed sample code around to be less specific to the industry concerned. The code is still line-for-line equivalent to the original code though with just names being changed. Nothing has been enhanced or simplified to make it artificially easier to solve the problem we really did face.

This challenge is best contextualised with a real-world situation I encountered. We had a function that was kind of a perversion of the concept of the Builder Pattern, and is pretty much a model example of how not to do it, and an example of what the Builder Pattern is designed to address.

Here's the method, shrunk down so you can get a feel for the scale of the problem, but without needing to see the reallyreally code. The code is too blurry to read, but with the syntax highlighting, you can get an idea of what's going on. And the method is… 387 lines long. Ridiculous:

I have indicated where we need to make out code change. This method (buildResponse) is building a reasonably complex Account object and wrapping it up in a Response object. The Account class itself is poorly designed and has far too many properties. And we need to add one more (don't @ me, it was not my call, this was another team's work, and I was only helping the salvage operation).

None of the existing code in the method had any tests, and the entire class had not been written with testing in mind: it had a raft of inline static method calls to external dependencies, and created other dependency objects inline. The team looking at the work had decided the code was untestable, and were about to continue adding to it, still without tests.

In a better world, even if the method had grown to be close to 400 lines long, as it evolved its tests would have evolved as well, and we'd be starting with all its current logic covered, and it would be a simple matter of adding in one new trivial case, and then adding in our one new bit of trivial code. But… that is not the reality we are living in.

Let's look at the code more closely. The code at the bottom of the method is along these lines:

    // ...
    $account->partnerInfo = $infoModel->jsonSerialize();
    $response = (object) ['account' => $account];
    return new Response($response);
}

Right before we create and return the $response object, we needed to add an apiToken property into the $account object.

The problem is that for a test to get to that last line of code, our test is going to have to execute the previous 400-odd lines of code too. And that code hits the database and a coupla different web services to source its data. Plus a couple of the DB accesses were writes to the database. We can't be doing that in a unit test, and we can't mock-out the DB and web service calls either due to how the code was written. Untestable, right? That was the conclusion the dev team had reached.

Is this code really untestable though? Can we really not use TDD to develop this change? Not at all. Deriving the test case is super easy. Barely even an inconvenience. We even already know what it is: "it must have an apiToken property sourced from the API service". So we know that. On that basis, we can even write the test for it.

/**
* @testDox it must have an apiToken property sourced from the API service
* @covers ::setApiTokenInAccount
*/
public function testBuiltAccountContainsAnApiToken()
{
    $anyUuid = $this->matchesRegularExpression($self::PATTERN_FOR_UUID);
    $anyTime = $this->matchesRegularExpression($self::PATTERN_FOR_TIMESTAMP);
    $apiConnector = $this->createMock(ApiConnector::class);
    $apiConnector
        ->expects($this->once())
        ->method('createApiToken')
        ->with($anyUuid, $anyTime)
        ->willReturn('A_VALID_TOKEN');
    ReflectionHelper::setProperty($this->accountService, 'apiConnector', apiConnector);
    $account = $this->createMock(Account::class);
    ReflectionHelper::invokeMethod($this->accountService, 'setApiTokenInAccount', [$account]);
    $this->assertSame('A_VALID_TOKEN', $account->apiToken);
}

Here it's $this->accountService that is our system under test, and it's where that huge buildResponse method is. ReflectionHelper is just a utility class that implements that ubiquitous code everyone has that uses reflection to set non-public properties, call non-public methods etc. Fortunately apiConnector is not created inline within buildResponse, it's been set into a property earlier, so we can replace it with a mock. This makes our life easier here.

We're going to extract the logic that gets the token out into its own private method - so we're only adding the minimum one line of new code into buildResponse - and we test that method directly, rather than calling buildResponse which we absolutely cannot do. Normally testing private methods is a nono. I also realise there's a strong whiff of "testing the implementation not the behaviour" going on here, which is also a TDD anti-pattern, and in the perfect world we would not do this. But we are about 370 lines of code away from the perfect world here, so we need to make do.

We have a test case that defines and tests the functionality we're expecting, and we will have an implementation to test:

private function setApiTokenInAccount(&$account)
{
    // some logic around the UUID and time argument values elided because it's not relevant here
    $apiToken = $this->apiConnector->createApiToken($uuid, $timestamp);
    $account->apiToken = $apiToken;
}

We then change buildResponse to call this:

    // ...
    $account->partnerInfo = $infoModel->jsonSerialize();
    $this->setApiTokenInAccount($account); // <----- new code
    $response = (object) ['account' => $account];

    return new Response($response);
}

We've still TDDed our solution, and we now have a bit of testing on the requirements of buildResponse even if we're not calling that method. We have to accept that we have added a line of code to buildResponse and it's now 388 lines long, and that addition isn't tested, but we cannot help that. This is a good case of "perfect is the enemy of good".

Now that we have discovered this mess, we can line-up a technical debt ticket to tidy-up buildResponse. It's a critical method, and it should not have been left to rot like this.


As it happens, we were needing to do more work on that method anyhow, and when planning out the story we built in effort to refactor it.

I need to blur the code because I can't think of suitable renamings for all the activity going on here and I can't use the original code, but you will be able to tell how much it's improved:

From 388 lines of code to 25, with most of the 25 just being one in a sequence of steps, which makes for really easy understanding of what's going on for any dev looking at the code in future. This is how a method should read.

All we did to get to here is to identify the 14 different sections of functionality in the original function and use PHPStorm's "Refactor > Extract Method" facility. We could then write tests for some of those bits of functionality, where there was no issues with inline dependencies being created. Some of them still didn't have tests because there were still untestable, but we undertook to deal with that later when we come to need to maintain them. We could also test buildResponse, because now it's basically just a recipe of steps (private methods) to follow, we could mock-out the steps, and just test that the data flow through the recipe was what was intended.

The testing for the overall method also overcomes the "testing the implementation not the behaviour" smell I mentioned before. The tests for it were testing the outer surface of the unit, and checking the results - what the response object ended up being given each of the variation of inputs - rather than how they were derived. There are still the individual tests on the private methods as well, and that's a bit "testing the implementation", but none of the tests will be impacted by changes elsewhere in the codebase, and as those bits of implementation change, then the overall tests can be updated to shift the cases to there, and retire the implementation-specific tests. When dealing with legacy code and limited time resources, backfilling test coverage and confidence is always a journey and not a destination. So as long was we've moving in the correct direction, I believe it's OK to not do things to the letter of the dogma, sometimes.

It was easy (and pretty fulfilling, I will say) work, and it's made the code a helluva lot better for the next person who comes along and needs to work on it.

In conclusion we don't need to think "oh god this is untestable", and continue making things worse by adding more untested code. The code probably is testable, just maybe in a partial fashion, and maybe also not in an ideal fashion. But we can at least start.

Righto.

--
Adam

Sunday, 2 May 2021

How TDD and automated testing helped me solve an Nginx config problem I had created for myself

G'day:

I have a "website" I'm building on as part of a series of articles I'm writing about Lucee / CFWheels / Docker. I have a Docker container running Nginx which proxies requests for CFML code to a Docker container running Lucee.

Due to the nature of web applications, I have my web-accessible assets - JS, CSS, image and "entry point" index.cfm and Application.cfc files - in a public directory off my app root; and adjacent to that I have a src directory for my code, and vendor directory for third-party code (like stuff I install from ForgeBox via CommandBox).

This /public abstraction needs to be hidden from the end user: they're going to want to be browsing to - for example - http://example.com/, not http://example.com/public. So this means the proxy from Nginx to Lucee also needs to deal with that. It is imperative that no CFML files are publicly exposed other than the aforementioned index.cfm and Application.cfc

I'm terrible at configuring Nginx, and mak a lot of mistakes. Some mistakes are obvious because Nginx flat-out refuses to start. Others are less obvious, and bleed out as "unexpected behaviour" later in the piece. Knowing this, I approached the exercise in a TDD fashion; identifying what cases need addressing and writing tests for them, and then doing the config work to make each pass. Note: my wording is ambiguous there: I did not write more than one test at a time. I wrote a test, then got the config to make that test pass. Then I wrote the next test and reconfigured to make that pass (whilst also keeping the earlier ones passing too). I've already detailed some of this in my earleir article "Adding TestBox, some tests and CFConfig into my Lucee container".

As a result of these efforts to get Nginx working how I expected it to, I ended up with these green tests:

(Ignore how I'm skipping some of these tests. This is down to a bug in CFWheels that doesn't report 404 situations with an actual 404 status code when in dev mode. This does demonstrate though how I identified a shortcoming in behaviour whilst TDDing the work, that said).

And that's great. At every step as I was configuring more and more of Nginx's handling of requests, I could see that any change I made had a) addressed the new requirement; b) not broken any previous requirement. And it was indeed the reality at times that something I tried fixed one issue, but broke something else.

Back to those tests I'm skipping. This showed to me that I was short some cases: CFWheels handles things differently from how I expect it to in some situations, so I figured I had better have two flavours of test: one set for proxying to non-CFWheel-handled URLs; another set for when the URLs are within CFWheels domain. And I'm glad I did make this call, because it showed-up a bug in my Nginx config:

And when I checked those URLs, I saw the problem:

expectedParamValue = "expectedValue"
// ... rest of test not relevant here
expect(response.fileContent).toInclude(
    "Expected query param value: [#expectedParamValue#]",
    "Query parameter value was incorrect (URL: #testUrl#)"
)

But what I was seeing at that URL was this:

Expected query param value: [expectedValue?testParam=expectedValue]

The query string part (including the ?) was being appended to the URL sent to Lucee twice (same problem for both those failing tests). I was pretty puzzled how my previous non-Wheels test was passing, and it still seemed legit. Bemusing. However I was actively appending the query string in my proxy_pass URL, so that was "clearly wrong":

proxy_pass  http://cfml-in-docker.lucee:8888/public$fastcgi_script_name$is_args$args;

I got rid of that, and figured "I had better go and check why those other tests are passing after I re-run the tests here, to check that change:


Dammit. Now the CFWheels-specific tests are passing, but the non-Wheels ones are failing. Time for me to RTFM, cos I'm clearly doing something wrong here.

The first thing I'm going wrong is here:

proxy_pass  http://cfml-in-docker.lucee:8888/public$fastcgi_script_name;

$fastcgi_script_name is a PHP thing, and whilst it coincidentally holds the URI I want, it's the wrong thing to have here. So I put $request_uri back in there.

Right and that broke all the path_info CFWheels tests, so wasn't right. I decided to read more closely. It turns out that request_uri is the whole original URI (including the path_info and query string), and thus is ignoring my rewrites, and it was the reason that the query params were getting doubled up. In my rewrite I had this:

location @rewrite {
    rewrite ^/(.*)? /index.cfm$request_uri last;
    rewrite ^ /index.cfm last;
    return 404;
}

I just wanted $uri, which is just the document URI part of the requested URI, and it also reflects any changes made to that URI during rewrites and what-have-you. So once I used that in my rewrite and for my proxy_pass URL, the tests now look better:

I've abbreviated how long it took me to work this out, and how many cycles of trial end error it too me. Having those automated tests in place were gold because after each iteration I knew how wrong I was - for all cases - in half a second. I didn't need to manually go "OK, did it work for this?" "did it break this other one?" etc, for 20-odd tests.

It was also a big help to me to take the TDD path here, and stop and think & reason about exactly what my expectations ought to be for each of the cases I had. It also lead me to add more cases, such as the combinations of "it has both path_info and query parameters", as well as realising the path through the Nginx config was different for URLs aimed at Wheels (which are completely rewritten), and the ones directly to files in the public directory. I could easily cover both cases by duplicating the tests and changing the URLs sligthly.

Things seem to be working now, but if I find something else wrong, I will first work out what my expectations of it to be right are, and write a quick test for it. Then I'll fix it (without breaking anything else).

For now though: I'm fed-up with Nginx & CFML & CFWheel and I'm gonna do something els for a while. But I'll be back to it later this afternoon: I'm wll behind wehre I want to be with this stuff, and using the bank-holiday weekend to catch up a bit.

The "final" state of my Nginx site config is (docker/nginx/sites/default.conf):

server {
    listen 80;
    listen [::]:80;

    #rewrite_log on;

    server_name cfml-in-docker.frontend;
    root /usr/share/nginx/html;
    index index.html index.cfm;

    resolver 127.0.0.11;

    location / {
        try_files $uri $uri/ @rewrite;
    }

    location @rewrite {
        rewrite ^/(.*)? /index.cfm$uri last;
        rewrite ^ /index.cfm last;
        return 404;
    }

    location ~ \.(?:cfm|cfc)\b {
        proxy_http_version  1.1;
        proxy_set_header    Connection "";
        proxy_set_header    Host                $host;
        proxy_set_header    X-Forwarded-Host    $host;
        proxy_set_header    X-Forwarded-Server  $host;
        proxy_set_header    X-Forwarded-For     $proxy_add_x_forwarded_for;     ## CGI.REMOTE_ADDR
        proxy_set_header    X-Forwarded-Proto   $scheme;                        ## CGI.SERVER_PORT_SECURE
        proxy_set_header    X-Real-IP           $remote_addr;
        expires             epoch;

        proxy_pass  http://cfml-in-docker.lucee:8888/public$uri$is_args$args;
    }

    location ~ /\.ht {
        deny all;
    }
}

Righto, where's me shooting game?

--
Adam

Sunday, 11 April 2021

TDD: eating the elephant one bite at a time

G'day:

I've got another interesting reader comment to address today. My namesake Adam Tuttle has sent through this wodge of questions, attached to my earlier article "TDD is not a testing strategy":

You weren't offering to teach anyone about TDD in this post, but hey... I'm here, you're here, I have questions... Shall we?

One of the things I struggle with w/r/t TDD is the temptation to test every. single. action. For example, a large, complex form-save. Dozens, possibly 100 fields. Whether that ends up saved via ORM entities or queries, chances are good that since the form is so large the logic is a bit beyond a dirt-simple single-CRUD query. Multiple relationships, order of operations, permissions to modify different fields, etc.

My gut reaction is to skip the unit-testing layer and jump up to an integration or E2E test: submit the form, then view the detail-view (or re-open the record for editing) and assert that the values changed have persisted and they are what you're expecting, where you're expecting it, on the latter view.

BUT doesn't that almost entirely eliminate the possibility of using mocks to make the tests fast(er), the base-state predictable, and to not leave a mess in a designated testing db/env? My (and I mean this literally!) feeble, bad-at-TDD brain doesn't comprehend what a good solution is to this problem.

Unless the solution is to not test that aspect of the code? I fully subscribe to the "100% test coverage is a fools errand" ethos, so perhaps this is something that should just not be tested; and save the testing for things that are doing "interesting" algorithmic work? (not-crud)

Since I specifically mentioned permission to edit a certain field in my example, I guess I should say that stands out to me as something I would likely want to test. Thinking about it now, my brain wants to architect a system that accepts the user object and the field name as inputs and returns a boolean for editable or not. Easy enough to implement and you're basically changing the conditional in the save method from "if user has X permission, entity.setProperty(newval)" to "if evaluatePermissions(user, property) = true, entity.setProperty(newval)", so there's no big mental leap to the next developer to read the code... but it also seems hairy to separate the permission logic from the form-save logic, not because of the separation, but because it leads towards combining the permission logic of lots of disparate and unrelated forms. I'm not seeing how that could be cleanly implemented.

So yeah. There's a can of worms for you. What do you make of that?

Nice one. There's a lot of work there, so I am going to approach it how I'd approach addressing any other requirements: a bit at a time. Like I'm doing TDD. Except I've NFI how I can write tests for a blog article, so just imagine that part. Also remember that TDD is not a testing strategy, it's a design strategy, so my TDD-ish approach here is focusing on identifying cases, and addressing them one at a time. OKOK, this is torturing my fixation with TDD a bit. Sorry.

You weren't offering to teach anyone about TDD in this post

You weren't offering to teach anyone about TDD in this post. OK, so first point. I'm always open to excuses to think about TDD practices, and how we can use them to address our work. So don't worry abou that.

It needs to save a large form

For example, a large, complex form-save. Dozens, possibly 100 fields. For my convenience I am going to interpret this as two separate things: the form, and the code that processes a post request. I suspect you were only meaning the latter. But, really, the same approach applies to both.

In seeing a large HTML form, you are not using a TDD mindset: how do I test that huge thing?! Using the TDD mindset, it's not a huge thing. It starts off being nothing. It starts off perhaps with "requests to /myForm.html return a 200-OK". From there it might move on to "It will be submitted as a POST to /processMyForm", and the to "after a successful form submission the user is redirected to /formSubmissionResults.html, and that request's status is 201-CREATED". Small steps. No form fields at all yet. But we have tested that requirements of your work have been tested (and implemented and pass the tests).

Next you might start addressing a form field requirement: "it has a text field with maximum length 100 for firstName". Quickly after that you have the same case for lastName. And then there might be 20 other fields that are all text and all have a sole constraint of maxLength, so you can test all of those really quicky but with still the same amount of care with a data provider that passes the test the case variations, but otherwise is the same test. This is still super quick, and your case still shows that you have addressed the requirement. And you can demonstrate that with your test output:

  Tests of WorkshopRegistrationForm component
     should have a required text input for fullName, maxLength 100, and label 'Full name'
     should have a required text input for phoneNumber, maxLength 50, and label 'Phone number'
     should have a required text input for emailAddress, maxLength 320, and label 'Email address'
     should have a required password input for password, maxLength 255, and label 'Password'
     should have a required workshopsToAttend multiple-select box, with label 'Workshops to attend'
     should list the workshop options fetched from the back-end
     should have a button to submit the registration
    - should leave the submit button disabled until the form is filled


  7 passing (48ms)
  1 pending

 MOCHA  Tests completed successfully

(I've lifted that from the blog article I mention lower down).

Not all form fields are so simple. Some need to be select boxes that source their data from [somewhere]. "It has a select for favouriteColour, which offers values returned from a call to /colours/?type=favourite". This needs better testing that just name and length. "It has a password field that only accepts [rules]". Definitely needs testing discrete from the other tests. Etc

Your form is a collection of form fields all of which will have stated requirements. If the requirements have been stated, it stands to reason you should demonstrate you've met the requirements. Both now in the first iteration of development, and that this continues to hold true during subsequent iterations (direct or indirect: basically new work doesn't break existing tests).

I cover an approach to this in article "Vue.js: using TDD to develop a data-entry form". It's a small form, but the technique scales.

Bottom line when using TDD you don't start with a massive form.

It's a similar story on the form submission handler. The TDD process doesn't start with "holy f**k 100 form fields!", it starts of with a POST request. or it might start with a controller method receiving a request object that represents that request. Each value in that request must have validation, and you must test that, because validation is a) critical, b) fiddly and error-prone. But you start with one field: firstName must exist, and must be between 1-100 character. You'd have these cases:

  • It's not passed with the request at all (fail);
  • It's passed with the request but its value is empty (fail);
  • It's passed with the request and its length is 1 (pass);
  • It's passed with the request and its length is 100 (pass);
  • It's passed with the request and its length is 101 (fail);

These are requirements your client has given you: You need to test them!

The validation tests are perhaps a good example of where one might use a focused unit test, rather than a functional test that actually makes a request to /processMyForm and analyses the response. Maybe you just pass a request object, or the request body values to a validate method, and check the results.

Once validation is in place, you'd need to vary the response based on those results: "when validation fails it returns a 400-BAD-REQUEST"; "when validation fails it returns a non-empty-array errors with validation failure details", etc. All actual requirements you've been given; all need to be tested.

Then you'd move on to whatever other business logic is needed, step by step, until you get to a point where yer firing some values into storage or whatever, and you check the expected values for each field are passed to the right place in storage. Although I'd still use a mock (or spy, or whatever the precise term is), and just check what values it receives, rather than actually letting the test write to storage.

It also has end-to-end acceptance tests

At this point you can demonstrate the requirements have been tested, and you know they work. I'd then put an end-to-end happy path test on that (maybe all the way from automating the form submission with a virtual web client, maybe just by sending a POST request; either is valid). And then I'd do an end-to-end unhappy path test, eg: when validation fails are the correct messages put in the correct place on the form, or whatever. Maybe there are other valid variations of end-to-end tests here, but I would not think to have an end-to-end test for each form field, and each validation rule. That'd be fiddly to write, and slow to run.

It does need to cover all the behaviour

I fully subscribe to the "100% test coverage is a fools errand" ethos. Steady on there. There's 100% and there's 100%. This notion is applied to lines-of-code, or 100% of methods, or basically implementation detail stuff. And it's also usually trotted out by someone who's looking at the code after it's been done, and is faced with a whole pile of testing to write and trying to work out ways of wriggling out of it. This is no slight on you, Adam (Tuttle), it's just how I have experienced devs rationalise this with me. If one does TDD / BDD, then one is not thinking about lines of code when one is testing. One is thinking about behaviour. And the behaviour has been requested by a client, and the behaviour needs to work. So we test the behaviour. Whether that's 1 line of code or 100 is irrelevant. However the test will exercise the code, because the code only ever came into existence to address the case / behaviour being delivered. Using TDD generally results in ~100% of the code being covered because you don't write code you don't need, which is the only time code might not be covered. How did that code get in there? Why did you write it? Obviously it's not needed so get rid of it ;-).

The key here is that 100% of behaviour gets covered.

Nothing is absolute though. There will be situations where some code - for whatever reason - is just not testable. This is rare, but it happens. In that case: don't get hung up by it. Isolate it away by itself, and mark it as not covered (eg in PHPUNit we have @codeCoverageIgnore), and move on. But be circumspect when making this decision, and the situations that one can't test some code is very rare. I find devs quite often seem to confuse "can't" with "don't feel like ~". Two different things ;-)

I'll also draw you back to an article I wrote ages ago about the benefits of 100% test coverage: "Yeah, you do want 100% test coverage". TL;DR: where in these two displays can you spot then new code that is accidentally missing test coverage:


Accidents are easy to spot when a previously all-green board starts being not all-green.

It uses emergent design to solve large problems

[My] brain wants to architect a system that [long and complicated description follows]. One of the premises of TDD is that you let the solution architect itself. I'm not 100% behind this as I can't quite see it yet, but I know I do find it really daunting if my requirement seems to be "it all does everything I need it to do", and I don't know where to start with that. This was my real life experience doing that Vue.js stuff I linked to above. I really did start with "yikes this whole form thing is gonna be a monster!? I don't even know where to start!". I pushed the end result I thought I might have to the back of your mind, especially the architectural side of things (which will probably more define itself in the refactor stage of things, not the red / green part).

And I started by adding a route for the form, and then I responded to request to that route with a 200-OK. And then moved on to the next bite of the elephant.

HTH.

--
Adam (Cameron)