Adam Cameron's Dev Blog: TDD

Showing posts with label TDD. Show all posts

Sunday 5 September 2021

Testing: reasoning over "testing implementation detail" vs "testing features"

G'day:

I can't see how this article will be a very long one, as it only really dwells on one concept I had to reason through before I decided I wasn't talking horseshit.

I'm helping some community members with improving their undertanding of testing, and I was reviewing some code that was in need of some test coverage. The code below is not the actual code - for obvious "plausible deniability" reasons - but it's pretty much the same situation and logic flow we were looking at, just with different business entities and the basis for the rule has been "uncomplicated" for the sake of this article:

function validateEmail() {
    if ((isInstanceOf(this.orderItems, "ServicesCollection") || isInstanceOf(this.orderItems, "ProductsCollection")) && !isValid("email", this?.email) ) {
        var hasItemWithCost = this.orderItems.some((item) => item.cost > 0)
        if (hasItemWithCost) { // (*)
            this.addError("Valid email is required")
        }
    }
}

// (*) see AdamT's comment below. I was erroneously calling .len() on hasItemWithCost due to me not having an operational brain. Fixed.

This function is called as part of an object's validation before it's processed. It's also slightly contrived, and the only real bit is that it's a validation function. For this example the business rule is that if any items in the order have a cost associated with them; then the order also needs an email address. Say the receipt needs to be emailed or something. The real world code was different, but this is a fairly common-knowledge sort of parallel to the situation.

All the rest of the code contributing to the data being validated already existed, we're just got this new validation rule around the email not always being optional now.

We're not doing TDD here (yet!), so it's a matter of backfilling tests. The exercise was to identify what test need to be written.

We picked two obvious ones:

make an orderItems collection with no costs in it, and the email shouldn't be checked;
make an orderItems collection with at least one oneitem having a costs in it, and the email should be present and valid;

But then I figured that we've skipped a big chunk of the conditional logic: we're ignoring the check of whether the collection is one of a ServicesCollection or a ProductsCollection. NB: unfortunately the order can comprise collections of two disparate types, and there's no common interface we can check against instead. There's possibly some tech debt there.

I'm thinking we have three more cases here:

the collection is a ServicesCollection;
the collection is a ProductsCollection;
the collection is neither of those, somehow. This seems unlikely, but it seems it could be true.

Then I stopped and thought about whether by dissecting that if statement so closely, I am just chasing 100% line-of-code coverage. And worse: aren't the specifics of that part of the statement just implementation detail, rather than part of the actual requirement? The feature was "update the order system so that orders requiring a financial transaction require a valid email address". It doesn't say anything about what sort of collections we use to store the order.

Initially I concluded it was just implementation detail and there was no (test) case to answer to here. But it stuck in my craw that we were not testing that part of the conditional logic. I chalked that up to me being pedantic about making sure every line of code has coverage. Remember I generally use TDD when writing my code, so it's just weird to me to have conditional code with no specific tests, because the condition would only ever be written because a feature called for it.

But it kept gnawing at me.

…

Then the penny dropped (this was a few hours later, I have to admit).

It's not just testing implementation detail. The issue / confusion has arisen because we're not doing TDD.

If I rewind to when we were writing the function, and take a TDD approach, I'd be going "right, what's the first test case here?". The first test case - the reason why want to type in isInstanceOf(this.orderItems, "ServicesCollection") - is because part of the stated requirement is implicit. When the client said "if any items in the order have a cost associated with them…" they were not explicit about "this applies to both orders for services as well as orders for products", because in the business's vernacular the distinction between types of orders isn't one that's generally considered. For a lot of requirements there's just the concept of "orders" (this is why there should be an interface for orders!). However the fully-stated requirement could be made more accurately "if any items in an order for services have a cost associated with them…", followed by "likewise, if any items in an order for products have a cost associated with them…". It's this business information that inform our test cases, which would be better enumerated as:

it requires an email address if any items in a services order have an associated cost;
it requires an email address if any items in a products order have an associated cost;
it does not require an email address if the order is empty;

I had to chase up what the story was if the collection was neither of those types, and it turns out there wasn't a third type that was immune to the email address requirement, it's just - as per the nice clear test case! - the order might be empty, but the object as a whole still needs validation.

There is still "implementation detail" here - the code checking the collection type is obviously the implementation of that part of the business requirement: all code is implementation detail after all. The muddiness here arose because we were backfilling tests, rather than using TDD. Had we used TDD I'd not have been questioning myself when I was deciding on my test cases. The test cases are driven by business requirements; they implementation is driven by the test cases. And, indeed, the test implementation has to busy itself with implementation detail too, because to implement those tests we need to pass-in collections of the type the function is checking for.

As an aside, I was not happy with how busy that if condition was. It seemed to be doing too much to me, and mixing a coupla different things: type checks and a validity check. I re-reasoned the situation and concluded that the validation function has absolutely nothing to do if the email is there and is valid: this passes all cases right from the outset. So I will be recommending we separate that out into its own initial guard clause:

function validateEmail() {
    if (isValid("email", this.email)) {
        return
    }
    if ((isInstanceOf(this.orderItems, "ServicesCollection") || isInstanceOf(this.orderItems, "ProductsCollection"))) {
        var hasItemsWithCost = this.orderItems.some((item) => item.cost > 0)
        if (hasItemsWithCost.len()) {
            this.addError("Valid email is required")
        }
    }
}

I think this refactor makes things a bit easier to reason through what's going on. And note that this is a refactor, so we'll do it after we've got the tests in place and green.

I now wonder if this is something I should be looking out for more often? Is a compound if condition that is evaluating "asymmetrical" sub-conditions indicative of a code smell? Hrm… will need to keep an eye on this notion.

Now if only I could get that OrderCollection interface created, the code would be tidier-still, and closer to the business requirements.

Righto.

--
Adam

PS: I actually doubted myself again whilst writing this, but by the time I had written it all out, I'm happy that the line of thinking and the test cases are legit.

Tuesday 24 August 2021

Test coverage: it's not about lines of code

G'day

A coupla days ago someone shared this Twitter status with me:

I'll just repeat the text here for google & copy-n-paste-ability:

My personal opinion: Having 100% test coverage doesn't make your code more effective. Tests should add value but at some point being overly redundant and testing absolutely every line of code is ineffective. Ex) Testing that a component renders doesn't IN MY OPINION add value.

Emma's position here is spot on, IMO. There were also a lot of "interesting" replies to it. Quelle surprise. Go read some of them, but then give up when they get too repetitive / unhelpful.

In a similar - but slightly contrary - context, I was reminded of my erstwhile article on this topic: "Yeah, you do want 100% test coverage".

But hang on: on one hand I say Emma is spot on. On the other hand I cross-post an old article that seems to contradict this. What gives?

The point of differentiation is "testing absolutely every line of code" vs "100% test coverage". Emma is talking about lines of code. I'm talking about test cases.

Don't worry about testing lines of code. That's intrinsically testing "implementation". However do care about your test cases: the variations that define the feature you are working on. you've been asked to develop a feature and its variations, and you don't know you've delivered the feature (or in the case of automated testing: continue to having been delivered in a working state) unless you test it.

Now, yeah, sure: features are made of lines of code, but don't worry about those. A trite example I posited to a mate yesterday is this: consider this to be the implementation of your feature:

doMyFeature() {
    // line 1
    // line 2
    // line 3
    // line 4
    // line 5
    // line 6
    // line 7
    // line 8
    // line 9
    // line 10
}

I challenge you to write a test for "My Feature", and not by implictation happen to test all those lines of code. But it's the feature we care about, not the lines of code.

On the other hand, let's consider a variant:

doMyFeature() {
    // line 1
    // line 2
    // line 3
    // line 4
    // line 5
    if (someVariantOfMyFeature) {
        // line 6
    }
    // line 7
    // line 8
    // line 9
    // line 10
}

If you run your test coverage analysis and see that line 6 ain't tested, this is not a case of wringing one's hands about line 6 of the code not being tested; it's that yer not testing the variation of the feature. It's the feature coverage that's missing. Not the lines-of-code coverage.

Intrinsically your code coverage analysis tooling probably marks individual lines of code as green or red or whatever, but only if you look that closely. If you zoom out a bit, you'll see that the method the lines of code are in is either green or orange or red; and out further the class is likewise green / orange / red, and probably says something like "76% coverage". The tooling necessarily needs to work in lines-of-code because its a dumb machine, and those are the units it can work on. You are the programmer don't need to focus on the lines-of-code. You have a brain, and what the report is saying is "you've not tested part of your feature", and it's saying "the bit of the feature that is represented by these lines of code". That bit. You're not testing it. Go test yer feature properly.

There's parts of the implementation of a feature I won't test maybe. Your DAOs and your other adapters for external services? Maybe not something I'd test during the TDD cycle of things, but it might still be prudent to chuck in an integration test for those. I mean they do need to work and continue to work. But I see this as possibly outside of feature testing. Maybe? Is it?

Also - now I won't repeat too much of that other article - there's a difference between "coverage" and "actually tested". Responsible devs can mark some code as "won't test" (eg in PHPUnit with a @codeCoverageIgnore), and if the reasons are legit then they're "covered" there.

Why I'm writing this fairly short article when it's kinda treading on ground I've already covered is because of some of the comments I saw in reply to Emma. Some devs are a bit shit, and a bit lazy, and they will see what Emma has written, and their take away will be "basically I don't need to test x because x is some lines of code, and we should not be focusing on lines of code for testing". That sounds daft, but I know people that have rationalised their way out of testing perfectly testable (and test-necessary) code due to "oh we should not test implementation details", or "you're just focusing on lines of code, that's wrong", and that sort of shit. Sorry pal, but a feature necessarily has an implementation, and that's made out of lines of code, so you will need to address that implementation in yer testing, which will innately report that those lines of code are "covered".

Also this notion of "100% coverage is too high, so maybe 80% is fine" (possibly using the Pareto Pinciple, poss just pulling a number of out their arses). Serious question: which 80%? If you set that goal, the a lot of devs will go "aah, it's in that 20% we don't test". Or they'll test the easy 80%, and leave the hard 20% - the bit that needs testing - to be the 20% they don't test (as I said in the other article: cos they basically can't be arsed).

Sorry to be a bit of a downer when it comes to devs' level of responsibility / professionalism / what-have-you. There are stack of devs out there who rock and just "get" what Emma (et al) is saying, and crack on with writing excellently-designed and well-tested features. But they probably didn't need Emma's messaging in the first place. I will just always think some more about advice like this, and wonder how it can be interpreted, and then be a contrarian and say "well actually…" (cringe, OK I hope I'm not actually doing that here?), and offer a different informed observation.

So my position remains: set your sights on 100% feature coverage. Use TDD, and your code will be better and leaner and you'll also likely end up with 100% LOC coverage too (but that doesn't matter; it's just a test-implementation detail ;-).

Righto.

--
Adam

Tuesday 6 July 2021

TDD, code puzzle, recursing reductions

G'day:

I couldn't think of a title for this one, so: that'll do.

A few days ago Tom King posted this on the CFML Slack channel:

I'm sure I've asked this before, but has anyone got a good function to take complex values and parse out to a curl style string? i.e, https://trycf.com/gist/0c2672ac4a02d4c52d1e3ee1c9a408e7/lucee5?theme=monokai

The code in the gist is thus:

original = {
    'payment_method_types' = ['card'],
    'line_items' = [
        {
            "price_data" = {
                "product_data" = {
                    "name" = "Something"
                },
                "currency" = 'gbp',
                "unit_amount" = 123
            },
            "quantity" = 1
        }
    ]
};

writeDump(var=original, label="Original");

wantedResult["payment_method_types[0]"] = 'card';
wantedResult["line_items[0][price_data][product_data][name]"] ='Something';
wantedResult["line_items[0][price_data][currency]"] = 'gbp';
wantedResult["line_items[0][price_data][unit_amount]"] = 123;
wantedResult["line_items[0][quantity]"] = 1;

writeDump(var=wantedResult, label="Want this:");

So he wants to flatten out the key-path to the values. Cool. I'll give that a bash. Note: we've actually already solved this, but I'm now going back over the evolution of my solution, following my TDD steps. This is more a demo of using TDD than it is solving the problem: don't worry about how much code their is in this article, it's mostly incremental repetition, or just largely copy-and-pasted-and-tweaked tests. I will admit that for my first take on this I just wrote the one test that compared my result to his final expectation, and it took me far more time to sort it out than it ought to. I did the whole thing again last night to prep the code for this article, and using more focused TDD I solved it faster. Obviously I was benefiting from already kinda remembering how I did it the first time around, but even the first time around I could see how to do it before I even wrote any code, I just pantsed around making mistakes the first time due to not focusing. TDD on my second attempt helped me focus.

Last things first: the final test

I've got the "client's" final requirement already, so I'll create a test for that now, and watch my code error because it doesn't exist (my code, that is). I'll link the test case description through to the current state of the code in Github. I'll tag each step.

it("fulfils the requirement of the puzzle", () => {
    input = {
        "payment_method_types" = ["card"],
        "line_items" = [
            {
                "price_data" = {
                    "product_data" = {
                        "name" = "Something"
                    },
                    "currency" = "gbp",
                    "unit_amount" = 123
                },
                "quantity" = 1
            }
        ]
    }
    expected = {
        "payment_method_types[0]" = "card",
        "line_items[0][price_data][product_data][name]" = "Something",
        "line_items[0][price_data][currency]" = "gbp",
        "line_items[0][price_data][unit_amount]" = 123,
        "line_items[0][quantity]" = 1
    }

    actual = flattenStruct(input)

    expect(actual).toBe(expected)
})

This errors with No matching function [flattenStruct] found, because I ain't even started yet! Note: for this exercise, I'm just building the function in the test class itself, given it's just one function, and it only needs to exist to fulfil this blogging/TDD exercise.

This test will keep erroring until I've finished. This is fine. I am not going to look at that test code again until I have finished the task though.

It handles simple value key/value pairs

I'm not going to be crazily pendantic with the TDD here. For example in this case I'm rolling in "creating the function and solving the first step of the challenge" into one step. I've already got a test failing due to the lack of the function existing, so I can watch it's progress for some of the micro-iterations I am missing out. So for this step I'm gonna iterate over the struct, and deal with the really easy cases: top level key/value pairs that are simple values and do not require any recursion to work. This is also the "bottom" level of the recursion I'll move on to, so know I already need it anyhow. Maybe making the decision that I need to use recursion to sort this out is pre-emptive, but I'm buggered if I know any other way of doing it. And taking recursion as a given, I always like to start with the lowest level solution: the bit that doesn't actually recurse.

Here's my test for this:

it("handles top-level key/values with simple values", () => {
    input = {
        "one" = "tahi",
        "two" = {"second" = "rua"},
        "three" = {
            "third" = {"thrice" = "toru"}
        }
    }
    expected = {
        "[one]" = "tahi"
    }

    actual = flattenStruct(input)

    expect(actual).toBe(expected)
})

And the implementation to make it pass:

function flattenStruct(required struct struct) {
    return struct.reduce((flattened, key, value) => {
        if (isSimpleValue(value)) {
            return {"[#key#]" = value}
        }
        return flattened
    }, {})
}

That's lovely, and the test passes, but there's a bug in it which I didn't notice the first time around.

It handles multiple top-level key/values with simple values

it("handles multiple top-level key/values with simple values", () => {
    input = {
        "one" = "tahi",
        "first" = "tuatahi",
        "two" = {"second" = "rua"},
        "three" = {
            "third" = {"thrice" = "toru"}
        }
    }
    expected = {
        "[one]" = "tahi",
        "[first]" = "tuatahi"
    }

    actual = flattenStruct(input)

    expect(actual).toBe(expected)
})

That fails with Expected [{[one]={tahi}, [first]={tuatahi}}] but received [{[one]={tahi}}]. It's cos I'm a dropkick and not appending each iteration to the previous:

return {"[#key#]" = value}

// should be

return flattened.append({"[#key#]" = value})

Once I fixed that: the two TDD tests are green. Obviously the initial one is still broken cos we ain't done.

It handles complex values

Now all I needed was to add some structs to the sample data for the test:

it("handles complex values", () => {
    input = {
        "one" = "tahi",
        "first" = "tuatahi",
        "two" = {"second" = "rua"},
        "three" = {
            "third" = {"thrice" = "toru"}
        }
    }
    expected = {
        "[one]" = "tahi",
        "[first]" = "tuatahi",
        "[two][second]" = "rua",
        "[three][third][thrice]" = "toru"
    }

    actual = flattenStruct(input)

    expect(actual).toBe(expected)
})

And write a quick implementation:

function flattenStruct(required struct struct) {
    flatten = (flattened, key, value, _, prefix="") => {
        var prefixedKey = "#prefix#[#key#]"
        if (isSimpleValue(value)) {
            return flattened.append({"#prefixedKey#" = value})
        }

        return flattened.append(value.reduce(
            function (flattened, key, value, _, prefix=prefixedKey) {
                return flattened.append(flatten(flattened, key, value, _, prefix))
            },
            {}
        ))
    }

    return struct.reduce(flatten, {})
}

This warrants some explanation:

We need to extract all the logic we currently have within the outer function into its own function. We need to do this because when we recurse, we need it to call itself, so it needs a name.
The recursion works by saying to each level of the recursion "the previous level had this key, so append your keys to this", so we implement this, and default the prefix to be an empty string to start with.
We're leveraging a nice feature of CFML here in that one can pass additional arguments to a function, and "it'll just work", so we pass that after all the other arguments. We're not using the fourth parameter to the reduce callback which contains the original collection being iterated over, so we reflect that by calling the parameter _ (this is a notional standard for this sort of thing).
When we get to the "bottom" of the recursion we will always have a simple value for the value (otherwise we're not done recursing), so we prepend its key with the parent key (which will be all the ancestor keys combined).

I also now needed to update my previous test case expectations to include the structs coming out in the result too. In hindsight, I messed up there: I should only have had test data for the case at hand, rather than having some more data that I knew was specifically going to collapse in a heap.

But it also has a bug. The first test completely broke now: Can't cast Complex Object Type Struct to StringUse Built-In-Function "serialize(Struct):String" to create a String from Struct

I find the situation here a bit frustrating. The signature for the callback function for Struct.reduce is this:

function(previous, key, value, collection)

And that's what I have in the code above.

However. I had overlooked that the signature for the callback function for Array.reduce is this:

function(previous, value, key, collection)

It had never occurred to me that key, value and value, key were reversed between the two. Sigh. So my single handling of complex values there doesn't work because I'm passing Array.reduce the key and the value in the wrong order.

It's reasonably easy to fix though.

It handles arrays

I've added an array into the test data:

it("handles arrays values", () => {
    input = {
        "one" = "tahi",
        "first" = "tuatahi",
        "two" = {"second" = "rua"},
        "three" = {
            "third" = {"thrice" = "toru"}
        },
        "four" = ["fourth", "quaternary", "wha", "tuawha"]
    }
    expected = {
        "[one]" = "tahi",
        "[first]" = "tuatahi",
        "[two][second]" = "rua",
        "[three][third][thrice]" = "toru",
        "[four][1]" = "fourth",
        "[four][2]" = "quaternary",
        "[four][3]" = "wha",
        "[four][4]" = "tuawha"
    }

    actual = flattenStruct(input)

    expect(actual).toBe(expected)
})

And added logic to check if the value is a struct or an array before recursing. The variation just being to expect the callback parameters to be in a different order:

function flattenStruct(required struct struct) {
    flatten = (flattened, key, value, _, prefix="") => {
        var prefixedKey = "#prefix#[#key#]"
        if (isSimpleValue(value)) {
            return flattened.append({"#prefixedKey#" = value})
        }

        if (isStruct(value)) {
            return flattened.append(value.reduce(
                function (flattened={}, key, value, _, prefix=prefixedKey) {
                    return flattened.append(flatten(flattened, key, value, _, prefix))
                },
                {}
            ))
        }
        if (isArray(value)) {
            return flattened.append(value.reduce(
                function (flattened={}, value, index, _, prefix=prefixedKey) {
                    return flattened.append(flatten(flattened, index, value, _, prefix))
                },
                {}
            ))
        }
    }

    return struct.reduce(flatten, {})
}

With this amendment all the TDD tests pass, but the original test - which I was kinda expecting to also pass now - wasn't. I had a coupla bugs in my implementation still, basically from me "not reading the requirements"

It qualifies the keys correctly and zero-indexes the arrays

Yes yes, very bad to fix two things at once, I know. I didn't notice that Tom's requirements were to present the flattened keys like this (taken from the previous test): four[0], whereas I'm doing the same one like this: [four][1]. So I'm not supposed to be putting the brackets around the top-level item, plus I need to present the arrays with a zero index. This is easily done, so I did both at once. I figure I'm kinda OK doing this cos I am starting with a failing test afterall. And I also added a more complex test case in (mixing arrays, structs, simple values some more) to make sure I had nailed it:

it("qualifies the keys correctly and zero-indexes the arrays", () => {
    input = {
        "one" = "tahi",
        "first" = "tuatahi",
        "two" = {"second" = "rua"},
        "three" = {
            "third" = {"thrice" = "toru"}
        },
        "four" = ["fourth", "quaternary", "wha", "tuawha"],
        "five" = [
            {"fifth" = "rima"},
            ["tuarima"],
            "quinary"
        ]
    }
    expected = {
        "one" = "tahi",
        "first" = "tuatahi",
        "two[second]" = "rua",
        "three[third][thrice]" = "toru",
        "four[0]" = "fourth",
        "four[1]" = "quaternary",
        "four[2]" = "wha",
        "four[3]" = "tuawha",
        "five[0][fifth]" = "rima",
        "five[1][0]" = "tuarima",
        "five[2]" = "quinary"
    }

    actual = flattenStruct(input)

    expect(actual).toBe(expected)
})

And the fix:

function flattenStruct(required struct struct) {
    flatten = (flattened, key, value, actual, prefix="") => {
        var offsetKey = isArray(actual) ? key - 1 : key
        var qualifiedKey = prefix.len() > 0 ? "[#offsetKey#]" : offsetKey
        var prefixedKey = "#prefix##qualifiedKey#"

        if (isSimpleValue(value)) {
            return flattened.append({"#prefixedKey#" = value})
        }

        if (isStruct(value)) {
            return flattened.append(value.reduce(
                function (flattened={}, key, value, actual, prefix=prefixedKey) {
                    return flattened.append(flatten(flattened, key, value, actual, prefix))
                },
                {}
            ))
        }
        if (isArray(value)) {
            return flattened.append(value.reduce(
                function (flattened={}, value, index, actual, prefix=prefixedKey) {
                    return flattened.append(flatten(flattened, index, value, actual, prefix))
                },
                {}
            ))
        }
    }

    return struct.reduce(flatten, {})
}

Quite simply:

If the data we are reducing is an array, we reduce the index by one;
and we only put the brackets on if we've already got a prefix value.
We're using that _ parameter we were not using before, so I've given it a name.

Now everything works:

Or does it?

It handles the collection being an arguments scope object

I handed this implementation (or something very similar) to Tom with a flourish, and a coupla hours later he was puzzled about what sort of object the arguments scope is. It behaves like both a struct and an array. For some things. And it's ambiguous in others. When Tom was passing an arguments scope to this function, it was breaking again:

The arguments scope passes an isArray check, but its reduce method handles it like a struct, so they key being passed is the parameter name, not the positional index like we'd be expecting with an array that it's just claimed to us that it is.

Let's have a test for this:

it("handles arguments objects", () => {
    input = {
        "one" = "tahi",
        "first" = "tuatahi",
        "two" = {"second" = "rua"},
        "three" = {
            "third" = {"thrice" = "toru"}
        },
        "four" = ["fourth", "quaternary", "wha", "tuawha"],
        "five" = [
            {"fifth" = "rima"},
            ["tuarima"],
            "quinary"
        ],
        "six" = ((sixth, senary) => arguments)("ono", "tuaono", [6])
    }
    expected = {
        "one" = "tahi",
        "first" = "tuatahi",
        "two[second]" = "rua",
        "three[third][thrice]" = "toru",
        "four[0]" = "fourth",
        "four[1]" = "quaternary",
        "four[2]" = "wha",
        "four[3]" = "tuawha",
        "five[0][fifth]" = "rima",
        "five[1][0]" = "tuarima",
        "five[2]" = "quinary",
        "six[sixth]" = "ono",
        "six[senary]" = "tuaono",
        "six[2][0]" = 6
    }

    actual = flattenStruct(input)

    expect(actual).toBe(expected)
})

Note I'm using CFML's IIFE feature to call an inline function expression there.

I am not entirely happy with my fix for this, but I've gaffer-taped it up:

function flattenStruct(required struct struct) {
    flatten = (flattened, key, value, actual, prefix="") => {
        var offsetKey = isArray(actual) && isNumeric(key) ? key - 1 : key
        var qualifiedKey = prefix.len() > 0 ? "[#offsetKey#]" : offsetKey

I'm just checking if the "key" is numeric before I decrement it.

There's perhaps one more case I should put in there. This will silently ignore anything other than simple values, arrays or structs. I had a mind to throw an exception at the end if we'd not already returned, but I didn't get around to it. This solved the need we had, so that's all good.

Hopefully you found another incremental TDD exercise useful, and also found the problem we were trying to solve an interesting one as well. I did.

Righto.

--
Adam

Monday 14 June 2021

Using TDD when adding new code to existing untestable code

G'day:

Just a note before I start. This article is based on a real-world situation for a company I used to work at. Because I'm basing this on proprietary information, I am not prepared (or permitted!) to share that. As such I have obfuscated some elements of the narrative, and have changed sample code around to be less specific to the industry concerned. The code is still line-for-line equivalent to the original code though with just names being changed. Nothing has been enhanced or simplified to make it artificially easier to solve the problem we really did face.

This challenge is best contextualised with a real-world situation I encountered. We had a function that was kind of a perversion of the concept of the Builder Pattern, and is pretty much a model example of how not to do it, and an example of what the Builder Pattern is designed to address.

Here's the method, shrunk down so you can get a feel for the scale of the problem, but without needing to see the reallyreally code. The code is too blurry to read, but with the syntax highlighting, you can get an idea of what's going on. And the method is… 387 lines long. Ridiculous:

I have indicated where we need to make out code change. This method (buildResponse) is building a reasonably complex Account object and wrapping it up in a Response object. The Account class itself is poorly designed and has far too many properties. And we need to add one more (don't @ me, it was not my call, this was another team's work, and I was only helping the salvage operation).

None of the existing code in the method had any tests, and the entire class had not been written with testing in mind: it had a raft of inline static method calls to external dependencies, and created other dependency objects inline. The team looking at the work had decided the code was untestable, and were about to continue adding to it, still without tests.

In a better world, even if the method had grown to be close to 400 lines long, as it evolved its tests would have evolved as well, and we'd be starting with all its current logic covered, and it would be a simple matter of adding in one new trivial case, and then adding in our one new bit of trivial code. But… that is not the reality we are living in.

Let's look at the code more closely. The code at the bottom of the method is along these lines:

    // ...
    $account->partnerInfo = $infoModel->jsonSerialize();
    $response = (object) ['account' => $account];
    return new Response($response);
}

Right before we create and return the $response object, we needed to add an apiToken property into the $account object.

The problem is that for a test to get to that last line of code, our test is going to have to execute the previous 400-odd lines of code too. And that code hits the database and a coupla different web services to source its data. Plus a couple of the DB accesses were writes to the database. We can't be doing that in a unit test, and we can't mock-out the DB and web service calls either due to how the code was written. Untestable, right? That was the conclusion the dev team had reached.

Is this code really untestable though? Can we really not use TDD to develop this change? Not at all. Deriving the test case is super easy. Barely even an inconvenience. We even already know what it is: "it must have an apiToken property sourced from the API service". So we know that. On that basis, we can even write the test for it.

/**
* @testDox it must have an apiToken property sourced from the API service
* @covers ::setApiTokenInAccount
*/
public function testBuiltAccountContainsAnApiToken()
{
    $anyUuid = $this->matchesRegularExpression($self::PATTERN_FOR_UUID);
    $anyTime = $this->matchesRegularExpression($self::PATTERN_FOR_TIMESTAMP);
    $apiConnector = $this->createMock(ApiConnector::class);
    $apiConnector
        ->expects($this->once())
        ->method('createApiToken')
        ->with($anyUuid, $anyTime)
        ->willReturn('A_VALID_TOKEN');
    ReflectionHelper::setProperty($this->accountService, 'apiConnector', apiConnector);
    $account = $this->createMock(Account::class);
    ReflectionHelper::invokeMethod($this->accountService, 'setApiTokenInAccount', [$account]);
    $this->assertSame('A_VALID_TOKEN', $account->apiToken);
}

Here it's $this->accountService that is our system under test, and it's where that huge buildResponse method is. ReflectionHelper is just a utility class that implements that ubiquitous code everyone has that uses reflection to set non-public properties, call non-public methods etc. Fortunately apiConnector is not created inline within buildResponse, it's been set into a property earlier, so we can replace it with a mock. This makes our life easier here.

We're going to extract the logic that gets the token out into its own private method - so we're only adding the minimum one line of new code into buildResponse - and we test that method directly, rather than calling buildResponse which we absolutely cannot do. Normally testing private methods is a nono. I also realise there's a strong whiff of "testing the implementation not the behaviour" going on here, which is also a TDD anti-pattern, and in the perfect world we would not do this. But we are about 370 lines of code away from the perfect world here, so we need to make do.

We have a test case that defines and tests the functionality we're expecting, and we will have an implementation to test:

private function setApiTokenInAccount(&$account)
{
    // some logic around the UUID and time argument values elided because it's not relevant here
    $apiToken = $this->apiConnector->createApiToken($uuid, $timestamp);
    $account->apiToken = $apiToken;
}

We then change buildResponse to call this:

    // ...
    $account->partnerInfo = $infoModel->jsonSerialize();
    $this->setApiTokenInAccount($account); // <----- new code
    $response = (object) ['account' => $account];

    return new Response($response);
}

We've still TDDed our solution, and we now have a bit of testing on the requirements of buildResponse even if we're not calling that method. We have to accept that we have added a line of code to buildResponse and it's now 388 lines long, and that addition isn't tested, but we cannot help that. This is a good case of "perfect is the enemy of good".

Now that we have discovered this mess, we can line-up a technical debt ticket to tidy-up buildResponse. It's a critical method, and it should not have been left to rot like this.

As it happens, we were needing to do more work on that method anyhow, and when planning out the story we built in effort to refactor it.

I need to blur the code because I can't think of suitable renamings for all the activity going on here and I can't use the original code, but you will be able to tell how much it's improved:

From 388 lines of code to 25, with most of the 25 just being one in a sequence of steps, which makes for really easy understanding of what's going on for any dev looking at the code in future. This is how a method should read.

All we did to get to here is to identify the 14 different sections of functionality in the original function and use PHPStorm's "Refactor > Extract Method" facility. We could then write tests for some of those bits of functionality, where there was no issues with inline dependencies being created. Some of them still didn't have tests because there were still untestable, but we undertook to deal with that later when we come to need to maintain them. We could also test buildResponse, because now it's basically just a recipe of steps (private methods) to follow, we could mock-out the steps, and just test that the data flow through the recipe was what was intended.

The testing for the overall method also overcomes the "testing the implementation not the behaviour" smell I mentioned before. The tests for it were testing the outer surface of the unit, and checking the results - what the response object ended up being given each of the variation of inputs - rather than how they were derived. There are still the individual tests on the private methods as well, and that's a bit "testing the implementation", but none of the tests will be impacted by changes elsewhere in the codebase, and as those bits of implementation change, then the overall tests can be updated to shift the cases to there, and retire the implementation-specific tests. When dealing with legacy code and limited time resources, backfilling test coverage and confidence is always a journey and not a destination. So as long was we've moving in the correct direction, I believe it's OK to not do things to the letter of the dogma, sometimes.

It was easy (and pretty fulfilling, I will say) work, and it's made the code a helluva lot better for the next person who comes along and needs to work on it.

In conclusion we don't need to think "oh god this is untestable", and continue making things worse by adding more untested code. The code probably is testable, just maybe in a partial fashion, and maybe also not in an ideal fashion. But we can at least start.

Righto.

--
Adam

Sunday 2 May 2021

How TDD and automated testing helped me solve an Nginx config problem I had created for myself

G'day:

I have a "website" I'm building on as part of a series of articles I'm writing about Lucee / CFWheels / Docker. I have a Docker container running Nginx which proxies requests for CFML code to a Docker container running Lucee.

Due to the nature of web applications, I have my web-accessible assets - JS, CSS, image and "entry point" index.cfm and Application.cfc files - in a public directory off my app root; and adjacent to that I have a src directory for my code, and vendor directory for third-party code (like stuff I install from ForgeBox via CommandBox).

This /public abstraction needs to be hidden from the end user: they're going to want to be browsing to - for example - http://example.com/, not http://example.com/public. So this means the proxy from Nginx to Lucee also needs to deal with that. It is imperative that no CFML files are publicly exposed other than the aforementioned index.cfm and Application.cfc

I'm terrible at configuring Nginx, and mak a lot of mistakes. Some mistakes are obvious because Nginx flat-out refuses to start. Others are less obvious, and bleed out as "unexpected behaviour" later in the piece. Knowing this, I approached the exercise in a TDD fashion; identifying what cases need addressing and writing tests for them, and then doing the config work to make each pass. Note: my wording is ambiguous there: I did not write more than one test at a time. I wrote a test, then got the config to make that test pass. Then I wrote the next test and reconfigured to make that pass (whilst also keeping the earlier ones passing too). I've already detailed some of this in my earleir article "Adding TestBox, some tests and CFConfig into my Lucee container".

As a result of these efforts to get Nginx working how I expected it to, I ended up with these green tests:

(Ignore how I'm skipping some of these tests. This is down to a bug in CFWheels that doesn't report 404 situations with an actual 404 status code when in dev mode. This does demonstrate though how I identified a shortcoming in behaviour whilst TDDing the work, that said).

And that's great. At every step as I was configuring more and more of Nginx's handling of requests, I could see that any change I made had a) addressed the new requirement; b) not broken any previous requirement. And it was indeed the reality at times that something I tried fixed one issue, but broke something else.

Back to those tests I'm skipping. This showed to me that I was short some cases: CFWheels handles things differently from how I expect it to in some situations, so I figured I had better have two flavours of test: one set for proxying to non-CFWheel-handled URLs; another set for when the URLs are within CFWheels domain. And I'm glad I did make this call, because it showed-up a bug in my Nginx config:

And when I checked those URLs, I saw the problem:

expectedParamValue = "expectedValue"
// ... rest of test not relevant here
expect(response.fileContent).toInclude(
    "Expected query param value: [#expectedParamValue#]",
    "Query parameter value was incorrect (URL: #testUrl#)"
)

But what I was seeing at that URL was this:

Expected query param value: [expectedValue?testParam=expectedValue]

The query string part (including the ?) was being appended to the URL sent to Lucee twice (same problem for both those failing tests). I was pretty puzzled how my previous non-Wheels test was passing, and it still seemed legit. Bemusing. However I was actively appending the query string in my proxy_pass URL, so that was "clearly wrong":

proxy_pass  http://cfml-in-docker.lucee:8888/public$fastcgi_script_name$is_args$args;

I got rid of that, and figured "I had better go and check why those other tests are passing after I re-run the tests here, to check that change:

Dammit. Now the CFWheels-specific tests are passing, but the non-Wheels ones are failing. Time for me to RTFM, cos I'm clearly doing something wrong here.

The first thing I'm going wrong is here:

proxy_pass  http://cfml-in-docker.lucee:8888/public$fastcgi_script_name;

$fastcgi_script_name is a PHP thing, and whilst it coincidentally holds the URI I want, it's the wrong thing to have here. So I put $request_uri back in there.

Right and that broke all the path_info CFWheels tests, so wasn't right. I decided to read more closely. It turns out that request_uri is the whole original URI (including the path_info and query string), and thus is ignoring my rewrites, and it was the reason that the query params were getting doubled up. In my rewrite I had this:

location @rewrite {
    rewrite ^/(.*)? /index.cfm$request_uri last;
    rewrite ^ /index.cfm last;
    return 404;
}

I just wanted $uri, which is just the document URI part of the requested URI, and it also reflects any changes made to that URI during rewrites and what-have-you. So once I used that in my rewrite and for my proxy_pass URL, the tests now look better:

I've abbreviated how long it took me to work this out, and how many cycles of trial end error it too me. Having those automated tests in place were gold because after each iteration I knew how wrong I was - for all cases - in half a second. I didn't need to manually go "OK, did it work for this?" "did it break this other one?" etc, for 20-odd tests.

It was also a big help to me to take the TDD path here, and stop and think & reason about exactly what my expectations ought to be for each of the cases I had. It also lead me to add more cases, such as the combinations of "it has both path_info and query parameters", as well as realising the path through the Nginx config was different for URLs aimed at Wheels (which are completely rewritten), and the ones directly to files in the public directory. I could easily cover both cases by duplicating the tests and changing the URLs sligthly.

Things seem to be working now, but if I find something else wrong, I will first work out what my expectations of it to be right are, and write a quick test for it. Then I'll fix it (without breaking anything else).

For now though: I'm fed-up with Nginx & CFML & CFWheel and I'm gonna do something els for a while. But I'll be back to it later this afternoon: I'm wll behind wehre I want to be with this stuff, and using the bank-holiday weekend to catch up a bit.

The "final" state of my Nginx site config is (docker/nginx/sites/default.conf):

server {
    listen 80;
    listen [::]:80;

    #rewrite_log on;

    server_name cfml-in-docker.frontend;
    root /usr/share/nginx/html;
    index index.html index.cfm;

    resolver 127.0.0.11;

    location / {
        try_files $uri $uri/ @rewrite;
    }

    location @rewrite {
        rewrite ^/(.*)? /index.cfm$uri last;
        rewrite ^ /index.cfm last;
        return 404;
    }

    location ~ \.(?:cfm|cfc)\b {
        proxy_http_version  1.1;
        proxy_set_header    Connection "";
        proxy_set_header    Host                $host;
        proxy_set_header    X-Forwarded-Host    $host;
        proxy_set_header    X-Forwarded-Server  $host;
        proxy_set_header    X-Forwarded-For     $proxy_add_x_forwarded_for;     ## CGI.REMOTE_ADDR
        proxy_set_header    X-Forwarded-Proto   $scheme;                        ## CGI.SERVER_PORT_SECURE
        proxy_set_header    X-Real-IP           $remote_addr;
        expires             epoch;

        proxy_pass  http://cfml-in-docker.lucee:8888/public$uri$is_args$args;
    }

    location ~ /\.ht {
        deny all;
    }
}

Righto, where's me shooting game?

--
Adam

Sunday 11 April 2021

TDD: eating the elephant one bite at a time

G'day:

I've got another interesting reader comment to address today. My namesake Adam Tuttle has sent through this wodge of questions, attached to my earlier article "TDD is not a testing strategy":

You weren't offering to teach anyone about TDD in this post, but hey... I'm here, you're here, I have questions... Shall we?

One of the things I struggle with w/r/t TDD is the temptation to test every. single. action. For example, a large, complex form-save. Dozens, possibly 100 fields. Whether that ends up saved via ORM entities or queries, chances are good that since the form is so large the logic is a bit beyond a dirt-simple single-CRUD query. Multiple relationships, order of operations, permissions to modify different fields, etc.

My gut reaction is to skip the unit-testing layer and jump up to an integration or E2E test: submit the form, then view the detail-view (or re-open the record for editing) and assert that the values changed have persisted and they are what you're expecting, where you're expecting it, on the latter view.

BUT doesn't that almost entirely eliminate the possibility of using mocks to make the tests fast(er), the base-state predictable, and to not leave a mess in a designated testing db/env? My (and I mean this literally!) feeble, bad-at-TDD brain doesn't comprehend what a good solution is to this problem.

Unless the solution is to not test that aspect of the code? I fully subscribe to the "100% test coverage is a fools errand" ethos, so perhaps this is something that should just not be tested; and save the testing for things that are doing "interesting" algorithmic work? (not-crud)

Since I specifically mentioned permission to edit a certain field in my example, I guess I should say that stands out to me as something I would likely want to test. Thinking about it now, my brain wants to architect a system that accepts the user object and the field name as inputs and returns a boolean for editable or not. Easy enough to implement and you're basically changing the conditional in the save method from "if user has X permission, entity.setProperty(newval)" to "if evaluatePermissions(user, property) = true, entity.setProperty(newval)", so there's no big mental leap to the next developer to read the code... but it also seems hairy to separate the permission logic from the form-save logic, not because of the separation, but because it leads towards combining the permission logic of lots of disparate and unrelated forms. I'm not seeing how that could be cleanly implemented.

So yeah. There's a can of worms for you. What do you make of that?

Nice one. There's a lot of work there, so I am going to approach it how I'd approach addressing any other requirements: a bit at a time. Like I'm doing TDD. Except I've NFI how I can write tests for a blog article, so just imagine that part. Also remember that TDD is not a testing strategy, it's a design strategy, so my TDD-ish approach here is focusing on identifying cases, and addressing them one at a time. OKOK, this is torturing my fixation with TDD a bit. Sorry.

You weren't offering to teach anyone about TDD in this post

You weren't offering to teach anyone about TDD in this post. OK, so first point. I'm always open to excuses to think about TDD practices, and how we can use them to address our work. So don't worry abou that.

It needs to save a large form

For example, a large, complex form-save. Dozens, possibly 100 fields. For my convenience I am going to interpret this as two separate things: the form, and the code that processes a post request. I suspect you were only meaning the latter. But, really, the same approach applies to both.

In seeing a large HTML form, you are not using a TDD mindset: how do I test that huge thing?! Using the TDD mindset, it's not a huge thing. It starts off being nothing. It starts off perhaps with "requests to /myForm.html return a 200-OK". From there it might move on to "It will be submitted as a POST to /processMyForm", and the to "after a successful form submission the user is redirected to /formSubmissionResults.html, and that request's status is 201-CREATED". Small steps. No form fields at all yet. But we have tested that requirements of your work have been tested (and implemented and pass the tests).

Next you might start addressing a form field requirement: "it has a text field with maximum length 100 for firstName". Quickly after that you have the same case for lastName. And then there might be 20 other fields that are all text and all have a sole constraint of maxLength, so you can test all of those really quicky but with still the same amount of care with a data provider that passes the test the case variations, but otherwise is the same test. This is still super quick, and your case still shows that you have addressed the requirement. And you can demonstrate that with your test output:

  Tests of WorkshopRegistrationForm component
    ✓ should have a required text input for fullName, maxLength 100, and label 'Full name'
    ✓ should have a required text input for phoneNumber, maxLength 50, and label 'Phone number'
    ✓ should have a required text input for emailAddress, maxLength 320, and label 'Email address'
    ✓ should have a required password input for password, maxLength 255, and label 'Password'
    ✓ should have a required workshopsToAttend multiple-select box, with label 'Workshops to attend'
    ✓ should list the workshop options fetched from the back-end
    ✓ should have a button to submit the registration
    - should leave the submit button disabled until the form is filled

  7 passing (48ms)
  1 pending

MOCHA  Tests completed successfully

(I've lifted that from the blog article I mention lower down).

Not all form fields are so simple. Some need to be select boxes that source their data from [somewhere]. "It has a select for favouriteColour, which offers values returned from a call to /colours/?type=favourite". This needs better testing that just name and length. "It has a password field that only accepts [rules]". Definitely needs testing discrete from the other tests. Etc

Your form is a collection of form fields all of which will have stated requirements. If the requirements have been stated, it stands to reason you should demonstrate you've met the requirements. Both now in the first iteration of development, and that this continues to hold true during subsequent iterations (direct or indirect: basically new work doesn't break existing tests).

I cover an approach to this in article "Vue.js: using TDD to develop a data-entry form". It's a small form, but the technique scales.

Bottom line when using TDD you don't start with a massive form.

It's a similar story on the form submission handler. The TDD process doesn't start with "holy f**k 100 form fields!", it starts of with a POST request. or it might start with a controller method receiving a request object that represents that request. Each value in that request must have validation, and you must test that, because validation is a) critical, b) fiddly and error-prone. But you start with one field: firstName must exist, and must be between 1-100 character. You'd have these cases:

It's not passed with the request at all (fail);
It's passed with the request but its value is empty (fail);
It's passed with the request and its length is 1 (pass);
It's passed with the request and its length is 100 (pass);
It's passed with the request and its length is 101 (fail);

These are requirements your client has given you: You need to test them!

The validation tests are perhaps a good example of where one might use a focused unit test, rather than a functional test that actually makes a request to /processMyForm and analyses the response. Maybe you just pass a request object, or the request body values to a validate method, and check the results.

Once validation is in place, you'd need to vary the response based on those results: "when validation fails it returns a 400-BAD-REQUEST"; "when validation fails it returns a non-empty-array errors with validation failure details", etc. All actual requirements you've been given; all need to be tested.

Then you'd move on to whatever other business logic is needed, step by step, until you get to a point where yer firing some values into storage or whatever, and you check the expected values for each field are passed to the right place in storage. Although I'd still use a mock (or spy, or whatever the precise term is), and just check what values it receives, rather than actually letting the test write to storage.

It also has end-to-end acceptance tests

At this point you can demonstrate the requirements have been tested, and you know they work. I'd then put an end-to-end happy path test on that (maybe all the way from automating the form submission with a virtual web client, maybe just by sending a POST request; either is valid). And then I'd do an end-to-end unhappy path test, eg: when validation fails are the correct messages put in the correct place on the form, or whatever. Maybe there are other valid variations of end-to-end tests here, but I would not think to have an end-to-end test for each form field, and each validation rule. That'd be fiddly to write, and slow to run.

It does need to cover all the behaviour

I fully subscribe to the "100% test coverage is a fools errand" ethos. Steady on there. There's 100% and there's 100%. This notion is applied to lines-of-code, or 100% of methods, or basically implementation detail stuff. And it's also usually trotted out by someone who's looking at the code after it's been done, and is faced with a whole pile of testing to write and trying to work out ways of wriggling out of it. This is no slight on you, Adam (Tuttle), it's just how I have experienced devs rationalise this with me. If one does TDD / BDD, then one is not thinking about lines of code when one is testing. One is thinking about behaviour. And the behaviour has been requested by a client, and the behaviour needs to work. So we test the behaviour. Whether that's 1 line of code or 100 is irrelevant. However the test will exercise the code, because the code only ever came into existence to address the case / behaviour being delivered. Using TDD generally results in ~100% of the code being covered because you don't write code you don't need, which is the only time code might not be covered. How did that code get in there? Why did you write it? Obviously it's not needed so get rid of it ;-).

The key here is that 100% of behaviour gets covered.

Nothing is absolute though. There will be situations where some code - for whatever reason - is just not testable. This is rare, but it happens. In that case: don't get hung up by it. Isolate it away by itself, and mark it as not covered (eg in PHPUNit we have @codeCoverageIgnore), and move on. But be circumspect when making this decision, and the situations that one can't test some code is very rare. I find devs quite often seem to confuse "can't" with "don't feel like ~". Two different things ;-)

I'll also draw you back to an article I wrote ages ago about the benefits of 100% test coverage: "Yeah, you do want 100% test coverage". TL;DR: where in these two displays can you spot then new code that is accidentally missing test coverage:

Accidents are easy to spot when a previously all-green board starts being not all-green.

It uses emergent design to solve large problems

[My] brain wants to architect a system that [long and complicated description follows]. One of the premises of TDD is that you let the solution architect itself. I'm not 100% behind this as I can't quite see it yet, but I know I do find it really daunting if my requirement seems to be "it all does everything I need it to do", and I don't know where to start with that. This was my real life experience doing that Vue.js stuff I linked to above. I really did start with "yikes this whole form thing is gonna be a monster!? I don't even know where to start!". I pushed the end result I thought I might have to the back of your mind, especially the architectural side of things (which will probably more define itself in the refactor stage of things, not the red / green part).

And I started by adding a route for the form, and then I responded to request to that route with a 200-OK. And then moved on to the next bite of the elephant.

HTH.

--
Adam (Cameron)

Thursday 8 April 2021

TDD and external services

G'day

You might have noticed I spend a bit of my time encouraging people to use TDD, or at the very least making sure yer code is tested somehow. But use TDD ;-)

As an interesting aside, I recently failed a technical interview because the interviewer didn't feel I was strong enough at the testing side of things. Given what I see around the industry… that seems to be a moderately high bar yer setting for yerselves there, peeps. Or perhaps I'm just shit at articulating myself. Hrm. But anyway.

OK, so I rattled out a quick article a few days ago - "TDD & professionalism: a brief follow-up to Thoughts on Working Code podcast's Testing episode" - which revisits some existing ground and by-and-large is not relevant to what I'm going to say here, other than the "TDD & professionalism" being why I bang on about it so much. And you might think I bang on about it here, but I also bang on about it at work (when I have work I mean), and in my background conversations too. I try to limit it to only my technical associates, that said.

Right so Mingo hit me up in a comment on that article, asking this question:

Something I ran into was needing to access the external API for the tests and I understand that one usually uses mocking for that, right? But, my question is then: how do you then **know** that you're actually calling the API correctly? Should I build the error handling they have in their API into my mocked up API as well (so I can test my handling of invalid inputs)? This feels like way too much work. I chose to just call the API and use a test account on there, which has it's own issues, because that test account could be setup differently than the multiple different live ones we have. I guess I should just verify my side of things, it's just that it's nice when it's testing everything together.

Yep, good question. With new code, my approach to the TDD is based on the public interface doing what's been asked of it. One can see me working through this process in my earlier article "Symfony & TDD: adding endpoints to provide data for front-end workshop / registration requirements". Here I'm building a web service end point - by definition the public interface to some code - and I am always hitting the controller (via the routing). And whatever I start testing, I just "fake it until I make it". My first test case here is "It needs to return a 200-OK status for GET requests on the /workshops endpoint", and the test is this:

/**
 * @testdox it needs to return a 200-OK status for successful GET requests
 * @covers \adamCameron\fullStackExercise\Controller\WorkshopsController
 */
public function testDoGetReturns200()
{
    $this->client->request('GET', '/workshops/');

    $this->assertEquals(Response::HTTP_OK, $this->client->getResponse()->getStatusCode());
}

To get this to pass, the first iteration of the implementation code is just this:

public function doGet() : JsonResponse
{
    return new JsonResponse(null);
}

The next case is "It returns a collection of workshop objects, as JSON", implemented thus:

/**
 * @testdox it returns a collection of workshop objects, as JSON
 * @covers \adamCameron\fullStackExercise\Controller\WorkshopsController
 */
public function testDoGetReturnsJson()
{
    $workshops = [
        new Workshop(1, 'Workshop 1'),
        new Workshop(2, 'Workshop 2')
    ];

    $this->client->request('GET', '/workshops/');

    $resultJson = $this->client->getResponse()->getContent();
    $result = json_decode($resultJson, false);

    $this->assertCount(count($workshops), $result);
    array_walk($result, function ($workshopValues, $i) use ($workshops) {
        $workshop = new Workshop($workshopValues->id, $workshopValues->name);
        $this->assertEquals($workshops[$i], $workshop);
    });
}

And the code to make it work shows I've pushed the mocking one level back into the application:

class WorkshopsController extends AbstractController
{

    private WorkshopCollection $workshops;

    public function __construct(WorkshopCollection $workshops)
    {
        $this->workshops = $workshops;
    }

    public function doGet() : JsonResponse
    {
        $this->workshops->loadAll();

        return new JsonResponse($this->workshops);
    }
}

class WorkshopCollection implements \JsonSerializable
{
    /** @var Workshop[] */
    private $workshops;

    public function loadAll()
    {
        $this->workshops = [
            new Workshop(1, 'Workshop 1'),
            new Workshop(2, 'Workshop 2')
        ];
    }

    public function jsonSerialize()
    {
        return $this->workshops;
    }
}

(I've skipped a step here… the first iteration could/should be to mock the data right there in the controller, and then refactor it into the model, but this isn't about refactoring, it's about mocking).

From here I refactor further, so that instead of having the data itself in loadAll, the WorkshopCollection calls a repository, and the repository calls a DAO, which for now ends up being:

class WorkshopsDAO
{
    public function selectAll() : array
    {
        return [
            ['id' => 1, 'name' => 'Workshop 1'],
            ['id' => 2, 'name' => 'Workshop 2']
        ];
    }
}

The next step is where Mingo's question comes in. The next refactor is to swap out the mocked data for a DB call. We'll end up with this:

class WorkshopsDAO
{
    private Connection $connection;

    public function __construct(Connection $connection)
    {
        $this->connection = $connection;
    }

    public function selectAll() : array
    {
        $sql = "
            SELECT
                id, name
            FROM
                workshops
            ORDER BY
                id ASC
        ";
        $statement = $this->connection->executeQuery($sql);

        return $statement->fetchAllAssociative();
    }
}

But wait. if we do that, our unit tests will be hitting the DB. Which we are not gonna do. We've run out of things to directly mock as we're at the lower-boundary of our application, and the connection object is "someon else's code" (Doctrine/DBAL in this case). We can't mock that, but fortunately this is why I have the DAO tier. It acts as the frontier between our app and the external service provider, and we still mock that:

public function testDoGetReturnsJson()
{
    $workshopDbValues = [
        ['id' => 1, 'name' => 'Workshop 1'],
        ['id' => 2, 'name' => 'Workshop 2']
    ];

    $this->mockWorkshopDaoInServiceContainer($workshopDbValues);

    // ... unchanged ...

    array_walk($result, function ($workshopValues, $i) use ($workshopDbValues) {
        $this->assertEquals($workshopDbValues[$i], $workshopValues);
    });
}

private function mockWorkshopDaoInServiceContainer($returnValue = []): void
{
    $mockedDao = $this->createMock(WorkshopsDAO::class);
    $mockedDao->method('selectAll')->willReturn($returnValue);

    $container = $this->client->getContainer();
    $workshopRepository = $container->get('test.WorkshopsRepository');

    $reflection = new \ReflectionClass($workshopRepository);
    $property = $reflection->getProperty('dao');
    $property->setAccessible(true);
    $property->setValue($workshopRepository, $mockedDao);
}

We just use a mocking library (baked into PHPUnit in this case) to create a runtime mock, and we put that into our repository.

The tests pass, the DB is left alone, and the code is "complete" so we can push it to production perhaps. But we are not - as Mingo observed - actually testing that what we are asking the DB to do is being done. Because all our tests mock the DB part of things out.

The solution is easy, but it's not done via a unit test. It's done via an integration test (or end-to-end test, or acceptance test or whatever you wanna call it), which hits the real endpoint which queries the real database, and gets the real data. Adjacent to that in the test we hit the DB directly to fetch the records we're expecting, and then we compare the JSON that the end point returns represents the same data we manually fetched from the DB. This tests the SQL statement in the DAO, that the data fetched models OK in the repo, and that the model (WorkshopCollection here) applies whatever business logic is necessary to the data from the repo before passing it back to the controller to return with the response, which was requested via the external URL. IE: it tests end-to-end.

public function testDoGetExternally()
{
    $client = new Client([
        'base_uri' => 'http://fullstackexercise.backend/'
    ]);

    $response = $client->get('workshops/');
    $this->assertEquals(Response::HTTP_OK, $response->getStatusCode());
    $workshops = json_decode($response->getBody(), false);

    /** @var Connection */
    $connection = static::$container->get('database_connection');
    $expectedRecords = $connection->query("SELECT id, name FROM workshops ORDER BY id ASC")->fetchAll();

    $this->assertCount(count($expectedRecords), $workshops);
    array_walk($expectedRecords, function ($record, $i) use ($workshops) {
        $this->assertEquals($record['id'], $workshops[$i]->id);
        $this->assertSame($record['name'], $workshops[$i]->name);
    });
}

Note that despite I'm saying "it's not a unit test, it's an integration test", I'm still implementing it via PHPUnit. The testing framework should just provide testing functionality: it should not dictate what kind of testing you implement with it. And similarly not all tests written with PHPUnit are unit tests. They are xUnit style tests, eg: in a class called SomethingTest, and the the methods are prefixed with test and use assertion methods to implement the test constraints.

Also: why don't I just use end-to-end tests then? They seem more reliable? Yep they are. However they are also more fiddly to write as they have more set-up / tear-down overhead, so they take longer to write. Also they generally take longer to run, and given TDD is supposed to be a very quick cadence of test / run / code / run / refactor / run, the less overhead the better. The slower your tests are, the more likely you are to switch to writing code and testing later once you need to clear your head. In the mean time your code design has gone out the window. Also unit tests are more focused - addressing only a small part of the codebase overall - and that has merit in itself. Aso I used a really really trivial example here, but some end-to-end tests are really very tricky to write, given the overall complexity of the functionality being tested. I've been in the lucky place that at my last gig we had a dedicated QA development team, and they wrote the end-to-end tests for us, but this also meant that those tests were executed after the dev considered the tasks "code complete", and QA ran the tests to verify this. There is no definitive way of doing this stuff, that said.

To round this out, I'm gonna digress into another chat I had with Mingo yesterday:

Normally I'd say this:

Unit tests
Test logic of one small part of the code (maybe a public method in one class). If you were doing TDD and about to add a condition into your logic, you'd write a until test to cover the new expectations that the condition brings to the mix.

Functional tests
These are a subset of unit tests which might test a broader section of the application, eg from the public frontier of the application (so like an endpoint) down to where the code leaves the system (to a logger, or a DB, or whatever). The difference between unit tests and functional tests - to me - are just how distributed the logic being tests is throughout the system.

Integration tests
Test that the external connections all work fine. So if you use the app's DB configuration, the correct database is usable. I'd personally consider a test an integration test if it only focused on a single integration.

Acceptance tests(or end-to-end tests)
Are to integration tests what functional tests are to unit tests: a broader subset. That test above is an end-to-end test, it tests the web server, the application and the DB.

And yes I know the usages of these terms vary a bit.

Furthermore, considering the distinction between BDD and TDD:

The BDD part is the nicely-worded case labels, which in theory (but seldom in practise, I find) are written in direct collaboration with the client user.

The TDD part is when in the design-phase they are created: with TDD it's before the implementation is written; I am not sure whether in BDD it matters or is stipulated.

But both of them are design / development strategies, not testing strategies.

The tests can be implemented as any sort of test, not specifically unit tests or functional tests or end-to-end tests. The point is the test defines the design of the piece of code being written: it codifies the expectations of the behaviour of the code.

BDD and TDD tests are generally implemented via some unit testing framework, be it xUnit (testMyMethodDoesSomethingRight), or Jasmine-esque (it("does something right", function (){}).

One can also do testing that is not TDD or BDD, but it's a less than ideal way of going about things, and I would image result in subpar tests, fragmented test coverage, and tests that don't really help understand the application, so are harder to maintain in a meaningful way. But they are still better than no tests at all.

When I am designing my code, I use TDD, and I consider my test cases in a BDD-ish fashion (except I do it on the client's behalf generally, and sadly), and I use PHPUnit (xUnit) to do so on PHP, and Mocha (Jasime-esque) to do so on Javascript.

Hopefully that clarifies some things for people. Or people will leap at me and tell me where I'm wrong, and I can learn the error in my ways.

Righto.

--
Adam

Tuesday 6 April 2021

TDD is not a testing strategy

TDD. Is. Not. A. Testing. Strategy.

Just a passing thought. Apropros to absolutely nothing. 'Onest guv.(*)

Dunno if it occurred to you, but that TDD thing? It's not a testing strategy. It's a design strategy.

Let's look at the name. In the name test-driven is a compound adjective: it just modifies the subject. The subject is development. It's about development. It's not about testing.

It's probably better described by BDD, although to me that's a documentation strategy, rather than a development (or testing) one. BDD is about arriving at the test cases (with the client), TDD is about implementing those cases.

The purpose of TDD is to define a process that leads you - as a developer - to pay more attention to the design of your code. It achieves this by forcing you to address the requirement as a set of needs (or cases), eg "it needs to return the product of the two operands". Then you demonstrate your understanding of the case by demonstrating what it is for the case to "work" (a test that when you pass 2 and 3 to the function it returns 6), and then you implement the code to address that case. Then you refine the case, refactor the implementation so it's "nicer", or move on to the next case, and cycle through that one. Rinse and repeat.

But all along the object of the exercise is to think about what needs to be done, break it into small pieces, and code just what's needed to implement the customer need. It also provides a firm foundation to be able to safely refactor the code once it's working. You know: the bit that you do to make your code actually good; rather than just settling for "doesn't break", which is a very low bar to set yourself.

That you end up with repeatable tests along the way is a by-product of TDD. Not the reason you're doing it. Although obviously it's key to offering that stability and confidence for the refactor phase.

Too many people I interact with when they're explaining why it's OK they don't do TDD [because reasons], fall back to the validity / longevity of the tests. It's… not about the tests. It's about how you design your solutions.

Lines of code are not a measure of productivity

Tangential to this, we all know that LOC are not a measure of productivity. There's not a uniform relationship between one line of code and another adjacent line of code. Or ten lines of code in one logic block that represent the implementation of a method are likely to represent less productivity burden than a single line of code nested 14-levels deep in some flow-control-logic monstrousity. We all know this. Not all lines of code are created equal. More is definitely not better. But fewer is also not intrinsically better. It's just an invalid metric.

So why is it that so many people are prepared to count the lines of code a test adds to the codebase as a rationalisation (note: not a justification, because it's invalid) as to why they don't have time to write that test? Or that the test represents an undue burden in the codebase. Why are they back to measuring productivity with LOC? Why won't they let it occur to them that - especially when employing TDD - the investment in the LOC for the test code reduces the investment in the LOC for the production code? And note I am not meaning this as a benefit that one only realises over time having amortised it over a long code lifespan. I mean on the first iteration of code<->test<->release, because the bouncing back and forth between each step there will be reduced. Even for code which this might (although probably won't) be the only iteration the production code sees.

It's just "measure twice, cut once" for code. Of course one doesn't have the limitation in code that one can only cut once; the realisation here needs to be that "measuring" takes really a lot less time than "cutting" when it comes to code.

In closing, if you are rationalising to me (or, in reality: to yourself) why you don't do TDD, and that rationalisation involves lines of code or how often code will be revisited, then you are not talking about TDD. You are pretty much just setting up a strawman to tilt at to make yourself feel better. Hopefully that tactic will seem a little more transparent to you now.

Design your code. Measure your code.

Righto.

--
Adam

(*) that's a lie. It's obviously a retaliation to a coupla comments I've been getting on recent articles I've written about TDD.

Sunday 4 April 2021

TDD & professionalism: a brief follow-up to Thoughts on Working Code podcast's Testing episode

G'day:

Yer gonna need to go back and read the comments on Thoughts on Working Code podcast's Testing episode for context here. Especially as I quote a couple of them. I kinda left the comments run themselves there a bit and didn't reply to everyone as I didn't want to dominate the conversation. But one earlier comment that made me itchy, and now one comment that came in in the last week or so, have made me decide to - briefly - follow-up one point that I think warrants drawing attention to and building on.

Briefly, Working Code Pod did an episode on testing, and I got all surly about some of the things that were said, and wrote them up in the article I link to above. BTW Ben's reaction to my feedback in their follow-up episode ("Listener Questions #1) was the source of my current strapline quote: "the tone... it sounds very heated and abrasive". That should frame things nicely.

Right so in the comments to that previous article, we have these discussion fragments:

Sean Corfield - Heck, there's still a sizable portion that still doesn't use version control or has some whacked-out manual approach to "source code control".
Ben replied to that with - Yikes, I have trouble believing that there are developers _anywhere_ that don't use source-control in this day-and-age. That seems like "table stakes" for development. Period.
[off-screen, Adam bites his tongue]
Then Sean Hogge replied to the article rather than that particular comment thread. I'm gonna include a chunk of what he said here:

18 months ago, I was 100% Ben-shaped. Testing simply held little ROI. I have a dev server that's a perfect replica of production, with SSL and everything. I can just log in, open the dashboard, delete the cache and check things with a few clicks.

But as I started developing features that interact with other features, or that use the same API call in different ways, or present the same data with a partial or module with options, I started seeing an increase in production errors. I could feel myself scrambling more and more. When I stepped back and assessed objectively, tests were the only efficient answer.

After about 3 weeks of annoying, frustrating, angry work learning to write tests, every word of Adam C's blog post resonates with me. I am not good at it (yet), I am not fast at it (yet), but it is paying off exactly as he and those he references promised it would.

I recommend reading his entire comment, because it's bloody good.
Finally last week I listened to a YouTube video "Jim Coplien and Bob Martin Debate TDD", from which I extracted these two quotes from Martin that drew me back to this discussion:
- (@ 43sec) My thesis is that it has become infeasible […] for a software developer to consider himself professional if [(s)he] does not practice test-driven development.
- (@ 14min 42sec) Nowadays it is […] irresponsible for a developer to ship a line of code that [(s)he] has not executed any unit test [upon].. It's important to note that "nowadays" being 2012 in this case: that's when the video was from.
And, yes, OK the two quotes say much the same thing. I just wanted to emphasise the words "professional" and "irresponsible".

This I think is what Ben is missing. He shows incredulity that someone in 2021 would not use source control. People's reaction is going to be the same to his suggestion he doesn't put much focus on test-automatic, or practise TDD as a matter of course when he's designing his code. And Sean (Hogge) nails it for me.

(And not just Ben. I'm not ragging on him here, he's just the one providing the quote for me to start from).

TDD is not something to be framed in a context alongside other peripheral things one might pick up like this week's kewl JS framework, or Docker or some other piece of utility one might optionally use when solving a client's problem. It's lower level than that, so it's false equivalence to bracket it like that conceptually. Especially as a rationalisation for not addressing your shortcomings in this area.

Testing yer code and using TDD is as fundamental to your professional responsibilities as using source control. That's how one ought to contextualise this. Now I know plenty of people who I'd consider professional and strong pillars of my dev community who aren't as good as they could be with testing/TDD. So I think Martin's first quote is a bit strong. However I think his second quote nails it. If you're not doing TDD you are eroding your professionalism, and you are being professionalbly irresponsible by not addressing this.

In closing: thanks to everyone for putting the effort you did into commenting on that previous article. I really appreciate the conversation even if I didn't say thanks etc to everyone participating.

Righto.

--
Adam

About

I've been a web developer since 2001: 13yrs as a CFML developer; 6yrs as a PHP dev; and now back in the CFML community but focusing on code quality, design and testing. As of 2023, I am migrating my dev focus back to PHP again.

The code I write and discuss here is pretty much just looking at random conundrums I encounter in my day job.

I tend to be a bit "forthright" in my opinions, I am indelicate, and I tend to swear too much. This will come out occasionally here: I make no apology for it.

Everything said here is my own opinion. Feel free to disagree with me :-)