Tuesday, 7 October 2014

The received wisdom of TDD and private methods

G'day:
As you might know, I've recently taken on a different role as a PHP developer. My employer are shifting our code base from CFML to PHP for various reasons ("So long, and thanks for all the CF"). One facet of this is we're moving off a venerable, well-established CFML code base to a new code base in the infancy of its existence (some work on it had been done before we picked up the project). In a different language.

And we've got from having about 4000 unit tests to zero.



A coupla of us are having a quick exploratory look at PHPUnit to see what it can do, and whether there's any "gotchas" we need to be aware of when testing in PHP. So far the findings seem to be that unit testing in CFML via MXUnit and TestBox - especially in conjunction with MockBox - is a bit easier than the hoops we need to jump through in PHP to achieve the same ends. This is mostly down to CFML's objects being far more dynamic than PHP's, so injecting testing helper methods and exposing non-public methods for testing is much easier.

The challenges that PHP has thrown at us has caused us to revisit our TDD and unit testing strategy. Not necessarily to revise it, but to revisit it and see if it does need revision.

Our existing policy is that all code is written via a TDD & "Clean Code" approach:
  1. the need for a piece of functionality is identified;
  2. a failing test is written to test a facet of the functionality;
  3. code is written to pass the test;
  4. repeat from 2 until the functionality is complete;
  5. refactor if necessary.

This is applied both to new functionality as well as maintenance of existing functionality. The TDD side of things drives the code design, and also demonstrates that downstream changes don't have adverse effects on earlier requirements. The usual sort of thing.

On new work, this will mean creating a new public method (and the class it needs to go in, as required), and implementing the requirement. So all the tests are on that public method. When we refactor the code, those tests all still work, which is fine.

As a result of the refactoring we might end up with some of the code moving out into public methods of other classes, or - as often - private methods of the same class.

The newly refactored public methods go through the same TDD approach as the initial one, although this is quite often just a re-homing of the tests which were previously testing the unfactored functionality from the original public method. And the original public method's tests are selectively updated to remove any tests which are now the domain of the new methods, and mocks are used in lieu of calling these factored-out methods in the remaining tests of the original function.

And traditionally we have done exactly the same thing with the refactoring that was simply moved into private methods.

Perhaps I can demonstrate this with some pseudocode. At the end of our first TDD round, and before refactoring, we have this:



class SomeClass

    public method someMethod
        (1) some logic to derive first part of result
        (1) some logic to derive first part of result
        (1) some logic to derive first part of result
        
        (2) depending on something
            (2) we take this approach
            (2) we take this approach
            (2) we take this approach
        (3) else
            (3) or we take this other approach
            (3) or we take this other approach
            (3) or we take this other approach
        /depending
        
        (4) a number of times
            (4) do some functionality better homed elsewhere
            (4) do some functionality better homed elsewhere
            (4) do some functionality better homed elsewhere
        /times
        
        (5) massaging the remaining stuff
        (5) massaging the remaining stuff
        (5) massaging the remaining stuff
        
        (1) and return it
    /method

/class

class SomeClassTest
    test (1) (later updated to accommodate 5)

    test (2)

    test (3)

    test (4) (0 times)

    test (4) (1 time)

    test (4) (more times)
/class

Here the logic has been built sequentially as indicated with the numeric indexes, with obviously the test being written first. I was gonna display the code that way too, but it didn't make much sense.

After refactoring we might have this:

class SomeClass

    method someMethod
        (1) some logic to derive first part of result
        (1) some logic to derive first part of result
        (1) some logic to derive first part of result
        
        (2) depending on something
            (2)(6) call helperA()
        (3) else
            (3)(7) call helperB()
        /depending
        
        (8) call FirstNewClass.publicMethod
        
        (9) call SecondNewClass.publicMethod
        
        (1) and return it
    /method
    
    private method helperA
        (2)(6) we take this approach
        (2)(6) we take this approach
        (2)(6) we take this approach
    /method

    private method helperB
        (3)(7) or we take this other approach
        (3)(7) or we take this other approach
        (3)(7) or we take this other approach
    /method
    
/class


class FirstNewClass
    public method publicMethod
        (4)(8) a number of times
            (4)(8) do some functionality now homed elsewhere
            (4)(8) do some functionality now homed elsewhere
            (4)(8) do some functionality now homed elsewhere
        /times
    /method
/class


class SecondNewClass
    public method publicMethod
        (5)(9) massaging the remaining stuff
        (5)(9) massaging the remaining stuff
        (5)(9) massaging the remaining stuff
    /method
/class


class SomeClassTest
    test (1)

    test (2)

    test (3)
/class

class FirstNewClassTest
    test (4)(8) (0 times)

    test (4)(8) (1 time)

    test (4)(8) (more times)
/class

class SecondNewClassTest
    test (9)
/class

This is all basically just moving code around the place. Note that we do not have specific tests for helperA() and helperB() because the existing tests for (2) and (3) still cover them.

I don't think anyone would find anything here too controversial, and this is all fine.

However, and this is where there's a divergence in opinions, and I think most opinions here are wrong, and based on convenient use of bad logic.

TDD is a developmental approach right? One designs one's logic via deciding what it needs to do, and "proving" it does it via tests. With TDD, I think it's the dev process that is the important thing, and that we get unit test coverage as a result is a secondary side-effect. A good one, but a side-effect nevertheless. TDD is not a way of writing tests, it's a way of designing our code, basically.

OK, so the life of this software continues. And requirements change, and as a result - continuing this example - the realisation of that requirement is thus:

        (2)(6) we take this approach
        (2)(6) we take this approach
        (2)(6) we take this approach
        (10) but depending on something
                (10) we also now have to do this
        /but

And where's this logic going? into helperA(), which is a private method.

Following TDD - which is an approach to development, and "public" or "private" is neither here nor there - we test for "(10) but depending on something" before we implement it:

class SomeClassTest
    test (1)

    test (2)

    test (3)

    test (10) - happy

    test (10) - unhappy
/class


But the dogmatic position is "don't test private methods". The problem here is that this dogma is a unit testing dogma, not a TDD dogma.

The general argument I see on the net when the subject of testing private methods when using TDD is that people conveniently ignore the "when using TDD", and just parrot back "you don't unit test private methods, you unit test the public interface". This is a logical non sequitur.

So, in actual fact, the position on TDD vis-a-vis private methods should be along these lines:
  1. Do not refactor into private methods unless the public method has test coverage for the side effects of the code targetted for refactoring (this is a unit testing consideration).
  2. Happily refactor the code into private methods, relying on existing test coverage (again, a unit testing consideration).
  3. Existing code that needs revision should have test coverage for the revisions before the revisions are made (this is a TDD consideration, and where the code happens to be located is immaterial).

My problem is that there there's a wealth of discussion on this out there (Google: "tdd private methods"), and the general message is contrary to this (but, as I suggested, backed-up with faulty logic). If this was a one-on-one situation I'd be confident of my understanding of things. But as I kinda feel like Cnut vs The Tide, I wonder if I'm missing something here?

What are your thoughts?

--
Adam