G'day:
So this really sux.
For the bulk of the last decade I've been working alongside Brian Sadler. We first worked with each other when he joined hostelbookers.com when I'd been there for a coupla years, back in fuck-knows-when. I can recall thinking "shit... someone as old as I am, this should be interesting", in a room of devs (and managers) who were between 5-15 years younger than the both of us. Not that chronological experience necessarily means anything in any way that talent can be measured, but I had been used to being the oldest and most experienced geezer in the teams I'd been involved in. I've always been old, basically. So this new "Brian" guy seemed an interesting colleague to acquire.
My instinct was right.
[I've typed-in a paragraph of over-written shite four times now, and deleted the lot. I need to get to the point]
I have been a reasonably good programmer, technically. I'm OK with saying that.
What I've learned from Brian is that being a good developer is only a small part of being a good team operative. I've always known and respected that programming is an act of communicating with humans, not the computer, but Brian really drove this home to me.
He introduced me to the concept of Clean Code.
He introduced me to the concept of TDD.
He schooled me in various concepts of refactoring, in that latter stage of Red Green Refactor. I'm still reading Martin Fowler's "Refactoring" as a result.
Every time I am looking at a code problem and I know I am not quite getting it, I hit him up and say "right, come on then... what am I missing...?" and he'll pull out a design pattern, or some OOP concept I'd forgotten about, or just has the ability to "see" what I can't in a code conundrum, and explain it to me.
Every time I ask "how can we make this code easier for our team to work with in future?", Brian's had the answer.
For a few years Brian has been our Agile advocate, and listening to him and following his guidance has been an immeasurable benefit to my capabilities and my work focus.
I've also seen Brian help everyone else in my immediate team; other teams; and our leaders.
He's probably the most significant force for good I have had in my career.
He's also a really fuckin' good mate, for a lot of reasons that are not relevant for this sort of environment.
For reasons that are also irrelevant to this blog, today was the last day for the time being that Brian and I will be working with each other, and I am genuinely sad about that.
Genuinely sad.
Thanks bro.
--
Adam
Saturday, 9 November 2019
I need to backtrack on some TDD guidance. I think
G'day:
Yes, this thing still exists.
Possibly foolishly my employer has made me tech lead of one of our software applications, and part of that is to lead from the front when it comes to maintaining code quality, including our TDD efforts. So I tend to pester the other devs about TDD techniques and the like. O how they love that. Some of them. Occasionally. Actually seldom. Poor bastards.
Anyway, recently I started to think about the difference between "fixing tests that would break if we made this new change to the code" and "writing a TDD-oriented test case before making the code change". Just to back up a bit... this is in the context of "write a failing test for the case being tested... then write the code to pass the test". The red and green from the "red green refactor" trope. This started from doing code reviews and seeing existing tests being updated to accommodate new code changes, but not seeing any actual new test case explicitly testing the new requirement. This did not sit well with me, so I've been asking the devs "where's yer actual test case for the change we're making here?"
To help with this notion I wrote a quick article on WTF I was on about. I was initially pleased with it and thought I'd nailed it, but when I asked my more-knowledgeable colleague Brian Sadler to sanity check it, he wasn't convinced I had nailed it, and moreover wasn't sure I was even right. Doh. I've been mulling over this for the rest of the week and I've come to the conclusion... yeah, he's right and I'm wrong. I think. I'm writing this article to put my thinking down in public, and see what you lot think. If anyone still reads this I mean. I will be chasing some ppl to read it, that said...
I can't just reproduce the original article cos it was an internal document and work owns it. So I'm reproducing a similar thing here.
Here's some sample code. I stress the code used here bears no relation to any code or situation we have in our codebase. It's just a simplified case to contextualise the discussion:
class ThingRepository {

    public function getThing($id) {
        $dao = DaoFactory::getThingDao();
        $dbData = $dao->getThing($id);

        return Thing::createFromArray($dbData);
    }
}
It gets raw data for the requested thing from a DAO and returns the data as a Thing object.
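(As an aside: the Thing class itself never features here. A minimal sketch of what it might look like, inferred from how it's used - the constructor signature and array keys are my assumptions, not from the real code:)

class Thing {

    public $id;
    public $name;

    public function __construct($id, $name) {
        $this->id = $id;
        $this->name = $name;
    }

    // hypothetical factory used by the repository
    public static function createFromArray($data) {
        return new Thing($data['id'], $data['name']);
    }
}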
It also has an existing test:
class ThingRepositoryTest extends TestCase {

    private $repository;

    protected function setUp() : void {
        $this->repository = new ThingRepository();
    }

    public function testGetThingReturnsCorrectThingForId(){
        $testId = 17;
        $expectedName = "Thing 1";

        $thing = $this->repository->getThing(17);

        $this->assertInstanceOf(Thing::class, $thing);
        $this->assertSame($testId, $thing->id);
        $this->assertSame($expectedName, $thing->name);
    }
}
And this all works fine:
C:\src\php\general>vendor\bin\phpunit test\tdd\fixingExistingTests\ThingRepositoryTest.php
PHPUnit 8.4.3 by Sebastian Bergmann and contributors.
. 1 / 1 (100%)
Time: 54 ms, Memory: 4.00 MB
OK (1 test, 3 assertions)
C:\src\php\general>
If you were paying attention, you'll've spotted we're tightly coupling the repository to its DAO implementation here. At least we're using a factory, but the repository is still tightly coupled to that implementation. We want to use dependency injection to provide the DAO to the repository.
Here's where my stance gets contentious.
Someone has addressed this requirement, and what they've done is updated that existing test to expect the passed-in DAO:
class ThingRepositoryExistingTestUpdatedTest extends TestCase {

    private $repository;
    private $dao;

    protected function setUp() : void {
        $this->dao = $this->createMock(ThingDao::class);
        $this->repository = new ThingRepositoryWithDaoDependency($this->dao);
    }

    public function testGetThingReturnsCorrectThingForId(){
        // implementation that now uses mocked DAO
    }
}
Note that this is exactly how I presented this information in the original work article. With the test implementation changes elided. This becomes relevant later.
Having prepped that test - and it fails nicely, cool - the dev then goes and sorts out the repo class.
I went on to reject that approach as not being valid TDD. My position is that "fixing other tests so they won't fail after we make the change" is not TDD, it's just technical debt recovery. If we're doing TDD, then we need to start with an explicit test case for the change we're going to make. I'll put this in a highlight box to draw attention to it:
Fixing other tests so they won't fail after we make the change is not TDD, it's just technical debt recovery. If we're doing TDD, then we need to start with an explicit test case for the change we're going to make.
This is what I'd expect to see:
class ThingRepositoryTest extends TestCase {

    private $repository;
    private $dao;

    protected function setUp() : void {
        $this->dao = $this->createMock(ThingDao::class);
        $this->repository = new ThingRepositoryWithDaoDependency($this->dao);
    }

    public function testGetThingUsesDaoPassedIntoConstructor(){
        $testId = 17;
        $mockedThingData = ['id' => $testId, 'thing' => 'stuff'];

        $this->dao
            ->expects($this->once())
            ->method('getThing')
            ->with($testId)
            ->willReturn($mockedThingData);

        $result = $this->repository->getThing($testId);

        $this->assertEquals(
            new Thing($mockedThingData['id'], $mockedThingData['thing']),
            $result
        );
    }

    // ... rest of test cases that were already there, including this one...

    public function testGetThingReturnsCorrectThingForID(){
        // updated implementation that doesn't break because of the DAO change
    }
}
Here we have an explicit test case for the change we're making. This is TDD. We're explicitly injecting a DAO into the test repo, and we're checking that DAO is being used to get the data.
This test is red:
C:\src\php\general>vendor\bin\phpunit test\tdd\fixingExistingTests\ThingRepositoryTest.php
PHPUnit 8.4.3 by Sebastian Bergmann and contributors.
F 1 / 1 (100%)
Time: 61 ms, Memory: 4.00 MB
There was 1 failure:
1) me\adamcameron\general\test\tdd\fixingExistingTests\ThingRepositoryTest::testGetThingUsesDaoPassedIntoConstructor
Expectation failed for method name is "getThing" when invoked 1 time(s).
Method was expected to be called 1 times, actually called 0 times.
FAILURES!
Tests: 1, Assertions: 4, Failures: 1.
C:\src\php\general>
But that means we're now good to make the changes to the repository:
class ThingRepository {

    private $dao;

    public function __construct($dao) {
        $this->dao = $dao;
    }

    public function getThing($id){
        $dbData = $this->dao->getThing($id);

        return Thing::createFromArray($dbData);
    }
}
Now our test is green, and we're done.
I rounded out the article with this:
As a rule of thumb... if you don't start your TDD work by typing in a case name - eg testGetThingUsesDaoPassedIntoConstructor - then you're not doing TDD, you are just doing maintenance code. These are... not the same thing.
Brian read the article and - my words not his - didn't get the distinction I was trying to make. Why was the first test update not sufficient? I could not articulate myself any way other than to repeat what I'd already said - no, not helpful - so we decided to get together later and look at how the implementation of the test I had elided would pan out, and how that wouldn't itself be a decent test. Due to work being busy this week we never had that confab, but I continued to think about it, and this morning concluded that Brian was right, and I was mistaken.
It comes down to how I omitted the implementation of that first test we updated. I did not do this on purpose, I just didn't want to show an implementation that was not helpful to the article. But in doing that, I never thought about what the implementation would be, and set up a straw man to support my argument. I went back this morning and implemented it...
public function testGetThingReturnsCorrectThingForId(){
    $testId = 17;
    $expectedName = "Thing 1";

    $this->dao->expects($this->once())
        ->method('getThing')
        ->with($testId)
        ->willReturn(['id' => $testId, 'name' => $expectedName]);

    $thing = $this->repository->getThing(17);

    $this->assertInstanceOf(Thing::class, $thing);
    $this->assertSame($testId, $thing->id);
    $this->assertSame($expectedName, $thing->name);
}
This is what I'd need to do to that first test to get it to not break when we change the DAO to being injected. And... erm... it's the same frickin' logic as in the test I said we need to explicitly make. I mean I wrote the two versions at different times so solved the same problem slightly differently, but it's the same thing for all intents and purposes.
So... let's revisit my original position then:
Fixing other tests so they won't fail after we make the change is not TDD, it's just technical debt recovery. If we're doing TDD, then we need to start with an explicit test case for the change we're going to make.
It's not intrinsically true. Actually fixing existing tests can indeed be precisely the TDD-oriented cover we need.
I love being wrong. No sarcasm there: I really do. Now I'm thinking better about this topic. Win.
The only thing left after the distillation is the difference between these two:
Original test name (detailing its original test case requirement):
testGetThingReturnsCorrectThingForId
And my suggested case name for the new injected-DAO requirement:
testGetThingUsesDaoPassedIntoConstructor
Now I do still think there's merit in being that explicit about what we're testing. But it's well into the territory of "perfect is the enemy of good". We're fine just updating that original test.
I thought of how to deal with this, and came up with this lunacy:
public function testGetThingUsesDaoPassedIntoConstructor()
{
$this->testGetThingReturnsCorrectThingForId();
}
But I think that's too pedantic even for me. I think we can generally just take it as given that the existing test covers us.
I'll close by saying that I do think it's good advice, when starting the testing for a code change, to first come up with the test case name. It gets one into the correct mindset for approaching things in a TDD manner. But one oughtn't be as dogmatic about this as I clearly tend to be... existing tests quite possibly could provide the TDD-oriented cover we need for code changes. We do still need to start with a breaking test, and the test does actually need to cover the case for the change... but it might be an existing test.
Oh and making the code change then fixing the tests that break as a result is not TDD. That's still wrong ;-)
I'm dead keen to hear what anyone might have to say about all this...
Righto.
--
Adam
Labels:
Brian Sadler,
TDD,
Unit Testing
Friday, 14 April 2017
I actively consider what I think ought to go into a controller
G'day:
This is one of these ones where I state the obvious (probably), and expose myself as a fraud in that it's taken me this long to reach this conclusion. I can only take solace in that I'm clearly in "good" company. Where "good" == "a lot".
Yesterday I was doing a code review, and came across a controller method which made one of my eyebrows raise as it was about 40 lines long. Now this isn't intrinsically bad in context. I have seen (and still see!) controller methods that are hundreds and hundreds of lines long. In the past I've seen controllers tipping the 1000-line mark. Those examples are clearly just wrong. And the developers who wrote them and/or continue to maintain them should be shot. I'm not even sure I'm over-stating it with that sentence. Hmmm... I probably am I guess. But they should be squarely and actively admonished, anyhow.
So in the scheme of things, 40 lines ain't bad. Especially when we try to keep our code wrapped within 80-100 chars, so a few of the statements were multi-line. There were about, say, a dozen statements in it altogether. Not so bad.
Before I go on - and not only because the dev concerned will be reading this article (once I send them the link... pretty sure they don't know this blog exists) - I hasten to add that this code is not by any means bad. I'm probably gonna go "close enough" on the code review, once one bit of the code is refactored at least into a private method. I am not having a go at the dev concerned here. They are just my muse for the day. And I happen to think they're pretty good at their job.
There was one bit of the controller that stood out to me and I went "nuh-uh, that's gotta go". There was a foreach loop just before the response was returned which performed a calculation. That's business logic and it has no place in a controller. Fortunately we have an appropriate service class which we can lift that out and re-home in, and slap some tests on it, and our code is a bit better and a bit more maintainable.
(NB: yeah... no tests... we have a policy of not testing controller methods, but that's with the caveat there's no logic in them. We followed the "don't test controller methods" bit, but not the "provided there's no logic in them" bit here).
However once I spot a problem, I immediately start looking for more, so I assessed the rest of the code there. The controller did seem too fat. I wanted to know if all the logic did actually belong in the controller, and my instinct was: probably not.
At the same time as writing this article I'm chatting on Slack about it, and me mate Mingo offered the perennial answer to the question "should this go in the controller?"
Isn't the answer always "No"?
~ mjhagen, 2017
(yes, he cited the attribution too, so I'm including that ;-)
It's pleasingly pithy, and almost ubiquitously agreed with; but in a casually dismissive way that is seldom heeded, I think. The problem is it's alluding to a mindset one should adopt as a guideline, but clearly the answer isn't "no", so I think people dismiss it. If one says "one should never do x", and it's clear that "never" is slight hyperbole, people will acknowledge the hyperbole and stretch it back the other way as far as they see fit to accommodate their current whim.
With this in mind (retroactively... Mingo just mentioned that now, and the code review was yesterday, and this article was drafted in my head last night), I stopped to think about a rule of thumb as to when something ought to go in a controller.
Broadly speaking I feel there are only three types of logic that belong in a controller:
- a call to validate / sanitise all incoming arguments so they're fit for purpose for fulfilling the request;
- one (preferably) or more (and increasingly less preferably the more there are) calls to get data - using those incoming arguments - to build the response;
- a call to assemble that data for the response;
- returning the response.
Yeah that's 4. I was not counting the return statement. An example of this might be:
class UserController {

    function handleGet(rawArgs){
        args = validationService.validate(rawArgs)

        user = userService.get(args.id)
        publishedContent = contentService.getForId(args.id, args.startDate, args.endDate)
        twitterContent = twitterService.getForId(args.id, args.startDate, args.endDate)
        facebookContent = facebookService.getForId(args.id, args.startDate, args.endDate)

        response = {
            user = user,
            publishedContent = publishedContent,
            socialContent = {
                twitter = twitterContent,
                facebook = facebookContent
            }
        }

        return new Response(response)
    }
}
I think that's OK, even though there are four service calls. All the logic is busying itself with what it is to fulfil that request. The controller simply dictates what data needs returning, and gets that data. There is no business logic in there. It's all controller logic.
A pseudo-code example where we're adding business logic into this might be:
// ...
publishedContent = contentService.getForId(args.id, args.startDate, args.endDate)
topFiveArticles = publishedContent.slice(0, 5)
// ...
response = {
    // ...
    articles = topFiveArticles,
    // ...
}
Here the slice call - even though it's a native array method - is still business logic. If the controller method was getTopFiveArticles or something, I'd go back to considering it controller logic. So it's a fine line.
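For example, one way of keeping that line out of the controller is to have the service own the "top N" decision - a sketch in the same pseudo-code, where getTopArticlesForId is a method name I've made up for illustration:

// in the controller: just ask for exactly what the response needs
topFiveArticles = contentService.getTopArticlesForId(args.id, 5)

// in ContentService: deciding what "top articles" means is business logic,
// so it lives here now
function getTopArticlesForId(id, count){
    publishedContent = this.getForId(id)
    return publishedContent.slice(0, count)
}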
Back to the code I was reviewing: it was doing 1 & 2, then it was going off-piste with that foreach loop, then doing 3 & 4. Ignoring the foreach, I was still itchy, but wasn't seeing why. It seemed like it was roughly analogous to my first example above.
So what's the problem? Why was the code still making me furrow my brow?
Well there was just too much code. That set off a warning bell. I wanted to reduce the number of calls to various services (or repositories as is the case here: data from slightly different sources), but initially I couldn't really justify it. Then I looked closer.
The code in question was more like this (I've got rid of the date args to focus more):
class ContentController {

    function handleGet(rawArgs){
        args = validationService.validate(rawArgs)

        user = userService.get(args.id)

        publishedContent = contentService.getForId(args.id)
        twitterContent = twitterService.getForId(user.twitterId)
        facebookContent = facebookService.getForId(user.facebookId)

        response = {
            articles = publishedContent,
            socialContent = {
                twitter = twitterContent,
                facebook = facebookContent
            }
        }

        return new Response(response)
    }
}
I hasten to add the code I was reviewing is nothing to do with users and content and social media stuff, this is just an example which is analogous but would make more sense to people not in the context of our business.
The difference is subtle. The difference is that the decision that the IDs for the social content are stored in the user object is business logic. It's not controller logic to "know" that's how that data is stored. The user object here is not part of the data that composes the response; it's part of some business logic that dictates how to get the data.
And this does indeed break that second rule:
- a call to validate / sanitise all incoming arguments so they're fit for purpose for fulfilling the request;
- [...] calls to get data - using those incoming arguments - to build the response;
The call to get the social media content is not using those arguments. It's using a value from a separate call:
user = userService.get(args.id)
// ...
twitterContent = twitterService.getForId(user.twitterId)
facebookContent = facebookService.getForId(user.facebookId)
Another tell-tale here is that user is not even part of the response. It's just used to get those "foreign keys".
My rule of thumb in those four points above is still correct. But one needs to pay special attention to the second point. Any time one's inclined to fetch data "indirectly", the indirection is business logic. It might not be a loop or some conditional or switch block or something more obvious that manipulates the data before it's returned, but that indirection is still business logic. And it does not belong in the controller.
So how should that controller look? Move the content acquisition out into a service that has the explicit purpose of getting a user's content - from wherever - via some key:
class ContentController {

    function handleGet(rawArgs){
        args = validationService.validate(rawArgs)

        content = contentService.getById(args.id)

        response = {
            articles = content.published,
            socialContent = {
                twitter = content.twitter,
                facebook = content.facebook
            }
        }

        return new Response(response)
    }
}
This follows the second point to the letter, and there's now no business logic in the controller. Also the controller legitimately does not really need unit testing now, whereas the logic in the ContentService can be tested. And the controller method is down to a size that would not raise any eyebrows.
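For illustration, the ContentService might end up something like this pseudo-code - the collaborator names here are invented, but it shows where that user-object indirection gets re-homed:

class ContentService {

    // userService, contentRepository, twitterService and facebookService
    // are injected dependencies

    function getById(id){
        // knowing the social-media IDs are stored on the user object is
        // business logic, so the indirection now lives here
        user = this.userService.get(id)

        return {
            published = this.contentRepository.getForId(id),
            twitter = this.twitterService.getForId(user.twitterId),
            facebook = this.facebookService.getForId(user.facebookId)
        }
    }
}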
This is not an earth-shattering conclusion by any measure, but I'm quite pleased to fine-tune my position here, and hopefully it can help drive discussion about this in code reviews in future.
It'd be great to hear anyone else's thoughts on this?
--
Adam
Labels:
Brian Sadler,
Mingo Hagen
Tuesday, 7 February 2017
TDD: still testing too much
G'day:
A while back I wrote this article - Maintaining untested code, and mocking too much when backfilling test (with JavaScript & Jasmine) - wherein my colleague pulled me up for testing too much when doing my TDD. I'm working with PHP this time, and I've just been pulled up on it again. For much the same reason. It created an interesting thought exercise... well, interesting-ish... so I'll write it up here.
Firstly, I cannot use the actual code I am working with due to the usual over-baked "commercial sensitivity" considerations, so this is simply an analogy, but it captures the gist of what I was doing.
We have an adapter that handles HTTP calls. It's basically a generic wrapper that masks the API differences between Guzzle 5 and Guzzle 6, both of which we are still using, but we want a unified adapter to work with both. It's very generic, and basically has methods for GET, POST, PUT etc, and doesn't encompass any of our business logic.
Business reality is that we use this mostly to transmit JSONified objects, so we're adding a layer in front of it to take the raw text responses and rehydrate them. I'm writing the GET method.
When I start designing my solution, I know that the happy path will return me a Response object that has JSON as its content. I just need to deserialise the JSON and return it. This is easy enough:
public function testGetAsObject() {
    $expectedResult = ['result' => 'valid JSON'];
    $expectedContent = json_encode($expectedResult);

    $adapter = $this->mockAdapterWithResponse(200, $expectedContent);

    $client = new HttpClient($adapter);
    $result = $client->getAsObject($this->uri);

    $this->assertEquals($expectedResult, $result);
}
mockAdapterWithResponse just does this:
private function mockAdapterWithResponse($statusCode, $content){
    $response = $this->getMockBuilder(Response::class)
        ->setMethods(['getStatusCode', 'getContent'])
        ->getMock();

    $response->expects($this->once())
        ->method('getStatusCode')
        ->willReturn($statusCode);

    $response->expects($this->once())
        ->method('getContent')
        ->willReturn($content);

    $adapter = $this->getMockBuilder(Adapter::class)
        ->setMethods(['get'])
        ->getMock();

    $adapter->expects($this->once())
        ->method('get')
        ->willReturn($response);

    return $adapter;
}
(there are other tests that already needed this). All it's doing is enabling us to set what content the call to the HttpClient will return, because we don't care how the client goes about getting it; we just care what we do with the content afterwards.
So back to my test code:
public function testGetAsObject() {
    $expectedResult = ['result' => 'valid JSON'];
    $expectedContent = json_encode($expectedResult);

    $adapter = $this->mockAdapterWithResponse(200, $expectedContent);

    $client = new HttpClient($adapter);
    $result = $client->getAsObject($this->uri);

    $this->assertEquals($expectedResult, $result);
}
All I'm doing here is mocking the adapter to return a predetermined result, and I then test that what comes back from my function is the deserialisation of that value. That's it. I then implement some code to make that test pass:
public function getAsObject($uri, $params = [], $headers = []) {
    $response = $this->get($uri, $params, $headers);
    $object = json_decode($response->getContent(), true);

    return $object;
}
And that's green: all good.
I've already done too much in my test, but have not noticed (and not put it in for code review, so no-one's pointed it out).
The next requirement for this method is that if the content coming back from the adapter isn't JSON, then just throw an exception instead of trying to deserialise it. Again, pretty easy to TDD that:
/** @expectedException \Exception */
public function testGetAsObjectThrowsExceptionOnNonJsonContent() {
    $content = 'NOT_VALID_JSON';

    $adapter = $this->mockAdapterWithResponse(200, $content);

    $client = new HttpClient($adapter);
    $client->getAsObject($this->uri);
}
Note there's not even any explicit assertions in there: the @expectedException annotation takes care of it.
This test fails at present: no exception is thrown thanks to PHP's stupid-arse behaviour if json_decode is given non-JSON: it just returns null. Bravo, PHP. But that's what we want... we've designed our code and have tests to prove the design, now we need to implement the code:
public function getAsObject($uri, $params = [], $headers = []) {
    $response = $this->get($uri, $params, $headers);
    $object = json_decode($response->getContent(), true);

    if (is_null($object)) {
        throw new \Exception('Unable to decode JSON result');
    }

    return $object;
}
The tests are green, so all good.
But what have I done wrong? Nothing major, but once again I'm testing too much. Actually my tests are fine. It's my helper method that's "wrong":
private function mockAdapterWithResponse($statusCode, $content){
    $response = $this->getMockBuilder(Response::class)
        ->setMethods(['getStatusCode', 'getContent'])
        ->getMock();

    $response->expects($this->once())
        ->method('getStatusCode')
        ->willReturn($statusCode);

    $response->expects($this->once())
        ->method('getContent')
        ->willReturn($content);

    $adapter = $this->getMockBuilder(Adapter::class)
        ->setMethods(['get'])
        ->getMock();

    $adapter->expects($this->once())
        ->method('get')
        ->willReturn($response);

    return $adapter;
}
Why am I testing that those methods are called once? How many times they're called is completely irrelevant to what I'm testing.
What got into my head is I was being too dogmatic when I was writing my code, and I also cheated a bit with the TDD. Instead of designing the test variation first, and then implementing the code to pass the test, I was doing both at the same time. And it shows in my test implementation. Basically I didn't design my solution first, I designed it as I went, so - whilst tricking myself into thinking I was doing TDD - I was asking myself "right, what does my code need to do? It needs to call get on the adapter. OK, so I'll test I call it. What next... I need to get the content from the response. Right: I'll make sure that getContent gets called". All very stolid and admirable, but: wrong. All this is doing is testing my code (almost line by line!). TDD is not about testing my code; it's about testing my design. And it's the design and behaviour of the API of the method I'm testing. As I've said before, it'd be excellent if TDD was retconned to mean "Test Driven Design", not "Test Driven Development". The former is closer to the intent of the exercise.
I don't need to test those methods are called once: there's no logic in my function that will vary how many times they might be called, so they will be called once. We can trust PHP to do what we tell it to do by giving it source code.
The logic flaw here was demonstrated by what my code reviewer observed: if you think you need to check how many times that functionality is called, why are you not also concerned with how many times json_decode is called too? It's an important part of your functionality which definitely needs to be called. Ha... yeah, "duh", I thought. This drove home that I was testing too much.
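For completeness, this is roughly what the helper looks like with the call-count expectations dropped - same stubbed return values, but just using PHPUnit's method() stubbing shorthand, asserting nothing (a sketch):

private function mockAdapterWithResponse($statusCode, $content){
    $response = $this->getMockBuilder(Response::class)
        ->setMethods(['getStatusCode', 'getContent'])
        ->getMock();

    // just stub the return values: no expectations about call counts
    $response->method('getStatusCode')
        ->willReturn($statusCode);
    $response->method('getContent')
        ->willReturn($content);

    $adapter = $this->getMockBuilder(Adapter::class)
        ->setMethods(['get'])
        ->getMock();
    $adapter->method('get')
        ->willReturn($response);

    return $adapter;
}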
Another thing is that I was getting too distracted by what my implementation was whilst writing my test. The requirements of that function are:
- make a request, the response of which might be JSON:
- if it is: return the deserialised content;
- if it's not: raise an exception.
That first step is implementation detail from the POV of testing. What we're testing with TDD is the outcome of the function call. Which is: an object, or an exception. We're also only supposed to be testing the logic in the function the test is actually calling. I emphasise "logic" there because we're not testing the code. TDD is not about that. We're testing the logic. This is why we have two test variations: one wherein the logic to throw the exception is not true, so we get our object back; another when that if condition is true, so we get the exception.
We only need to test logic branching code (conditionals and loops, basically).
We do not need to prove every single statement is executed.
I was reminded of a rule of thumb to apply when considering whether to use expects or not. Basically it boils down to this: if you have test variations which might result in a dependency being called a different number of times, and that variation is germane to the result, then verify the call count.
Another good point made by my other colleague Brian was that when mocking stuff... only mock the bare minimum to get the test passing. Then stop. And that's probably the right place to leave it be.
An example of this might be if we had this logic in our method:
public function getAsObject($uri, $params = [], $headers = []) {
    $response = $this->get($uri, $params, $headers);
    $content = $response->getContent();
    $object = json_decode($content, true);

    if (is_null($object)) {
        $this->logger("Bung content returned: $content");
        throw new \Exception('Unable to decode JSON result');
    }

    return $object;
}
Logging is an optional outcome of that method call, so we might want to test that the logger definitely gets called once in that second test. We could also be belt-and-braces about it and test it's not called at all in the happy-path test.
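Something like this, say. It's a sketch: it assumes the logger were a mockable LoggerService dependency with a logText method, which is me concretising the example, not what the code above actually does:

/** @expectedException \Exception */
public function testGetAsObjectLogsBungContent() {
    $adapter = $this->mockAdapterWithResponse(200, 'NOT_VALID_JSON');

    // here the call count IS germane to the variation being tested,
    // so expects($this->once()) is warranted
    $logger = $this->createMock(LoggerService::class);
    $logger->expects($this->once())
        ->method('logText');

    $client = new HttpClient($adapter, $logger);
    $client->getAsObject($this->uri);
}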
Or let's say the code was a caching proxy and one is testing the uncached and cached variations:
function get($id){
    if ($this->cache->exists($id)){
        return $this->cache->get($id);
    }

    return $this->dao->get($id);
}
In the uncached variation, the data is not found in the cache, so we want to make sure that the DAO is called to get the data from the DB. In the cached variation, we want to make sure the DAO isn't called. Even here, in the first example we're not really testing the DAO call was made specifically; we're testing the logic around the cache check didn't trigger the early return. We're testing the if statement, not the DAO call.
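A sketch of those two test variations, assuming a setUp that's created cache and dao mocks and injected them into the object under test (all the names here are hypothetical):

public function testGetReturnsDataFromDaoWhenNotCached() {
    $this->cache->method('exists')->willReturn(false);

    // the cache miss means the early return is not triggered: the DAO is hit
    $this->dao->expects($this->once())
        ->method('get')
        ->willReturn('DATA_FROM_DB');

    $this->assertSame('DATA_FROM_DB', $this->proxy->get(17));
}

public function testGetReturnsDataFromCacheWhenCached() {
    $this->cache->method('exists')->willReturn(true);
    $this->cache->method('get')->willReturn('DATA_FROM_CACHE');

    // the cache hit means the early return is triggered: the DAO is never hit
    $this->dao->expects($this->never())
        ->method('get');

    $this->assertSame('DATA_FROM_CACHE', $this->proxy->get(17));
}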
Bottom line, I think that if I find myself starting to do a call count on a mocked method, I need to stop and think about whether I will have cases where the expected call count will be different, and even then I ought only bother if the call-count variation is important to the outcome of the function call. If I'm only ever checking for one call-count value... I'm doing it wrong.
And now rather than talking about my day job... I should actually be doing it...
Righto.
--
Adam
Sunday, 1 May 2016
PHP: Scaling the decorator pattern
G'day:
A while back I had a look at using the Decorator Pattern to simplify some code:
- Using a decorator pattern to reduce inappropriate code complexity
- Decorator Pattern vs simple inheritance
Once again me mate Brian has come to the fore with a tweak to make this tactic an even more appealing prospect: leveraging PHP's "magic" __call method to simplify decorator code.
Let's have a look at one of the examples I used earlier:
class User {

    public $id;
    public $firstName;
    public $lastName;

    function __construct($id, $firstName, $lastName){
        $this->id = $id;
        $this->firstName = $firstName;
        $this->lastName = $lastName;
    }
}

class DataSource {

    function getById($id){
        return json_encode([
            'id' => $id,
            'firstName' => 'Zachary',
            'lastName' => 'Cameron Lynch',
        ]);
    }
}

class LoggerService {

    function logText($text){
        echo "LOGGED: $text" . PHP_EOL;
    }
}

class UserRepository {

    private $dataSource;

    public function __construct($dataSource) {
        $this->dataSource = $dataSource;
    }

    public function getById($id) {
        $userAsJson = $this->dataSource->getById($id);
        $rawUser = json_decode($userAsJson);
        $user = new User($rawUser->id, $rawUser->firstName, $rawUser->lastName);

        return $user;
    }
}

class LoggedUserRepository {

    private $repository;
    private $loggerService;

    public function __construct($repository, $loggerService) {
        $this->repository = $repository;
        $this->loggerService = $loggerService;
    }

    public function getById($id) {
        $this->loggerService->logText("$id requested");
        $object = $this->repository->getById($id);

        return $object;
    }
}

$dataSource = new DataSource();
$userRepository = new UserRepository($dataSource);
$loggerService = new LoggerService();
$loggedUserRepository = new LoggedUserRepository($userRepository, $loggerService);

$user = $loggedUserRepository->getById(5);
var_dump($user);
That looks like a chunk of code, but the User, DataSource and LoggerService are just dependencies. The bits you really wanna look at are the two repo variations: the basic UserRepository, and the decorated LoggedUserRepository.
This code all works fine, and outputs:
C:\src>php baseline.php
LOGGED: 5 requested
object(User)#6 (3) {
["id"]=>
int(5)
["firstName"]=>
string(7) "Zachary"
["lastName"]=>
string(13) "Cameron Lynch"
}
C:\src>
Fine.
But this is a very simple example: the repo has only one method. So the decorator only needs to implement one method. But what if the repo has ten public methods? Suddenly the decorator is getting rather busy, as it needs to implement wrappers for those methods too. Yikes. This is made even worse if the decorator is only really interested in decorating one method... it still needs those other nine wrappers. And it'd be getting very boiler-plate-ish to have to implement all these "empty" wrapper methods.
Let's add a coupla more methods to our repo:
class UserRepository {
private $dataSource;
public function __construct($dataSource) {
$this->dataSource = $dataSource;
}
public function getById($id) {
$userAsJson = $this->dataSource->getById($id);
$rawUser = json_decode($userAsJson);
$user = new User($rawUser->id, $rawUser->firstName, $rawUser->lastName);
return $user;
}
public function getByFilters($filters) {
$usersAsJson = $this->dataSource->getByFilters($filters);
$rawUsers = json_decode($usersAsJson);
$users = array_map(function($rawUser){
return new User($rawUser->id, $rawUser->firstName, $rawUser->lastName);
}, $rawUsers);
return $users;
}
public function create($firstName, $lastName){
$rawNewUser = json_decode($this->dataSource->create($firstName, $lastName));
return new User($rawNewUser->id, $rawNewUser->firstName, $rawNewUser->lastName);
}
}
We have two new methods: getByFilters() and create(). Even if we don't want to log calls to those for some reason, we still need the LoggedUserRepository to implement "pass-through" methods for them:
class LoggedUserRepository {
private $repository;
private $loggerService;
public function __construct($repository, $loggerService) {
$this->repository = $repository;
$this->loggerService = $loggerService;
}
public function getById($id) {
$this->loggerService->logText("$id requested");
$object = $this->repository->getById($id);
return $object;
}
public function getByFilters($filters) {
return $this->repository->getByFilters($filters);
}
public function create($firstName, $lastName) {
return $this->repository->create($firstName, $lastName);
}
}
See how we still need methods for getByFilters and create? Suck. I mean it's not a huge amount of code, but given the LoggedUserRepository doesn't actually wanna log those methods, they're out of place.
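And this is exactly the boilerplate that the __call trick can kill off. A minimal sketch of the idea - my own reconstruction, so indicative rather than gospel - where anything not defined on the decorator falls through to the wrapped repository:
class LoggedUserRepository {
    private $repository;
    private $loggerService;
    public function __construct($repository, $loggerService) {
        $this->repository = $repository;
        $this->loggerService = $loggerService;
    }
    // the only method we actually want to decorate
    public function getById($id) {
        $this->loggerService->logText("$id requested");
        return $this->repository->getById($id);
    }
    // PHP invokes __call() for any method not defined on this class, so
    // getByFilters() and create() pass through with no hand-written wrappers
    public function __call($method, $arguments) {
        return call_user_func_array([$this->repository, $method], $arguments);
    }
}
Now a repo with ten public methods needs no extra code in the decorator at all: only the methods being decorated get written out.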
Saturday, 30 April 2016
PHP: Strategy pattern to defer implementation details
G'day:
Once again my Saturday evening is set in the local pub (well: the one in Galway I go to when I'm here anyhow. I go to this place more than I do my "local" at home), with a Guinness and the laptop in front of me. I've not actually done this for a while, coming to think of it.
Right, so today's random code exercise is brought to me via Stack Overflow, with some bod asking a question about how to decouple knowledge of the implementation of a task from the code that needs to execute the task. They ask it in a slightly different way, but that's the gist of it. Go read it anyhow.
This very thing has come up at work a bit recently... we have a tendency to have a single class which has various different implementations of the same task, but for different situational variations. We could all see this was a bit of a smell, but weren't initially sure how we should approach it. Enter my mate Brian and his ability to remember things that he's read (damn him), and at some stage he's read that - apparently turgid to the point of being close to unreadable - book "Design Patterns [etc]" which at least identifies a bunch of design patterns we should all be aware of when solving problems. I think I'll stick with the copy of "Head First Design Patterns" sitting on my desk. It's eminently readable.
Right, so this sounds like a case for the Strategy Pattern, which - according to Wikipedia - "… is a software design pattern that enables an algorithm's behavior to be selected at runtime". Basically you have a class which has a method to perform a task, but the implementation of the task is handed off to a collaborator which is implemented separately.
By way of example I'm gonna use a very contrived situation to demonstrate the technique. NB: this is different from my answer on Stack Overflow, but amounts to the same thing. Here we have the notion of a Name, and a NameDecorator; and the requirement is to render the name in two ways: plain text as "[lastName], [firstName]" and with a mark-up template to effect something like "[firstName] [lastName]". A naïve implementation might be as follows:
First we just have a very generic Name class:
class Name {
public $firstName;
public $lastName;
function __construct($firstName, $lastName){
$this->firstName = $firstName;
$this->lastName = $lastName;
}
}
That's not really part of the problem here, but the code revolves around this. And now our Decorator class:
class NameDecorator {
private $name;
private $fullNameTemplate;
function __construct($name, $fullNameTemplate){
$this->name = $name;
$this->fullNameTemplate = $fullNameTemplate;
}
function renderFullNameAsPlainText(){
return sprintf('%s, %s', $this->name->lastName, $this->name->firstName);
}
function renderFullNameWithHtmlTemplate(){
return sprintf($this->fullNameTemplate, $this->name->firstName, $this->name->lastName);
}
}
And sample usage:
$name = new Name('Zachary', 'Cameron Lynch');
$template = '%s <strong>%s</strong>';
$nameDecorator = new NameDecorator($name, $template);
$rendered = $nameDecorator->renderFullNameAsPlainText();
echo "Simple rendering: $rendered<br>";
$rendered = $nameDecorator->renderFullNameWithHtmlTemplate();
echo "Rendering with template: $rendered<br>";
And result:
Simple rendering: Cameron Lynch, Zachary
Rendering with template: Zachary Cameron Lynch
Here's the problem. Sometimes we need to render the name using a template: sometimes we don't. Some entire usages of this decorator will never need the template, but we still need to initialise the decorator with it.
This could be easily solved by passing the template with the rendering call, instead of init-ing the whole object with it:
function renderFullNameWithHtmlTemplate($template){
return sprintf($template, $this->name->firstName, $this->name->lastName);
}
And if that was the full requirement, that'd be fine. But the requirement is such that we need to call this code from other code which doesn't want to fuss about with templates or which method to call: it just wants to call a render method, and dependent on how the decorator has been configured, that will use whatever rendering implementation the decorator has decided upon.
We could solve this with an abstract notion of what the NameDecorator is, and have concrete implementations for each variation:
abstract class NameDecorator {
protected $name;
function __construct($name){
$this->name = $name;
}
abstract public function render();
}
class PlainTextNameDecorator extends NameDecorator {
function render(){
return sprintf('%s, %s', $this->name->lastName, $this->name->firstName);
}
}
class TemplatedNameDecorator extends NameDecorator {
protected $fullNameTemplate;
function __construct($name, $fullNameTemplate){
parent::__construct($name);
$this->fullNameTemplate = $fullNameTemplate;
}
function render(){
return sprintf($this->fullNameTemplate, $this->name->firstName, $this->name->lastName);
}
}
Now we just use separate Decorator classes: PlainTextNameDecorator and TemplatedNameDecorator, but we have the one unified render() method:
$name = new Name('Zachary', 'Cameron Lynch');
$nameDecorator = new PlainTextNameDecorator($name);
$rendered = $nameDecorator->render();
echo "Simple rendering: $rendered<br>";
$template = '%s <strong>%s</strong>';
$nameDecorator = new TemplatedNameDecorator($name, $template);
$rendered = $nameDecorator->render();
echo "Rendering with template: $rendered<br>";
TBH, if the full requirement was as I've stated here, that'd probably do. But... same as with my discussion "Decorator Pattern vs simple inheritance", it's not very scalable. This works for one variation, but the number of inherited classes gets out of hand very quickly with each method that potentially needs its implementation abstracted. For example if one had a Person class and two methods which need handling, renderName() and renderAddress(), then to cover the bases with "plain text" and "templated" options, one would need classes like this:
- PlainNamePlainAddressPersonDecorator
- PlainNameTemplatedAddressPersonDecorator
- TemplatedNamePlainAddressPersonDecorator
- TemplatedNameTemplatedAddressPersonDecorator
And if we wanted to include a phone number? Eight classes. Basically it's m^n classes, where m is the ways to decorate things, and n is the number of items needing decoration. So if we also had a need to provide for JSON decoration for these three: up from eight to 27 possible variations. Yikes. Not. Scalable.
A more scalable approach is to use composition over inheritance (the more I program, the more I find this to be the case), and use the Strategy Pattern. This basically means you extract just the method functionality into a collaborator, and just tell your main class about the collaborator.
class NameDecorator {
private $renderingHandler;
private $name;
function __construct($name, $renderingHandler){
$this->name = $name;
$this->renderingHandler = $renderingHandler;
}
function render(){
return $this->renderingHandler->render($this->name->firstName, $this->name->lastName);
}
}
Here NameDecorator knows it can render stuff, but it itself has no idea or care about the detail of how to render stuff. All it is doing is exposing a uniform API to the calling code: it can simply call render():
$rendered = $nameDecorator->render();
echo "Simple rendering: $rendered<br>";
Before doing that though, we need to initialise the decorator with a handler:
$nameDecorator = new NameDecorator($name, new DefaultRenderingHandler());
And it's the handler that knows its own various pre-requisites:
class DefaultRenderingHandler {
function render($firstName, $lastName){
return "$lastName, $firstName";
}
}
class TemplatedRenderingHandler {
private $template;
function __construct($template){
$this->template = $template;
}
function render($firstName, $lastName){
return sprintf($this->template, $firstName, $lastName);
}
}
These classes just focus on what it is to render a name. The default one just gets on with it, whereas the templated one needs to be initialised with a template. But these sorts of conceits are limited to the element that needs them: the Decorator itself doesn't need to know the templated handler needs a template. All it needs is a collaborator which implements a render(firstName, lastName) method.
If we have further rendering requirements (say an address, as per earlier), we just need the collaborator for that. So the scaling for this model is m*n, not m^n. Obviously better. Of course one might not need every variation, but it's more the rapidity of things getting out of hand which is the important point here.
It also keeps things more focused: each collaborator simply needs to focus on what it needs to deal with, and not anything else. And the decorator itself just worries about providing its API.
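For instance - and this is a hypothetical addition, not from the example above - a JSON rendering of names needs nothing more than one extra handler; NameDecorator itself is untouched:
class JsonRenderingHandler {
    // same render(firstName, lastName) contract as the other handlers
    function render($firstName, $lastName){
        return json_encode(['firstName' => $firstName, 'lastName' => $lastName]);
    }
}

$name = new Name('Zachary', 'Cameron Lynch');
$nameDecorator = new NameDecorator($name, new JsonRenderingHandler());
echo "JSON rendering: {$nameDecorator->render()}<br>";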
It might seem like a lot of work to create a whole class for a single method, but - really - the class boilerplating isn't much, is it? One could use stand-alone functions for this:
class NameDecorator {
private $renderingHandler;
private $name;
function __construct($name, $renderingHandler){
$this->name = $name;
$this->renderingHandler = $renderingHandler;
}
function render(){
return ($this->renderingHandler)($this->name->firstName, $this->name->lastName);
}
}
$renderAsPlainText = function ($firstName, $lastName) {
return "$lastName, $firstName";
};
$template = '%s <strong>%s</strong>';
$renderWithTemplate = function ($firstName, $lastName) use ($template) {
return sprintf($template, $firstName, $lastName);
};
$name = new Name('Zachary', 'Cameron Lynch');
$nameDecorator = new NameDecorator($name, $renderAsPlainText);
$rendered = $nameDecorator->render();
echo "Simple rendering: $rendered<br>";
$nameDecorator = new NameDecorator($name, $renderWithTemplate);
$rendered = $nameDecorator->render();
echo "Rendering with template: $rendered<br>";
See how I'm just using functions instead of whole classes here? But that's pretty poor code organisation IMO, so just stick 'em in classes.
The original question on Stack Overflow asked for a variation, and I think the Factory Method Pattern might be the way to go there. But... I'll think about that overnight and perhaps come up with something tomorrow. I've got a bunch of downtime (this is for you, Andy Myers) at the airport waiting for a flight tomorrow afternoon.
Righto.
--
Adam
PS: four pints, that one.
Labels:
Brian Sadler,
PHP
Tuesday, 1 December 2015
Guest author: more on agile practices from Brian Sadler
G'day:
Once again I'm promoting a comment from another article (Guest author: a response to "Why good developers write bad code") to be an article in its own right, cos I reckon people ought to read it. Firstly I'll repost that to which Brian is replying (for context), then on with Brian's reply.
Estragon says:
@Brian - yes to smaller chunks of the elephant at a time, tasks that are too large are a major contributor to this behaviour ... but then what's too large for one dev is fine for another, and it can be hard to judge it right, especially when someone doesn't give early feedback on the requirements and then proves to be "quite reluctant to provide honest progress reports if things aren't going well". I guess getting the task size right for individual developers, and distinguishing legitimate concerns/cries for help from the usual background griping is one of the skills that makes a good team lead. Not easy though
I've actually never had the luxury of having a QA person on a dev team, so while what you say is no doubt true in those kinds of environments, my experience has been that there's no surer way to get a developer to lose sight of the big picture than presenting them with a bug list. At that stage it becomes all about fixing the bugs, not the code design
Brian says:
@Estragon Thanks for the comment! I'm nowhere near as engaged or as prolific as Adam, so apologies for the tardy response!
Yeah, I've highlighted some bits as if I'm a student reading a text book. This is great stuff, and it's as if Mr Sadler should be writing more stuff more often. Anyway, I'm glad to have the content for this blog, so cheers fella.
"what's too large for one dev is fine for another"
Agreed there's always going to be some variation in developer productivity, but I don't think that's hugely significant in most organisations. Rock star developers tend to clump together, as do mere mortals.
I'm going to assume that the developer(s) doing the work provides the estimates - and if that's not the case then run for the hills! :-) If we then impose time-boxing (estimate 1 day max per task) and mandatory progress reports (daily stand ups), then we have a mechanism for checking if the work is progressing as planned. The time-boxing addresses the potential issue of differences in individuals' capabilities. A slower developer might have to break down a large requirement into more one-day and sub-one-day tasks than a faster developer, but whatever speed developers work at, we have a daily report on progress.
"quite reluctant to provide honest progress reports if things aren't going well"
We then have to tackle the thorny question of getting honest progress reports. Again all sorts of things come into play, but in essence we have to:
1. encourage and reward honest reporting (especially when there's bad news to be delivered)
2. penalise dishonest reporting, and
3. enforce transparency.
The first two points are all about the culture you work in - and management must set the tone here. Raising your hand because you're falling behind schedule should result in assistance not admonition. If delays/problems are flagged up as soon as they are known by developers it gives management the largest possible amount of time to take corrective action. Hiding schedule problems and hoping to catch up time on other tasks is a recipe for disaster. The management's opportunity to take corrective action starts to recede and inevitably quality takes a nose dive on the other remaining tasks as we attempt to make up for lost time.
Clearly we don't want a never ending stream of schedule problems, but that's where sprints / iterations and retrospectives come in. If we honestly, diligently and faithfully log the effort required to complete each task and, as part of our retrospective activities, compare expended effort against our initial original estimates we'll start to generate a feedback loop that should help us to get better at producing reliable estimates over time.
On the third point, Continuous Integration can play a major role here but we can also do things like define what "done" means to ensure that everyone has a common understanding of what it takes to complete a task. Many devs are quite happy to declare that they're finished when they press the save button on their IDE, more "enterprise" organisations might have a definition of done that includes all sorts of checks - unit testing, code review, functional testing, CI regression tests, performance tests etc. etc. The important point is that if a developer claims to have finished a task, then it must meet the organisation's definition of done, whatever that definition is (in agile circles that usually means, deployable to production).
On top of that, I'd mandate that any developer claiming to have finished a task should be able to demonstrate it. The conversation I've heard many times is:
Dev: "Yeh I finished Feature A"
Colleague: "Great! Let's have a look!"
Dev: "Oh, actually you can't see it just yet, QA haven't signed it off"
Colleague: "OK... So, has your code been reviewed?"
Dev: "Err... no, not yet"
Colleague: "Have you even merged your code?"
Dev: "Erm... well, you see, I haven't committed my code yet"
Colleague: "So QA hasn't signed it off, the code hasn't been merged or reviewed, indeed you haven't even committed your code, but Feature A is done right?"
Dev: "Yeh!".
At this point I call BS :-) For a task to be finished it has to be in a state where it can at least be demo'ed. We're back to the agile principle of working software being the primary measure of progress. Add in the concept of binary milestones (there is no 90% done, it's either done or it's not done) and we start to concentrate minds.
"I've actually never had the luxury of having a QA person on a dev team"
It's not a luxury! ;-) I think the key here is shortening the feedback cycle. Some people use the analogy that unreleased code is akin to stock inventory in a traditional business, so unreleased code is just filling up space in the warehouse. Having code that is not only unreleased but untested is even worse. What it means is that we've got to manage code that we're not even sure is fit for purpose, and depending on your branching model it's either hanging around going stale on a feature branch or cluttering up your mainline.
Having testers and devs work together enables us to be much leaner, and get close to a kind-of JIT stock control system for coding. We write very small amounts of code which gets reviewed, tested and fixed very quickly, and as soon as possible we release it to production so that the business owners can make a return on their investment. If organisations deliberately choose to lengthen the feedback cycle they're just wasting time, money and effort. In my opinion that sort of waste is a luxury.
"especially when someone doesn't give early feedback on the requirements"
QA can play a key role here as well. If a requirement isn't testable, then there are almost certainly problems with it, so get QA involved before we start to code, maybe even before the devs look at the requirement. If QA has no idea how to test a requirement, how can a developer hope to know when they have finished the task? The only answer is to down tools until the customer / business analyst / project manager can provide clarity.
"there's no surer way to get a developer to lose sight of the big picture than presenting them with a bug list"
To my mind that's a symptom of the tasks being too big. If you keep to one day max per task, how much damage can a half-decent developer do in one day's coding? Another question this raises is, if working at a micro level produces a mountain of bugs, how can our development at a macro level be expected to be any better?
"At that stage it becomes all about fixing the bugs, not the code design"
Whether we're fixing bugs or delivering new features we should always be fixing the design - evolutionary architecture mandates constant refactoring, even when we're fixing bugs.
"Not easy though"
Absolutely - but (to mix my metaphors) Rome wasn't built in a day and there is no Silver Bullet! :-) Sadly, I think many people choose to take a very short-term, static view of how agile methodologies might apply to them or their team. They look at a bunch of agile-like practices and think, meh, that won't make us better, faster, stronger etc. I have some sympathy for such viewpoints, because taken in isolation, and in the very short term, they won't produce dramatic results, but they're wrong!
The last principle listed on the agile manifesto states, "At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behaviour accordingly", and to my mind this is the key to producing the dramatic improvements and eliminating the problems described in the OP.
If we take a look at how individuals and groups learn and grow over time, I think it's self-evident that regular, frequent opportunities to measure, reflect, take on board feedback and make changes are crucial for improvement to occur. The agile principle above demonstrates that such opportunities are hard-wired into agile, but without such opportunities I'd argue that improvement is virtually impossible.
Time to address my day job...
--
Adam
Thursday, 26 November 2015
Guest author: a response to "Why good developers write bad code"
G'day:
I could get used to this. Another one of my readers has provided my copy for today, in response to my earlier guest-authored article "Why good developers write bad code". This originally was a comment on that article (well: it still is) but it stands on its own merit, and Brian's cleared me to reproduce it as an article. Brian's a mate, one of my colleagues at Hostelworld.com. The quote below is unabridged. I have corrected one typo, and the links are my own.
Brian says:
Really thought provoking post, Adam, and my immediate impression is that your mate seems to be describing an awful lot of dysfunctionality here. I'll get my declaration of interest out in the open straightaway and own up to being a massive Agile/XP fanboy, and consequently I think much of what your mate describes could be cured by the adoption of Agile/XP practices.
The "...but then he gets a chunky piece of work and, more often than you'd expect, it goes astray" quote is very revealing, because it would appear that there's no effort made to break large requirements down into smaller ones and we're on a slippery slope from here on in.
Developers often receive very coarse grained requirements and irrespective of our natural coding ability, we're not naturally adept at managing large pieces of work. That's not to say we can't manage large requirements, but without training, mentoring and support developers who might be first rate coders, can often struggle. In this case, it sounds like the developer in question is trying to eat an elephant in one sitting, and then being left alone to go dark and "fail". Reading between the lines it seems that this is accompanied by a seat of the pants estimate, which becomes a commitment as soon as he's uttered it, and before long we're into death march territory.
Also I wouldn't want to blame the developer in question, because as your mate points out, the same thing keeps happening time and again but there's no intervention or help from the guy's colleagues or his manager. Clearly they're at a loss as to how to fix the problem but unless someone can show him a better way, how is he going to improve?...
Anyway... if I was working with him I'd strongly encourage him to forget about eating the whole elephant, and just start by trying to nibble its trunk. Use TDD religiously from the outset, stick to the red-green-refactor cycle and You Ain't Gonna Need It (even more religiously), avoid Big Design Up Front, and let the application architecture evolve as he uncovers smells in the code that he's writing.
See how far he gets nibbling the elephant's trunk in one day, and if all goes well then we'll reconvene and decide on the next bit of the elephant to consume, and carry on in that fashion. If he takes longer than a day - and we'll know that because we have daily stand-ups right? - then we'll pair up and see what the problem is. Invariably if the guy's a first rate coder, it will only be because he's bitten off more than he can chew, so get him to stop, re-plan the work, and decrease the size of the immediate task in front of him so he can actually finish it. In short, KISS, keep tasks small and try to conquer complexity. Writing code is intrinsically hard, we should actively avoid anything that makes it harder.
I think it's also worth bearing in mind that many, if not most developers, are naturally quite reluctant to provide honest progress reports if things aren't going well. Combine that with our tendency to give hopelessly optimistic estimates and we're already in a very bad place. I've found that it's almost endemic in our industry that developers are scared of giving bad news to anyone, let alone their line manager or project manager. Developers are usually clever people who spent much of their academic careers being praised for their intellectual prowess, and to be put in a position where they are suddenly the bad guys because they didn't hit an arbitrary deadline can be very distressing and something to avoid at all costs.
If there was just one thing I would wish for in the industry it would be to reform the attitude that project managers, team leads and developers have to estimation. Estimates should be treated by all stakeholders as just that, they are not cast iron commitments written in blood. They should be based on evidence (dude we go through six stages of peer review on every task, why did you give an estimate of 10 minutes?) and when the evidence is demonstrating that our estimates are wrong, then remaining estimates should be re-calibrated. All this is only possible if managers and developers are prepared to create a culture where realistic and responsible estimation, combined with predictable delivery is championed over masochistic estimation and constant failures to hit deadlines.
One final point, when your mate writes "it's a good idea to keep new features *out* of QA for as long as possible", I totally disagree! :-) I'd want feedback from the QA team as soon as possible. The shorter the feedback cycle the better, and in an ideal world the QA team would be writing automated acceptance tests in parallel with my development efforts. I'm a firm believer in the agile principle that "Working software is the primary measure of progress", and without feedback from QA I'm just guessing that the software I'm writing is actually working.
Saturday, 7 February 2015
PHP: objects as arrays, and can CFML benefit from the concept?
G'day:
We had a situation at work the other day in which we needed a collection class, I think it was Properties (that's in the real estate sense of "property", not the OO one: I work for an online travel company, and this was part of a search result). This is distinct from a single Property which represents a single hotel / hostel / whatever.
Using CFML we would have used a record set to represent this, and in PHP we'd generally represent this as an array (of individual Property objects). However the collection of Properties needed other behaviour as well: storing the minimum price per night, highest rating percentage etc. So we're definitely wanting an object of some description to keep the array of properties and other... erm... properties (in the OO sense now) defined together in the one place.
A traditional approach might be this (pseudo code):
class Properties {
Property[] properties;
Decimal lowestPrice;
Decimal highestRating;
// and some accessors etc
}
And that would have been fine. However I rather liked the notion of instead of having to pluck the properties array out of the object before doing anything with it... just to use the object itself as an array. C# has an idea of Indexers, and I was pretty sure PHP had at least an Iterable interface, and perhaps some other bits and pieces that might work as well.
Indeed it does, and I was able to come up with an object which one can treat like an array. Almost.
// Collection.php
class Collection implements Iterator, Countable, ArrayAccess
{
    protected $collection;
    protected $index;

    public function __construct($array){
        $this->collection = $array;
    }

    public function getCollection() {
        return $this->collection;
    }

    // Iterator implementation
    public function current() {
        return $this->collection[$this->index];
    }
    public function key() {
        return $this->index;
    }
    public function next() {
        ++$this->index;
    }
    public function rewind() {
        $this->index = 0;
    }
    public function valid() {
        return array_key_exists($this->index, $this->collection);
    }

    // Countable implementation
    public function count() {
        return count($this->collection);
    }

    // ArrayAccess implementation
    public function offsetExists($index) {
        return array_key_exists($index, $this->collection);
    }
    public function offsetGet($index) {
        return $this->collection[$index];
    }
    public function offsetSet($index, $value) {
        echo "offsetSet() called with [$index], [$value]<br>";
        if (is_null($index)) {
            $this->collection[] = $value;
        }else{
            $this->collection[$index] = $value;
        }
    }
    public function offsetUnset($index) {
        unset($this->collection[$index]);
    }
}
I've flagged up (with comments) the method implementations that each of the Iterator, Countable and ArrayAccess interfaces require. These methods allow me to traverse through the object - as one would in a foreach() loop; "count" the elements in the object (eg: using count()); and set elements into the object's array using array notation.
Having implemented these interfaces, I no longer need to extract the array from the object to use it: I can access it via the object itself.
Here I populate a test collection with Maori numbers 1-4:
$numbers = ['tahi', 'rua', 'toru', 'wha'];
$numbersCollection = new Collection($numbers);
printf('Collection after initialisation: %s<br>', json_encode($numbersCollection->getCollection()));
echo '<hr>';
Output:
Collection after initialisation: ["tahi","rua","toru","wha"]
Then loop over the object using a normal foreach() loop:
foreach ($numbersCollection as $number) {
echo "$number ";
}
echo '<hr>';
Output:
tahi rua toru wha
I can also set elements in the array via either a direct index reference:
$numbersCollection[4] = 'rima';
printf('Collection after setting [4]: %s<br>', json_encode($numbersCollection->getCollection()));
Output:
Collection after setting [4]: ["tahi","rua","toru","wha","rima"]
and even via the shorthand append syntax:
$numbersCollection[] = 'ono';
printf('Collection after setting []: %s<br>', json_encode($numbersCollection->getCollection()));
Output:
Collection after setting []: ["tahi","rua","toru","wha","rima","ono"]
All that's reasonably good!
However from here, the behaviour leaves a bit to be desired. Firstly here's an example of where PHP's array implementation demonstrates itself as being a bit wanting. I tested removing an element:
unset($numbersCollection[2]);
printf('Collection after unset(): %s<br>', json_encode($numbersCollection->getCollection()));
The intent here is to remove the third element: toru. Here's what we get:
Collection after unset(): {"0":"tahi","1":"rua","3":"wha","4":"rima","5":"ono"}
OK, that ain't good. It's cleared the value and removed the element, but it's not reindexed the "array", so there's a gap left in the indexes. As a result the JSON encoder has gone "array? that's not an array, sunshine... erm, I better make it an object" (notice the curly braces, not the square ones, and the explicitly stated keys).
Initially I thought this was a glitch in my implementation, so I tested on a native array:
unset($numbers[2]);
printf('Array after unset(): %s<br>', json_encode($numbers));
Output:
Array after unset(): {"0":"tahi","1":"rua","3":"wha"}
That's pants. Amateurs. This is an artifact of PHP not understanding that there's a difference between an indexed array (indexed via sequential numbers) and an associative array (indexed on arbitrary key), which is a fairly fundamental implementation shortfall.
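For completeness: on a native array one can paper over this by re-indexing with array_values() after the unset():
$numbers = array_values($numbers); // re-keys sequentially from 0, so it's "an array" again
printf('Array after array_values(): %s<br>', json_encode($numbers));
Output:
Array after array_values(): ["tahi","rua","wha"]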
And it turns out PHP's implementation of the ArrayAccess interface is a bit hamstrung too. Well the interface implementation itself is OK, but then PHP's internal usage of the interface is... lacking. Here I use a native array function on my collection:
$uppercaseNumbers = array_map(function($number){
return strtoupper($number);
}, $numbersCollection);
printf('Collection used in array_map(): %s<br>', json_encode($uppercaseNumbers));
And this yields:
Warning: array_map(): Argument #2 should be an array in Collection.php on line 101
Collection used in array_map(): null
Because... whilst PHP offers the interface, it doesn't itself actually expect the interface in its array functions:
array array_map ( callable
$callback
, array $array1
[, array $...
] )Note that it expects an array for the second (and subsequent) arguments. Not an ArrayAccess. If PHP is going to offer an interface to enable objects to have array behaviour... it really ought to have gone the whole hog, and implemented an ArrayImplementation interface, and then expect one of those in all its array functions.
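The pragmatic workaround is to hand array_map() the underlying array rather than the object, via the getCollection() accessor from above:
$uppercaseNumbers = array_map(function($number){
    return strtoupper($number);
}, $numbersCollection->getCollection());
printf('Collection used via getCollection(): %s<br>', json_encode($uppercaseNumbers));
iterator_to_array($numbersCollection) would also do the job, as the class implements Iterator - although with the index-based valid() above, iteration stops at the first gap an unset() has left.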
This is the story of PHP: they seem to have the ideas, but they don't seem to know how to do a thorough job of planning and implementing the ideas.
If it was implemented thoroughly... could this general idea be useful for CFML? I'm thinking of one unified interface that describes what native CFML array functionality needs in order to operate on a collection object. What would be needed?
// Array interface
interface {
// access
any function get(required numeric id);
void function set(required numeric id, required any value);
void function remove(required numeric id);
numeric function size();
// traversal
void function reset();
void function next();
any function current();
}
That should provide all the hooks native CFML would need to be able to use an object as an array (eg: using native array access syntax, functions, methods, and general language constructs such as for() loops).
I think this would be a pretty handy addition to CFML. I think CFML has always offered reasonable OO opportunities for CFML code, but it itself is pretty lacking in the OO department when it comes to interacting with that code. Its array functionality is a case in point: it only works with native arrays. It should work with any object that has the behaviour of an array. Something CFML has in common with PHP then. Another thing.
One thing my colleague/mate Brian pointed out (this was in the context of the PHP code, not this CFML idea) is that if we're using a dynamic language, should we need to implement actual interfaces? There's a good case for duck typing here: provided a class implements all the requisite methods to be treated as an array, should the runtime not simply allow the object to be used in place of an array?
EG: for an object to be used in a foreach() loop, it needs to provide a next() and get() method (maybe a getKey() and getValue() method). Provided the object has those methods, it should be usable in that situation. Even more so in the CFML situation: the class doesn't need to have the methods specified... provided the method is there at runtime, that should be fine (for my PHP readers, in CFML one can inject new methods into an object at runtime). This is true language dynamism at work. This is where CFML (and PHP) should be positioning themselves: get rid of the ceremony, and just crack on with the business of being a modern dynamic language.
For PHP, they need to get their act together and understand the difference between an indexed array and an associative array. However this is a fundamental language problem that is unlikely to be addressed. For CFML... there's an opportunity here to make the language a bit cooler.
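Coming back to Brian's duck-typing point: one could at least approximate the check by hand in PHP. A throwaway sketch - the method list here is just illustrative:
// does $candidate quack like an array? Check for the methods the runtime
// would need, rather than demanding a declared interface
function quacksLikeAnArray($candidate) {
    foreach (['rewind', 'valid', 'current', 'next', 'count'] as $method) {
        if (!method_exists($candidate, $method)) {
            return false;
        }
    }
    return true;
}
var_dump(quacksLikeAnArray(new Collection(['tahi', 'rua']))); // bool(true)
var_dump(quacksLikeAnArray(new stdClass())); // bool(false)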
Thoughts?
--
Adam
Labels:
Arrays,
Brian Sadler,
CFML,
PHP
Saturday, 27 December 2014
Learn CFML in 24 hours: chapter 0
G'day:
I've been mulling this over since I released the CFScript docs on Github. I made a point of saying it was not aimed at teaching CFScript, it was just a reference. I then started thinking about writing a document which was a tutorial, not a reference, covering CFML. We need a decent guide to using CFML (I do not consider any of the current options to be "decent").
Thinking about it, it became clear it was going to be a non-trivial undertaking, and given my career was vacillating all over the place at the time, I pushed it into the back of my mind.
Labels:
Brian Sadler,
CFML
Monday, 1 December 2014
Hammers and screws (and when not to use a list in CFML)
G'day:
I hope Salted (/Ross) and Ryan don't mind me pinching our IRC conversation for a blog article, but... well... too bad if they do. It's done now. This is it.
Ross was asking "whats the most efficient way to remove the first two items in a list and get back the remainder", which made my eyes narrow and decide there's probably something to get to the bottom of here. Because he mentioned "lists". And these should - in general - be avoided in CFML.
After some back and forth with Ryan, Ross settled on:
listtoarray(foo," "), arrayslice(bar,3), arraytolist(bar," ")
(pseudo code, obviously).
Now... I think lists are overused in CFML, and - indeed - they are a bit rubbish. Having a string of data and then using one character (OK, or "some characters ~") from the string as a delimiter is seldom going to work, especially with the default delimiter of comma. Because in really a lot of situations, commas are actually part of the data. And the same goes for any other char-based delimiter (except, say, characters specifically intended to be delimiters, like characters 28-31 (file, group, record and unit separators, respectively)). I really don't think lists should generally be part of a solution to anything.
With this in mind, I decided to check what was going on. This is copy-and-pasted from the IRC log (spelling and formatting addressed, otherwise it's unabridged):
16:36 <adam_cameron> Salted: first question that should be asked... why is it a list in the first place? Do you have any control over it?
16:36 <Salted> no, it's returned from a postcode lookup service
16:36 <Salted> I'm fairly certain that the address string will always have the postcode first so [p1] [pt2] [line1]
16:37 <Salted> it's not a list its a string but I'm treating it as a space delimited list for these purposes...
16:37 <adam_cameron> so it's a string? Not a list per se?
16:37 <adam_cameron> ok
16:37 <Salted> indeed
16:37 <adam_cameron> So what you actually want is everything after the second space?
16:37 <Salted> essentially yes
16:37 <adam_cameron> Not a list with the first two elements dropped off
16:37 <Salted> in this instance what you've mentioned is one and the same, if I treat it as a list
16:38 <adam_cameron> well literally, yes. However once you start treating it as a list, you're thinking about it wrong
16:39 <Salted> I don't see how thats different from thinking of strings as char arrays, for example
16:40 <Salted> "my name is ross" is both a char array of 14 elements and a list of 4 elements
16:40 <adam_cameron> because you start trying to use list functions on it
16:40 <adam_cameron> Which makes your code shit
My point here - one that Ross doesn't entirely agree with - is that just because one can use a string as a list, doesn't mean - semantically - it is a list. And one should try to keep one's code as semantically clear as possible.
What Ross has here is not a space-separated list. Later in the conversation I extracted from him what the actual data was like, and this was a sample:
X1 2YZ Apartment 1903, 10 Random Street Suburbia, OURTOWN
What we have here is... a mess. It can be assessed in a number of ways:
- a post code part followed by an address part
- a post code followed by a street address followed by a city
- a post code followed by the street part of the address, the suburb part of the address, and the city part of the address
One should look at the actual requirement, and solve that.
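And for the requirement here - everything after the second space - no list functions are needed at all. A quick sketch (in PHP, although the original conversation was about CFML):
$address = 'X1 2YZ Apartment 1903, 10 Random Street Suburbia, OURTOWN';

// split on the first two spaces only; the third element is the remainder
list($postcodeFirstPart, $postcodeSecondPart, $addressPart) = explode(' ', $address, 3);

echo "Postcode: $postcodeFirstPart $postcodeSecondPart" . PHP_EOL; // Postcode: X1 2YZ
echo "Address: $addressPart" . PHP_EOL; // Address: Apartment 1903, 10 Random Street Suburbia, OURTOWN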
Labels:
Brian Sadler,
Clean Code,
Ryan Guill,
Salted