
Sunday 1 September 2024

Design patterns: singleton

G'day:

(Long time no see, etc)

The other day on the CFML Slack channel, whilst being in a pedantic mood, I pointed out to my mate John Whish (OG CFMLer) that he was using the term "singleton" when all he really meant was "an object that is reused". A brief chat in the thread and in DM ensued, and was all good. We both revisited the docs on the Singleton Pattern, and refreshed & improved our understanding of it, and were better for it. Cool. The end. Short article.

...

..

Then the "well ackshually" crowd showed up, and engaged in a decreasingly meritorious diatribe about how there's no difference between a class that implements the Singleton Pattern, and an object that one happens to decide to reuse: they're both just singletons.

OK so the reasoning wasn't quite that daft (well from one quarter, anyhow), but the positioning was missing a degree of nuance, and there was too much doubling down in the "I'm right" dept that I just gave up: fine fellas, you do you. In the meantime I was still talking to John in DM, and I mentioned I was now keen to see how an actual singleton might come together in CFML, so there was likely a blog article ensuing. I predicted CFML would throw up some barriers to doing this smoothly, which is interesting to me; and hopefully other readers - if I have any still? - can improve their understanding of the design pattern.

I'll put to bed the "debate" first.

The notion of a singleton comes from the Singleton Pattern, which is one of the patterns in the perennial GoF Design Patterns book. It's a specific technique to achieve an end goal.

What's the end goal? Basically one of code organisation around the notion that some objects are intended to be re-used, and possibly even more strongly: must be re-used in the context of the application they are running in. One should not have more than one of these objects in play. An obvious, oft-cited, and as it turns out: unhelpful, example might be a Logger. Or a DatabaseConnection. One doesn't need to create and initialise a new DatabaseConnection object every time one wants to talk to the DB: one just wants to get on with it. If one needed to instantiate the DatabaseConnection every time it was used, the code gets unwieldy, breaks the Single Responsibility Principle, and is prone to error. Here's a naïve example:

numbers = new DatabaseConnection({
    host = "localhost",
    port = "3306",
    user = "root",
    password = "123", // eeeeek
    database = "myDb"
}).execute("SELECT * FROM numbers")

One does not wanna be doing all that every time one wants to make a call to the DB. It means way too much code is gonna need to know minutiae of how to connect to the DB. One will quickly point out the deets don't need to be inline like that (esp the password!), and can be in variables. But then you still need to pass the DB credentials about the place.

There's plenty of ways to solve this, but the strategy behind the Singleton Pattern is to create a class that controls the usage of itself, such that when an instance is called for, the calling code always gets the same instance. The key bit here is a class that controls the usage of itself[…] always[…] the same instance. That is what defines a singleton.

My derision in the Slack conversation was that the other participants were like "yeah but one can do that with a normal object just by only creating one of them and reusing it (muttermutterDIcontainer)". Yes. Absolutely. One can def do that, and often it's exactly what is needed. And DI containers are dead useful! But that's "a normal object […] and reusing it". Not a singleton. Words have frickin meanings(*). I really dunno why this is hard to grasp.

It's like someone pointing to a white vase with blue artwork on it, and going "this is my Ming vase". And when one then points out it says "IKEA" on the bottom, they go "doesn't matter. White with blue detail, and can put flowers in it. That's what a Ming vase is, for all intensive purposes (sic)". "You know a Ming vase is a specific sort of vase, right?". "DOESN'T MATTER STILL HOLDS FLOWERS". OK mate.

Digression, around that (*) I put above. I'm a firm believer that words are defined by their usage, not by some definition written down somewhere. A dictionary, for example, catalogues usage, it doesn't dictate usage. This is fine, but it's also less than ideal that words like "irregardless" end up in the dictionary because people are too…erm… "lacking a sense of nuance"… to get easy things right. This is pretty much where I am coming from here: "Singleton" means something; it'd be grand if the usage of it didn't get diluted due to people "lacking a sense of nuance" to get easy things right. And then to double-down on it is just intellectually-stunted, and not something I think we should be revelling in. I do also feel rather like Cnut vs the surf in this regard though.

Ah well.


Anyhoo, can I come up with a singleton implementation in CFML?

The first step of this didn't go well:

// Highlander.cfc
component {

    private Highlander function init() {
        throw "should not be runnable"
    }
}
// test.cfm    
connor = new Highlander()

writeDump(connor)

I'd expect an exception here, but both ColdFusion and Lucee just ignore the init method, and I get an object. Sigh.

This is easily worked-around, but already my code is gonna need to be less idiomatic than I'd like it to be:

public Highlander function init() {
    throw "Cannot be instantiated directly. Use getInstance"
}

Now I get the exception.

Next I start working on the getInstance method:

public static Highlander function getInstance() {
    return createObject("Highlander")
}
// test.cfm
connor = Highlander::getInstance()

writeDump(connor)

This still returns a new instance of the object every time, but it's simply a first step. To easily show whether instances are the same or different, I'm gonna give them an ID:

// Highlander.cfc
component {

    variables.id = createUuid()

    public Highlander function init() {
        throw "Cannot be instantiated directly. Use getInstance"
    }

    public static Highlander function getInstance() {
        return createObject("Highlander")
    }

    public string function getId() {
        return variables.id
    }
}
// test.cfm
connor = Highlander::getInstance()

writeDump(connor)

goner = Highlander::getInstance()

writeDump([
    connor = connor.getId(),
    goner = goner.getId()
])

[dump output: connor and goner show two different IDs]

See how the IDs are different: they're different objects.

We solve this by making getInstance only create the object instance once for the life of the class (not the object: the class).

public static Highlander function getInstance() {
    static.instance = isNull(static.instance)
        ? createObject("Highlander")
        : static.instance

    return static.instance
}

It checks if there's already an instance of itself that it's created before. If so: return it. If not, create and store the instance, and then return it.

Now we get better results from the test code:

[dump output: connor and goner now show the same ID]

Now it's the same ID. Note that this is not isolated to that request: it sticks for every request for the life of the class (which is usually the life of the JVM, or until the class needs to be recompiled). I'm altering my writeDump call slightly:

writeDump(
    label = "Executed @ #now().timeFormat('HH:mm:ss')#",
    var = [
        connor = connor.getId(),
        goner = goner.getId()
    ]
)

[dump outputs from two separate requests: the same ID each time, different timestamps]

The ID sticks across requests. It's not until I restart my ColdFusion container that the static class object is recreated, and I get a new ID:

[dump output after the container restart: a new ID]

One flaw in this implementation is that there's nothing to stop the calling code using createObject rather than using new to try to create an instance of the object. EG: this "works":

// test2.cfm
connor = createObject("Highlander")
goner = createObject("Highlander")

writeDump(
    label = "Executed @ #now().timeFormat('HH:mm:ss')#",
    var = [
        connor = connor.getId(),
        goner = goner.getId()
    ]
)

[dump output: two different IDs again]

When I say this "works" I am setting the bar very low, in that "it doesn't error": it's not how the Highlander class is intended to be used though.

Oh: in case it's not clear why there's no exception here: it's cos when one uses createObject, the init method is not automatically called.

Can I guard against this?

Sigh.

OK, on Lucee I can do this with minimal fuss:

// Highlander.cfc
component {

    if (static?.getInstanceUsed !== true) {
        throw "Cannot be instantiated directly. Use getInstance"
    }

    variables.id = createUuid()

    public Highlander function init() {
        throw "Cannot be instantiated directly. Use getInstance"
    }

    public static Highlander function getInstance() {
        static.getInstanceUsed = true

        static.instance = isNull(static.instance)
            ? createObject("Highlander")
            : static.instance

        static.getInstanceUsed = false

        return static.instance
    }

    public string function getId() {
        return variables.id
    }
}

What's going on here? The conceit is that the class's pseudo-constructor code is only executed during object creation, and when we are creating an object via getInstance we disable the "safety" in the pseudo-constructor, but then re-enable it once we're done creating the instance. If we don't use getInstance, then the safety has either never been set - exception - or it's been set to false by an earlier call to getInstance - also exception.

Looking at that code, I can see that there's a race-condition potential with getInstance's setting/unsetting of the safety, so in the real world that code should be locked.
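By way of illustration, here's a hedged sketch of that locking (the lock name is just one I've invented):

public static Highlander function getInstance() {
    // serialise access so concurrent first calls can't interleave the
    // setting / unsetting of the safety flag
    lock name="HighlanderGetInstance" type="exclusive" timeout=5 {
        static.getInstanceUsed = true

        static.instance = isNull(static.instance)
            ? createObject("Highlander")
            : static.instance

        static.getInstanceUsed = false
    }

    return static.instance
}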

As I alluded to above: this code does not work in ColdFusion, because ColdFusion has a bug in that the pseudo-constructor is run even when running static methods, so the call to getInstance incorrectly calls the pseudo-constructor code, and triggers the safety check. Sigh. I can't be arsed coming up with a different way to work around this just for ColdFusion. I will raise a bug with them though (TBC). ColdFusion also has another bug in that static?.getInstanceUsed !== true errors if static.getInstanceUsed is null, as the !== doesn't like it. I guess I'll raise that too (also TBC).

So. There we go. A singleton implementation.


PHP's OOP is more mature than CFML's, so a PHP implementation of this is a bit more succinct/elegant:

class Highlander {
    
    private ?string $id;
    
    private static self $instance;
    
    private function __construct() {
        $this->id = uniqid();
    }
    
    public function getId() : string {
        return $this->id;
    }
    
    public static function getInstance() : static {
        self::$instance = self::$instance ?? new static();
        return static::$instance;
    }
}

$connor = Highlander::getInstance();
$goner = Highlander::getInstance();

var_dump([
    'connor' => $connor->getId(),
    'goner' => $goner->getId()
]);

$transient = new Highlander();

The output:

array(2) {
  ["connor"]=>
  string(13) "66d457f9d4cd1"
  ["goner"]=>
  string(13) "66d457f9d4cd1"
}

Fatal error: Uncaught Error: Call to private Highlander::__construct() from global scope in /home/user/scripts/code.php:30
Stack trace:
#0 {main}
  thrown in /home/user/scripts/code.php on line 30

And that's that.

Righto.

--
Adam

Wednesday 5 October 2022

DRY: don't repeat yourself

G'day:

This should be a short one. I've had this text lying around for a while, wondering if I could spin it out to be a longer article somehow, but I never managed to work out how. Then a few days ago I needed to point to something about "DRY" (so I didn't have to… erm… repeat myself), and got annoyed that I didn't have this article already. I guess articles don't need to be long, if the point itself doesn't need much discussion. So here goes.

Everyone has heard about the DRY Principle, I would hope. From the Wikipedia article:

The DRY principle is stated as "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system".

The article goes further to explain it more thoroughly, but I think that's about as far as most intermediate-level devs ever read, and they get the wrong end of the stick. They perceive it to mean "never write the same code more than once", and they stick to that dogmatically, de-duping their code at every step and making an unholy mess of unnecessary complexity along the way. But this is not what the DRY Principle means. DRY is not about code, it is about duplication of concepts: any duplicated concept should be extracted/refactored/de-duplicated so as to reduce complexity. I just found a handy quote explaining DRY well:

the DRY Principle is actually about removing duplicate ideas and not duplicate code

Steven Solomon in Don't Repeat Yourself Is Misunderstood

If one takes three very similar pieces of simple functionality and extracts and combines them into one more complex piece of functionality that serves all three original usages in a generic fashion, that is likely to be increasing the complexity of the original code (and the extracted code). That's not a good application of the DRY principle. For one thing you now have three places in your codebase that are only similar, but have different variations, all coupled together. It also violates the Single Responsibility Principle.
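To sketch the smell (an invented example, not from any real codebase):

// one "generic" function straining to serve three only-superficially-similar callers
string function formatName(
    required string name,
    boolean upperCase = false,
    boolean withTitle = false
) {
    // each flag exists for just one caller, but every caller is now coupled to all of them
    var result = arguments.withTitle ? "Dr #arguments.name#" : arguments.name
    return arguments.upperCase ? ucase(result) : result
}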

Another way of saying much the same thing (now that I re-read it) is that just because multiple pieces of functionality seem superficially the same doesn't mean they are the same, and accordingly need to be the same implementation. For example if three pieces of code need to measure off the minutes in a day, it's OK for each of them to store the value 1440. We do not gain anything from extracting that into an AmountsOfTime::MINUTES_IN_A_DAY constant. It's quite possibly just coincidence that each piece of code is using "minutes in a day" for whatever they are measuring at that point in time, and tightly coupling them together is not appropriate. One might be "minutes until the task should run again", another might be "minutes to cache that thing", and the other might be… oh, I dunno: "minutes until the same time tomorrow" (OK that's dumb). Here the value is the same, and it's measuring the same thing, but the purpose of each is not actually inter-related. So it's OK to duplicate the simple concept of "1440 is the number of minutes in the day"; however, label it with why we need to know that value, not simply what the value is. Digression: I guess this is actually the same motivation behind not writing comments that describe what the code does, eg:

Pointless waste of time:

// re-run the task in 1440min
reRun(theTask, 1440)

Actually useful:


// don't do this again for 1440min because it needs to report on the whole day's activity
reRun(theTask, 1440)
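And back on the constants point: if a value really does warrant a name, name it for why it's needed, not what it is (these names are invented for illustration):

// each value is labelled with why we need it, not merely what it is
RERUN_INTERVAL_FOR_DAILY_REPORT = 1440
ACTIVITY_CACHE_TTL_MINUTES = 1440

reRun(theTask, RERUN_INTERVAL_FOR_DAILY_REPORT)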

On the other hand if we have two pieces of code that define how we apply (random invented example not at all related to my previous job. [cough]) the FX hedge to a value… we pretty much need that calculation - even though it's simply multiplying the input by a static multiplier - to be in only one place because it's a specific business rule, and it reduces complexity to have it in one place, and simply to call it from wherever it's needed.

class ForeignExchange {
    
    hedgeMultiplier = 1.05

    applyHedge(amount) {
        return amount * hedgeMultiplier
    }
}

The business rule is a simple multiplication expression, but it's to do with financial transactions, and its usage needs to be uniform (because customers notice and complain when it isn't. In theory. In this fictitious analogy I am using here. [another cough]). Plus, as we found out: we needed to make the expression slightly more complex than that. Again this comes back to the Single Responsibility Principle.

I think a lot of intermediate-level devs get into the trap of premature optimisation, and this can cause a misapplication of the DRY Principle: as soon as they find themselves starting to write similar code a second time, they immediately refactor it into a common location. Personally I think the shape that a refactoring might take only starts to surface on the third replication of the same code. When one is looking at a sample size of two when considering refactoring, it's pretty difficult to tell if it's a pattern, or if it's just a coincidence. Wait a bit longer to see if the pattern forms before refactoring stuff (ie: applying the DRY Principle).

Right. So that's 50/50 stuff I had already written, and a few more thoughts. It's a pretty short article for me, but, well: this is all I had to say so I ain't gonna spin it out further.

Bottom line: it's OK for code to be duplicated, if the code is simple.

Righto.

--
Adam

Thursday 2 December 2021

A question about DAO testing, abstraction, and mocking

G'day:

This is another "low-hanging fruit" kinda exercise (see also previous article: "A question about the overhead of OOP in CFML"). I'm extracting this one from the conversation in the CFML Slack channel so that it doesn't get lost. I have a growing concern about how Slack is a bit of a content-black-hole, and am cooling to the notion of helping people there. Especially when my answer is a fairly generic one. The question is a CFML-based one; the answer is just a general testing-strategy one. I say this so people are less likely to stop reading because it's about CFML ;-) Anyhoo: time for some copy and paste.

The question is a simple one:

Just dropping it in here. Unit testing DAO object using qb. How would you guys go about it or why not? Go!
Kevin L. @ CFML Slack Channel

I say "simple". I mean short and on-point.

Here's the rest of the thread. I've tidied up the English in some places, but have not changed any detail of what was said. I will prefix this saying I don't know "qb" from a bar of soap; it's some sort of DBAL. For the purposes of this, how it works doesn't really matter.


The participants here are myself, Kevin and Samuel Knowlton.

It depends on how you design this part of your application.

For me the interface between the application and the storage tier is via a Repository class, which talks to the higher part of the domain in objects and collections thereof: no trace of any persistence DSL at all (no SQLish language, for example).

For the purposes of testing, I abstract the actual storage-tier communications: in CFML this would be queryExecute calls, or calls to an external library like qb. I fashion these as DAOs if they contain actual SQL statements, or as an Adapter if it's something like qb.

So the repo has all the testable logic in it, the DAO or Adapter simply takes values, calls the persistence implementation, and returns whatever the persistence call returns. The repo converts that lot to/from domain-ready stuff for the application to use.

I'll just add an example of this for the blog version of this conversation:

selectFooById(id) {
    return queryExecute("
        SELECT col1, col2, col3, etc
        FROM some_table
        WHERE id = :id
    ", {id=id})
}

It's important to note that this method does nothing other than abstract the queryExecute call away from any application logic in the method that calls this. No manipulation of inputs; no remapping of outputs. Both of those are the job of the repository method that calls this DAO method. This DAO method only exists to remove the external call from the repo code. That's it.

To that end, there is no unit testable logic in the DAO. But I will do an integration test on it. For a CFML example: checking that if I call a method on the DAO, then I do get the kind of query I'd expect (correct columns).

I'd unit test the repo logic along the lines of the mapping to/from the domain/storage is correct, and any transformation logic is correct (converting datetime objects to just date-only strings and the like). I would test that if a DAO call returned a three-row query, then the repo returns a three-object collection, with the expected values.
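As a rough sketch of what I mean (Foo and the DAO method here are invented for illustration):

// FooRepository.cfc (sketch)
function getFoosByColour(required string colour) {
    records = variables.dao.selectFoosByColour(colour) // the DAO hands back the raw query

    foos = []
    records.each((record) => {
        // mapping logic: storage row -> domain object. This is what the unit tests target
        foos.append(new Foo(record.id, record.name, record.colour))
    })

    return foos
}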

NB: Repos don't have business logic; they only have mapping / transformation logic.

An initial task here might be to look at the DAO code and, even if it rolls-in transformation/mapping logic and persistence-tier calls, at the very least make sure it doesn't do both in the same method.

One thing to be careful to not do is to re-test qb. Assume it's doing its job correctly, and just test what your code does with the data qb gives you.

In that case my current tests are not correct at all… They felt wrong from the start, but I was struggling with how to test the DAO layer (which is abstracted through an interface). My DAO layer at this point does have a hard reference to / dependency on qb, which in that case violates some OO principles… This is a brainbreaker

Ah yes everything I mentioned above is predicated on using DI and an IoC container which should really be the baseline for any application designed with testing in mind (which should be every application).

(NB: DI is the baseline; using an IoC container to manage the DI is a very-nice-to-have, but is not essential).

You can "fake it til you make it" with the DI, by extracting hard references to qb object creation into methods that solely do that. You can then mock-out those methods in your tests so you can return a testable object from it.

So instead of this:

// MyRepository
component {

    getObjectCollection() {
        qb = new QB()
        
        raw = qb.getStuff()
        
        collection = raw.map(() => {
            // convert to domain model objects. this is what you want to test
        })
        
        return collection
    }
}

One has this:

// MyRepository
component {

    getObjectCollection() {
        qb = getQb()
        
        raw = qb.getStuff()
        
        collection = raw.map(() => {
            // convert to domain model objects. this is what you want to test
        })
        
        return collection
    }
    
    private getQb() {
        return new QB()
    }
}

And can test like this:

it("maps the data correctly", () => {
    sut = new MyRepository()
    mockbox.prepareMock(sut) // sut is still a MyRepository, but can selectively mock methods
    
    rawData = [/* known values that exercise the logic in the mapper */]
    sut.$("getQb").$results(rawData)
    
    collection = sut.getObjectCollection()
    
    expectedCollection = [/* expected values based on known values */]
    
    expect(collection).toBe(expectedCollection)
})

You could also extract the mapping logic from the fetching logic, and test the mapping logic directly. That's getting a bit bitty for my liking though. However do what you need to do to be able to separate the logic into something testable, away from code that isn't testable. It's always doable with a bit of refactoring, even if the refactoring isn't perfect.

[…] I already have a getQueryBuilder method in the DAO object. So that's a good starting point. How would you test update / create / deletion?

Two sets of tests:

Integration tests

Test that your app integrates with QB and the DB. Call the actual live method, hit the DB, and test the results.

So…

  • create a record using your app. Then use queryExecute to get the values back out, and test it's what you expect to have been saved.
  • Insert a row into the DB (either via your create mechanism, or via queryExecute). Update it via your app. Use queryExecute to get the values back out, and test it's what you expect to have been updated.
  • Insert a row into the DB (either via your create mechanism, or via queryExecute). Delete it via your app. Use queryExecute to try to get the values back out, and make sure they ain't there.

Unit tests

  • When you call your app's create method with [given inputs] then those inputs (or equivalents, if you need to transform them) are the ones passed to the expected method of the (mocked) object returned from your mocked getQueryBuilder call.
  • Same with update.
  • Same with delete.
it("deletes the data from storage", () => {
    testId = 17
    
    sut = new MyRepository()
    mockbox.prepareMock(sut)
    
    mockedQb = createMock("QB")
    expectedReturnFromDelete = "I dunno what QB returns; but use some predictable value"
    mockedQb.$("delete").$args(testId).$results(expectedReturnFromDelete)
    
    sut.$("getQueryBuilder").$results(mockedQb)
    
    result = sut.delete(testId)
    
    expect(result).toBe(expectedReturnFromDelete) // it won't have been returned unless testId was passed to QB.delete
})

That's a very simplistic test, obvs.

Oh: run all the integration tests in a transaction that you rollback once done.

Here the unit tests don't hit the actual storage, so are easier to write, maintain, and quick to run in your red/green/refactor cycle. You run these all the time when doing development work. The integration tests are fiddly and slow, so you only run those when you make changes to that tier of the app (DB schema changes that impact the app, generally; and then when putting in a pull request, merging it, and doing a build). It is vital the code being integration tested does not have any logic in it. Because then you need to unit test it, which defeats the purpose of separating-out yer concerns.

[…]To add our two cents: we use QB and Quick all the time in integration tests to make sure that objects are created, updated, or deleted the way we want. Sometimes we will use cfmigrations to pre-populate things so we are not doing a whole create/update/destroy cycle on each test run (which can be awesome, but also slow)

[Agreed]

A lot of information to take in. I'll take my time to read this through. Thank you @Adam Cameron and @sknowlton for taking your time to answer my questions!

NP. And ping away with any questions / need for clarification.


[a few days pass]

[…]Thanks to your input I was able to refactor the "dao" layer through test-driven design!

I do have a general testing question though. Let's assume I have method foo in a service with parameters x, y, z with the following pseudocode:

function foo( x, y, z ) {
  /* does something */
  var args = ...
  
  /* validate, throw, etc... */
 
  return bar( 
    argumentCollection = args 
  );
}

Should the unit test mock out the bar function and assert the args parameter(s) and / or throw scenarios, or actually assert the return value of the foo function? Or is the latter case more of an integration test rather than a unit test?

Sorry if this is a noob question, been trying to wrap my head around this…

A handy thing to do here is to frame the tests as testing the requirement, not testing the code.

Based on your sample pseudo code, then you have a requirement which is "it does the foothing so that barthing can happen". I have purposely not used the method names there directly; those are implementation detail, but should describe the action taking place. It could be "it converts the scores into roman numerals": in your example "it converts the scores" is the call to foo, and "into roman numerals" is the call to bar.

If you frame it like that, your tests will be testing all the variations of foothing and barthing - one variation per test, eg:

  • x, y, z are valid and you get the expected result (happy path)
  • x is bung, expected exception is thrown
  • y is bung [etc]
  • z is bung [etc]
  • say there's a conditional within bar… one test for each true / false part of that. The true variant might be part of that initial happy path test though.

This is how I read the pseudo code, given that bar looks like another method in the same class - probably a private method that is part of some refactoring to keep foo streamlined.

If your pseudocode was return dao.bar(argumentCollection = args), then I'd say that the DAO is a boundary of your application (eg, it is an adapter to a queryExecute call), and in this case you would mock-out dao.bar, and use spying to check that it received the correct arguments, based on the logic of foo. For example foo might multiply x by two... make sure bar receives 2x, etc
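Again, adding a sketch for the blog version of this conversation (all the names here are invented; it leans on MockBox's spying):

it("passes the doubled x to the DAO", () => {
    sut = new SomeService()
    mockbox.prepareMock(sut)

    mockedDao = createMock("Dao")
    mockedDao.$("bar") // spy on bar: we only care what it receives
    sut.$property(propertyName="dao", mock=mockedDao)

    sut.foo(x=21, y="y", z="z")

    // the call log records the arguments each mocked call received
    argsPassedToBar = mockedDao.$callLog().bar[1]
    expect(argsPassedToBar.x).toBe(42) // foo's logic doubled the x before handing it on
})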

Does that cover it / make sense? Am off to a(~nother) meeting now, so had to be quick there. I'll check back in later on.

It covers my question, Adam, thanks!

One thing I didn't have time to say... I used a DAO in that example specifically to keep my answer short. A DAO is by definition a boundary.

If it was something else like a RomanNumeralWriterService and it was part of your own application, just not internal to the class the method you're testing is in... it's not so clear cut as to whether to mock or use a live object.

If RomanNumeralWriterService is easy to create (say its constructor takes no or very simple arguments), and doesn't itself reach outside your application... then no reason to mock it: let it do its job. If however it's got a bunch of its own dependencies and config and will take a chunk of your test just to create it, or if it internally does its own logging or DB writing or reaching out to another service, or it's sloooow, then you mock it out. You don't want your tests messing around too much with dependencies that are not part of what yer testing, but it's not black or white whether to mock; it's a judgement call.

Yeah I was struggling with the thing you mention in your last paragraph. Service A has Service B as a dependency which itself has a lot of other dependencies (logging, dao, …). The setup for the test takes ages while the test method is 1 line of code :/.

Yeah, mock it out.

Make sure whatever you mock-out is well-test-covered as well in its own right, but it's not the job of these tests to faff around with that stuff.


That was it. I think that was a reasonably handy discussion, so was worth preserving here. YMMV of course: lemme know.

Righto.

--
Adam


Thursday 4 November 2021

A question about the overhead of OOP in CFML

G'day:

A question cropped up on the CFML Slack channel the other day. My answer was fairly long-winded so I decided to post it here as well. I asked the original questioner, and they are OK with me reproducing their question.

Again, I have a question to experienced OOP cfml coders. From the clean code concept I know I should break code into smaller (er even its smallest ) pieces. Is there any possible reason to stop doing that at a certain level in CFML? Eg. for performance reasons? Eg. lets assume I have a component named Car.cfc. Should I always break a Car.cfc component into Wheel.cfc, Engine.cfc, CarBody.cfc accordingly? Does the createObject behave like include files that would come with a certain overhead because of physical file request? What is when I also break Engine.cfc into many little pieces (and Wheel.cfc also)?
Andreas @ CFML Slack Channel

Here's my answer. I've tidied up the English in some places, but have not changed any detail of what I said.


Eventually there will be a meaningful overhead cost of creating a lot of objects.

Note that your comment about "behave like include files that would come with a certain overhead because of physical file request" isn't really accurate because the files are only read once, and after that they're in memory. Same applies with includes, for that matter. The process is (sort of):

  • code calls for an object to be created
  • if implementation for object is not found in memory, its code is loaded from disk
  • object is created
  • object is used
  • object is at some point cleaned up once it's not being referenced any more (at the discretion of the garbage collector)

That second step is only performed if needed, and all things being equal, will only be needed once in the lifetime of yer app in the JVM.

So don't worry about file system overhead; it's not going to be significant here.

Creating objects does come at a cost, and neither CFML engine has traditionally been particularly efficient at doing so (Lucee is better I believe; and CF is not as slow as it used to be). This could be a consideration at some point.

However performance considerations like this shouldn't be worried about until they start becoming an issue.

Design your application in a way that best represents the data and behaviour of your business domain. Make it modular, employing a sense of reusability and following the Single Responsibility Principle.

Keep an eye on your JVM. Use FusionReactor or something else similar. I suspect FR is the only game in town for CFML code; but there are other general JVM profiling tools out there as well which will do as good a job, but be Java-centric. If you see performance spikes: sort them out.

Load test your application with real-world load. This doesn't mean looping over object-creation one million times and doing "tada! It took x seconds": that means close to nothing and is not really a meaningful test. Use a load testing tool to load test your application, not your code. Back when I used to do such things, there was tooling that could re-run a web server log, so one could easily test with real-world traffic. This is important because of concurrency issues, which might cause locking bottlenecks and application slow-downs.

[I forgot to say this bit in my original answer]. Irrespective of the overhead of creating objects, these will be (orders of magnitude more ~) trivial compared to the overhead of a poorly-written DB query, or bad indexing, or bad locking of code, heavy (and possibly badly-designed) string processing etc. There's stacks of things I'd be worrying about before I wondered if I was calling new Thing() too often.

[cont'd…]

That said, don't go crazy with decomposing your domain models. A Car doesn't intrinsically need to have a collection of Wheel objects. It might just need to know the number "4" (number of wheels). Wait until there is behaviour or data needed for the wheels, and make sure to keep those elements separate in your Car class. At some point if there's a non-trivial proportion of data and/or behaviour around the Wheel implementation, or you need another sort of Car that has Wheels with different (but similar) properties: then extract the Wheels into a separate class.
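In other words, something like this is absolutely fine to start with (a sketch, obviously):

// Car.cfc - no Wheel class yet: the wheels are just data until they earn behaviour of their own
component {

    variables.wheelCount = 4

    public Car function init(required string make) {
        variables.make = arguments.make
        return this
    }
}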

Making sure you have all your domain requirements tested makes this sort of refactoring much safer, so that you can continually engineer and improve your domain design without worrying too much about breaking your client's requirements.


Andreas followed up with an observation of falling into the trap of using bad practices when writing one's code, due to not knowing what the good practices are (my wording, not his). To this, I've just responded:

I think doing stuff that is "bad practice" when you don't know any better is fine; it's when you do know better and still follow bad practice for invented reasons (fictitious deadlines, mythical complexity, general "CBA") that's an issue.

That said one can mitigate some of this by actively improving one's knowledge of what is good and bad practice. This is why I advocate all devs should - at a minimum - have read these books:

  • Clean Code - Martin
  • Head First Design Patterns (second ed) - various
  • Test-Driven Development by Example - Beck
  • Refactoring - Fowler

There's other ones like Code Complete (I couldn't wade through it I'm afraid) and The Pragmatic Programmer (am about 20% of the way through it and it's only slightly engaging me so far) which other people will recommend.

One should make sure one is comfortable with the testing framework of choice for one's environment, and with testing one's code. Whether done before via TDD, or even just afterwards: testing is essential to writing good, stable, scalable, maintainable code.


I'm pretty sure there's not an original thought here, but hey: most of my writing is like that, eh? Anyway, there you go.

Righto.

--
Adam

Thursday 29 April 2021

Definitions of ~

G'day:

I'm just building on some thoughts here. Some of the thoughts are from learning from people who know a lot more than me, and from watching my teams through their successes and… "less than successes".

When we work, whether we realise it or not, the intent is to deliver value to some client. The client could be an external user who we hope will hand money over to us as a result of our efforts. Or the client might be the boss and their latest revenue-making idea. Or the client might be our own application that is in need of some love because the codebase is not exactly as good as we'd like it to be. But there is always a client, and the end result of our work needs to benefit the client. Sometimes it's hard to get a handle on what the client wants, and what exactly we need to do to solve their problems and to add value.

But all our work needs to add value. And that value needs to be able to be both measured and realised.

To do this, when we set-out to do & deliver some work, we need a coupla benchmark points along the way. Most notably before we even agree to start, a "Definition of Ready"; and before we agree it's finished: a "Definition of Done".

My notes below are from an engineer's perspective, and there's no doubt some nuance I'm missing, or the points are specific to situations I've been in. It's not really a generic template, but a theoretical example of what might go into these documents. But to get our work done, I figure we ought to be considering some super/sub -set of these kind of things. These points represent a statement of the obvious, and an aide-mémoire for stuff that might be less obvious.

Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119. I like including this, because it makes sure these words that might be read as being fairly loaded are just used as a reference vocab. I'm not looking at you when I say this.

Definition of Ready

  • A story SHOULD represent effort that fits in one sprint (ie: from "Ready" to "Done" in one sprint).
  • Within the above constraint, some estimation of complexity, risk and effort SHOULD be indicated.
  • A Behaviour Driven Development (BDD) phraseology SHOULD be used for User Stories. This is so we actively and deliberately make clear each case that needs to be addressed to fulfil the requirement, and to reduce ambiguity.
  • Inbound and outbound parameters (eg, for a new web page or API end point: route slug, query string params, request method, POST body, headers, cookies, anything else) MUST be clearly defined, where the definition MUST include any validation constraints (eg: "[some date] must be in the past").
  • Where relevant, outbound status codes SHOULD also be detailed (200-OK, 201-CREATED, 400-BAD-REQUEST etc). For non-2xx responses, error information (or lack thereof when there are TMI concerns) SHOULD be defined if appropriate.
  • Logging requirements of non-2xx statuses SHOULD be defined. EG: 404s probably nothing. Some 400s: maybe? 401 & 403: probably? 5xx: definitely.
  • Interactions with external systems - eg DBs - SHOULD be considered.
  • If the story represents partial work, consideration MUST be made as to how it will be deployed in an "unavailable state" to external users (eg: what feature toggles are in place, and the criteria for toggle on/off).
  • Similarly, consideration SHOULD be made as to how the partial work can be internally QAed (eg: feature toggle rules include the environment, so that features behave "live" in a QA environment).
  • For stories that are investigative rather than development-oriented, BDD phraseology SHOULD still be used.
  • For investigative stories, there MUST still be a deliverable ("we have a clear idea how to integrate Varnish on the front end to remove load from the application servers on seldom-changing content"). The deliverable MUST be tangible (eg: written down and filed), not just "oh just google some stuff and get an idea how it works".
  • For stories related to bugs: steps to reproduce, expected behaviour and actual behaviour, any error messages and log entries SHOULD be included in the ticket detail, if possible.
  • For stories relating to UI changes, an example of how it will look SHOULD be included on the ticket. This MAY NOT be pixel-perfect, just indicative.
  • Engineering MUST have an idea of how any legacy code changes will have automated testing. This is because we accept legacy code has not been written with testing in mind, but we still MUST test our work.

One thing to note here is that all this stuff requires collaboration from the client and the squad. Some stuff is business need, some stuff is technical fulfilment. For example one would not expect the client to know about DB rollback minutiae, but it's our job to know that it might need to be done, and how to do it. We are their technical advocates and advisors. And gatekeepers. And backstops. And there might be a toner cartridge needing to be changed. Shrug. You know the deal. But it's an important deal.

But anyway… if some work meets a "Definition of Ready", then a squad MAY consider it to be added into the workflow for an upcoming release.

For some work to be considered completed, then we need to consider if it's been "done".

Definition of Done

  • There SHOULD be development guidelines (whatever they are) and they MUST be followed.
  • There SHOULD be automated code quality checks, and they MUST pass.
  • All new code SHOULD have automated tests, and test coverage MUST pass.
  • Code relating to other processes such as DB updates and rollbacks SHOULD be in source control and/or readily to hand.
  • In feature-toggled situations, both toggle-on and toggle-off states MUST be tested.
  • Work MUST have been QAed and actively signed-off as good-to-go.
  • Work MUST be actively accepted by the client (whoever the client is, and that could well be a colleague).
  • Work that is under feature toggle MAY be considered "done" even if toggled off, provided it's passed QA in the "off" state.
  • Work MUST be in production to be considered "done".

The bottom line here is particularly relevant for devs: you've not finished your work until it's live. It's not done when you've finished typing. It's not done when it's in code review. It's not done when it's in QA. It's not done when yer bored of it and would rather be doing something else. It's only done when the client sees the work and accepts it. You are on-the-hook until then, and it's your responsibility until then. You should be actively chasing whoever is necessary to get your work into production and earning value, because if it's not: there's no point in having done it.

For the squad as a whole, every person has a part in the squad's work seeing the light of day. Make sure you're doing what you can to expedite work getting in front of the client and to help them either sign it off, or kick it back for revision.

There will be more factors on both sides of this table, measuring whether work ought to be done, or is done. My point here is more that it needs to be a series of active checks along the way. We don't undertake work if we can't see the light at the end of the tunnel, and we don't actually know it will benefit our client. And we need to think about this all the way.

Let me know what other considerations there might be in this. I'm very web-app-centric in my programming / business / leadership (such as it is) exposure, and there's probably more - or fewer! - things to think about.

Righto.

--
Adam

Monday 7 May 2018

In defence of public properties

G'day:
Recently I was re-writing one of our cron job scripts. It's the sort of thing that runs at 2am, reads some stuff from the DB, analyses it, and emails some crap to interested parties. We usually try to avoid complete rewrites, but this code was written... erm... [cough insert some period of time ago when it was apparently OK to write really shit code with no code quality checks and no tests]... and it's unmaintainable, so we're biting the bullet and rewriting coherently whilst adding some business logic changes into it.

The details for the email - to, from, subject, data to build the body from - come from three different sources in total, before sending to the mailer service to be queued-up for sending. So I knocked together a simple class to aggregate that information together:

class SpecialInterestingReportEmail {

    public $from;
    public $to;
    public $subject;
    public $body;

    public static $bodyTemplate = "specialInterestingReportEmail.html.twig";
    
    function __construct($from, $to, $subject, $body) {
        $this->from = $from;
        $this->to = $to;
        $this->subject = $subject;
        $this->body = $body;
    }
}

I then have a factory method in an EmailMessageService (not entirely sure that's the best name for it, but [shrug]) which chucks all the bits and pieces together, and returns an email object:

function createSpecialInterestingReportEmail($special, $interesting) {
    $body = $this->twigService->render(
        SpecialInterestingReportEmail::$bodyTemplate,
        [
            "special" => $special,
            "interesting" => $interesting
        ]
    );
    return new SpecialInterestingReportEmail(
        $this->addressesForReport->from,
        $this->addressesForReport->to,
        $this->reportSubject,
        $body
    );
}

I don't think this design is perfect, but in the context of what we're doing it's OK.

Notice:
All interaction I mention below with my colleagues is paraphrased, embellished, or outright changed to make my point. I am not quoting anyone, and take this as a work of fiction. It's also drawn from previous similar conversations I've had on this topic.

One of my code reviewers was horrified:

Code Reviewer: "You can't have public properties! That's bad OOP!"
Me: "Why?"
CR: "You're breaking encapsulation. You need to do it like this [proceeds to tutor me on how to write accessor methods, because, like, I needed a tutorial on design anti-patterns]".
Me: "Why? What are we gaining by making those properties private, just to force us to write a bunch of boilerplate code to then access them? And how is that a good change to make to this code that now exists and is fulfilling the requirement? This is code review... we've already done the planning for this. This class is just a single container to aggregate some data into a conceptual 'email message' object. It doesn't need any behaviour - it doesn't need any methods - it's just for moving data between services".
CR: "But you can't break encapsulation!"
Me: "Why not? Other than parroting dogma read from some OOP 101 book, why is this a problem? And how's it solved by accessor methods?"
CR: "Well someone could directly change the values to be wrong!"
Me: "Well... they probably shouldn't do that then. That'd be dumb. They could equally put the wrong values straight into the constructor too. It'd still be just as wrong. Look... we don't need to validate the data - that seems to be your concern here? - as it's just a script reading known-good values from the DB, and sending them to our email queue. The code's right there. There's no scope for the data to accidentally be wrongified. And if it was... the tests would pick it up anyhow".
CR: "But what if other code starts using this code?"
Me: "What... yer saying 'what it some other application starts using this cronjob as some sort of library?' Why would that happen? This is not a public API. It makes no pretence of being a public API. If anyone started using this as an API, they deserve everything they get".
CR: "But they might".
ME: "OK, let's say some of this code is gonna be needed somewhere else, and we need to make it into a library. At that point in time, we'd extract the relevant code, consider stuff like encapsulation, data hiding, providing a public interface etc. But that would be different code from this lot".
CR: "But you should write all code like it will be used that way".
Me: "Again: why? This code is not going to be used this way. It just isn't. And anyhow, what yer suggesting is YAGNI / coding-for-an-unknown-future anyhow. We don't gain anything for the purposes of the current requirement chucking pointless boilerplate code into these classes. That's not an improvement".

And it went on.

One problem I encounter fairly frequently - both at work and in the wild - is people who will read about some concept, and thenceforth That Concept Is Law. Someone has Said It, therefore it applies to every single situation thereafter. They don't ever seem to bother trying to reason why the "law" was stated in the first place, what about it makes it a good rule, and when perhaps it's not necessary.

I look at these things and think "will this benefit my current task?". Generally it does because these things don't acquire acceptance without scrutiny and sanity-checking. But sometimes, however, it doesn't make sense to follow the dogma.

In this situation: it does not help.

In a different situation, if I was writing a separate API which handled the Email object creation, and other apps were gonna use it, I'd've possibly tightened-up the property access. But only possibly. My position on such things is to be liberal with what one permits to be done with code. If all my theoretical accessor method was gonna achieve was to return a private value... I'd really consider just leaving it public instead, and allow direct access. Why not?

There's a risk that later I might need to better control access to those properties, but... I'd deal with that at the time: these things can be handled as/when. It's even easy enough to have transitionary code from direct-access to accessor-access using __get and __set. I know these are frowned-upon, but in transitionary situations: they're fine. So one could seamlessly patch the API for code already consuming it via direct access with that transitionary approach, and then in the next (or some subsequent ~) version, make the breaking change to require using accessors, eg:

v1.0 - direct access only.
v1.1 - add in transitionary code using __get and __set. Advise that direct access is deprecated and will be removed in the next release. Also add in accessor methods.
v2.0 - remove direct access.

It doesn't even need to be v2.0. Just "some later version". But for the sake of making sure the transitionary code is temporary, better to do sooner rather than later. The thing is that software versioning is there for a reason, so it's OK to only introduce necessary coding overhead when it's needed.

Another thing that occurred to me when I was thinking about this. Consider this code:

$email = [
    "from" => "from@example.com",
    "to" => "to@example.com",
    "subject" => "Example subject",
    "body" => "Example body"
];

$fromAddress = $email["from"];

Perfectly fine. So how come this code is not fine:

$email = new Email(
    "from@example.com",
    "to@example.com",
    "Example subject",
    "Example body"
);

$fromAddress = $email->from;

Why's it OK to access an array directly, but it's not - apparently - OK to give the collection of email bits a name (Email), and otherwise use it the same way?

I can't think of a reason.

Rules are great. Rules apply sensibly 95% of the time. But when one reads about a rule... don't just stop there... understand the rule. The rules are not there simply for the sake of enabling one to not then think about things. Always think about things.

Righto.

--
Adam

PS: completely happy to be schooled as to how I am completely wrong here. This is half the reason I wrote this.

Sunday 23 July 2017

PHP: A function that returns two different things

G'day:
I'm gonna plagiarise one of me own answers to a Stack Overflow question here, as I think it's good generic advice.

The question was "PHP 7: Multiple function return types". The subject line sums it up really. The person here wants to return a value or false from a function. I see this a lot in PHP code. Indeed... PHP itself does it a lot.

My answer was as follows:

[...]

From a design perspective, having a function that potentially returns different types of result indicates a potential design flaw:

  • If you're returning your result, or otherwise false if something didn't go to plan; you should probably be throwing an exception instead. If the function processing didn't go according to plan; that's an exceptional situation: leverage the fact. I know PHP itself has a habit of returning false if things didn't work, but that's just indicative of poor design - for the same reason - in PHP.
  • If your function returns potentially different things, then it's quite possible it's doing more than one thing, which is bad design in your function. Functions should do one thing. If your function has an if/else with the true/false block handling different chunks of processing (as opposed to just handling exit situations), then this probably indicates you ought to have two functions, not one. Leave it to the calling code to decide which is the one to use.
  • If your function returns two different object types which then can be used in a similar fashion in the calling code (ie: there's no if this: do that; else do this other thing in the calling code), this could indicate you ought to be returning an interface, not a concrete implementation.

There will possibly be legit situations where returning different types is actually the best thing to do. If so: all good. But that's where the benefit of using a loosely typed language comes in: just don't specify the return type of the function. But… that said… the situation probably isn't legit, so work through the preceding design considerations first to determine if you really do have one of these real edge cases where returning different types is warranted.

I'll add two more considerations here.

There's another option which is kinda legit. In PHP 7.1 one can declare the return type to be nullable.

Consider a function which potentially returns a boolean:

function f($x) {
    if ($x) {
        return true;
    }
}

We can't declare that with a return type of bool because if our logic strays down that falsey branch, we get a fatal error:

function f($x) : bool {
    if ($x) {
        return true;
    }
}

f(1); // OK

f(0); // PHP Fatal error:  Uncaught TypeError: Return value of f() must be of the type boolean, none returned


But in 7.1, one can declare the return type as nullable:

function g($x) : ?bool {
    if ($x) {
        echo "returning true";
        return true;
    }

    echo "returning null";
    return null;
}

g(1); // returning true
g(0); // returning null

Now sometimes this is legit. One might be looking for something from some other resource (eg: a DB record) which quite legitimately might not exist. And there might not be a legit "null implementation" of the given class one is wanting to return: for example returning the object with some/all of its properties not set or something. In this case a null is OK. But remember in this case one is forcing more logic back on the calling code: checking for the null. So not necessarily a good approach. I'd consider whether this situation is better handled with an exception.

The last consideration is that one can still specify multiple return types in a type-hint annotation, eg:

/**
 * @param bool $b
 * @return A|B
 */
function f($b){
    if ($b) {
        return new A();
    } else {
        return new B();
    }
}


This doesn't actually enforce anything, so it's mostly a lie, just like all comments are, but at least one's IDE might be able to make sense of it, and give code assistance and autocomplete suggestions when writing one's code:



[screenshot: IDE autocomplete offering the methods from both an A and a B]

See how it offers both the method from an A and a B there.

Still: I really dislike these annotations as they're really just comments, and especially in this case they're lying: f can actually return anything it likes.

Right... must dash. I am supposed to be down @ my ex's place hanging out with my son in 10min. I'll press "send" and proof-read this later ;-)

Righto.

--
Adam

Saturday 23 January 2016

Dependency injection strategy discussion

G'day:
Our dev department is spread across two campuses: at our head office in Dublin, and the one I'm based at in London.

Yesterday (Friday) the London bods had our weekly Dev Team catch-up, to discuss various issues that cropped up during the week which need more concerted discussion than an "ad hoc make a decision and get on with it" situation might resolve.

One of the chief discussion points is our dependency injection strategy, and the question was raised if we're doing it right, as it seems there were a few code smells creeping in in places.

As coincidence would have it... when we got out of the meeting a few of us had an email in our inbox from one of our mates in Dublin who was amidst much the same dilemma in some code he was working on.

This resulted in a further discussion between the London people, rounding out the afternoon. I think we clarified a few things really well, re-formalised our DI strategy in our heads, and identified some code we need to look at to work out what the actual cause of the smell is. What was really good is that at the end of it we all agreed with where we got to, and the direction to take. As we have a few people in the team who like holding fairly strong opinions ([raises his hand gingerly]), this is encouraging.

The gist of the conversation is worth repeating I reckon, and I hope the relevant parties don't mind. And I reckon it shows our team in pretty good light so I reckon the bosses won't mind either. I'm sure I'll find out if not!

OK, so - as you probably know - we're a PHP shop. For the main project I work on (www.hostelbookers.com) - we have just finished retiring the old ColdFusion-based site ("The end of ColdFusion"), replacing it with a PHP one running on the Silex micro-framework. Silex is built around Pimple, Sensio's DI Container. I've written a bit about Silex and Pimple if you want to get the gist of it.

Very quickly, Pimple implements DI via the notion of service providers, one of which might look like this (this is sample code from an earlier blog article: it's not code from our own app, nor is any code in this article):

namespace dac\silexdemo\providers;

use Silex\ServiceProviderInterface;
use Silex\Application;
use dac\silexdemo\beans\services;
use GuzzleHttp\Client; // whichever HTTP client: it's just illustrative

class ServiceProvider implements ServiceProviderInterface {

    public function register(Application $app){
        $app["services.user"] = $app->share(function($app) {
            return new services\User($app["factories.user"], $app["services.guzzle.client"]);
        });        

        $app["services.guzzle.client"] = function() {
            return new Client();
        };
    }
    
    public function boot(Application $app){
        // no booting requirements
    }
}


There's a register() method within which one defines all the objects and their dependencies, and a boot() method for code that needs to be run on a defined object before it's used. As all of these are defined using closures, no objects are actually created until they're first needed. Cool.
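To labour that lazy-loading point, here's a wee sketch (the Logger is made up for the purpose):

$app["services.logger"] = $app->share(function() {
    echo "creating logger"; // runs on first access only, and only ever once
    return new Logger();
});

$logger1 = $app["services.logger"]; // "creating logger" echoed here
$logger2 = $app["services.logger"]; // same instance back; the closure is not re-run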

In this example I'm defining some services, but we define everything via service providers: services, controllers, repositories, factories, helpers etc.

We're defining the user service object there, and you can see that its constructor expects a coupla dependencies:

function __construct($userFactory, $guzzleClient){
    $this->userFactory = $userFactory;
    $this->guzzleClient = $guzzleClient;
}

From there, all the code in UserService can use the dependency to help out with its logic. Fairly standard stuff.
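At the calling end, getting the fully-wired service out of the container might look something like this (the route, getById() and the json() usage are illustrative rather than from any real code):

$app->get("/user/{id}", function(Application $app, $id) {
    // just ask the container for the service: its dependencies are already wired up
    $user = $app["services.user"]->getById($id);
    return $app->json($user);
});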

The chief conceit here is that we definitely use configuration over convention, and we configure our objects explicitly with the dependencies they need. UserService needs a UserFactory and it needs a GuzzleClient, so we specifically pass those in.

Aside: a word on configuration over convention

I really dislike the "convention" approach to pretty much anything like this... DI config... routing config etc. The reason being that the framework should simply busy itself with frameworking stuff: it should not be dictating to me how I write my code. That is not the job of the framework (be it a DI one or an MVC one). Also conventions are very opinionated, and we all know what is said about opinions. Frameworks that implement convention over configuration position themselves as if the framework is the central element of the application, whereas it really ought to be some code that sits off to one side, and just gets on with it. My code is what is central to the application.

People will claim that it's easier to use convention over configuration, but I don't believe "easier" should be a primary consideration when designing one's app. Especially when it's "easier to use the framework", rather than "intrinsically good design".

But I digress.

This is fine, and there are no alarm bells ringing there.

Well, one slight alarm might be that, say, the object we're defining has some actual OO-type properties that want setting. This is a really contrived (not to mention wrong) example, but the hypothetical sketch below illustrates it:
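// hypothetical: configuring data-ish properties on the service via the
// container, rather than just wiring in its dependencies
$app["services.user"] = $app->share(function($app) {
    $service = new services\User($app["factories.user"], $app["services.guzzle.client"]);
    $service->apiKey  = $app["config"]["api.key"]; // this is data, not a dependency
    $service->timeout = 30;                        // ditto: starting to smell
    return $service;
});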

Saturday 22 November 2014

My answer to a question on Stack Overflow about DI

G'day:
I'm re-posting this here, hoping to get feedback from the DI doyens like Sean, Brad and Luis. Plus it's an easy article to post whilst I get together my article about Dave's Python code (from the quiz: "Something for the weekend? A wee code quiz (in CFML, PHP, anything really...)").

The question is here: "Dependency injection is only for testing?", and I'll reproduce it here (I'll ask the OP if this is OK, and remove/summarise if not):

Thursday 9 October 2014

Proposed TDD logic flow

G'day:
I've been really busy this week, and haven't been able to discover much interesting about either PHP or CFML or anything, hence being quite quiet. I'll admit this is very much a filler article, and a bit of a cheeky nod to something Andy said the other day:

[screenshot: what Andy said]

Earlier I asked for people's opinions regarding TDD vs private methods: "The received wisdom of TDD and private methods".

I didn't get much feedback [scowl], but thanks to Dom and Gerry (there's a cat 'n' mouse joke in there somewhere) for offering up some thoughts.

I needed to provide a workflow for the team, and I thought I'd stick it up here as well, as a bit of closure on the previous article. And to give Andy a picture to look at.

What do you think of this approach (other than "unreadable at that size": for which, click here)?

[flowchart: proposed TDD workflow, including the private-method process]

Forget the first two steps ("Assign Ticket", etc): that's just our Jira workflow and a bit of context for our peeps; it's everything after that which matters. Also note the associated process for how to maintain private methods whilst still adhering to TDD.

I think there's a reasonable mix of pragmatism and dogmatism in that?

Thoughts?

--
Adam

Monday 29 July 2013

Javascript: explicit arguments or config object? When to use which?

G'day:
Just to go off on a tangent... I'm doing Javascript this week, so the puzzling stuff I encounter will probably be JS rather than CFML.

Whilst I know a decent amount about CFML, my JS is... limited. I know enough to get into trouble... I know enough to know when I'm in trouble... but not necessarily enough to get me back out of trouble again.

Monday 15 April 2013

Do you have your code reviewed?

G'day:
I've just been chatting to Chris Weller this afternoon about various odds 'n' sods, and the topic of code review came up. I recalled that Stack Exchange has a code review sub-site, but the CFML presence on it is pretty limp: 12 questions in two years. I think it's a pretty good idea though. I've added it to my RSS feed, and will see what people post. And I'll put my oar in as needs must ;-)

Sunday 24 March 2013

Hungarian Notation (revisited)

G'day:
Ages ago (in the context of the lifetime of this blog, anyhow... Aug last year)  I wrote an article on my thoughts regarding Hungarian Notation. This fell back on my radar as someone has just posted a comment against that article.

Friday 21 December 2012

Need help? Know how to ask for it

G'day:

This article is borne out of exasperation arising from dealing with people on StackOverflow. But this time not the nazi muppets who like to think they are in a position of authority (and they're at it again today…), but the people asking the questions. To be fair: not all the people asking the questions, but a reasonable percentage of them. An uncomfortably high percentage of them. And the same extends to people asking questions on the Adobe ColdFusion forums, and other similar environments created for people to seek and get help.

I never cease to be amazed at how bloody useless people are at asking for help. And I continue to marvel (not in a good way) at how disconnected from reality people must be to approach asking for help the way they go about it. And even before getting to the point of asking for help, how lacking in gumption people can be when it comes to trying to solve their own problems before expecting someone else to pitch in and help them.

Before I start ranting too much (some might say "too late!" but I've only just started, trust me)… I enjoy participating on help forums. I've been doing it since I first located the alt.comp.lang.coldfusion Usenet group in 2001 (I started with CF in late 1999).  I enjoy the feeling I get when I've successfully helped someone move past a problem (both for myself, and knowing that the other bod will be pleased to move on from this stumbling block they'd encountered); I like the selfish sense of satisfaction I get when I work something out that someone else couldn't work out (I'm not proud of this, but I'm a realist, and it is a factor); most of all I like it when I initially don't know the answer to something I'm trying to help with, and end up learning something as a result. It also helps the ColdFusion community keep ticking over, which Adobe kinda leave up to us to do, so that's gotta be good for ColdFusion's viability going forward. Albeit in a very small way. But community activity is important to the viability of the language, I think.

Anyway, as much as I enjoy it, boy do I find it irritating at times. You probably got that already.

OK, so what's the problem? Actually the problem is basically described by inference in this distillation of how to ask questions the smart way. It should be required reading for anyone signing up to a help forum. It should be required reading for everyone working in technology. Go read it. Now. But here's my take on where people go wrong when they encounter a problem they don't know the answer to - and they start going wrong right from the outset.

Identify the problem

If you encounter a problem - like getting a CF error on your screen - approach solving it coherently. A lot of people, when they get bamboozled by an error, simply start trying random things: "I'll restart CF and see if that helps". "I'll use a list instead of an array". "I'll just rewrite it, maybe it'll go away". This is a daft approach. Even if the problem does happen to go away, you probably won't know what it was, so you can't be sure it wasn't some transient problem that simply didn't recur after your remedial action, because some contributing factor wasn't present when you tested. Plus isn't it nice to know what caused a problem so you don't do it again, rather than just masking it? Also, random actions like this generally don't help, and can make things worse, so attacking things incoherently is just a waste of time in a situation in which you're already probably not wanting to "invest" your time. This just increases frustration.

I use scientific method when I'm faced with a problem I can't nut out. Well a layperson's version of it anyhow. The process I use is basically this:

  1. Make an observation, eg: the manifestation of the error on the screen, or the unexpected results, or whatever it is.
  2. Form a hypothesis of what could be causing the error ("ah… that application-scoped variable has vanished").
  3. Identify the manifestations / side-effects of the situation being hypothesised ("well if that variable has been blitzed, I'll be able to tell by looking in this other place").
  4. Formulate an experiment to test the hypothesis, giving consideration to the bits identified in (3), as well as making sure the results of the experiment will conclusively demonstrate that the hypothesis is borne out. EG: simply restarting CF is not going to be a good experiment for testing a missing application scoped variable, as it doesn't actually identify what caused the problem in the first place. Basically you need measurable results (not just a nebulous change in state).
  5. Perform the experiment, and aggregate the results & any side-effects noticed from it, which might need to be tuned out in subsequent rounds of experimentation.
  6. Draw a conclusion based on the results. Do the results of the experiment bear out the hypothesis? Quite possibly not, but they should help to fine-tune the next test of the hypothesis, or alter the hypothesis itself.
  7. Rinse and repeat as necessary.

Now I'm not suggesting one needs to don a labcoat and deploy the pocket protectors to do all this (but hey, if that's your outfit already, all good ;-).  All it means is stop and give some thought to what you're doing, work out a coherent way to test the situation and approach things logically.

Most people I know don't do this: they just throw their hands up in despair (or wave them about in a jazz-hands sort of way) and post all their code to StackOverflow going "doesn't work".

And most people really struggle with step 4 there. Judging by the code people end up posting on the forums, they don't generally have a concept of a portable reproduction case. If - for example - you are having an issue with a JDBC error coming back from a <cfquery> tag, the only things that are relevant are:

  • the SQL statement being passed to the DB driver;
  • any variables that contribute to the construction of said string;
  • the values of any parameters being passed;
  • the data source (and, by inference, which DB server one is using, eg: Oracle, MySQL etc);
  • depending on the nature of the error, the schemas of any tables etc being queried;
  • the error message;
  • what the expected results were ("this query is supposed to return [whatever]" or similar)

What is not relevant to any of this is:

  • any code after the <cfquery> tag;
  • any code before the <cfquery> tag that doesn't contribute to the points above;
  • never ever ever any "code" other than CFML. HTML is never going to be relevant to a CF error.

But what I often see posted to forums (or maybe in pastebin etc if the person is remotely sensible in that regard) is the contents of the entire file the error occurs in, including all mark-up, CSS, JS, and unrelated CFML. The immediate effect this has on me is to think:

  • they've done bugger-all of their own troubleshooting if they haven't actually factored-out any of that from contributing to the situation;
  • it's not occurred to them that by posting all that cruft, the person reading the issue is kinda forced to wade through it if only to find the bit they're talking about;
  • if they can't be arsed doing any troubleshooting, why would they think I should be arsed either?

Now… back to my mention of a repro case. What one should do in this situation is pull the code that's causing the problem completely out of the context it's running in, mock the variables it uses as inputs, decide what one would expect to get as a result, and then run the code (there's a wee sketch of this after the list below). There will be three possible results of this:

  1. If the code works, then one's assumptions as to what the problem is are actually incorrect.  It suggests that one of the inputs is not what one thought it is. That needs to be tracked down and then shift the repro case to that code to work out what the story is. This is forward progress because it has fine-tuned one's understanding of the issue.
  2. If the code fails, but in a different way from the initial error, then one hasn't replicated the inputs properly. This needs to be fixed, and the experiment repeated. In doing this, it will clarify what's going on and what's contributing to the problem. The very fact that one's initial baseline expectation of the inputs and results was off suggests that there's something missing from the analysis of the situation. This is a forward step in solving the problem.
  3. If the code errors in exactly the same way as it was in its original context, then one has a repro case, and one has greatly reduced the factors that could be contributing to it. From here, it's generally pretty easy to work out what's wrong. Almost all issues that initially seemed "weird", "like a bug in ColdFusion", or "just don't make any sense" get solved at this point.
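By way of example - and sketching in PHP for brevity, with entirely made-up values - a repro case needn't be any more than this:

// in situ, $booking came from a DB query; for the repro we mock it with
// known values, including the suspected edge case
$booking = ["total" => 100.0, "nights" => 0];

// expectation: a nightly rate of some sort. Run just the suspect logic:
$ratePerNight = $booking["total"] / $booking["nights"];
echo $ratePerNight;
// division by zero: a warning (or error, on more recent PHP versions) right
// there, minus all the surrounding mark-up and unrelated code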

If one is still at stages 1-2, then one is not yet ready to post to a forum for help. Once one gets to stage 3 - and still hasn't cracked the problem - one's got a nice terse piece of code with known inputs and outputs which can be posted to a forum. At this point one will have a far better understanding of what's going on, can articulate the experimentation that has already been done and discarded as not contributing, and there's a small piece of code that someone trying to help can copy & paste down to their local system and run without having to horse around too much. It's also demonstrated that a reasonable amount of effort has already gone into trying to sort it out one's self, which will be a positive sign to anyone thinking of helping. Plus a lot of potential help suggestions will have already been tested and discarded, saving the people helping some time. It's an all-round win.

Also, if perchance the repro case demonstrates a ColdFusion bug, you've already got a repro case to send to Adobe. Excellent.

Describe the problem

A lot of this will have been covered already if the approach above has been taken. But what I mean here is too many people start a thread / question and basically say something along the lines of "I've got this code and it doesn't work. Why doesn't it work?"  The issue here is the person reading it doesn't know what constitutes "not working". It could mean there's an error. But one still doesn't know what sort of error (compile error, logic error, SQL error, etc); it could mean one was expecting ten rows back from the DB and is only getting one; it could mean the table is outputting ten rows by five columns when one wanted five rows of ten columns. The point is: one needs to identify what constitutes "not working". Seriously, this should be bloody obvious, but for some reason it is not.

Furthermore, if one has got an error, one needs to post the bloody error message. Do not describe the error (yes, people do this: "oh I got some error about the array being wrong"). And if there's code involved (OK, obviously with an error, code will be involved! ;-), post the code and identify which line the error is saying is a problem.

Pay attention

Actually… let's back up. Before posting the error… read the bloody thing. So many questions I see are asking what the problem is where the error message is saying exactly what the problem is, and identifying the precise line and column the problem is at (usually this is with compile errors). Even if they're not that explicit, ColdFusion errors are generally very descriptive, and do pretty much identify what's gone wrong. I also encounter people who say "oh, I never understand that stuff" (seriously!), but when challenged with "OK, which word are you struggling with?" and "OK, even if you've not heard the term before: what do you think a 'complex object' [for example] might be?", they realise it's just a case of never having bothered to pay attention to the error message, rather than it being impenetrable in some way. I would say 80% of errors I get asked about can be solved by simply reading the error message.

There is no such thing as telepathy

People also have a tendency to assume that just because they know something (or are aware of something about their environment), everyone else magically will. This is not the case. No-one else is privy to the internal machinations of your mind, nor will they know your environment (or even your ColdFusion version, etc). One needs to make sure that any environmental consideration or other peripheral factor is actually articulated. People might be able to guess, but they probably can't. Either way: they shouldn't be made to guess. If nothing else it's a barrier to them helping you, which is counterproductive for you. Consider this problem (this is actually something that I have had to deal with in the past):

SELECT        *
FROM        someTable
WHERE        someColumn = #someValue#
ORDER BY    anotherColumn
LIMIT        5

My DSN is MySQL, but this errors for some reason. If I run the same query in MySQL it works fine.

The problem isn't the filter or the table schema or anything like that. It's that this code wasn't a DB query: it was a QoQ (query-of-queries) on an earlier-fetched record set. The initial data came from MySQL, sure… but it didn't occur to the person that a QoQ is not run on the DB server; it's run on CF's own little SQL stub, and that doesn't support LIMIT.

Now the person should have just read the damned error message (and posted it), but at the very least they should have mentioned that this wasn't a DB query.

There is no such thing as telepathy (redux)

Another thing you need to do is to research the issue, once you've identified it. This means googling error messages, programming constructs you're struggling with, and anything else relevant. And if one still needs to ask for help (TBH, googling will solve 80% of problems: very few problems are unique), then part of your question should detail what sort of research you've already done. Because if you don't mention it, then the first valid suggestion one ought to offer you is "did you google it?". Some of you might think "well of course I will have researched it first, that goes without saying", but let me assure you: you're the good 'uns. Most people are bloody useless, and all they do is use StackOverflow and other forums as a kind of mechanical turk: to try to get other people to be their search engines. So let us know what you've already found. Also let us know what things you've found which might seem like a solution, but turned out not to be: this will save us making the same suggestion to you, and it's also useful information to fine-tune your issue.

Most errors are your fault

A lot of people posting questions start off from a fairly defensive position that whatever is going wrong, it's not their fault. They'll say "I don't know what's wrong, because my code works fine", or "I've found a bug in ColdFusion". I'm sorry, but almost always this is not the case. 99% of the time… perhaps more… it's a problem with the code or the approach, or something like that. ColdFusion might have a number of bugs, but they're generally pretty esoteric these days, so the odds of someone who is having a problem with a listAppend expression having suddenly found a new bug in listAppend are rather small.

The problem I have with this way of thinking is that it's much easier to discover what a problem is if one starts with the mindset that it's one's self who's cocked up, rather than someone else or Adobe. Firstly because it almost certainly is one's self; but secondly because if one is troubleshooting in a defensive way, going "well I know that works, so it can't be that", one's not going to spot that actually it is that code that's got the problem. I think it's also a logical fallacy that "well my code works, so it can't be that, it must be something else": the evidence that has led one to decide something is wrong will be manifesting from a line in one's own code, so - really - even on first indications that's probably where the problem lies.

Approach this stuff with… um… an open mind (or just a neutral mind… just not a closed mind), and one will be in the best place mentally to find the problem.

Also one comes across as a bit of a cock if one starts asserting it's a bug in CF when it happens to be that one doesn't know the difference between listFind and listContains (or just didn't spot one had used the wrong one). And it's best to not look like a cock if possible, yes? ;-)


Well that was a nice rant for a Friday just before Xmas, wasn't it?

I'm off to Ireland from tomorrow through until Dec 27 - I get to see my son for a dozen or so hours in that time… thankfully 2h of which is on Xmas Day - so I dunno how much attention this blog will get across that period. On one hand I'll have my laptop with me, but on the other hand finding internet connectivity is tricky. But I'm mostly stuck out in the middle of nowhere with little else to do, so perhaps wittering on will pass the time. Who knows?  I'll at least get part 3 of the regex stuff posted at some stage in the next coupla days.

In case this is it before you head off to do festive things that are more interesting than hanging on my every blog-written word, have a good break. And make sure to eat too much. And have a beer for me.

Merry Xmas.

--
Adam

Thursday 6 December 2012

Constants for CFML

G'day:

Updated 2024-08-30

Reworded to avoid my misuse of the word "singleton" which I was using to mean "an object that is created once and reused", which is not a singleton. A singleton is about implementation, not usage. This was pointed out to me by John Whish of the CFML community, during a conversation we were having about misuse of that term.

This one is just an idea I had for ColdFusion 11 (or Railo... just "for CFML" really). Well it's not an idea that's particular to me, nor is it a new one. Or an earth-shattering one. Or one that will sell any licences in and of itself. It would be handy though.

CFML has no notion of named constants, EG:

const PI = 3; // near enough ;-)

Whilst one can create variables, not all values are supposed to be variable. For example π doesn't change: it's not a variable. It doesn't vary.
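By way of comparison - and this is bog-standard PHP, nothing exotic - named constants look like this:

// standalone constants: either of these
define("ANSWER", 42);
const GREETING = "G'day";

// class constants
class Circle {
    const PI = 3.14159265358979;

    public static function area($r) {
        return self::PI * $r * $r;
    }
}

echo GREETING;        // G'day
echo Circle::area(2); // 12.566370614...
// ANSWER = 43;       // this would be a parse error: constants don't vary

The language itself enforces that the value doesn't vary, which is rather the point.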