Wednesday 5 October 2022

DRY: don't repeat yourself


This should be a short one. I've had this text lying around for a while, wondering if I could spin it out to be a longer article somehow, but I never managed to work out how. Then a few days ago I needed to point to something about "DRY" (so I didn't have to… erm… repeat myself), and got annoyed that I didn't have this article already. I guess articles don't need to be long, if the point itself doesn't need much discussion. So here goes.

Everyone has heard about the DRY Principle, I would hope. From that same article on Wikipedia:

The DRY principle is stated as "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system".

The article goes further to explain it more thoroughly, but I think that's about as far as most intermediate-level devs ever read, and they get the wrong end of the stick. They perceive it to mean "never write the same code more than once", and they stick to that dogmatically, de-duping their code at every step and making an unholy mess of unnecessary complexity along the way. But this is not what the DRY Principle means. DRY is not about code, it is about complexity of concepts. Anything conceptually complex should be extracted/refactored/de-duplicated so that it's defined in only one place, with the aim of reducing complexity. I just found a handy quote explaining DRY well:

the DRY Principle is actually about removing duplicate ideas and not duplicate code

Steven Solomon in Don't Repeat Yourself Is Misunderstood

If one takes three very similar pieces of simple functionality and extracts and combines them into one more complex piece of functionality that serves all three original usages in a generic fashion, that is likely to increase the complexity of the original code (and the extracted code). That's not a good application of the DRY principle. You now have three places in your codebase that are only similar, each with its own variations, all coupled together through the one shared implementation. This violates the Single Responsibility Principle, for one thing.
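To make that concrete, here's a minimal sketch in TypeScript. The function names and the formatting rules are invented purely for illustration; they're not from any real codebase.

```typescript
// Hypothetical example: three simple, independent date formatters.
function formatInvoiceDate(d: Date): string {
    return d.toISOString().slice(0, 10); // date part only, eg 2022-10-05
}

function formatLogDate(d: Date): string {
    return d.toISOString(); // full ISO timestamp
}

function formatFilenameDate(d: Date): string {
    return d.toISOString().slice(0, 10).replace(/-/g, ""); // eg 20221005
}

// The over-eager "DRY" version: one function that has to know about
// every caller's variation, so all three usages are now coupled together.
type DateFormatOptions = { includeTime: boolean; stripDashes: boolean };

function formatDate(d: Date, options: DateFormatOptions): string {
    let result = options.includeTime ? d.toISOString() : d.toISOString().slice(0, 10);
    if (options.stripDashes) {
        result = result.replace(/-/g, "");
    }
    return result;
}
```

The three small functions are each trivially readable. The combined one needs an options type, branching, and a re-test of every caller each time one of them wants something slightly different.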

Another way of saying much the same thing (now that I re-read it) is that just because multiple pieces of functionality seem superficially the same doesn't mean they are the same, and accordingly need to share the same implementation. For example if three pieces of code need to measure off the minutes in a day, it's OK for each of them to store the value 1440. We do not gain anything from extracting that into an AmountsOfTime::MINUTES_IN_A_DAY constant. It's quite possibly just coincidence that each piece of code is using "minutes in a day" for whatever it is measuring at that point in time, and tightly coupling them together is not appropriate. One might be "minutes until the task should run again", another might be "minutes to cache that thing", and the other might be… oh, I dunno: "minutes until the same time tomorrow" (OK that's dumb). Here the value is the same, and it's measuring the same thing, but the purposes are not actually inter-related. So it's OK to duplicate the simple concept that 1440 is the number of minutes in a day. However, label it with why we need to know that value, not simply "what the value is". Digression: I guess this is actually the same motivation behind not writing comments that describe what the code does, eg:

Pointless waste of time:

// re-run the task in 1440min
reRun(theTask, 1440)

Actually useful:

// don't do this again for 1440min because it needs to report on the whole day's activity
reRun(theTask, 1440)
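The same idea applied to the constants themselves might look something like this. It's only a sketch: the constant names and the nextRunAt function are made up for the example, and the reasons in the comments are assumptions standing in for whatever the real reasons would be.

```typescript
// Hypothetical names: each constant says why the code needs the value,
// not just what the value is.
const MINUTES_BETWEEN_REPORT_RUNS = 1440; // the report covers a whole day's activity
const MINUTES_TO_CACHE_RESULT = 1440; // assumed: the cached data is refreshed daily

const MILLISECONDS_PER_MINUTE = 60 * 1000;

// Works out when the task should next run, based on its last run.
function nextRunAt(lastRun: Date): Date {
    return new Date(lastRun.getTime() + MINUTES_BETWEEN_REPORT_RUNS * MILLISECONDS_PER_MINUTE);
}

// Works out when a cached result stops being usable.
function cacheExpiresAt(cachedAt: Date): Date {
    return new Date(cachedAt.getTime() + MINUTES_TO_CACHE_RESULT * MILLISECONDS_PER_MINUTE);
}
```

If the cache lifetime later drops to an hour, only MINUTES_TO_CACHE_RESULT changes, and the report schedule is unaffected. With a shared MINUTES_IN_A_DAY constant, that change would mean untangling the two usages first.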

On the other hand if we have two pieces of code that define how we apply (random invented example not at all related to my previous job. [cough]) the FX hedge to a value… we pretty much need that calculation - even though it's simply multiplying the input by a static multiplier - to be in only one place because it's a specific business rule, and it reduces complexity to have it in one place, and simply to call it from wherever it's needed.

class ForeignExchange {
    hedgeMultiplier = 1.05

    applyHedge(amount) {
        return amount * this.hedgeMultiplier
    }
}

The business rule is a simple multiplication expression, but it's to do with financial transactions, and its usage needs to be uniform (because customers notice and complain when it isn't. In theory. In this fictitious analogy I am using here. [another cough]). Plus, as we found out: we needed to make the expression slightly more complex than that. Again this comes back to the Single Responsibility Principle.
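Here's a rough sketch of what "call it from wherever it's needed" could look like, in TypeScript. The class follows the shape of the one above with types added; the two calling functions (quotePrice and invoiceTotal) are hypothetical, invented just to show two usages sharing the one rule.

```typescript
// The business rule lives in exactly one place.
class ForeignExchange {
    private readonly hedgeMultiplier = 1.05;

    applyHedge(amount: number): number {
        return amount * this.hedgeMultiplier;
    }
}

// Hypothetical call site one: quoting a single price.
function quotePrice(fx: ForeignExchange, basePrice: number): number {
    return fx.applyHedge(basePrice);
}

// Hypothetical call site two: totalling an invoice, then hedging the total.
function invoiceTotal(fx: ForeignExchange, lineAmounts: number[]): number {
    const subtotal = lineAmounts.reduce((sum, amount) => sum + amount, 0);
    return fx.applyHedge(subtotal);
}
```

Neither caller knows what the multiplier is or how the hedge is calculated. When the expression needs to become more complex, it changes inside applyHedge and both usages stay consistent with each other.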

I think a lot of intermediate-level devs get into the trap of premature optimisation, and this can cause a misapplication of the DRY Principle: as soon as they find themselves starting to write similar code a second time, they immediately refactor it into a common location. Personally I think the shape that a refactoring might take only starts to surface on the third replication of the same code. When one is looking at a sample size of two when considering refactoring, it's pretty difficult to tell if it's a pattern, or if it's just a coincidence. Wait a bit longer to see if the pattern forms before refactoring stuff (ie: applying the DRY Principle).

Right. So that's 50/50 stuff I had already written, and a few more thoughts. It's a pretty short article for me, but, well: this is all I had to say so I ain't gonna spin it out further.

Bottom line: it's OK for code to be duplicated, if the code is simple.