Wednesday 17 July 2013

CFML: My CFLib approval process

G'day:
A week or so ago there was some derision expressed that when approving UDFs submitted to CFLib we might change the code (for whatever reason). There was also derision expressed at how long it had thusfar taken to approve that person's UDF (it's still in the queue, and as far as I'm concerned: there it will stay for the foreseeable future... pretty much until everything else in the queue is cleared).

But due to this conversation I figured I might explain what it is I do when I approve a UDF. I cannot vouch for the process that Ray undertakes when he approves a UDF... this is simply the process I undertake. And I might perform some, all or none of these steps, as I see fit when doing the approving.

Firstly, the code submitted to CFLib is, IMO, not "your" code, it's CFLib's code. Which we will then do with as we like, with the ultimate goal of releasing a well-polished UDF for public usage. Your code is still there, on the machine in front of you. That's yours. Do with it what you like.

So what do I do when I check a UDF for approval?

Do I think the UDF has merit?

This is purely subjective. If the UDF is selectRandomColourForMyLittlePony(), I'll probably ignore it and leave it in the queue. Basically I try to estimate how broad the need for a given UDF would be before selecting it for triage. This isn't to say selectRandomColourForMyLittlePony() would be rejected - we have a very low bar as to what we'll approve for CFLib - it just means I'll bench it in favour of triaging more widely-useful UDFs first.

Does the UDF cover ground already covered by another UDF?

You'd be surprised (or maybe you wouldn't be) how many people don't bother to check to see if their UDF has already been implemented by someone else. I'd say this caters for about 10% of our submissions. A UDF doesn't need to be exactly the same as an existing one for me to reject it, if they cover mostly the same ground just say with a slightly different approach (but where the differences aren't a useful distinction to have), I'll reject it.

If, on the other hand, a new UDF covers the same ground as an existing one, but does it better: I'll improve the existing one based on the improvements in the new one, crediting the work of the new UDF's author for the improvement.

Do I understand what it sets out to do?

Sometimes a UDF is a CFML port of an existing recognised algorithm - like getHaversineDistance() - or perhaps calculates some "real world" value, for example isVIN(). I didn't know either what a "haversine distance" was, nor what a "VIN" is before sitting down in front of the UDF code. So I needed to find out. This takes some googling and either understanding the process (in the case of a VIN, how it was composed), or find some prior art from a reputable source. Then I can move on to the next step.

Does it work?

So having understood the intent of the UDF, the next thing I check is whether or not it runs, with a fairly predictable / default set of inputs. You'd be surprised (or, again, perhaps not...) how many submitted UDFs don't even work with the example code the person has posted. This test is not exhaustive test, it's just one to see how much more work I have in front of me to get the thing into a publishable state. If the code for the UDF is simple and I see the problem straight away, I'll fix it. Or if the UDF is more complex but would be something I think CFLib would benefit from, I'll roll up my sleeves and undertake to sort out what the problem is. If it's neither of those, I just reject it.

Does it do what it claims it does?

A variation on the above is whether the UDF actually achieves what it sets out to do. If someone submits a function calculateCircumference(), but uses 3 instead of p, it's not doing what it says. That's an egregious example, but not that egregious for some of the submissions I've seen. Again, if the fix is easy or the UDF would be useful, I'll queue it up for a fix job.

Does it rely on other UDFs?

If so: reject.

Does it have an appropriate name?

Sometimes the name doesn't really simply describe what it does, or it's spelled wrong, or it's unnecessarily abbreviated etc. I fix that. Similarly sometimes the function names have underscores in them, which is just (IMO), an awful style to use, so my_udf() will become myUdf(). Similarly with variable names. This is one of the few things in the CFLib style guide. Most everything else is left up to the individual approver's discretion.

Does it do the right amount of work?

I cover this idea in a separate article: "When functions do too much, and how to get them to do more". A function should do one thing. One rule of thumb I have which I'll repeat here is that the function name should describe what the function does. And if I feel that the function name needs the word "and" in it: it's doing too much.

Is the code stable?

An example of this would be unVARed local variables (or not using the local scope... people still seem to err towards VARing though. This is my preference too, so that's cool with me). Another example will be whether it relies on optional arguments actually existing, or a specific range of values for arguments, otherwise it'll fail.

Is the logic sound?

I've seen UDFs which take in a list as an argument, and then covert it to an array and then use the array exclusively. if this is the case... make the argument an array! Sometimes - for example with a list function - it makes sense to take a list as an argument. But if the function merely needs to take a collection of simple values, and the best way to deal with the data is via CFML's array functions: use an array.

Similarly, if an argument needs to be a positive integer for it to work sensibly, then simply specifying an argument type of "numeric" is inadequate validation. It will need to have some positive-integer validation on that argument too. Sometimes it's OK to let the code just break at the point the requirement becomes relevant, but generally the UDF should reject invalid values.

Is the code as efficient as it could be?

Sometimes the idea behind the UDF is a good 'un, but the implementation is poor. Like looking through a string to achieve something a single regex operation would effect.

This also extends to this sort of thing:

var resultVar = someSimpleOperation();
return resultVar;

There's no point creating that variable, as it's never really "used". So this would be fine:

return someSimpleOperation();

There's a balance to be struck here though...

Is the code clear?

As well as being functional, the code needs to be maintainable (and understandable). In the example above the modified code is no less understandable than the original version, so that's an expedient optimisation. However sometimes I encounter unduly long compound statements which can't be unpicked on first look, so I will break those down into separate statements, using sensibly-named intermediary variables. In addition to this, variable names (and the UDF name!) should be clear, which means most abbreviated ones need de-abbreviation. Except when the abbreviation is an industry-recognised one, like GUID or UTC or something like that. For example if there's a variable scoreInc, I will change that to be scoreIncrement.

Is the code in tags?

For pete's sake... it's seldom appropriate to use tags when writing UDFs. Use tags iff:
  • The UDF requires using CFML functionality only available in tags;
  • You're doing text processing which benefits from CF being able to intermingle code with text.
If you're not doing either of those, don't use tags. I don't mean just in the context of CFLib... I mean at all.

I will accept tag-based functions if they fulfill all other requirements without modification. Otherwise I'll rewrite them. With a grumpy scowl on my face. But I'll first look for any excuse to reject them.

Thorough testing

If I get this far, it's a UDF I'm interested in getting working and publishing on CFLib. So I'll polish up the code, and test all reasonable scenarios that the original author claimed the UDF should work in. And if there are any shortfalls: fix 'em if I can work out the problem, or refer back to the author if not. If I find a problem that would take me more than [some arbitrary amount of time, dependent on the situation] to fix, I will reject the UDF but contact the author to see if they can sort it out and re-submit.

This testing phase can take a while... each optional argument needs testing for being present and not present, etc (and combinations thereof), similarly conditional statements need true/false options tested etc. It would be a big help if people included thorough tests with their submissions (or a link to where their unit tests are!), but people seldom... OK... never do this. Anything submitted to CFLib ought to be unit tested before submission, really, to save me having to do it. Well: any code you write should be unit tested before you use it. So it should be no problem including the unit tests along with your submission.

Approved

So once I wade through all that, I'll approve the UDF.

I'd say about 5% of UDFs get through unchanged. I like those ones. About 20% are good ideas, but I end up pretty much re-writing them. But most fall within that range, generally erring towards being almost spot-on.

But hopefully you can see that it's not a 5sec job, and all I have to do is go "oh yeah, that'll be just fine [clicks submit]". I'd say it'd never be less than an hour's work, and has sometimes been more like 3-4h work to test things thoroughly.

If you're the impatient type, and your UDF needs to be up on CFLib now, then write it in a fashion appropriate for a general audience, test it thoroughly, and provide me the tests. And that'll expedite things, as it minimises how much of my time I have to invest in doing this work for you ;-).

And keep 'em coming. This might sound like a gripe, but I do like working through the queue, and getting new UDFs up on CFLib.

Cheers for your ongoing community input in this, btw!

--
Adam