Adam Cameron's Dev Blog: Feedback on UDF submitted to CFLib

G'day:
This is just some feedback on one of the UDFs I just approved on CFLib (convertBBCodeToHtml()). I changed the function a fair bit, so figured it might be helpful to look at what I did.

Here's the original function:

public function gethtml(required string text){
        var theList = '\[i\](.*?)\[\/i\],<i>\1</i>
                \[b\](.*?)\[\/b\],<strong>\1</strong>
                \[u\](.*?)\[\/u\],<u>\1</u>
                \[s\](.*?)\[\/s\],<strike>\1</strike>
                \[code\](.*?)\[\/code\],<pre>\1</pre>
                \[img\](.*?)\[\/img\],<img src="\1" alt="" />
                \[quote\](.*?)\[\/quote\],<blockquote><p>"\1"</p></blockquote>
                \[quote="(.*?)"\](.*?)\[\/quote\],<blockquote><p>\2<br /> - \1</p></blockquote>
                \[style size="(.*?)"\](.*?)\[\/style\],<span style="font-size:\1">\2</span>
                \[style color="(.*?)"\](.*?)\[\/style\],<span style="color:\1;">\2</span>
                \[list\]\s\[\*\],<ul><li>
                \[\*\],</li> <li>
                \[\/list\],</li></ul>
                \[table\],<table style="border:1px solid black;">
                \[\/table\],</table>
                \[tr\],<tr>
                \[\/tr\],</tr>
                \[td\],<td>
                \[\/td\],</td>
                \[url\](?:http(s)?:\/\/)?(.*?)\[\/url\],<a href="http\1://\2" target="_blank">http\1://\2</a>';
        var theReturn = ''; var theLine = ''; var theLeft = ''; var theRight = '';
        
        //fileObj = fileRead(ExpandPath('comps/bbcode.txt'));
        
        for(i=1; i lte listLen(theList, chr(10)); i++){
                theLine = ListGetAt(theList, i, chr(10));
                theLeft = ListFirst(Trim(theLine));
                theRight = ListGetAt(Trim(theLine), 2);
                arguments.text = ReReplaceNoCase(arguments.text, '#theLeft#', '#theRight#', 'All');
        }
        return arguments.text;
}

And here's the version I released:

public string function convertBBCodeToHtml(required string text, array substitutions){
    param array substitutions = [
        {match="\[i\](.*?)\[\/i\]", replace="<i>\1</i>"},
        {match="\[b\](.*?)\[\/b\]", replace="<strong>\1</strong>"},
        {match="\[u\](.*?)\[\/u\]", replace="<u>\1</u>"},
        {match="\[s\](.*?)\[\/s\]", replace="<strike>\1</strike>"},
        {match="\[code\](.*?)\[\/code\]", replace="<pre>\1</pre>"},
        {match="\[img\](.*?)\[\/img\]", replace='<img src="\1" alt="" />'},
        {match="\[quote\](.*?)\[\/quote\]", replace='<blockquote><p>"\1"</p></blockquote>'},
        {match='\[quote="(.*?)"\](.*?)\[\/quote\]', replace="<blockquote><p>\2<br /> - \1</p></blockquote>"},
        {match='\[style size="(.*?)"\](.*?)\[\/style\]', replace='<span style="font-size:\1">\2</span>'},
        {match='\[style color="(.*?)"\](.*?)\[\/style\]', replace='<span style="color:\1;">\2</span>'},
        {match="\[list\]\s*\[\*\]", replace="<ul><li>"},
        {match="\[\*\]", replace="</li> <li>"},
        {match="\[\/list\]", replace="</li></ul>"},
        {match="\[table\]", replace='<table style="border:1px solid black;">'},
        {match="\[\/table\]", replace="</table>"},
        {match="\[tr\]", replace="<tr>"},
        {match="\[\/tr\]", replace="</tr>"},
        {match="\[td\]", replace="<td>"},
        {match="\[\/td\]", replace="</td>"},
        {match="\[url\](?:http(s)?:\/\/)?(.*?)\[\/url\]", replace='<a href="http\1://\2" target="_blank">http\1://\2</a>'}
    ];
    
    for(var substitution in substitutions){
        text = reReplaceNoCase(text, substitution.match, substitution.replace, "All");
    }
    return text;
}

I've highlighted various issues I've addressed:

the function name. getHtml() didn't - to me - really describe what the function did. "Get HTML? From where? What HTML?" Whereas convertBBCodeToHtml() is immediately clear as to what it's for.
The original function used a list for the substitution values. Avoid lists where at all possible; they're a pretty crappy (pseudo ~) data type, and perform poorly. Instead I've used an array, which...
... is a lot more wieldy than horsing around extracting elements from the list.
Also the name theList (and theLeft and theRight) are not very descriptive variable names. It's not simply a list, it's a series of substitution matches and replacements, so I've renamed them accordingly.
Don't forget to var all your variables (admittedly this variable isn't in the reworked version, but it's an important observation anyhow).
I don't think there's merit in scoping the arguments here. It's just clutter, and arguments are present in the function-local scope as well anyhow.
No need for the quotes and pound signs here; just the variable name itself is fine.
Lastly I noticed we are only substituting a subset of well-known BBCodes here, so - rather than trying to capture them all in the inbuilt array, I made the array an argument with a default value of the supplied ones. This makes it easier for uses to make different substitutions if they need to. Note: I know I could have put a default value for the argument directly, but due to the size of the value I decided against that.

As part of the approval process I knocked together a few superficial unit tests:

// TestConvertBBCodeToHtml.cfc
component extends="testbox.system.BaseSpec" {

    function run(){
        include "udfs/convertBBCodeToHtml.cfm";

        describe("Tests for convertBBCodeToHtml()", function(){
            it("is handles an empty string", function(){
                expect(
                    convertBBCodeToHtml("")
                ).toBe("");
            });
            it("handles an isolated substitution", function(){
                expect(
                    convertBBCodeToHtml("[i]italicised[/i]")
                ).toBe("<i>italicised</i>");
            });
            it("handles multiple same substitutions", function(){
                expect(
                    convertBBCodeToHtml("[i]italicised[/i][i]italicised again[/i]")
                ).toBe("<i>italicised</i><i>italicised again</i>");
            });
            it("handles different substitutions", function(){
                expect(
                    convertBBCodeToHtml("[i]italicised[/i][b]bold[/b]")
                ).toBe("<i>italicised</i><strong>bold</strong>");
            });
            it("handles leading, embedded and trailing text", function(){
                expect(
                    convertBBCodeToHtml("text before[code]code[/code]text between[quote]quote[/quote]text after")
                ).toBe('text before<pre>code</pre>text between<blockquote><p>"quote"</p></blockquote>text after');
            });
            it("handles complex substitution", function(){
                expect(
                    convertBBCodeToHtml("[list][*] item 1[*] item 2[/list]")
                ).toMatch("\s*<ul>\s*<li>\s*item 1</li>\s*<li>\s*item 2</li></ul>\s*");
            });
            it("handles multi-line substitution", function(){
                expect(
                    convertBBCodeToHtml("
                        [list]
                            [*] item 1
                            [*] item 2
                        [/list]
                    ")
                ).toMatch("\s*<ul>\s*<li>\s*item 1\s*</li>\s*<li>\s*item 2\s*</li></ul>\s*");
            });
            it("handles custom substitution", function(){
                expect(
                    convertBBCodeToHtml("[foo]moo[/foo]", [{match="\[foo\](.*?)\[\/foo\]",replace="<foo>\1</foo>"}])
                ).toMatch("<foo>moo</foo>");
            });
        });
    }

}

I mention this solely as it should be part of the process of submitting any UDF to CFLib (Ray should be insisting people provide tests for these, to save me having to write them!). And, yes, I know that's not complete coverage, and I did not test all the regexes. Life's too short.

Finally, just as a proof of concept, here's an approach using Array.reduce() instead of a loop:

public string function convertBBCodeToHtmlUsingReduce(required string text, array substitutions){
    param array substitutions = [
        {match="\[i\](.*?)\[\/i\]", replace="<i>\1</i>"},
        {match="\[b\](.*?)\[\/b\]", replace="<strong>\1</strong>"},
        {match="\[u\](.*?)\[\/u\]", replace="<u>\1</u>"},
        {match="\[s\](.*?)\[\/s\]", replace="<strike>\1</strike>"},
        {match="\[code\](.*?)\[\/code\]", replace="<pre>\1</pre>"},
        {match="\[img\](.*?)\[\/img\]", replace='<img src="\1" alt="" />'},
        {match="\[quote\](.*?)\[\/quote\]", replace='<blockquote><p>"\1"</p></blockquote>'},
        {match='\[quote="(.*?)"\](.*?)\[\/quote\]', replace="<blockquote><p>\2<br /> - \1</p></blockquote>"},
        {match='\[style size="(.*?)"\](.*?)\[\/style\]', replace='<span style="font-size:\1">\2</span>'},
        {match='\[style color="(.*?)"\](.*?)\[\/style\]', replace='<span style="color:\1;">\2</span>'},
        {match="\[list\]\s*\[\*\]", replace="<ul><li>"},
        {match="\[\*\]", replace="</li> <li>"},
        {match="\[\/list\]", replace="</li></ul>"},
        {match="\[table\]", replace='<table style="border:1px solid black;">'},
        {match="\[\/table\]", replace="</table>"},
        {match="\[tr\]", replace="<tr>"},
        {match="\[\/tr\]", replace="</tr>"},
        {match="\[td\]", replace="<td>"},
        {match="\[\/td\]", replace="</td>"},
        {match="\[url\](?:http(s)?:\/\/)?(.*?)\[\/url\]", replace='<a href="http\1://\2" target="_blank">http\1://\2</a>'}
    ];
    return substitutions.reduce(function(text, substitution){
        return reReplaceNoCase(text, substitution.match, substitution.replace, "All");
    }, text);
}

So that's just a single statement :-)

I love these new iteration functions.

That's it. Not a very awe-inspiring article, but hopefully explains where I'm coming from with those changes.

--
Adam

Friday, 20 June 2014

Feedback on UDF submitted to CFLib