Wednesday 24 July 2013

Stupid (but trivial) bug with htmlEditFormat()

G'day:
I was putting Adam Tuttle's htmlDecode() UDF through the approval process just before, and found a daft (and admittedly pretty inconsequential) bug with htmlEditFormat() whilst I was about it.

When I was testing the UDF, I read in an HTML file I had, encoded it, decoded it, and did a compare() on the two strings. The result was a mismatch (ie: the test failed). Knowing Adam is fairly clued-up, my immediate reaction was not "Adam... what have you done wrong?" (be that Adam Tuttle in his UDF or Adam Cameron in his testing), it was "Adobe... what have you done wrong?" Especially when I ran the same test on Railo and it worked fine.

I refined the test rig as follows:

<!-- src.html -->
<!doctype html>
<html lang="en">
    <head>
        <meta charset="utf-8">
        <title>This & that</title>
    </head>
    <body>
        <h1>The heading</h1>
        <p>The content</p>
    </body>
</html>

And the test:

// htmlEditFormat.cfm
original = fileRead(expandPath("./src.html"), "UTF-8");

encoded = htmlEditFormat(original);

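function htmlDecode(HTML){
    // &amp; goes last: replaceList() works through the list in order, so decoding it any earlier could double-decode text like "&amp;quot;"
    return replaceList(arguments.HTML, "&lt;,&gt;,&quot;,&amp;", '<,>,",&');
}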
decoded = htmlDecode(encoded);

compared = compare(original,decoded);

if (compared != 0){
    loopTo = min(len(original), len(decoded));
    for (i=1; i <= loopTo; i++){
        cOriginal = mid(original, i, 1);
        cDecoded = mid(decoded, i, 1);
        match = cOriginal == cDecoded;
        writeOutput("#cOriginal#:#cDecoded# : (#asc(cOriginal)#:#asc(cDecoded)#): #match#<br>");

        if (!match){
            loopTo = min(i+10, loopTo);    // stop soon, but continue for a bit to give some context
        }

    }
}

On Railo this outputs nothing, which is correct as there's only output if the strings don't match. On ColdFusion (9.0.1), I get this:

<:< : (60:60): YES
!:! : (33:33): YES
d:d : (100:100): YES
o:o : (111:111): YES
c:c : (99:99): YES
t:t : (116:116): YES
y:y : (121:121): YES
p:p : (112:112): YES
e:e : (101:101): YES
: : (32:32): YES
h:h : (104:104): YES
t:t : (116:116): YES
m:m : (109:109): YES
l:l : (108:108): YES
>:> : (62:62): YES
: : (13:10): NO
:< : (10:60): NO
<:h : (60:104): NO
h:t : (104:116): NO
t:m : (116:109): NO
m:l : (109:108): NO
l: : (108:32): NO
:l : (32:108): NO
l:a : (108:97): NO
a:n : (97:110): NO
n:g : (110:103): NO

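Note how my source file has CRLF as its line separator, but the string that has been through htmlEditFormat() has just LF: character 16 is a CR (13) in the original but an LF (10) in the round-tripped version, and everything is out by one from there on. Obviously looking at Adam's UDF, it was not the culprit, but to prove it was CF being daft, I added this code: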

loopTo = len(encoded);
for (i=1; i <= loopTo; i++){
    char = mid(encoded, i, 1);
    code = asc(char);
    writeOutput("#char#:#code#<br>");
    if (code == 10){
        loopTo = i + 5;
    }
}

And this output:

&:38
l:108
t:116
;:59
!:33
-:45
-:45
:32
s:115
r:114
c:99
.:46
h:104
t:116
m:109
l:108
:32
-:45
-:45
&:38
g:103
t:116
;:59
:10
&:38
l:108
t:116
;:59
!:33
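
If eyeballing character codes isn't your thing, a more compact check is to count the CRs on either side of the htmlEditFormat() call. This is only a sketch, reusing the original and encoded variables from the test above; countCRs() is a made-up helper name, not anything built in:

```cfml
// count the carriage returns (chr(13)) in a string by measuring
// how much shorter the string gets once they are all removed
function countCRs(str){
    return len(arguments.str) - len(replace(arguments.str, chr(13), "", "all"));
}

writeOutput("CRs in original: #countCRs(original)#<br>");
writeOutput("CRs in encoded: #countCRs(encoded)#<br>");
```

If htmlEditFormat() is leaving the line endings alone, the two numbers will match.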

So... for some reason best known to itself... ColdFusion thinks it knows best when it comes to line breaks, and decides I was being silly using CRLF, and LF is the way all the cool kids are doing it these days.

Sigh. Seriously, ColdFusion... just do what yer told. Please.
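
In the meantime, the round-trip test can be made tolerant of this by normalising the line endings on both strings before comparing them. A minimal sketch, assuming LF-only is acceptable for the purposes of the comparison; normaliseLineEndings() is a hypothetical helper, and not part of Adam's UDF:

```cfml
// hypothetical helper: collapse CRLF to LF so both strings use the same line separator
function normaliseLineEndings(str){
    return replace(arguments.str, chr(13) & chr(10), chr(10), "all");
}

// original and decoded are the variables from the test rig above
compared = compare(normaliseLineEndings(original), normaliseLineEndings(decoded));
writeOutput("compare() after normalising line endings: #compared#");
```

This only hides the difference for the sake of the test: it does nothing to make htmlEditFormat() preserve the CRs.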

I don't have ColdFusion 10 to play with, so I cannot test this myself. Could someone with CF10 run this code except using encodeForHtml() instead of htmlEditFormat(), and report back? Cheers.


--
Adam