Thursday 12 June 2014

CFHTTP, presumption, rubber ducks and CRLFs

G'day:
Here's another one that's just "dumb shit I did today".



I am intrigued by the list of bugs fixed in ColdFusion 11, as per this well-hidden list "Adobe ColdFusion Splendor (codename) & Adobe ColdFusion Thunder (codename) Release Notes" (PDF). Part of my intrigue is that it's homed on download.macromedia.com. WTF? But the other intrigue is how many ColdFusion 10 bugfixes have been released in ColdFusion 11, and whether Adobe are planning to also fix them in ColdFusion 10. My suspicion is that they're gonna mostly go "nuh-uh... upgrade if you want those puppies", but I want to come up with a subset of issues which are really significant, and present them to Rakshith - hopefully with the community behind me - and go "fix these ones in ColdFusion 10, 'please'".

First I need to get the bug details down from the bugbase and review them. That PDF doesn't even have links in it (amateur, guys), let alone enough detail to do that, so I decided to extract the IDs, <cfhttp> the bug URLs and write a report for myself so I can review the detail.

Easy.

Easy especially as I already have all the bits and pieces to do the scraping, thanks to the @CfmlNotifier app which does this very thing.

I did something dreadful (and Jaybo, if you're reading, you'll love this), I c*nt & pasted the code I needed from CfmlNotifier into a new directory, and banged some code out to loop over a file containing all the IDs (manually extracted from that PDF), cfhttped them, extracted the detail, and wrote it to file. From there I could write some more code to extract what I want for a report, etc.

Here's the code (not my proudest moment, this lot, and this is not at all "clean code" or TDD or anything like that. Oops).
<cfscript>
// loadBugDetail.cfm
cfflush(interval=16);
include "udfs.cfm";

bugUrl = "https://bugbase.adobe.com/index.cfm?event=bug&id=";

bugDetails = fileRead(expandPath("./bugIds.txt")).listToArray(chr(13)).map(function(bugId){
    writeOutput("Processing #bugId#&hellip;");
    try {
        var bug = getBug(bugId);
        writeOutput(bug.title & "<br>");
        return bug;
    }
    catch (BugNotFoundException e){
        return {id=bugId};
    }
});
bugData = serializeJson(bugDetails);
fileWrite(expandPath("./bug.json"), bugData);
writeDump(bugDetails);
</cfscript>

<cfscript>
//udfs.cfm

// copied from BugBaseProxy.cfc

variables.patterns    = {
    notfound    = "<title>The information requested is not found</title>",
    version        = '<h1 class="title">\s*ColdFusion\s*(\S+)',
    title        = "<h2>Title</h2>\s*<p>(.*?)</p>",
    status         = "<h3>status</h3>\s*<div[^>]+>\s+<b>state</b>(.*?)</div>\s*<div[^>]+>\s+<b>status</b>(.*?)</div>\s*<div[^>]+>\s+<b>reason</b>(.*?)</div>",
    comments    = '<div id="comment">.*?Notes\s+\((\d+)\).*?</div>',
    attachments    = "<h3>Attachments\s+\((\d+)\)</h3>",
    votes        = '<div id="votes">.*?Votes\s+\((\d+)\).*?</div>'
};

public struct function getBug(required numeric adobeId){
    var thisBugUrl = variables.bugUrl & adobeId;
    var httpService = new http(
        method        = "get",
        url            = thisBugUrl,
        useragent    = createUuid()
    );
    var response = httpService.send().getPrefix();

    // there's a coupla error conditions: HTTP errors, or a bung bug ID. Deal...
    if (response.statusCode != "200 OK"){
        throw(message="Request failed", type="RequestFailedException", detail="The request to #thisBugUrl# returned #response.statusCode#", errorcode=val(response.statusCode));
    }

    var bugHtml = response.fileContent;
    // OK, we have the bug's mark-up. Find the bits 'n' pieces we want
    var bugDetails = {
        id            = adobeId,
        title        = "",
        version        = "",
        status        = "",
        state        = "",
        comments    = 0,
        attachments    = 0,
        votes        = 0
    };
    if (reFindNoCase(variables.patterns.notfound, bugHtml)){
        writeDump(bugHtml);
        bugDetails.title = "BUG NOT FOUND";
        return bugDetails;
    }


    var match = reFindNoCase(variables.patterns.title, bugHtml, 1, true);
    if (arrayLen(match.pos) >= 2){
        bugDetails.title = trim(mid(bugHtml, match.pos[2], match.len[2]));
    }else{
        bugDetails.title = "";
    }


    match = reFindNoCase(variables.patterns.version, bugHtml, 1, true);
    if (arrayLen(match.pos) >= 2){
        bugDetails.version = trim(mid(bugHtml, match.pos[2], match.len[2]));
    }else{
        bugDetails.version = "";
    }

    match = reFindNoCase(variables.patterns.status, bugHtml, 1, true);
    // capture groups follow the mark-up order: state, then status, then reason (unused)
    if (arrayLen(match.pos) >= 2){
        bugDetails.state = trim(mid(bugHtml, match.pos[2], match.len[2]));
    }else{
        bugDetails.state = "";
    }
    if (arrayLen(match.pos) >= 3){
        bugDetails.status = trim(mid(bugHtml, match.pos[3], match.len[3]));
    }else{
        bugDetails.status = "";
    }

    match = reFindNoCase(variables.patterns.comments, bugHtml, 1, true);
    if (arrayLen(match.pos) >= 2){
        bugDetails.comments = trim(mid(bugHtml, match.pos[2], match.len[2]));
    }else{
        bugDetails.comments = 0;
    }

    match = reFindNoCase(variables.patterns.attachments, bugHtml, 1, true);
    if (arrayLen(match.pos) >= 2){
        bugDetails.attachments = trim(mid(bugHtml, match.pos[2], match.len[2]));
    }else{
        bugDetails.attachments = 0;
    }

    match = reFindNoCase(variables.patterns.votes, bugHtml, 1, true);
    if (arrayLen(match.pos) >= 2){
        bugDetails.votes = trim(mid(bugHtml, match.pos[2], match.len[2]));
    }else{
        bugDetails.votes = 0;
    }

    return bugDetails;
}
</cfscript>
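
The repeated match-and-extract blocks in getBug() could be collapsed into one helper. This is only a minimal sketch, not code from CfmlNotifier: firstGroup() and its fallback argument are hypothetical names, and it assumes the value of interest is always the pattern's first capture group (so it would not cover the second group of the status pattern).

```cfml
<cfscript>
// hypothetical helper: returns the first capture group of pattern within html, or fallback
function firstGroup(required string pattern, required string html, any fallback=""){
    var match = reFindNoCase(pattern, html, 1, true);
    // index 1 describes the whole match; index 2 is the first capture group
    if (arrayLen(match.pos) >= 2 && match.pos[2] > 0){
        return trim(mid(html, match.pos[2], match.len[2]));
    }
    return fallback;
}

// usage, with the names from getBug():
// bugDetails.title = firstGroup(variables.patterns.title, bugHtml);
// bugDetails.votes = firstGroup(variables.patterns.votes, bugHtml, 0);
</cfscript>
```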

And an extract from bugIds.txt would be:

3010319
3022568
3043161
3061231
3065830

If you can see where I've gone wrong at this point, I'll buy you a beer.

What I was seeing is that the first bug was coming down fine, but all subsequent ones were just returning the notfound status. So - to be clear - the HTTP request was returning "200 OK", but it was hitting the bugbase error page (this is what one would get if browsing to something like https://bugbase.adobe.com/index.cfm?event=bug&id=INVALID_ID).

And it didn't matter which was the first bug ID to check... that one always worked, and the subsequent ones did not. So it was like the first request was fine, subsequent requests were somehow being munged. But how on earth would the bugbase site "know" that it was the same client (<cfhttp> and my CF server, in this case) making these consecutive requests?

Coincidentally, the bugbase bug wherein one needs to clear one's cookies or get this exact error for every bug one looks at has cropped up in the last day or so.

So Adam got all Orwellian with his arithmetic, and worked out 2+2=5... ie: my issue with this code was caused by Adobe's dumb-arse bug. Ahem. Well there's definitely a dumb-arse involved. Read on.

Adobe's issue is all about cookies... but I know <cfhttp> doesn't manage cookies by default (there's an upcoming feature in Railo, but that's another story...), so it can't be that. But it seemed like that. What else could it be?

I asked Twitter for help.

A bunch of people all tried to help, which was cool, and I have now confirmed what I thought I already knew about <cfhttp> and cookies, which is good.

This evening the subject came up on IRC, and Ray got intrigued so I hastily committed my code to GitHub and shared the link with him, and he gave it a go on his end.

And the code errored straight away. We quickly ascertained this was because he's not on Windows so just CR was not good enough for a line delimiter here:

bugDetails = fileRead(expandPath("./bugIds.txt")).listToArray(chr(13))

He needed the whole CRLF. He changed that, and the process ran fine for him (we were just testing with the testIds.txt file, not the whole lot).

So why the hell was it dodgy for me? At this juncture I decided it perhaps was not Adobe's fault, and the problem lay closer to home, with Muggins here.

I ran the thing again, this time noticing the console output:

Jun 12, 2014 19:17:14 PM Information [http-bio-8511-exec-4] - Starting HTTP request {URL='https://bugbase.adobe.com:443/index.cfm?event=bug&id=3194160', method='get'}
Jun 12, 2014 19:17:15 PM Information [http-bio-8511-exec-4] - HTTP request completed {Status Code=200 ,Time taken=1156 ms}
Jun 12, 2014 19:17:15 PM Information [http-bio-8511-exec-4] - Starting HTTP request {URL='https://bugbase.adobe.com:443/index.cfm?event=bug&id= 3186972', method='get'}
Jun 12, 2014 19:17:16 PM Information [http-bio-8511-exec-4] - HTTP request completed {Status Code=200 ,Time taken=1168 ms}

And it dawned on me. Whilst CR was fine for a delimiter, the line ending in the file was actually a CRLF, so every ID after the first ended up prefixed with a linefeed (note the space in id= 3186972 in the second request above). <cfhttp> was escaping that as some whitespace, I guess (LFs aren't legit in URLs after all).
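
The stray linefeed is easy to see in isolation. Here is a minimal sketch using a hypothetical two-line sample string rather than the real bugIds.txt: splitting CRLF-delimited text on CR alone leaves the LF stuck to the front of the second element.

```cfml
<cfscript>
// hypothetical sample: two IDs separated by a Windows (CRLF) line ending
crlf   = chr(13) & chr(10);
sample = "3010319" & crlf & "3022568";

// split on CR only, as loadBugDetail.cfm did
ids = listToArray(sample, chr(13));

for (id in ids) {
    // the first ID is 7 characters and starts with "3" (character code 51);
    // the second is 8 characters and starts with a linefeed (character code 10)
    writeOutput("length=#len(id)#, first character code=#asc(left(id, 1))#<br>");
}
</cfscript>
```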

So the IDs I was passing to Adobe were not, indeed, OK. Sigh.

I felt daft.

I fixed the code, and the process ran fine:

bugDetails = fileRead(expandPath("./bugIds.txt")).listToArray(chr(13)&chr(10))
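
Worth noting why this works: CFML list functions treat the delimiter string as a set of single-character delimiters, not as one two-character sequence, and listToArray() skips empty elements by default. So chr(13)&chr(10) splits on CR or LF, whichever turns up. A minimal sketch with hypothetical sample strings (the samples struct and its keys are made up for illustration):

```cfml
<cfscript>
// CR and LF are each a delimiter in their own right here
delimiters = chr(13) & chr(10);

// hypothetical samples covering the three common line-ending styles
samples = {
    crlf = "3010319" & chr(13) & chr(10) & "3022568",
    lf   = "3010319" & chr(10) & "3022568",
    cr   = "3010319" & chr(13) & "3022568"
};

for (name in samples) {
    ids = listToArray(samples[name], delimiters);
    // every style yields two clean seven-character IDs
    writeOutput("#name#: #arrayLen(ids)# IDs, lengths #len(ids[1])# and #len(ids[2])#<br>");
}
</cfscript>
```

The same call should therefore cope with the file whether it was saved with Windows or Unix line endings.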

I stopped feeling quite so daft when Ryan pointed something out... had Adobe correctly respected my numeric typing on getBug():

public struct function getBug(required numeric adobeId){

then I would have received a decent error message, along the lines of "that ain't numeric, dick", and I would have checked the code and spotted the issue quick smart. So thanks for that Adobe. I do so wish you would comprehend what words like "numeric" actually mean. Here's a demo of ColdFusion's stupidity:

// numericTest.cfm

any function acceptNumeric(required numeric x){
    return;
}

numeric function returnNumeric(required any x){
    return x;
}

s = " 123 "; // This. Is. Not. Fucking. Numeric.

safe(function(){
    acceptNumeric(s);
});

safe(function(){
    returnNumeric(s);
});



function safe(required function f){
    try {
        f();
        writeOutput("OK");
    }
    catch (any e){
        writeOutput("FAILED");
    }
    writeOutput("<hr>");
}

This outputs:

OK

OK



It's seriously not OK, Adobe. Get your act together.

(and, for that matter, you too, Railo. Although I suspect you are just copying CF here?)
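
Until the numeric check is stricter, one way to get that decent error message is an explicit guard in front of getBug(). This is a minimal sketch, assuming bug IDs are always plain digit strings; assertDigitsOnly() and InvalidBugIdException are hypothetical names, not part of CfmlNotifier.

```cfml
<cfscript>
// hypothetical guard: rejects anything that is not purely the digits 0-9
function assertDigitsOnly(required string value){
    // reFind() returns the position of the first non-digit, or 0 if there is none
    if (!len(value) || reFind("[^0-9]", value)){
        throw(
            type    = "InvalidBugIdException", // hypothetical exception type
            message = "Bug ID must contain digits only",
            detail  = "The value received was #len(value)# characters long"
        );
    }
    return value;
}

// usage: a linefeed-prefixed ID fails loudly instead of reaching the bugbase
// getBug(assertDigitsOnly(chr(10) & "3022568")); // throws InvalidBugIdException
</cfscript>
```

Searching for any non-digit, rather than anchoring a pattern with ^ and $, avoids having to think about how $ treats a trailing newline.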

Anyway, that only excuses me partially. I am always quick to tell people that bugs are seldom with the product, they are with one's own code. And certainly the bug I was thinking existed in ColdFusion did not exist; it was indeed a shoddy logic error of my own.

And now... I can scrape their bugs and see what I think they should be retrofitting into ColdFusion 10. Stay tuned for that one (tomorrow or perhaps Saturday).

Righto.

--
Adam