Tuesday, 23 April 2013

CFML compilation into Java .class files

G'day:
Once again, Stack Overflow is my muse. Today a person called David Mulder is asking a question the answer to which relates to how CFML source code is processed into a Java .class file before the class file is then executed.

David's question was basically why this code doesn't behave how one might hope:

<!---test.cfm --->
<cfset msg = "G'day World">
<cfoutput>
<cfinclude template="incNoOutput.cfm">    
</cfoutput>
<cfinclude template="incWithOutput.cfm">

<!--- incNoOutput.cfm --->
#msg#<br>

<!--- incWithOutput.cfm --->
<cfoutput>#msg#</cfoutput><br>


The output of this is:

#msg#
G'day World


I don't think this surprises anyone, as we've all been bitten by it in the past: because the <cfinclude> is wrapped in <cfoutput> tags, we expect the code within the included file to act as if it's wrapped in those <cfoutput> tags, and so we expect the variable's value to be output. But ColdFusion doesn't work this way. I think it's the sort of thing we all accept, but we don't necessarily know why, other than "well: because".

Before I get into why it's the way it is in ColdFusion, I'll use some C code (which I am so rusty with I'll need to google-up) to demonstrate how it doesn't work, despite it being some people's assumption as to what's going on.

C has a construct #include, which one can use in one's C source code to pull in some other code, much the same as how <cfinclude> does the same sort of thing in CFML:

/* main.c */
#include <stdio.h>

int main(void) {
    printf("G'day");
    #include "inc.c"
    return 0;
}

/* inc.c */
printf("World");

Unlike the <cfinclude> tag, which is executed at runtime, the #include construct in C is a preprocessor directive. What that means is that before the code is compiled it is preprocessed, and the various directives are dealt with at that point. The #include directive basically says "get the source code from the specified file and put it right here". Then once all the preprocessing stuff is done, the resultant code is compiled. So the code that actually goes to the compiler is this:

int main(void) {
    printf("G'day");
    printf("World");
    return 0;
}

(obviously the stuff from the stdio.h library also gets included, but that's not relevant to what I'm demonstrating here)

And then that source code is compiled down to a binary, and one can then run the binary.
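
To make the "paste the text in, then compile" idea concrete, here's a rough sketch of that preprocessing step, written in Java to match the decompiled code further down. It's a toy under stated assumptions: ToyPreprocessor and preprocess are made-up names, it only understands lines of the form #include "file", and a real C preprocessor does a great deal more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model only: a real C preprocessor handles far more than this.
final class ToyPreprocessor {
    private static final String PREFIX = "#include \"";

    // Replace each line of the form  #include "name"  with the text of that file.
    // 'files' maps a file name to its source text (a stand-in for the file system).
    static String preprocess(String source, Map<String, String> files) {
        List<String> result = new ArrayList<>();
        for (String line : source.split("\n")) {
            String trimmed = line.trim();
            boolean isInclude = trimmed.startsWith(PREFIX)
                    && trimmed.endsWith("\"")
                    && trimmed.length() > PREFIX.length();
            if (isInclude) {
                String name = trimmed.substring(PREFIX.length(), trimmed.length() - 1);
                String included = files.get(name);
                if (included == null) {
                    throw new IllegalArgumentException("no such file: " + name);
                }
                // The included text is itself preprocessed, so includes can nest.
                result.add(preprocess(included, files));
            } else {
                result.add(line);
            }
        }
        return String.join("\n", result);
    }

    public static void main(String[] args) {
        String mainSource = "int main(void) {\n    #include \"inc.c\"\n    return 0;\n}";
        Map<String, String> files = Map.of("inc.c", "printf(\"World\");");
        // The single merged text printed here is what a compiler would then be given.
        System.out.println(preprocess(mainSource, files));
    }
}
```

The point is that the merge is purely textual and happens before any compilation: by the time the compiler runs, there is only one source text and no trace of where the included lines came from.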

A lot of people assume <cfinclude> works exactly the same way, but it doesn't. In a way it's misnamed: it doesn't include the code from the other file in the including file (in the sense that C's #include does); it just executes the other file at that point. This is slightly different.

You're hopefully aware that one cannot do this in CFML:

<!--- switch.cfm --->
<cfswitch expression="#expression#">
    <cfinclude template="cases.cfm">
</cfswitch>

<!--- cases.cfm --->
<cfcase value="tahi">
    ONE
</cfcase>
<cfcase value="rua">
    TWO
</cfcase>
<cfcase value="toru">
    THREE
</cfcase>
<cfcase value="wha">
    FOUR
</cfcase>

This results in:

Only cfcase or cfdefaultcase tags may be nested within a cfswitch tag.

ColdFusion was looking at the following text:cfinclude
encountered on line 4 at column 10.

<cfinclude> is not simply pulling the code from the included file into the including file.
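
A rough way to picture the difference: a runtime include behaves more like a method call into something that has already been compiled than like a text merge. The following is a toy Java model of that idea, not ColdFusion's implementation; ToyRuntimeInclude, register and include are names invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Toy model only: none of these names exist in ColdFusion.
final class ToyRuntimeInclude {
    // path -> an already-compiled page; a page simply writes to the output buffer
    private final Map<String, Consumer<StringBuilder>> pages = new HashMap<>();

    void register(String path, Consumer<StringBuilder> compiledPage) {
        pages.put(path, compiledPage);
    }

    // What an include amounts to in this model: look the page up and run it.
    // The caller has no say in how the page was compiled.
    void include(String path, StringBuilder out) {
        Consumer<StringBuilder> page = pages.get(path);
        if (page == null) {
            throw new IllegalArgumentException("template not found: " + path);
        }
        page.accept(out);
    }

    public static void main(String[] args) {
        ToyRuntimeInclude runtime = new ToyRuntimeInclude();
        // Stand-in for a page that was compiled, on its own, to write literal text.
        runtime.register("incNoOutput.cfm", sb -> sb.append("#msg#<br>"));

        StringBuilder buffer = new StringBuilder();
        runtime.include("incNoOutput.cfm", buffer);
        System.out.println(buffer);
    }
}
```

In this model the included page is a finished unit before the including page ever runs, which is why the including page cannot supply half of a construct (an opening tag, say) and expect the included page to supply the rest.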

The ColdFusion "compiler" compiles each file separately, and before runtime. This means that the CFML within each file must be syntactically complete and correct. (I use quotes around "compiler" because some pedants might baulk at me calling it that: the CFML source is not compiled all the way down to a native binary, just to Java bytecode. That amounts to the same thing for the purposes of what I'm discussing here, so I shall be using the term "compile" for the sake of brevity.)

What's less obvious is that stuff like <cfoutput> isn't just like a toggle that causes subsequent CFML expressions to be output until a closing </cfoutput> tag is encountered... because one needs to bear in mind that it's not the CFML that's actually being executed at runtime, it's the Java bytecode that was the result of the compilation. One of the ramifications of this is the whole notion of tags and stuff nested in tags is long gone by the time the code executes, not least of all because Java doesn't have such a concept as "nestable tags".

What happens with a <cfoutput> tag is that it's a sign to the compiler to compile things slightly differently so that CFML expressions within the tags are processed. However the compiler obviously (?) only knows to do this if the tags are actually in the file it's compiling. Bear in mind that a <cfinclude> is a runtime operation, so when the compiler is compiling incNoOutput.cfm, it has no idea that the request will actually be calling that code from another file which happens to have <cfoutput> tags around the <cfinclude>. As far as the compiler is concerned the resultant class file is simply a class file (the bytecode is reflected in a class), and it can be used whenever and from wherever one likes. So just because this request has <cfoutput> tags around the call to it doesn't mean all requests will do so. Make sense?
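
Here's a toy Java model of that per-file behaviour. It is emphatically not ColdFusion's compiler: ToyTemplateCompiler and everything in it are invented names, and it only recognises <cfoutput> blocks and simple #name# expressions. What it does show is that the "compiler" sees nothing but the one source text it is handed, so #msg# only becomes a variable lookup when the <cfoutput> tags are in that same text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names exist in ColdFusion.
final class ToyTemplateCompiler {
    private static final String OPEN = "<cfoutput>";
    private static final String CLOSE = "</cfoutput>";

    // A compiled "page" is a list of steps that write to the output buffer.
    interface Step {
        void run(Map<String, String> variables, StringBuilder out);
    }

    // Compile ONE source text in isolation; nothing outside it is visible here.
    static List<Step> compile(String source) {
        List<Step> steps = new ArrayList<>();
        int pos = 0;
        while (pos < source.length()) {
            int open = source.indexOf(OPEN, pos);
            if (open < 0) {
                steps.add(literal(source.substring(pos))); // no more cfoutput: plain text
                break;
            }
            int close = source.indexOf(CLOSE, open);
            if (close < 0) {
                throw new IllegalArgumentException("unclosed <cfoutput>");
            }
            if (open > pos) {
                steps.add(literal(source.substring(pos, open)));
            }
            compileOutputBody(source.substring(open + OPEN.length(), close), steps);
            pos = close + CLOSE.length();
        }
        return steps;
    }

    // Only inside <cfoutput> does #name# become a variable lookup.
    private static void compileOutputBody(String body, List<Step> steps) {
        int pos = 0;
        while (pos < body.length()) {
            int start = body.indexOf('#', pos);
            int end = start < 0 ? -1 : body.indexOf('#', start + 1);
            if (end < 0) {
                steps.add(literal(body.substring(pos)));
                return;
            }
            if (start > pos) {
                steps.add(literal(body.substring(pos, start)));
            }
            String name = body.substring(start + 1, end);
            // A missing variable is appended as "null" in this toy; the real thing would error.
            steps.add((vars, out) -> out.append(vars.get(name)));
            pos = end + 1;
        }
    }

    private static Step literal(String text) {
        return (vars, out) -> out.append(text);
    }

    static String run(List<Step> page, Map<String, String> variables) {
        StringBuilder out = new StringBuilder();
        for (Step step : page) {
            step.run(variables, out);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = Map.of("msg", "G'day World");
        System.out.println(run(compile("#msg#<br>"), vars));                      // #msg#<br>
        System.out.println(run(compile("<cfoutput>#msg#</cfoutput><br>"), vars)); // G'day World<br>
    }
}
```

The two calls in main stand in for incNoOutput.cfm and incWithOutput.cfm. Whatever later runs the first compiled page, its only step is "write this literal text", because that decision was made when it was compiled.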

I guess a blurring factor here is that with CFML there's no discrete "compile the code" step when we're running it: the mere act of browsing to a CFM file triggers the compilation process (if it's needed), and then runs the request. This is why when you hit a CFM file for the first time it's substantially slower than subsequent times: all the code needs to be compiled before it's run. Once it's compiled it's kept in memory, so subsequent executions of the file simply execute the already-compiled bytecode.
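
The "compile on first hit, reuse afterwards" behaviour can be sketched as a simple cache. This is a minimal illustration in Java, assuming nothing about how ColdFusion actually stores its compiled classes: ToyTemplateCache, get and compileCount are hypothetical names, the "compiled" result is just a string, and recompiling a changed file is not modelled.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Toy model only: ColdFusion's real template cache is not shown in this article.
final class ToyTemplateCache {
    private final Map<String, String> compiledByPath = new ConcurrentHashMap<>();
    private final AtomicInteger compileCount = new AtomicInteger();
    private final Function<String, String> compiler;

    ToyTemplateCache(Function<String, String> compiler) {
        this.compiler = compiler;
    }

    // The first request for a path pays for compilation; later ones reuse the result.
    String get(String path) {
        return compiledByPath.computeIfAbsent(path, p -> {
            compileCount.incrementAndGet();
            return compiler.apply(p);
        });
    }

    int compileCount() {
        return compileCount.get();
    }

    public static void main(String[] args) {
        ToyTemplateCache cache = new ToyTemplateCache(path -> "compiled:" + path);
        cache.get("test.cfm"); // compiles
        cache.get("test.cfm"); // served from memory
        System.out.println(cache.compileCount()); // prints 1
    }
}
```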

One can better see what I mean if we grab the class files for incNoOutput.cfm and incWithOutput.cfm, as they'll show what the compiler has done with each file. I've decompiled these.

This is the decompiled code for incNoOutput.cfm:

import coldfusion.runtime.AttributeCollection;
import coldfusion.runtime.CFPage;
import coldfusion.runtime.CfJspPage;
import java.io.Writer;
import javax.servlet.jsp.JspContext;
import javax.servlet.jsp.JspWriter;
import javax.servlet.jsp.tagext.Tag;

public final class cfincNoOutput2ecfm2057894054 extends CFPage {
    public static final Object metaData;

    static {
        metaData = new AttributeCollection(new Object[0]);
    }

    public final Object getMetadata() {
        return metaData;
    }

    protected final Object runPage(){
        Object value;
        JspWriter out = this.pageContext.getOut();
        Tag parent = this.parent;
        bindImportPath("com.adobe.coldfusion.*");
        out.write("\r\n#msg#<br>");
        return null;
    }
}

And this is for incWithOutput.cfm:

import coldfusion.runtime.AttributeCollection;
import coldfusion.runtime.CFPage;
import coldfusion.runtime.Cast;
import coldfusion.runtime.CfJspPage;
import coldfusion.runtime.LocalScope;
import coldfusion.runtime.Variable;
import coldfusion.runtime.VariableScope;
import coldfusion.tagext.GenericTag;
import coldfusion.tagext.QueryLoop;
import coldfusion.tagext.io.OutputTag;
import java.io.Writer;
import javax.servlet.jsp.JspContext;
import javax.servlet.jsp.JspWriter;
import javax.servlet.jsp.tagext.Tag;

public final class cfincWithOutput2ecfm66131501 extends CFPage {
    private Variable MSG;
    static final Class class$coldfusion$tagext$io$OutputTag;
    public static final Object metaData;

    static {
        class$coldfusion$tagext$io$OutputTag = Class.forName("coldfusion.tagext.io.OutputTag");
        metaData = new AttributeCollection(new Object[0]);
    }

    protected final void bindPageVariables(VariableScope varscope, LocalScope locscope){
        super.bindPageVariables(varscope, locscope);
        this.MSG = bindPageVariable("MSG", varscope, locscope);
    }

    public final Object getMetadata() {
        return metaData;
    }

    protected final Object runPage() {
        Throwable t8;
        Throwable t7;
        Object t6;
        int mode0;
        Object value;
        JspWriter out = this.pageContext.getOut();
        Tag parent = this.parent;
        bindImportPath("com.adobe.coldfusion.*");
        _whitespace(out, "\r\n");
        OutputTag output0 = (OutputTag)_initTag(class$coldfusion$tagext$io$OutputTag, 0, parent);
        _setCurrentLineNo(2);
        output0.hasEndTag(true);
        try {
            if ((mode0 = output0.doStartTag()) != 0)
                do
                    out.write(Cast._String(_autoscalarize(this.MSG)));
                while (output0.doAfterBody() != 0);
                if (output0.doEndTag() == 5)
                    return null; 
        } catch (Throwable localThrowable1) {
            output0.doCatch(localThrowable1);
        } catch (Throwable localThrowable2) {
            jsr 6;
            throw localThrowable2;
        }
        Object t9 = returnAddress;
        output0.doFinally();
        ret;
        out.write("<br>");
        return null;
    }
}

Because it's automatically generated code, it's not so easy to follow, but I've highlighted three bits of code that differ between the two files:
  • the yellow stuff seems to be intrinsic overhead for dealing with CFML, as the parser has found some in the file;
  • the orange stuff deals with the <cfoutput> tags;
  • and the green stuff deals with the variable itself.
So note the way the code is compiled is rather different depending on what code is in the file, and the parser simply has no way of knowing that the #msg# in incNoOutput.cfm could possibly indicate a variable rather than just some text.

The main thing to take away from this is to remember that it's not the CFML you type in that gets executed; all that's for is to tell the compiler what Java bytecode to generate. It's the bytecode that's run. And each CFML file ends up being its own discrete class, and code outside the given file has no bearing on how the file is compiled, irrespective of how we as developers read & understand the source code and its flow when we're writing it. The runtime context that the code is run in is not known at the time the compilation takes place.

I can't help but think that's not the best explanation of what's going on (and I also concede it's a bit of a "lay-person's" take on it), but hopefully it might make some sense.

And now I have bags to pack. I'm on a plane back to the UK in a few hours.

Righto.

--
Adam