Thursday 11 October 2012

Thanks Dave Harris: a footnote about getMetadata()

G'day:
I'm following up on some comments people have left against various articles I've written recently: cheers for the thoughts / advice / general input, everyone.

I've updated my article about creating a query and populating it with data in one fell swoop in CF9 to reflect a tip Brian Swatzfager offered me, and tested out some UTF-8 stuff that Nik Stephens reminded me of in my article questioning what Adobe were thinking with their implementation of pageEncoding.  Thanks for the input, fellas.

However, something Dave Harris (with whom I used to work, back in NZ) said in response to this article about my expectations of how getMetadata() works warranted a brief article of its own, to get it onto people's radar.



I was observing that after having injected a method into an existing object, when I then called getMetadata() on the object, the injected method was nowhere to be seen.
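
To make that concrete, here's a minimal sketch of the sort of thing I mean. It assumes a simple My.cfc with a single method f() in it; the file name and the function name injected() are made up purely for illustration:

```cfml
<!--- injectTest.cfm (hypothetical file name) --->
<cfscript>
    // a stand-alone function to inject; the name is illustrative only
    function injected(){
        return "I was injected";
    }

    o = new My();
    o.injected = injected;  // add the function to the existing object

    // the injected method can be called like any other method
    writeOutput(o.injected() & "<br />");

    // per the behaviour described above, only the methods defined in the CFC file show up here
    md = getMetadata(o);
    writeDump(var=md.functions, label="functions after injection");
</cfscript>
```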

Dave's comment revealed what's going on here.  He said:
I wonder if it's something to do with that the "getMetaData" is run and then the result stored as a singleton.
There was a "I didn't know you could do that with ColdFusion" session someone did that mentioned that, and suggested using the metadata information to store static constants.

So the flow is something like:
1. CF creates the object and stores the metadata of the object as an application singleton
2. A clever dev injects functions in to the component
3. the above mentioned dev calls "getMetaData"
4. CF returns the singleton information that was defined before the dev injected more functions.

Disclaimer: I am only guessing
Well, good guesswork, Dave: that seems to be exactly what's happening.  Here's the evidence:

// My.cfc
component {

    function f(){}

}

<!--- test.cfm --->
<cfscript>
    o = new My();
    md1 = getMetadata(o);
    md2 = getMetadata(o);

    writeDump(var=md1.functions, label="m1 before");
    writeDump(var=md2.functions, label="m2 before");
    writeOutput("<hr />");
    md1.functions[1].hint = "How many places will this show up in?";
    writeDump(var=md1.functions, label="m1 after");
    writeDump(var=md2.functions, label="m2 after");
</cfscript>

So this instantiates an object, takes two copies of its metadata, updates one copy of the metadata and dumps both out again.  And this is what it outputs:

m1 before - array
    1: m1 before - struct
        HINT: How many places will this show up in?
        NAME: f
        PARAMETERS: m1 before - array [empty]
m2 before - array
    1: m2 before - struct
        HINT: How many places will this show up in?
        NAME: f
        PARAMETERS: m2 before - array [empty]

m1 after - array
    1: m1 after - struct
        HINT: How many places will this show up in?
        NAME: f
        PARAMETERS: m1 after - array [empty]
m2 after - array
    1: m2 after - struct
        HINT: How many places will this show up in?
        NAME: f
        PARAMETERS: m2 after - array [empty]

Note that it shows up where we'd expect it to, as well as where perhaps we would not expect it. So this does mean there's just the one instance of the object's metadata being held in the background, and each call just returns that.  Suck.
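
One thing the test above doesn't show is whether that single copy is held per object or per component. Here's a rough sketch of how one could check, again assuming the My.cfc from above. The file name and the "marker" key are made up for the test, and I'm not making any claims about which branch it takes:

```cfml
<!--- sharedTest.cfm (hypothetical file name) --->
<cfscript>
    a = new My();
    b = new My();

    // write a made-up key into the metadata fetched via the first instance
    mdA = getMetadata(a);
    mdA.marker = "set via instance a";

    // then see whether it is visible via a second, separate instance
    mdB = getMetadata(b);
    result = structKeyExists(mdB, "marker") ? "shared across instances" : "not shared across instances";
    writeOutput(result & "<br />");
</cfscript>
```

If the key does turn up on the second instance, that's the behaviour which would make the "static constants" trick Dave mentioned possible.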

So the next question is why?  Why does it work this way?  I mean: how hard can it be to interrogate the object and get its metadata back?  Maybe ages!  Let's see:

I made this CFC:

component {
    function f1(string uuid="23A2DD42-D067-E5E6-F12EB621A59CC403"){
        return ucase(arguments.uuid);
    }
    function f2(string uuid="23A2DD43-D067-E5E6-F12ED0A60636617A"){
        return ucase(arguments.uuid);
    }

    // etc, for 3-999

    function f1000(string uuid="23A2E1D2-D067-E5E6-F12E00DE2F8BB36F"){
        return ucase(arguments.uuid);
    }
}

And ran this code:

<!--- testTimings.cfm --->
<cfscript>
    start = getTickCount();
    o1 = new Big();
    writeOutput("#getTickCount()-start#ms to create the first object<br />");

    start = getTickCount();
    o2 = new Big();
    writeOutput("#getTickCount()-start#ms to create the second object<br />");

    start = getTickCount();
    md1 = getMetadata(o1);
    writeOutput("#getTickCount()-start#ms to generate metadata the first time<br />");

    start = getTickCount();
    md2 = getMetadata(o2);
    writeOutput("#getTickCount()-start#ms to generate metadata the second time<br />");
</cfscript>

So the idea is we have a very large CFC (with 1000 distinct methods in it) which'll take a while to generate the metadata for (perhaps), as it has a lot of methods in it, and a lot of unique values in the method metadata too. And we then time how long it takes to create an instance of it, and a second instance of it, and get the metadata for each.  I was hoping that I'd see an explanatory spike in the time it takes to generate the metadata the first time, thus validating this approach Adobe have taken in only generating the metadata once.

Here are the results:

1ms to create the first object
0ms to create the second object
0ms to generate metadata the first time
0ms to generate metadata the second time

Sometimes the second object took a millisecond or so to create instead.  I would say the timings here are within the margin of error of CF being able to time things.  Note: on the first compile of the CFC, it took about 1.5sec to compile, but all the other timings were around zero.  I wonder if the metadata is extracted at compile time, and the [meaningful] overhead of doing it is part of that compilation time?
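
If millisecond resolution is the limiting factor, one option is to borrow Java's System.nanoTime() for the timing instead. This is only a sketch: it assumes the same Big.cfc as above, and the file and variable names are illustrative:

```cfml
<!--- testTimingsNano.cfm (hypothetical file name) --->
<cfscript>
    // java.lang.System.nanoTime() gives finer resolution than getTickCount()
    system = createObject("java", "java.lang.System");

    o = new Big();

    start = system.nanoTime();
    md1 = getMetadata(o);
    elapsed = (system.nanoTime() - start) / 1000;  // nanoseconds to microseconds
    writeOutput("#elapsed# microseconds for the first getMetadata() call in this request<br />");

    start = system.nanoTime();
    md2 = getMetadata(o);
    elapsed = (system.nanoTime() - start) / 1000;
    writeOutput("#elapsed# microseconds for the second getMetadata() call in this request<br />");
</cfscript>
```

Bear in mind that if the metadata is already cached from an earlier request, the "first" call here is only the first call in this request, not the first time the metadata was ever built.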

It might be handy if someone from the Adobe ColdFusion team could comment on this. I'll chase.

I have to say, though, if there's no overhead in generating metadata, then due to CF's highly dynamic nature, it is not appropriate to generate this metadata ahead of time (for the very reasons under discussion here).

Cheers for the heads-up here, Dave.  Nice one.  Say g'day to Foo from me.

Righto.

--
Adam