Saturday 7 December 2013

CFML weirdness with chr(0)

I was trying to help someone on Stack Overflow yesterday with their question "How to pad a string with null/zero-bytes in ColdFusion & difference between CF on MacOS and Windows". The question is actually shorter than the title: how to add null characters to a string in CFML.

I thought the answer was a simple "use chr(0)", but this turned out to not be a viable answer on ColdFusion (or Railo for that matter).

In response to my suggestion, Stack Overflow veteran Leigh made the observation "Unfortunately no. The character just disappears. But URLDecode("%00") will generate a single null byte". I've not known Leigh to be wrong so I didn't doubt him, but that sounded really odd, so I decided to check it out.

<!--- baseline.cfm --->
<cfset s = "test" & chr(0)>
string: [#s#]<br>
length: [#len(s)#]<br>

And the - surprising to me - output:

string: [test]
length: [4]

Um... where gone?

I tried this on Railo... thinking Railo's more likely to get it right than CF is, and it had the same output. Testing on OpenBD got what I'd consider to be the correct results:

string: [test]
length: [5]

The NULL isn't printable, so doesn't render anything, but it should still actually be there, and that string should consist of five bytes: 116, 101, 115, 116, 0. That's a length of five. As per OpenBD's output.

That said, I know that NULLs do have special meaning in some strings, for example C has the concept of a null-terminated string, in which the null signifies the end of the string, not a character in the string. I wasn't aware of this being a "thing" in Java, but maybe it was.
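As a sanity check on the Java side of that question, here is a minimal sketch in plain Java. It says nothing about what ColdFusion or Railo do internally; the class and method names (NullCharCheck, codes, wrap) are made up for illustration. A Java String carries its own length, so char 0 should behave like any other character rather than as a terminator.

```java
public class NullCharCheck {
    // Hypothetical helper: returns the character codes of s, space-separated.
    public static String codes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append((int) s.charAt(i));
        }
        return sb.toString();
    }

    // Hypothetical helper: puts a NUL character at both ends of s.
    public static String wrap(String s) {
        String nul = String.valueOf((char) 0);
        return nul + s + nul;
    }

    public static void main(String[] args) {
        String padded = "test" + (char) 0;
        System.out.println(padded.length());   // prints 5
        System.out.println(codes(padded));     // prints 116 101 115 116 0

        String wrapped = wrap("foo");
        System.out.println(wrapped.length());  // prints 5
        System.out.println(codes(wrapped));    // prints 0 102 111 111 0
    }
}
```

If NUL were a terminator in Java, the wrapped string would come back empty. It doesn't: the length is five and the first character code is 0.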

I refined my code somewhat to not be simply end-padding the string:

<!--- outputStringContainingChr0.cfm --->
<cfset s = chr(0) & "foo#chr(0)#">

So I'm sticking a NULL at the beginning and end of the string. If it was acting as a terminator, s would simply be an empty string afterwards. But I get this (CF and Railo both):


102 is indeed the ASCII code for f.

So what's the story here? I still had a suspicion that something "non-stupid" was happening here, and I just didn't get it. Maybe there's something about a NULL char's standard handling that means it's not added to strings. Although this seems far-fetched as obviously there are use cases for it (see the Stack Overflow question), and indeed Leigh came up with the fudged way to do it:

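// fudge.cfm
s = "test" & urlDecode("%00");
writeOutput("string: [#s#]<br>");
writeOutput("length: [#len(s)#]<br>");
writeOutput("bytes: ");
// loop over each character and output its character code
for (i=1; i <= len(s); i++){
    c = mid(s, i, 1);
    writeOutput(asc(c) & " ");
}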

And the result is what we'd expect:

string: [test]
length: [5]
bytes: 116 101 115 116 0
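For comparison, the same trick can be tried in plain Java. This is only a sketch: I don't know how ColdFusion implements urlDecode(), so using java.net.URLDecoder here is an assumption about a comparable mechanism, not a description of CF's internals. The class and method names (UrlDecodeNull, decodedNull) are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UrlDecodeNull {
    // Hypothetical helper: decodes the URL-encoded byte %00 into a string.
    public static String decodedNull() {
        try {
            return URLDecoder.decode("%00", "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String nul = decodedNull();
        System.out.println(nul.length());          // prints 1
        System.out.println((int) nul.charAt(0));   // prints 0

        String s = "test" + nul;
        System.out.println(s.length());            // prints 5
        System.out.println((int) s.charAt(4));     // prints 0
    }
}
```

In Java, decoding "%00" gives a one-character string whose only character has code 0, and appending it to "test" gives a length of five, matching the CFML output above.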

As a comparison, I checked how other languages deal with NULL characters. Also as a personal exercise to learn a bit more of the languages concerned (or in the case of Groovy... any of the language at all!).


# outputStringContainingChr0.rb
s = 0.chr + "foo#{0.chr}"
puts "#{s}:#{s.length}:#{s.slice(0).ord}"


// outputStringContainingChr0.php
$null = chr(0);
$s = $null . "foo$null";

echo "{$s}:" . strlen($s) . ":" . ord(substr($s, 0));

PHP sux in that - it seems (and correct me if I'm wrong, and I'll update the article accordingly) - one cannot embed an expression in a string. Just a variable. Seems primitive.


// outputStringContainingChr0.groovy
s = Character.toString((char) 0) + "foo${Character.toString((char) 0)}"
println "${s}:${s.length()}:${(int)s.getAt(4)}"

There's a good chance this is not model Groovy code. These are the 3rd and 4th Groovy statements I have ever written (the first two are documented here: "Groovy: G'day World"). Please let me know how I should be writing it if it's less than ideal.


One thing I will say on all this, as a bit of digression, is that the CFML code is far neater than any of the other three languages here. I'd expect CFML to be more pleasing than PHP because PHP's a mess, but aren't Ruby and Groovy known for their elegance (at least to ask their zealots)? Of course I could be getting ahead of myself here... I could be writing absolute rubbish in all three other languages, but the code came from googling how other people arrive at the same ends. Dunno. Interesting observation if I'm right though!

Anyway, all three of these output exactly the same thing:

C:\webroots\shared\git\blogExamples\coldfusion\bugs\asciiNull>ruby outputStringC
 foo :5:0

C:\webroots\shared\git\blogExamples\coldfusion\bugs\asciiNull>php outputStringCo
 foo :5:0
C:\webroots\shared\git\blogExamples\coldfusion\bugs\asciiNull>groovy32 outputStr
 foo :5:0


IE: NULL is part of the string, just like any other character.

OK, so I think it's safe to say that CF is doing this wrong. And I hope Railo is simply following suit by way of cross-compat, and didn't decide to do this under their own steam. And OpenBD is getting it right.

Lastly, I decompiled a very simple piece of code:

<!--- decomp.cfm--->
<cfset s = chr(0) & "foo#chr(0)#">

The full decompilation of this is on GitHub, but the relevant bit is here:

_whitespace(out, "\r\n");
_whitespace(out, "\r\n");
OutputTag output0 = (OutputTag)_initTag(class$coldfusion$tagext$io$OutputTag, 0, parent);

Note the assignment of s: the chr(0) values are just not there.

OK, so a more deliberate test:

<cfset nul = chr(0)>

And the relevant bit of the decompilation:


The CF compiler simply ignores the chr(0). Let's see what it does with another unprintable ASCII character:

<cfset bel = chr(7)><!--- 7 is "bell": it makes yer computer beep! --->

And the decompilation of the assignment:

This pretty clearly demonstrates that Adobe are going out of their way to ignore chr(0). I wonder WTF possessed them to do that?

I'm gonna raise a bug with Adobe: 3681134; and Railo: RAILO-2788. And bravo to OpenBD for getting something right.