Monday, 14 October 2013

CFCamp: <cfprocessingdirective> and how not to use it

This is gonna be a quick one. The topic of <cfprocessingdirective> came up today, in the context of I18n, and character encoding.

Just to make something very clear. Here are some situations in which <cfprocessingdirective> is no use whatsoever when it comes to dealing with character encoding:
  • ensuring data is transmitted to and received from the database in the correct encoding;
  • altering the handling of FORM or URL parameters in the context of character encoding;
  • flagging to the browser that the response text is in a given encoding.
It does not impact any of those things at all. It does precisely one thing.

It's a compiler instruction, and what it does is tell the compiler to handle your source code using a specific encoding. For example, if you have this:

<cfset msg = "Эх, чужак, общий съём цен шляп (юфть) – вдрызг!">

And you have the file saved as UTF-8, then unless you actually tell the CF compiler that it's UTF-8, it won't know. It will compile the file using the wrong encoding, and you'll end up outputting something like this:

Эх, чужак, общий Ñ?ъём цен шлÑ?п (ÑŽÑ„Ñ‚ÑŒ) – вдрызг!

This is nothing to do with the character encoding of the response data or anything like that: it's because CF has compiled it wrong. This is what's been compiled:

this.MSG.set("Эх, чужак, общий Ñ?ъём цен шлÑ?п (ÑŽÑ„Ñ‚ÑŒ) – вдрызг!");

So no matter what you do to the response... the code has been compiled wrong.
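To see the mechanism in isolation, here's a minimal Java sketch of the same mistake. It is not CF's actual compiler code: the class and method names are made up for illustration, and it assumes the compiler's fallback encoding is windows-1252 (yours may differ). All it does is take the UTF-8 bytes of a bit of the string above and decode them with the wrong charset, then with the right one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of ColdFusion.
class PageEncodingDemo {

    // Decode raw source-file bytes using whatever charset the "compiler" assumes.
    static String decodeSource(byte[] fileBytes, Charset assumed) {
        return new String(fileBytes, assumed);
    }

    public static void main(String[] args) {
        // "юфть", written with Unicode escapes so this file compiles the same
        // way whatever encoding javac assumes for it.
        String original = "\u044E\u0444\u0442\u044C";

        // These are the bytes on disk when the template is saved as UTF-8.
        byte[] onDisk = original.getBytes(StandardCharsets.UTF_8);

        // Assumption: the compiler falls back to a single-byte default encoding.
        Charset wrongDefault = Charset.forName("windows-1252");

        // Wrong charset: every two-byte UTF-8 sequence becomes two unrelated characters.
        System.out.println(decodeSource(onDisk, wrongDefault));

        // Right charset: the original text comes back intact.
        System.out.println(decodeSource(onDisk, StandardCharsets.UTF_8));
    }
}
```

The damage is done at the decoding step, before any response is generated, which is why nothing you do to the response afterwards can fix it.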

Here's where <cfprocessingdirective> comes in:

<cfprocessingdirective pageEncoding="utf-8">
<cfset msg = "Эх, чужак, общий съём цен шляп (юфть) – вдрызг!">

This results in the code being compiled to:

this.MSG.set("Эх, чужак, общий съём цен шляп (юфть) – вдрызг!");

And outputs as:

Эх, чужак, общий съём цен шляп (юфть) – вдрызг!

Which is what we want.

This is also why <cfprocessingdirective> needs to go into each and every file that has non-ASCII, UTF-8-encoded characters in it, rather than once in Application.cfc, or in a parent file which then includes the files with the UTF-8 stuff in them: it's a compiler instruction. Each file is compiled on its own, and runtime considerations like Application.cfc and which files include which other ones are not considered at compile time, so none of that matters.

So if you're doing your non-English website and wondering why stuff is rendering in the browser as random glyphs and punctuation... unless the stuff that's rendering wrong is actually in your source code, then <cfprocessingdirective> is irrelevant to the issue. Check your JDBC connection strings; check whether or not you're specifying "UTF-8" (or whatever is appropriate) on file-read/write operations (including xmlParse() calls and anything else that hits the file system); and check the HTTP headers you're sending to the browser, etc.
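As a rough illustration of the file-read/write point, here's a small Java sketch (Java standing in for whatever your CFML file operations do under the hood; the class and method names are hypothetical). It writes some text to a temp file as UTF-8, then reads it back twice: once specifying UTF-8, and once with a mismatched charset, ISO-8859-1, picked here purely as an example of a wrong guess.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; stands in for any runtime file read/write.
class FileCharsetDemo {

    // Write text to a temp file using one charset, read it back using another.
    static String roundTrip(String text, Charset writeAs, Charset readAs) {
        try {
            Path tmp = Files.createTempFile("charset-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(writeAs));
                return new String(Files.readAllBytes(tmp), readAs);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u0446\u0435\u043D"; // "цен", escaped to keep this file ASCII-only

        // Same charset on both sides: the text survives.
        System.out.println(text.equals(
            roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));      // true

        // Mismatched charset on the read: the text is corrupted at runtime,
        // and no compiler directive has any say in it.
        System.out.println(text.equals(
            roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1))); // false
    }
}
```

The same reasoning applies to the database connection and the HTTP response: each one is a separate place where bytes get turned into characters (or vice versa), and each needs to be told the encoding itself.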

Hopefully that clears things up about <cfprocessingdirective> a bit.

And now back to this presentation on I18n with Mura...