Thursday 26 December 2013

PDFs and Japanese and weirdness and bugs

G'day:
Here's one I was looking at on Stack Overflow today. It involves generating a PDF with Japanese text in it. I did not solve the issue. This doesn't surprise me too much, because I have never once generated a PDF in CFML in a production environment. I've just never had the need to. My entire exposure to <cfdocument> is in situations like this: investigation and "helping" other people.

Here's the Stack Overflow question: "CFDocument PDF not showing Japanese data". In short, the person has this code:

<cfcontent type="application/pdf">
<cfheader name="Content-Disposition" value="attachment;filename=test.pdf">
<cfprocessingdirective pageencoding="utf-8">
<cfdocument format="PDF" localurl="yes" marginTop=".25" marginLeft=".25" marginRight=".25" marginBottom=".25" pageType="custom" pageWidth="8.5" pageHeight="10.2">
    <cfoutput>
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
        <html xmlns="http://www.w3.org/1999/xhtml">
            <head>
                <title>PDF Export Example</title>
                <style>
                    body { font-family: Verdana; }
                </style>
            </head>
            <body>
                <h1>PDF Export Example</h1>
                <p>担当するクライエントの大半は様々な規模の企業だが、カナダの大学や政府関連の研究機関の担当経験もある。</p>
                <h1>PDF Export English Example</h1>
                <p>This is an example.</p>
            </body>
        </html>
    </cfoutput>
</cfdocument>

And the Japanese is not rendering. I can replicate this on CF10.

I'd normally assume - when seeing question marks (or boxes) instead of glyphs - that it would be an encoding problem. But as far as I can tell, the person's doing that side of things correctly. And if I altered their code to just render to screen instead of creating a PDF, it worked fine.
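
One way to back up the "it's not the encoding" hunch is to dump the code point of each character before it goes anywhere near <cfdocument>. This is a minimal sketch, assuming the template is saved as UTF-8 (hence the <cfprocessingdirective>) and that asc() returns a character's Unicode value, which is how the CF docs describe it. If the values come out as proper Japanese code points (kanji in the 4E00 to 9FFF range, Hiragana around 3040 to 309F, Katakana around 30A0 to 30FF), the string reached CFML intact, and the question marks are being introduced later, at the PDF-rendering stage.

```cfml
<cfprocessingdirective pageencoding="utf-8">
<!--- Diagnostic sketch: print the Unicode code point of each character.
      Intact Japanese gives values like 62C5 (kanji), 3059 (Hiragana), 30AF (Katakana).
      An encoding problem tends to show up as 3F ("?"), FFFD, or several
      Latin-1-looking characters (e.g. starting E6) per Japanese character. --->
<cfset sample = "担当するクライエント">
<cfoutput>
<cfloop index="i" from="1" to="#len(sample)#">
    <cfset thisChar = mid(sample, i, 1)>
    #thisChar# = U+#uCase(formatBaseN(asc(thisChar), 16))#<br>
</cfloop>
</cfoutput>
```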

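I fiddled around a bit and googled a lot, and found out that not every font has glyphs for every character, and a PDF can only show the glyphs its fonts actually contain. This makes some sense, I guess. Kinda. Verdana itself has no Japanese glyphs, so presumably the browser quietly falls back to some other installed font when rendering on screen, and the PDF renderer doesn't. But... shrug.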
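
To see the "not every font has every glyph" thing for yourself, you can ask Java directly, since the CFML engines all run on the JVM. This is a sketch only: java.awt.Font's canDisplayUpTo() returns -1 if the font can render the whole string, otherwise the zero-based index of the first character it can't. Two caveats, both assumptions on my part: the fonts the JVM can see are not necessarily the same set <cfdocument>'s renderer uses (on ColdFusion that's governed by the Font Management page in the CF Administrator), and if the named font isn't installed Java quietly substitutes a default family, so the code reports getFamily() as well. The font list here is just illustrative.

```cfml
<cfprocessingdirective pageencoding="utf-8">
<!--- Sketch: ask the JVM whether a given font has glyphs for every character in a string.
      canDisplayUpTo() returns -1 when all characters are covered, otherwise the
      zero-based index of the first character the font cannot render. --->
<cfset sample = "担当するクライエントの大半">
<cfset fontNames = ["Verdana", "MingLiU"]>
<cfoutput>
<cfloop array="#fontNames#" index="fontName">
    <!--- Font.PLAIN is 0; javaCast() keeps Java's overload resolution happy --->
    <cfset font = createObject("java", "java.awt.Font").init(fontName, javaCast("int", 0), javaCast("int", 12))>
    <!--- If the font isn't installed, Java substitutes a default family, so show what we actually got --->
    #fontName# (resolved as #font.getFamily()#): canDisplayUpTo = #font.canDisplayUpTo(sample)#<br>
</cfloop>
</cfoutput>
```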

During testing I pared my test code back to this:

<cfprocessingdirective pageencoding="utf-8">

<!--- Work out which engine is running, so each one writes its own PDF file --->
<cfset system = structKeyExists(server, "railo") ? "railo" : (structKeyExists(server, "bluedragon") ? "bluedragon" : "coldfusion")>
<cfset basePdfFileName = listFirst(listLast(getCurrentTemplatePath(), "\/"), ".")>
<cfset pdfFileName =  basePdfFileName & "_" & system &  ".pdf">
<cfset pdfFilePath =  expandPath('./#pdfFileName#')>

<cfset title = "Japanese via &lt;cfdocument&gt;">
<cfdocument format="PDF" filename="#pdfFilePath#" overwrite="true">
    <cfoutput>
    <html>
    <head>
        <style>
            .japanese {
                font-family : mingliu;
            }
        </style>
        <title>#title#</title>
    </head>
    <body>
        <h1>#title#</h1>
        <p>#pdfFileName#</p>
        <div class="japanese">
            以呂波耳本部止<br>
            千利奴流乎和加<br>
            餘多連曽津祢那<br>
            良牟有為能於久<br>
            耶万計不己衣天<br>
            阿佐伎喩女美之<br>
            恵比毛勢須<br>
        </div>
    </body>
    </html>
</cfoutput>
</cfdocument>

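(For the sake of completeness, I've changed the text to the poem いろは (Iroha), which uses every kana exactly once. Strictly speaking, the version above is written in man'yōgana (kanji used phonetically) rather than in Hiragana itself.)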

Because the engine name goes into the file name, I can run the code on ColdFusion, Railo and Open BlueDragon without overwriting my saved files each time. I've also got rid of the original "Save as..." approach (the Content-Disposition <cfheader>), as that was just annoying during testing.

Running this code via ColdFusion 10, the Japanese renders properly, which is exactly what we want. It seems that simply changing the font is all we needed to do.
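
Applied back to the original Stack Overflow code, the change amounts to pointing the CSS at a font that actually has Japanese glyphs. This is a hedged sketch rather than a tested fix for the question: MingLiU is simply the font that worked in my CF10 test, and whatever font you pick needs glyphs for everything in the text (kanji, Hiragana and Katakana), has to be installed on the server, and has to be visible to ColdFusion (CF Administrator > Font Management). The separate .japanese class is my own choice; putting the font on body would work too.

```cfml
<cfprocessingdirective pageencoding="utf-8">
<!--- Sketch: the question's PDF, with the Japanese paragraph pointed at a CJK-capable font.
      Assumes MingLiU is installed on the server and known to ColdFusion's Font Management. --->
<cfdocument format="PDF" localurl="yes" marginTop=".25" marginLeft=".25" marginRight=".25" marginBottom=".25" pageType="custom" pageWidth="8.5" pageHeight="10.2">
    <cfoutput>
    <html>
        <head>
            <title>PDF Export Example</title>
            <style>
                body { font-family: Verdana; }
                /* Verdana has no Japanese glyphs, so this paragraph needs a font that does */
                .japanese { font-family: mingliu; }
            </style>
        </head>
        <body>
            <h1>PDF Export Example</h1>
            <p class="japanese">担当するクライエントの大半は様々な規模の企業だが、カナダの大学や政府関連の研究機関の担当経験もある。</p>
            <h1>PDF Export English Example</h1>
            <p>This is an example.</p>
        </body>
    </html>
    </cfoutput>
</cfdocument>
```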

However, Railo and OpenBD don't fare so well: neither of their PDFs is great. I'll let the Railo bods know about this, to see what they think. Every time I suggest to the OpenBD guys that something isn't perfect they get arsey at me, so I'm just not going to bother letting them know. That said, this was one of the rare times recently that test code originally written for CF/Railo didn't simply break when I ran it on OpenBD, so I guess this represents a win for them. Of sorts.

The person on Stack Overflow is still having problems with this, and I don't know what else might be going on. They're on ColdFusion 8, but running that code on ColdFusion 8 worked fine for me. So... I dunno. Does anyone else know what might be contributing to this? If you wanna mess around with the code to see what you can come up with, it's on GitHub: https://github.com/daccfml/scratch/tree/master/blogExamples/cfml/pdf.

Righto.

--
Adam