1. It's widely known that Jcode.pm has a Unicode mapping problem: FULLWIDTH TILDE (U+FF5E) does not map correctly to euc-jp. This is due to a mistake in Unicode.org's own mapping table.
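One way to see the disagreement is to decode the same JIS X 0208 character (row 1, cell 33, the wave-dash/tilde glyph) from two byte encodings and compare the resulting code points. This is a sketch that assumes Encode's euc-jp table follows Unicode.org's JIS0208 mapping (to U+301C WAVE DASH) while its cp932 table follows Microsoft's mapping (to U+FF5E FULLWIDTH TILDE). Check the printed lines rather than trusting that assumption; the helper name codepoints is illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Format a character string as space-separated U+XXXX code points.
sub codepoints {
    my ($s) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $s;
}

# JIS X 0208 row 1, cell 33 (JIS code 0x2141) in two byte encodings:
# EUC-JP sets the high bit of each JIS byte (0x21 0x41 -> 0xA1 0xC1);
# Shift_JIS/CP932 encodes the same character as 0x81 0x60.
my %samples = (
    'euc-jp' => "\xA1\xC1",
    'cp932'  => "\x81\x60",
);

for my $enc (sort keys %samples) {
    my $chars = decode($enc, $samples{$enc});
    printf "%-6s decodes to %s\n", $enc, codepoints($chars);
}
```

If the two lines differ, text that was decoded from CP932 (for example, typed on Windows) carries U+FF5E, which an euc-jp table built from Unicode.org's mapping has no round-trip entry for.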
But even with a recent Encode.pm, the problem remains:
% perl -MEncode -e 'print encode("euc-jp", "\x{ff5e}", Encode::FB_CROAK)'
"\x{ff5e}" does not map to euc-jp at /usr/lib/perl/5.8.2/Encode.pm line 149.
What does this mean? Grepping the ucm files shows:
% grep -i FF5E ucm/euc-jp.ucm
<UFF5E> \xA2\xB2 |3 # 1-2-18
Doesn't this line mean that U+FF5E maps to \xA2\xB2 in euc-jp?
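To test each direction separately, the sketch below calls encode() with FB_CROAK inside eval, and decodes \xA2\xB2 on its own. If I read the enc2xs documentation correctly, the trailing |0 to |3 field of a ucm line is a fallback flag: |0 marks a round-trip mapping, while |3 adds the entry to the decode (euc-jp to Unicode) map only. That would explain why decode() can accept \xA2\xB2 while encode() still croaks on U+FF5E, but treat it as an interpretation to verify against your installed Encode. The helpers try_encode, codepoints and hexbytes are hypothetical names for this example.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Try to encode a character string; return the bytes, or undef if Encode croaks.
sub try_encode {
    my ($enc, $chars) = @_;
    # With CHECK set, encode() may modify its input in place, so pass a copy.
    my $copy  = $chars;
    my $bytes = eval { encode($enc, $copy, Encode::FB_CROAK) };
    return $bytes;
}

# Format a character string as space-separated U+XXXX code points.
sub codepoints {
    my ($s) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $s;
}

# Format a byte string as \xHH escapes.
sub hexbytes {
    my ($b) = @_;
    return join '', map { sprintf '\\x%02X', ord } split //, $b;
}

# Unicode -> euc-jp: only the encode map is consulted.
my $bytes = try_encode('euc-jp', "\x{FF5E}");
print 'encode U+FF5E: ', (defined $bytes ? hexbytes($bytes) : 'does not map'), "\n";

# euc-jp -> Unicode: the decode map is consulted, where a |3 entry would apply.
my $chars = decode('euc-jp', "\xA2\xB2");
print 'decode \xA2\xB2: ', codepoints($chars), "\n";
```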
2. What's the best practice for developing applications in a multi-encoding environment, such as web+DB+XML applications? Developing in such an environment gets messy because:
Concatenating non-Unicode (byte) strings with Unicode strings triggers Perl's automatic UTF-8 upgrade, which reinterprets each byte as a Latin-1 character, so raw UTF-8 byte strings get corrupted.
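A small sketch of the upgrade problem and the usual remedy of decoding at the input boundary. The two sample strings are made up for illustration: one stands in for raw UTF-8 bytes read without a decoding layer, the other for a character string produced by decode().

```perl
use strict;
use warnings;
use Encode qw(decode_utf8 encode_utf8);

# Raw UTF-8 bytes for HIRAGANA LETTER A, e.g. read from a socket or DB
# without any decoding layer.
my $raw_bytes = "\xE3\x81\x82";

# A character string (HIRAGANA LETTER I), e.g. produced by decode().
my $chars = "\x{3044}";

# Mixing the two: Perl upgrades $raw_bytes by treating each byte as a
# Latin-1 character, so the result holds 4 characters instead of 2.
my $broken = $raw_bytes . $chars;

# Decoding at the input boundary first keeps everything as characters.
my $fixed = decode_utf8($raw_bytes) . $chars;

printf "broken: %d chars, fixed: %d chars\n", length $broken, length $fixed;

# Encode once, at the output boundary.
binmode STDOUT;
print encode_utf8($fixed), "\n";
```

Encoding $broken instead would turn the first byte \xE3 into \xC3\xA3, which is the double-encoded mojibake described above. Decoding all input as early as possible, working only with character strings inside the program, and encoding once on output avoids mixing the two kinds of string.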
For example, how do I tell Template-Toolkit that my templates are written in euc-jp? It calls open() inside its own modules, so neither binmode nor encoding.pm helps unless I open the template files myself and pass the filehandle explicitly, which is not the case for me.
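Because the pragma and binmode calls in your own code do not reach a module's internal open(), the fallback is to decode the file yourself. This is exactly the workaround the question hopes to avoid, so it is shown only as a sketch: slurp_decoded is a hypothetical helper, the temporary file stands in for a real euc-jp template, and the final step relies on Template Toolkit's process() accepting a reference to a scalar that holds the template text.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Read a file whose bytes are in $enc and return a decoded character string.
# The :encoding() PerlIO layer decodes while reading.
sub slurp_decoded {
    my ($path, $enc) = @_;
    open my $fh, "<:encoding($enc)", $path
        or die "Cannot open $path: $!";
    local $/;    # slurp mode
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Create a small euc-jp "template" for the demonstration (made-up content).
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} encode('euc-jp', "[% title %]: \x{3042}\n");
close $tmp_fh;

my $template_text = slurp_decoded($tmp_path, 'euc-jp');
printf "%d characters read\n", length $template_text;

# With Template Toolkit installed, the decoded text could be passed as a
# scalar reference instead of a file name, for example:
#   $tt->process(\$template_text, \%vars, \$output) or die $tt->error;
```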
I tend to think that all modules that handle data streams, such as DBI, Template-Toolkit, and CGI.pm (or Apache::Request), should offer encoding layers. Am I right?
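Until modules provide such layers themselves, one approach is to wrap each boundary in an explicit decode/encode step. A minimal sketch for database rows, assuming the database stores euc-jp bytes: decode_row and encode_row are hypothetical helpers, not DBI methods, and the row is simulated so the example runs without a database. DBI returns NULL columns as undef, so the helpers pass undef through unchanged.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of DBI): decode every defined field of a row
# fetched from a database that stores bytes in $enc.
sub decode_row {
    my ($enc, $row) = @_;
    return [ map { defined $_ ? decode($enc, $_) : undef } @$row ];
}

# Hypothetical helper: encode a row of character strings back to bytes,
# e.g. before binding them as statement parameters.
sub encode_row {
    my ($enc, $row) = @_;
    return [ map { defined $_ ? encode($enc, $_) : undef } @$row ];
}

# Simulated row, shaped like fetchrow_arrayref output: euc-jp bytes, a NULL, ASCII.
my $raw_row = [ "\xA4\xA2\xA4\xA4", undef, 'abc' ];

my $row = decode_row('euc-jp', $raw_row);
print join(', ', map { defined $_ ? length($_) : 'NULL' } @$row), "\n";

# Round trip back to euc-jp bytes for output or storage.
my $bytes_row = encode_row('euc-jp', $row);
print 'round trip ok: ', ($bytes_row->[0] eq $raw_row->[0] ? 'yes' : 'no'), "\n";
```

The same pattern applies to CGI parameters on input and to rendered pages on output: the program's interior only ever sees character strings.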
Troubles with Template Toolkit (Score:1)
as he is being sponsored to work on the next version (TT3 [tt2.org]) for a few months.
use AxKit; (Score:2)
Partly it's the beauty of XML: it was designed to handle different encodings explicitly and cleanly.
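An XML document can name its own encoding in the XML declaration, so a reader does not have to guess. The sketch below illustrates that idea only: decode_xml_bytes is a hypothetical helper that reads the encoding pseudo-attribute with a regex and falls back to UTF-8 when none is declared. A real XML parser also handles byte order marks and many more cases.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode an XML document's bytes using the encoding named
# in its XML declaration, falling back to UTF-8 when none is declared.
# Encode croaks if the declared encoding name is unknown.
sub decode_xml_bytes {
    my ($bytes) = @_;
    my $enc = 'UTF-8';
    if ($bytes =~ /\A<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']/) {
        $enc = $1;
    }
    return decode($enc, $bytes);
}

# A small document that declares EUC-JP and is stored as euc-jp bytes.
my $xml_chars = '<?xml version="1.0" encoding="EUC-JP"?>' . "\n<title>\x{3042}</title>\n";
my $doc       = encode('euc-jp', $xml_chars);

my $text = decode_xml_bytes($doc);
print length($text), " characters decoded\n";
```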
Unicode in Template Toolkit and Apache::Request (Score:2)
Ilya Martynov (http://martynov.org/)