All the Perl that's Practical to Extract and Report

TorgoX
  sburkeNO@SPAMcpan.org
http://search.cpan.org/~sburke/

"Il est beau comme la retractilité des serres des oiseaux rapaces [...] et surtout, comme la rencontre fortuite sur une table de dissection d'une machine à coudre et d'un parapluie !" -- Lautréamont

Journal of TorgoX (1933)

Sunday October 12, 2003
04:14 PM

Malformed UTF8 Hoohah

[ #15182 ]
Dear All,

I am puzzled. The current version of Pod::Simple installs happily and quietly under 5.8. But under 5.6, trying to install it leads to banshee-like screaming about malformed UTF8 characters. I've no idea why. Can anyone offer any clues? I'm really not doing anything exotic in my Pod::Simple code.

  • Any chance of telling a little bit more about the banshee part?

    • I think this is about the most minimal case I can produce: podsimpleutf8hell.pl [interglacial.com]
      • Re:details (Score:3, Insightful)

        I'm afraid 5.6.1 looks to be out of luck here: this appears to be yet another Unicode bug in 5.6.1. It looks like the 0xE9 character (the ISO Latin-1 e-acute) does not get correctly upgraded to UTF-8, and this fatally annoys s/// and ord().

        You *may* be able to dodge this by forcing the data at appropriate spots into UTF-8: something along the lines of:

        $ascertain_utf8 = $] < 5.008 ? sub { $_[0] .= chr(0x100); chop $_[0] } : sub { };
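
A minimal sketch of how a helper like the one above might be called, assuming it is applied to a string just before the s/// or ord() that trips up 5.6. The sample string and the variable names $text, $last and $copy are hypothetical, chosen only for illustration. On 5.8 and later the helper is a no-op, so this only shows the calling convention, not a confirmed fix for 5.6.1.

```perl
use strict;
use warnings;

# On perls older than 5.8, append a character above 0xFF (forcing the
# string into UTF-8 internally) and chop it off again; otherwise do nothing.
my $ascertain_utf8 = $] < 5.008
    ? sub { $_[0] .= chr(0x100); chop $_[0]; return }
    : sub { return };

# Hypothetical input: "cafe" spelled with a Latin-1 e-acute (0xE9) at the end.
my $text = "caf\xe9";
$ascertain_utf8->($text);    # modifies $text in place through the @_ alias

# The characters themselves are unchanged; only the internal storage may differ.
my $last = ord(substr($text, -1));
(my $copy = $text) =~ s/\xe9/e/;
printf "last=0x%02x copy=%s\n", $last, $copy;
```

The helper works on $_[0] rather than on a copy, so the caller's own variable is the one that gets upgraded. That is why it is called as $ascertain_utf8->($text) with no return value used.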