All the Perl that's Practical to Extract and Report

use Perl

Journal of miyagawa (1653)

Wednesday February 14, 2007
04:20 AM

Encode::DoubleEncodedUTF8

[ #32397 ]

I released a new hacky module, Encode::DoubleEncodedUTF8.

This module adds a new fake encoding, "utf-8-de", that automatically finds doubly encoded utf-8 bytes (like \x{c3}\x{83}\x{c2}\x{a9}, a doubly encoded "é"), which typically show up when you concatenate strings with the utf-8 flag on and off. I wouldn't suggest using this module in a production environment (because it might be slow), but it would be really handy for fixing the common mistakes made in Perl and Unicode/I18N stuff.
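To make the failure mode concrete, here is a minimal sketch of how the doubling comes about, using only the core Encode module. The variable names are just for illustration, and this shows the bug itself, not anything from the module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "é" as raw UTF-8 octets (utf-8 flag off), e.g. read from a socket or database.
my $bytes = "\xc3\xa9";

# The same character as a decoded character string (utf-8 flag on).
my $chars = decode("utf-8", $bytes);

# Concatenating upgrades $bytes as if it were Latin-1,
# so its two octets become the two characters U+00C3 and U+00A9.
my $mixed = $chars . $bytes;

# Encoding for output then encodes those two characters a second time.
my $out = encode("utf-8", $mixed);

print unpack("H*", $out), "\n";   # c3a9c383c2a9
```

The first two octets (c3 a9) are a correctly encoded "é"; the last four (c3 83 c2 a9) are the same character encoded twice.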

The same methodology could also be applied to PHP/Java sites, since I see the same bugs on Amazon Web Services or YouTube.

UPDATE: I released 0.02, which no longer relies heavily on Encode::encode/decode when it finds double-encoded utf-8 bytes, so it's 30 times faster than 0.01.
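For a rough idea of what a regexp-only approach can look like, here is a sketch under my own assumptions. It is not the module's actual code, and the names `$double`, `undo_one_layer` and `fix_double_encoded` are hypothetical. It matches runs of octets that look like a whole UTF-8 sequence whose every byte was encoded a second time, and strips one layer with plain byte arithmetic.

```perl
use strict;
use warnings;

# A continuation byte (0x80-0xBF) that was treated as Latin-1 and encoded again.
my $cont = qr/\xC2[\x80-\xBF]/;

# A complete 2-, 3- or 4-byte UTF-8 sequence whose every byte was encoded again.
my $double = qr/
      \xC3[\x82-\x9F] $cont          # original lead byte 0xC2-0xDF
    | \xC3[\xA0-\xAF] (?:$cont){2}   # original lead byte 0xE0-0xEF
    | \xC3[\xB0-\xB4] (?:$cont){3}   # original lead byte 0xF0-0xF4
/x;

# Undo one layer: "\xC2" + b stood for byte b, "\xC3" + b for byte b + 0x40.
sub undo_one_layer {
    my ($seq) = @_;
    $seq =~ s/([\xC2\xC3])([\x80-\xBF])/chr(ord($2) + ($1 eq "\xC3" ? 0x40 : 0))/ge;
    return $seq;
}

# Repair every doubly encoded sequence in a byte string; leave the rest alone.
sub fix_double_encoded {
    my ($octets) = @_;
    $octets =~ s/($double)/undo_one_layer($1)/ge;
    return $octets;
}

# A correct "é" followed by a doubly encoded "é".
my $fixed = fix_double_encoded("\xc3\xa9\xc3\x83\xc2\xa9");
print unpack("H*", $fixed), "\n";   # c3a9c3a9
```

A heuristic like this cannot tell a doubly encoded "é" from text that really contains the two characters "Ã©", so it is best kept for repairing data that is known to be damaged.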

  • You are my new hero.
    • Heh, thanks! I fixed the code so that it no longer uses Encode::decode/encode when it looks for dodgy utf-8 bytes, and released it as 0.02 on CPAN. It's now all regexp-based and 30 times faster :)