Journal of jjore (6662)

Sunday July 26, 2009
04:20 AM

Your future Unicode overlords

[ #39354 ]

Recently, while testing perl-5.10.1-RC0 at $work, I found that it worked seamlessly for everything except our custom debianized build. It turns out that http://cpansearch.perl.org/src/ADAMK/Parse-CPAN-Meta-1.39/t/data/utf_16_le_bom.yml, a UTF-16LE encoded file with a BOM (byte order mark), breaks my build of a .deb that includes it.

debuild makes both the binary and source packages, which is nice for presenting the whole picture to someone trying to follow along later. To build the source package, debuild uses dpkg-source to generate a diff between the vendor's source package and the source tree used in the actual Debian build. dpkg-source in turn uses whatever diff you have installed, which probably comes from diffutils-2.8.1, the last release, from 2002 (http://ftp.gnu.org/pub/gnu/diffutils/).

It looks like development continued until 2008, counting the development releases (ftp://alpha.gnu.org/pub/gnu/diffutils/, http://git.savannah.gnu.org/cgit/diffutils.git). Somewhere along the way diff gained support for multibyte characters, but it still doesn't work for that UTF-16LE text file with a BOM.
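For anyone who wants to find such files in a source tree before the build trips over them, here is a minimal sketch in Python. The function names `sniff_bom` and `looks_binary` are made up for illustration. The NUL-byte check is only a rough stand-in for how byte-oriented tools tend to decide that a file is binary; whether that is exactly what bites dpkg-source here is an assumption on my part, not something verified above.

```python
import codecs

# Known BOMs, longest first, so that UTF-32LE (FF FE 00 00) is not
# mistaken for UTF-16LE (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def looks_binary(data: bytes) -> bool:
    """Crude heuristic (an assumption, not any tool's exact rule):
    data containing a NUL byte is treated as binary."""
    return b"\x00" in data


# A tiny stand-in for a UTF-16LE YAML file with a BOM (hypothetical content).
sample = codecs.BOM_UTF16_LE + "--- foo\n".encode("utf-16-le")

if __name__ == "__main__":
    print(sniff_bom(sample), looks_binary(sample))
```

In UTF-16LE every ASCII character is followed by a NUL byte, so perfectly readable text looks like binary data to any tool that works byte by byte.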

Oh well. Anyway, consider this a tale of basic tools just not working nicely when your source code is in Unicode.

  • It took me a couple of years to get BOM support into PPI. It frustrates me that we still have to fight basic battles like this. The Java world has it pretty good, with Unicode from day 1. I hope Parrot/Rakudo does that well.

    • I've been on the sidelines of the effort to get proper Unicode support into Perl, but I'd think sussing this out is the domain of the PerlIO layers.