
All the Perl that's Practical to Extract and Report

use Perl

rjbs (4671)

http://rjbs.manxome.org/
AOL IM: RicardoJBSignes
Yahoo! ID: RicardoSignes

I'm a Perl coder living in Bethlehem, PA and working in Philadelphia. I'm a philosopher and theologian by training, but I was shocked to learn upon my graduation that these skills don't have many associated careers. Now I write code.

Journal of rjbs (4671)

Thursday April 10, 2008
09:07 AM

making addex talk american real good

[ #36122 ]

Look, I respect the diversity of foreign cultures and everything. I try to pronounce silly foreign names correctly, and I have learned to stop referring to Holland as "the Netheregions." In turn, could everybody please officially transliterate their languages to 7-bit? Honestly, it would make everything a lot easier... at least for Addex, which is the top world priority, right?

I have some friends and colleagues who refuse to change their names "just because my software is too parochial," so I've been forced to try to deal with it. See, Apple Address Book is all Unicode, but Mac::Glue returns strings in MacRoman when it can (read: for the names I've got in there). My mutt doesn't even like Unicode very much. Anyway, if I want to send a message to my friend José, I want to be able to hit j-o-s-e-TAB.

So, I don't want to try to fix mutt and everything else, because that would help too many other people. I just want to help Addex users.

I had to go through a lot of weird steps to get this working. The first problem was to decompose the decoded-from-MacRoman Unicode that I was getting back. Then I dropped out the NUL at the end and any combining characters. This didn't fix Søren, whose stupid Viking name kept its stupid Viking letter. It turns out that LATIN SMALL LETTER O WITH STROKE doesn't decompose. I figured this out only after assuming it was a Mac::Glue bug and whining at pudge about it for a while.
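To make the Søren problem concrete, here is a minimal sketch of the decode-then-decompose step. The sample byte strings are made up for illustration (0x8E and 0xBF are the MacRoman bytes for é and ø), and `codepoints` is a hypothetical helper, not anything from App::Addex or Mac::Glue.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

# Hypothetical sample input: names as MacRoman bytes.
my $jose_bytes  = "Jos\x8E";    # 0x8E is e-acute in MacRoman
my $soren_bytes = "S\xBFren";   # 0xBF is o-with-stroke in MacRoman

my $jose  = decode('MacRoman', $jose_bytes);
my $soren = decode('MacRoman', $soren_bytes);

# Hypothetical helper: render a string as space-separated U+XXXX labels.
sub codepoints {
  my ($str) = @_;
  return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $str;
}

# Canonical decomposition splits e-acute into "e" plus a combining accent...
my $jose_nfd  = codepoints(NFD($jose));
# ...but leaves o-with-stroke as a single code point.
my $soren_nfd = codepoints(NFD($soren));

print "$jose_nfd\n";    # U+004A U+006F U+0073 U+0065 U+0301
print "$soren_nfd\n";   # U+0053 U+00F8 U+0072 U+0065 U+006E
```

Dropping U+0301 leaves plain "Jose", but U+00F8 has no combining mark to drop, which is why it needs separate handling.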

You can see the horrible, horrible steps I've taken below. This code will be optional in the next App::Addex.

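use Unicode::Normalize qw(normalize);
use Unicode::UCD 'charinfo';
use charnames ':full';

sub __degrade_to_ascii {
  # Nothing to do if the string is already plain 7-bit ASCII (and NUL-free).
  return $_[0] if $_[0] =~ /^[\x01-\x7F]*$/;

  # Canonically decompose: "é" becomes "e" plus a combining accent.
  my $decomp = normalize(D => $_[0]);

  # Read the pipeline bottom-up: look up each character, drop combining
  # marks and any NUL, map "LETTER X WITH ..." to plain X, then reassemble.
  my $recomp =
    join '', map { chr(hex($_->{code})) }
    map  {
      ($_->{name} =~ /^(LATIN \w+ LETTER .) WITH/)
      ? charinfo(charnames::vianame("$1"))
      : $_
    }
    grep { $_->{code} =~ /[^0]/ }
    grep { $_->{block} !~ /combin/i }
    map  { charinfo(ord substr $decomp, $_, 1) }  0 .. length($decomp) - 1;

  return $recomp;
}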
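For trying the idea outside Addex, here is a compact, self-contained variant with a usage example. It is a sketch, not the code that ships in App::Addex: `base_letter` and `degrade_sketch` are hypothetical names, and it strips marks with the `\p{Mn}` property instead of checking `charinfo` block names. Characters whose names don't match the "LETTER X WITH" pattern are left untouched.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);
use charnames ':full';

# Hypothetical helper: map "LATIN ... LETTER X WITH ..." to plain X by name.
sub base_letter {
  my ($char) = @_;
  return $char if ord($char) < 0x80;          # already ASCII
  my $name = charnames::viacode(ord $char);
  return $char unless defined $name
                  and $name =~ /^(LATIN \w+ LETTER .) WITH/;
  return chr(charnames::vianame($1));
}

# Hypothetical simplified variant of the routine above.
sub degrade_sketch {
  my ($str) = @_;
  my $decomp = NFD($str);
  $decomp =~ s/\p{Mn}//g;                     # drop combining marks
  return join '', map { base_letter($_) } split //, $decomp;
}

print degrade_sketch("Jos\x{E9}"), "\n";      # Jose
print degrade_sketch("S\x{F8}ren"), "\n";     # Soren
```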
