Stories, comments, journals, and other submissions on use Perl; are Copyright 1998-2006, their respective owners.
real world example
My recommendation is to avoid \N and \x escapes except for whitespace and combining characters. Literal characters that can be read immediately and copy-pasted anywhere are much more useful.
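A small Perl illustration of that recommendation, offered as a sketch rather than anything from the original discussion. It spells U+3042 three ways and then shows the two kinds of character where an escape does earn its keep. It assumes a Unicode-aware perl (5.8 or later). The charnames pragma is only strictly needed on perls older than 5.16, and all variable names are illustrative.

```perl
use strict;
use warnings;
use utf8;                  # the literal 'あ' below is a character, not UTF-8 bytes
use charnames ':full';     # enables \N{NAME} on perls older than 5.16

binmode STDOUT, ':encoding(UTF-8)';

# Three spellings of the same character, U+3042 HIRAGANA LETTER A.
# Only the first one can be read at a glance.
my $literal = 'あ';
my $hex     = "\x{3042}";
my $named   = "\N{HIRAGANA LETTER A}";

# Cases where an escape is clearer: characters that are invisible or
# indistinguishable from their neighbours once pasted into source code.
my $nbsp    = "\N{NO-BREAK SPACE}";            # looks like an ordinary space
my $e_acute = "e\N{COMBINING ACUTE ACCENT}";   # renders as one glyph, is two characters

printf "%s %s %s\n", $literal, $hex, $named;
printf "e-acute has %d characters\n", length $e_acute;
```

All three spellings compare equal with eq; the escapes only add value when the literal would hide what is really in the string.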
»Perl« is a proper name and is not translated (I haven't even seen it transliterated where that would be possible); »monger« is also very difficult to translate because of its multiple meanings in English (of course, that word was picked deliberately for this reason). Can you substitute something easier?
I have a much better regex example anyway. I gave a talk about this at the last Vienna.pm meeting. The task is to split a long list of country and city names into pages. This code sample is very DWIMmy and demonstrates several features:
use utf8;  # the source contains literal kana
# [...]
if ('ja' eq $self->_language) { # gojūon pagination
    # One page per kana row. Small and (semi-)voiced kana lie inside each
    # row's code point range, and hiragana and katakana are both listed.
    %pages = (
        all => { label => '[all]', re => qr/.*/msx },
        0 => { label => '[あ]', re => qr/\A [ぁ-おァ-オ]/msx },
        1 => { label => '[か]', re => qr/\A [か-ごカ-ゴ]/msx },
        2 => { label => '[さ]', re => qr/\A [さ-ぞサ-ゾ]/msx },
        3 => { label => '[た]', re => qr/\A [た-どタ-ド]/msx },
        4 => { label => '[な]', re => qr/\A [な-のナ-ノ]/msx },
        5 => { label => '[は]', re => qr/\A [は-ぽハ-ポ]/msx },
        6 => { label => '[ま]', re => qr/\A [ま-もマ-モ]/msx },
        7 => { label => '[や]', re => qr/\A [ゃ-よャ-ヨ]/msx },
        8 => { label => '[ら]', re => qr/\A [ら-ろラ-ロ]/msx },
        9 => { label => '[わ]', re => qr/\A [ゎ-ゔヮ-ヴ]/msx }, # also ん/ン and ゔ/ヴ
    );
} else { # Latin pagination
    # Only names starting with an unaccented upper-case ASCII letter match.
    %pages = (
        all => { label => '[all]', re => qr/.*/msx },
        0 => { label => '[A-D]', re => qr/\A [A-D]/msx },
        1 => { label => '[E-H]', re => qr/\A [E-H]/msx },
        2 => { label => '[I-L]', re => qr/\A [I-L]/msx },
        3 => { label => '[M-P]', re => qr/\A [M-P]/msx },
        4 => { label => '[Q-T]', re => qr/\A [Q-T]/msx },
        5 => { label => '[U-Z]', re => qr/\A [U-Z]/msx },
    );
}
This is not the complete code needed to achieve the result. People experienced with i18n and collation will spot the exceptions and edge cases at a glance. I left that part out here because I want to concentrate on the topic at hand. Do you want the rest, too?
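To show how a table like %pages might be consumed, here is a hedged sketch using the Latin table from the comment. The helpers page_key and names_on_page and the sample names are hypothetical and are not the author's withheld code. page_key strips combining marks via the core module Unicode::Normalize and upper-cases the name, which is one crude workaround for the case and accent gaps in the plain regexes; it is not real collation (the core module Unicode::Collate is the proper tool for that).

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD);

binmode STDOUT, ':encoding(UTF-8)';

# Latin table as given in the comment.
my %pages = (
    all => { label => '[all]', re => qr/.*/msx },
    0 => { label => '[A-D]', re => qr/\A [A-D]/msx },
    1 => { label => '[E-H]', re => qr/\A [E-H]/msx },
    2 => { label => '[I-L]', re => qr/\A [I-L]/msx },
    3 => { label => '[M-P]', re => qr/\A [M-P]/msx },
    4 => { label => '[Q-T]', re => qr/\A [Q-T]/msx },
    5 => { label => '[U-Z]', re => qr/\A [U-Z]/msx },
);

# Hypothetical helper: decompose, drop nonspacing marks, upper-case.
# "Österreich" becomes "OSTERREICH", "vienna" becomes "VIENNA".
sub page_key {
    my ($name) = @_;
    my $key = NFD($name);
    $key =~ s/\p{Mn}//g;    # e.g. U+0308 COMBINING DIAERESIS
    return uc $key;
}

# Hypothetical helper: every name whose key matches the page's regex.
sub names_on_page {
    my ($page, @names) = @_;
    my $re = $pages{$page}{re};
    return grep { page_key($_) =~ $re } @names;
}

my @names = ('Österreich', 'Albania', 'vienna', 'Graz', 'Île-de-France');
for my $page (0 .. 5) {
    my @hits = names_on_page($page, @names);
    printf "%s %s\n", $pages{$page}{label}, join ', ', @hits;
}
```

With these sample names, Österreich is filed under [M-P] and vienna under [U-Z]; matched against the raw regexes, neither name would land on any lettered page.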