  • My recommendation is to avoid \N and \x escapes except for whitespace and combining characters. Literal characters that can be read immediately and copy-pasted anywhere are much more useful.
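
To illustrate the point, here is a minimal sketch. The strings are illustrative picks of mine, not taken from the article under discussion. It shows that a literal and the two escape forms yield the same string under `use utf8`, and gives two cases where an escape is still the clearer choice:

```perl
use strict;
use warnings;
use utf8;                 # the source file itself is UTF-8
use charnames ':full';    # explicit load; I believe newer perls do this automatically

# Three spellings of HIRAGANA LETTER A (U+3042). Only the first can be
# read at a glance and copy-pasted elsewhere.
my $literal = 'あ';
my $by_hex  = "\x{3042}";
my $by_name = "\N{HIRAGANA LETTER A}";

# Escapes stay useful for characters that are invisible or that attach
# to their neighbour, such as whitespace and combining marks.
my $nbsp   = "\N{NO-BREAK SPACE}";
my $accent = "e\N{COMBINING ACUTE ACCENT}";   # two code points, one glyph

my $same = ($literal eq $by_hex && $literal eq $by_name) ? 'same' : 'different';
print "$same\n";
printf "%d %d\n", length $nbsp, length $accent;
```

All three spellings compare equal, so the choice between them is purely about readability for whoever maintains the source.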

    »Perl« is a proper name and is not translated (I haven't even seen it transliterated where it would be possible), »monger« is also very difficult to translate because of its multiple denotations in English (of course that word was picked deliberately for this reason). Can you substitute something easier?

    I have a much better regex example anyway. I gave a talk about this at the last Vienna.pm meeting. The purpose of the code is to split a long list of country and city names into pages. This code sample is very DWIMmy and demonstrates several features:

    • Perl source code can be written in UTF-8.
    • Regexes and their character classes can contain literal non-ASCII characters, including ranges of them.


    use utf8;

    # [...]

    if ('ja' eq $self->_language) { # gojūon (gozyûon) pagination
            %pages = (
                    all => { label => '[all]', re => qr/.*/msx },
                    0 => { label => '[あ]', re => qr/\A [ぁ-おァ-オ]/msx },
                    1 => { label => '[か]', re => qr/\A [か-ごカ-ゴ]/msx },
                    2 => { label => '[さ]', re => qr/\A [さ-ぞサ-ゾ]/msx },
                    3 => { label => '[た]', re => qr/\A [た-どタ-ド]/msx },
                    4 => { label => '[な]', re => qr/\A [な-のナ-ノ]/msx },
                    5 => { label => '[は]', re => qr/\A [は-ぽハ-ポ]/msx },
                    6 => { label => '[ま]', re => qr/\A [ま-もマ-モ]/msx },
                    7 => { label => '[や]', re => qr/\A [ゃ-よャ-ヨ]/msx },
                    8 => { label => '[ら]', re => qr/\A [ら-ろラ-ロ]/msx },
                    9 => { label => '[わ]', re => qr/\A [ゎ-ゔヮ-ヴ]/msx },
            );
    } else { # latin pagination
            %pages = (
                    all => { label => '[all]', re => qr/.*/msx },
                    0 => { label => '[A-D]', re => qr/\A [A-D]/msx },
                    1 => { label => '[E-H]', re => qr/\A [E-H]/msx },
                    2 => { label => '[I-L]', re => qr/\A [I-L]/msx },
                    3 => { label => '[M-P]', re => qr/\A [M-P]/msx },
                    4 => { label => '[Q-T]', re => qr/\A [Q-T]/msx },
                    5 => { label => '[U-Z]', re => qr/\A [U-Z]/msx },
            );
    }

    This is not the complete code needed to achieve the result. People experienced with i18n and collation will spot the exceptions and edge cases at a glance. I left the rest out here because I want to concentrate on the topic at hand. Do you want the rest, too?
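
For anyone who wants to try the idea in the meantime, here is a minimal sketch of how a table like `%pages` could be applied. It is not the remaining production code. The shortened page list, the sample names, and the `[other]` fallback bucket are all hypothetical, and I use an array instead of a hash only so that the pages keep their order:

```perl
use strict;
use warnings;
use utf8;                                   # the literals below are UTF-8

# Hypothetical, shortened page table in the style of the one above.
my @pages = (
    { label => '[あ]', re => qr/\A [ぁ-おァ-オ]/msx },
    { label => '[か]', re => qr/\A [か-ごカ-ゴ]/msx },
    { label => '[た]', re => qr/\A [た-どタ-ド]/msx },
);

# Hypothetical sample data: a few country names in katakana.
my @names = ('アメリカ', 'イタリア', 'カナダ', 'ドイツ', 'オーストリア');

# Put each name on the first page whose regex matches its first character.
my %names_on;
NAME: for my $name (@names) {
    for my $page (@pages) {
        if ($name =~ $page->{re}) {
            push @{ $names_on{ $page->{label} } }, $name;
            next NAME;
        }
    }
    push @{ $names_on{'[other]'} }, $name;  # no page matched
}

binmode STDOUT, ':encoding(UTF-8)';         # encode characters on output
for my $page (@pages) {
    my $members = $names_on{ $page->{label} } || [];
    print "$page->{label} @$members\n";
}
```

With this sample data, three names land on the [あ] page and one each on [か] and [た]. Kanji names, long-vowel marks, and proper collation within a page are exactly the edge cases this sketch ignores.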