All the Perl that's Practical to Extract and Report

  • Seeing as you didn't ask for it. ;-)

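    Always use UTF-8 if you possibly can. It can encode every Unicode character, so it's (more-or-less) a superset of everything else, and it's properly detectable: text in most other encodings rarely happens to form valid UTF-8 byte sequences.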
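
    One way to see what "detectable" means in practice is to ask Encode to decode the bytes strictly and check whether it complains. This is only a minimal sketch: is_valid_utf8 is a hypothetical helper name, not something Encode provides, and plain ASCII (plus the odd lucky byte sequence from another encoding) will also pass.

    ```perl
    use strict;
    use warnings;
    use Encode qw(decode FB_CROAK);

    # Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, else 0.
    # 'UTF-8' (with the hyphen) selects Encode's strict decoder, and
    # FB_CROAK makes decode() die on malformed input instead of substituting.
    sub is_valid_utf8 {
        my ($bytes) = @_;    # a copy, so decode() may modify it freely
        return eval { decode('UTF-8', $bytes, FB_CROAK); 1 } ? 1 : 0;
    }

    # "\xC4\x80" is U+0100 in UTF-8; a lone "\xC4" is a cut-off sequence.
    print is_valid_utf8("\xC4\x80dam") ? "valid\n" : "invalid\n";
    print is_valid_utf8("\xC4dam")     ? "valid\n" : "invalid\n";
    ```

    A check like this only says the bytes could be UTF-8; it can't prove that's what the sender meant.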

    If you're looking for interesting encodings, I'd recommend checking out one of the Shift-JIS [wikipedia.org] variants, just for the weirdness. Personally, I've little experience of non-Western encodings.

    For more concrete use cases to cover with encoding, you should look at:

    • query parameters coming in from browsers
    • POSTed form parameters coming in from browsers
    • What e
    • Can you tell me more about the command line and environment variable problem? I think I'll have the other ones covered, but I'd like to know how you solved that one. I don't recall reading anything about how Perl will treat those.

      • It's controlled through the -C flag (see perlrun). Here's an example of using U+0100 (Ā) on the command line. The file contains the word "Ādam".

        $ mate ~/Desktop/adam.txt
        $ adam=$(<~/Desktop/adam.txt)
        $ xxd ~/Desktop/adam.txt
        0000000: c480 6461 6d0a                           ..dam.
        $ perl -MDevel::Peek -le 'Dump $ARGV[0]' $adam
        SV = PV(0x801168) at 0x800954
          REFCNT = 1
          FLAGS = (POK,pPOK)
          PV = 0x2044f0 "\304\200dam"\0
          CUR = 5
          LEN = 8
        $ perl -MDevel::Peek -CA -le 'Dump $ARGV[0]' $adam
        SV = PV(0x801168) at 0x800954
          REFCNT = 1
          FLAGS = (POK,pPOK,UTF8)
          PV = 0x2044f0 "\304\200dam"\0 [UTF8 "\x{100}dam"]
          CUR = 5
          LEN = 8

        Actually, my problem was with Java, but the same principle applies. :) I've just noticed that this doesn't work for environment variables: even with -CA, $ENV{adam} comes back as raw bytes with no UTF8 flag. That's a shame.

        $ export adam
        $ perl -MDevel::Peek -le 'Dump $ENV{adam}'
        SV = PVMG(0x80a4c0) at 0x800b40
          REFCNT = 1
          FLAGS = (SMG,RMG,POK,pPOK)
          IV = 0
          NV = 0
          PV = 0x2057d0 "\304\200dam"\0
          CUR = 5
          LEN = 8
          MAGIC = 0x2057e0
            MG_VIRTUAL = &PL_vtbl_envelem
            MG_TYPE = PERL_MAGIC_envelem(e)
            MG_LEN = 4
            MG_PTR = 0x205800 "adam"
        $ perl -MDevel::Peek -CA -le 'Dump $ENV{adam}'
        SV = PVMG(0x80a4c0) at 0x800b40
          REFCNT = 1
          FLAGS = (SMG,RMG,POK,pPOK)
          IV = 0
          NV = 0
          PV = 0x2057d0 "\304\200dam"\0
          CUR = 5
          LEN = 8
          MAGIC = 0x2057e0
            MG_VIRTUAL = &PL_vtbl_envelem
            MG_TYPE = PERL_MAGIC_envelem(e)
            MG_LEN = 4
            MG_PTR = 0x205800 "adam"

        This is all on perl 5.8.8, BTW. It may be fixed in later versions.
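
        In the meantime, a workaround is to decode the value by hand with Encode. This is a rough sketch, assuming the environment really does hold UTF-8 bytes (nothing guarantees that); decode_utf8_value is a hypothetical helper name, and the assignment to $ENV{adam} only stands in for the `export adam` step above.

        ```perl
        use strict;
        use warnings;
        use Encode qw(decode);

        # Hypothetical helper: turn UTF-8 bytes into a Perl character string.
        # Without a CHECK argument, decode() replaces malformed bytes with
        # U+FFFD rather than dying.
        sub decode_utf8_value {
            my ($bytes) = @_;
            return defined $bytes ? decode('UTF-8', $bytes) : undef;
        }

        # Stand-in for `export adam`: the same five bytes xxd showed
        # (c4 80 64 61 6d), minus the trailing newline.
        $ENV{adam} = "\xC4\x80dam";

        my $adam = decode_utf8_value($ENV{adam});

        # The raw value is 5 bytes; the decoded string is 4 characters.
        printf "bytes: %d, characters: %d\n", length($ENV{adam}), length($adam);
        ```

        The same helper works on @ARGV elements if you'd rather not rely on -CA.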