NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

  Comment: Re:My advice… (Score 1) on 2009.09.25 6:15

It's controlled through the -C flag (see perlrun). Here's an example of using U+0100 (Ā) on the command line. The file contains the word "Ādam".

$ mate ~/Desktop/adam.txt
$ adam=$(<~/Desktop/adam.txt)
$ xxd ~/Desktop/adam.txt
0000000: c480 6461 6d0a                           ..dam.
$ perl -MDevel::Peek -le 'Dump $ARGV[0]' $adam
SV = PV(0x801168) at 0x800954
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x2044f0 "\304\200dam"\0
  CUR = 5
  LEN = 8
$ perl -MDevel::Peek -CA -le 'Dump $ARGV[0]' $adam
SV = PV(0x801168) at 0x800954
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x2044f0 "\304\200dam"\0 [UTF8 "\x{100}dam"]
  CUR = 5
  LEN = 8

Actually, my problem was with Java, but the same principle applies. :) I've just noticed that -CA doesn't do the same for environment variables: $ENV{adam} stays as undecoded bytes either way. That's a shame.

$ export adam
$ perl -MDevel::Peek -le 'Dump $ENV{adam}'
SV = PVMG(0x80a4c0) at 0x800b40
  REFCNT = 1
  FLAGS = (SMG,RMG,POK,pPOK)
  IV = 0
  NV = 0
  PV = 0x2057d0 "\304\200dam"\0
  CUR = 5
  LEN = 8
  MAGIC = 0x2057e0
    MG_VIRTUAL = &PL_vtbl_envelem
    MG_TYPE = PERL_MAGIC_envelem(e)
    MG_LEN = 4
    MG_PTR = 0x205800 "adam"
$ perl -MDevel::Peek -CA -le 'Dump $ENV{adam}'
SV = PVMG(0x80a4c0) at 0x800b40
  REFCNT = 1
  FLAGS = (SMG,RMG,POK,pPOK)
  IV = 0
  NV = 0
  PV = 0x2057d0 "\304\200dam"\0
  CUR = 5
  LEN = 8
  MAGIC = 0x2057e0
    MG_VIRTUAL = &PL_vtbl_envelem
    MG_TYPE = PERL_MAGIC_envelem(e)
    MG_LEN = 4
    MG_PTR = 0x205800 "adam"

This is all on perl 5.8.8, BTW. It may be fixed in later versions.

Comments: 8

  Comment: My advice… (Score 1) on 2009.09.25 4:11

Seeing as you didn't ask for it. ;-)

Always use UTF-8 if you possibly can. It's (more-or-less) a superset of everything else, and it's properly detectable.

If you're looking for interesting encodings, I'd recommend checking out one of the Shift-JIS variants. Just for weirdness. Personally, I've little experience of non-Western encodings.

For more concrete use cases to cover with encoding, you should look at:

  • Query parameters coming in from browsers.
  • POSTed form parameters coming in from a browser.
  • Which encoding command line arguments and environment variables use. Yes, this caught me out earlier this week.
  • Getting the right encoding from the database.

There are a lot of places where you need to consider byte to character conversion (and vice versa).

Sorry if I'm rambling over a bunch of places you've already covered! Character encoding always seems to affect me... (e.g. the fact I can't use a proper ellipsis character in this comment box!)

-Dom

Comments: 8

  Comment: Effective Java (Score 1) on 2009.09.10 11:27

by Dom2 on 2009.09.10 11:27 (#70542)
Attached to: Java Tutorial Fail
I think you'll find that this is suitably frowned upon in Josh Bloch's Effective Java. If you don't have a copy around, I'd really recommend picking one up if you're going to be coding Java at all.

You might also like his Java Puzzlers, which is entertaining if less essential reading.

And why not look into Scala whilst you're on the JVM? I find it very Perlish in many ways. :)

Comments: 5

  Comment: Re:I find SF too messy (Score 1) on 2009.08.17 12:38

by Dom2 on 2009.08.17 12:38 (#70112)
Attached to: Whither SourceForge?
I use both GitHub and Google Code for one of my projects. It works, and I like both, but if I were starting again today, I'd be a lot more tempted to settle on just GitHub, purely for the consistency of having a single site for users.
Comments: 14

  Comment: Re:Java does (almost) the same (Score 1) on 2009.07.10 3:57

by Dom2 on 2009.07.10 3:57 (#69408)
Attached to: Str and Buf -- I think I get it now
Very true about the 16 bit character. Thankfully, it's less of a problem for me right now.
Comments: 14

  Comment: I like zsh (Score 1) on 2009.07.09 13:53

by Dom2 on 2009.07.09 13:53 (#69403)
Attached to: Tricks for .profile
Uniqifying your path becomes:

  typeset -U PATH
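
For shells without zsh's typeset -U, a rough equivalent might look like the sketch below. It assumes awk and sed are available; dedupe_path is a made-up name, and only the first occurrence of each entry is kept.

```shell
# Sketch: drop duplicate entries from a colon-separated path list,
# keeping the first occurrence of each. dedupe_path is a hypothetical name.
dedupe_path() {
    # Split on ":" so each directory is one awk record, print a record only
    # the first time it is seen, then strip the trailing ":" awk leaves behind.
    printf '%s' "$1" | awk -v RS=: -v ORS=: '!seen[$0]++' | sed 's/:$//'
}

dedupe_path "/usr/bin:/bin:/usr/bin:/usr/local/bin:/bin"
# prints /usr/bin:/bin:/usr/local/bin
```

In a .profile this would be used as PATH=$(dedupe_path "$PATH").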

Comments: 1

  Comment: Re:Java does (almost) the same (Score 1) on 2009.07.09 13:47

by Dom2 on 2009.07.09 13:47 (#69402)
Attached to: Str and Buf -- I think I get it now
There are (at least) two things wrong with Java's encoding support:
  1. There's no way to avoid handling UnsupportedEncodingException, even for UTF-8, which is guaranteed to be present.
  2. The concept of a "system default encoding" is flawed and leads to portability bugs. You should be forced to always specify an encoding.
Comments: 14

  Comment: Re:Ooops (Score 1) on 2009.07.02 15:54

by Dom2 on 2009.07.02 15:54 (#69229)
Attached to: Guess Who Loses: Test::More::subtest versus Test::XML
Sorry for the delay, but Test::XML 0.08 is now up, which fixes this issue. If you have any further issues, please let me know!
Comments: 4

  Comment: Ooops (Score 1) on 2009.06.30 1:33

I plead guilty. It was a severe case of cargo-cult coding at the time. I'll try and roll a new release for you, as soon as I can.
Comments: 4

  Comment: Download? (Score 1) on 2009.05.02 2:15

by Dom2 on 2009.05.02 2:15 (#68417)
Attached to: Bad download archive hall of shame: Quartz
Funny: because I use Maven, I rarely notice what downloads are like these days. You might prefer Ivy.

But you're absolutely right, having no top-level directory is a pain.

Comments: 1