NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

  • by Juerd (1796) on 2007.03.23 21:10 (#53952)
    What a coincidence. I was planning on writing exactly that, this weekend, inspired by Mutt's send_charset option.

    I was going to name it Encode::First and duplicate Encode's encode interface, except that it would take a colon-separated (or perhaps comma-separated) list of encodings and use the first one that supports every codepoint in the string. It would return a two-element list: the encoding and the byte string.

    Typical usage would be:

            my ($enc, $buf) = encode_first('us-ascii:iso-8859-1:iso-8859-15:utf-8', $string);

    This would encode "2.5" as ascii, "2½" as latin1, "€ 2,50" as latin9, and "€ 2½" as utf-8.
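A minimal sketch of how encode_first might behave, assuming it simply tries each encoding in order with Encode's FB_CROAK check and keeps the first one that does not die. The name and list syntax come from the comment above; the implementation itself is hypothetical, and it accepts either colons or commas as separators since the comment leaves that open.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical encode_first(): return (encoding, bytes) for the first
# encoding in the list that can represent every character of $string.
sub encode_first {
    my ($list, $string) = @_;
    for my $enc (split /\s*[:,]\s*/, $list) {
        # FB_CROAK makes encode() die on the first unmappable character;
        # LEAVE_SRC stops encode() from modifying its input in place.
        # An unknown encoding name also dies and is therefore skipped.
        my $bytes = eval {
            Encode::encode($enc, $string, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        };
        return ($enc, $bytes) if defined $bytes;
    }
    return;    # nothing in the list could represent $string
}

binmode STDOUT, ':encoding(UTF-8)';
my $list = 'us-ascii:iso-8859-1:iso-8859-15:utf-8';
# "2.5", "2 1/2" (U+00BD), "EURO 2,50" (U+20AC), "EURO 2 1/2"
for my $s ("2.5", "2\x{BD}", "\x{20AC} 2,50", "\x{20AC} 2\x{BD}") {
    my ($enc) = encode_first($list, $s);
    print "$s => $enc\n";    # us-ascii, iso-8859-1, iso-8859-15, utf-8
}
```

Latin-9 replaced ½ with œ, so "€ 2½" fits neither Latin-1 nor Latin-9 and falls through to UTF-8.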

    I was also considering optimizing "iso-8859-1:utf8" by simply calling utf8::downgrade with FAIL_OK (ignoring the return value) and then examining the UTF8 flag. This would be the default when the encoding list is empty or undef.
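A rough sketch of that "iso-8859-1:utf8" shortcut, assuming a hypothetical helper named latin1_or_utf8. It relies only on the core utf8:: functions: if downgrading succeeds, every character is below U+0100 and the buffer is already Latin-1 bytes; if the UTF8 flag survives, it falls back to UTF-8.

```perl
use strict;
use warnings;

# Hypothetical fast path for the "iso-8859-1:utf8" case.
sub latin1_or_utf8 {
    my ($string) = @_;
    my $buf = $string;          # work on a copy; the caller's string is untouched
    utf8::downgrade($buf, 1);   # FAIL_OK: on failure return false instead of dying
    if (utf8::is_utf8($buf)) {
        # Still flagged: some character is above U+00FF, so use UTF-8.
        utf8::encode($buf);     # in place: characters -> UTF-8 bytes
        return ('utf-8', $buf);
    }
    # Every character fits in one byte, so $buf is already ISO-8859-1.
    return ('iso-8859-1', $buf);
}
```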

    I'd be delighted if you would use this interface and module name; it would save me some trouble, while giving me exactly what I've been wanting all week. :)
    • Oh yeah, I like that interface. Maybe I'll suggest a utility function that takes the string and an array reference of encodings and returns the best encoding, and also provide an encode()-compatible function just as you described. Thanks!
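One way the suggested utility might look, as a sketch only: best_encoding is a hypothetical name, and it assumes the check is done through Encode::find_encoding objects with FB_CROAK. It returns just the encoding name and leaves the actual encoding to the caller.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: given a string and a reference to an ordered list of
# encoding names, return the first name that can represent every character,
# or undef if none can.
sub best_encoding {
    my ($string, $encodings) = @_;
    for my $name (@$encodings) {
        my $codec = Encode::find_encoding($name)
            or next;    # skip names Encode does not recognise
        my $ok = eval {
            # Dies on the first unmappable character; LEAVE_SRC keeps
            # $string intact.
            $codec->encode($string, Encode::FB_CROAK() | Encode::LEAVE_SRC());
            1;
        };
        return $name if $ok;
    }
    return;
}

my $enc = best_encoding("2.5", [qw(us-ascii iso-8859-1 utf-8)]);
```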