Journal of miyagawa (1653)

Friday March 23, 2007
07:37 PM

Module name wanted

[ #32781 ]

I want a name for my new module, which automatically detects the best conservative encoding to use in email messages, based on the strings themselves.

It'll be useful for encoding an email message in iso-2022-jp if all the content is in Japanese, in iso-2022-kr if it's Korean, and so on. Gmail does this by default: http://mail.google.com/support/bin/answer.py?ctx=gmail&hl=en&answer=22841

I'm thinking of Encode::Email::Best or Encode::Mail::Traditional. Any suggestions?

  • He looks after Email:: these days, and probably has the best idea of where it would fit.
    • Well, I was thinking about the Email:: namespace at first, but the actual code wouldn't do anything specific to email messages.

      It tries to encode the message with a fixed set of encodings, ordered from narrow to wide, and checks whether all characters are safely encoded, using Encode:: and possibly Dan's Encode::InCharset.

      Anyway I'll think about it more.
  • What a coincidence. I was planning on writing exactly that, this weekend, inspired by Mutt's send_charset option.

    I was going to name it Encode::First, and duplicate Encode's encode interface, but with a colon (or perhaps comma) separated list of encodings, of which the first that supports all codepoints will be used. It would return a two-element list: encoding and byte string.

    Typical usage would be:

            my ($enc, $buf) = encode_first('us-ascii:iso-8859-1:iso-8859-15:utf-8', $string);
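The interface described above could be sketched roughly as follows. This is only a minimal illustration, not the code of any released module: the `encode_first` name and the two-element return value come from the comment, while the body (a plain loop over `Encode::encode` with `FB_CROAK` as the CHECK argument) is an assumption about how it might work.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical sketch of encode_first(): try each encoding in the given
# order and return the first one that can represent every character.
sub encode_first {
    my ($encodings, $string) = @_;

    # Accept a colon- or comma-separated list, as suggested in the comment.
    for my $name (split /[:,]/, $encodings) {
        # encode() may modify its source when CHECK is set, so use a copy.
        my $copy  = $string;
        my $bytes = eval { Encode::encode($name, $copy, Encode::FB_CROAK()) };
        return ($name, $bytes) if defined $bytes;
    }
    return;    # empty list: none of the candidates could encode the string
}

# "caf\x{e9}" contains U+00E9, which us-ascii cannot represent.
my ($enc, $buf) = encode_first('us-ascii:iso-8859-1:iso-8859-15:utf-8', "caf\x{e9}");
```

Returning the encoding name together with the bytes lets the caller keep the name for later use, or ignore the bytes entirely.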
    • Oh yeah, I like that interface. Maybe I'll offer a utility function that takes the string and an array reference of encodings and returns the best encoding, and also provide an encode()-compatible function just as you described. Thanks!
  • It seems to me that email is just what you want to use the module for. I don’t see how the module’s operation actually has anything whatsoever to do with email. “Best” doesn’t really say anything; maybe Encode::MinCharsetPicker?

    (Btw, I’d have the module only suggest the minimal applicable charset, but not actually do the encoding itself (or only if you ask for it by way of a convenience function). Probably the main function should simply take a list of encodings and the

    • As I said in the other replies, the actual code doesn't have anything to do with email, except that the default "list of encodings known to be safe in emails" is specific to email (which is the point of this module).

      I'd probably make two functions: one that is compatible with encode() (and does the encoding itself), and another like detect_best_encoding(), which returns the name of the encoding but doesn't do the encoding itself.
      • The easiest way to detect the "best" encoding would be to just encode it, with a CHECK argument to make it fail if impossible. Why create a utility function to throw away the encoded string, if the user can easily choose to do so himself?

        my ($enc) = encode_first(...);

        Or, have you found another efficient way of finding a suitable encoding?
        • Yes, I was thinking of the exact same logic, as well as using charset tables like the ones used in Encode::InCharset. I prefer the easiest approach, even if it's not the most efficient, so I guess that'll be the same as what you described.

          The reason we want the encoding name back is that we'd like to use it in the email header. If we return only the encoded string, the caller doesn't know which encoding it's actually encoded in.
        • The reason I suggested that sort of interface is that some APIs expect to receive character strings that they will then encode themselves; XML serialisers come to mind. In such a case, giving the caller an encoded string is pretty useless.

      • Oh, and a list of several email-safe country-specific encodings is of course more common than latin1:utf8, and would make a better default.
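To make the detection-only variant from this sub-thread concrete, here is a rough sketch under stated assumptions. `detect_best_encoding` is the name floated above; the `@EMAIL_SAFE` default list and the way the result is placed in a Content-Type header are purely illustrative and not taken from any actual module.

```perl
use strict;
use warnings;
use Encode ();

# Assumed default list, ordered narrow to wide. The thread does not settle
# on one; a real module would likely ship a longer, carefully chosen list.
my @EMAIL_SAFE = qw(us-ascii iso-8859-1 iso-2022-jp iso-2022-kr utf-8);

# Hypothetical helper: return only the name of the first encoding that can
# represent the whole string, discarding the encoded bytes.
sub detect_best_encoding {
    my ($string, $candidates) = @_;
    for my $name (@{ $candidates || \@EMAIL_SAFE }) {
        my $copy = $string;    # encode() may modify its source under CHECK
        return $name
            if eval { Encode::encode($name, $copy, Encode::FB_CROAK()); 1 };
    }
    return;
}

# Five hiragana characters ("konnichiwa").
my $body    = "\x{3053}\x{3093}\x{306b}\x{3061}\x{306f}";
my $charset = detect_best_encoding($body);

# The caller needs the name so it can label the message accordingly.
my $header = "Content-Type: text/plain; charset=$charset";
```

A caller whose API encodes character strings itself, such as an XML serialiser, could use just the returned name and pass the original string along unencoded.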
  • I don't really have a good name for your module, but I'm going to put an image in your head, the image I see when I read about your intentions and cut out all the fluff.

    What you appear to want is the smallest possible character set that contains every single character in your text. That seems related to finding the minimum-size geometric shape that contains every vertex in a set. Terms that spring to mind are minimal enclosing circle [google.com] or rectangle [google.com]; the latter is also