NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • I was actually having a conversation yesterday about various mail handling modules. I'm curious what support Mail::Box has or needs with respect to creating and parsing non-ASCII email headers and bodies. I have some modules I did for $company which parse and create non-ASCII headers, bodies and attachment names, and then convert between encodings (they were used for a web-based email system, so emails coming in as ISO-2022-JP could be displayed as UTF-8, for example).
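The kind of conversion described above (ISO-2022-JP in, UTF-8 out) can be sketched with the core Encode module. This is only a minimal illustration, not the $company modules mentioned in the comment: the helper name `convert_charset` and the sample text are assumptions made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not from any existing mail module): re-encode a byte
# string from one charset to another via Perl's internal character strings.
sub convert_charset {
    my ($bytes, $from, $to) = @_;
    # Croak on malformed input instead of silently substituting characters;
    # LEAVE_SRC keeps Encode from modifying the source string in place.
    my $check = Encode::FB_CROAK | Encode::LEAVE_SRC;
    my $chars = decode($from, $bytes, $check);    # bytes -> characters
    return encode($to, $chars, $check);           # characters -> bytes
}

# Simulate an incoming ISO-2022-JP body (five hiragana characters).
my $text     = "\x{3053}\x{3093}\x{306B}\x{3061}\x{306F}";
my $incoming = encode('iso-2022-jp', $text);

# Convert it to UTF-8 bytes for display in a web page.
my $utf8 = convert_charset($incoming, 'iso-2022-jp', 'UTF-8');
```

Using FB_CROAK is a design choice for the sketch; a webmail system would more likely want a fallback that substitutes a replacement character so one bad byte does not make the whole message unreadable.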

    Anyways, I was looking for a place whe

    • There's code in SpamAssassin3 that I wrote that does this, if Mail::Box doesn't.

      As far as SpamAssassin (and other similar projects I work on) is concerned, one thing we need is access to both the parsed and the unparsed headers and body, because spam (and virus) detection requires that sort of thing.
      • Character decoding and encoding of header fields has long been on the wishlist, and I may actually do some work on that this week. However, it has quite some implications, especially for the way you specify which encoding you want to be used... Besides, I waited for Perl to settle a bit on Unicode.

        So: decoding is simple to integrate, encoding is much harder. Next to the existing folded() and unfolded() versions of the field's data, you will also see a separate decoded() method. So you will be able to get the data from the field in any way you like.

        • However, it has quite some implications, especially the way you specify which encoding you want to be used...

          I don't understand what you mean by this. The email specifies what encoding it uses, and all you have to do is decode to Unicode (UTF-8 in Perl terms). Or if you want to write an email, you can simply specify it as a recognised encoding format. If the encoding format isn't recognised there's little you can do. It's also pretty easy to determine if you should encode something. If it's got ch
        • It is really easy to get from

          Subject: =?iso-8859-1?Q?Re:_Re:=5FContact=5Fde=5FJames=5Fou=5Fde=5FPhilip=5Fun=5Fa?=
           =?iso-8859-1?Q?mericain=5F=E0=5Fparis=5Fpour=5Fles=5Fintimes....=5F=5F__?=

          to a Unicode string, but it is quite hard to do the reverse, where the type of encoding is chosen automatically but is also configurable, as is the character set that is used.
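The asymmetry described above can be seen with the core Encode module, which ships `MIME-Header` and `MIME-Q` encodings. This is a rough sketch, not Mail::Box code: the two-part header below is a shortened, made-up example in the same style as the Subject shown above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# A folded header value made of two adjacent encoded-words (illustrative
# sample, not the header quoted in the thread).
my $raw = join "\r\n ",
    '=?iso-8859-1?Q?un_am?=',
    '=?iso-8859-1?Q?=E9ricain_=E0_Paris?=';

# Decoding is the easy direction: each encoded-word names its own charset
# and transfer encoding, and whitespace between adjacent encoded-words is
# dropped (RFC 2047), so the two parts join into one string.
my $subject = decode('MIME-Header', $raw);

# Encoding is where the choices appear. Encode's built-in encodings always
# emit UTF-8: 'MIME-Q' uses quoted-printable style words, 'MIME-Header'
# uses Base64. There is no parameter here to request iso-8859-1 instead.
my $as_q = encode('MIME-Q',      $subject);
my $as_b = encode('MIME-Header', $subject);
```

A general-purpose module has to let the caller pick the character set and the Q or B form, or pick sensible defaults per field, which is the configurability problem the comment refers to.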

          For your own program, it is easy to implement, but a general CPAN module has quite some extra needs... certainly i