
All the Perl that's Practical to Extract and Report


Matts (1087)


I work for MessageLabs [messagelabs.com] in Toronto, ON, Canada. I write spam filters, MTA software, high-performance network software, string matching algorithms, and other cool stuff, mostly in Perl and C.

Journal of Matts (1087)

Friday September 12, 2003
05:15 PM

Parsing emails

[ #14679 ]

There are many, many modules on CPAN to parse emails. So of course I wrote my own.

I've had this email parser written for a while now, and while it deals nicely with RFC-compliant mails, I've wanted for a long time to add support for all those annoying RFC deviations: binary headers, binary body parts, etc.

So this week I set up a test suite, bumped the perl requirement up to 5.8, and started filling the suite with lots of problematic emails. Once I'd got the tests together, adding that support became fairly easy (though not very easy!). So now I have a mail parser that nicely handles binary headers and decodes them to UTF-8 (as well as doing the same for MIME-encoded headers). Unfortunately, I can't reveal exactly how it's done, as it's proprietary technology (it involves a lot of heuristics, and a lot of evil test emails that we see). But it's been an interesting project.
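
For anyone wondering what "decode binary headers to UTF-8" can look like in general, here is a minimal sketch. It is not the parser described above, whose heuristics stay proprietary. It is written in Python, uses only the standard library, and assumes a very crude rule: unlabelled bytes are treated as UTF-8 if they decode cleanly and as Latin-1 otherwise. The names `guess_decode` and `header_to_text` are made up for this illustration.

```python
from email.errors import HeaderParseError
from email.header import decode_header


def guess_decode(raw: bytes) -> str:
    """Assumed heuristic: strict UTF-8 first, then Latin-1 (which never fails)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


def header_to_text(raw: bytes) -> str:
    """Turn a raw header value into text, tolerating 8-bit and odd charsets."""
    # Latin-1 maps every byte to one character, so this step loses nothing
    # and lets the stdlib look for RFC 2047 encoded-words in the value.
    as_latin1 = raw.decode("latin-1")
    try:
        chunks = decode_header(as_latin1)
    except HeaderParseError:
        # Broken base64 inside an encoded-word: treat the whole value as raw.
        return guess_decode(raw)

    out = []
    for chunk, charset in chunks:
        if isinstance(chunk, str):
            # No encoded-words were found; recover the original bytes.
            chunk = chunk.encode("latin-1")
        if charset is None:
            # Unlabelled bytes (the "binary header" case): guess.
            out.append(guess_decode(chunk))
            continue
        try:
            out.append(chunk.decode(charset, errors="replace"))
        except (LookupError, ValueError):
            # The declared charset is unknown to Python: guess instead.
            out.append(guess_decode(chunk))
    return "".join(out)
```

Calling `header_to_text(b"caf\xe9")` goes down the fallback path and yields "café", and the same text arrives from `header_to_text(b"=?iso-8859-1?q?caf=E9?=")` via the declared charset. A real-world parser would need far more than a two-step guess, which is presumably where the heuristics and the evil test emails come in.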

Interestingly, this system now displays more broken emails the way the original author intended than any email client I have access to. Perl++!

  • Are you going to release this magic beast?
    • Re:Oooh... (Score:3, Insightful)

      Unfortunately I don't think so. I can ask about it, but the heuristics are probably considered IP.

      The other tragic thing is that the API sucks. I wish I could start again with a sane API, but so many applications now use the code that I think I'd have to start again with a new module name.
      • Simon's Email::MIME never really got much past the "Design the first part of an API" stage. Perhaps that would be the right vehicle to usurp and make go.