
rjbs (4671)

AOL IM: RicardoJBSignes
Yahoo! ID: RicardoSignes

I'm a Perl coder living in Bethlehem, PA and working in Philadelphia. I'm a philosopher and theologian by training, but I was shocked to learn upon my graduation that these skills don't have many associated careers. Now I write code.

Journal of rjbs (4671)

Tuesday November 27, 2007
06:44 PM

email::folder woes (part n)

[ #34991 ]

I mumbled something about Email::Folder hating me, today, but I was too busy to explain, and I promised that I'd write down my annoyances later. I'd love to fix these problems soon, but for now it's easier to just grumble about them, and it will make me feel better.

To print all threads in a maildir, very naively, I might write something like this:

use Email::Folder;

my $maildir = Email::Folder->new('./Maildir/');

while (my $email = $maildir->next_message) {
  my $subject = $email->header('subject');
  next if $subject =~ /^re:/i;
  print "$subject\n";
}

Great! There are all the non-reply subjects, more or less. They're not in order, though, and I want to see them in order. Email::Folder's iterator is not ordered, and there is no uniform way to request that it be ordered. To get messages in order, we'll need to get them all and then sort. That's not such a bad obstacle, really.

my $maildir = Email::Folder->new('./Maildir/');

# the sort isn't interesting
my @emails = sort { ... } $maildir->messages;

for my $email (@emails) {
  my $subject = $email->header('subject');
  next if $subject =~ /^re:/i;
  print "$subject\n";
}

The problem here is that we've now loaded every email at once. They're loaded as Email::Simple objects, which means the entire content of every message is held in memory, so if I had a huge maildir, I now have a huge perl process.
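If all the listing needs is a couple of headers, one workaround is to keep just those strings while iterating and let each message object go. Here is a minimal sketch, assuming the folder is anything with a next_message method whose results answer header(), as Email::Folder's do. The name thread_starters is a hypothetical helper of my own, not part of Email::Folder. It only avoids holding every message at once; each message is still fully loaded while it's being looked at.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Email::Folder. $folder is anything with a
# next_message method returning objects that respond to header().
sub thread_starters {
    my ($folder) = @_;
    my @starters;

    while (my $email = $folder->next_message) {
        my $subject = $email->header('subject');
        $subject = '(no subject)' unless defined $subject;
        next if $subject =~ /^re:/i;

        # Keep two short strings per message. The message object itself
        # goes out of scope at the end of each iteration.
        push @starters, {
            subject => $subject,
            date    => scalar $email->header('date'),
        };
    }

    return @starters;
}
```

Called as thread_starters(Email::Folder->new('./Maildir/')), it returns a list of small hashes that can then be sorted however you like, for instance on a parsed date.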

Email::Folder provides a bless_message method, which is used to create the Email::Simple objects. Each time the Email::Folder object's next_message method is called, the Email::Folder::Reader (subclassed for the storage medium) gets the message content from the underlying storage and returns it as a string. Email::Folder then passes it to bless_message, which by default passes it to Email::Simple. It's being passed around as a string, meaning that we're copying the full text of each (possibly huge) message a few times before returning the object and throwing away the raw string.

It would be easy to make the Maildir reader return filehandles, but bless_message also needs to be replaced to handle them. Then the problem is that if you try to do this:

my $folder = Email::Folder::MessagesFromFH->new('mbox');

...you will be hosed, because you will get an Email::Folder::Mbox, which reads messages out as strings. You need to either write a bless_message that handles strings and filehandles, or you need to override new to prevent anything that won't use the right reader.
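The "handles strings and filehandles" half of that could look something like the sketch below. It is core Perl only and is not wired into Email::Folder: read_headers is a hypothetical stand-in for whatever a replacement bless_message would call, and it assumes that only the top-level headers are wanted. A string gets wrapped in an in-memory filehandle, so both kinds of input go through the same loop, and reading stops at the first blank line so the body is never parsed.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a message as a string or an open filehandle and
# returns a hashref of top-level headers. Keys are lowercased header names,
# values are arrayrefs because a header may appear more than once.
sub read_headers {
    my ($source) = @_;

    my $fh;
    if (ref $source) {
        $fh = $source;                      # already a filehandle
    }
    else {
        # Plain string: wrap it in an in-memory filehandle.
        open $fh, '<', \$source or die "can't open in-memory handle: $!";
    }

    my (%header, $last);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        last if $line eq '';                # blank line ends the header block

        if ($line =~ /^[ \t]/ and defined $last) {
            # Continuation of a wrapped header: join it to the previous
            # value, collapsing the leading whitespace to one space.
            $line =~ s/^[ \t]+/ /;
            $header{$last}[-1] .= $line;
        }
        elsif ($line =~ /^([^:\s]+)\s*:\s*(.*)\z/) {
            $last = lc $1;
            push @{ $header{$last} }, $2;
        }
    }

    return \%header;
}
```

This still leaves MIME-encoded header values and date parsing untouched, and it does nothing to stop new from handing back a reader that only produces strings.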

All I wanted to do was implement a cooler version of frm!

Hopefully I will wake up fresh in the morning and feel energized to actually do something constructive, rather than just whine.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Other than the sorting, wouldn't this be pretty much trivial with a bit of grep? Something like:

            grep -hi '^Subject: ' $(find Maildir -type f) | grep -vi '^Subject: re:'

    Yeah, continuation lines would take a bit more, but mail's easy as long as you don't touch MIME.
    • My example was *radically* simplified for the point of demonstrating the headache, it was not the entire program I wanted to produce.
      • Sorry, I was just guessing from what you posted and your description of a "better 'frm'." But still, so long as you only care about headers, it seems like Email::Foo would be more trouble than it's worth.
        • You are wrong.

          I don't want to match header-like content in bodies, or in the headers of subparts. I need to match wrapped headers. I will need to decode MIME-encoded headers. I will need to parse RFC822 date fields.

          Email isn't simple.
          • Email isn't simple.

            That said, I just today had to install some code that used your helpful modules in order to make it more simple. Thanks a whole bunch. You make email easier.

            -- Douglas
            • Thanks! While I am more a maintainer than an author on many or most of the email modules under my name on the CPAN, knowing that they save people work is a nice motivator to keep doing my own work on them.
          • Just trying to make a helpful suggestion, not question your intelligence or piss in your oatmeal. Ah, well...
            • I'm sorry, I don't mean to come off crabby, but "Can't You Just?" is a common refrain around email programming, and basically always leads to horrible problems due to the mistaken belief that email is just some headers and maybe a body. I have grown bitter and grumpy whenever someone says to use something non-email-specific to do email stuff.

              Maybe this is my brain's way of telling me that I'm done with email and should move on to something that's always fun, like the web.
              • My bad. I've spent a bit of time with email (but obviously not as much as you have), which probably made me overconfident. Plus, for personal projects like this, I have a strong bias toward 80% or 90% solutions.

                Anyways, enjoy that always-fun web. (X|HT)ML seems just the thing to make email look simple.