All the Perl that's Practical to Extract and Report

use Perl

Saturday July 03, 2004
02:18 PM

Extracting email addresses, again

[ #19667 ]

I figured there must be an easier way to extract email addresses than sorting through a file full of emails. The problem is more process than technology. I want to be involved in the process as little as possible, but without letting a program clobber a bunch of data.

Since I use PINE, I went looking for some way to pipe messages to external programs, and indeed it has one. I needed to enable the enable-unix-pipe-cmd feature, and then with the | key I can send the entire message to whatever program I like.

I created this little program I named "f" to extract the From address and store it in a file. I'm not ready to let it directly add the address to my white list, so for now I leave the addresses in a different file.

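#!/usr/bin/perl
use strict;
use warnings;

# Read the piped message and stop at the first From: header
while( <> )
    {
    next unless /^From:/;
    chomp;
    # Drop the header name, then keep only the part inside <...>, if any
    s/^From:\s+//; s/.*<(.*@.*)>.*/$1/;
    # Append the address to a holding file rather than the white list
    if( open my( $fh ), '>>', "$ENV{HOME}/mail/fix-addresses" )
        {
        print $fh "$_\n";
        close $fh;
        }
    else
        {
        print "$_: Problem: $!\n";
        }
    last;
    }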

I named the program "f" so I wouldn't have to type a lot when I have to tell PINE which program to use. PINE does remember the last program I specified, but once I quit, it forgets it. Oh well.

Now I don't have to set aside the falsely tagged spam and deal with it later. I can extract the From address right away, then add it to the white list later. If I wanted to get fancy, there would be a database somewhere in all of this, but I have real work to do.
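
To try the extraction step without going through PINE, the two substitutions from "f" can be wrapped in a subroutine and run against a few sample headers. This is only a sketch: the name extract_from and the example.com addresses are made up for illustration and are not part of the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): returns the address
# from a From: header line, or undef for any other line.
sub extract_from
    {
    my( $line ) = @_;
    return unless $line =~ /^From:/;
    chomp $line;
    $line =~ s/^From:\s+//;          # drop the header name
    $line =~ s/.*<(.*@.*)>.*/$1/;    # keep only what is inside <...>, if present
    return $line;
    }

# Made-up sample headers
my @headers = (
    "From: Joe Example <joe\@example.com>\n",
    "From: joe\@example.com\n",
    "Subject: not a From line\n",
    );

foreach my $header ( @headers )
    {
    my $address = extract_from( $header );
    print defined $address ? "$address\n" : "(skipped)\n";
    }
```

A header with no angle brackets passes through the second substitution unchanged, so a bare address still comes out intact.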

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • The following bit still irks me :)
    s/.*<(.*@.*)>.*/$1/;
    It's one of the things that constantly annoys me, like a pebble in the shoe, when I write sed scripts. What I believe it really should be phrased as is
    $_ = $1 if /<(.*@.*)>/;
    • "Should" is strong phrasing. If that's the way you like to do things in sed, that's fine, but around these parts there is more than one way to do it.

      But this is open source, so if you decide to use the script, you can change it to anything that you like. :)
      • What I'm saying is that I can't do it this way in sed, so I'm forced to repeat myself: "find anything followed by the bit I want followed by anything and replace it by the bit I want". At times, I've pined for a crop() function (in sed more so than in Perl, of course, but the Perl verbiage can get old as well).
  • I use a program called the "little brother database" (lbdb in Debian) to handle this for me.
  • To get round 'Pine' only remembering one pipe command, and forgetting that when you quit, you can use its print functionality instead.

    The definition of a printer in 'Pine' can just be a command to pipe stuff to; obviously you're supposed to put commands like lpr in there, but there's no reason why you can't set up anything else as a printer.

    When I was a 'Pine' user I had gvim as my default printer.

    Smylers

    • Ah, very cool indeed. Thanks :)

      I never thought of that because my PINE machine does not have a printer that I can use, so I never bothered to look into that.
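
For anyone weighing the two idioms from the first comment thread, the sketch below runs both on the same input. They agree when the line holds one bracketed address, but they can differ when it holds more than one, because the greedy .* backtracks from a different end in each form. The subroutine names, the sample strings, and the tighter character-class pattern are assumptions added for illustration; none of them come from the post or the comments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Substitution form used in the post
sub by_substitution
    {
    my( $s ) = @_;
    $s =~ s/.*<(.*@.*)>.*/$1/;
    return $s;
    }

# Match-and-assign form suggested in the comment
sub by_match
    {
    my( $s ) = @_;
    $s = $1 if $s =~ /<(.*@.*)>/;
    return $s;
    }

# Assumed tighter variant: no brackets or whitespace inside the capture
sub by_strict_match
    {
    my( $s ) = @_;
    $s = $1 if $s =~ /<([^<>\s]+\@[^<>\s]+)>/;
    return $s;
    }

# Made-up inputs: one bracketed address, then two
my @inputs = (
    'Joe <joe@example.com>',
    'Joe <joe@example.com> <old@example.com>',
    );

foreach my $input ( @inputs )
    {
    print "input:        $input\n";
    print "substitution: ", by_substitution( $input ), "\n";
    print "match:        ", by_match( $input ), "\n";
    print "strict match: ", by_strict_match( $input ), "\n\n";
    }
```

On the second input the substitution form keeps the last bracketed address, the plain match form captures everything from the first address through the second, and the character-class form keeps the first address only.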