All the Perl that's Practical to Extract and Report

Friday July 02, 2004
07:00 PM

Extracting From: addresses

[ #19659 ]

Because I am too stupid to pull this out of my shell history or make an alias for it, I post the one-liner I use to extract From addresses from email that my spam blocker incorrectly flags as spam. Those addresses end up in my white list.

grep ^From: fix | perl -pe 's/From:\s+//; s/.*<(.*@.*)>.*/$1/' | sort | uniq >> ~/.procmail/goodfile

I suppose that is rather unixy, although those other programs have Perl implementations somewhere.

Now I'll probably forget how to find this post, too.

  • No need for grep when you're using Perl, nor for substitutions when you're not using sed.
    perl -lne '/From:\s+<(.*@.*)>/ and print $1' | sort -u >> ~/.procmail/goodfile
    • perl -lne '/From:\s+<(.*@.*)>/ and $u{$1}++; END { print for sort keys %u}' inputfile
      --
      • Randal L. Schwartz
      • Stonehenge
    • You can't do the substitution in one step because the pattern does not always match, and even the pattern that you use will mostly fail rather than mostly succeed.

      The address lines look like:

      From: "Fred Flinstone" <fred@example.com>
      From: barney@example.com

      I may not need the grep, but it sure makes things easier to figure out when there is a problem. When I try to do such things in one big step, I usually find that I miss something special about the input then have to waste a lot of time figuring o

      • Ah. Well, you can still do something like
        perl -lne 's/^From:\s+// or next; print /<(.*@.*)>/ ? $1 : $_' | sort -u >> ~/.procmail/goodfile
        That keeps the "conceptual grep" separate from the matching, also. :)
        • This one doesn't work either, unless I type the email into standard input.

          Not to be mean, but there really are good reasons for not trying to be too clever. My kludgey-looking one-liner works, is easy to understand, and is even faster. Remember to test any code that you post, because once it gets online, it lives on forever. :)

          Just for giggles, I timed these against a file of over 1,000 emails. I added the filename as a command line argument after your perl invocation, and on average, it took about 0.25
          • In the cases where speed is a concern, this is a very sed-able task. Jibing about the lack of filename was superfluous, btw.
            • I'm sorry you think I was "jibing", but not everyone is going to realize why your one-liner just sits there appearing to do nothing. Leaving out the input is not a minor detail. :)
              • Yes, I realized that the jibe wasn't entirely undeserved after I posted that. I should have been consistent and left out the output redirection as well. Whether the data source and destination are relevant is a matter of view, but specifying one and not the other makes no sense.
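
For anyone curious what the sed route mentioned in the thread might look like, here is a minimal sketch. It was not posted by anyone above, and it assumes only the two line shapes shown earlier: a bare address, or a display name followed by the address in angle brackets. The function name `from_addrs_sed` is hypothetical.

```shell
# Hypothetical sed-only take on the same extraction (not from the thread).
# Usage: from_addrs_sed FILE...
from_addrs_sed() {
    # -n suppresses default printing; only From: header lines reach the p command.
    sed -n '/^From:/{
        s/^From:[[:space:]]*//
        s/.*<\(.*@.*\)>.*/\1/
        p
    }' "$@" | sort -u
}
```

As in the original one-liner, the second substitution simply does nothing on lines without angle brackets, so bare addresses pass through untouched. Something like `from_addrs_sed fix >> ~/.procmail/goodfile` would name both the input and the output.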