
All the Perl that's Practical to Extract and Report

  • Seriously folks, Java is a nice language and all, but why not use the right tool for the job? As demonstrated most effectively by Professor Hoffman, it's quite cumbersome to parse text files using Java. Now with Perl, you can do something like this:

    perl -ne "print unless /^192\.(9|18|29)\./o||/\.(gif|jpg|css|GIF|JPG|CSS) HTTP/o" < access-log > clean-log

    Heck, you could even make it your .sig. Not to mention that it runs faster than the Java version. The regex solution is even a tad bit faster tha
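For readers less used to `-n`, here is a rough sketch of the same filter written out as a script. The sub name `keep_line` and the three sample log lines are made up for illustration. The real one-liner reads the access log from standard input instead of an array.

```perl
use strict;
use warnings;

# Hypothetical helper: true for lines the one-liner would print.
sub keep_line {
    my ($line) = @_;
    return 0 if $line =~ /^192\.(9|18|29)\./;                 # client address in a filtered range
    return 0 if $line =~ /\.(gif|jpg|css|GIF|JPG|CSS) HTTP/;  # request for an image or stylesheet
    return 1;
}

# Made-up sample lines, not taken from the log discussed in this thread.
my @sample = (
    '192.9.1.1 - - [16/Sep/2002:00:00:01 -0700] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.7 - - [16/Sep/2002:00:00:02 -0700] "GET /logo.gif HTTP/1.0" 200 512',
    '10.0.0.7 - - [16/Sep/2002:00:00:03 -0700] "GET /index.html HTTP/1.0" 200 1043',
);

my @kept = grep { keep_line($_) } @sample;
print "$_\n" for @kept;   # only the third sample line survives
```

With `-n`, Perl wraps the code given to `-e` in a loop over the input lines, which is why the one-liner needs no explicit loop.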

    • I sent John the one-liner and he was nice enough to test it himself. Results follow (emphasis mine)...

      From: John Hoffmann
      Date: Tue Sep 17 15:32:30 2002 (PDT)
      To: Brad Choate
      Subject: Re: Java vs. Perl

      Brad,

      Thanks, you were the second person to write, but the first guy couldn't offer an
      optimization. Just ran your one liner on 578 Meg file and it took half the time
      of the java.

      %timex perl -ne "print unless /^192\.(9|18|29)\./o || /\.(gif|jpg|css|GIF|JPG|CSS) HTTP/o" < developer.20020916.raw > develo
      • Those two o's above, as in /.../o, seem quite useless because the regexes are constant. Is there a reason for them?

        --
        /-\
        • I guess not :) Silly me, I thought /o always helped when using the same regex pattern in a loop such as this. And I hadn't thought about specifying the non-capturing syntax, also suggested in this thread. The final result:

          perl -ne "print unless /^192\.(?:9|18|29)\./||/\.(?:gif|jpg|css|GIF|JPG|CSS) HTTP/" < input > output

          The fastest of all so far... any other improvements?
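A small sketch of why /o does nothing for the constant patterns above: the flag only changes behaviour when the pattern interpolates a variable, in which case the pattern is built once and later changes to the variable are ignored. The file name and extensions here are made up for illustration.

```perl
use strict;
use warnings;

my (@plain, @once);
for my $ext (qw(gif jpg)) {
    # Without /o the pattern is rebuilt whenever $ext changes.
    push @plain, ('logo.jpg' =~ /\.$ext\z/  ? 1 : 0);
    # With /o the pattern is frozen at its first value (gif).
    push @once,  ('logo.jpg' =~ /\.$ext\z/o ? 1 : 0);
}

print "plain:   @plain\n";   # plain:   0 1
print "with /o: @once\n";    # with /o: 0 0
```

When a pattern contains no variables, as in the one-liners in this thread, there is nothing to freeze, so dropping /o should not change what they match.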

          • Try the following: if your log is much more likely to contain gif, jpg, or css requests than lines from local addresses, switch the regexps around:

            perl -ne 'print unless /\.(?:gif|jpg|css|GIF|JPG|CSS) HTTP/||/^192\.(?:9|18|29)\./;' input > output
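The reasoning behind swapping the patterns is that `||` short-circuits: the second regex is only tried when the first one fails. A rough sketch that counts regex attempts for both orderings follows. The sample lines and the sub name `count_tests` are made up, and the count is only a stand-in for real timing, since a failing anchored match like `^192\.` is very cheap.

```perl
use strict;
use warnings;

my $local = qr/^192\.(?:9|18|29)\./;
my $asset = qr/\.(?:gif|jpg|css|GIF|JPG|CSS) HTTP/;

# Made-up sample: three asset requests, one local request, one line to keep.
my @lines = (
    '10.0.0.7 "GET /a.gif HTTP/1.0"',
    '10.0.0.7 "GET /b.jpg HTTP/1.0"',
    '10.0.0.8 "GET /c.css HTTP/1.0"',
    '192.18.0.5 "GET /index.html HTTP/1.0"',
    '10.0.0.9 "GET /index.html HTTP/1.0"',
);

# Hypothetical helper: number of regex attempts "first || second" would make.
# One attempt if the first pattern matches, two if it does not.
sub count_tests {
    my ($first, @input) = @_;
    my $tests = 0;
    $tests += ($_ =~ $first) ? 1 : 2 for @input;
    return $tests;
}

my $local_first = count_tests($local, @lines);
my $asset_first = count_tests($asset, @lines);
print "local first: $local_first tests\n";   # local first: 9 tests
print "asset first: $asset_first tests\n";   # asset first: 7 tests
```

Whether the reordering actually pays off depends on the mix of lines in the real log, so it is worth timing both forms on the actual file.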