
All the Perl that's Practical to Extract and Report

  • Seriously folks, Java is a nice language and all, but why not use the right tool for the job? As demonstrated most effectively by Professor Hoffmann, it's quite cumbersome to parse text files using Java. Now with Perl, you can do something like this:

    perl -ne "print unless /^192\.(9|18|29)\./o||/\.(gif|jpg|css|GIF|JPG|CSS) HTTP/o" < access-log > clean-log

    Heck, you could even make it your .sig. Not to mention that it runs faster than the Java version. The regex solution is even a tad bit faster tha

    • Hoffmann responds... (Score:2, Informative)

      by bschoate (202) on 2002.09.17 17:45 (#12974)
      I sent John the one-liner and he was nice enough to test it himself. Results follow (emphasis mine)...

      From: John Hoffmann
      Date: Tue Sep 17 15:32:30 2002 (PDT)
      To: Brad Choate
      Subject: Re: Java vs. Perl

      Brad,

      Thanks, you were the second person to write, but the first guy couldn't offer an
      optimization. Just ran your one liner on 578 Meg file and it took half the time
      of the java.

      %timex perl -ne "print unless /^192\.(9|18|29)\./o || /\.(gif|jpg|css|GIF|JPG|CSS) HTTP/o" < developer.20020916.raw > developer.20020916.perl

      real 52.51
      user 27.28
      sys 6.89

      %java LogParse /usr/netgenesis/logs/developer.20020916.raw /usr/netgenesis/logs/developer.20020916.tmp
      Processing Time: 107 Seconds

      File sizes matched perfectly.

      -rw-rw-r-- 1 hoffie 96187583 Sep 17 15:22 developer.20020916.perl
      -rw-rw-r-- 1 hoffie 584267226 Sep 17 15:05 developer.20020916.raw
      -rw-rw-r-- 1 hoffie 96187583 Sep 17 15:11 developer.20020916.tmp

      The java programmer who wrote the LogParse class wants to try JDK 1.4 with
      regular expressions and the new IO classes to see the result. I'll see what we
      can do to publish a round two of the optimized Perl and the new Java.

      -John
      • The java programmer who wrote the LogParse class wants to try JDK 1.4 with regular expressions and the new IO classes to see the result. I'll see what we can do to publish a round two of the optimized Perl and the new Java.
        Oddly enough, I can't see how that would make it go any faster than Java's non-regex solution. It seems like it would only lose ground!
        --
        • Randal L. Schwartz
        • Stonehenge
      • Those two o's above, as in /.../o, seem quite useless because the regexes are constant. Is there a reason for them?

        --
        /-\
        • I guess not :) Silly me, I thought /o always helped when using the same regex pattern in a loop such as this. And I hadn't thought about specifying the non-capturing syntax, also suggested in this thread. The final result:

          perl -ne "print unless /^192\.(?:9|18|29)\./||/\.(?:gif|jpg|css|GIF|JPG|CSS) HTTP/" < input > output

          The fastest of all so far... any other improvements?

          • Try the following... If you are much more likely to see gif, jpg, or css requests than lines from the 192.* addresses, switch the regexps around and try:

            perl -ne 'print unless /\.(?:gif|jpg|css|GIF|JPG|CSS) HTTP/||/^192\.(?:9|18|29)\./;' input > output
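For comparison, here is a rough sketch of the same filter using the java.util.regex package that arrived with JDK 1.4, with the static-asset test placed first as suggested above. This is not the LogParse class from the thread, whose source is not shown here; the class name LogFilterSketch and the method keep are hypothetical names chosen for illustration. Both patterns are compiled once, and Java's || short-circuits just as Perl's does, so the address pattern is only tried on lines the first pattern did not match.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.util.regex.Pattern;

// Hypothetical class; a sketch only, not the LogParse class from the thread.
class LogFilterSketch {
    // Same patterns as the Perl one-liner, compiled once up front.
    private static final Pattern STATIC_ASSET =
            Pattern.compile("\\.(?:gif|jpg|css|GIF|JPG|CSS) HTTP");
    private static final Pattern LOCAL_ADDRESS =
            Pattern.compile("^192\\.(?:9|18|29)\\.");

    // Returns true when the line should be kept, i.e. neither pattern matches.
    // The static-asset test runs first; || short-circuits, so the address
    // pattern is skipped for every line that is already rejected.
    static boolean keep(String line) {
        return !(STATIC_ASSET.matcher(line).find()
                || LOCAL_ADDRESS.matcher(line).find());
    }

    // Reads a log on standard input and writes the kept lines to standard output.
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        PrintWriter out = new PrintWriter(
                new BufferedWriter(new OutputStreamWriter(System.out)));
        String line;
        while ((line = in.readLine()) != null) {
            if (keep(line)) {
                out.println(line);
            }
        }
        out.flush();
    }
}
```

Assuming the file is saved as LogFilterSketch.java and compiled, it would be run the same way as the one-liner: java LogFilterSketch < input > output. Whether the reordering helps depends entirely on the mix of lines in the log, so it is worth timing both orders on real data.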

      • Wouldn't an immediate retraction be in order, showing the perl one-liner and java 100-liner side by side with corrected timings? (and a note that the benchmark was designed for the purposes of java advocacy).