Stories, comments, journals, and other submissions on use Perl; are Copyright 1998-2006, their respective owners.
Made my day (Score:1)
Seriously folks, Java is a nice language and all, but why not use the right tool for the job? As demonstrated most effectively by Professor Hoffmann, it's quite cumbersome to parse text files using Java. Now with Perl, you can do something like this:
perl -ne "print unless /^192\.(9|18|29)\./o||/\.(gif|jpg|css|GIF|JPG|CSS) HTTP/o" < access-log > clean-log
Heck, you could even make it your .sig. Not to mention that it runs faster than the Java version. The regex solution is even a tad bit faster tha
Hoffmann responds... (Score:2, Informative)
From: John Hoffmann
Date: Tue Sep 17 15:32:30 2002 (PDT)
To: Brad Choate
Subject: Re: Java vs. Perl
Brad,
Thanks, you were the second person to write, but the first guy couldn't offer an
optimization. Just ran your one-liner on a 578 MB file and it took half the time
of the Java version.
%timex perl -ne "print unless
real 52.51
user 27.28
sys 6.89
%java LogParse
Processing Time: 107 Seconds
File sizes matched perfectly.
-rw-rw-r-- 1 hoffie 96187583 Sep 17 15:22 developer.20020916.perl
-rw-rw-r-- 1 hoffie 584267226 Sep 17 15:05 developer.20020916.raw
-rw-rw-r-- 1 hoffie 96187583 Sep 17 15:11 developer.20020916.tmp
The java programmer who wrote the LogParse class wants to try JDK 1.4 with
regular expressions and the new IO classes to see the result. I'll see what we
can do to publish a round two of the optimized Perl and the new Java.
-John
Re:Hoffmann responds... (Score:2)
Re:Hoffmann responds... (Score:1)
Those two o's above, as in /.../o, seem quite useless because the regexes are constant. Is there a reason for them?
/-\
Re:Hoffmann responds... (Score:1)
I guess not :) Silly me, I thought /o always helped when using the same regex pattern in a loop such as this. And I hadn't thought about specifying the non-capturing syntax, also suggested in this thread. The final result:
perl -ne "print unless /^192\.(?:9|18|29)\./||/\.(?:gif|jpg|css|GIF|JPG|CSS) HTTP/" < input > output
The fastest of all so far... any other improvements?
Re:Hoffmann responds... (Score:1)
Try the following: if you are much more likely to see gif, jpg, or css requests than requests from local addresses, switch the regexps around so the more common match is tested first:
perl -ne 'print unless /\.(?:gif|jpg|css|GIF|JPG|CSS) HTTP/||/^192\.(?:9|18|29)\./;' input > output
Re:Hoffmann responds... (Score:1)