All the Perl that's Practical to Extract and Report
Stories, comments, journals, and other submissions on use Perl; are Copyright 1998-2006, their respective owners.
Made my day (Score:1)
Seriously folks, Java is a nice language and all, but why not use the right tool for the job? As demonstrated most effectively by Professor Hoffman, it's quite cumbersome to parse text files using Java. Now with Perl, you can do something like this:
perl -ne "print unless /^192\.(9|18|29)\./o||/\.(gif|jpg|css|GIF|JPG|CSS) HTTP/o" < access-log > clean-log
Heck, you could even make it your .sig. Not to mention that it runs faster than the Java version. The regex solution is even a tad bit faster than testing individual values using the index function. Go figure.
Those Sun engineers should find better things to do with their time.
Hoffmann responds... (Score:2, Informative)
From: John Hoffmann
Date: Tue Sep 17 15:32:30 2002 (PDT)
To: Brad Choate
Subject: Re: Java vs. Perl
Brad,
Thanks, you were the second person to write, but the first guy couldn't offer an
optimization. Just ran your one-liner on a 578 Meg file and it took half the time
of the Java.
%timex perl -ne "print unless
Re:Hoffmann responds... (Score:2)
Re:Hoffmann responds... (Score:1)
Those two o's above, as in /.../o, seem quite useless because the regexes are constant. Is there a reason for them?
/-\
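For anyone wondering what /o actually does: it tells Perl to interpolate and compile a pattern only once, so it can only make a difference when the pattern contains a variable. A constant pattern like the ones in the one-liner is compiled once anyway. Here is a minimal sketch of the effect; the sub names and test values are made up for illustration.

```perl
use strict;
use warnings;

# With /o the interpolated pattern is compiled on first use and then
# frozen, no matter what $pat holds on later calls.
sub match_once {
    my ($text, $pat) = @_;
    return $text =~ /$pat/o ? 1 : 0;
}

# Without /o the pattern follows the current value of $pat.
sub match_fresh {
    my ($text, $pat) = @_;
    return $text =~ /$pat/ ? 1 : 0;
}

# Try the text '18' against the pattern '9' first, then against '18'.
our @once  = (match_once('18', '9'),  match_once('18', '18'));
our @fresh = (match_fresh('18', '9'), match_fresh('18', '18'));

print "with /o:    @once\n";
print "without /o: @fresh\n";
```

The /o version reports no match both times, because it is still using the pattern from the first call. So /o is not just useless on a constant regex, it is a way to introduce bugs on a variable one.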
Re:Hoffmann responds... (Score:1)
I guess not :) Silly me, I thought /o always helped when using the same regex pattern in a loop such as this. And I hadn't thought about specifying the non-capturing syntax, also suggested in this thread. The final result:
perl -ne "print unless /^192\.(?:9|18|29)\./||/\.(?:gif|jpg|css|GIF|JPG|CSS) HTTP/" < input > output
The fastest of all so far... any other improvements?
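For readers who would rather see that as a script than a one-liner, here is a rough equivalent. The sub name keep_line and the sample log lines are invented for illustration; they are not taken from the actual access log.

```perl
use strict;
use warnings;

# Return true for lines the one-liner would print: not starting with
# 192.9., 192.18. or 192.29., and not a gif/jpg/css request.
sub keep_line {
    my ($line) = @_;
    return 0 if $line =~ /^192\.(?:9|18|29)\./;
    return 0 if $line =~ /\.(?:gif|jpg|css|GIF|JPG|CSS) HTTP/;
    return 1;
}

# Made-up sample lines in common log format.
my @sample = (
    '192.18.1.5 - - [17/Sep/2002:15:00:00 -0700] "GET /index.html HTTP/1.0" 200 512',
    '10.0.0.7 - - [17/Sep/2002:15:00:01 -0700] "GET /logo.gif HTTP/1.0" 200 2048',
    '10.0.0.7 - - [17/Sep/2002:15:00:02 -0700] "GET /index.html HTTP/1.0" 200 512',
);

# Only the last sample line survives the filter.
print "$_\n" for grep { keep_line($_) } @sample;
```

To filter a real file, the sample loop could be swapped for a while loop over STDIN that prints each line for which keep_line returns true.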
Re:Hoffmann responds... (Score:1)
Try the following... if gif, jpg, and css requests are much more common in your log than requests from the local 192.x addresses, switch the regexps around so the more likely test runs first and short-circuits the ||:
perl -ne 'print unless /\.(?:gif|jpg|css|GIF|JPG|CSS) HTTP/||/^192\.(?:9|18|29)\./;' input > output
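If you want to check which order wins on your own data rather than guess, the core Benchmark module can compare the two. This is only a rough sketch: the sample lines and the labels ip_first and ext_first are invented, and the timings will depend entirely on the mix of requests in your log.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Invented sample: mostly image hits, one local address, one ordinary page.
my @lines = (
    ('10.0.0.7 - - "GET /logo.gif HTTP/1.0" 200 2048') x 8,
    '192.18.1.5 - - "GET /index.html HTTP/1.0" 200 512',
    '10.0.0.7 - - "GET /index.html HTTP/1.0" 200 512',
);

# Address test first, as in the earlier one-liner. Returns the number kept.
sub ip_first {
    return scalar grep {
        !(/^192\.(?:9|18|29)\./ || /\.(?:gif|jpg|css|GIF|JPG|CSS) HTTP/)
    } @lines;
}

# Extension test first, as suggested above. Returns the number kept.
sub ext_first {
    return scalar grep {
        !(/\.(?:gif|jpg|css|GIF|JPG|CSS) HTTP/ || /^192\.(?:9|18|29)\./)
    } @lines;
}

# Both orders must keep the same lines; only the speed can differ.
print "kept: ", ip_first(), " / ", ext_first(), "\n";

# A negative count means "run each for at least that many CPU seconds".
cmpthese(-1, { ip_first => \&ip_first, ext_first => \&ext_first });
```

On a real log you would read a representative chunk of the file into @lines instead of the made-up sample.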
Re:Hoffmann responds... (Score:1)
Re:Made my day (Score:1)
It is faster still if you use non-capturing parens, i.e. change (9|18|29) to (?:9|18|29), and ditto for the parens around gif|jpg etc. And the 'o' modifier should be removed.
/-\
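The reason non-capturing parens are cheaper is that plain parens make Perl remember what matched in $1, which this filter never reads. A small sketch of the difference; the sub name first_capture and the test line are made up.

```perl
use strict;
use warnings;

# Made-up log line from one of the excluded addresses.
my $line = '192.18.1.5 - - "GET /index.html HTTP/1.0" 200 512';

# Match $text against $re and report what, if anything, landed in $1.
sub first_capture {
    my ($text, $re) = @_;
    return 'no match' unless $text =~ $re;
    return defined $1 ? $1 : 'nothing captured';
}

# Capturing parens: the matched octet is saved.
print first_capture($line, qr/^192\.(9|18|29)\./), "\n";

# Non-capturing parens: same match, nothing saved.
print first_capture($line, qr/^192\.(?:9|18|29)\./), "\n";
```

Both patterns match the same lines, so swapping in (?:...) changes nothing about which lines get printed.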
Re:Made my day (Score:1)