use Perl
Java vs. Perl
It seems the older Perl gets, the more willing people are to believe that it sucks, without any reasonable facts. davorg writes "You may have seen the article Can Java technology beat Perl on its home turf with pattern matching in large files? that there has been some debate about on both #perl and comp.lang.perl.misc today. One of the biggest criticisms of the article was that the author hasn't published the Perl code that he is comparing his Java with."
"I emailed the author (found his email address thru a Google search) and pointed out the unfairness of this comparison. With half an hour I got a reply from him including the Perl code. So here it is. Feel free to optimise it."
#!/home/hoffie/bin/perl
@sunIPs=("192\\.9\\.","192\\.18\\.","192\\.29\\.");
@fileext=("\\.gif","\\.jpg","\\.css","\\.GIF","\\.JPG","\\.CSS");
$filename="$ARGV[0]";
open(IN,$filename) || die "cannot open $ARGV[0] for reading: $!";
open(OUT,">$filename.out") || die "cannot open $filename.out for writing: $!";
LINE: while(<IN>) {
foreach $fileext (@fileext) {
next LINE if ($_ =~/$fileext HTTP/);
}
foreach $sunIP (@sunIPs) {
next LINE if ($_ =~/^$sunIP/);
}
print OUT;
}
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.

loc (Score:1)
I think it would have been less deceptive if the Java code used regular expressions, making it more of a fair test.
Re:loc (Score:1)
Re:loc (Score:1)
Awful coding in both examples.
For each potential pattern, he's doing a separate check, be it with indexOf(), or with the regex.
At least in the perl example, the patterns to be skipped are all up at the front of the program, and adding a new exclusion is just a matter of pushing to the arrays.
[code]
my @sunIPs = qw(192\.9\. 192\.18\. 192\.29\.);
my @fileext = qw(\.gif \.GIF \.jpg \.JPG \.css \.CSS);
my $filename = shift;
open...yada..yada...
my $pattern = join('|',@fileext) . "|" . join('|',@sunIPs);
Re:loc (Score:1)
My Kingdom for an edit button...
Re:loc (Score:1)
Make that one line for Perl :-)
perl -ne'/(?i:gif|jpg|css) HTTP/|/^192\.(9|18|29)\./||print' filename
/-\
Right Tool For The Job (Score:1)
Results of my benchmark:
crappie Hoffie perl: 106.4 seconds
reasonably optimal perl: 13.7 seconds
egrep -vi -f hoffie.egrep: 1.1 seconds
where hoffie.egrep contains:
(^(192\.9|192\.18|192\.27))|((\.gif|\.jpg|\.css) HTTP)
The test data was a file of 1,200,000 lines, of which about half hit the regex.
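For anyone who wants to time the two Perl styles without a 1,200,000-line log at hand, here is a rough sketch using the core Benchmark module. It is not the benchmark reported above: the input lines are generated on the spot, and the names count_loop and count_single are made up for the example. About half of the generated lines hit a pattern, to mimic the ratio mentioned.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Patterns as strings, interpolated per line as in the original script.
my @ext = ("\\.gif", "\\.jpg", "\\.css", "\\.GIF", "\\.JPG", "\\.CSS");
my @ips = ("192\\.9\\.", "192\\.18\\.", "192\\.29\\.");

# Made-up input: 1000 lines, every other one requests a .gif.
my @lines;
for my $n (1 .. 1000) {
    my $path = $n % 2 ? "/img$n.gif" : "/page$n.html";
    push @lines, qq{10.0.0.1 - - "GET $path HTTP/1.0"\n};
}

# Style 1: one regex match per pattern, patterns interpolated in a loop.
sub count_loop {
    my $kept = 0;
    LINE: for my $line (@lines) {
        for my $e (@ext)  { next LINE if $line =~ /$e HTTP/ }
        for my $ip (@ips) { next LINE if $line =~ /^$ip/ }
        $kept++;
    }
    return $kept;
}

# Style 2: two fixed alternations, no interpolation.
sub count_single {
    my $kept = 0;
    for my $line (@lines) {
        next if $line =~ /\.(?:gif|GIF|jpg|JPG|css|CSS) HTTP/;
        next if $line =~ /^192\.(?:9|18|29)\./;
        $kept++;
    }
    return $kept;
}

# Run each style 50 times over the sample and print a comparison table.
cmpthese(50, { loop => \&count_loop, single => \&count_single });
```

The numbers will vary by machine and Perl version, and a synthetic sample this small says little about a real access log, so treat the table as a rough indication only.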
Hypothesis: In any problem where a grep solution is signif
Re:Right Tool For The Job (Score:1)
For cheap thrills, I started a golf thread: Golf thread [develooper.com] which includes both a gawk and egrep version. The egrep version was "only" three times faster than the Perl version. :-( To write a 100-line Java program to solve such a trivial problem seems to me like killing an ant with a sledgehammer.
egrep -v '\.(gif|GIF|jpg|JPG|css|CSS) HTTP|^192\.(9|18|29)\.' inf >e
gawk '!/\.(gif|GIF|jpg|JPG|css|CSS) HTTP|^192\.(9|18|29)\./{print}' inf >a
perl -ne'/^192\.(?:9|18|29)\./||/\.(?:gif|GIF|jpg|JPG|css|CSS) HTT
/-\
Code should be obvious from problem description (Score:2)
Why should he publish the code; it's not like there's more than one way to do it in Perl or anything...
J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
Re:Code should be obvious from problem description (Score:1)
Because he is saying that the Java code took this many seconds and the Perl code took this many seconds more. And he is publishing the Java code. In order to be able to reproduce the results, his Perl code has to be made available, too.
Re:Code should be obvious from problem description (Score:1)
Um, yes. .... It's a joke, you see. :)
(There's always more than one way to do it in Perl.)
J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
Uhmmm. the java code isn't using regex's (Score:1)
This kind of thing irks me. The author didn't even try to compare apples to apples. He compared fixed string indexing to perl regexes. Further, the code structure was fundamentally different.
What a joke.
It doesn't matter if the Java is using regexes (Score:2)
"the equivalent code in perl should not be using Perl Regular expressions, instead using index()."
What I think you were saying is that the Java code should have been using regexes, but if you were saying that without regexes (a relatively recent addition to Java), Perl's should have also been excluded, then I would have to disagree. If I am going to compare, say, the performance of C and Java, I can't argue that Java isn't allowed to use OO features because C lacks them. If I use both Perl and Prolog to
Re:It doesn't matter if the Java is using regexes (Score:1)
Precompile regexes (Score:1)
First thing to say is that the author is comparing substring matches with regex matches. Someone already posted code to convert the Perl version to substring matches.
Second, this code:
@fileext=("\\.gif","\\.jpg","\\.css","\\.GIF","\\.JPG","\\.CSS");
...
foreach $fileext (@fileext) {
next LINE if ($_ =~ /$fileext HTTP/);
}
recompiles the regex every time it's evaluated. Something like this is better, methinks
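The code that followed is cut off in this copy of the comment. As a minimal sketch of the idea, and not the poster's original, the patterns could be compiled once with qr// before the loop. The helper name keep_line and the sample lines are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compile each pattern once, outside the per-line loop.
# \Q...\E quotes the dots so they match literally.
my @fileext = map { qr/\Q$_\E HTTP/ } qw(.gif .jpg .css .GIF .JPG .CSS);
my @sunIPs  = map { qr/^\Q$_\E/ } qw(192.9. 192.18. 192.29.);

# Hypothetical helper: true if the line should be written to the output.
sub keep_line {
    my ($line) = @_;
    for my $re (@fileext, @sunIPs) {
        return 0 if $line =~ $re;
    }
    return 1;
}

# Made-up sample lines in roughly access-log shape.
my @sample = (
    '192.9.1.1 - - "GET /index.html HTTP/1.0"',
    '10.0.0.1 - - "GET /logo.gif HTTP/1.0"',
    '10.0.0.1 - - "GET /index.html HTTP/1.0"',
);
my @kept = grep { keep_line($_) } @sample;
print "$_\n" for @kept;
```

Matching against a qr// object reuses the compiled pattern, so the loop no longer rebuilds a regex from a string for every extension on every line.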
Re:Precompile regexes (Score:2)
and don't forget the 'o' flag on there
-matt
Re:Precompile regexes (Score:1)
a reply? (Score:2)
From the technical viewpoints:
(1) Perl code not published
(2) input data not published
(3) comparing apples to oranges
(4) the Perl code is very slow for
Benchmarks (Score:1)
freddo [netfirms.com]
Apples and Oranges (Score:1)
For speed I'm offering (untested, just jotted down quickly):
#!/usr/bin/perl -n
for $e (qw/gif jpg css GIF JPG CSS/) {
next if index($_, "$e HTTP") != -1
}
print, next if substr($_, 0, 4) eq '192.';
next if substr($_, 4, 2) eq '9.';
next if substr($_, 4, 3) eq '1
Re:Apples and Oranges (Score:1)
> next if index($_, "$e HTTP") != -1
next LINE if index($_, "$e HTTP") != -1
and
> print, next if substr($_, 0, 4) eq '192.';
print, next if substr($_, 0, 4) ne '192.';
oh, for an edit interface... but you get the idea.
marcel
Lines of code? (Score:1)
while (<>) { print unless
Of course, you could match his argument conventions precisely, but why bother? This form is the normal Perl way to do it, and the author's Perl and Java arguments were already different.
I haven't benchmarked this one-liner, but I bet it's faster than the author's Perl version, and likely faster than the Java code as well. It might be a
Deven
"Simple things should be simple, and complex things should be possible." - Alan Kay
Re:Lines of code? (Score:1)
As someone said somewhere (petdance iirc), when making optimized solutions, test. It's something a lot of people seem to not be doing in this thread (either here or in davorg's journal). If you're going to make it more efficient, you might as well make it produce the same results.
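One rough way to do that kind of check: run the reference filter and the rewritten one over the same lines and report any line where they disagree. The sketch below assumes the original script's loops as the reference. The sub names and the sample lines are invented for the example, and a real check would use lines from an actual log.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reference filter: the same checks as the loops in the original script.
sub keep_original {
    my ($line) = @_;
    for my $ext ("\\.gif", "\\.jpg", "\\.css", "\\.GIF", "\\.JPG", "\\.CSS") {
        return 0 if $line =~ /$ext HTTP/;
    }
    for my $ip ("192\\.9\\.", "192\\.18\\.", "192\\.29\\.") {
        return 0 if $line =~ /^$ip/;
    }
    return 1;
}

# Candidate filter: one alternation per check.
sub keep_candidate {
    my ($line) = @_;
    return 0 if $line =~ /\.(?:gif|GIF|jpg|JPG|css|CSS) HTTP/;
    return 0 if $line =~ /^192\.(?:9|18|29)\./;
    return 1;
}

# Made-up lines picked to probe the edges: a near-miss IP, a
# mixed-case extension, and "gif" without a leading dot.
my @sample = (
    '192.9.1.1 - - "GET /a.html HTTP/1.0"',
    '192.90.1.1 - - "GET /a.html HTTP/1.0"',
    '10.0.0.1 - - "GET /a.gif HTTP/1.0"',
    '10.0.0.1 - - "GET /a.Gif HTTP/1.0"',
    '10.0.0.1 - - "GET /gif HTTP/1.0"',
);

my @mismatch = grep { keep_original($_) != keep_candidate($_) } @sample;
print "mismatch: $_\n" for @mismatch;
print "filters agree on all ", scalar(@sample), " sample lines\n" unless @mismatch;
```

A candidate that matches case-insensitively or drops the leading \. would disagree with the reference on the last two sample lines, which is the sort of difference that is easy to miss when only timing is compared.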
At work, I produced a shiny new version of a previous routine. I couldn't really benchmark them though: the previous version processed much less data due to a bug in its impleme
---ict / Spoon
Made my day (Score:1)
Seriously folks, Java is a nice language and all, but why not use the right tool for the job? As demonstrated most effectively by Professor Hoffman, it's quite cumbersome to parse text files using Java. Now with Perl, you can do something like this:
perl -ne "print unless /^192\.(9|18|29)\./o||/\.(gif|jpg|css|GIF|JPG|CSS) HTTP/o" < access-log > clean-log
Heck, you could even make it your .sig. Not to mention that it runs faster than the Java version. The regex solution is even a tad bit faster tha
Hoffmann responds... (Score:2, Informative)
From: John Hoffmann
Date: Tue Sep 17 15:32:30 2002 (PDT)
To: Brad Choate
Subject: Re: Java vs. Perl
Brad,
Thanks, you were the second person to write, but the first guy couldn't offer an
optimization. Just ran your one liner on 578 Meg file and it took half the time
of the java.
%timex perl -ne "print unless
Re:Hoffmann responds... (Score:2)