  •      foreach $fileext (@fileext) {
             next LINE if ($_ =~ /$fileext HTTP/);
         foreach $sunIP (@sunIPs) {
             next LINE if ($_ =~ /^$sunIP/);
    Yeah, it's almost always possible to beat bad Perl written by people who don't understand that regexes need to be compiled.
    • Randal L. Schwartz
    • Stonehenge
  • Except that he's not really pattern matching. He's using Java's index-like method. And he's "unrolled" his loops within the read-loop.

    His Perl is idiomatic (except for the spurious =~'s) and looks just like what any novice would have written.

    If I had enough data I might take a crack at unrolling it and making it quicker. Like any "benchmark" though, the code can *always* be manipulated to favor one over the other.
  • Simple first pass at it: using qr//:

    @fileext=("\\.gif","\\.jpg","\\.css","\\.GIF","\\.JPG","\\.CSS");
    $filename="$ARGV[0]";
    open(IN,$filename) || die "cannot open $ARGV[0] for reading: $!";
    open(OUT,">$filename.out") || die "cannot open $filename.out for writing: $!";

    # compile once
    $fileext = join '|', @fileext;
    $fileext = qr/(?:$fileext) HTTP/;
    $sunIPs = join '|', @sunIPs;
    $sunIPs = qr/^(?:$sunIPs)/;

    LINE: while(<IN>) {
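
The post above breaks off at the loop, so here is a minimal sketch of how the precompiled patterns might be used. The address prefixes are placeholders borrowed from a later reply in this thread (the real @sunIPs list never appears here), and keep_line is a hypothetical helper introduced only so the filter can be tried without any files.

```perl
use strict;
use warnings;

# Extensions as in the post; the address prefixes are placeholders,
# not the list from the original script.
my @fileext = ('\.gif', '\.jpg', '\.css', '\.GIF', '\.JPG', '\.CSS');
my @sunIPs  = ('192\.9\.', '192\.18\.', '192\.29\.');

# Compile each alternation once, outside the read loop.
my $fileext_re = do { my $alt = join '|', @fileext; qr/(?:$alt) HTTP/ };
my $sunIPs_re  = do { my $alt = join '|', @sunIPs;  qr/^(?:$alt)/ };

# Hypothetical helper: returns 1 if the log line should be kept, 0 otherwise.
sub keep_line {
    my ($line) = @_;
    return 0 if $line =~ $sunIPs_re;    # anchored test, fails fast
    return 0 if $line =~ $fileext_re;
    return 1;
}

# Inside the read loop this would be used roughly as:
#     while (<IN>) { print OUT if keep_line($_); }
```

Because both patterns are built with qr// before the loop starts, no regex is recompiled per line.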

  • Well, he's not using Java regexes so it's not a fair comparison. But anyway, I'd like to point out that the second edition of Mastering Regular Expressions is fantastic. It goes into great detail on the new features and relative speeds of the regular expression engines in all the languages, and is generally very cool indeed.
  • my $filename = shift;
    open(IN,$filename) || die "cannot open $filename for reading: $!";
    open(OUT,">$filename.out") || die "cannot open $filename.out for writing: $!";

    while ( <IN> ) {
        next if /\.(gif|jpg|css|GIF|JPG|CSS) HTTP/;
        next if /192\.(9|18|29)\./;
        print OUT;
    }
    Not sure about the execution speed of the regexes, but it's a damn sight easier to read.


    • Tests? Should be /^192 and it should go faster if you use /o on the regexen.

      But, yeah, much easier to read, much faster to write, and much better.

      Hmm. I think I should ask pudge to make <tt> text a different colour.
        ---ict / Spoon
      • it should go faster if you use /o on the regexen.

        No it won't. The /o only applies to regexes that are based on variables, as in:

        my $pattern = '192\.(whatever)';
        if ( $foo =~ /$pattern/o )
        That's the ONLY time that /o applies.
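
To make the point about /o concrete, here is a small sketch. The two subs are hypothetical names made up for the illustration. With /o the interpolated pattern is compiled the first time the match runs and later changes to the variable are ignored, which is exactly why it does nothing for a regex with no variables in it.

```perl
use strict;
use warnings;

# Hypothetical helper: /o freezes the pattern the first time this match runs.
sub match_frozen {
    my ($text, $pattern) = @_;
    return $text =~ /$pattern/o ? 1 : 0;
}

# Same test without /o: the pattern follows the variable on every call.
sub match_live {
    my ($text, $pattern) = @_;
    return $text =~ /$pattern/ ? 1 : 0;
}

my @frozen = (match_frozen('192.9.1.1', '^192'), match_frozen('192.9.1.1', '^10'));
my @live   = (match_live('192.9.1.1', '^192'),   match_live('192.9.1.1', '^10'));

print "with /o: @frozen\n";    # with /o: 1 1  (second call still uses ^192)
print "without: @live\n";      # without: 1 0
```

The second match_frozen call "matches" only because the pattern was fixed at ^192 on the first call, so /o is as much a trap as an optimisation when the variable can change.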


    • Without wishing to wave the golf stick, may I commend
      #!perl -pi.out
      $_ = '' unless /(?:\.(?:gif|jpg|css|GIF|JPG|CSS)[ ]HTTP |
      to the house?
  • What underutilized lackey has enough time to worry about making a program that runs in 283 seconds BUT TAKES 5 MINUTES TO WRITE into a program that runs in 137 seconds BUT TAKES 15-30 MINUTES TO WRITE? If the program could be rewritten so that it runs in under 10 seconds (my attention span), THEN the extra effort *might* be worth it. This program is likely to be run from a batch job, so a hyoo-mon isn't likely to be at the terminal waiting for it to finish.

    Java cuts into my beer-drinking time.

  • Uh... (Score:4, Insightful)

    by jhi (318) on 2002.09.16 10:45 (#12881)
    (As pointed out by many, already...)

    (1) The Perl code is really bad. Just getting rid of the "loop-over-each-line-recompiling-the-regex-each-time" by moving the loop-invariant regex to the front of the while speeds things up.
    (2) Using qr speeds things up further.
    (3) Moving the sunIPs testing before the fileext testing speeds things up further.
    (4) Inlining the 192. and HTTP speeds things up. Hey, the Java code inlines those strings.

    And after all that is done, we're still comparing apples and oranges: the Java code doesn't do regular expressions. If someone has the time, they might want to ape precisely what the Java code is doing, using index() and so forth, and then measure that.
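
A rough sketch of what an index()-style version might look like, with no regexes at all. This is not a port of the article's Java code (which isn't shown in this thread): the 192. prefixes are placeholders taken from an earlier reply, and keep_line_index is a hypothetical name.

```perl
use strict;
use warnings;

# Literal strings only. The extension list comes from the thread;
# the address prefixes are placeholders, not the original data.
my @skip_prefixes   = qw(192.9. 192.18. 192.29.);
my @skip_substrings = map { "$_ HTTP" } qw(.gif .jpg .css .GIF .JPG .CSS);

# Hypothetical helper: returns 1 if the line should be kept, 0 otherwise.
sub keep_line_index {
    my ($line) = @_;

    # Cheap prefix comparison first, as in point (3) above.
    for my $prefix (@skip_prefixes) {
        return 0 if substr($line, 0, length $prefix) eq $prefix;
    }

    # index() returns -1 when the substring does not occur in the line.
    for my $needle (@skip_substrings) {
        return 0 if index($line, $needle) >= 0;
    }
    return 1;
}

# Inside the read loop this would be used roughly as:
#     while (<IN>) { print OUT if keep_line_index($_); }
```

Whether this beats the qr// version would have to be measured on real log data, which is the one thing nobody here has.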

    I hope someone will write a polite exposé of all the things that are wrong (*) with this article, and send it both to whatever forum/editors are appropriate and to the author. Mind, be polite, professional, and helpful.

    (*) Let me see...
    (a) comparing apples and oranges
    (b) the Perl code not published in the article
    (c) the Perl code is very bad
    (d) the input data not available

    I won't comment on the Java code itself, I'll leave that to people who do more Java, except to note that it inlines the filtering data, as opposed to the Perl code, which at least has it cleanly separated into variables.