
All the Perl that's Practical to Extract and Report

  • I have no idea what you've been smoking, because there's no way you should be getting that kind of discrepancy. Either you're using a very old version of ruby, your interpreter is broken, or you've been sniffing glue again.

    For my results, I used ruby 1.6.7 and perl 5.8.0 on Mandrake 9. I took the sample text you gave in your journal entry and copied it over and over until I ended up with a 2.4 MB file. I used "bzip-0.21" as the target. Hopefully, I didn't screw up the logic.

    I've provided the exact b

    • Oops - those were the benchmarks against the 48 KB file. Here are the benchmarks against the 2.4 MB file:


      djberge>/usr/local/bin/ruby ruby_bench.rb
            user     system      total        real
      original:270.560000   2.740000 273.300000 (273.258903)
      optimized:134.710000   1.890000 136.600000 (136.578120)


      Benchmark: timing 1 iterations of original...
        original: 129 wallclock sec

    • My reply is actually to the original post, not the first reply, but I couldn't find a link to comment on that.

      Anyway, I was able to get the time for the Perl test significantly lower by doing two things:
         I precompiled the regexp.
         I joined the relevant search fields using a (presumably) unused character (^A) and ran a single search on the joined string.

      On my box that brought the average down from around 37 seconds to around 26 seconds (using djberg96's benchmark version of the script).

      use Benchmark;
      use strict;
      • Yep. Precompiling the regex and joining the fields to be searched shaved a couple of seconds off the Perl script.
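The same two changes (compile the pattern once, then join the fields of interest with a separator that should never occur in the data and run a single match) carry over to Ruby as well. What follows is only a rough sketch of the idea, not the Perl script from the post: the method name `count_matches`, the field indexes (borrowed from the grep one-liner elsewhere in this thread), and the sample records are all made up for illustration. It also escapes the target so that the dot in "bzip-0.21" is matched literally, which the original scripts may not have done.

```ruby
# Hypothetical helper: counts pipe-delimited records in which any of the
# selected fields contains the target string.
SEPARATOR = "\x01" # Ctrl-A (^A), assumed never to appear in the data

def count_matches(lines, target, indexes = [0, 3, 6, 7, 8])
  # Build the regexp once, outside the loop, instead of once per line.
  pattern = Regexp.new(Regexp.escape(target))
  count = 0
  lines.each do |line|
    fields = line.chomp.split("|")
    # Join only the fields of interest so one match covers all of them;
    # compact drops nils from records that have fewer fields.
    haystack = fields.values_at(*indexes).compact.join(SEPARATOR)
    count += 1 if pattern.match?(haystack)
  end
  count
end

# Made-up records, nine fields each.
JOINED_SAMPLE = [
  "bzip-0.21|p1|p2|compressor|x|y|archivers|dep1|dep2",
  "zsh-4.0|p1|p2|shell|x|y|shells|bzip-0.21|dep2",
  "vim-6.1|p1|p2|editor|x|y|editors|dep1|dep2",
  "foo-1.0|bzip-0.21|p2|tool|x|y|misc|d1|d2",
].freeze

puts count_matches(JOINED_SAMPLE, "bzip-0.21") # prints 2
```

The last record is not counted because its only occurrence of the target sits in field 1, which is not one of the selected fields.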

    • Since the 'optimized' ruby code doesn't short-circuit testing the rest of the fields on a successful match, you could make the perl a little more perlish also:
      $count++ if grep m{$target}, (split /\|/) [0,3,6..8]
      Or use List::Util::first instead of grep (though it may only be an improvement on bigger arrays).

      I'm using perl 5.6.1 and ruby 1.6.8 and getting ruby about twice as slow as perl.
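For comparison, the grep-over-a-slice idea has a fairly close Ruby counterpart. This is a sketch under assumptions rather than anyone's benchmark code: `record_matches?` is a hypothetical name, the records are invented, and `match?` needs a much newer Ruby than the 1.6.x interpreters discussed in this thread. Note that `any?` stops at the first field that matches, so it behaves more like List::Util::first than like grep.

```ruby
# Hypothetical helper: true if any selected field of a pipe-delimited
# record matches the pattern.
def record_matches?(line, pattern)
  fields = line.chomp.split("|")
  # values_at accepts single indexes and ranges, much like the Perl slice
  # [0,3,6..8]; compact drops nils from records with fewer fields.
  fields.values_at(0, 3, 6..8).compact.any? { |field| pattern.match?(field) }
end

SLICE_PATTERN = /bzip-0\.21/

# Made-up records, nine fields each.
SLICE_SAMPLE = [
  "bzip-0.21|p1|p2|compressor|x|y|archivers|dep1|dep2",
  "zsh-4.0|p1|p2|shell|x|y|shells|bzip-0.21|dep2",
  "vim-6.1|p1|p2|editor|x|y|editors|dep1|dep2",
].freeze

count = SLICE_SAMPLE.count { |line| record_matches?(line, SLICE_PATTERN) }
puts count # prints 2
```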

      • You're right - I forgot to short circuit. I'm not sure how that helps Ruby's case, though. Add a "break" after the "count += 1" line. It didn't seem to improve performance significantly for me.
        • The match only occurs in 1 out of every 6 lines (using your target and the sample data in the top post here), so you'd see at most about a 17% benefit (if that). If the match occurred early in the string on more lines, there might be more benefit.

          I just installed Ruby today and have been poking through the online docs, but couldn't find a 'break' or 'last' statement. Is there such a thing? The best I could come up with was throwing an exception and catching it outside that loop. I still need to g

          • I still need to get a Ruby book...

            Visit rubycentral or ruby-doc.

            The first link is an online version of Programming Ruby, aka "The Pickaxe". You can still buy that book at the store, if you prefer paper.

            • I had looked through the online book, but couldn't find a 'break' at first. I finally did find 'break', 'next', and 'redo' in the Expressions section; I had been looking in the Iterators section.

              I was poking through the bookstore and the only Ruby book there was Sams' "Teach Yourself Ruby in 21 Days". I can't recommend it, as it had no mention of 'break', 'next', or 'redo', nor the IO.foreach method in your example (and it was a thick book).
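Since 'break' and 'next' came up above, here is a small sketch of how they might look in a loop of the kind being benchmarked. It is not the script from the journal entry: `count_with_break`, the field indexes, and the records are assumptions for illustration, and a StringIO stands in for the file so the example runs on its own. With a real file, IO.foreach(path) would take the place of io.each_line.

```ruby
require "stringio"

# Hypothetical stand-in for the benchmark loop: counts records where one of
# the selected fields matches, leaving the field loop on the first hit.
def count_with_break(io, pattern, indexes = [0, 3, 6, 7, 8])
  count = 0
  io.each_line do |line|
    next if line.strip.empty?   # next: skip blank lines, move to the next one
    fields = line.chomp.split("|")
    indexes.each do |i|
      field = fields[i]
      next if field.nil?        # record has fewer fields than expected
      if pattern.match?(field)
        count += 1
        break                   # break: stop checking this record's fields
      end
    end
  end
  count
end

# Made-up records; the first one matches in two fields but is counted once.
data = StringIO.new(<<~DATA)
  bzip-0.21|p|q|compressor|x|y|archivers|bzip-0.21|d
  zsh-4.0|p|q|shell|x|y|shells|d|d

  tar-1.13|p|q|archiver|x|y|archivers|d|bzip-0.21
DATA

puts count_with_break(data, /bzip-0\.21/) # prints 2
```

Without the break, the first record would bump the count twice, once for field 0 and once for field 7.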

    • OK. Tried your version of the Ruby script. My ruby is version 1.6.8 on an Athlon 500 MHz system running FreeBSD 4.7-STABLE. Used the /usr/ports/INDEX file that the sample data I originally posted came from; it's 3 MB in size. However, I didn't use any language-specific benchmark modules; I wanted to compare apples to apples. Anyway, here are the results:

      [ayeka:~/portfinder] buck> repeat 5 time ruby pftest2.rb ruby
      68.394u 2.119s 1:10.56 99.9% 4+1346k 0+0io 0pf+0w
      69.770u 2.258s 1:12.08 99.9% 4+1346k 0+0
      • Ruby:

        127.33user 2.35system 2:10.31elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (222major+301minor)pagefaults 0swaps


        126.28user 1.74system 2:08.32elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (374major+164minor)pagefaults 0swaps

        Perhaps it's a NetBSD issue? Seems unlikely, but based on the results you're getting versus what I'm getting, I'd consider it a possibility at least. At least I cut it down to x2 instead of x4!

        Please consider posting to the mail

        • Perhaps it's a NetBSD issue? (Buck: FreeBSD even :) ) Seems unlikely, but based on the results you're getting versus what I'm getting, I'd consider it a possibility at least. At least I cut it down to x2 instead of x4!

          Please consider posting to the mailing list with this info (
          I'd like to try these on my TiBook with OS X 10.2.3 first, though I don't expect much of a change. Is the mailing list archived somewhere where I can research before posting anything?

          By the way,

          • You can find the archives at

            There's a gateway between the mailing list and comp.lang.ruby, so you can search via deja (or your local news server) and get everything from the mailing list that way.

            FreeBSD even :) - Oops. Probably not the first time I made that mistake. Probably won't be the last. :-P