Journal of jjore (6662)

Friday July 03, 2009
04:00 PM

Perl 5.10.0, (not-so) giganto-times faster than Ruby 1.8.6

[ #39223 ]

Earlier today I wanted to edit a lot of source code to add some Emacs hints about the tab stop. I originally wrote it in Perl and it works fine: it takes two seconds to edit 150MB of text across 4851 files.

time find . -type f -print0 | xargs -0 /opt/perl-5.10.0/bin/perl tab-width
 
real        0m1.966s
user        0m1.340s
sys         0m0.354s

Here's the Perl 5:

undef $/;
my $RX = qr/
    (?<start>
        ^(?<indent>.*)Local\ variables:.*\n
    )
    (?<variables>
        (?:
            \k<indent>.*\n
        )*
    )
    (?<end>
        \k<indent>End:
    )
/mix;
 
for my $fn ( @ARGV ) {
    next if $fn =~ m{/\.git};
 
    open my $fh, '+<', $fn
        or die "Can't open $fn: $!";
    binmode $fh
        or die $!;
 
    my $src = <$fh>;
    next unless $src =~ s{$RX}{$+{start}$+{variables}$+{indent}tab-width: 8\n$+{end}};
 
    seek $fh, 0, 0
        or die $!;
    truncate $fh, 0
        or die $!;
    print { $fh } $src
        or die $!;
    close $fh
        or die "Can't close $fn: $!";
}

I need to know Ruby better, so I wrote a Ruby version of the same program. I don't know how long it takes to run. I started it a while ago. In the time it's taking Ruby to run this, I got up to have some pizza and beer, then wrote this blog post.

time find . -type f -print0 | xargs -0 /opt/ruby-1.8.6/bin/ruby tab-width.rb
 
real    31m19.763s
user    18m7.957s
sys    12m5.596s

Here's the Ruby:

RX = Regexp.new(
    "
    (
        ^(.*)Local\\ variables:.*\\n
    )
    (
        (
            \\2.*\\n
        )*
    )
    (
        \\2End:
    )",
    Regexp::EXTENDED | Regexp::MULTILINE
)
 
ARGV.each do |fn|
    next if fn =~ %r{/\.git}
 
    open( fn, 'r+' ) do |fh|
        fh.binmode
 
        src = fh.read
        next unless src.sub!( RX ) {|m| "#{$1}#{$3}#{$2}tab-width: 8\n#{$5}" }
 
        fh.seek 0, IO::SEEK_SET
        fh.truncate 0
        fh.write src  # write, not puts: puts appends a newline when src lacks one
    end
end

Update: It's been over half an hour now.
Update: aha, done now.
Update: got the .sub!() wrong. Fixed. Presumably I'm waiting another 30 minutes. Going for a bike ride instead of waiting.
Update: didn't go biking. Asked #ruby-lang for help. Using (?>) and (?!) to control the Ruby regexp engine seems to just trigger stack overflows.
Update: the next big chunk, below.

So I talked to helpful people on #ruby-lang like larsch, rpag, and Aria. It turns out one major mistake was misusing Regexp::MULTILINE. It's equivalent to Perl's /s (dot matches newline), not Perl's /m; in Ruby, ^ and $ already match at line boundaries without any flag. The docs don't actually talk about this so I had to guess. I guessed wrong.
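
For anyone else who trips over this, here's a small sketch of the difference. The sample text and variable names are made up for illustration; the only thing taken from above is the Regexp::MULTILINE flag itself.

```ruby
# Regexp::MULTILINE only changes what "." matches; it has nothing to do with "^".
text = "first line\nsecond line\n"

# Without the flag, "." stops at a newline, so ".*" stays on one line.
plain = Regexp.new("first.*", Regexp::EXTENDED)

# With the flag, "." also matches "\n", so ".*" can run to the end of the string.
dotall = Regexp.new("first.*", Regexp::EXTENDED | Regexp::MULTILINE)

plain_match  = text[plain]
dotall_match = text[dotall]

# "^" matches at the start of every line whether or not any flag is set.
line_starts = text.scan(/^\w+/)

puts plain_match.inspect
puts dotall_match.inspect
puts line_starts.inspect
```

With the flag set, every .* in a pattern is free to swallow the rest of the file before backtracking, which is a plausible reason a 150MB run would crawl.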

Now I find Ruby 1.8.6 is only 3x slower, while 1.9 is equivalent to Perl 5.10.0.
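
A sketch of what a corrected pattern might look like, assuming the only change is dropping Regexp::MULTILINE (the corrected script itself isn't shown here, so treat this as a guess). RX_FIXED and the sample text are hypothetical names and data for illustration.

```ruby
# Same grouping as the original pattern, minus Regexp::MULTILINE,
# so each ".*" stays on its own line. (Assumed fix, not the author's script.)
RX_FIXED = Regexp.new(
    "
    (
        ^(.*)Local\\ variables:.*\\n
    )
    (
        (
            \\2.*\\n
        )*
    )
    (
        \\2End:
    )",
    Regexp::EXTENDED
)

# Made-up sample: a file tail with an Emacs local-variables block.
sample = "code();\n" \
         "# Local variables:\n" \
         "# mode: perl\n" \
         "# End:\n"

# $1 = start line, $2 = indent/comment prefix, $3 = existing variables, $5 = End: line
result = sample.sub( RX_FIXED ) { "#{$1}#{$3}#{$2}tab-width: 8\n#{$5}" }
puts result
```

The new tab-width line goes in just before End:, using the same prefix as the rest of the block.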
