use Perl

All the Perl that's Practical to Extract and Report

rurban (7989)

  reversethis-{ta.yar-x} {ta} {nabrur}

cygwin maintainer for perl, parrot, clisp, postgresql, ... and some perl modules (perl-libwin32, perl-Win32-GUI). Also has some hairy CPAN packages: B::C, B::Generate, C::DynaLib, B::Debugger ...

Journal of rurban (7989)

Sunday September 21, 2008
03:10 PM

oplines - win-win memory AND speed

[ #37504 ]

I already wrote about this some time ago, I forget where (probably p5p), but got no responses.
Today I tried it again on irc #p5p and ended up writing a simple statistics script and starting the OPLINES branch.

The patch works, but the test results look unreal. Uploaded to

My first test script is poor and mostly fails, but I will fix it and run it over more files to get better stats.
My second one is better:

#! perl


=head1 NAME

oplines - ops per line + nextstate win stats for TRY_OPLINES patch

=head1 DESCRIPTION

Move cop_line from COP to BASEOP, and reduce the need for nextstate
ops, which will be an overall win in memory and speed for typical
sparse code with fewer than 4 ops per line.

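A cop has 5 ptrs more than a BASEOP, so the memory win looks like this,
assuming
    4 ops/line avg.
    90% of the per-line nextstate COPs can be dropped

=> on 32bit: 4 ops * 4 byte line number = 16 byte more per line, while each
   dropped cop saves 5 ptrs * 4 byte = 20 byte.
   For 10k src lines => 200k - 160k = 40k memory win.
      + 4k runtime win (fewer nextstate cops needed)

      on 64bit, try it yourself.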

The unknown factors:
    a) typical # of ops per line
    b) nextstate win:
        * typical # of nextstate cops per 1000-line file.
            minus # of really needed nextstate cops (lexcops) per 1000-line file.

2008-09-21 21:41:06 rurban

=cut

use Config;
use lib ".";

my ($sumfiles, $sumlines, $sumops, $sumnextstates, $sumlexstates);
my ($files, $lines, $ops, $nextstates, $lexstates);

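# write the B_Stats helper module (kept after __DATA__) to ./B_Stats.pm
open my $pm, '>', 'B_Stats.pm' or die "Cannot write B_Stats.pm: $!";
while (<DATA>) { print $pm $_ }
close $pm;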

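for my $file (@ARGV) {
    # compile-only run (-c) with B_Stats loaded; the module prints one
    # tab-separated line: files, lines, ops, nextstates, lexstates.
    # -I. is needed on perls that no longer have "." in @INC.
    my $s = `$^X -I. -c -MB_Stats $file`;
    chomp $s;
    my @s = split /\t/, $s;
    if (@s > 4 and $s[0] =~ /\d+/) {
        ($files, $lines, $ops, $nextstates, $lexstates) = @s;
        $sumfiles += $files;
        $sumlines += $lines;
        $sumops += $ops;
        $sumnextstates += $nextstates;
        $sumlexstates += $lexstates;
    }
}
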
print "files: $sumfiles\n";
print "lines: $sumlines\n";
print "ops: $sumops\n";
my $opsratio = $sumlines ? $sumops/$sumlines : 0;
# really needed cops: one per file plus the lexstates
my $needed   = $sumfiles + $sumlexstates;
my $copratio = $needed ? $sumnextstates/$needed : 0;
print "ops/line: ", sprintf("%0.2f", $opsratio), "\n";
print "cops: ", sprintf("%0.2f%%", $copratio),
      " (lex+filecops=$needed / nextstates=$sumnextstates)\n";
my $runtimewin = $sumnextstates - $needed;    # nextstate cops that could go
# rough BASEOP and COP sizes in bytes
my $opsize  = 3*$Config{ptrsize} + 4 + $Config{intsize};
my $copsize = $opsize + 4*$Config{ptrsize} + 8;
my $memwin = ($Config{ptrsize} * $copsize * $runtimewin)  # win the cops
    - ($sumops * $Config{intsize});                      # minus the added line_t cop_line
print "memory win: $memwin byte (",
    ($Config{ptrsize} * $copsize * $runtimewin), " - ", ($sumops * $Config{intsize}), ")\n";
print "runtime win: $runtimewin ops ",
    sprintf("%0.2f%%", $sumops ? $runtimewin*100/$sumops : 0),
    " ($sumnextstates - $needed)\n";

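__DATA__
package B_Stats;
# helper module, written to ./B_Stats.pm and loaded via -MB_Stats
# (needs B::Utils from CPAN)
use B::Utils qw(walkallops_simple);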
use B qw(OPf_PARENS);
my ($files, $lines, $ops, $nextstates, $lexstates);

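sub count_ops {
    my $op = shift;
    $ops++; # count also null ops
    if ($op->isa('B::COP')) {
        $nextstates++;    # every nextstate/dbstate cop
        # "really needed" cops: non-default flags or a label
        $lexstates++ if ($$op and (($op->flags != 1)
                                    or $op->label));
    }
}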

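# CHECK blocks still run under perl -c
CHECK {
    ($files, $lines, $ops, $nextstates, $lexstates) = (0,0,0,0,0);
    ($oldfile, $oldlines) = ("",0);
    $files = scalar keys %INC;
    for (values %INC) {
        # count the source lines of every loaded file
        open IN, "<$_" or next; while (<IN>) { $lines++; }; close IN;
    }
    walkallops_simple(\&count_ops);
    print "$files\t$lines\t$ops\t$nextstates\t$lexstates\n";
}
1;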
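
The POD's back-of-envelope memory estimate can be spelled out as a small calculation. This is a sketch under our own reading of the figures: we assume the 200k comes from one dropped cop per source line at 5 extra pointers of 4 bytes each, and the 160k from a 4-byte line number added to each of 4 ops per line. The helper name net_win is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: net memory effect of moving the line number into
# every op while dropping a fraction of the per-line nextstate cops.
sub net_win {
    my (%a) = @_;
    # cost: every op grows by one line number
    my $cost = $a{lines} * $a{ops_per_line} * $a{line_size};
    # win: each dropped cop frees the pointers a COP has beyond a BASEOP
    my $win  = $a{lines} * $a{drop_frac} * $a{extra_ptrs} * $a{ptr_size};
    return ($win, $cost, $win - $cost);
}

# 32-bit figures from the POD (assumed reading): 10k lines, 4 ops/line,
# 4-byte line numbers and pointers, 5 extra pointers per COP.
my ($win, $cost, $net) = net_win(
    lines => 10_000, ops_per_line => 4, line_size => 4,
    extra_ptrs => 5, ptr_size => 4, drop_frac => 1,
);
print "saved $win - added $cost = $net bytes\n";
```

With drop_frac => 0.9 (the POD's 90% assumption) the same call gives a net win of only 20000 bytes, so the estimate is quite sensitive to how many cops can really be dropped.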

So how does it look in practice?

It looks like an average sample has 0.5-2 ops per line (but POD needs to be skipped), and about 100-200 times more pure line-number cops than really needed cops (block entry, new files). So the win seems to be dramatic (an 8% speed win for the op traversal). But I have to inspect the cops more closely now.

$ ./oplines *.pl
files: 245
lines: 85756
ops: 59263
ops/line: 0.69
cops: 11.23% (lex+filecops=494 / nextstates=5549)
memory win: 652628 byte (889680 - 237052)
runtime win: 5055 ops 8.53% (5549 - 494)

$ ./ $(find lib ext -name \*.pm)
files: 12051
lines: 5042732
ops: 6629648
ops/line: 1.31
cops: 39.15% (lex+filecops=15334 / nextstates=600266)
memory win: 76429440 byte (102948032 - 26518592)
runtime win: 584932 ops 8.82% (600266 - 15334)
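
To show how the summary lines follow from the raw counts, the sketch below recomputes the first run's figures with the same formulas as the oplines script, assuming a 32-bit build (ptrsize and intsize of 4, which reproduces the 889680-byte figure). The lexstate count of 249 is derived as 494 - 245 from the printed lex+filecops and files lines. Note that the "cops" figure is a ratio of nextstate cops to needed cops (5549 / 494), not a percentage, and the memory figure keeps the script's formula as written, including its extra ptrsize factor. The derive helper is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: recompute the oplines summary from raw counts,
# using the script's formulas with 32-bit sizes (ptrsize = intsize = 4).
sub derive {
    my (%c) = @_;
    my ($ptrsize, $intsize) = (4, 4);
    my $needed  = $c{files} + $c{lexstates};      # filecops + lexcops
    my $dropped = $c{nextstates} - $needed;       # nextstates that could go
    my $opsize  = 3*$ptrsize + 4 + $intsize;      # 20 bytes
    my $copsize = $opsize + 4*$ptrsize + 8;       # 44 bytes
    return (
        ops_per_line => sprintf("%0.2f", $c{ops} / $c{lines}),
        cop_ratio    => sprintf("%0.2f", $c{nextstates} / $needed),
        runtime_win  => $dropped,
        runtime_pct  => sprintf("%0.2f", $dropped * 100 / $c{ops}),
        memory_win   => $ptrsize * $copsize * $dropped - $c{ops} * $intsize,
    );
}

# raw counts from the "./oplines *.pl" run
my %r = derive(files => 245, lines => 85756, ops => 59263,
               nextstates => 5549, lexstates => 494 - 245);
print "$_: $r{$_}\n" for sort keys %r;
```

Feeding in the second run's counts (files 12051, lex+filecops 15334) reproduces its 39.15, 8.82% and 76429440-byte figures the same way.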


The parser emits lots of nextstate cops just to track line numbers. These cops can be omitted, with the line number stored in the current op (PL_op) instead of PL_curcop.
The parser can be simplified a lot for the PL_curcop cases.
Also, find_cops() is no longer needed in most cases where just the line number is required, but it is still needed to get the current filename.
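
The idea of reading the line from the op itself can be illustrated with a toy model. This is only a sketch: the op records, their field names and the line_of_old/to_oplines helpers are hypothetical and unrelated to perl's real op structs. It also drops every nextstate, whereas the real patch must keep the cops that are really needed (block entry, new files).

```perl
use strict;
use warnings;

# Toy op sequence (hypothetical): "nextstate" ops exist only to set the line.
my @old = (
    { name => 'nextstate', line => 1 }, { name => 'const' }, { name => 'print' },
    { name => 'nextstate', line => 2 }, { name => 'const' }, { name => 'die' },
);

# Old scheme: the line is whatever the last executed nextstate set,
# similar in spirit to asking PL_curcop for its line.
sub line_of_old {
    my ($ops, $idx) = @_;
    for (my $i = $idx; $i >= 0; $i--) {
        return $ops->[$i]{line} if $ops->[$i]{name} eq 'nextstate';
    }
    return 0;
}

# New scheme: drop the nextstates and copy their line into each following
# op, so the line can be read from the current op itself.
sub to_oplines {
    my ($ops) = @_;
    my ($line, @new) = (0);
    for my $op (@$ops) {
        if ($op->{name} eq 'nextstate') { $line = $op->{line}; next }
        push @new, +{ %$op, line => $line };
    }
    return \@new;
}

my $new = to_oplines(\@old);
# 'die' sits at index 5 in the old sequence and at index 3 in the new one
printf "old: %d ops, die at line %d\n", scalar @old, line_of_old(\@old, 5);
printf "new: %d ops, die at line %d\n", scalar @$new, $new->[3]{line};
```

Both schemes report line 2 for the die op, but the new sequence executes 4 ops instead of 6; the trade-off is one extra line field per op.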

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • What sort of relative wall-clock speed win is this going to result in for typical non-io-bound code?

    0.01% ?

    0.1% ?

    1% ?

    • It's ten percent fewer ops, but the nextstate op is reasonably slim on its own -- about the same size as the "and" and "or" ops, slightly smaller than enter and leave, and significantly smaller than everything else in pp_hot.c. By rough estimate, without accounting for memory use, cache flushing, and other external factors, a 3% improvement would thrill me.

  • Someone put this on the corehackers wiki; any interest in creating a GitHub branch to continue this work?

    • I believe it was the oplines branch in my GitHub perl repository, but it got lost.

      Only my local-op branch is there.