rurban (7989)

rurban
  {rurban} {at} {x-ray.at}
http://rurban.xarch.at/

cygwin maintainer for perl, parrot, clisp, postgresql, ... and some perl modules (perl-libwin32, perl-Win32-GUI). Also has some hairy CPAN packages: B::C, B::Generate, C::DynaLib, B::Debugger ...

Journal of rurban (7989)

Sunday September 21, 2008
03:10 PM

oplines - win-win memory AND speed

I wrote about this some time ago, I forget where (probably p5p), but got no response.
Today I tried again on IRC #p5p and ended up writing a simple statistics script and starting the OPLINES branch.

The patch is working, but the test results look unrealistic. Uploaded to http://rurban.xarch.at/software/perl/oplines1.tar.gz

My 1st test script is poor and mostly fails (http://pasta.test-smoke.org/50), but I will fix it and run it over more files to get better stats.
My 2nd is better:


#! perl

=pod

=head1 NAME

oplines - ops per line + nextstate win stats for TRY_OPLINES patch

=head1 SYNOPSIS

    oplines.pl

=head1 DESCRIPTION

Theory:

Move cop_line from COP to BASEOP, and reduce the need for nextstate
ops, which will be an overall win in memory and speed for typical
non-dense code with fewer than 4 ops per line.

A cop has 5 ptrs more than a BASEOP, so the memory win will be as
follows:
    4 ops/line avg.
    90% nextstate COP win per lines

=> on 32bit: 4*4=16 byte per line. for 10k src => 200-160k=40k memory win.
      + 4k runtime win (need less nextstate cops)

      on 64bit you try.

The unknown factors:
    a) typical # of ops per line
    b) nextstate win:
        * typical # of nextstate cops per 1000-line file.
            minus # of really needed nextstate cops (lexcops) per 1000-line file.

2008-09-21 21:41:06 rurban

=cut

use Config;
use lib ".";

my ($sumfiles, $sumlines, $sumops, $sumnextstates, $sumlexstates);
my ($files, $lines, $ops, $nextstates, $lexstates);

# write the helper module from the __DATA__ section below
open PM, "> B_Stats.pm" or die "cannot write B_Stats.pm: $!";
while (<DATA>) { print PM $_; }
close PM;

for my $file (@ARGV) {
    $s = `$^X -c -MB_Stats $file`;
    my @s = split /\t/, $s;
    if (@s > 4 and $s[0] =~ /\d+/) {
            ($files, $lines, $ops, $nextstates, $lexstates) = @s;
            $sumfiles += $files;
            $sumlines += $lines;
            $sumops += $ops;
            $sumnextstates += $nextstates;
            $sumlexstates += $lexstates;
    }
}
print "files: $sumfiles\n";
print "lines: $sumlines\n";
print "ops: $sumops\n";
my $opsratio = $sumlines ? $sumops/$sumlines : 0;
my $neededcops = $sumfiles + $sumlexstates;
my $copratio = $neededcops ? $sumnextstates/$neededcops : 0; # avoid division by zero
print "ops/line: ",sprintf("%0.2f",$opsratio),"\n";
print "cops: ",sprintf("%0.2f%%",$copratio), " (lex+filecops=",
            $sumfiles+$sumlexstates," / nextstates=$sumnextstates)\n";
my $runtimewin = $sumnextstates - ($sumfiles+$sumlexstates);
my $opsize = 3*$Config{ptrsize}+4+$Config{intsize};
my $copsize = $opsize + 4*$Config{ptrsize} + 8;
my $memwin = ($Config{ptrsize} * $copsize * $runtimewin) # win the cops
    - ($sumops * $Config{intsize}); # minus the added line_t cop_line
print "memory win: $memwin byte (",
    ($Config{ptrsize} * $copsize * $runtimewin)," - ",($sumops * $Config{intsize}),")\n";
print "runtime win: $runtimewin ops ",sprintf("%0.2f%%",($sumops ? $runtimewin*100/$sumops : 0))," ($sumnextstates - ",$sumfiles+$sumlexstates,") \n";

__DATA__
use B::Utils qw(walkallops_simple);
use B qw(OPf_PARENS);
my ($files, $lines, $ops, $nextstates, $lexstates);

sub count_ops {
    my $op = shift;
    $ops++; # count also null ops
    if ($op->isa('B::COP')) {
        $nextstates++;
        $lexstates++ if ($$op and (($op->flags != 1)
                                   or $op->label));
    }
}

CHECK {
    ($files, $lines, $ops, $nextstates, $lexstates) = (0,0,0,0,0);
    ($oldfile, $oldlines) = ("",0);
    walkallops_simple(\&count_ops);
    $files = scalar keys %INC;
    for my $inc (values %INC) {
        # count the source lines of every loaded file
        open IN, "<", $inc or next;
        while (<IN>) { $lines++; }
        close IN;
    }
    print "$files\t$lines\t$ops\t$nextstates\t$lexstates\n";
}
1;

So how is the practice?

It looks like an average sample has 0.5-2 ops/line (though pod still needs to be skipped when counting lines), and about 100-200 times more pure line cops than really needed cops (block entry, new files). So the win seems to be dramatic (8% speed win for the op traversal). But I have to inspect the cops more closely now.

./oplines *.pl
files: 245
lines: 85756
ops: 59263
ops/line: 0.69
cops: 11.23% (lex+filecops=494 / nextstates=5549)
memory win: 652628 byte (889680 - 237052)
runtime win: 5055 ops 8.53% (5549 - 494)


$ ./oplines.pl $(find lib ext -name \*.pm)
files: 12051
lines: 5042732
ops: 6629648
ops/line: 1.31
cops: 39.15% (lex+filecops=15334 / nextstates=600266)
memory win: 76429440 byte (102948032 - 26518592)
runtime win: 584932 ops 8.82% (600266 - 15334)
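
The "memory win" lines of both runs can be rechecked by hand. A minimal sketch in shell, assuming a 32-bit build (ptrsize=4, intsize=4), which is what the printed numbers are consistent with. The function name memwin and its argument order are made up for this illustration; the formula is copied unchanged from the script.

```shell
#!/bin/sh
# Recompute the "memory win" line of oplines.pl from its raw counts.
# Assumption: ptrsize and intsize are passed in by hand instead of
# being read from Config.
# usage: memwin PTRSIZE INTSIZE OPS NEXTSTATES NEEDEDCOPS
memwin() (
    ptrsize=$1 intsize=$2 ops=$3 nextstates=$4 needed=$5
    runtimewin=$((nextstates - needed))        # cops that could be dropped
    opsize=$((3 * ptrsize + 4 + intsize))      # same formula as the script
    copsize=$((opsize + 4 * ptrsize + 8))
    saved=$((ptrsize * copsize * runtimewin))  # first term of $memwin
    added=$((ops * intsize))                   # extra line field in every op
    echo "$((saved - added)) byte ($saved - $added)"
)

memwin 4 4 59263 5549 494          # first run (*.pl)
memwin 4 4 6629648 600266 15334    # second run (lib and ext)
```

This prints "652628 byte (889680 - 237052)" and "76429440 byte (102948032 - 26518592)", the same as the two runs. Note that copsize already looks like a byte count, so the extra factor of ptrsize in the first term may overstate the saving; the sketch keeps it only so that the numbers match.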

IMPLEMENTATION

The parser emits lots of nextstate cops just to track line numbers. These cops can be omitted and the line stored in the current op (PL_op) instead of PL_curcop.
The parser can be simplified a lot for the PL_curcop cases.
Also, find_cops() is no longer needed in most cases where only the line number is wanted, but it is still needed to get the current filename.

Sunday September 14, 2008
10:35 AM

Some more parrot scripts - fix svn ps

A one-liner which fixes wrong svn properties


perl t/distro/file_metadata.t 2>&1 | \
    perl -ne'system (substr($_,3)) if /^#\s+svn ps /'
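
Before letting the one-liner run commands, it can help to look at what it would execute. A small dry-run sketch in shell: the function name svn_ps_dry_run and the sample diagnostic lines are invented for illustration, and the real output of file_metadata.t may be formatted differently.

```shell
#!/bin/sh
# Print the "svn ps ..." commands found in test diagnostics instead of
# running them. Reads the diagnostics on stdin.
svn_ps_dry_run() {
    # keep only lines of the form "#<whitespace>svn ps ..." and drop the prefix
    sed -n 's/^#[[:space:]]\{1,\}\(svn ps .*\)$/\1/p'
}

# made-up sample input
printf '%s\n' \
    '#   Failed test (hypothetical)' \
    '#  svn ps svn:eol-style native docs/example.pod' \
    'ok 2' |
    svn_ps_dry_run
```

For the sample input this prints only the line "svn ps svn:eol-style native docs/example.pod". Unlike substr($_,3), the sed pattern does not depend on the exact width of the comment prefix.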

05:50 AM

Some more parrot test scripts - remake

remake: (this version for trunk)

#!/bin/bash
# bash is needed for the ${args:0:3} substring expansion below
args=$*
if [ "${args:0:3}" = "all" ]; then
    # make all parrot_utils perl6.exe languages installable test codetest
    args="all parrot_utils perl6.exe languages installable ${args:3}"
fi
if test -f Makefile; then
    if grep -q reconfig Makefile; then
        make reconfig $args
    else
        make clean realclean && perl Configure.pl && make $args
    fi
else
    perl Configure.pl && make $args
fi

remake for cygwin070patches, where it is easier


#!/bin/sh
if test -f Makefile; then
    make reconfig $*
else
    perl Configure.pl && make $*
fi

05:46 AM

Some parrot test scripts - testvm

testvm:


#!/bin/bash
# Test a parrot branch on some of my remote machines (vm's or whatever)
# must be started in the root build_dir of the branch

declare -a vm_name
declare -a vm_dir

# which branch? trunk cygwin070patches gsoc_pdd09 exceptionmagic ...
base=$(basename `pwd`)

# define my various vm's by name and dir with the parrot tree
n=0
# freebsd7 with gcc-4.2 and llvm-2.3
vm_name[$n]=freebsd
# on freebsd I test only one branch: trunk or cygwin070patches or whatever
vm_dir[$n]=/usr/src/perl/parrot
let n+=1
# define llvm conf_args: cc and link
vm_name[$n]=freebsd
vm_dir[$n]=/usr/src/perl/parrot
vm_conf[$n]="--cc=llvm-gcc --link=llvm-ld"
let n+=1
# on debian 4 I test trunk and cygwin070patches
# gcc-4.1.2
vm_name[$n]=debian
vm_dir[$n]=/usr/src/perl/parrot/$base
let n+=1
#vm_name[$n]=gentoo-vm
#vm_dir[$n]=/usr/src/perl/parrot
#let n+=1
#vm_name[$n]=fedora
#vm_dir[$n]=/usr/src/perl/parrot
#let n+=1
#vm_name[$n]=ubuntu
#vm_dir[$n]=/usr/src/perl/parrot
#let n+=1
#vm_name[$n]=centos
#vm_dir[$n]=/usr/src/perl/parrot
#let n+=1
#vm_name[$n]=solaris
#vm_dir[$n]=/usr/src/perl/parrot
#let n+=1

if [ ! -f Configure.pl ]; then
        echo "$0 must be run in a parrot build_dir. Configure.pl not found"
        exit 1
fi
if [ -f Makefile ]; then
        make clean realclean
fi
# group the -name tests, otherwise -delete applies only to the last one
find . \( -name \*.exe -o -name \*.bak -o -name \*~ -o -name \*.stackdump \) -delete

n=0
while [ -n "${vm_name[${n}]}" ]
do
        if [ -z "${1}" -o "${1}" = "${vm_name[${n}]}" ]; then
                echo "rsync -avzC --delete --exclude=.svn . ${vm_name[${n}]}:${vm_dir[${n}]}/"
                rsync -avzC --delete --exclude=.svn . "${vm_name[${n}]}:${vm_dir[${n}]}/"

                echo "ssh ${vm_name[${n}]} cd ${vm_dir[${n}]}; perl Configure.pl ${vm_conf[${n}]} && make all parrot_utils perl6 installable languages smoke smolder_test languages-smoke"
                ssh ${vm_name[${n}]} "cd ${vm_dir[${n}]}; perl Configure.pl ${vm_conf[${n}]} && make all parrot_utils perl6 installable languages smoke smolder_test languages-smoke"
        fi

        let n+=1
done
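
One detail in the cleanup step of testvm is easy to get wrong: in find, an action such as -delete or -print binds more tightly than -o, so without parentheses it applies only to the last -name test. A small sketch that shows the difference with -print in a scratch directory (the file names are made up):

```shell
#!/bin/sh
# Show how find groups "-o" alternatives with and without parentheses.
dir=$(mktemp -d)
touch "$dir/a.exe" "$dir/b.bak" "$dir/keep.c"

# Ungrouped: parsed as  -name '*.exe'  -o  ( -name '*.bak' -a -print )
ungrouped=$(cd "$dir" && find . -name '*.exe' -o -name '*.bak' -print | sort)

# Grouped: the action applies to every alternative
grouped=$(cd "$dir" && find . \( -name '*.exe' -o -name '*.bak' \) -print | sort)

echo "ungrouped: $(echo $ungrouped)"
echo "grouped:   $(echo $grouped)"
rm -rf "$dir"
```

The ungrouped form lists only ./b.bak, while the grouped form lists ./a.exe and ./b.bak. The same grouping rule holds when -print is replaced by -delete.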

Sunday September 07, 2008
04:16 AM

More parrot languages - java

Working through the http://svn.perl.org/viewvc/parrot/branches/cygwin070patches/docs/pdds/draft/pdd30_install.pod?view=markup plan to make parrot and its languages installable (and to do a make with an already installed parrot), I fixed and tested all of the included languages. Looking deeper at dotnet, which converts a .NET .exe or .dll assembly to a parrot library (pir or pbc), I saw the similarities to java.
See http://www.jnthn.net/papers/2006-cam-net2pir-dissertation.pdf for Jonathan Worthington's paper describing it.

So I thought, why not try to rewrite (i.e. copy & paste + tags-query-replace) dotnet to jvm.

Both bytecodes look very similar; the .NET bytecode has a few extra specialities. Both can be converted from the stack-based vm to a register vm via a perl5 SRM compiler, which is currently used in dotnet and WMLScript.

Sun's Hotspot compiler source, which is available at http://openjdk.java.net/, shows that Sun took a similar path with the bytecode table description. In the perl5 bytecode compiler we have an opcode table with references to C and perl code for special ops and types (http://code.google.com/p/perl-compiler/source/browse/trunk/bytecode.pl).
In parrot we have a simple ini-style list of ops, with argument and return type descriptions in the target format (which is PIR) and a simple source template to expand the intermediate stack and temp. locations. http://svn.perl.org/viewvc/parrot/trunk/languages/dotnet/src/translation.rules?view=markup
With Hotspot Sun invented an adl format ("Architecture Description Language") to describe the ops. This also has a cost attribute for each op, which enables an optimizing compiler, whether static or JIT. See "hotspot\src\share\vm\adlc\Doc\Syntax.doc" and
"hotspot\src\cpu\i486\vm\i486.ad"

With a class2pbc (JVM to Parrot) converter we could use all the existing java libraries out there.
However, Jonathan's net2pbc dotnet converter currently works only for about 50% of the .NET assemblies.

Currently with jvm I am stuck at opcode "iinc" 0x84, which increments a local integer variable in the current thread-local frame within the stack-based vm ("increment an int lexical"). Our SRM "compiler" takes stack arguments and converts them to our registers.
However, I'm not sure how it deals with stack temporaries, the so-called stack frame variables. And besides those stack frame vars the jvm also uses temporary int variables heavily, which are usually stored in registers if possible.
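
To make the iinc problem concrete, here is a toy stack-to-register translation in shell. It is not the dotnet/SRM code, only a hypothetical model: the operand stack exists at translation time as a list of register names, locals map to fixed registers named localN, and the instruction spelling is simplified (real bytecode uses iload_1, iconst_2 and so on). The output is only PIR-like pseudocode.

```shell
#!/bin/sh
# Toy stack-to-register translation (hypothetical model, see text).
# Reads one instruction per line on stdin, prints register-style code.
translate() {
    set --            # translation-time operand stack, top of stack is $1
    tmp=0             # counter for fresh temporary registers
    while read -r op arg1 arg2; do
        case $op in
            iload)  # copy the local, so a later iinc cannot change the value
                    echo "\$I$tmp = local$arg1"
                    set -- "\$I$tmp" "$@"; tmp=$((tmp + 1)) ;;
            iconst) set -- "$arg1" "$@" ;;            # push a constant
            iadd)   right=$1 left=$2; shift 2         # pop two operands
                    echo "\$I$tmp = $left + $right"
                    set -- "\$I$tmp" "$@"; tmp=$((tmp + 1)) ;;
            istore) echo "local$arg1 = $1"; shift ;;  # pop into a local
            iinc)   # no stack traffic at all: update the local in place
                    echo "local$arg1 = local$arg1 + $arg2" ;;
        esac
    done
}

translate <<'EOF'
iload 1
iconst 2
iadd
istore 1
iinc 1 5
EOF
```

In this model iinc is the one instruction that neither pushes nor pops, so it turns into a single in-place add on the local's register. The copy emitted for iload is the price for that: without it, a value already on the stack would silently change when a later iinc touches the same local.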

Note that closures and class methods usually store their lexical vars on the C stack right above their code, so that a return to an uplevel function/method automatically cleans up the stack with its code and vars, which is much faster than the perl5 pad layout, where the lexical vars are kept in separate arrays.
It could be that the java vm keeps its stack frame lexicals on the so-called "C stack", as C and lisp do, or on the heap, as perl5 does with its PAD arrays.

Saturday August 23, 2008
11:56 AM

cygwin parrot-0.7.0-1 released

After several days of hacking and testing I could release the cygwin package for parrot-0.7.0 with a lot of patches:
See http://code.google.com/p/cygwin-rurban/source/browse/trunk/release/parrot/

Problems:
The new exception code only affected dotnet, which I fixed with #58176-dotnet-exceptions.patch

Most problems came from languages which were not yet included in my huge #56554-make-install-lang.patch. The work was to make the languages installable and add the install actions to their makefiles.
I have now finished this treatment for all languages; just a few are still misbehaving, and I gave up on m4, pipp, tcl, pheme and forth.

I described it at http://www.perlfoundation.org/parrot/index.cgi?parrot_installation

Interesting is

$ parrot-forth
Could not find non-existent sub _config
current instr.: ' init' pc 944 (forth.pir:11)
called from Sub 'main' pc 1033 (forth.pir:55)

which looks as if pbc_to_exe has already solved the _config bootstrap problem, but it still fails.

$ parrot-pipp.exe
Parrot VM: Can't stat /usr/src/perl/parrot/parrot-0.7.0-1/build/languages/pipp/src/common/pipplib.pbc, code 2.
Unable to append PBC to the current directory
current instr.: 'parrot;Pipp;__onload' pc 55 (src/common/pipp.pir:95)
called from Sub 'parrot;Pipp;pipp' pc -1 ((unknown file):-1)

$ parrot-pheme.exe
"load_bytecode" couldn't find file 'compilers/tge/TGE/Rule.pbc'
current instr.: 'parrot;TGE;__onload' pc 19 (TGE.pir:94)
called from Sub 'parrot;Pheme::AST::Grammar;__onload' pc 6901 (languages/pheme/lib/ASTGrammar.pir:5)
called from Sub 'parrot;Pheme::Compiler;main' pc -1 ((unknown file):-1)

Sunday August 17, 2008
01:37 PM

cygwin parrot-0.7.0-1 is near

I had to switch from a simple self-cooked build system to quilt (http://savannah.nongnu.org/projects/quilt) to manage my not yet applied parrot patches, because they were trampling over each other.
I don't want to switch to git yet.

So, quilt applied says:
#57476-pdb-version.patch
#57546-tags-xemacs.patch
39742-installed-conflict.patch
56544-install_files.patch
57006-opengl-cyg.patch -p0
58034-config_args.patch
56554-make-install-lang.patch
56998-cygdll_versioning.patch -p0

quilt series says additionally:
56996-fhs-runtime.patch
57548-CONDITIONED_LINE_enh.patch
51944-README_cygwin.patch

So the next release will have some of the old patches applied -
0.6.4 Locally applied patches:
          [perl #51944] [DOCS] Cygwin Readme
          [perl #56562] [PATCH] root.in: add cygwin importlib
          [perl #56544] [PATCH] install_files.pl
          [perl #56558] [PATCH] pdb rename to parrot_pdb
          [perl #56998] [TODO] rename cygwin dll to cygparrot.dll
          [perl #57006] [PATCH] add cygwin opengl config quirks
          [perl #57110] [PATCH] ncurses for cygwin
          [perl #57112] [PATCH] postgres for cygwin
          [perl #57114] [PATCH] urm RealBin issue
          [perl #57296] [TODO] make install -C languages

but also several new ones, which improve on the old way of making parrot build proper installables.
0.7.0 - Locally applied patches:
          [perl #39742] [BUG] installed conflict
          [perl #51944] [DOCS] Cygwin Readme
          [perl #56544] [PATCH] install_files.pl
          [perl #56998] [PATCH] rename cygwin dll to cygparrot$MAJOR_$MINOR_$PATCH.dll
          [perl #57006] [PATCH] add cygwin opengl config quirks
          [perl #56554] [TODO] make install -C languages
          [perl #58034] [TODO] config_args
          [perl #56996] [TODO] FHS runtime paths

These patches of mine are not stable enough:
56996-fhs-runtime.patch
    Still working on library.c getting the
    interpreter INTERPINFO_RUNTIME_PREFIX or CONFIG_HASH and check
    for the new "installed" key if present.
57548-CONDITIONED_LINE_enh.patch
    works fine, but too early. needs some feedback for this.

I'm also working on a draft/pdd30_install.pod.

chromatic said that I should send the contributor license agreement to the foundation, but the letter still needs a stamp.

My current patches are at http://code.google.com/p/cygwin-rurban/source/browse/#svn/trunk/release/parrot/patches, the commit at http://code.google.com/p/cygwin-rurban/source/detail?r=7

I have one blocking test: t/pmc/namespace_65.pir which also failed for others, see
http://rt.perl.org/rt3/Ticket/Display.html?id=57824
http://rt.perl.org/rt3/Ticket/Display.html?id=57668 and mine at
http://rt.perl.org/rt3/Ticket/Display.html?id=58040

Friday August 01, 2008
11:15 AM

cygwin ports and patches also at google code

My collection of official and unofficial cygwin package patches and scripts is now also at google code: http://code.google.com/p/cygwin-rurban/

Browse the cygwin Perl build and patches (a custom build-system):
http://code.google.com/p/cygwin-rurban/source/browse/#svn/trunk/release/perl

Browse the cygwin Parrot patches:
http://code.google.com/p/cygwin-rurban/source/browse/#svn/trunk/release/parrot

Monday July 28, 2008
04:49 PM

http://code.google.com/p/perl-compiler

B::C is now at http://code.google.com/p/perl-compiler

With full SVN history since I took over,
and the current issues in the tracker.

Sunday July 27, 2008
12:53 PM

parrot and perl6 on cygwin

The parrot and perl6 packages have been updated for cygwin.

Packaging was a major struggle because, from my limited point of view, there are still some major architectural hurdles to running a self-hosting rakudo perl6.exe and the languages.
The test suite works just fine from within the source directory, and so does working within the source directory.

It gets problematic when you try a make install, which is not yet supported. Now I know why. In parrot we have a global _config hash, just like in the perl5 module Config.pm.
But parrot is more self-contained, i.e. perl6.exe already contains this hash in a frozen state.

And there are even two different binaries: perl6.exe which only works inside the source dir, and installable_perl6.exe which accesses /usr/lib/parrot/include/... and not /runtime/parrot/include

The problem is that the runtime subdirs are mapped to /usr/lib/parrot, that not all required files are installed on make reallyinstall, and, worst of all,
that some installable_* exe files still try to access /usr/runtime/parrot/include/config.pir (the global hash which is already linked into the binary), using a non-FHS-compliant path (it should be /usr/lib/parrot/include/config.pir at least).

The build system is quite clever in linking to a separate install_config.o for those installables.
But some important functions like .include or load_bytecode still try to access /usr/lib/parrot/include/config.pir even if the hash is already loaded. And _config is only required to get the lib_path, to be able to traverse the dirs. A typical chicken-and-egg bootstrapping problem.

I wonder what the easiest way to fix this is.
1. Maybe I just missed some trick and it should work out of the box right now.
2. Check if a global _config hash exists, use it in load_bytecode and include, and avoid loading config.pir in _config() if so.

This has an API limitation:
a. _config() is a function, not a hash, and
b. _config has no side effects; it just returns a hash into a local register such as $P0, so you never know how to access the frozen hash in install_config.o.

If _config were a global hash, you could just check for it and avoid loading the file with the definition of _config. Since _config() is a global function, I see no major problem in changing the API from a global function to a global hash.
Maybe that's what interpinfo .INTERPINFO_RUNTIME_PREFIX is for. I haven't found the idea behind that yet.

Note: All this is just needed so that frozen states don't do unnecessary file accesses to files in wrong lib_paths (/usr/runtime/parrot). Once _config is initialized, the lib_path is correct and no wrong stats are done.
And runtime/parrot/include is also gone.
But this path is still hardcoded into some libs. BAD!

A basic module system, like require would also help. Then I would just say .require 'config'

The idea behind removing the formerly interpreter-global config hash was re-entrancy (as explained by particle on irc). At least it worked before.
The current idea is to freeze the sub _config() and not the hash. So when the frozen _config is linked, it is already available, and find_sub('_config') can be used to check its existence and avoid unnecessary attempts to find 'include/config.pir' in a non-existing lib_path.
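
The "look it up before loading the file" idea can be shown with a shell analogy. This is not PIR and not parrot's API: load_config is an invented name, the shell function _config merely stands in for the frozen sub, and command -v plays the role that find_sub('_config') would play.

```shell
#!/bin/sh
# Shell analogy only: skip loading the config file when a _config
# function is already available, e.g. because it was linked in.
load_config() {
    config_file=$1                       # hypothetical path argument
    if command -v _config >/dev/null 2>&1; then
        echo "using built-in _config"    # no file access needed
    elif [ -r "$config_file" ]; then
        . "$config_file"                 # file is expected to define _config
        echo "loaded $config_file"
    else
        echo "cannot find $config_file" >&2
        return 1
    fi
}

# Simulate the frozen case: _config exists before any file is read.
_config() { echo "lib_path=/usr/lib/parrot"; }
load_config /usr/runtime/parrot/include/config.pir
```

With _config predefined the wrong path is never touched. Without it, the same call fails on the missing file, which corresponds to the stat errors described above.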

Tickets: http://rt.perl.org/rt3/Ticket/Display.html?id=56996
http://rt.perl.org/rt3/Ticket/Display.html?id=57236

There are more major hurdles for installable parrot languages.
make install, for example, is also missing for the languages, as is a make test-installable to test against an installable_$lang.exe without accessing the build_dir, using only an already installed parrot.

http://rt.perl.org/rt3/Ticket/Display.html?id=56554 contains some info, but most info is in the cygwin parrot source package, the cygport file and the patches. The source is now online at http://code.google.com/p/cygwin-rurban/source/browse/#svn/trunk/release/parrot