I already wrote about this some time ago, I forgot where, probably on p5p, but got no responses.
Today I tried again on irc #p5p and ended up writing a simple statistics script and starting the OPLINES branch.
The patch works, but the test results look unreal. Uploaded to http://rurban.xarch.at/software/perl/oplines1.tar.gz
My 1st test script is poor and mostly fails (http://pasta.test-smoke.org/50), but I will fix it and run it over more files to get better stats.
My 2nd is better:
#! perl
=pod
=head1 NAME
oplines - ops per line + nextstate win stats for TRY_OPLINES patch
=head1 SYNOPSIS
oplines.pl
=head1 DESCRIPTION
Theory:
Move cop_line from COP to BASEOP, and reduce the need for nextstate
ops, which will be an overall win in memory and speed for typical
non-dense code with less than 4 ops per line.
A cop has 5 ptrs more than a BASEOP, so the memory win will be as
follows:
4 ops/line avg.
90% nextstate COP win per lines
=> on 32bit: 4*4=16 byte per line. for 10k src => 200-160k=40k memory win.
+ 4k runtime win (need less nextstate cops)
on 64bit you try.
The unknown factors:
a) typical # of ops per line
b) nextstate win:
* typical # of nextstate cops per 1000-line file.
minus # of really needed nextstate cops (lexcops) per 1000-line file.
2008-09-21 21:41:06 rurban
=cut
use Config;
use lib ".";
my ($sumfiles, $sumlines, $sumops, $sumnextstates, $sumlexstates);
my ($files, $lines, $ops, $nextstates, $lexstates);
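open PM, ">", "B_Stats.pm" or die "Cannot write B_Stats.pm: $!";
while (<DATA>) { print PM $_; }  # copy the __DATA__ section into the helper module
close PM;
for my $file (@ARGV) {
my $s = `$^X -c -MB_Stats $file`;
# reconstructed: expects the single tab-separated line printed by B_Stats
my @s = split /\t/, $s;
if (@s > 4 and $s[0] =~ /^\d+$/) {
($files, $lines, $ops, $nextstates, $lexstates) = @s;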
$sumfiles += $files;
$sumlines += $lines;
$sumops += $ops;
$sumnextstates += $nextstates;
$sumlexstates += $lexstates;
}
}
print "files: $sumfiles\n";
print "lines: $sumlines\n";
print "ops: $sumops\n";
my $opsratio = $sumlines ? $sumops/$sumlines : 0;
my $needed = $sumfiles+$sumlexstates; # cops that are really needed
my $copratio = $needed ? $sumnextstates/$needed : 0;
print "ops/line: ",sprintf("%0.2f",$opsratio),"\n";
print "cops: ",sprintf("%0.2f%%",$copratio), " (lex+filecops=",
$sumfiles+$sumlexstates," / nextstates=$sumnextstates)\n";
my $runtimewin = $sumnextstates - ($sumfiles+$sumlexstates);
my $opsize = 3*$Config{ptrsize}+4+$Config{intsize};
my $copsize = $opsize + 4*$Config{ptrsize} + 8;
my $memwin = ($Config{ptrsize} * $copsize * $runtimewin) # win the cops
- ($sumops * $Config{intsize}); # minus the added line_t cop_line
print "memory win: $memwin byte (",
($Config{ptrsize} * $copsize * $runtimewin)," - ",($sumops * $Config{intsize}),")\n";
print "runtime win: $runtimewin ops ",sprintf("%0.2f%%",($sumops ? $runtimewin*100/$sumops : 0))," ($sumnextstates - ",$sumfiles+$sumlexstates,") \n";
__DATA__
use B::Utils qw(walkallops_simple);
use B qw(OPf_PARENS);
my ($files, $lines, $ops, $nextstates, $lexstates);
sub count_ops {
my $op = shift;
$ops++; # count also null ops
if ($op->isa('B::COP')) {
$nextstates++;
$lexstates++ if ($$op and (($op->flags != 1)
or $op->label));
}
}
CHECK {
($files, $lines, $ops, $nextstates, $lexstates) = (0,0,0,0,0);
($oldfile, $oldlines) = ("",0);
walkallops_simple(\&count_ops);
$files = scalar keys %INC;
for (values %INC) {
open IN, "<", $_ or next; while (<IN>) { $lines++; }; close IN;
}
print "$files\t$lines\t$ops\t$nextstates\t$lexstates\n";
}
1;
So how does it look in practice?
It looks like an average sample has 0.5-2 ops/line (but pod still needs to be skipped), and many times more pure line cops than really needed cops (block entry, new files): a factor of about 11 and 39 in the two runs below. So the win seems to be dramatic (8% speed win for the op traversal). But I have to inspect the cops more closely now.
files: 245
lines: 85756
ops: 59263
ops/line: 0.69
cops: 11.23% (lex+filecops=494 / nextstates=5549)
memory win: 652628 byte (889680 - 237052)
runtime win: 5055 ops 8.53% (5549 - 494)
$
files: 12051
lines: 5042732
ops: 6629648
ops/line: 1.31
cops: 39.15% (lex+filecops=15334 / nextstates=600266)
memory win: 76429440 byte (102948032 - 26518592)
runtime win: 584932 ops 8.82% (600266 - 15334)
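The memory and runtime figures above can be re-derived by hand. Here is a minimal sketch in shell, assuming a 32-bit perl (ptrsize 4, intsize 4), which is what the printed numbers imply. The function names are made up for illustration. It applies the formula exactly as the script has it, including the extra ptrsize factor, although $copsize is already a byte count, so the memory win may be overstated.

```shell
#!/bin/sh
# Re-derive the "memory win" and "runtime win" lines of oplines.pl.
# Function names are illustrative; the formulas are copied from the script.

# size of a BASEOP as the script estimates it: 3 pointers + 4 + one int
opsize()  { echo $(( 3 * $1 + 4 + $2 )); }            # args: ptrsize intsize
# size of a COP as the script estimates it: BASEOP + 4 pointers + 8
copsize() { echo $(( $(opsize "$1" "$2") + 4 * $1 + 8 )); }

# memwin ptrsize intsize runtimewin sumops
memwin() {
    cops=$(( $1 * $(copsize "$1" "$2") * $3 ))   # saved cops, as the script computes it
    lines=$(( $4 * $2 ))                         # added line_t field in every op
    echo $(( cops - lines ))
}

# pct part total: percentage with two decimals
pct() { awk -v p="$1" -v t="$2" 'BEGIN { printf "%.2f\n", p * 100 / t }'; }

memwin 4 4 5055 59263        # first run
memwin 4 4 584932 6629648    # second run
pct 5055 59263               # runtime win of the first run
```

This prints 652628, 76429440 and 8.53, matching the two "memory win" lines and the first "runtime win" line.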
IMPLEMENTATION
The parser emits lots of nextstate cops just to track line numbers. These cops can be omitted and the line stored in the current op, PL_op, instead of PL_curcop.
The parser can be simplified a lot for the PL_curcop cases.
Also find_cops() is no longer needed in most cases where just the line number is wanted, but it is still needed to get the current filename.
A one-liner which fixes wrong svn properties
perl t/distro/file_metadata.t 2| \
perl -ne'system (substr($_,3)) if
remake: (this version for trunk)
#!/bin/bash
# bash is required for the ${args:0:3} substring expansion
args=$*
if [ "${args:0:3}" = "all" ]; then
# make all parrot_utils perl6.exe languages installable test codetest
args="all parrot_utils perl6.exe languages installable ${args:3}"
fi
if test -f Makefile; then
if grep -q reconfig Makefile; then
make reconfig $args
else
make clean realclean && perl Configure.pl && make $args
fi
else
perl Configure.pl && make $args
fi
remake for cygwin070patches, where it is easier
#!/bin/sh
if test -f Makefile; then
make reconfig $*
else
perl Configure.pl && make $*
fi
testvm:
#!/bin/bash
# Test a parrot branch on some of my remote machines (vm's or whatever)
# must be started in the root build_dir of the branch
declare -a vm_name
declare -a vm_dir
declare -a vm_conf
# which branch? trunk cygwin070patches gsoc_pdd09 exceptionmagic
base=$(basename `pwd`)
# define my various vm's by name and dir with the parrot tree
n=0
# freebsd7 with gcc-4.2 and llvm-2.3
vm_name[$n]=freebsd
# on freebsd I test only one branch: trunk or cygwin070patches or whatever
vm_dir[$n]=/usr/src/perl/parrot
let n+=1
# define llvm conf_args: cc and link
vm_name[$n]=freebsd
vm_dir[$n]=/usr/src/perl/parrot
vm_conf[$n]="--cc=llvm-gcc --link=llvm-ld"
let n+=1
# on debian 4 I test trunk and cygwin070patches
# gcc-4.1.2
vm_name[$n]=debian
vm_dir[$n]=/usr/src/perl/parrot/$base
let n+=1
#vm_name[$n]=gentoo-vm
#vm_dir[$n]=/usr/src/perl/parrot
#let n+=1
#vm_name[$n]=fedora
#vm_dir[$n]=/usr/src/perl/parrot
#let n+=1
#vm_name[$n]=ubuntu
#vm_dir[$n]=/usr/src/perl/parrot
#let n+=1
#vm_name[$n]=centos
#vm_dir[$n]=/usr/src/perl/parrot
#let n+=1
#vm_name[$n]=solaris
#vm_dir[$n]=/usr/src/perl/parrot
#let n+=1
if [ ! -f Configure.pl ]; then
echo "$0 must be run in a parrot build_dir. Configure.pl not found"
exit 1
fi
if [ -f Makefile ]; then
make clean realclean
fi
# group the -name tests so that -delete applies to all of them, not only the last
find . \( -name \*.exe -o -name \*.bak -o -name \*~ -o -name \*.stackdump \) -delete
n=0
while [ -n "${vm_name[${n}]}" ]
do
if [ -z "${1}" -o "${1}" = "${vm_name[${n}]}" ]; then
echo "rsync -avzC --delete --exclude=.svn . ${vm_name[${n}]}:${vm_dir[${n}]}/"
rsync -avzC --delete --exclude=.svn . "${vm_name[${n}]}:${vm_dir[${n}]}/"
echo "ssh ${vm_name[${n}]} cd ${vm_dir[${n}]}; perl Configure.pl ${vm_conf[${n}]} && make all parrot_utils perl6 installable languages smoke smolder_test languages-smoke"
ssh "${vm_name[${n}]}" "cd ${vm_dir[${n}]}; perl Configure.pl ${vm_conf[${n}]} && make all parrot_utils perl6 installable languages smoke smolder_test languages-smoke"
fi
let n+=1
done
Working over the http://svn.perl.org/viewvc/parrot/branches/cygwin070patches/docs/pdds/draft/pdd
See http://www.jnthn.net/papers/2006-cam-net2pir-dissertation.pdf for Jonathan Worthington's paper describing it.
So I thought, why not try to rewrite (i.e. copy & paste + tags-query-replace) dotnet to jvm.
Both bytecodes look very similar, the
Sun's Hotspot compiler source, which is available at http://openjdk.java.net/, shows that Sun took a similar path with the bytecode table description. In the perl5 bytecode compiler we have an opcode table with references to C and Perl code for special ops and types (http://code.google.com/p/perl-compiler/source/browse/trunk/bytecode.pl).
In parrot we have a simple ini-style list of ops, with arguments and return type description in the target format (which is PIR) and some simple source template to expand the intermediate stack and temp. locations. http://svn.perl.org/viewvc/parrot/trunk/languages/dotnet/src/translation.rules?
With Hotspot Sun invented an ADL format ("Architecture Description Language") to describe the ops. This also has a cost attribute for each op, which enables an optimizing compiler, whether static or JIT. See "hotspot\src\share\vm\adlc\Doc\Syntax.doc" and
"hotspot\src\cpu\i486\vm\i486.ad"
With a class2pbc (JVM to Parrot) converter we could use all the existing java libraries out there.
However, Jonathan's net2pbc dotnet converter currently works only for about 50% of the
Currently with the jvm I am stuck at opcode "iinc" 0x84, which increments a local integer variable in the current thread-local frame of the stack-based vm ("increment an int lexical"). Our SRM "compiler" takes stack arguments and converts them to our registers.
However I'm not sure how it deals with stack temporaries, the so-called stack frame variables. And besides those stack frame vars the jvm also uses temporary int variables heavily, which are usually stored in registers if possible.
Note that closures and class methods usually store their lexical vars on the C stack right above their code, so that a return to an uplevel function/method automatically cleans up the stack with its code and vars, which is much faster than the perl5 pad layout, where the lexical vars are kept in separate arrays.
It could be that the java vm keeps its stack frame lexicals on the so-called "C stack", as C and lisp do, or on the heap, as perl5 does with its PAD arrays.
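To make the stack-to-register idea concrete, here is a toy sketch in shell. It is not how the dotnet translator works internally, and it is not a jvm translator. The op names are simplified JVM mnemonics, the register names S<n> (operand stack slot) and L<n> (local variable) are invented, and the output syntax is only pseudo-code. It shows why iinc is the odd one out: it changes a local directly and never goes through the operand stack.

```shell
#!/bin/sh
# Hypothetical sketch of stack-to-register mapping for a few JVM-like ops.
# Operand stack slot d becomes register S<d>, local variable n becomes L<n>.
# Reads one op per line from stdin and prints register-style pseudo-code.
srm() {
    depth=0    # current operand stack depth, known at translation time
    while read -r op arg1 arg2; do
        case "$op" in
            iconst) echo "S$depth = $arg1";  depth=$((depth + 1)) ;;  # push constant
            iload)  echo "S$depth = L$arg1"; depth=$((depth + 1)) ;;  # push local
            istore) depth=$((depth - 1)); echo "L$arg1 = S$depth" ;;  # pop into local
            iadd)   depth=$((depth - 1))                              # pop 2, push sum
                    echo "S$((depth - 1)) = S$((depth - 1)) + S$depth" ;;
            iinc)   echo "L$arg1 = L$arg1 + $arg2" ;;                 # no stack traffic
            *)      echo "unknown op: $op" >&2; return 1 ;;
        esac
    done
}

printf 'iload 0\niconst 5\niadd\nistore 1\niinc 1 2\n' | srm
```

For this input the sketch prints "S0 = L0", "S1 = 5", "S0 = S0 + S1", "L1 = S0" and "L1 = L1 + 2". The stack depth is a plain counter here, which only works for straight-line code. Branches and stack temporaries that live across them are exactly the part I am unsure about.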
After several days of hacking and testing I could release the cygwin package for parrot-0.7.0 with a lot of patches:
See http://code.google.com/p/cygwin-rurban/source/browse/trunk/release/parrot/
Problems:
The new exception code only affected dotnet, which I fixed with #58176-dotnet-exceptions.patch
Most problems came from languages which were not included in my huge #56554-make-install-lang.patch yet. I had to make these languages installable and add the actions to their makefiles.
I have finished this treatment for all langs now; just a few are still misbehaving, and I gave up on m4, pipp, tcl, pheme and forth.
I described it at http://www.perlfoundation.org/parrot/index.cgi?parrot_installation
Interesting is
$ parrot-forth
Could not find non-existent sub _config
current instr.: ' init' pc 944 (forth.pir:11)
called from Sub 'main' pc 1033 (forth.pir:55)
which looks as if pbc_to_exe has already addressed the _config bootstrap problem, but it still fails.
$ parrot-pipp.exe
Parrot VM: Can't stat
rc/common/pipplib.pbc, code 2.
Unable to append PBC to the current directory
current instr.: 'parrot;Pipp;__onload' pc 55 (src/common/pipp.pir:95)
called from Sub 'parrot;Pipp;pipp' pc -1 ((unknown file):-1)
$ parrot-pheme.exe
"load_bytecode" couldn't find file 'compilers/tge/TGE/Rule.pbc'
current instr.: 'parrot;TGE;__onload' pc 19 (TGE.pir:94) called from Sub 'parrot;Pheme::AST::Grammar;__onload' pc 6901 (languages/pheme/lib/ASTGrammar.pir:5)
called from Sub 'parrot;Pheme::Compiler;main' pc -1 ((unknown file):-1)
I had to switch from a simple home-grown build system to quilt (http://savannah.nongnu.org/projects/quilt) to manage my not yet applied parrot patches, because they were trampling over each other.
I don't want to switch to git yet.
So, quilt applied says:
#57476-pdb-version.patch
#57546-tags-xemacs.patch
39742-installed-conflict.patch
56544-install_files.patch
57006-opengl-cyg.patch -p0
58034-config_args.patch
56554-make-install-lang.patch
56998-cygdll_versioning.patch -p0
quilt series says additionally:
56996-fhs-runtime.patch
57548-CONDITIONED_LINE_enh.patch
51944-README_cygwin.patch
So the next release will have some of the old patches applied -
0.6.4 Locally applied patches:
[perl #51944] [DOCS] Cygwin Readme
[perl #56562] [PATCH] root.in: add cygwin importlib
[perl #56544] [PATCH] install_files.pl
[perl #56558] [PATCH] pdb rename to parrot_pdb
[perl #56998] [TODO] rename cygwin dll to cygparrot.dll
[perl #57006] [PATCH] add cygwin opengl config quirks
[perl #57110] [PATCH] ncurses for cygwin
[perl #57112] [PATCH] postgres for cygwin
[perl #57114] [PATCH] urm RealBin issue
[perl #57296] [TODO] make install -C languages
but also several new ones, which enhance the old way of making parrot build proper installables.
0.7.0 - Locally applied patches:
[perl #39742] [BUG] installed conflict
[perl #51944] [DOCS] Cygwin Readme
[perl #56544] [PATCH] install_files.pl
[perl #56998] [PATCH] rename cygwin dll to cygparrot$MAJOR_$MINOR_$PATCH.dll
[perl #57006] [PATCH] add cygwin opengl config quirks
[perl #56554] [TODO] make install -C languages
[perl #58034] [TODO] config_args
[perl #56996] [TODO] FHS runtime paths
These patches of mine are not stable enough:
56996-fhs-runtime.patch
Still working on library.c getting the
interpreter INTERPINFO_RUNTIME_PREFIX or CONFIG_HASH and check
for the new "installed" key if present.
57548-CONDITIONED_LINE_enh.patch
works fine, but it is too early; it needs some feedback first.
I'm also working on a draft/pdd30_install.pod.
chromatic said that I should send the contributor license agreement to the foundation, but this letter still needs a stamp.
My current patches are at http://code.google.com/p/cygwin-rurban/source/browse/#svn/trunk/release/parrot/
I have one blocking test: t/pmc/namespace_65.pir, which also failed for others, see
http://rt.perl.org/rt3/Ticket/Display.html?id=57824
http://rt.perl.org/rt3/Ticket/Display.html?id=57668 and mine at
http://rt.perl.org/rt3/Ticket/Display.html?id=58040
My collection of official and unofficial cygwin package patches and scripts is now also at google code: http://code.google.com/p/cygwin-rurban/
Browse the cygwin Perl build and patches (a custom build-system):
http://code.google.com/p/cygwin-rurban/source/browse/#svn/trunk/release/perl
Browse the cygwin Parrot patches:
http://code.google.com/p/cygwin-rurban/source/browse/#svn/trunk/release/parrot
B::C is now at http://code.google.com/p/perl-compiler
With full SVN history since I took over,
and the current issues in the tracker.
The parrot and perl6 packages have been updated for cygwin.
Packaging was a major struggle because, from my limited point of view, there are still some major architectural hurdles to running a self-hosting rakudo perl6.exe and the languages.
The test suite works just fine from within the source directory, and so does working within the source directory.
It gets problematic when you try a make install, which is not yet supported. Now I know why. In parrot we have a global _config hash, just like in the perl5 module Config.pm.
But parrot is more self-contained, i.e. perl6.exe already contains this hash in a frozen state.
And there are even two different binaries: perl6.exe which only works inside the source dir, and installable_perl6.exe which accesses
The problem is that the runtime subdirs are mapped to
that some installable_* exe files still try to access
The build system is quite clever in linking to a separate install_config.o for those installables.
But some important functions like
I wonder how to fix this the easiest way.
1. Maybe I just missed some trick and it should work ootb right now.
2. Check if a global _config hash exists and use it in load_bytecode and include, and avoid loading config.pir in _config() if so.
This has an API limitation:
a. _config() is a function, not a hash, and
b. _config has no side effects, it just returns a hash into a local $P0, for example, so you never know how to access the frozen hash in install_config.o.
If _config were a global hash, you could just check for it and avoid loading the file with the definition of _config. Since _config() is a global function, I see no major problem in changing the API from a global function to a global hash.
Maybe that's what interpinfo
Note: All this is just needed so that frozen states don't do unnecessary file accesses to files in wrong lib_paths (/usr/runtime/parrot). Once _config is initialized, the lib_path is correct and no wrong stats are done.
And runtime/parrot/include is also gone.
But this patch is still hardcoded into some libs. BAD!
A basic module system, like require would also help. Then I would just say
The idea behind removing the formerly interpreter-global config hash was re-entrancy (as explained by particle on irc). At least it worked before.
The current idea is to freeze the sub _config() and not the hash. So when the frozen _config is linked it is already available, and find_sub('_config') can be used to check its existence and avoid unnecessary attempts to find 'include/config.pir' in a non-existing lib_path.
Tickets: http://rt.perl.org/rt3/Ticket/Display.html?id=56996
http://rt.perl.org/rt3/Ticket/Display.html?id=57236
There are more major hurdles for installable parrot languages.
For example, make install is also missing for the languages, as is a make test-installable to test against an installable_$lang.exe without accessing the build_dir, only accessing an already installed parrot.
http://rt.perl.org/rt3/Ticket/Display.html?id=56554 contains some info, but most info is in the cygwin parrot source package, the cygport file and the patches. The source is now online at http://code.google.com/p/cygwin-rurban/source/browse/#svn/trunk/release/parrot