
ethan (3163)

  tassilo.von.pars ... AMrwth-aachen.de

A 25-year-old chap living in the westernmost town of Germany. Studying communication and information science and a huge fan of XS-related things.

Journal of ethan (3163)

Thursday September 29, 2005
12:56 AM

CPAN detergent needed

There were times when a pathetic addition to the CPAN was merely a minor annoyance, easy to tolerate as such a module did not interfere with one's own work.

Nowadays however, one rotten CPAN egg is capable of making tests of your own modules fail, as can be seen here and here.

Mind you, the offending module is neither used nor mentioned anywhere in the modules whose tests it can make fail. The failures happen because this pathetic piece of shit doesn't adhere to any CPAN packaging standards, so its files and directories somehow get merged with those of other modules on tarball extraction.

And now I am waiting for those who still claim that CPAN's policy should be as liberal as possible when it comes to uploads. No, it shouldn't. Instead, uploads that breach CPAN conventions too obviously should be rejected. Furthermore, those that were uploaded in the past ought to be deleted immediately.

I suppose it's not going to happen. That means that my modules will continue to fail their tests on that particular machine until its admin finally removes the offending files by hand.
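The convention most likely at stake here is that a distribution tarball unpacks into a single top-level directory (usually Dist-Version/), so that nothing spills into whatever directory it is extracted in. A minimal sketch of such a check follows; the helper name single_top_dir and the sample paths are made up for illustration, and a real check would read the member list from the archive itself.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every archive member lives under
# one and the same top-level directory.
sub single_top_dir {
    my @paths = @_;
    my %top;
    for my $path (@paths) {
        # A member without any directory component would be written
        # straight into the extraction directory.
        my ($first) = $path =~ m{^([^/]+)/} or return 0;
        $top{$first}++;
    }
    return keys(%top) == 1 ? 1 : 0;
}

# Illustrative member lists, not taken from any real distribution.
my @good = ('Foo-Bar-0.01/Makefile.PL', 'Foo-Bar-0.01/lib/Foo/Bar.pm');
my @bad  = ('lib/Foo/Bar.pm', 't/basic.t', 'Makefile.PL');

print "good: ", single_top_dir(@good), "\n";
print "bad:  ", single_top_dir(@bad),  "\n";
```

A tarball failing this check drops its lib/ and t/ directories next to, or on top of, whatever else was unpacked in the same place.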

Monday May 16, 2005
03:34 AM

Test::LongString

For a long time I didn't quite see the purpose of Test::LongString.

Until I had to compare two binary strings of around 170K with each other and C garbled my terminal when they turned out to be unequal. Rafael++.

Monday April 18, 2005
02:55 AM

toke.c

I never quite understood why Perl offered no hooks into its lexer and parser. They're contained in the interpreter, the very same program that runs my Perl scripts.

So I snuck a peek at the dreaded toke.c. My initial thought was that it was merely a matter of calling yylex() after initializing a few of the global PL_* variables appropriately. On closer inspection, however, there turned out to be exactly 99 of these global variables involved in the lexing process, including those dealing with the various perl stacks, control OPs and symbol tables.

So what I did was create a C++ class with 99 member variables. Each function in toke.c became a method that no longer works on PL_variable but on this->pl_variable instead. Some non-lexer related functions had to be modified in the same way, such as Perl_init_stacks() and a handful of those Perl_save_*() functions in scope.c. The whole purpose of that was to make the lexer re-entrant.

With these adjustments (and a few hundred #undefs/#defines), the actual XS code is very tiny:

MODULE = Perl::Lexer        PACKAGE = Perl::Lexer
 
Lexer *
Lexer::new ()
    CODE:
    {
        RETVAL = new Lexer();
        RETVAL->Pinit_stacks(aTHX);
    }
    OUTPUT:
        RETVAL
    CLEANUP:
        RETVAL->ME = newSVsv(ST(0));
 
void
Lexer::set_string (SV *line)
    CODE:
    {
        THIS->lex_start(aTHX_ line);
    }
 
void
Lexer::next_token ()
    CODE:
    {
        int tok = THIS->yylex(aTHX);
 
        /* skip empty lines: dereference bufptr to look at the current character */
        if (tok && THIS->bufptr)
            while (*THIS->bufptr == '\n') THIS->bufptr++;
 
        if (tok == 0)
            XSRETURN_EMPTY;
 
        EXTEND(SP, 2);
        ST(0) = sv_2mortal(newSViv(tok));
        ST(1) = sv_2mortal(newSVpv(TOKENNAME(tok), 0));
        XSRETURN(2);
    }
 
void
Lexer::DESTROY ()

And a sample script along with its output looks like this:

use blib;
use Perl::Lexer;
 
my $string = <<'EOS';
$a{1} = 1;
print keys %a;
EOS
 
my $lexer = Perl::Lexer->new;
$lexer->set_string($string);
while (my $l = $lexer->next_token) {
    print $l, " ";
}
print "\n";
 
__END__
$ WORD { THING ; } ASSIGNOP THING ; LSTOP UNIOP % WORD ; ;

A couple of problems still exist: Once the lexer sees a comment, an empty line or a shebang line, it seems to gobble up all characters up to the end of the string and thus finishes scanning. The shebang-line stuff is done in S_find_beginning() in perl.c before parsing even starts. As for empty lines, I suppose they are handled by perl's parser and not its lexer.

The last thing that needs to be done is making the actual attributes belonging to a token available. Ideally, this is just a matter of exposing yylval to the outside world.

Thursday March 31, 2005
06:51 AM

Kwalitee

Lately I poked around a bit at CPANTS. I should maybe say that I don't really believe in this kwalitee thing and am mostly d'accord with what Schwern wrote about it on cpanratings.

Nonetheless I couldn't resist looking up the scores of my modules. They weren't quite as high as possible because so far I haven't done any pod-coverage checks in my tests. This can be easily rectified. Then I noticed an annoying thing about Test::Pod and Test::Pod::Coverage: they claim to rely on Test::More. This is bad for some of my modules which are supposed to run on older perls, too, so I only use the plain Test module for their tests. Yet, CPANTS somehow tickled my vanity so I came up with test suites that use Test::Pod and Test::Pod::Coverage respectively without any need for Test::More:

eval "use Test::Pod";
if ($@) {
    print "1..0 # Skip Test::Pod not installed\n";
    exit;
}
 
my @PODS = qw#../blib#;
 
all_pod_files_ok( all_pod_files(@PODS) );

and in a similar fashion for Test::Pod::Coverage.

Of course, this exposes my modules to other potential problems, such as when the Test::Harness protocol changes, and thus makes them more flaky. However, the kwalitee score is now higher, which already says something about the value of the kwalitee measurement.

Also, I think this test from CPANTS:

is_prereq
    Shortcoming: This distribution is only required by 2 or less other distributions.
    Defined in: Module::CPANTS::Generator::Prereq

is plain wrong and worthless. Why should it be a good thing that a module is used as a prerequisite by other modules? There's a whole class of modules on the CPAN that provide very high-level functionality and will therefore never be a prerequisite (think of modules such as Mail::Box or MPEG::MP3Play). This applies to all modules that are used to write applications instead of other modules.

Tuesday March 15, 2005
03:24 AM

Increasing turn-around times

What is it with g++?

I am right now wrapping the fairly huge SndObj library written in C++ into an XS module. I'm done with maybe a third of all the classes and compilation is already painful:

ethan@ethan:~/Projects/dists/sound-object/Sound-Object$ time make
/usr/bin/perl /usr/share/perl/5.8/ExtUtils/xsubpp  -C++ -typemap /usr/share/perl/5.8/ExtUtils/typemap -typemap typemap  Object.xs > Object.xsc && mv Object.xsc Object.c
g++ -c  -I. -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2   -DVERSION=\"0.01\" -DXS_VERSION=\"0.01\" -fPIC "-I/usr/lib/perl/5.8/CORE"  -DOSS Object.c
rm -f blib/arch/auto/Sound/Object/Object.so
LD_RUN_PATH="" g++  -shared -L/usr/local/lib Object.o  -o blib/arch/auto/Sound/Object/Object.so   -lsndobj -lpthread
chmod 755 blib/arch/auto/Sound/Object/Object.so
 
real    0m38.023s
user    0m31.130s
sys     0m0.570s

One additional method added to the XS adds roughly one second of compilation time. Once in a while (mostly after rearranging the order of the packages in the XS file), compilation may take up to one minute. I am quite aware that what really slows down C++ compilation is the extensive use of templates. Now, SndObj uses next to no templates (and that means no string objects either). The XS stuff uses only one template worth mentioning, a map of maps for the ref-counting of the Perl objects when a C++ object is constructed on behalf of another one.

Also, this slowness seems to be most significant with g++. Microsoft's C++ compiler is faster by quite a few orders of magnitude: It hardly ever needs more than 5 seconds for a far more complex WindowsForms .NET application which uses templates all over the place. This might be due to the use of precompiled headers.

Thursday March 03, 2005
01:56 AM

TIOCLINUX and gpm

For ages I've been looking for a way to make gpm's selection on the console available under X and vice versa. Yesterday I decided that I should take steps to rectify this. Looking at gpm's sources, it all looked straightforward. With the TIOCLINUX ioctl one can put the current selection into the kernel's selection buffer or make the kernel paste it to a given file-descriptor.

At this point however the idiocy begins: There is no way to get a copy of the kernel's selection buffer. All you can do with it is have the kernel write it to a file-descriptor which has to be attached to a device capable of TIOCLINUX. This is not very helpful. I want to get the actual content of this buffer and pass it to an application such as xclip. But there is no infrastructure in the kernel for that. Even worse, after googling around a little I noticed that a few years ago someone actually provided a patch to console.c which would allow that. For some reason the kernel nitwits rejected it.

Fortunately my kernel is out-of-date anyway so I have to compile a new one. This time I'll make the necessary adjustments to the source, extend gpm a bit to pass the selection to xclip and then hopefully have a more useful mouse.

Tuesday December 14, 2004
03:11 AM

Benchmarking perls

Partly due to the fact that I didn't have anything interesting to do, I wrote a little set of modules to benchmark perls against each other. There already is perlbench on the CPAN but I found the way tests had to be written inconvenient.

According to the results I get, real-world programs seem to get faster on recent perls. Some other things on the other hand are much slower, most notably regexes (almost by a factor of 2 when comparing 5.5.4 with a threaded 5.8.6). The results:

Benchmarks     | perl5.5.4 | perl5.6.2 | perl | perl5.8.6 | perl5.8.6th | Weight
---------------+-----------+-----------+------+-----------+-------------+-------
bench/loops    |   1000    |    906    |  814 |    731    |     782     |     70
bench/regex    |   1000    |   1258    | 1978 |   1457    |    1977     |    116
bench/recurse  |   1000    |    961    |  983 |    956    |     991     |     73
bench/wave     |   1000    |    860    |  994 |    894    |     925     |    236
bench/autoload |   1000    |    968    | 1095 |   1078    |    1159     |    139
bench/substr   |   1000    |    947    |  984 |    986    |     957     |    164
bench/mail     |   1000    |    698    |  903 |    748    |     959     |    198
---------------+-----------+-----------+------+-----------+-------------+-------
Overall        |   1000    |    910    | 1085 |    960    |    1082     |   1000

The fourth column is the ordinary Debian testing-perl (perl5.8.4 with threads). The benchmarks can be found here. Some tests I consider almost irrelevant, namely autoload which is an OO version of the Fibonacci-number generator where each recursive instance is autoloaded. Also, loops is not interesting because empty loops are unlikely to show up that often. Real-world programs are wave which decreases the volume of an 18meg WAV file, mail which walks through a 28meg mailbox and substr which calculates the length of the longest common substring of perldoc -tT perlfunc. It's 647 by the way, would you have guessed?
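To give an idea of what an autoloaded Fibonacci generator might look like, here is a rough sketch. It is only a guess at the shape of such a benchmark, not the actual bench/autoload script: the package name Fib and the fib_N method naming are invented for this illustration. No fib_* method is ever defined, so every recursive call is resolved through AUTOLOAD.

```perl
use strict;
use warnings;

package Fib;    # hypothetical package name

our $AUTOLOAD;

sub new { return bless {}, shift }

# Every fib_N call ends up here because no such method exists.
sub AUTOLOAD {
    my $self = shift;
    # Anything that is not fib_N (such as DESTROY) is silently ignored.
    my ($n) = $AUTOLOAD =~ /::fib_(\d+)\z/ or return;
    return $n if $n < 2;
    my ($m1, $m2) = ('fib_' . ($n - 1), 'fib_' . ($n - 2));
    return $self->$m1() + $self->$m2();
}

package main;

print Fib->new->fib_10, "\n";
```

Method dispatch through AUTOLOAD is considerably more expensive than an ordinary method call, which is presumably all such a test would measure.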

The rightmost column is the relative amount of time this test took on the leftmost perl. This number is used for calculating the weighted mean in the bottom row. In case the Weight column doesn't add up to 1000, that's due to some unclever rounding in my code.
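The weighted mean can be recomputed from the table as a quick sanity check. The sketch below assumes the plain formula sum(score * weight) / sum(weight) and uses the rounded figures as printed, so it is not the bookkeeping code itself; the variable and function names are made up.

```perl
use strict;
use warnings;

# Column order as in the table.
my @perls = qw(perl5.5.4 perl5.6.2 perl perl5.8.6 perl5.8.6th);

# Scores and (rounded) weights copied from the table.
my %rows = (
    loops    => { scores => [1000,  906,  814,  731,  782], weight =>  70 },
    regex    => { scores => [1000, 1258, 1978, 1457, 1977], weight => 116 },
    recurse  => { scores => [1000,  961,  983,  956,  991], weight =>  73 },
    wave     => { scores => [1000,  860,  994,  894,  925], weight => 236 },
    autoload => { scores => [1000,  968, 1095, 1078, 1159], weight => 139 },
    substr   => { scores => [1000,  947,  984,  986,  957], weight => 164 },
    mail     => { scores => [1000,  698,  903,  748,  959], weight => 198 },
);

# Weighted mean of one column: sum(score * weight) / sum(weight).
sub weighted_mean {
    my ($col) = @_;
    my ($sum, $wsum) = (0, 0);
    for my $row (values %rows) {
        $sum  += $row->{scores}[$col] * $row->{weight};
        $wsum += $row->{weight};
    }
    return $sum / $wsum;
}

my %overall;
for my $i (0 .. $#perls) {
    $overall{ $perls[$i] } = weighted_mean($i);
    printf "%-12s %7.1f\n", $perls[$i], $overall{ $perls[$i] };
}
```

Because the rounded weights only sum to 996, the recomputed values land a few points away from the Overall row rather than matching it exactly.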

Finally, the module which does the timing and that is included from each benchmark script is this:

package Perl::Benchmark::Lib;
 
use strict;
use Time::HiRes;
 
use base qw/Exporter/;
use vars qw/$VERSION @EXPORT/;
$VERSION = '0.01';
 
$SIG{__WARN__} = sub {};
$SIG{__DIE__} = sub {};
 
@EXPORT = qw/bench_adjust/;
 
my $corrective = 0;
 
bench_adjust();
 
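sub bench_adjust {
    my ($code, $times) = @_;
    if (defined $code) {
        $times ||= 1;
        # Time the given code on its own so that its cost can be
        # subtracted from the total in report_timing().
        $corrective += eval sprintf <<'EOEVAL', "$code\n" x $times;
my @td = Time::HiRes::gettimeofday();
%s
Time::HiRes::tv_interval(\@td);
EOEVAL
    }
    # (Re)start the timer that report_timing() reads.
    @Perl::Benchmark::Lib::PB_timer = Time::HiRes::gettimeofday();
}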
 
sub report_timing {
    print Time::HiRes::tv_interval(\@Perl::Benchmark::Lib::PB_timer) - $corrective;
    print "\n";
}
 
END {
    report_timing();
}
 
1;

It dumps the number of seconds of the runtime to stdout where it is picked up by another module that does all the bookkeeping and eventually spits out the table you see further above with the help of Text::Table. A very nice module, by the way.
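The idea behind the corrective value, stripped of the module plumbing, is roughly this: time the setup code on its own, time the whole run, and report the difference. The following is a stand-alone sketch using only Time::HiRes; the helper time_code and the two closures are invented for the example and are not part of the module.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: run a code ref and return the elapsed
# wall-clock time in seconds.
sub time_code {
    my ($code) = @_;
    my $t0 = [gettimeofday];
    $code->();
    return tv_interval($t0);
}

my @data;
# Setup that should not count towards the benchmark result.
my $setup = sub { @data = map { $_ * 2 } 1 .. 50_000 };
# The part that is actually of interest.
my $work = sub { my $sum = 0; $sum += $_ for @data; return $sum };

my $corrective = time_code($setup);
my $total      = time_code(sub { $setup->(); $work->() });

printf "total %.6f s, setup %.6f s, net %.6f s\n",
    $total, $corrective, $total - $corrective;
```

Since the setup is timed in a separate run, the correction is only approximate; for very short benchmarks the net value can even come out slightly negative.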

Wednesday December 08, 2004
12:40 PM

Two Larries and one Groucho

Compare him with him and this chap.

Tuesday December 07, 2004
02:26 AM

RSS-feeds

Nowadays there are RSS-feeds for virtually every crap conceivable. However, you are doomed when you look for German TV-listings as RSS-feeds. The various TV stations have one, but I'd like to have a feed covering all the 20-ish German stations. I suppose I have to do some website-parsing again (and this is the year 2004, mind you!).

Sunday November 28, 2004
01:54 AM

Will it ever end?

For over a week now I have been preparing the next List::MoreUtils release. It takes so long because I am incorporating all the stuff from List::MoreUtil into it. For each new function I had to write an equivalent XSUB and some of them turned out to be a bit tricky.

Then I noticed that List::MoreUtil's tests (that I simply took over) are incomplete in that they don't test some of the key characteristics of each function. Some map-like functions pass aliases to the original values to their code argument. Others don't. As it happened, I also noticed that the pure-Perl implementation and the XSUB one sometimes differed with respect to this so I had to make that consistent, too.
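The difference between aliasing and copying is easy to show in pure Perl. The two helpers below are made up for the purpose and are not functions from either module: one hands the block aliases to the caller's values, the other works on copies.

```perl
use strict;
use warnings;

# Hypothetical helper: $_ is aliased to the caller's values.
# The elements of @_ are aliases, and 'for' aliases $_ to them in turn.
sub each_aliased(&@) {
    my $code = shift;
    $code->() for @_;
    return;
}

# Hypothetical helper: the block only ever sees copies.
sub each_copied(&@) {
    my ($code, @copies) = @_;
    $code->() for @copies;
    return;
}

my @a = (1, 2, 3);
each_aliased { $_ *= 2 } @a;
print "aliased: @a\n";    # the originals were modified

my @b = (1, 2, 3);
each_copied { $_ *= 2 } @b;
print "copied:  @b\n";    # the originals are untouched
```

A test that only looks at return values cannot tell these two apart; it has to modify $_ inside the block and inspect the original list afterwards.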

The only thing left is now copying the documentation for these functions over from List::MoreUtil. Most probably I will be deeply unhappy with it so I might end up rewriting it.