Journal of bart (450)

Thursday February 07, 2008
06:44 AM

The end of a meme?

Oh no! According to this news article, Duke Nukem Forever will be released at the end of this year. Will this be the end of a meme? DNF is the prototypical example of eternal vaporware, and computer geeks love to make fun of it.

But, we're not there yet. It might still turn out right. Er, wrong. If not... we'll have to find something else to make fun of.

Tuesday January 29, 2008
07:01 PM

The pain of updating Perl

A few days ago I decided to upgrade ActivePerl on my laptop. Not the major upgrade to 5.10.0, not yet; I just wanted the new GUI version of PPM, like I already had on my other computer. It's just a minor upgrade between builds of perl 5.8.8, from build 817 to build 822. That should be relatively painless... Not so.

Well, despite the fact that XS modules are binary compatible between the two builds, the new build refuses to install on top of the older one. That means I'll have to uninstall perl, install the new version, and reinstall every module I had added. Ouch.

I remember taking a Bundle snapshot with CPAN.pm over a year ago, and it wasn't pretty: installing that bundle resulted in CPAN.pm wanting to reinstall core modules. I didn't want to live through that again; besides, this being Windows, installing some modules through CPAN would probably not be trivial. So this time I was going to use PPM, and, preferably, automate it.

It's easy to get a list of the modules installed with PPM, complete with version numbers, into a file, with ppm query * or (is this new?) ppm list. (Oh, fun: apparently the output format has changed between builds.)
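For example, redirecting the output to a file (modules.txt is just a name I picked; the exact output format varies between PPM versions):

ppm query * > modules.txt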

But after that, I'm stuck. How the hell do you use that list to install those packages automatically? I'm stumped. I want to:

  • install modules I don't have yet, and
  • upgrade modules that are out of date.

Simple enough. But having PPM do just that, by feeding it that list, simply isn't among the supported features, as far as I can tell.

So I ended up installing most of these modules by hand, list in hand. Well, I tried. It turned out some of the modules were still not properly installed: for example, Crypt::SSLeay was missing its DLL, and Win32::API just didn't work.

So now, days later, I'm still stuck with an incomplete set of reinstalled, and possibly broken, modules. I'll just have to install additional modules whenever I find that some script is broken. Oh, joy.

And then there are still some modules (WWW::Mechanize and HTML::TokeParser::Simple) whose API had changed, so, with freshly installed (and upgraded) modules, my scripts just didn't work any more. I've had to figure out what changed, and modify the scripts. Not fun.

I'm not looking forward to upgrading to 5.10.

p.s. I have some vague plans, if necessary, to write a script that drives ppm through the command line to install or upgrade the whole list.
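Something along these lines, perhaps. A minimal sketch, assuming a plain text list with one package name per line (version columns stripped), and assuming a plain ppm install <package> works from the command line on this build:

use strict;
use warnings;

# feed a saved module list back to ppm, one package at a time
my $listfile = shift or die "usage: $0 modulelist.txt\n";
open my $fh, '<', $listfile or die "Can't read $listfile: $!\n";

while (my $line = <$fh>) {
    my ($package) = split ' ', $line;   # first column: the package name
    next unless defined $package and length $package;
    print "Installing $package...\n";
    system('ppm', 'install', $package) == 0
        or warn "ppm install $package failed\n";
}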

Thursday January 24, 2008
05:52 PM

Badmouthing Perl

Today, while browsing through the popular sites of the day, I found several sites where the author found it necessary to sneer at Perl, even where Perl wasn't the subject of the post. And I wasn't even searching for it. Is this the new custom?
  1. GTK Hello World in Six Different Languages

    This is actually the least worrisome of the lot:

    Although I find writing Perl to be painful for everything but processing text files in a terminal, I found the Perl GTK bindings to be relatively straightforward.

  2. "If you don't know how compilers work, then you don't know how computers work"

    Steve Yegge writes:

You discover that jsdoc is a miserable sod of a Perl script that seg faults on about 50% of your code base, and (bear with me here) you've vowed never to write another line of Perl, because, well, it's Perl. Pick your favorite reason.

  3. Can Dynamic Languages Scale?

    This is by far the worst of the lot, gratuitous Perl bashing:

    It's as Marx said, lo these many years ago: "From each language, according to its abilities, to each project, according to its needs."

    Oh, except Perl. Perl just sucks, period. :-)

These people don't appear to even know Perl, or at least, don't appear to know it well enough.

Just stop it, please. It's not funny.

Sunday January 20, 2008
04:07 PM

Video game skills

To relax, I occasionally play a little video game like Teagame's TG Motocross 3. I currently don't have any trouble finishing the game, unless I start doing stunts to gain extra points.

My son, who is 7, tried it out, and he can't manage to stay on the bike for longer than about 10 seconds. It's totally unplayable to him.

I tried my hand at Fancy Pants Adventures, but that game is unplayable to me: I barely manage to stay alive for more than 15 seconds. He, on the other hand, has no trouble with it at all, and within his first session (of a few hours, I admit) he managed to finish the complete game more than once.

Different people can be more skilled at one game than at another, while for other people it's the other way around, even when they have never played these games before and the games superficially appear to require a similar skillset. It just goes to show that being good at video games is not a simple one-dimensional skill, something you could capture in a single score. Neither is the difficulty level of such a game. Am I better at video games than he is, or is it the reverse? Actually, it's neither. It just isn't that cut and dried.

n.b. if you like Fancy Pants (and apparently a lot of people do), note that the second game is out: Fancy Pants Adventures World 2

Another game that we recently discovered, and to which my son has lately become really addicted, is the Falling Sand Game (requires Java). It has no aim at all other than to enjoy the experience: there's no score, and it doesn't end.

Thursday January 10, 2008
05:51 PM

comment on MJD's Clubbing someone to death with a loaded Uzi

Mark Jason Dominus is a Perl hacker and book author with a blog. In one of his latest entries, Clubbing someone to death with a loaded Uzi, he rather harshly critiques other people's (beginners') code. But whenever you do that, you should make sure your replacement code isn't dodgy itself.

There is no way to comment on his blog, so I'm posting my remarks here.

He writes:

It could have been written like this:

printf FILE "$LOCATION{$location}\,";
printf FILE "%4s", "$min3\,";
printf FILE "%4s", "$max3\,";
printf FILE "%1s", "$wx3\n";

Eww. There are a few red flags in there:

  • Don't use printf where your data is used as the template. Granted, in this example the data comes from a hash that is initialized with literal data stored in the script, but projects tend to evolve, and data just is not a template. If the data ever contains a "%" sign, this code will blow up.
  • Why first join variables with other characters (such as commas), and then format the result with printf? Put the comma in the template.
  • There's no need for this to be broken up into 4 statements.

In summary: this code could have become:

printf FILE "%s,%3s,%3s,%1s\n", $LOCATION{$location}, $min3, $max, $wx3;

which is quite a bit shorter, and cleaner too, IMO.

But there are still more things wrong with the original code that he didn't discuss. For example:

foreach $location_name (%LOCATION ) {
    $location_code = $LOCATION{$location_name};
    ...

H*ll, if you want to loop over the keys of a hash, at least don't loop over both the keys and the values! Granted, in MJD's replacement code the loop is gone, so this problem has disappeared too, but this is a major mistake that shouldn't just be skipped over.

foreach $location_name (keys %LOCATION ) {
    ...

Otherwise, if one of your hash values ever happens to double as a key in the same hash, you'll get garbage in the output.
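A tiny demonstration of what goes wrong, with made-up data in which one value (BBB) doubles as a key:

my %LOCATION = ( AAA => 'BBB', BBB => 'CCC' );
# in list context, a hash flattens to keys *and* values:
print "$_\n" for %LOCATION;        # AAA, BBB, BBB, CCC (in some order)
print "$_\n" for keys %LOCATION;   # just AAA and BBB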

I am guessing that one reason why this person bothers to loop over the hash, trying to find a particular hash item, may be to avoid outputting anything when the item isn't in the hash. MJD just drops this, and outputs a line anyway. So I'm adding that conditional back in:

printf FILE "%s,%3s,%3s,%1s\n", $LOCATION{$location}, $min3, $max, $wx3
  if exists $LOCATION{$location};

There. Comments? Anything that I overlooked?

Tuesday October 30, 2007
07:19 PM

tinyperl and perlbin (perltobin) (and PAR)

For my job I need to do the same few repetitive actions on about 35 Windows PCs. It's the kind of work that's quite tricky to do through the normal Windows GUI, but quite easy to script in Perl. There's just one problem: none of these PCs has Perl, and I'm not going to install it for a simple, one-time task.

So I was thinking of building a binary package from the script. I've always been quite impressed with Graciliano M. P.'s work on TinyPerl. So I decided to try to use it for this task.

I found it works pretty well, though there are a few warts:

  1. The special variable $0 has the value "-e", which makes it impossible for the script to find out its own directory, as I haven't found any other way to do that. (A possible workaround is sketched below, after this list.)
  2. Use of glob makes it crash. I suspect an incompatibility between the File::Glob DLL and tinyperl.exe, because it also crashes when you just use tinyperl.exe instead of plain perl.exe, not only when running the packed script.
  3. Its perl version is 5.8.0, which is not only quite old; you know what they say about first releases of programs... and that goes for Perl 5.8, too.
  4. It decompresses its lib archive (which is in a separate ZIP file) into the directory the executable is in, an activity that's been frowned upon ever since Windows XP came out, now 5 years ago... yet all too many programs still try to write to the directory the executable is in.

    At least PAR decompresses its archive (embedded in the executable itself instead of in a separate file) into a directory for temporary files. (Why do these files need to be decompressed to disk at all? Why can't they be loaded from RAM instead?)

  5. The project appears abandoned; in fact, nothing GMPASSOS has put on CPAN has been updated since somewhere in 2005. He hasn't been on Perlmonks in 2 years, either. Where has he gone? I have no idea...

Well, that shatters any hope of getting any bugs fixed, at least in a timely manner...
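As for the $0 problem, a fallback like this might work; a hypothetical, untested sketch, assuming that in a packed binary $^X points at the packed executable itself:

use File::Basename qw(dirname);
# when $0 is useless ("-e"), fall back to the interpreter path;
# for a self-contained executable the two should coincide
my $selfdir = dirname( $0 eq '-e' ? $^X : $0 );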

But somewhere on this project's homepage was a link to another, related SourceForge project: PerlBin, by the formerly very active crazyinsomniac AKA PodMaster, who has also vanished from the face of this earth...

I find it quite impressive. The project is only a Perl module, with a package of just 16k... and a script, called perltobin. You can allegedly use it on a vast array of platforms, provided you have a C compiler... or you can use the "binary" package for ActivePerl on Windows, either 5.6.x or 5.8.x, which doesn't require a C compiler at all: the pre-compiled, to-be-linked-in library is included in the distribution.

It has a different set of properties, from the same (limited) point of view (for this project):

  1. glob works, though you need to explicitly invoke File::Glob to include the necessary files; that appears to be because Module::Dependency doesn't catch the dependency by default.
  2. $0 is set to the path of the executable file, so the script can deduce its own file path.
  3. Module files are copied into a file tree, not into a ZIP file. At least that doesn't require decompression into temp files, though for distribution it is quite large.
  4. Even though the project hasn't been updated in, what, 4 years, it actually still works, combined with the most recent distribution of ActivePerl.
  5. You have to manually delete the lib tree when you want to recompile the binary after an edit.

BTW, it appears to be built upon GMPassos' work, but with a different focus.

And then there is PAR. I haven't tried it for this project, but even though it has the great advantage of producing just one executable file, it also has the disadvantage of extracting temp files for the modules when it runs, which makes the behaviour feel less than professional; just like TinyPerl, but unlike PerlApp/perl2exe (AFAIK). Somehow I've always found PAR quite daunting, so this has always been a bit of a showstopper for me.

But I like having the choice between alternatives, and I can dream of a merged project with the best of all worlds, one that doesn't need any temp files; I'm quite sure that must be possible. I just don't know how, yet.

Saturday October 13, 2007
11:20 AM

Belgian Perl Workshop Oct 27... anybody I know going?

In two weeks' time, the first Belgian Perl Workshop is going to take place. There are a few talks that interest me, although the schedule keeps shifting all the time... Will there be 2 rooms, or just one? I'm not sure. But that wouldn't be the main reason for me to go, if I go; I'm still undecided.

The main reason for me to go would be to finally meet up with some other Perl people whom I have known online, in some cases for many years already.

But it feels like I'm one of just a handful of people in Belgium who actually use Perl and engage with the international Perl community. So... are you going?

Tuesday August 21, 2007
07:58 PM

Installing ActivePerl and MinGW from scratch

As you may well know, in modern releases of ActivePerl 5.8.x, ActiveState provides automatic support for building XS modules with (free) alternative C compilers. I've been doing that for a while now... but I had no idea how hard it would be, these days, to install them both from scratch.

It's now the second time in about a month that I'm doing this. This time I took extra care to remove old installations, so I can safely say I'm doing a blank install. The only thing I already have is MS nmake, and it's in my PATH.

First, I'm installing MinGW. Boy, has that become easy! Just go to the MinGW download page, get the installer (MinGW-5.1.3.exe, the top link in the table), and run it. Choose what you want installed, and it'll just fetch the archives and install them. Wow. There's no longer any need for a "metre of beer" contest; this is just too simple.

The previous time I installed ActivePerl (build 820) I had a few problems: solvable, but problems nevertheless. This time, most (all?) of these glitches seem to have gone. That is good news.

I'm still using Win98 on my secondary computer, the primary being an XP laptop. Last time, I had problems running scripts (PPM, perldoc, CPAN) straight from the command line: they refused to run, and the only error message I got was "syntax error", which doesn't sound like a Perl error message. I could still run them using

perl -S PPM

so I think it may have been related to something pl2bat did. I thought the resulting scripts were Unix text files, or mixed: half Unix line endings, half Windows. But that seems to have been fixed; it works now.

Next test: CPAN. The previous time, building XS modules didn't work out of the box; I got weird C compiler errors. After some hours of digging and finally fixing the problem, I found out on ActiveState's bug tracking system that it had already been solved months earlier. What the...? But, looking at the patch list for this build, it looks like Jan Dubois has followed his own advice: all compiler-related values have been disabled in configpm, the Perl script that, in the source distribution, is used to build Config.pm. That implies that now, at least, MinGW should be able to build XS modules out of the box. Excellent.

Except: the version of CPAN.pm that comes with it (1.9102) is broken; it complains about an unimplemented flock call. So I downgraded CPAN.pm by copying the previous version (1.7602) out of the older distribution. Did I mention I use the AS distro, not the MSI file? I do. It's a plain ZIP file with a relocation script: nice and transparent for cases like this. After a bit of panicking by CPAN.pm about the .lock file, and my manually deleting it a few times, it finally ran.

I first had to manually add MinGW's bin directory to my PATH, because ActivePerl didn't see my gcc. Perhaps that works better on XP, where the mechanism for permanently setting environment variables is different.

But then, MinGW can indeed build XS modules out of the box. I tried Text::CSV_XS, HTML::Parser, DBI, and DBD::SQLite as test cases; only DBI had some tests fail, but I suppose that is, again, because of the platform: there were some complaints about flock not being implemented.

All in all, things are looking good, except that the included CPAN version refuses to work on Win98. But you probably won't have that problem on XP.

Thursday July 12, 2007
05:58 PM

CMS

So you're thinking about using a content management system for a new website? Here are a few thoughts on some popular CMS systems on the hates-software.com site: poofygoof hates software: content management systems.

A few quotes I particularly liked:

Plone
however, after reading further, it appears that plone has its own idea of how databases should work, and doesn't use a SQL backend out of the box. thanks, zopers, but I have better things to do than wank in python.
Drupal
web input methods are _STILL_ the simplistic hack they always were, and the last thing I want to do is handle content generation WITHIN A WEB BROWSER. I want to view it in a browser, but not generate it there. I'll edit XML, even. drupal does not appear to accomodate.

Thursday June 28, 2007
07:30 PM

Functional approach for weighted random picks out of a list

I have been trying to find a good way to pick a random item from a list, where items do not have the same probability of being picked: using weighted probabilities.

A way to visualize the problem, which is also a way to tackle it, is to represent each item by a book (or plank) with a thickness proportional to its weight. You make a pile of all the books, measure the total height of the pile, randomly choose a height between zero and that total height, and see which book lies at that height.

But I feel this is a very clumsy way: you have to assign an order to the items, and calculate a running total of the heights of the items that come before each one to determine at what height it lies, all that just to pick one item. If an item is added to or removed from the list, you have to start over with all the administrative work of ordering and summing. What a contrast with equally weighted probabilities, where you just have to pick an index number.

So I thought: surely there must be a more straightforward, functional way? One that could possibly even work in plain SQL? Because there is such a way to randomly shuffle a list of items with equal probabilities: just add an extra column with a random value (rnd, or rand in Perl, dbms_random.value in Oracle), and sort the rows on that column. In Oracle:

select dbms_random.value, T.* from mytable T order by 1

All items have an equal chance of winning, i.e. coming out first by getting the lowest random value, irrespective of the distribution of the randomness, provided the subsequent random values are sufficiently uncorrelated (meaning you can't really predict the next random value from the previous ones, which actually is an illusion in pseudo-random generators). Since the probabilities are the same for all items, there is nothing skewing the chance of winning in favor of any of them; hence, they must all have an equal chance of winning.
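For comparison, the same trick in Perl, as a sketch: tag each item with a random key, sort on the key, and the winner is the first element.

# shuffle with equal probabilities by sorting on a random key
my @items = ('foo', 'bar', 'baz', 'quux');
my @shuffled = map  { $_->[1] }
               sort { $a->[0] <=> $b->[0] }
               map  { [ rand, $_ ] } @items;
my $winner = $shuffled[0];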

But is there such a way to do the same with weighted probabilities? There must be. So I decided to explore. The plan is to compare the values of $rand[$i], where $rand[$i] is a weighted random function: a random number whose probability distribution depends on the value of the index $i. So I'm dusting off my math skills. As my confidence in them is not too great (I know I'm bound to make plenty of mistakes), I'll use experimental results to verify the math, so any blatant mistakes should stand out.

Eventually, by pure luck, I did find a formula that works, but not at first. Here's how I found it.

I'll start with just 2 items. What is the probability that item 2 will win? You calculate that like this: go through all values X that item 1's random value can possibly take, determining the probability that the value is X (or, for continuous distributions, lies in a small range between X and X+dx, which for small enough values of dx becomes proportional to dx). Next, determine the probability that item 2 beats this. Multiply the two, to get the probability of both happening at the same time, and finally, add up these results for all possible values of X (integrate, in the case of continuous distributions) to take care of all possible cases: you end up with the probability of item 2 winning, period.

The common generic representation is a formula

Sum(P(A)*P(B|A))

(the probability of A happening, times the probability of B happening provided A did happen, summed over all possible cases for A) which is, applied here:

Sum(P(x1 between X and X+dx)*P(x2 < X))

To work. My first thought was to just multiply (or divide) rand() by the item's weight:

$rand[$i] = $weight[$i]*rand;

Surely this would yield some kind of weighted probability; only, I had no idea how it relates to the weight factors. With just 2 items, the formula is equivalent to:

Item 1 wins if $rand[1] < $rand[2], which is the same result I would get with

$k = $weight[2]/$weight[1];
$rand[1] = rand;
$rand[2] = $k*rand;

With k >= 1 (if this isn't the case, just swap the two items), and with X = x1, the chance that x2 < X is X/k. Integrating that over X from 0 to 1 gives item 2 a winning probability of 1/(2*k). This doesn't feel right: with a factor k=2, item 2 has a probability of 1/4 of winning, and item 1 has 3/4. That's a ratio of 1/3, not 1/2. Experiments confirm this result.

So let's try again, for k <= 1. There's an asymmetry between values of k above and below 1, and the reason for it is clipping: with k < 1, there's a threshold for $rand[1] above which item 2 cannot possibly lose, and that threshold value is k. So the probability of item 2 winning, given a value x for $rand[1], breaks into 2 pieces:

x/k    for 0 <= x <= k
1      for k <= x < 1

Integration over the range 0 to 1 for x yields k**2/(2*k) + (1-k) = 1 - k/2. (For k=1/2, this is 3/4.)

Time to confirm the results:

use Math::Random::MT qw(rand);
my $k = 2;
my @n = ( 0 , 0 );
for my $i (1 .. 1E5) {
    my @rand;
    $rand[0] = rand;
    $rand[1] = $k*rand;
    $n[index_with_min_value(@rand)]++;
}
use Data::Dumper;
print Dumper \@n;
 
sub index_with_min_value {
    # what index has the lowest value in a list?
    my $ix = 0;
    my $min = $_[0];
    for my $i (1 .. $#_) {
        if($_[$i] < $min) {
            $ix = $i;
            $min = $_[$i];
        }
    }
    return $ix;
}

The result of a test run is:

$VAR1 = [
          75088,
          24912
        ];

I'm using a better random number generator than the one that comes with Perl. The standard random number generator in ActivePerl/Win32 has a meager 15 bits of resolution, so you get at most 32768 different values, and it has a repetition period in the same range; for the above experiment, you'd cycle through the sequence of all possible random values about 3 times. The generator I chose, an implementation of the Mersenne Twister, has a period of 4.315425E6001, which is like forever, and it has (in this implementation) a resolution of 32 bits. By just importing its rand function, it replaces the built-in rand. So I'm hoping it is better. :)

The results are very skewed in favor of the items with the smaller weight factor: a sample run with 3 items with weight factors (1, 2, 3) produced this result:

$VAR1 = [
          63890,
          22216,
          13894
        ];

Thus item 3, with a weight factor that's 3 times larger, has about a 5 times smaller likelihood of winning.

So I was thinking: what if I picked a distribution that is free of clipping, one that has no upper limit? A nice candidate is the exponentially damped distribution, one you see a lot in nature:

P[X > x] = exp(-x)

with X > 0 and x > 0, but with no upper bound.

This function's graph looks the same everywhere, independent of the actual value of x: when x shifts, P just scales.

So I wanted to compare the results for a probability distribution of exp(-x) for item 1 with exp(-k*x) for item 2. (This time, I decided to let the larger value of x win, because P[x < X] doesn't look as nice as a function.)

The probability for x1 (the value for item 1) to be between x and x+dx is exp(-x)*dx. The probability of item 2 beating that (having a bigger value) is exp(-k*x). (A larger k results in a faster-decaying probability function.)

Integration of ∫ exp(-k*x)*exp(-x)*dx = ∫ exp(-(1+k)*x)*dx over [0, ∞] yields the value 1/(1+k).

That is nice: with k=2 I get 1/3, and its counterpart, with k=0.5, gets 1/1.5 = 2/3. Those are the numbers I'm after, as their ratio is 2! But do note that the larger factor yields a smaller probability, so you have to divide by the weight, not multiply.

So how can you get an exponential distribution out of a plain uniform distribution? By using a transforming function: x = fn(rand). With $y = rand, which plainly follows a uniform distribution between 0 and 1, and the required equivalence $y == exp(-$x), we end up with the transforming function $x = -log($y). Since the probability that $y < Y (with Y between 0 and 1) is Y, the probability that ($x = -log($y)) > (X = -log(Y)) is Y == exp(-X). (Note the swap of directions: a smaller $y yields a larger $x.) So this is exactly the result I am after, with just one nitpick: the border cases. rand can return exactly zero, a value that log() chokes on, but it will never return exactly 1, a value that log() doesn't mind. So I'm reversing the direction for y, and replacing -log(rand) with -log(1 - rand). Nothing else changes, as the probability of the values in between remains the same.

That's enough theory; let's confirm these results by experiment. Item 2 has a weight of 2, and item 3 a weight of 3:

use Math::Random::MT qw(rand);
my @w = (1, 2, 3);
my @n = (0, 0, 0);
for my $i (1 .. 1E5) {
    my @x;
    $x[$_] = -log(1 - rand)/$w[$_] for 0 .. $#w;
    $n[index_with_min_value(@x)]++;
}
use Data::Dumper;
print Dumper \@n;
 
sub index_with_min_value {
    # what index has the lowest value in a list?
    my $ix = 0;
    my $min = $_[0];
    for my $i (1 .. $#_) {
        if($_[$i] < $min) {
            $ix = $i;
            $min = $_[$i];
        }
    }
    return $ix;
}

Results:

$VAR1 = [
          16621,
          33235,
          50144
        ];

Bingo. As you can see, the second item has twice the chance of winning of the first item, matching weight factors of 2 vs. 1; and the third item has 3/2 times the chance of winning of the second one, matching weight factors of 3 vs. 2. It's just perfect.

And I'm sure you can use this safely in (Oracle) SQL.
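For example, something along these lines; a sketch, assuming a numeric weight column in mytable (untested):

select *
  from ( select T.*
           from mytable T
          order by -ln(1 - dbms_random.value) / T.weight )
 where rownum = 1

The row that sorts first is the weighted random pick.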