
Shlomi Fish
  shlomif@iglu.org.il
http://www.shlomifish.org/
AOL IM: ShlomiFish
Yahoo! ID: shlomif2
Jabber: ShlomiFish@jabber.org

I'm a hacker of Perl, C, Shell, and occasionally other languages. Perl is my favourite language by far. I'm a member of the Israeli Perl Mongers, and contribute to and advocate open-source technologies.

Journal of Shlomi Fish (918)

Thursday April 03, 2008
03:15 PM

How to Best Process the Directory Components of a Path

This is cross-posted here from Israel.pm where I have yet to receive an answer.

I'm trying to process the directory components of a path (as an array) so that:

  1. It will be portable. (Work on Unix, Windows, VMS, etc.)
  2. It will keep the rest of the path components (if any) identical.
  3. It will work on both relative and absolute paths.

If the processing is to keep only the directories after "long-dir" then:

UNIX: /hello/there/long-dir/another/myfile.txt ==> another/myfile.txt
DOS:  C:\Hello\There\Long-Dir\Another\myfile.txt ==> Another\myfile.txt

UNIX: ./hi/long-dir/another/myfile.txt ==> another/myfile.txt
DOS:  .\hi\long-dir\another\myfile.txt ==> another\myfile.txt

To do this I turned to File::Spec and File::Basename and wrote the following code which seems insanely complicated. I marked the place where I do the actual processing using a callback:

use File::Spec;
use File::Basename;

sub _process_filename_dirs
{
    my ($self, $fn, $callback) = @_;

    my $basename = basename($fn);
    my $dirpath  = dirname($fn);

    my ($volume, $directories, $filename) =
        File::Spec->splitpath($dirpath, 1);

    # The actual manipulation.
    my $dirs = $callback->([File::Spec->splitdir($directories)]);

    my $final_dir =
        File::Spec->catpath(
            $volume, File::Spec->catdir(@$dirs), $filename
        );

    if ($final_dir eq "")
    {
        return $basename;
    }
    else
    {
        return File::Spec->catfile(
            $final_dir, $basename
        );
    }
}

So far, I have only verified that it works on UNIXes (Linux in my case) and on relative paths.

So my questions are:

  1. Is there a simpler way to do it?
  2. Does Path-Class, File-Fu, or a different abstraction provide an easier way to do it? (See the sketch after this list.)
  3. Is it still buggy?
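
Regarding question 2, here is a minimal, untested sketch of how the same manipulation might look with Path::Class (whether it preserves the volume correctly on Windows and VMS is something I have not verified):

use Path::Class qw(file dir);

# Untested sketch: run $callback over the directory components of $fn,
# keeping the basename intact.
sub _process_filename_dirs_pc
{
    my ($fn, $callback) = @_;

    my $file = file($fn);

    # dir_list() returns the directory components as a plain list.
    my @new_dirs = @{ $callback->([ $file->dir->dir_list() ]) };

    # Rebuild the path from the processed components plus the basename.
    return @new_dirs
        ? dir(@new_dirs)->file($file->basename())->stringify()
        : $file->basename();
}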

I should note that this hairiness is not limited to Perl. Common Lisp has a built-in portable path-manipulation abstraction that is also relatively complicated. See the "Files and File I/O" chapter and the "Portable Pathname Library" chapter of Practical Common Lisp.

Monday March 17, 2008
04:14 PM

Has-a as Is-a

Recently I encountered a modularity issue in my code. I had a function like the following:

sub _is_event_pass
{
        my $self = shift;

        return ($self->_event->is_ok() ||
                $self->_event->is_skip() ||
                $self->_event->is_todo()
               );
}

As you can see, all I'm doing is calling methods on _event. The right thing to do would have been to move this logic into a method of the _event() object's class, which would then operate on its own instance. The problem is that the _event() field can be any class in the TAP::Result:: hierarchy, and it wouldn't be a good idea to sub-class and re-bless all of them.

So what to do?

What I eventually did was create an EventWrapper class that has a field containing the actual object. Then I delegate all the methods of the TAP::Result classes that I use to that field. For example:

sub is_ok
{
        my $self = shift;

        return $self->_tp_result()->is_ok();
}

sub is_todo
{
        my $self = shift;

        return $self->_tp_result()->is_todo();
}

(Of course, I'm actually auto-generating these methods; a sketch of how that can look follows.)
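
A minimal sketch of such auto-generation using typeglob assignments (this is not the actual EventWrapper code, and the method list is illustrative):

package EventWrapper;

use strict;
use warnings;

# Generate a delegating method for each name in the list by assigning
# a closure to the package's symbol table entry (typeglob).
for my $meth (qw(is_ok is_skip is_todo))
{
    no strict 'refs';
    *{ __PACKAGE__ . "::$meth" } = sub {
        my $self = shift;
        return $self->_tp_result()->$meth(@_);
    };
}

1;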

And then I defined the is_pass function there like this:

sub is_pass
{
        my $self = shift;

        return ($self->is_ok() || $self->is_todo() || $self->is_skip());
}

This works because those methods are delegated.

So ::EventWrapper behaves like TAP::Result ("is-a") while actually only containing it ("has-a"). It's a useful technique.

Of course, I made good use of the fact that Perl is dynamically typed and resolves methods at run-time. If I wanted to do the same in a statically-typed OO language, I would have needed to figure out a way to delegate to all the methods of the various classes in the hierarchy, perhaps using reflection or run-time code generation.

Friday March 07, 2008
06:20 AM

Perl-IL Meeting on 16 March 2008

The Israeli Perl Mongers will hold their first meeting of the year on Sunday, 16 March 2008, at 18:30, in Schreiber 008 at Tel Aviv University.

The meeting agenda is not final, but includes a presentation by Ran Eilam on "Config::* - The Alenby St. of CPAN", and a fallback "There are too many ways to do it" presentation.

Wednesday February 06, 2008
03:05 PM

Continuation of the IO-Socket-INET6 Story

As a continuation of my previous entry about IO-Socket-INET6, I'd like to note that I received several responses to my message about resuming its maintenance. Eventually, the original author replied too and agreed to grant me co-maintainer status on PAUSE.

So I wrapped things up and uploaded a new version of IO-Socket-INET6 to CPAN, and it indeed got indexed properly. I have already received several error reports from CPAN testers, but I'm not sure I can do anything about them, because the testers' hosts don't seem to support IPv6 well.

Next I'd like to increase the module's kwalitee and to look at bug reports.

Now I can be proud that there's a small part of me in SpamAssassin.

Friday January 18, 2008
11:53 AM

Imported Symbols Trouble

After I updated my Mandriva Cooker system a few days ago (after the perl package there had been upgraded to 5.10.0), I noticed that SpamAssassin's "spamd" and "spamassassin" started emitting the following warnings:

Constant subroutine IO::Socket::INET6::AF_INET6 redefined at
/usr/lib/perl5/5.10.0/Exporter.pm line 66.
at /usr/lib/perl5/vendor_perl/5.8.8/IO/Socket/INET6.pm line 16
Prototype mismatch: sub IO::Socket::INET6::AF_INET6 () vs none at
/usr/lib/perl5/5.10.0/Exporter.pm line 66.
at /usr/lib/perl5/vendor_perl/5.8.8/IO/Socket/INET6.pm line 16
Constant subroutine IO::Socket::INET6::PF_INET6 redefined at
/usr/lib/perl5/5.10.0/Exporter.pm line 66.
at /usr/lib/perl5/vendor_perl/5.8.8/IO/Socket/INET6.pm line 16
Prototype mismatch: sub IO::Socket::INET6::PF_INET6 () vs none at
/usr/lib/perl5/5.10.0/Exporter.pm line 66.
at /usr/lib/perl5/vendor_perl/5.8.8/IO/Socket/INET6.pm line 16

After investigating, I found that there was a "use Socket6;" statement there, which imported the AF_INET6 and PF_INET6 symbols that had already been imported into the IO::Socket::INET6 package by earlier Socket and IO::Socket imports.

In order to get rid of the warnings, I eventually decided to import symbols from Socket6 only selectively, importing the necessary symbols minus AF_INET6 and PF_INET6. In order to get IO::Socket::INET6 to pass its tests, I had to fix the tests themselves, which had already been broken before I made any modifications. LeoNerd helped me with that, so thanks! Then I submitted a modified SRPM to Mandriva with a patch and went to sleep happy.

Today, when I woke up and checked my email, I was surprised to discover that about 20 spam mails had landed in my inbox. As it turned out, they didn't have the SpamAssassin headers. Running spamc from the command line yielded an unmodified message, while the standalone spamassassin program worked perfectly. So I started to investigate why spamc and spamd failed.

I ran into some trouble trying to point spamd at a modified IO::Socket::INET6 module, because spamd runs in taint mode and therefore won't read the PERL5LIB environment variable (this is documented in perlrun). I had to modify the sources and use "use lib" instead.
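
For illustration, something along these lines (the path is made up for the example):

# Taint mode (-T) ignores PERL5LIB, so point the script at the locally
# built module explicitly; the path below is purely illustrative.
use lib '/home/user/src/IO-Socket-INET6/blib/lib';
use IO::Socket::INET6;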

I also had a lot of trouble trying to see why and where IO::Socket::INET6 fails. This was eventually resolved by running spamd like this:

perl  -T blib/script/spamd -c -m5 -H --syslog="stderr info"

Then I could see the error clearly. Apparently I had missed importing inet_ntop() from the Socket6 module; it was called by some functions that were not exercised by the unit tests, so the problem was only detected when spamd ran. This is one disadvantage of dynamic, symbolic languages such as Perl, and one that would be caught at compile time in statically compiled languages such as C and, as far as I know, Java as well.

Well, since the code in question was Perl, I just added tests for the functions calling inet_ntop() and added it to the imports. After installing the modified module, spamd worked again, and I submitted a new patch for inclusion into Mandriva. So the end result is that the warnings are gone, the tests pass, and the test suite is more robust.
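
For a flavour of it, here's a hypothetical sketch of the kind of test that could have caught the missing import (this is not the actual IO-Socket-INET6 test; using peerhost() as the inet_ntop()-calling code path is my assumption):

use strict;
use warnings;
use Test::More tests => 1;

use IO::Socket::INET6;

SKIP:
{
    # Connecting and asking for the peer's address exercises code paths
    # that (as far as I recall) call inet_ntop() internally.
    my $sock = IO::Socket::INET6->new(
        PeerAddr => '::1',
        PeerPort => 22,
        Timeout  => 2,
    ) or skip("no local IPv6 service to connect to", 1);

    like($sock->peerhost(), qr/:/, "peerhost() returns an IPv6 address");
}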

As these modifications may be useful for other platforms besides Mandriva, I sent a message to the maintainer and the other relevant parties regarding resuming the maintenance of IO::Socket::INET6.

Some other lessons from this:

  1. use MyModule () is your friend. A plain use MyModule; that pulls in all of the default exports is asking for trouble.
  2. If you can, use the symbols fully qualified from the original package. If you can't, import them selectively. (See the sketch after this list.)
  3. Make sure you have a comprehensive test suite with good (preferably 100%) test coverage. Test-Driven Development is your friend.
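
To illustrate lessons 1 and 2 with Socket6 (the import list below is my own illustrative choice, not the actual Mandriva patch):

# Import only what's needed from Socket6, leaving out AF_INET6 and
# PF_INET6, which Socket/IO::Socket already provide in this package.
use Socket6 qw(inet_pton inet_ntop getaddrinfo getnameinfo);

# Symbols that were not imported can still be used fully qualified:
my $packed = inet_pton( Socket6::AF_INET6(), "::1" );
my $text   = inet_ntop( Socket6::AF_INET6(), $packed );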

Right now I'm cranky and happy at the same time, because I've spent half-a-day on this bug.

11:14 AM

Transcription of the Perlcast Interview with Tom Limoncelli

I'd like to announce that I have finished transcribing the Perlcast Interview with Tom Limoncelli, with some help from "fax" in correcting mis-transcribed phrases. There are still some phrases in the document I'm uncertain of, which are marked with "[?]" or "FILL IN" (and which I could use some help correcting), but as a general rule, the transcript should be usable.

I've sent an email about it to Josh McAdams (who publishes Perlcast), but he hasn't gotten back to me yet. In the meantime, you may still enjoy the transcript as it is on the perl.net.au wiki.

On a similar note, a friend referred me to this talk about "Getting Things Done" (which I have not watched yet); "Getting Things Done" is also mentioned in the Tom Limoncelli interview.

Tuesday January 01, 2008
01:20 PM

Line Count Benchmark

OK, first of all, Happy New (Civil) Year to everybody. I'd also like to note that I greatly enjoyed the Israeli 2007 Perl Workshop that I attended yesterday, and would like to thank all the organisers for making it happen. I posted some notes on topics we discussed at the conference to the mailing list, so you may find it interesting to read them. I may post a more thorough report later on.

Now, to the main topic of this post. I was on Freenode's #perl the other day, when we were discussing how to count the number of lines in a file. Someone suggested opening the file and counting the lines read from <$fh>. Someone else suggested capturing the output of wc -l. Then someone argued that capturing the output of wc -l is non-portable and will incur a costly fork. But is it slower?

To check, I created a very large text file using the following command:

locate .xml | grep '^/home/shlomi/Backup/Backup/2007/2007-12-07/disk-fs' | \
xargs cat > mega.xml

Here, I located all the files ending with .xml in my backup and concatenated them into a single file, "mega.xml". The statistics for this file are:

$ LC_ALL=C wc mega.xml
195594 1704386 17790746 mega.xml

Then I ran the following benchmark using it:

#!/usr/bin/perl

use strict;
use warnings;

use Benchmark ':hireswallclock';

sub wc_count
{
    # Capture the output of wc -l and extract the leading number.
    my $s = `wc -l mega.xml`;
    $s =~ /^\s*(\d+)/;
    return $1;
}

sub lo_count
{
    # Read the file line by line and report $. (the input line number).
    open my $in, "<", "mega.xml"
        or die "Cannot open mega.xml: $!";
    local $.;
    while (<$in>)
    {
    }
    my $ret = $.;
    close($in);
    return $ret;
}

# Sanity check: both methods must agree.
if (lo_count() != wc_count())
{
    die "Error";
}

timethese(100,
    {
        'wc' => \&wc_count,
        'lo' => \&lo_count,
    }
);

The results?

shlomi:~/Download$ perl ../time-various-line-counts.pl
Benchmark: timing 100 iterations of lo, wc...
lo: 18.0495 wallclock secs (16.72 usr + 1.17 sys = 17.89 CPU) @ 5.59/s (n=100)
wc: 3.70755 wallclock secs ( 0.00 usr 0.03 sys + 1.77 cusr 1.91 csys = 3.71 CPU) @ 3333.33/s (n=100)

The wc method wins and is substantially faster. This is probably because wc is written in optimised C, and so counts the lines faster despite the cost of the earlier fork.

For small files, the pure-Perl version wins, but for large files wc is better. Naturally, it's not portable, which may be a deal-breaker in some cases.

The lesson of this is that forking processes or calling external programs is sometimes a reasonable thing to do (as MJD has noted).
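
That said, if portability matters, a middle ground is a pure-Perl counter that reads the file in big blocks and counts newlines with tr///. A rough, untested sketch (the 1 MB buffer size is arbitrary):

sub block_count
{
    open my $in, "<", "mega.xml"
        or die "Cannot open mega.xml: $!";
    binmode($in);

    my ($count, $buf) = (0, "");
    # Read in 1 MB chunks and count the newline characters in each.
    while (read($in, $buf, 1024 * 1024))
    {
        $count += ($buf =~ tr/\n//);
    }
    close($in);
    return $count;
}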

Wednesday December 26, 2007
12:12 PM

Israeli Perl Workshop Next Week

This is a reminder that the Israeli Perl Workshop for 2007 will take place next week, on Monday, 31 December 2007. If you're planning to attend, please register at the site and make the payment.

Friday December 21, 2007
08:29 AM

Making Websites Behave using Perl - The yjobs-proxy Story

Here is another cool use for Perl: pre-processing the HTML/JS/CSS markup of poorly written sites before it reaches the web browser. In this post, I'll tell the story of how I ended up writing yjobs-proxy, a markup-transforming proxy built on CPAN's HTTP-Proxy, to make www.yjobs.co.il work with Firefox on my Linux system.

It all started when I was job-hunting and was dismayed to discover that there were far fewer Info-Tech job ads in the newspaper's "Wanted Ads" section than there used to be. The section proudly announced that it now has an Internet counterpart - www.yjobs.co.il. But much to my disappointment, it didn't work in my Linux-based, open-source browsers.

I almost immediately thought of writing a Greasemonkey script to whip the site's JavaScript code into a shape where it could work with Firefox. Eventually, I started writing it and looked for a way to inject new declarations of JavaScript functions into the page, to replace the existing, broken ones. I found a way to do that, but it turned out to have some limitations due to the architecture of Greasemonkey and the way it interacts with the page.

After thinking about it for a moment, I realised I could achieve the same thing by transforming the code that Firefox receives from the site into a more agreeable version. So I thought of a transforming proxy. Someone here on use.perl.org had mentioned HTTP::Proxy in one of his posts, so I went to check whether it could solve my problems.

Meanwhile, I was distracted and delayed a bit by investigating an X Server bug. But then I resumed work on the proxy. HTTP-Proxy turned out to be a great way to implement what I had in mind, but I still ran into a few problems (which weren't HTTP-Proxy's fault).

The first one, from what I recall, was that it refused to filter the JavaScript code. As it turned out, yjobs sent the "Content-Type:" of the JavaScript either as "application/x-javascript" or not at all, while I was filtering on "text/javascript". I ended up matching the scripts by the ".js" extension in the path and specifying a MIME filter of undef.
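
For illustration, a minimal sketch of that kind of filter (this is not the actual yjobs-proxy code, and the substitution is a made-up placeholder):

#!/usr/bin/perl

use strict;
use warnings;

use HTTP::Proxy;
use HTTP::Proxy::BodyFilter::simple;

my $proxy = HTTP::Proxy->new( port => 8080 );

# Match responses by the ".js" path extension, disable the MIME check
# (mime => undef), and rewrite the body on the fly.
$proxy->push_filter(
    path     => qr/\.js$/,
    mime     => undef,
    response => HTTP::Proxy::BodyFilter::simple->new(
        sub {
            my ( $self, $dataref ) = @_;
            # Placeholder transformation - the real proxy replaced
            # broken function definitions with working ones.
            $$dataref =~ s/document\.all/document.getElementById/g;
        }
    ),
);

$proxy->start();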

Then I ran into a problem where a variable called "Data" was assigned to, but apparently never used anywhere else. As it turned out, my logging proxy, which I used to dump all the traffic, had not logged the particular script where it was used; maybe Firefox had cached it. After I found out where it was used, I used the Venkman JavaScript debugger to track down the problem I had getting it displayed on the page. It was fixed using a JavaScript transformation specific to that particular script.

Another problem I encountered was that an original function was being called despite the fact that I had overridden it at the bottom. As it turned out, this happened because it was invoked before the JS interpreter reached the redefinition at the end, as in this code:

<html>
<body>
<script>
function mytest()
{
    return "FirstFoo";
}

var myvar = mytest();
</script>

<h1 id="put_here">Put Here</h1>

<script>
document.getElementById("put_here").innerHTML = myvar;
</script>

<script>
function mytest()
{
    return "Second";
}
</script>
</body>
</html>

This was resolved by transforming the JS code in the original function.

Eventually, I got it working well enough. Then I cleaned up the proxy code and released it for the world's consumption.

My future plan for this proxy is to investigate implementing it as a Firefox extension that will transform the markup from within Firefox.

A fellow Perl programmer I talked with on AIM, whom I pointed to the download page, said "that's nucking futs, man" and then "oh, it's cool. I just mean, that's pretty crazy. A proxy to make a site work... crazy. And awesome." :-)

So this is one way Perl has given Power to the People. Hack on!

Tuesday December 18, 2007
03:24 AM

Happy 20th Birthday, Perl!

Today Perl celebrates its 20th birthday. I'd like to thank Larry Wall and the rest of the perl and CPAN developers for actively developing such wonderful technology. Happy Birthday!