This is cross-posted here from Israel.pm where I have yet to receive an answer.
I'm trying to process the directory components of a path (as an array). For example, if the processing is to keep only the directories after "long-dir", then:
UNIX: /hello/there/long-dir/another/myfile.txt ==> another/myfile.txt
DOS: C:\Hello\There\Long-Dir\Another\myfile.txt ==> another\myfile.txt
UNIX: ./hi/long-dir/another/myfile.txt ==> another/myfile.txt
DOS: .\hi\long-dir\another\myfile.txt ==> another\myfile.txt
To do this I turned to File::Spec and File::Basename and wrote the following code which seems insanely complicated. I marked the place where I do the actual processing using a callback:
use File::Spec;
use File::Basename;

sub _process_filename_dirs
{
    my ($self, $fn, $callback) = @_;

    my $basename = basename($fn);
    my $dirpath = dirname($fn);

    my ($volume, $directories, $filename) =
        File::Spec->splitpath($dirpath, 1);

    # The actual manipulation.
    my $dirs = $callback->([File::Spec->splitdir($directories)]);

    my $final_dir =
        File::Spec->catpath(
            $volume, File::Spec->catdir(@$dirs), $filename
        );

    if ($final_dir eq "")
    {
        return $basename;
    }
    else
    {
        return File::Spec->catfile(
            $final_dir, $basename
        );
    }
}
So far I have only checked that it works on UNIXes (Linux in my case) and on relative paths.
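To make the intended use concrete, here is a minimal sketch of a callback that keeps only the directories after "long-dir". The stand-alone process_filename_dirs() (the same logic without $self) and the after_long_dir() callback are hypothetical names made up for this illustration, and the sketch has only been thought through for UNIX-like paths:

```perl
use strict;
use warnings;

use File::Spec;
use File::Basename qw(basename dirname);

# Hypothetical stand-alone variant of _process_filename_dirs (no $self).
sub process_filename_dirs
{
    my ($fn, $callback) = @_;

    my $basename = basename($fn);
    my ($volume, $directories) =
        File::Spec->splitpath(dirname($fn), 1);

    # The callback receives and returns an array reference of components.
    my $dirs = $callback->([File::Spec->splitdir($directories)]);

    my $final_dir =
        File::Spec->catpath($volume, File::Spec->catdir(@$dirs), '');

    return $final_dir eq ''
        ? $basename
        : File::Spec->catfile($final_dir, $basename);
}

# Hypothetical callback: keep only the components after "long-dir"
# (compared case-insensitively); leave the list alone if it is absent.
sub after_long_dir
{
    my ($dirs) = @_;

    my ($idx) = grep { lc($dirs->[$_]) eq 'long-dir' } 0 .. $#$dirs;
    return $dirs if !defined $idx;

    return [ @{$dirs}[ $idx + 1 .. $#$dirs ] ];
}

# On a UNIX-like system both calls should print another/myfile.txt
print process_filename_dirs(
    '/hello/there/long-dir/another/myfile.txt', \&after_long_dir
), "\n";
print process_filename_dirs(
    './hi/long-dir/another/myfile.txt', \&after_long_dir
), "\n";
```

The callback only ever sees the directory components, so the volume and the file name are handled in one place, which is the point of passing the manipulation in as a code reference.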
So my questions are:
I should note that this hairiness is not limited to Perl. Common Lisp has a built-in portable path-manipulation abstraction that is also relatively complicated. See the "File and File I/O Chapter" and the Portable Pathname library chapter.
Recently I encountered a modularity issue in my code. I had a function like the following:
sub _is_event_pass
{
    my $self = shift;

    return ($self->_event->is_ok() ||
            $self->_event->is_skip() ||
            $self->_event->is_todo()
    );
}
As you can see, all I'm doing is calling methods on the _event. The right thing to do would have been to move this into a method of the class of _event(), which would then operate on the object instance itself. The problem is that the _event() field can be an instance of any class in the TAP::Result:: hierarchy.
And it wouldn't be a good idea to sub-class and re-bless all of them.
So what to do?
What I eventually did is create an EventWrapper class, that has a field which is the actual object. Then I'm delegating all the methods of the TAP::Result classes that I use to that field. I.e:
sub is_ok
{
    my $self = shift;

    return $self->_tp_result()->is_ok();
}

sub is_todo
{
    my $self = shift;

    return $self->_tp_result()->is_todo();
}
(only I'm auto-generating these methods, of course).
And then I defined the is_pass function there like this:
sub is_pass
{
    my $self = shift;

    return ($self->is_ok() || $self->is_todo() || $self->is_skip());
}
Which works because these methods are delegated.
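The auto-generation of the delegating methods could look roughly like the following. This is a minimal sketch and not the actual code: the constructor, the hash-based _tp_result() accessor and the FakeResult class (a stand-in for the TAP::Result classes) are assumptions made up for the illustration.

```perl
use strict;
use warnings;

package EventWrapper;

# Hypothetical constructor: wraps an already-created result object.
sub new
{
    my ($class, $result) = @_;

    return bless { _tp_result => $result }, $class;
}

sub _tp_result
{
    my $self = shift;

    return $self->{_tp_result};
}

# Generate one delegating method per name by assigning a closure
# to the corresponding symbol-table entry.
foreach my $method (qw(is_ok is_todo is_skip))
{
    no strict 'refs';
    *{ __PACKAGE__ . "::$method" } = sub {
        my $self = shift;

        return $self->_tp_result()->$method(@_);
    };
}

sub is_pass
{
    my $self = shift;

    return ($self->is_ok() || $self->is_todo() || $self->is_skip());
}

package FakeResult;    # hypothetical stand-in for a TAP::Result class

sub new
{
    my ($class, %args) = @_;

    return bless {%args}, $class;
}

sub is_ok   { return $_[0]->{ok}; }
sub is_todo { return $_[0]->{todo}; }
sub is_skip { return $_[0]->{skip}; }

package main;

my $event = EventWrapper->new(
    FakeResult->new(ok => 0, todo => 1, skip => 0)
);

print $event->is_pass() ? "pass\n" : "fail\n";
```

Because the wrapped object is only looked up when a method is called, the same wrapper works for any class that implements the delegated methods.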
So
Of course, I made good use of the fact that Perl is dynamically-typed and resolves methods at run-time. If I wanted to do the same in a statically-typed OO language, then I would have needed to figure out a way to delegate to all the methods of the various different classes in the hierarchy. Perhaps using run-time classes.
The Israeli Perl Mongers will hold their first meeting for this year on Sunday, 16 March, 2008 at 18:30, in Schreiber 008 in Tel Aviv University.
The meeting agenda is not final, but includes a presentation by Ran Eilam on "Config::* - The Alenby St. of CPAN", and a fallback "There are too many ways to do it" presentation.
As a continuation to my previous entry about IO-Socket-INET6, I'd like to note that I received several responses to my message about resuming its maintenance. Eventually, the original author replied too and agreed to give me a co-maintainer status on PAUSE.
So I wrapped things up and uploaded a new version of IO-Socket-INET6 to the CPAN, and it indeed got indexed properly. I already received several error reports from CPAN testers, but I'm not sure I can do anything about them, because the end-hosts don't seem to support IPv6 well.
Next I'd like to increase the module's kwalitee and to look at bug reports.
I can now be proud that there's a small part of me in SpamAssassin.
After I updated my Mandriva Cooker system a few days ago (after the perl package there had been upgraded to 5.10.0), I noticed that SpamAssassin's "spamd" and "spamassassin" started generating the following warnings:
Constant subroutine IO::Socket::INET6::AF_INET6 redefined at
/usr/lib/perl5/5.10.0/Exporter.pm line 66.
 at /usr/lib/perl5/vendor_perl/5.8.8/IO/Socket/INET6.pm line 16
Prototype mismatch: sub IO::Socket::INET6::AF_INET6 () vs none at
/usr/lib/perl5/5.10.0/Exporter.pm line 66.
 at /usr/lib/perl5/vendor_perl/5.8.8/IO/Socket/INET6.pm line 16
Constant subroutine IO::Socket::INET6::PF_INET6 redefined at
/usr/lib/perl5/5.10.0/Exporter.pm line 66.
 at /usr/lib/perl5/vendor_perl/5.8.8/IO/Socket/INET6.pm line 16
Prototype mismatch: sub IO::Socket::INET6::PF_INET6 () vs none at
/usr/lib/perl5/5.10.0/Exporter.pm line 66.
 at /usr/lib/perl5/vendor_perl/5.8.8/IO/Socket/INET6.pm line 16
After investigating, I found out that there was a "use Socket6;" call there, which imported the AF_INET6 and PF_INET6 symbols, which were already imported into the IO::Socket::INET6 package by previous Socket and IO::Socket calls.
In order to get rid of the warnings, I eventually decided to only selectively import symbols from Socket6, and imported the necessary symbols minus AF_INET6 and PF_INET6. In order to get IO::Socket::INET6 to pass its tests, I had to fix its tests, which had been broken before any modifications to it started. LeoNerd helped me with that, so thanks! Then I submitted a modified SRPM to Mandriva with a patch and went to sleep happy.
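The idea of selectively importing symbols can be sketched in a self-contained way as follows. DemoSocket and DemoSocket6 are hypothetical in-file modules standing in for Socket and Socket6, and AF_DEMO and demo_ntop() are made-up names; this is not the actual patch to IO::Socket::INET6.

```perl
use strict;
use warnings;

# Two hypothetical modules that both export a constant called AF_DEMO.
# They are defined in BEGIN blocks so that their export lists exist
# before the "use" statements below are processed.
BEGIN
{
    package DemoSocket;
    use Exporter 'import';
    our @EXPORT = qw(AF_DEMO);
    use constant AF_DEMO => 10;
    $INC{'DemoSocket.pm'} = __FILE__;
}

BEGIN
{
    package DemoSocket6;
    use Exporter 'import';
    our @EXPORT = qw(AF_DEMO demo_ntop);
    use constant AF_DEMO => 10;

    sub demo_ntop
    {
        return join ':', map { sprintf '%x', $_ } @_;
    }

    $INC{'DemoSocket6.pm'} = __FILE__;
}

# Default import: brings in AF_DEMO.
use DemoSocket;

# Explicit import list: only demo_ntop is imported, so AF_DEMO is not
# imported a second time from the other module.
use DemoSocket6 qw(demo_ntop);

my $family = AF_DEMO;
print "$family ", demo_ntop(0xfe80, 1), "\n";
```

The catch with an explicit list is the one described in the rest of this entry: every function the code actually calls has to be on it, and a forgotten name only shows up when that call is reached.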
Today, when I woke up and checked my email, I was surprised to discover that about 20 spam mails had landed in my inbox. As it turned out, they didn't have the SpamAssassin headers. Running spamc from the command line yielded an unmodified message, while the standalone spamassassin program worked perfectly. So I started to investigate why spamc and spamd failed.
I ran into some trouble trying to point spamd to a modified IO::Socket::INET6 module, because spamd runs in taint mode and so does not read the PERL5LIB environment variable (this is documented in perlrun). I had to modify the sources and use "use lib".
I also had a lot of trouble trying to see why IO::Socket::INET6 fails and where. This eventually was resolved by running spamd like this:
perl -T blib/script/spamd -c -m5 -H --syslog="stderr info"
Then I could see the error clearly. Apparently I missed importing inet_ntop() from the Socket6 module, which was called by some functions that were not run in the unit tests, and so it was only detected when spamd ran. This is one disadvantage of dynamic, symbolic languages such as Perl, and can be prevented in statically compiled languages such as C and, as far as I know, Java as well.
Well, but since the code in question was Perl, I just added tests for the functions calling inet_ntop() and added it to the imports. After installing the modified module, spamd worked again, and I submitted a new patch for inclusion into Mandriva. So the enhancements now are that the warnings have been removed, the tests pass and the test suite has been made more robust.
As these modifications may be useful for other platforms besides Mandriva, I sent a message to the maintainer and other designated parties regarding resuming the maintenance of IO::Socket::INET6.
Some other lessons from this:
Right now I'm cranky and happy at the same time, because I've spent half-a-day on this bug.
I'd like to announce that I finished transcribing the Perlcast Interview with Tom Limoncelli, with some help in correcting mis-transcribed phrases from "fax". There are still some phrases I'm uncertain of in the document, which are marked with "[?]" or "FILL IN" (and I could use some help in correcting), but as a general rule, the transcript should be usable.
I've sent an email about it to Josh McAdams (who publishes Perlcast), but he has not gotten back to me yet. But you may still enjoy the transcript as it is on the perl.net.au wiki.
On a similar note, I was referred to this talk about "Getting things done" (which I did not watch yet) by a friend, and "Getting Things Done" is mentioned in the Tom Limoncelli interview.
OK, first of all, Happy New (Civil) Year to everybody. Then, I'd like to note that I enjoyed the Israeli 2007 Perl Workshop that I attended yesterday a lot, and would like to thank all the organisers for making it happen. I posted some notes from topics we discussed in the conference to the mailing list, so you may find it interesting to read them. I may post a more thorough report later on.
Now, to the main topic of this post. I was on Freenode's #perl the other day, when we were discussing how to count the number of lines in a file. Someone suggested opening the file, and then using <$fh> and counting the number of lines. Someone else suggested trapping the output of wc -l. Then someone argued that trapping the output of wc -l is non-portable and will incur a costly fork. But is it slower?
To check, I created a very large text file using the following command:
locate
xargs cat > mega.xml
Here, I located all the files ending with
$ LC_ALL=C wc mega.xml
195594 1704386 17790746 mega.xml
Then I ran the following benchmark using it:
#!/usr/bin/perl

use strict;
use warnings;

use Benchmark ':hireswallclock';

sub wc_count
{
    my $s = `wc -l mega.xml`;
    # Extract the leading line count from the output of wc.
    $s =~ /^\s*(\d+)/
        or die "Could not parse the output of wc";
    return $1;
}

sub lo_count
{
    open my $in, "<", "mega.xml"
        or die "Cannot open mega.xml: $!";
    local $.;
    while (<$in>)
    {
    }
    my $ret = $.;
    close($in);
    return $ret;
}

if (lo_count() != wc_count())
{
    die "Error";
}

timethese(100,
    {
        'wc' => \&wc_count,
        'lo' => \&lo_count,
    }
);
The results?
shlomi:~/Download$ perl
Benchmark: timing 100 iterations of lo, wc...
lo: 18.0495 wallclock secs (16.72 usr + 1.17 sys = 17.89 CPU) @ 5.59/s (n=100)
wc: 3.70755 wallclock secs ( 0.00 usr 0.03 sys + 1.77 cusr 1.91 csys = 3.71 CPU) @ 3333.33/s (n=100)
The wc method wins and is substantially faster. It's probably because wc is written in optimised C, and so counts the lines faster, despite the fact it had forked earlier.
For small files, the pure-Perl version wins. But for large files, wc is better. But naturally, it's not portable, which may be a deal-breaker in some cases.
The lesson of this is that forking processes or calling external programs is sometimes a reasonable thing to do (as MJD noted earlier in the link).
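For cases where portability matters, one possible middle ground is to stay in Perl but read the file in large blocks and count the newline characters with tr///, instead of reading it line by line. This variant was not part of the benchmark above, so no timing is claimed for it; block_count() and the 64 KB block size are choices made up for this sketch.

```perl
use strict;
use warnings;

use File::Temp qw(tempfile);

# Count lines by reading fixed-size blocks and counting "\n" characters.
# Like wc -l, a final line without a trailing newline is not counted.
sub block_count
{
    my ($path) = @_;

    open my $in, '<:raw', $path
        or die "Cannot open $path: $!";

    my $count = 0;
    my $buf;
    while (read($in, $buf, 64 * 1024))
    {
        $count += ($buf =~ tr/\n//);
    }
    close($in);

    return $count;
}

# Small self-check on a generated temporary file of 1000 lines.
my ($fh, $tmp_path) = tempfile(UNLINK => 1);
print {$fh} "line $_\n" for 1 .. 1000;
close($fh);

print block_count($tmp_path), "\n";
```

To compare it with the other two approaches, it could be added as a third entry to the timethese() call, with the path changed to mega.xml.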
This is a reminder that the Israeli Perl Workshop for 2007 will take place next week on Monday, 31 December 2007. If you're planning to attend, please register at the site, and make the payment.
Here is another cool use for Perl: pre-processing the HTML/JS/CSS markup of poorly-written sites before it reaches the web browser. In this post, I'll tell the story of how I ended up writing the yjobs-proxy markup-transforming proxy using CPAN's HTTP-Proxy to make www.yjobs.co.il work with Firefox on my Linux system.
It all started when I was job-hunting, and was dismayed to discover that there were much fewer Info-Tech job ads in the newspaper's "Wanted Ads" section than there used to be. The section proudly announced that now it has an Internet counterpart - www.yjobs.co.il. But much to my disappointment, it didn't work in my Linux-based, open-source browsers.
I almost immediately thought of writing a Greasemonkey script to whip the JavaScript code there into a shape where it can work with Firefox. Eventually, I started writing it, and looked for a way to inject new declarations of JavaScript functions into the page, to replace the existing and broken ones. I found a way to do that, but it turned out to have some limitations due to the architecture of Greasemonkey and the way it interacts with the page.
After thinking about it for a moment, I realised I could achieve the same thing by transforming the code that Firefox receives from the site into a more agreeable version. So I thought of a transforming proxy. Someone here on use.perl.org mentioned HTTP::Proxy in one of his posts, so I went to check it out and see if it can solve my problems.
Meanwhile, I was distracted and delayed a bit by investigating this X Server bug. But then I resumed work on the proxy. HTTP-Proxy turned out to be a great way to implement what I had in mind, but I still ran into a few problems (which weren't HTTP-Proxy's fault).
The first one from what I recall was that it refused to filter JavaScript code.
As it turned out yjobs sent the "Content-Type:" of the JavaScript code either
as "application/x-javascript" or an undefined one, while I used
"text/javascript". I ended up filtering them by the
Then I ran into a problem where a variable called "Data" was assigned to, but not used anywhere else. As it turned out, my logging proxy, which I used to dump all the traffic, did not log the particular script where it was made use of. Maybe Firefox cached it. After that, I found out where it was used and used the Venkman JavaScript debugger to debug the problem I had getting it displayed on the page. It was fixed using a JavaScript transformation specific to that particular script.
Another problem I encountered was that an original function was called despite the fact that I overrode it at the bottom. As it turned out, this happened because it was invoked before the JS interpreter reached the definition at the end. Like this code:
<html>
<body>
<script>
function mytest()
{
    return "FirstFoo";
}
var myvar = mytest();
</script>
<h1 id="put_here">Put Here</h1>
<script>
document.getElementById("put_here").innerHTML = myvar;
</script>
<script>
function mytest()
{
    return "Second";
}
</script>
</body>
</html>
This was resolved by transforming the JS code in the original function.
Eventually, I got it working enough. Then I cleaned up the proxy code, and released it for the world's consumption.
My future plan for this proxy is to investigate a way to implement it as a Firefox extension that will transform the markup from within Firefox.
A fellow Perl programmer I talked with on AIM, whom I pointed to the download page, said that "that's nucking futs, man" and then that "oh, it's cool. I just mean, that's pretty crazy. A proxy to make a site work... crazy. and awesome.".
So this is one way Perl has given Power to the People. Hack on!
Today, Perl has its 20th Birthday. I'd like to thank Larry Wall and the rest of the perl and CPAN developers for actively developing such wonderful technology. Happy Birthday!