tsee
http://steffen-mueller.net/

You can find most of my Open Source Perl software in my CPAN directory [cpan.org].

Journal of tsee (4409)

Sunday August 31, 2008
06:35 AM

Why our involvement in Summer of Code 2008 was a success...

... where "our" means The Perl Foundation organization's, but the subject got too long. Also, keep in mind that this is a very subjective piece of text.

Initially, I was made aware that there was a Google Summer of Code again this year, and that there could (and should) be a Perl-specific organization in it, by Eric Wilhelm's short reminder mail to the perl5-porters list. Without re-reading the ensuing thread, I seem to remember that the discussion that started from there was semi-productive and that nobody really volunteered. But I guess that's kind of the norm in public discussions.

So Eric just went ahead and pulled it off regardless. He collected proposals, wrote Wiki pages, and announced it to other lists. It turned out that collecting proposals was relatively easy: once he got the word out, the proposals wiki page filled up quite quickly. He went on to talk to the TPF people and the previous GSOC administrators for the TPF. With their help, he tried to pin down what went well and not so well in past years and devised a scheme which would deal more gracefully with failures similar to those that had happened before.

As far as I can tell, many problems in the past arose from the fact that central figures either underestimated the huge heap of work involved in managing the GSOC administration or were sidetracked by other, non-negotiable obligations.
Essentially, I think, the three most important pieces of the plan for 2008 were

  1. an effective way of recovering from the disappearance of any single person involved,
  2. a good student application template,
  3. an administrator who put in a lot of effort.

What Eric did to reduce the insane work load on the central figure was to add an extra layer of people between the mentors and the admin. He split the organization's involvement into different departments and appointed pilots and backup pilots (see point 1) for each of them. For example, for the Perl 6 branch, Jerry Gay and Will Coleda shared the responsibility. This extra layer of people not only reduced the burden on the admin; they were also chosen to be more experienced with their departments than the admin ever could be. Similarly, a backup was sought for each mentor.

Eric et al. went on to write the organization's proposal for getting into the program at all. Needless to say, we were accepted, or I wouldn't be writing this. Furthermore, a template for the student applications was prepared. Having this template turned out to be crucial for weeding out bad applicants: if they weren't able to read, understand, and answer the questions posed, they likely wouldn't be able to do a GSOC project either. If nothing else, because communication is a key ingredient of success.

Still, guiding the would-be applicants took a lot of effort. This probably was the largest piece of work Eric off-loaded to the departmental pilots, as they were listed as the contact persons for applicants.

Next up was finding the right mentors. Here, the Perl 6 people really showed their commitment: finding mentors for the Perl 6 and Parrot related projects seemed to take virtually no effort at all. Generally, there were more than enough mentors available, but getting the names nailed down for some of the proposals took some time.

Once the applications were in shape and ready to be ordered by preference (we couldn't expect to get them all funded), the most disputatious part of the process started:

Who gets to choose which projects get funding?

Just dwell on that for a bit. It's easy to find good criteria: probability of success, usefulness to the greatest number of people, probability that the student will stick around after the summer, and so on. But those don't really help much. Depending on who you ask, all three of those criteria could be interpreted in favour of or against almost every single application. Needless to say, every mentor liked his student's application best!

Ideas for a mode of selecting the best applications were batted around IRC for a while, and in the end, a semi-democratic process was chosen: all mentors got a fixed number of votes which they had to spread among the proposals. Given the outcome of that, Eric reserved the right to veto an application or to move one up or down a bit.

While that may sound arbitrary, it really wasn't. There was no good way to predict the outcome, and when every voter has a personal agenda, the democratic process doesn't necessarily produce the best ranking. However, it was clear that some applications were very good and needed to be included. Having Eric as a fallback fix for this seemed the least problematic solution. I think this whole selection process was the most fragile step of all, because it had the potential to alienate some of those involved. I guess everybody reading this has an idea of what confusion, disappointment, and anger do to a volunteer!

We got funding for five projects out of the fourteen which (if you ask me) should have been funded. Why not more? Because Google can't fund everything, and the core of their algorithm is to spread the slots among organizations according to their number of student applications. Through some extra haggling, we got a special sixth slot for the Bricolage project.

After the announcement of the accepted projects, a so-called community-bonding period started. During that time, the students were asked to get familiar with the tools, get to know the people involved, and, if at all possible, make themselves visible in the community.

When the coding period started, things seemed to go reasonably smoothly. Getting reports from the students and mentors was more work than it should have been, and Eric, again, had to do a lot of that. Some students (and mentors) were good at communicating their progress, some weren't. Maybe we should have found a way to make the students' work more visible. What was your impression, dear reader: did you follow along?

In the end, five out of six projects have been successful. I think that is an extraordinarily good result.

Looking towards next year, what needs to be improved over this summer?

Maybe you caught the problem with the master plan I outlined in three bullet points above: points 1 and 3 contradict each other. If Eric had disappeared during, say, the student selection process, our involvement in the SOC might have fallen flat on its face. I was told that something like that had happened before. To my knowledge, Eric put in more than a whole month of full-time work. We can't rely on anybody doing that next year. So we need to find a way to share more of the burden among more people and make every single person involved non-crucial, if at all possible.

Furthermore, I think we need to increase the size of our student pool. We had many more mentors and projects than applications, so we have to find a better way to reach out, ideally to universities. This is a bigger problem than just the TPF GSOC involvement, however.

As a minor nit, maybe the project proposals should have been a little more elaborate and glossy. Who wants to rewrite Module::ScanDeps because it's horrid? (That was my proposal, so I can trash-talk it at leisure.) I should have mentioned the shiny glitz of working with the best Perl introspection tool in existence (PPI), or something along those lines. Everybody knows dependency scanning is fun!

Finally, maybe we can find a more efficient application selection scheme. Maybe we can make it so that a single person doesn't have to make the final decision. It's not a fun thing to do.

I doubt many readers are left at this point. Regardless, I'd like to extend my thanks not only to Eric, whose work I have praised enough above, but also to the mentors for dedicating their spare time, and to Google for funding the whole program. Most of all, however, I want to congratulate the successful students on having the stamina to see their projects through. You've done great work and I really hope you'll stay involved.

Thanks for reading.

Steffen

Tuesday July 15, 2008
03:03 PM

YAPC::EU talk on PAR

I'm currently preparing the slides for my "Application deployment and dependency management with PAR" talk at YAPC::EU in Copenhagen.

There are a couple of things on my mind which I want to talk about. However, I realized that my view of what's important and/or interesting may be quite different from what other people think.

Hence, I'd like to give you the opportunity to tell me what you'd care about in the context of the suite of PAR modules. This way, I might get a somewhat better idea of what people other than myself would expect from a talk on the topic. Feel free to post a reply or send me an email (my PAUSE ID is SMUELLER).

Thanks for your input!

Steffen

Friday April 04, 2008
12:42 PM

Class::XSAccessor - A saner, manual AutoXS with more Fast

In the last journal entry, I told a lengthy story about an XS and perl API learning experiment that resulted in a module that scans subroutine opcodes at run-time to find candidates for replacing with an XSUB.

Now, the trouble with that approach is that scanning the op tree for such methods takes a long time. It's a cool idea, and maybe even sensible if you have an existing system with a lot of code that is long-running and needs to be as fast as possible, but let's face it: no sane person would want to add something as fragile as an op-tree scan to such a system. (Though you have nothing to lose except compilation time.)

So I ripped out the XS code with the cleverish fake currying that generates the getters, put it in its own module, added an implementation of setters (that was really trivial in XS), and uploaded it to CPAN. The result is potentially the fastest set of accessors you can get for Perl hash-based objects, barring hand-rolled XS. And even that would only save a C array access, two C struct accesses, and perhaps some slight book-keeping.
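
For illustration, here is roughly what using the module looks like. This is a sketch from memory of the interface at the time; the module's documentation is authoritative, and the option names may have changed since:

package My::Class;

# Generate fast XSUB getters/setters for a hash-based object.
use Class::XSAccessor
    getters => { get_name => 'name' },   # get_name() reads $self->{name}
    setters => { set_name => 'name' };   # set_name($v) writes $self->{name}

sub new { bless {}, shift }

package main;
my $obj = My::Class->new;
$obj->set_name('fred');
print $obj->get_name(), "\n";   # prints "fred"

The generated methods behave like the usual hand-written hash accessors; they are just implemented as compiled XSUBs instead of Perl subs.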

On a related note, I continued hacking on B::Utils which I used for the original AutoXS hack.

For all readers who'd rather jump from the nearest TV broadcasting tower than look at the modules that start with a B: B::Utils provides convenient tools to inspect the op tree that is generated by the perl compiler. Among those, there is a function called opgrep which - you guessed right - matches conditions against an op tree. These conditions are specified as nested hash structures that resemble the op tree itself.

In the past days, I added a method to the B::OP objects that can dump the op tree as such a pattern for use with opgrep(). This should make it much easier to extend the AutoXS module for more complicated scanning. Additionally, opgrep() can now, while scanning, extract ops from within the op tree that it is traversing. That way, it's no longer necessary to walk the op tree yourself to extract information after you've verified that the tree matches your expectations.
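
To give a flavour of what such a condition looks like, here is a small sketch. The nested pattern describes the typical top of a subroutine's op tree (a leavesub op whose first child is a lineseq); treat the exact pattern as illustrative rather than authoritative:

use B qw(svref_2object);
use B::Utils qw(opgrep);

my $code = sub { my $self = shift; $self->{name} };
my $root = svref_2object($code)->ROOT;

# Nested hashes mirror the op tree: 'name' matches the op's name,
# 'first' descends into its first child.
my @matches = opgrep(
    { name  => 'leavesub',
      first => { name => 'lineseq' },
    },
    $root,
);
print "op tree matches the pattern\n" if @matches;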

Cheers,
Steffen

Tuesday April 01, 2008
02:32 PM

AutoXS.pm: A learning experiment in XS and perlapi

I'm a relatively long-time reader of the perl5-porters mailing list. Somewhat recently, Nicholas Clark posed a few small challenges intended to draw more people into Perl core development. I thought it was a great idea but couldn't follow up on it at the time. Having said as much on the #p5p IRC channel, I thought I should at least learn a bit more about the Perl core and XS. While not the same thing, I presume that strong knowledge of the XS/Perl API would be a jump start to understanding the core.

Skip ahead a few weeks. I have since submitted my thesis, gone on vacation, and started a new job. But still no progress on my plan to learn XS. Until yesterday: I was idly playing with the B and B::Utils modules when I had a pretty good idea for an interesting learning and experimentation project: AutoXS.

Essentially, the idea started out with using B to scan a running Perl program for subroutines or methods of a particular type. Typically, the simplest and most recurring methods are accessors for hash-based objects. (Just search CPAN for accessor generators...) The next step is to replace the identified subroutines with precompiled XSUBs that accomplish the same task but, having been written in C, do so faster.

For simple accessors, the task seems straightforward at first: write the XS code to access a value in a hash reference which is stored on the stack. (Even that took me surprisingly long and a lot of patient help from the friendly people on #p5p to get right. Thanks!) But where's the hash key coming from? You can't expect the user to pass it in as an argument, because that defeats the purpose. You can't know the key name at XS compile time, because that's when the module is built. You also currently cannot find out the package and name through which the current method/subroutine was called. So what's the answer? Something like currying. I don't think I need to explain to anyone what that is, but maybe I should mention that this is C, not Haskell or Perl. C doesn't have currying.

The solution took some time in coming. The XS ALIAS keyword allows for compile-time aliases to a single XSUB. The aliases can be distinguished from within the XSUB by means of an int variable whose value is associated with each alias. (A poor explanation, I admit; have a look at perlxs for a better one.) This doesn't get us all the way to currying, though. I had a look at the generated C code and realized that I could write similar code of my own and assign new values of that magical integer to each new alias of the accessor prototype at run time (CHECK time, really, but run time would work, too). Then, all that was left to do was to put the hash key for each new alias into an array indexed by those numbers. Voila: fake currying for that XSUB.
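
For readers who would rather see this in Perl than in XS, here is a pure-Perl sketch of the same idea; all names here are made up for illustration. In Perl, a closure could simply capture the key directly, but the indirection through an integer and a key array is exactly what the C side needs, since C has no closures:

# One shared body plus an integer per generated alias: the integer
# plays the role of the XS 'ix' alias variable.
my @hashkeys;    # alias index -> curried hash key

sub _shared_accessor_body {
    my ($self, $ix) = @_;
    return $self->{ $hashkeys[$ix] };
}

# install_accessor('My::Class', 'name') creates My::Class::name()
sub install_accessor {
    my ($package, $key) = @_;
    push @hashkeys, $key;
    my $ix = $#hashkeys;    # the "magical integer" for this alias
    no strict 'refs';
    *{"${package}::${key}"} = sub { _shared_accessor_body($_[0], $ix) };
}

install_accessor('My::Class', 'name');
my $obj = bless { name => 'fred' }, 'My::Class';
print $obj->name(), "\n";    # prints "fred"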

By now, it all actually works. The scanner identifies quite a few typical read-only accessors. The XSUBs are, according to my crude measurements, between 1.6 and 2.5 times faster than the original accessors. If you're calling those accessor methods in a tight loop, that might actually make a bit of a difference. I wrapped it up in a module, AutoXS, and gave it the best interface ever. That is, none. You just say

use AutoXS::Accessor;

to get the accessor scan for all methods in the current package. More seriously, one could let the user flag eligible methods or even apply the scan globally. But that's not the point. It's just kind of fun that it works at all.

Cheers,
Steffen

Thursday December 27, 2007
12:10 PM

perl 5.12, strict by default

So I did it. I proposed to the perl5-porters that we should enable "use strict" by default for some future code. This may be a little less preposterous than it sounds at first, so please hold off on hitting the "reply" button until you've read the whole of this.

My proposal basically goes as follows:

  • Add a feature called "strict" to feature.pm.
  • Include that feature into the set of default features which are loaded by "use feature ':5.11'" or even just "use 5.11.0;".
  • Add a special case for the -E switch to perl so strictures aren't enabled by default for one-liners.

I'll include my original rationale here:

Personally, I've always wanted perl to have strictures on by default for my code. I would think that 95% of all code bases which were written in this century and which are of non-negligible size import "strict". I don't use strictures for one-liners, of course, but for anything else it's a must. It seems to me like others have similar views on this. Try posting some code without "use strict" to some newsgroup or forum and ask for help. Make sure not to give out your email address, though.

"use 5.10.0;" already auto-imports feature.pm and loads the 5.10 specific features.

How about having "use 5.11.0;" (or 5.12.0) automatically import strict along with the 5.10+5.11 feature set? Naturally, the -E switch for one-liners should *not* do that.

This would *not* break backwards compatibility. This would not affect one-liners. This would optimize for the common case: If you write enough code to make importing optional features worthwhile, odds are very high you'd be importing "strict" anyway. The 5% who need to disable strictures again can still add a "no strict;" statement.

strictures-correct code has been best-practice for a long time now. Let's make it the default for *new* code.

Just think of strictures as a feature. It just makes sense.
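
To illustrate the proposed semantics (the version numbers are the proposal's; none of this was implemented at the time of writing):

# Under the proposal, asking for the 5.11/5.12 feature bundle
# would also enable strictures:
use 5.11.0;          # would imply "use strict" plus the feature bundle

$x = 42;             # would then die at compile time:
                     # Global symbol "$x" requires explicit package name

no strict 'vars';    # the escape hatch for code that really wants it
$y = 23;             # fine again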

To my surprise, the proposal has been received with very positive feedback. So I wrote the patch.

With some luck, we'll get strictures by default in 5.12! Flame away!

Cheers,
Steffen

Sunday January 14, 2007
10:50 AM

Evil PAR tricks, issue 0: Binary including a binary

In my journal entry from December 22, 2006, I said I'd used a few hacks to package parinstallppdgui into an executable binary using PAR::Packer. I'll explore that further here:

parinstallppdgui is mostly a GUI front-end to parinstallppd. It doesn't use PAR::Dist::InstallPPD internally, but uses system() to run parinstallppd, so if the child process bombs out, the GUI process can show a friendly warning to the user (comprised of the STDERR and STDOUT of the child process) instead of crashing. That's all very simple if you have perl on your system, of course.
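
Capturing the child's output for that friendly warning could look roughly like this. IPC::Run3 and the show_error_dialog() helper are illustrative assumptions here, not necessarily what parinstallppdgui actually does:

use IPC::Run3 qw(run3);

my $uri = shift @ARGV;

# Run the child tool, capturing both output streams so the GUI can
# show them to the user instead of crashing along with the child.
my ($stdout, $stderr);
run3( [ 'parinstallppd', '--uri', $uri ], \undef, \$stdout, \$stderr );

if ( $? != 0 ) {
    # show_error_dialog() stands in for whatever the GUI toolkit offers
    show_error_dialog("parinstallppd failed:\n$stdout\n$stderr");
}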

Now, if you want to package parinstallppdgui, or any other Perl script that runs another Perl script, into an .exe, you'll quickly find out that without a perl on the target system, the two scripts cannot share the same interpreter. The obvious "solution" is to package each of them into its own .exe and ship both. This has several problems. First, you need to ship two executables instead of one. Second, the first executable won't necessarily know where to find the second if the user doesn't put them in PATH; adding a couple of FindBin hacks isn't great at solving this, either. Third, the two executables will have a lot in common, starting with all the core Perl modules.

So I took a slightly better, yet more complicated route. The process is as follows:

  1. Add a special case to the parent's source code for execution from inside a PAR binary:

    if (defined $ENV{PAR_TEMP}) {
        require Config;
        require File::Spec;
        $ENV{PATH} .= (defined $ENV{PATH} ? $Config::Config{path_sep} : '')
                      . File::Spec->catdir($ENV{PAR_TEMP}, 'inc');
        $ENV{PAR_GLOBAL_TEMP} = $ENV{PAR_TEMP};
    }

    This forces all PAR'd executables that are started by this process to use the same cache area as this process. This might be a dangerous thing to do if both binaries contain different, incompatible versions of some files. But we control everything here, so it should be okay.
  2. Package parinstallppdgui, the parent script:
    pp -o parinstallppdgui.exe parinstallppdgui
  3. Package parinstallppd, the child script:
    pp -o parinstallppd.exe -l expat parinstallppd
    (We include the expat library.)
  4. Remove all files from the child script that are also available from the parent. This makes the child dependent on the parent. Use the pare utility from the PAR::Packer distribution's contrib/ directory for this:
    pare -u parinstallppdgui.exe parinstallppd.exe
  5. Package the parent script again, but this time include the child script as an extra file into the executable:
    pp -o parinstallppdgui.exe -a parinstallppd.exe -l expat parinstallppdgui
  6. Ship the combined binary only.

Cheers,

Steffen

P.S.: A third and even better solution might be to package both of the scripts into the same binary and symlink or copy that binary to parinstallppd.exe and parinstallppdgui.exe and let PAR figure out what to run. This is cool if you have symlinks and sucks if you don't.

Saturday January 06, 2007
08:18 AM

CPANTS Distribution Checker (without PAUSE upload)

So CPANTS was down for a couple of weeks. This morning, I noticed it's back, but so far without the cron job that updated the data nightly.

For me, CPANTS has been a way to check my eighty or so CPAN distributions for packaging shortcomings. Unfortunately, that check happens only after I've made a release. So I installed Module::CPANTS::Analyse, which comes with cpants_lint.pl. Given a distribution file name, it runs the CPANTS metrics against it.
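
If you have Module::CPANTS::Analyse installed locally, the check itself is a one-liner; the distribution file name below is just a placeholder:

cpants_lint.pl My-Module-0.01.tar.gz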

Since Module::CPANTS::Analyse comes with quite a few prerequisites, I set up a simple web service which you can use to check your distributions before uploading. The interface is a little cumbersome, but I think it's worthwhile. Just upload your distribution at http://steffen-mueller.net/cgi-bin/cpants-limbo/check.pl and then visit the result URL given by that script after a few minutes.

The reason for the delay is that I didn't want to install all prerequisites on the hosting account and certainly didn't like running cpants_lint.pl from a CGI process. Thus, my local server fetches distributions-to-scan from there and uploads a result text every three minutes.

Cheers,
Steffen

Friday December 22, 2006
05:19 AM

Windows, the unloved stepchild... (when it comes to Perl)

Perl on Windows has always been a bit of a problem case. Windows doesn't usually come with (C) development tools, and its users are, on average, rather clueless. Additionally, a lot of Perl code is written on *nix-like systems, and many authors aren't aware of the portability pitfalls. Interestingly, the most common problem is path separators, which is reasonably simple to work around if you know File::Spec [1] or Path::Class [2].

I have been doing quite a bit of porting work recently and, in the process, have patched many CPAN modules to work on Windows [3]. Barring downright unportable stuff like Linux::*, the toughest problems stem from modules using signals, fork(), or some other form of IPC. For example, it's nigh on impossible to make Test::HTTP::Server::Simple work well on Windows because it needs to kill its forked children, which can make the Perl interpreter dump core on Windows.

But I'm digressing. I wasn't going to write about portability but about something else entirely:

Originally, the I-don't-have-a-C-compiler problem on Windows was somewhat solved by ActiveState with the Perl Package Manager (PPM) which comes with their ActivePerl distribution. They provide a variety of CPAN modules in pre-compiled form which you can install using PPM.

I say "somewhat" because that doesn't really satisfy me. Not that they aren't doing swell work, but it's hard for a single organization to keep up as pointed out by Ovid [4]. This is the reason I jumped at the Strawberry Perl project [5] which goes one step further and delivers a working (and free!) C compiler with the Perl distribution.

Strawberry Perl enables me to use the CPAN shell on Windows just as I do on Linux, and it works with 90% of all modules. Notable exceptions are those which require external libraries, of course. We are working on solving this issue, but it's not pretty. Until then, it might be tempting to use ActiveState's PPM tool to install those problematic modules, but it turns out there is only a very old version of it available outside their Perl distribution. The nice GUI which they introduced in PPM4 isn't available outside ActivePerl at all. It's a valid decision not to make the newer PPMs a free CPAN distribution, of course.

Frankly, this bothered me a little. So I wrote a module to convert PPM packages to PAR distributions [6]. Given the URI of a PPD document, it creates a .par file which you can install using PAR::Dist, without involving any PPM-related software at all. It's really quite simple:

ppd2par --uri http://theoryx5.uwinnipeg.ca/ppms/PDL.ppd -p 5.8

It's still a little more tedious than the nice PPM GUI, of course. So I wrote another tool which automatically appends the installation step [7]:

parinstallppd --uri http://theoryx5.uwinnipeg.ca/ppms/PDL.ppd

In some cases, it's necessary to tell it which implementation (5.6 or 5.8) to use because the repository maintainers do not provide great metadata, but it works fine for me. If you know how much work it can be to compile the PDL module, you will certainly agree! Still, this is not for the faint of heart, so yesterday I wrote a GUI front-end [8]. It is not a package manager like PPM, just the beginnings of a simple GUI front-end to the parinstallppd tool.

Since these tools require XML::Parser, which requires expat, which is an external dependency, I have built a stand-alone executable from parinstallppdgui and put it on my home page [9]. Getting it to work properly with PAR required some trickery; perhaps I'll explore that further in a separate journal entry.

So, wrapping it up, you can now enjoy the benefits of a Unixy Perl installation with the added benefit of installing precompiled PPM packages if necessary.

Cheers,
Steffen

  1. http://search.cpan.org/dist/File-Spec
  2. http://search.cpan.org/dist/Path-Class
  3. http://win32.perl.org/wiki/index.php?title=Vanilla_Perl_Problem_Modules
  4. http://perlmonks.org/index.pl?node_id=587162, http://use.perl.org/~Ovid/journal/31899
  5. http://win32.perl.org/wiki/index.php?title=Strawberry_Perl
  6. http://search.cpan.org/dist/PAR-Dist-FromPPD
  7. http://search.cpan.org/dist/PAR-Dist-InstallPPD
  8. http://search.cpan.org/dist/PAR-Dist-InstallPPD-GUI
  9. http://www.steffen-mueller.net/parinstallppd/parinstallppdgui.exe
Sunday December 17, 2006
06:26 AM

Perl Advent Calendar 2006-12-16

I just read today's Perl Advent Calendar entry (http://perladvent.pm.org/2006/16/) and found out that in it, Jerrad Pierce introduces one of my modules, Number::WithError (http://search.cpan.org/dist/Number-WithError). Yay!

Now, he certainly has interesting things to say, but to cut down on the dependencies, he suggests commenting out three lines of the module code. Sure, I mean, it's Free Software. You could do that and even release the same module with just this change under a different name! But he goes on to say:

If you'd rather avoid installing the first three, you can do so by commenting out the following lines in v0.06 of Number::WithError with no apparent side-effects

"With no apparent side-effects"?
The test suite is failing all over the place because you have just removed the implementation of the tangent function!

Now, to be fair, he has a point. The dependencies are, at this stage, just there for tangent, which would otherwise be

sub my_tan { sin($_[0]) / cos($_[0]) }
# or rather in the module's context
sub my_tan { CORE::sin($_[0]) / CORE::cos($_[0]) }

Originally, I intended to leverage the implementation of the various less-common trigonometric and hyperbolic functions found in Math::Symbolic and add them to the repertoire of Number::WithError. I haven't done so because I didn't need it at the time. Then I forgot about it. Doh.

If it were just the implementation of those ten or so functions, it would be questionable whether the dependencies (which are all pure-Perl, at least) are worse than duplicating a couple of functions. But there's more: Gaussian error propagation involves derivatives. Other than that, it's a straightforward formula. Math::Symbolic can compute derivatives for you. Got the idea?

Basically, one would get the implementation of most code in Number::WithError for free by generating the derivatives of the functions using Math::Symbolic and compiling them to Perl subroutines using Math::SymbolicX::Inline at package BEGIN time. This makes particular sense since hand-coding every routine is rather prone to small errors, and unless the reader does the math by hand, it's not necessarily obvious from the code why it does what it does.
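
As a rough sketch of that idea (the API details here are from memory; check the Math::Symbolic documentation before relying on them), here is a symbolic derivative of tan(x) used for single-variable Gaussian error propagation:

use Math::Symbolic qw(parse_from_string);

# f(x) = tan(x), spelled out so the parser handles it
my $f = parse_from_string('sin(x) / cos(x)');

# Gaussian error propagation needs df/dx:
my $x_var = Math::Symbolic::Variable->new('x');
my $df = Math::Symbolic::Operator->new('partial_derivative', $f, $x_var);
$df = $df->apply_derivatives()->simplify();

# For f(x +/- dx), the propagated error is |df/dx| * dx:
my ($x, $dx) = (0.5, 0.01);
printf "tan(%g) = %g +/- %g\n",
    $x, $f->value(x => $x), abs($df->value(x => $x)) * $dx;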

Fact is, I didn't do it back then. I don't even remember why; probably some time constraint or "let's get this working before I make it elegant" thinking. It doesn't matter. I don't think publicly suggesting that users comment out random code is a good idea. I'll forward the bug reports stemming from this if they surface. ;)

Steffen

Thursday December 14, 2006
10:41 AM

What if you find a bug in a module from CPAN?

What do you do if you find a bug in a module from CPAN?

(Supposing it's not your own.)

Personally, I think one should take the following steps:

  • Possibly investigate and come up with a fix if it's not too much to ask.
  • Check the module documentation for suggested methods of bug-reporting and act accordingly.
  • Unless otherwise noted, file a ticket in the RT queue of the module/distribution and take a minute to include all necessary details.

This is really where even most experienced people seem to stop. Unfortunately, that is often not enough. Not all module authors use RT: some deliberately ignore it, some don't like it, some don't know or understand it, and some have disappeared altogether. If you filed a bug against an actively maintained module and there is evidence that the author uses RT, that's enough. Otherwise:

  • Keep an eye on the ticket. If it remains untouched for a couple of weeks or months and there are no indications that the distribution is maintained, consider contacting the author/maintainer by other means, such as email. Of course, it is important to be tactful. Do not harass these people; they donated their work and time just as you do.
  • If the author has completely disappeared and the emails bounce, please consider contacting modules@perl.org about it.
  • If it's evident the module is unmaintained, you can then apply your fix (and, if you like, some outstanding tickets) and, with the blessing of modules@perl.org, upload a new distribution to CPAN.

Now, you might say that you don't have the time to do this. Nobody really does. But if you had the time to write a coherent bug report on RT, it would be a waste of your previous effort not to make sure the module actually gets fixed.

Getting in touch with authors can be tedious and time consuming, but most are willing but busy and gladly accept help. If you have a fix, consider donating a couple more minutes to apply it and roll a new distribution.

CPAN is Perl's greatest strength and the abundance and quality of software on CPAN is our best selling point. Let's keep it that way.

Thanks for reading!

Steffen

Update: Now mentioning that it's a good idea to check for the author's preferences concerning bug reports.