Alias (5735)

http://ali.as/

Journal of Alias (5735)

Wednesday June 17, 2009
12:02 AM

Request for Assistance - The most important toolchain bug

[ #39134 ]

Now that my blog is listed on the Ironman planet, Matt Trout has loaned me his chainsaw and suggested that asking for help here is likely to garner a response, because he says so.

So consider this an official request for assistance to help some overloaded developers fix what I consider to be the most important bug in the toolchain right now.

It's a bug in Archive::Extract, and it's probably not that much work, but neither Jos Boumans nor I have the free time right now to fix it.

The bug in question is that when Archive::Extract uses Archive::Tar to unroll a tarball, it uses the wrong API. Instead of using the memory-efficient streamed extraction API to roll the whole tarball out to disk directly, it loads the whole thing into memory and unpacks it from there.

It should probably use code similar to the implementation of Ivor's Archive::Tar::Streamed instead.

http://search.cpan.org/~ivorw/Archive-Tar-Streamed-0.03/
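To make the difference concrete, here is a minimal sketch of streamed extraction using the ->iter class method documented in recent Archive::Tar releases. It is not the actual Archive::Extract fix. The helper name stream_extract and the demo file names are hypothetical, and the fallback branch simply assumes that an older Archive::Tar without ->iter should keep using the in-memory API.

```perl
use strict;
use warnings;
use Archive::Tar;
use Cwd qw(getcwd);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper (not part of Archive::Tar or Archive::Extract):
# extract $archive into $dest, one entry at a time when this version of
# Archive::Tar provides ->iter, otherwise via the in-memory API.
sub stream_extract {
    my ( $archive, $dest ) = @_;
    $archive = File::Spec->rel2abs($archive);

    # Entries are extracted relative to the current directory.
    my $cwd = getcwd();
    chdir $dest or die "Cannot chdir to $dest: $!";

    my @names;
    my $ok = eval {
        if ( Archive::Tar->can('iter') ) {
            # Streaming path: the iterator hands back one
            # Archive::Tar::File object per call.
            my $next = Archive::Tar->iter($archive)
                or die "Cannot open $archive: "
                . ( Archive::Tar->error || 'unknown error' ) . "\n";
            while ( my $entry = $next->() ) {
                $entry->extract
                    or die "Cannot extract " . $entry->full_path . "\n";
                push @names, $entry->full_path;
            }
        }
        else {
            # Fallback: reads the whole archive into memory first.
            my $tar = Archive::Tar->new($archive)
                or die "Cannot read $archive: "
                . ( Archive::Tar->error || 'unknown error' ) . "\n";
            $tar->extract or die "Extraction failed: " . $tar->error . "\n";
            @names = $tar->list_files;
        }
        1;
    };
    my $error = $@;

    # Always restore the original working directory before reporting errors.
    chdir $cwd or die "Cannot chdir back to $cwd: $!";
    die $error unless $ok;
    return @names;
}

# Demo: build a tiny uncompressed tarball, then extract it by streaming.
my $work    = tempdir( CLEANUP => 1 );
my $archive = File::Spec->catfile( $work, 'demo.tar' );
my $dest    = File::Spec->catdir( $work, 'out' );
mkdir $dest or die "Cannot create $dest: $!";

my $tar = Archive::Tar->new;
$tar->add_data( 'demo/a.txt', "alpha\n" );
$tar->add_data( 'demo/b.txt', "beta\n" );
$tar->write($archive) or die "Cannot write $archive: " . $tar->error;

my @extracted = stream_extract( $archive, $dest );
print "$_\n" for @extracted;
```

With the streaming branch, peak memory should be governed by the largest single entry rather than by the whole archive, although that has not been measured here.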

This is a big problem because, once all the memory inflation and memory copying needed for this loading has happened, a couple of big pathological distributions on CPAN consume almost the entire 2 GB memory limit of the (32-bit) process.

This bug makes CPAN on Win32 much slower and more memory-hungry. Worse, it takes CPAN::Mini::Visit over the process limit and crashes it, which means this bug is currently blocking work on the GreyPAN scanner experiments (Perl::Metrics2), the META.yml database ORDB::CPANMeta, the permissions-aware replacement for the rather unreliable CPANTS dependency graph (CPANTS::Weight), my unified CPANDB SQLite index, and the sorely-needed accuracy fixes for the Top 100 website.

Improving almost all of these things requires both accurate and 100% complete coverage of minicpan in order to give answers that are good enough to swap out the original first-generation implementations, and this one relatively approachable bug is making it impossible to reliably reach 100% coverage.

Because this bug disproportionately impacts Win32 and affects a core module, it is also very important for the July release of Strawberry Perl, as well as the Perl 5.10.1 release.

If anyone out there has a few hours to attack this bug and fix it, your efforts will have a huge knock-on effect on the quality of many other parts of the CPAN ecosystem.

If you are able to help us out, you can find Jos (kane), myself (Alias), or others who can point you in the right direction in #toolchain on irc.perl.org.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • I don't do IRC, so I'm asking here instead. Besides, I gather this information will be interesting to other people who are interested in this project.

    How do you propose we fix this?

    • Archive::Tar itself ALREADY contains the required functionality according to Jos, in the ->iter interface.

      What's missing is that Archive::Extract isn't actually using the streaming API, it's using the in-memory API.

      So Archive::Extract needs to recognise when the Archive::Tar version is high enough to support ->iter and then preferentially use that.

      • Urm, yeah, apparently I succeeded in skipping over one of the modules: Archive::Extract [cpan.org]: I immediately forgot about it after I read your post, likely because I think I wouldn't have done it that way.

        Well, anyway, it's clear now what you expect to be done.

  • Hi, I posted a patch in #toolchain that integrates Archive::Tar->iter, with tests.