Dear CPAN Testers,
As per several long discussion threads on the perl-qa and cpan-testers-discuss mailing lists, and per the plan I posted the other day, today I took steps to upgrade CPAN Testers tools to be less annoying to the CPAN author community:
Released Test::Reporter 1.50 -- authors will no longer be copied on reports (the send() method ignores any arguments) -- this is a key step to move us to a centrally administered "opt-in" notification approach
Released CPAN::Reporter 1.17 -- failures during PL/make stages will be graded UNKNOWN; removed "cc_author" option and related code; added checks to discard failures due to "Build -j3" and "Makefile out of date" errors.
Created a CPANPLUS repository branch -- PL/make failures will be graded UNKNOWN; "dontcc" option has been removed; output buffer will be included with NA reports. Jos Boumans committed to review and incorporate the CPANPLUS changes in the next release.
All testers should upgrade at least Test::Reporter. Users of CPAN::Reporter should upgrade to 1.17.
Please test these out manually before turning them loose with automated smoke testers.
If anyone would like to explore or test the CPANPLUS branch, it can be found here:
As always, let me know what bugs you find and I'll try to address them as soon as I can.
I know Barbie is hard at work on a notification program based on the CPAN Testers database so author notification should resume soon.
There have been some mega-email threads about CPAN Testers on the perl-qa mailing list that started with a question about the use of 'exit 0' in Makefile.PL.
I want to sum up a few things that I took away from the conversations and propose a series of major changes to CPAN Testers. Special thanks to an off-list (and very civil) conversation with chromatic for triggering some of these thoughts.
Type I and Type II errors
In statistics, a Type I error means a "false positive" or "false alarm". For CPAN Testers, that's a bogus FAIL report. A Type II error means a "false negative", e.g. a bogus PASS report. Often, there is a trade-off between these. If you think about spam filtering as an example, reducing the chance of spam getting through the filter (false negatives) tends to increase the odds that legitimate mail gets flagged as spam (false positives).
Generally, those involved in CPAN Testers have taken the view that it's better to have false positives (false alarms) than false negatives (bogus PASS reports). Moreover, we've tended to believe -- without any real analysis -- that the false positive *ratio* (false FAILs divided by all FAILs) is low.
But I've never heard a single complaint about a bogus PASS report and I hear a lot of complaints about bogus FAILs, so it's reasonable to think that we've got the trade-off wrong. Moreover, I think the downside to false positives is actually higher than for false negatives if we believe that CPAN Testers is primarily a tool to help authors improve quality rather than a tool to give users a guarantee about how distributions work on any given platform.
False positive ratios by author
Even if the aggregate false positive ratio is low, individual CPAN authors can experience extraordinarily high false positive ratios. What I suddenly realized is that the higher the quality of an author's distributions, the higher the false positive ratio.
Consider a "low quality" author -- one who is prone to portability errors, missing dependencies and so on. Most of the FAIL reports are legitimate problems with the distribution.
Now consider a "high quality" author -- one who is careful to write portable code, specify dependencies well and so on. For this author, most of the FAIL reports only come when a tester has a broken or misconfigured toolchain. The false positive ratio will approach 100%.
In other words, the *reward* that CPAN Testers has for high quality is increased annoyance from false FAIL reports with little benefit.
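To make the arithmetic concrete, here is a minimal sketch in Perl. The counts are invented purely for illustration (they are not real CPAN Testers data), and the author names and the false_positive_ratio helper are hypothetical. Both authors receive the same number of toolchain-induced FAILs; only the number of legitimate FAILs differs.

```perl
use strict;
use warnings;

# Hypothetical counts, invented for illustration only.
# Both authors get the same number of bogus (toolchain-induced) FAILs,
# but very different numbers of legitimate FAILs.
my %authors = (
    low_quality  => { real_fails => 45, bogus_fails => 5 },
    high_quality => { real_fails => 1,  bogus_fails => 5 },
);

# False positive ratio as defined earlier: false FAILs divided by all FAILs.
sub false_positive_ratio {
    my ($counts) = @_;
    my $all_fails = $counts->{real_fails} + $counts->{bogus_fails};
    return 0 if $all_fails == 0;    # no FAILs at all, so no false alarms
    return $counts->{bogus_fails} / $all_fails;
}

for my $name (sort keys %authors) {
    printf "%-12s %.0f%%\n", $name, 100 * false_positive_ratio( $authors{$name} );
}
```

With these made-up numbers the low quality author sees a ratio of 10%, while the high quality author sees roughly 83%, even though both received exactly the same number of bogus reports.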
Repetition is desensitizing
From a statistical perspective, having lots of CPAN Testers reports for a distribution even on a common platform helps improve confidence in the aggregate result. Put differently, it helps weed out "outlier" reports from a tester who happens to have a broken toolchain.
However, from an author's perspective, if a report is legitimate (and assuming they care), they really only need to hear it once. Having more and more testers sending the same FAIL report on platform X is overkill and gives yet more encouragement for authors to tune out.
So the more successful CPAN Testers is in attracting new testers, the more duplicate FAIL reports authors are likely to receive, which makes them less likely to pay attention to them.
When is a FAIL not a FAIL?
There are legitimate reasons that distributions could be broken such that they fail during PL or make in ways that are not the fault of the tester's toolchain, so it still seems like valuable information to know when distributions can't build as well as when they don't pass tests. So we should report on this and not just skip reporting. On the other hand, most of the false positives that provoke complaint are toolchain issues during PL or make/Build.
Right now there is no easy way to distinguish the phase of a FAIL report from the subject of an email. Removing PL and make/Build failures from the FAIL category would immediately eliminate a major source of false positives in the FAIL category and decrease the aggregate false positive ratio in the FAIL category. Though, as I've shown, while this may decrease the incidence of false positives for high quality authors, the false positive ratio is likely to remain high.
It almost doesn't matter whether we reclassify these as UNKNOWN or invent new grades. Either way partitions the FAIL space in a way that makes it easier for authors to focus on whichever part of the PL/make/test cycle they care about.
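As a rough sketch of the UNKNOWN option, the partition amounts to a lookup from the phase that failed to a grade. This is not the actual CPAN::Reporter or CPANPLUS code; the grade_failure helper and the phase names used as keys are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical mapping, not taken from any real tool: only a failure in
# the test phase stays a FAIL; PL and make/Build failures become UNKNOWN.
my %grade_for_failed_phase = (
    PL   => 'UNKNOWN',
    make => 'UNKNOWN',
    test => 'FAIL',
);

sub grade_failure {
    my ($phase) = @_;
    die "unrecognized phase '$phase'\n"
        unless exists $grade_for_failed_phase{$phase};
    return $grade_for_failed_phase{$phase};
}

print "$_ failure => ", grade_failure($_), "\n" for qw(PL make test);
```

An author who only cares about test failures can then filter on FAIL alone, and anyone interested in build problems can look at UNKNOWN.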
What we can fix now and what we can't
Some of these issues can be addressed fairly quickly.
First, we can lower our collective tolerance of false positives -- for example, stop telling authors to just ignore bogus reports if they don't like it and find ways to filter them. We have several places to do this -- just in the last day we've confirmed that the latest CPANPLUS dev version doesn't generate Makefile.PL's and some testers have upgraded. BinGOs has just put out CPANPLUS::YACSmoke 0.04 that filters out these cases anyway if testers aren't on the bleeding edge of CPANPLUS. We now need to push testers to upgrade. As we find new false positives, we need to find new ways to detect and suppress them.
Second, we can reclassify PL/make/Build fails to UNKNOWN. This won't break any of the existing reporting infrastructure the way that adding new grades would. I can make this change in CPAN::Reporter in a matter of minutes and it probably wouldn't be hard to do the same in CPANPLUS. Then we need another round of pushing testers to upgrade their tools. We could also take a decision as to whether UNKNOWN reports should be copied to authors by default or just sent to the mailing list.
However, as long as the CPAN Testers system has individual testers emailing authors, there is little we can do to address the problem of repetition. One option is to remove that feature from Test::Reporter and reports will only go to the central list. With the introduction of an RSS feed (even if not yet optimal), authors will have a way to monitor reports. And from that central source, work can be done to identify duplicative reports and start screening them out of notifications.
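To illustrate what screening duplicates from a central source might look like, here is a minimal sketch. The report records, their field names (dist, grade, osname, perl) and the first_fails helper are all hypothetical; a real notification program would work from the CPAN Testers database and might well choose a different notion of "duplicate".

```perl
use strict;
use warnings;

# Hypothetical report records; the field names are assumptions.
my @reports = (
    { dist => 'Foo-Bar-1.23', grade => 'FAIL', osname => 'linux',   perl => '5.8.8' },
    { dist => 'Foo-Bar-1.23', grade => 'FAIL', osname => 'linux',   perl => '5.8.8' },
    { dist => 'Foo-Bar-1.23', grade => 'FAIL', osname => 'MSWin32', perl => '5.10.0' },
    { dist => 'Foo-Bar-1.23', grade => 'PASS', osname => 'linux',   perl => '5.10.0' },
);

# Keep only the first FAIL seen for each distribution/platform/perl
# combination; later FAILs with the same key are treated as duplicates.
sub first_fails {
    my @in = @_;
    my %seen;
    return grep {
        $_->{grade} eq 'FAIL'
            && !$seen{ join '|', @{$_}{qw(dist osname perl)} }++
    } @in;
}

my @notify = first_fails(@reports);
printf "%d reports received, %d notifications\n",
    scalar @reports, scalar @notify;
```

In this made-up data, four reports collapse to two notifications: one for the linux FAIL and one for the MSWin32 FAIL.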
Once that is more or less reliable, we could restart email notifications from that central source if people felt that nagging is critical to improve quality. Personally, I'm coming around to the idea that it's not the right way to go culturally for the community. We should encourage people to use these tools, sign up for RSS or email alerts, whatever, because they think that quality is important. If the current nagging approach is alienating significant numbers of perl-qa members, how can we possibly expect that it's having a positive influence on everyone else?
Some of these proposals would be easier in CPAN Testers 2.0, which will provide reports as structured data instead of email text, but if "exit 0" is a straw that is breaking the Perl camel's back now, then we can't ignore 1.0 to work on 2.0 as I'm not sure anyone will care anymore by the time it's done.
What we can't do easily is get the testers community to upgrade to newer versions of the tools. That is still going to be a matter of announcements and proselytizing and so on. But I think we can make a good case for it, and if we can get the top 10 or so testers to upgrade across all their testing machines then I think we'll make a huge dent in the false positives that are undermining support for CPAN Testers as a tool for Perl software quality.
I'm interested in feedback on these ideas. In particular, I'm now convinced that the "success" of CPAN Testers prompts the need to move PL/make fails to UNKNOWN and to discontinue copying authors by individual testers. I'm open to counter-arguments, but they'll need to convince me of a better long-run solution to the problems I identified.
First, I'd like to thank The Perl Foundation for sponsoring my trip to the hackathon. I'd also like to thank Linpro for providing a wonderful venue as well as on-site food and refreshments throughout the hackathon. This is my wrap-up summary so they have a sense of what their contributions helped make possible.
Project 1 -- Test::Reporter transport plugins
I worked with rjbs on adding "transport" plugins to Test::Reporter to provide better options for how reports are sent to CPAN Testers. The existing transports (Mail::Send and Net::SMTP) were extracted into separate modules (Test::Reporter::Transport::Mail::Send, etc.). Plus we created three new transports:
Test::Reporter::Transport::HTTPGateway -- provides HTTP to email transport via the new Test::Reporter::HTTPGateway module
Test::Reporter::Transport::Net::SMTP::TLS -- provides TLS and authenticated SMTP (a long-standing wishlist item)
Test::Reporter::Transport::File -- reports saved as files in a directory for offline CPAN Testing and manual submission
Unfortunately, the HTTPGateway transport is only a temporary step towards a better CPAN Testers as the gateway still just relays to email. But people can set up their own gateways for now or perhaps a central server will be deployed somewhere.
Using transport plugins for all of these with a consistent API makes it easy for clients like CPAN::Reporter to switch between them based on configuration settings. E.g., in
Project 2 -- CPAN Metabase for CPAN Testers 2.0
The big issue with CPAN Testers "1.0" (what we currently have) is that it uses email to firstname.lastname@example.org for submissions and the only archive for reports is effectively the perl.cpan.testers newsgroup. For a while, I and others were talking about creating CPAN Testers 2.0 to move to HTTP submission and a centralized, indexed, searchable database.
I'd hoped to interest people in Oslo in thinking through what it might require or even working on some pieces of it. What I found was that there are a lot of areas where people are looking to collect 'meta' information about CPAN distributions, each with their own approach and APIs for gathering, storing or publishing in a very application-specific way.
That got me and rjbs started designing a more general solution, which we've called a CPAN 'metabase'. This would work for CPAN Testers 2.0 or for other projects that want to store/publish distribution-level information.
After a few days of work, we achieved:
General design of a system (class hierarchy, architecture, etc.) to meet at least initial envisioned needs with the flexibility to expand over time
A simple, working proof-of-concept -- with filesystem-based storage and indexing and the ability to store and retrieve a simple "fact" (a text string) via a web client.
Next steps are to refine, standardize and document the classes and APIs; to add additional capabilities to the proof-of-concept; and to get a 0.01 release to CPAN for community feedback.
Moral support for Adam Kennedy's time-travelling efforts to build the April Strawberry Perl
Feedback on TAP diagnostic semantics and the implication for downstream consumers like CPAN::Reporter
Figured out a hack with the Berkeley DB libraries to help Tux build DB_File on Strawberry Perl
Participated in a best practice discussion/argument on environment variables, 'xt' directories, etc.
Things that came up that I didn't get around to working on
CPAN::Reporter::Smoker -- a couple people (Gabor and someone else?) said it would be helpful if C::R::S could take a specific list for a work queue instead of trying to smoke only the latest things on CPAN. Now that someone asked for it, that gets bumped up on my priority list
Salve brought up using boilerplating as a way of discussing and later disseminating best practices so I showed him a boilerplate tool in my repository (not yet released). He and rjbs thought it had promise given easy customization so I'm going to try to finish the documentation and get a 0.01 release out soon
Schwern brought up the CPAN::PERL5INC taint bug in recent devel versions of CPAN.pm
Various people were working on codifying and releasing modules for various parts of the PAUSE/CPAN tools. rjbs has a work project along these lines and brian d foy's work to index BackPAN is driving similar efforts. There was some discussion of making it a YAPC::NA hackathon project
Other things I learned (not necessarily Perl)
git is awesome. I had been looking for a project to force me to live with it and get over the learning curve. Since Test::Reporter lives in git and rjbs knew it already, I got my chance. rjbs and I were routinely developing simultaneously on the same code base, merging back and forth several times a day and despite the huge flux in our early development code, it "just worked"
The best part was that during the flight home, we continued hacking, using a git repository on a usb drive to exchange and merge our branches. We'd just walk it between our seats about every hour or so
Developing Perl on an Ubuntu virtual machine on Windows is a vastly better, faster experience than native Perl on Windows. (It would be interesting for someone to figure out what's behind that)
rjbs showed me the wonders of screen (and screen + irssi). I don't know how I could have gone so long without learning how cool it is.
Norwegians are friendly. The organizers, Linpro staff and people on the streets were welcoming and helpful whenever we needed it
To Pobox (via rjbs) for a temporary mailbox for testing TLS and authenticated SMTP
To the BBC for backstage.bbc.co.uk swag
To jonasbn for being a sounding board as rjbs and I drew crazy metabase design diagrams
To rjbs for putting in a tremendous effort in a short period of time on projects that he wasn't even focusing on before the hackathon and for teaching me a lot of new tools
We came up with some good ideas for a general metabase for CPAN distributions, which we think can become the back-end not just for CPAN Testers 2.0 but for other tools as well that are all doing essentially the same thing in collecting distributed information about distributions.
However, we realized that it would probably be a lot of work, so we wanted to start with something tangible that could be successful quickly. So we added 'HTTP' as a valid transport() option for Test::Reporter and RJBS created Test::Reporter::HTTPGateway to be a remote server that can gateway HTTP transport to email@example.com. We've done some limited testing and it appears to be working well. (More testing tomorrow.)
The documentation could still be improved -- but essentially, if you install the latest dev releases of Test::Reporter (1.38_01) and CPAN::Reporter (1.14_01), then you can use this config option in your
and assuming you're running T::R::HTTPGateway at that URL it should work.
In the process, I wound up using some code that I had started working on a couple months ago and never finished to support other transports like Net::SMTP::TLS, so while that hasn't been tested yet (maybe tomorrow), it's possible that Net::SMTP::TLS may actually be working as well, or could be made to work with little additional effort.
That's the news from Oslo.
As an additional note, I want to particularly thank TPF for sponsoring my travel and Linpro for providing excellent facilities for the hackathon and also some very friendly and very helpful hosts. Also, I want to send a big thank you to Salve for taking the lead in organizing the event in the first place!
I recently installed Strawberry Perl on a virtual machine. Since Strawberry Perl doesn't (yet) include libwin32 from the start that was one of the first distributions I installed from CPAN.
I was annoyed to find that it aborted testing on Win32::FileSecurity. Why? Because my partition was FAT32, not NTFS. Ironically, it warns that it will fail when not on NTFS, but does nothing to test for it. (cf. Win32::FsType() -- though maybe that didn't exist when the test was first written)
While that's an easy fix, it highlighted a bigger issue that I have with libwin32. Because it bundles multiple modules as separate distributions with recursive Makefiles, that single failure stopped tests, so I couldn't even see if the rest of libwin32 worked correctly. (So, yes, hack-fix Win32::FileSecurity, continue testing, all tests pass, install, post bug and patched test.pl file to RT, whee...)
I wish all the individual modules in libwin32 could be broken out into separate CPAN distributions and all of them were just added to Bundle::libwin32 instead. That way, some strange failure of one module wouldn't stop the rest from installing.
Moreover, I wonder if having separate distributions would make it easier to distribute responsibility for tackling the growing RT queue and let individual modules be fixed and released on a faster cycle.
So that's my challenge to the maintainers and the lazyweb -- help us divide and conquer libwin32!
I'm proud of this new config option I added to CPAN.pm: trust_test_report_history.
When this option is set and the latest version of CPAN::Reporter is installed, then CPAN.pm won't run tests on a distribution that has already passed tests on that platform and perl, but will use the last test result for that distribution instead.
Where this comes in handy is "build_requires" prerequisites. If a build_requires prereq is not installed, CPAN.pm will download, build and test it and then include it in PERL5LIB (if it passes its tests) for the rest of that session. But in any future session, the next time a distribution has that prerequisite, CPAN.pm does the exact same thing all over again. But with "trust_test_report_history", the test is only run once.
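The idea can be sketched in a few lines of Perl. This is only an illustration of the caching logic, not how CPAN.pm or CPAN::Reporter implement it: the in-memory %history hash, the key format and the test_with_history helper are all hypothetical, and the real history is kept in CPAN::Reporter's history file.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the history of earlier test results.
my %history;

# Return the grade to use for a distribution, running the (expensive)
# test callback only when no earlier PASS is recorded for the same
# distribution, perl version and platform.
sub test_with_history {
    my ( $dist, $perl, $arch, $run_tests ) = @_;
    my $key = join '|', $dist, $perl, $arch;
    return $history{$key}
        if defined $history{$key} && $history{$key} eq 'PASS';
    return $history{$key} = $run_tests->();
}

# Simulate three sessions that all need the same build_requires prereq.
my $runs = 0;
my $run  = sub { $runs++; return 'PASS' };
test_with_history( 'Foo-Bar-1.23', '5.10.0', 'x86_64-linux', $run ) for 1 .. 3;
print "tests ran $runs time(s)\n";
```

In this sketch only a recorded PASS short-circuits the test run, which matches the description above; a prerequisite that did not pass would be tested again.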
This may not have a big impact for day-to-day use, but it should save a lot of time and processor cycles for smoke testing.
About a year and a half ago, I wrote CPAN::Reporter, to bring CPAN Testers reporting to good-old CPAN.pm. Since then, some intrepid individuals have built automated smoke testers upon CPAN::Reporter, all writing their own programs to do so.
CPAN.pm itself added an experimental smoke command to test recent uploads, but it requires XML::LibXML and doesn't test distributions in isolation -- all tested distributions and their dependencies keep getting added to PERL5LIB until things explode.
Yesterday, I released the first, alpha version of CPAN::Reporter::Smoker to provide a better-behaved, more turn-key approach to smoke testing with CPAN::Reporter.
(... configure CPAN and CPAN::Reporter as usual ...)
$ perl -MCPAN::Reporter::Smoker -e start
This initial version still has some limitations (e.g. no developer versions smoked), but I'm ready for anyone interested to try it out and start sending feedback -- compliments, suggestions, or complaints all welcome.
If you have or can install Inline::C, I'd greatly appreciate your help testing IO-CaptureOutput-1.05_53.
I've recently adopted IO::CaptureOutput, which is a wonderful tool for capturing program output to STDOUT or STDERR without ties and regardless of whether the output comes from perl, XS or programs in a subprocess.
However, the tests for XS use Inline::C and the C code was found to have portability problems (i.e. segfault on some Win32 platforms). At least one fix for Microsoft Visual C++ (MSVC) then broke on someone else's Linux platform.
(Aside: the fact that it is this hard to portably print the equivalent of "Hello World" to STDOUT and STDERR just astonishes me.)
My latest attempt at updating the C code now uses different code for MSVC and other compilers and now I want to test this as far and as wide as I can.
So if you have installed Inline::C and something that can send test reports (e.g. CPAN::Reporter), please test IO-CaptureOutput-1.05_53. For example, from the CPAN shell:
cpan> test DAGOLDEN/IO-CaptureOutput-1.05_53.tar.gz
Thank you very much,
I'm helping David Cantrell with Devel::AssertLib -- particularly with Win32 portability.
I have a branch that passes tests on my Linux and Strawberry Perl platforms, and I've added code that I *think* should work with ActiveState and MSVC, but I don't have that platform set up anywhere.
I'd greatly appreciate if someone(s) would download the branch from my repository, try running tests and let me know what happens:
If it fails and you can patch it to work, that would be wonderful as well.
Thank you very much
I'm pleased to announce that CPAN::Reporter 1.00 has been released and should soon be appearing on a CPAN mirror near you. Now, anyone with CPAN and a relatively modern version of Perl can contribute to CPAN Testers.
Version 1.00, combined with CPAN.pm 1.92 or better, provides several major enhancements over the 0.XX series. From the Changes file:
Added support for reporting for *.PL and make/Build stages; bumped CPAN.pm prerequisite to 1.9203 to take advantage of this support
Added support for the forthcoming Test::Harness 3.00
Changed the name and format of the history file of sent reports to track history by PL/make/test phase. Old history.db will be automatically upgraded to new reports-sent.db.
Moved 'cc_author' and 'send_duplicates' options from interactive to advanced (manual) configuration; defaults are strongly recommended
Bumped Test::Reporter prereq to 1.34 for transport() method and set default transport to Net::SMTP on all platforms
Anyone using an older version of CPAN::Reporter is strongly encouraged to upgrade.
Getting started with CPAN::Reporter is easy. From the CPAN shell:
cpan> install CPAN::Reporter
cpan> reload cpan
cpan> o conf init test_report
A note to module authors -- CPAN::Reporter is getting smarter about handling prerequisite failures and unsupported Perl versions or platforms. But there are best practices you can follow to make this easier. See Author Notes on the CPAN Testers Wiki for more details.