
http://ali.as/

Journal of Alias (5735)

Wednesday February 11, 2009
12:21 AM

Announcing the "CPANTS Heavy 100" index

[ #38454 ]

With the success of ORLite and ORDB::CPANTS, I've finally managed to achieve something I've wanted for years: a cheap and well-encapsulated way to screw around with CPAN graph data.

This has been possible for a long time, but I think I've finally found a solution that does it in a Closed Problem way, with each piece being a working, completed and published module in its own right.

This means I don't have to look after some random script on a website somewhere, and it lets everyone else take my work and maintain it for me :)

By combining ORDB::CPANTS with Algorithm::Dependency, this also means that now I can finally achieve a dependency-weighting engine for the CPAN dataset that is self-updating and requires basically no maintenance.

This in turn gives me the opportunity to fix one of the CPAN artifacts that I've disliked for a long time, the Phalanx 100. What I dislike about it the most is that it is just so arbitrary.

It's in the right solution area, but it is ultimately edited by humans, and it isn't updated in real-time (so it doesn't respond to CPAN usage trends).

So my plan is to "upgrade" the Phalanx 100 into a range of "Top 100" indexes that are automatically-generated, updated daily, and can be used as the basis for optimising and prioritising QA work.

I hope to release one of these new indexes every few days, with supporting code released to CPAN shortly after. As this list of lists starts to grow, I'd like to create a dedicated website (which I'll notionally call http://top100.cpan.org/) to hold all the indexes.

To kick off the indexes, I'll start with the "CPANTS Heavy 100".

This is an index containing the 100 CPAN distributions with the largest dependency chains. These represent excellent sample cases for testing scenarios relating to typical large-scale Perl applications in the wild.

This index, however, makes no judgement whatsoever about any of the members of the index being good, bad, or otherwise. It is purely a naive graph calculation (which is why this list is dominated by plugins for other things that are themselves heavy).
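To make "naive graph calculation" concrete, here is a minimal sketch of the idea in Python. It is not the ORDB::CPANTS or Algorithm::Dependency code behind the real index. The function names, the toy data, and the choice to count a distribution's transitive dependencies without counting the distribution itself are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def dependency_chain(dist: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Return every distribution reachable from `dist`, excluding `dist` itself.

    Uses an iterative depth-first walk with a `seen` set, so shared and
    circular dependencies are only counted once.
    """
    seen: Set[str] = set()
    stack = list(deps.get(dist, []))
    while stack:
        current = stack.pop()
        if current in seen or current == dist:
            continue
        seen.add(current)
        stack.extend(deps.get(current, []))
    return seen


def heaviest(deps: Dict[str, List[str]], limit: int = 100) -> List[Tuple[int, str]]:
    """Rank distributions by the size of their dependency chain, largest first."""
    weights = [(len(dependency_chain(name, deps)), name) for name in deps]
    # Sort by descending weight, then by name so ties come out in a stable order.
    weights.sort(key=lambda pair: (-pair[0], pair[1]))
    return weights[:limit]


# Hypothetical toy data: distribution name -> direct dependencies.
TOY_DEPS: Dict[str, List[str]] = {
    "App-Big": ["Framework", "Plugin-A"],
    "Plugin-A": ["Framework"],
    "Framework": ["Util-One", "Util-Two"],
    "Util-One": [],
    "Util-Two": ["Util-One"],
}

if __name__ == "__main__":
    for weight, name in heaviest(TOY_DEPS, limit=3):
        print(f"{weight} {name}")
```

In the toy data, Plugin-A scores one higher than Framework simply because it depends on Framework, which is the same effect that pushes plugins for heavy distributions up the real list.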

CPANTS Heavy 100
748 Task-POE-All
276 MojoMojo-Formatter-RSS
271 MojoMojo-Formatter-Amazon
269 MojoMojo-Formatter-Emote
266 MojoMojo
216 Task-Padre-Plugin-Deps
211 Task-BeLike-RJBS
206 Parley
205 Foorum
203 Angerwhale
200 Task-Padre-Plugins
198 Task-Catalyst-Tutorial
196 Task-Email-PEP-All
191 Jifty-Plugin-ModelMap
189 Jifty-Plugin-Authentication-Bitcard
188 Jifty-Plugin-GoogleAnalytics
188 CommitBit
187 JiftyX-ModelHelpers
187 Jifty-Plugin-JapaneseNotification
186 Jifty
185 Reaction
181 Buscador
179 Rose-DBx-Garden-Catalyst
171 Module-CPANTS-Site
170 Task-CatInABox
168 Egg-Release-Authorize
164 Egg-Plugin-SessionKit
162 Catalyst-Controller-HTML-FormFu
159 Osgood-Server
158 Catalyst-Example-InstantCRUD
158 App-CamelPKI
157 Egg-Release-DBIC
151 Catalyst-Controller-Atompub
149 Task-SOSA
144 Padre-Plugin-CSS
142 Padre-Plugin-Perl6
142 Padre-Plugin-AcmePlayCode
142 CatalystX-CRUD-YUI
140 App-HistHub
139 Egg-Plugin-Crypt-CBC
138 Apache-SWIT-Security
138 Handel-Storage-RDBO
137 Egg-Release-DBI
137 Apache-SWIT
137 Test-Apocalypse
136 Egg-Release-XML-FeedPP
136 Egg-Plugin-Cache-UA
136 Egg-Release-JSON
136 Egg-Release-Mail
135 ShipIt-Step-Manifest
135 ShipIt-Step-DistClean
135 ShipIt-Step-ApplyYAMLChangeLogVersion
135 Egg-Plugin-Authen-Captcha
135 Egg-Plugin-Net-Ping
134 Catalyst-Helper-AuthDBIC
134 Dist-Joseki
134 Egg-Plugin-LWP
134 Egg-Plugin-Net-Scan
134 Egg-View-TT
134 Egg-Model-Cache
134 Egg-Model-FsaveDate
134 Egg-Plugin-Log-Syslog
133 Egg-Release
133 Padre-Plugin-HTML
133 Padre-Plugin-PerlCritic
132 Task-Email-PEP-NoStore
132 Padre-Plugin-InstallPARDist
131 CatalystX-ListFramework-Builder
131 Padre-Plugin-XML
130 HTML-FormFu-Model-DBIC
129 Catalyst-Authentication-Credential-OpenID
129 DBIx-Class-HTML-FormFu
129 Padre-Plugin-PAR
128 Padre-Plugin-Encrypt
128 Devel-ebug-HTTP
128 Padre-Plugin-JavaScript
127 Padre-Plugin-HTMLExport
127 Padre-Plugin-SpellCheck
127 Padre-Plugin-ViewInBrowser
127 Padre-Plugin-PerlTidy
127 Padre-Plugin-Alarm
126 DBIx-Class-FromValidators
126 Padre-Plugin-Parrot
126 Padre-Plugin-CommandLine
126 Padre-Plugin-Vi
126 Padre-Plugin-Encode
125 Padre
124 Catalyst-Controller-LeakTracker
124 cnutt-feed
124 Catalyst-Model-HTML-FormFu
124 Catalyst-Controller-DBIC-API
123 DBIx-Class-Schema-PopulateMore
123 Catalyst-Authentication-Store-KiokuDB
122 Catalyst-Plugin-Session-Store-KiokuDB
121 KiokuX-User
121 Pod-Browser
121 Titanium
121 Data-Conveyor
120 Rubric-Entry-Formatter-Markdown
119 Handel

  • First, I'm glad to see you doing this sort of thing. Automated CPAN analysis is good to have. I'd like to correct a few notes on the Phalanx 100, though. First, consider why the Phalanx 100 was created. The Phalanx project was an attempt to increase test coverage in the most-used modules on CPAN, so that Ponie would have a good test base to work with. The Phalanx 100 was created by analysis of CPAN download logs for a one-month period from one mirror. We figured that would be a good enough estimate of
    --
    xoa

    • At the time that the Phalanx 100 was created, my specific beef was that it didn't appear to factor in dependencies.

      So while we got a list of 100 modules, they weren't ACTUALLY the 100 most used, just the top 100 in some other sense.

      I do, however, appreciate that they were based on usage data, as opposed to dependency data. And I totally plan to start factoring that into some of the indexes, once I've got the basic naive ones working.

      • I guess I take issue with your "beef" because it was never intended for your use. We didn't make any assertions as to how the data should be used, so it's not fair for you to say it's not what you want.

        Our feeling on dependencies was that dependencies would have to get downloaded, too, and so those downloads would show that traffic. So you get dependencies in that data, but not weighted by the number of other modules that use the dependency. A single-use dependency would get as much weight as, say, HTM

        --
        xoa

  • Now that would be a perfect list for testing a CPAN packaging system...
  • I have some equivalent tools that crawl the packages themselves. I just ran my dependency chain tool for MojoMojo and came up with 239 deps rather than your 266. I'd be very curious to see what the discrepancy is.
  • It would be good to have a few different measures of module popularity. Personally, I think a list of the "most depended on" modules would be really useful. That some other module author would use a module is, I think, a pretty good vote of confidence for the usefulness and quality of that module. Such rankings would especially help when trying to choose between roughly equivalent modules for a project.