
All the Perl that's Practical to Extract and Report


Alias (5735)

http://ali.as/

Journal of Alias (5735)

Monday July 06, 2009
01:32 AM

CPANDB should now be synchronised and up to date

[ #39235 ]

Update:

CPANDB now also contains the data for CPAN Ratings.

In my previous announcements for CPANDB, I mentioned that while it was generating the database the correct way, the data was going to be a bit out of sync because data was coming from different places.

This should now be resolved.

The server syncs a minicpan against http://cpan.cpantesters.org/. A META.yml database is then updated from that minicpan, and a CPAN::SQLite index is built from the same minicpan. Both are then joined with the ORDB::CPANUploads database to produce the completed tables.

The final step is the application of the graph algorithms to produce the weight and volatility scores for each distribution, which are now being added to the database as well.
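The post doesn't spell out how the two scores are defined, so the following is only a rough sketch under a stated assumption: weight is taken to be the number of distributions a distribution depends on (directly or indirectly), and volatility the number of distributions that depend on it (directly or indirectly). The function names and the sample dependency pairs are made up for illustration and are not taken from CPANDB itself.

```python
from collections import defaultdict


def reachable_counts(edges):
    """For each node, count the distinct other nodes reachable along edges."""
    graph = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        graph[src].add(dst)
        nodes.update((src, dst))

    counts = {}
    for start in nodes:
        seen = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in graph[node]:
                # Skip nodes already counted, and never count the start itself
                # (this also keeps dependency cycles from looping forever).
                if nxt not in seen and nxt != start:
                    seen.add(nxt)
                    stack.append(nxt)
        counts[start] = len(seen)
    return counts


def score(dependencies):
    """Return {distribution: (weight, volatility)} under the assumed definitions.

    `dependencies` is a list of (distribution, depends_on) pairs.
    """
    weight = reachable_counts(dependencies)
    # Reversing every edge turns "depends on" into "is depended on by".
    volatility = reachable_counts([(dst, src) for src, dst in dependencies])
    return {dist: (weight[dist], volatility[dist]) for dist in weight}


# Hypothetical dependency pairs, purely for illustration.
deps = [("App", "Lib"), ("Lib", "Core"), ("Tool", "Core")]
scores = score(deps)
```

With these sample pairs, `Core` has nothing upstream but three distributions downstream, so it would be light but highly volatile.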

Now that CPANDB is ready for the big time, I'll start trying to merge in the three main outstanding data sets (CPAN Ratings, rt.cpan.org and CPAN Testers) and then rewrite the CPAN Top 100 to turn it into a fully operational death star... I mean website.

You can use the CPANDB module to fetch and ORM-inflate the database or, to download the database directly, use the following URL.

http://svn.ali.as/db/cpandb.gz
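If you go the direct-download route, something like the following might do for unpacking. It is a minimal sketch that assumes the archive is a single gzip-compressed SQLite file; the function name and the file paths are hypothetical.

```python
import gzip
import shutil
import sqlite3


def unpack_database(gz_path, db_path):
    """Decompress a gzipped database file and open it with sqlite3.

    Assumes gz_path is an already-downloaded, gzip-compressed SQLite file.
    """
    with gzip.open(gz_path, "rb") as src, open(db_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return sqlite3.connect(db_path)
```

Usage would then be along the lines of `conn = unpack_database("cpandb.gz", "cpandb.sqlite")`, after which ordinary SQL queries can be run on `conn`.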

The current schema for the database now has the slightly expanded distribution table.

CREATE TABLE distribution (
    distribution TEXT NOT NULL PRIMARY KEY,
    version TEXT NULL,
    author TEXT NOT NULL,
    release TEXT NOT NULL,
    uploaded TEXT NOT NULL,
    weight INTEGER NOT NULL,
    volatility INTEGER NOT NULL,
    FOREIGN KEY ( author ) REFERENCES author ( author )
)
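As a quick illustration of what the new columns allow, here is a sketch that loads the schema above into an in-memory SQLite database and lists distributions by volatility. The `author` table here is a one-column stand-in (its real definition isn't shown in this post), and the rows and the `most_volatile` helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the real author table, which this post does not show.
conn.execute("CREATE TABLE author (author TEXT NOT NULL PRIMARY KEY)")

# The distribution table exactly as given in the post.
conn.execute("""
CREATE TABLE distribution (
    distribution TEXT NOT NULL PRIMARY KEY,
    version TEXT NULL,
    author TEXT NOT NULL,
    release TEXT NOT NULL,
    uploaded TEXT NOT NULL,
    weight INTEGER NOT NULL,
    volatility INTEGER NOT NULL,
    FOREIGN KEY ( author ) REFERENCES author ( author )
)
""")

# Invented sample data, not real CPAN records.
conn.executemany("INSERT INTO author VALUES (?)", [("ALICE",), ("BOB",)])
conn.executemany(
    "INSERT INTO distribution VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("Example-Core", "1.02", "ALICE", "ALICE/Example-Core-1.02.tar.gz", "2009-07-01", 0, 3),
        ("Example-Lib", "0.10", "BOB", "BOB/Example-Lib-0.10.tar.gz", "2009-06-15", 1, 1),
        ("Example-App", None, "ALICE", "ALICE/Example-App.tar.gz", "2009-05-30", 2, 0),
    ],
)


def most_volatile(conn, limit=10):
    """Return (distribution, volatility) pairs, highest volatility first."""
    return conn.execute(
        "SELECT distribution, volatility FROM distribution "
        "ORDER BY volatility DESC, distribution LIMIT ?",
        (limit,),
    ).fetchall()


top = most_volatile(conn, 2)
```

The same query with `weight` in place of `volatility` would give the heaviest distributions instead.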

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • I sort of missed whether you're doing the indexing yourself by now, or if you're still pulling in the data generated by CPANTS...
    • I'm now doing it all from scratch myself.

      I started from the CPAN Index, so while it's superficially like the CPANTS graph, it follows permissions properly, and correctly handles changes of authors.

    • See CPANDB::Generator for the methodology used to generate the database.