
All the Perl that's Practical to Extract and Report


Ovid (2709)
http://publius-ovidius.livejournal.com/
AOL IM: ovidperl

Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Tuesday August 11, 2009
04:27 PM

Beautiful Freebase Metadata Dreams Slain

[ #39448 ]

So I've been hacking on a pet project and thought that Freebase would be my answer. As far as I can tell, it's not. Not even close. Right now, Freebase is like a huge Wikipedia, but with a nice query language on top. I needed a list of all countries in the world along with basic stats like capital, population, GDP, official language, etc. Here's the script I hacked together:

use strict;
use warnings;

use Data::Dumper;
use WWW::Metaweb;

$Data::Dumper::Indent   = 1;
$Data::Dumper::Sortkeys = 1;

my $mh = WWW::Metaweb->connect(
    server      => 'www.freebase.com',
    read_uri    => '/api/service/mqlread',
    trans_uri   => '/api/trans',
    pretty_json => 1
);

# Fetch the name of everything typed as a country.
my $countries = '[{"type":"/location/country","name":null}]';
my $result    = $mh->read( $countries, 'perl' ) or die $WWW::Metaweb::errstr;
my @countries = sort map { $_->{name} } @$result;

# Queries like the one below can be tried out in
# http://www.freebase.com/app/queryeditor
my %country_stats;

for my $country (@countries) {
    # The heredoc terminator includes its four leading spaces. The name is
    # spliced in with sprintf, so a name containing '"' would break the JSON.
    my $country_info = sprintf <<'    END' => $country;
    [{
      "type": "/location/country",
      "name": "%s",
      "capital":null,
      "currency_used": [],
      "form_of_government": [],
      "gdp_nominal" : [{"timestamp":null,"currency":null,"amount":null}],
      "gdp_nominal_per_capita" : [{"timestamp":null,"currency":null,"amount":null}],
      "/location/statistical_region/population" : [{"number":null,"timestamp":null}],
      "official_language":[{"name":null}]
    }]
    END
    print "Reading the data for $country\n";
    my $info = $mh->read( $country_info, 'perl' )
      or die $WWW::Metaweb::errstr;
    $country_stats{$country} = $info;
}

print Dumper( \%country_stats );
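Splicing the country name into JSON text with sprintf works until a name contains a double quote or a backslash. A minimal alternative sketch, assuming JSON::PP (a core module in modern Perls) is available, builds the same per-country query as a Perl data structure and lets the encoder handle quoting. The helper name country_query is hypothetical, not part of WWW::Metaweb.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper (not part of WWW::Metaweb): builds the same
# per-country MQL query as the script, but as a Perl data structure.
# undef is encoded as JSON null; [] stays an empty JSON array.
sub country_query {
    my ($name) = @_;
    return [
        {
            type                   => '/location/country',
            name                   => $name,
            capital                => undef,
            currency_used          => [],
            form_of_government     => [],
            gdp_nominal            =>
              [ { timestamp => undef, currency => undef, amount => undef } ],
            gdp_nominal_per_capita =>
              [ { timestamp => undef, currency => undef, amount => undef } ],
            '/location/statistical_region/population' =>
              [ { number => undef, timestamp => undef } ],
            official_language => [ { name => undef } ],
        }
    ];
}

# canonical() sorts object keys, so the generated text is stable.
my $json  = JSON::PP->new->canonical;
my $query = $json->encode( country_query('Weimar Republic') );
print "$query\n";
```

The resulting string is ordinary JSON text, so it should be usable wherever the sprintf-built string was passed to $mh->read; escaping of awkward names is then the encoder's job rather than the heredoc's.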

Not only do I get just 100 countries back (including the Weimar Republic and West Germany, but not East Germany), but most of them have almost no data associated with them. The ones that do have data often show curious results which might be correct (see the official languages), but without context, who knows? Oh, and WWW::Metaweb needs a monkey patch to get around an incompatible API change in JSON::XS. One suggestion on the Freebase message boards was to post the correct information back. That sounds reasonable, but at the end of the day it also sounds like a lot of work, particularly since I didn't want to base my project on Freebase; I just saw it as a useful source of information. Freebase looks awesome, but it's not quite there yet. Or I don't understand it. Who knows?

I'll have to figure out a better way of extracting this information (the CIA World Factbook sounds good), but figuring out the posting API for Freebase just sounds like more work that would distract me from my main project.

Back to the drawing board.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • 1) If you want more than 100 results, up the limit to a higher number; 100 is the default. Use "limit" : 500 or whatever.

    2) You probably want to mark some of your clauses as optional; as it stands, if the system doesn't know the capital of a country, it will be entirely excluded.

    3) The Weimar Republic *is* a country -- or was. That's perfectly valid in Freebase. The country type is used for past and present countries and things that act like countries (e.g. have an ISO code). Admittedly this might not qui

    --
    Kirrily "Skud" Robert perl@infotrope.net http://infotrope.net/
    • Thank you! I'll give it another go. Understanding the ins and outs of MQL is harder than I thought! :)

      • Jump on IM or IRC (irc.freenode.net #freebase) if you'd like some realtime help. Remembering that I'm on the US West Coast, of course -- late in your day and early in mine would probably be our most likely crossover point. I'm skud11111 on AIM/YIM and kirrily.robert@gmail.com on GTalk.
        --
        Kirrily "Skud" Robert perl@infotrope.net http://infotrope.net/
    • And on the API thing, would the most prominent Freebase Perl person be interested in taking over and fixing the Freebase module?

      • Which module is that? Kirrily already has the Metaweb module and there's a WWW::Metaweb module also. Regrettably, both fail to install due to incompatible API changes to JSON modules. I've a locally hacked copy of WWW::Metaweb (and I have filed a bug report), so it's easy to fix, but newcomers might be confused.

        I'm not certain which of these modules would be of greater benefit, though.

  • Oh, btw, re: languages... the languages listed are taken from Wikipedia at http://en.wikipedia.org/wiki/index.html?curid=576 -- looking at the topic history ( http://www.freebase.com/history/view/en/austria ) or explore view ( http://www.freebase.com/tools/explore/en/austria ) should help you understand where certain information comes from. For the most part, our country data comes from Wikipedia infoboxes.
    --
    Kirrily "Skud" Robert perl@infotrope.net http://infotrope.net/
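Skud's first two suggestions (raise the "limit", mark clauses optional) can be combined in a single query. The comment doesn't spell out the syntax, so this is a sketch assuming MQL's "optional": true directive inside a sub-clause; the limit of 500 and the capital sub-clause shape are illustrative choices, not taken from the thread. It only builds and prints the MQL text with JSON::PP; sending it to Freebase is left to whatever client you use.

```perl
use strict;
use warnings;
use JSON::PP;

# Sketch of the country-list query with the two fixes suggested above:
#  - an explicit "limit" (the comment says the default is 100);
#  - an optional sub-clause, so a country with no known capital is
#    still returned rather than dropped (assumed MQL "optional" syntax).
# The value 500 and the capital sub-clause shape are illustrative.
my $query = [
    {
        type    => '/location/country',
        name    => undef,
        limit   => 500,
        capital => { name => undef, optional => JSON::PP::true },
    }
];

# pretty() adds indentation; canonical() sorts keys for stable output.
my $mql = JSON::PP->new->canonical->pretty->encode($query);
print $mql;
```

The same "optional" treatment would presumably apply to the GDP, population, and language sub-clauses of the per-country query, which are the ones most likely to be missing for a given country.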