
Journal of ivorw (5222)

Monday October 17, 2005
05:18 AM

Mirroring OpenGuides websites

[ #27209 ]

In the aftermath of the September 2005 incident, I have been thinking about steps we can take to prevent a recurrence, or at least to minimise its impact.

As it is, I have lost quite a few of the write-ups and updates I made over the last six months.

I have been looking at decentralising the OpenGuides data by using one or more mirror sites, each holding a full copy of the data and kept up to date. I'm using CPAN mirroring as my model.

As for how to do this: each page has two parts, the wiki text and the metadata. The wiki text can be retrieved using format=raw, for example:

http://london.openguides.org/?id=Borough+Market;format=raw

This was recently implemented by hex (cheers mate!), though you could previously get the same result by requesting action=edit and scraping the page text out of the edit form in the HTML response.
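
A minimal sketch of pulling the raw text of one page, assuming the URL pattern from the example above (parameters separated by ";" and the page name encoded with "+" for spaces) and assuming the response is UTF-8 when no charset is declared. The function names are made up for illustration; nothing here is part of OpenGuides itself.

```python
from urllib.parse import quote_plus
from urllib.request import urlopen


def raw_url(base_url: str, page_name: str) -> str:
    """Build a format=raw URL, following the ';'-separated pattern
    of the london.openguides.org example."""
    return f"{base_url.rstrip('/')}/?id={quote_plus(page_name)};format=raw"


def fetch_raw(base_url: str, page_name: str, timeout: float = 30) -> str:
    """Fetch the raw wiki text of a page (assumes UTF-8 if no charset is sent)."""
    with urlopen(raw_url(base_url, page_name), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


# raw_url("http://london.openguides.org", "Borough Market") gives
# http://london.openguides.org/?id=Borough+Market;format=raw
```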

The metadata is obtained in RDF/XML format using format=rdf. Working with it has exposed a number of problems, resulting in several RT bug tickets for OpenGuides. It has also produced a CPAN module, OpenGuides::RDF::Reader, which standardises data retrieval from OpenGuides by mapping namespace-qualified tag names onto more directly meaningful hash keys. In principle, these translated hash keys match the values that go into the metadata_type column of the metadata table.
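
To show the idea of mapping namespace-qualified tags onto plain keys, here is a rough Python sketch. It is not the API of OpenGuides::RDF::Reader, and the KEY_MAP table and sample document are hypothetical: which RDF properties a guide actually emits, and which metadata_type values they correspond to, would need checking against a real format=rdf response. Values are collected into lists because a page can carry several categories.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Hypothetical mapping from ElementTree's "{namespace}localname" tags
# to short, directly meaningful keys.
KEY_MAP = {
    f"{{{DC}}}title": "title",
    f"{{{DC}}}subject": "category",
    f"{{{GEO}}}lat": "latitude",
    f"{{{GEO}}}long": "longitude",
}


def read_metadata(rdf_xml: str) -> dict[str, list[str]]:
    """Walk every element and collect the text of those we know how to name."""
    root = ET.fromstring(rdf_xml)
    data: dict[str, list[str]] = {}
    for elem in root.iter():
        key = KEY_MAP.get(elem.tag)
        if key is None or elem.text is None:
            continue  # unmapped tag, or an element with no text content
        data.setdefault(key, []).append(elem.text.strip())
    return data


# Made-up sample input, only to exercise the function.
SAMPLE_RDF = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}" xmlns:geo="{GEO}">
  <rdf:Description>
    <dc:title>Borough Market</dc:title>
    <dc:subject>Markets</dc:subject>
    <dc:subject>Food</dc:subject>
  </rdf:Description>
</rdf:RDF>"""
```

With the sample, read_metadata(SAMPLE_RDF) returns {"title": ["Borough Market"], "category": ["Markets", "Food"]}.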

The idea is that a guide mirror watches the RecentChanges RSS feed and pulls down any new or changed pages it finds there.
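
One way a mirror could decide what to fetch is sketched below. It assumes an RSS 2.0 feed in which each item's title is the page name and pubDate is an RFC 822 date with an explicit offset; the real RecentChanges feed may use a different RSS flavour or put extra text in the title, so treat this as an illustration of the polling logic only. The sample feed and the page names in it are invented.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def changed_since(rss_xml: str, last_sync: datetime) -> list[str]:
    """Return names of pages changed after last_sync, oldest change first.

    Assumes <item><title> holds the page name and <pubDate> an RFC 822 date
    with a timezone offset, so it compares cleanly with an aware last_sync.
    """
    root = ET.fromstring(rss_xml)
    latest: dict[str, datetime] = {}
    for item in root.iter("item"):
        name = item.findtext("title")
        stamp = item.findtext("pubDate")
        if not name or not stamp:
            continue  # skip items we cannot interpret
        when = parsedate_to_datetime(stamp)
        # A page edited several times since the last sync only needs
        # fetching once, so keep just its most recent change time.
        if when > last_sync and (name not in latest or when > latest[name]):
            latest[name] = when
    return sorted(latest, key=latest.get)


# Invented sample feed for illustration.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>Borough Market</title><pubDate>Mon, 17 Oct 2005 09:00:00 +0000</pubDate></item>
<item><title>Borough Market</title><pubDate>Sun, 16 Oct 2005 18:00:00 +0000</pubDate></item>
<item><title>Old Page</title><pubDate>Sat, 15 Oct 2005 12:00:00 +0000</pubDate></item>
<item><title>Another Pub</title><pubDate>Sun, 16 Oct 2005 20:00:00 +0000</pubDate></item>
</channel></rss>"""
```

With last_sync set to midnight UTC on 16 October 2005, changed_since(SAMPLE_RSS, last_sync) returns ["Another Pub", "Borough Market"]; each of those pages would then be fetched in both raw and RDF form.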

A guide mirror also opens up new possibilities, such as having all the OpenGuides data on one website. That would allow a single aggregated search across all the guides.

The other aspect is that data pulled from another site carries a hash key, "source", containing the URL it came from. This makes it possible to satisfy the Creative Commons "Attribution" requirement, and will allow a future release of OpenGuides to redirect all edit requests to the source website.
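
A small sketch of how a mirror might use that "source" value. Everything here is hypothetical: the MirroredPage record, the assumption that source holds the base URL of the originating guide, and the assumption that the source guide accepts an ?id=...;action=edit URL in the same style as the format=raw example.

```python
from dataclasses import dataclass, field
from urllib.parse import quote_plus


@dataclass
class MirroredPage:
    """Hypothetical record for one mirrored page."""
    name: str
    content: str                      # raw wiki text
    source: str                       # assumed: base URL of the originating guide
    metadata: dict = field(default_factory=dict)


def edit_url(page: MirroredPage) -> str:
    """Where the mirror would redirect an edit request (assumed URL pattern)."""
    return f"{page.source.rstrip('/')}/?id={quote_plus(page.name)};action=edit"


def attribution(page: MirroredPage) -> str:
    """A plain-text credit line pointing back at the source guide."""
    return f"Originally published at {page.source}"
```

Keeping the source on every mirrored record, rather than in a separate lookup, means a page can always be attributed and routed for editing even on a mirror that aggregates many guides.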

Exciting stuff! More to come...
