Shlomi Fish (918)

Shlomi Fish
  shlomif@iglu.org.il
http://www.shlomifish.org/
AOL IM: ShlomiFish
Yahoo! ID: shlomif2
Jabber: ShlomiFish@jabber.org

I'm a hacker of Perl, C, Shell, and occasionally other languages. Perl is my favourite language by far. I'm a member of the Israeli Perl Mongers, and contribute to and advocate open-source technologies.

Journal of Shlomi Fish (918)

Thursday June 12, 2008
04:29 AM

XML-Grammar-Fortune

[ #36668 ]

One of my many computerised passions is collecting quotes in the UNIX-like fortune format. Over the years, I have built up a moderately large collection of them in several files. As time went on, I noticed a few problems. First of all, they were all in large plaintext files, so pointing someone to a quote involved giving a link to the file and saying "search for Foobar". Moreover, since they were just chunks of text, they couldn't hold any meta-data.
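
To make that limitation concrete, here is a minimal Perl sketch of reading such a file, assuming the classic strfile layout in which cookies are separated by lines containing only "%". The helpers `split_fortunes` and `find_cookies` are hypothetical names for illustration; the point is that a cookie can only be addressed by its position or by searching its text.

```perl
use strict;
use warnings;

# Split the contents of a fortune file into individual cookies.
# Assumption: the classic strfile layout, where cookies are separated
# by a line containing only "%".
sub split_fortunes {
    my ($text) = @_;
    return
        grep { length }                              # drop empty entries
        map  { my $c = $_; $c =~ s/\n+\z//; $c }     # trim trailing newlines
        split /^%\n/m, $text;
}

# Hypothetical helper: the only way to "point" at a cookie is to search
# its text (or count its position), i.e. "search for Foobar".
sub find_cookies {
    my ($needle, @cookies) = @_;
    return grep { index($_, $needle) >= 0 } @cookies;
}

my @cookies = split_fortunes("A first quote.\n\t-- Someone\n%\nA second quote.\n%\n");
print scalar(@cookies), " cookies\n";
print "$_\n" for find_cookies("second", @cookies);
```

Nothing in a cookie carries an identifier or fields such as author or source, which is the gap an XML representation can fill.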

At one time, I heard of someone who had created an XML grammar to describe Unix fortunes, but a Google search did not help me find it. I also have the grand "Fortunes Mania" vision of a community site for collecting and sorting quotes. This vision was very intimidating, but recently I decided to take a small baby step by defining an XML grammar for fortunes. So I present to you the XML-Grammar-Fortune distribution.

I've spent quite a lot of time thinking about what I wanted there. One thing I concluded was that there are several different types of fortune cookies: run-of-the-mill quotes, IRC conversations, excerpts from screenplays, structured plaintext, HTML, etc. Therefore, the XML grammar should allow several different types of sub-nodes, each of which corresponds to a certain class of fortune cookies.

Until now I've used DTDs for defining my XML schemas, but for XML-Grammar-Fortune, I decided to learn Relax NG, which I had been told was easier than W3C XML Schema. I was very impressed by Relax NG: it's easy, it's fun, and it's powerful. One problem I encountered was that, when validating a document against it, XML::LibXML (version perl-XML-LibXML-1.66-1mdv2008.1) does not give the line number where the validation error occurred. To locate such errors, one has to look at the diffs or bisect the document.
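
For reference, a minimal sketch of Relax NG validation with XML::LibXML. The schema and sample documents are hypothetical and heavily simplified (they are not the actual XML-Grammar-Fortune schema), and `validation_errors` is an illustrative helper. `XML::LibXML::RelaxNG`'s `validate` method dies when the document is invalid, so the sketch traps that with `eval`; with the XML::LibXML version mentioned above, the resulting message may not say which line is at fault.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical, heavily simplified schema (NOT the real
# XML-Grammar-Fortune schema): a <fortunes> root holding one or more
# <fortune id="..."> elements, each containing a <raw> text element.
my $rng_source = <<'EOF';
<element name="fortunes" xmlns="http://relaxng.org/ns/structure/1.0">
  <oneOrMore>
    <element name="fortune">
      <attribute name="id"/>
      <element name="raw"><text/></element>
    </element>
  </oneOrMore>
</element>
EOF

my $schema = XML::LibXML::RelaxNG->new(string => $rng_source);

# validate() dies with the errors on failure, so trap it with eval and
# return the error text (an empty string means the document is valid).
sub validation_errors {
    my ($xml) = @_;
    my $doc = XML::LibXML->new->parse_string($xml);
    return eval { $schema->validate($doc); 1 } ? '' : "$@";
}

my $good = '<fortunes><fortune id="f1"><raw>Hello</raw></fortune></fortunes>';
my $bad  = '<fortunes><fortune><raw>No id here</raw></fortune></fortunes>';
print validation_errors($good) eq '' ? "valid\n" : "invalid\n";
print validation_errors($bad);
```

Validating each fortune as its own small document is one way to narrow down where an error lives when the message gives no line number.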

Anyway, I defined a Relax NG schema for the documents and made sure that some basic examples validated against it (test-driven-development style). Then I worked on an XSLT stylesheet to convert them to XHTML.
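
A minimal sketch of applying an XSLT stylesheet from Perl with XML::LibXSLT, again using a hypothetical, simplified `<fortunes>` layout rather than the real grammar; the stylesheet and the `fortunes_to_xhtml` helper are illustrative only. Copying each fortune's `id` onto its `<div>` gives every quote an anchor that a link can point to.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical stylesheet for a simplified <fortunes> layout; the real
# XML-Grammar-Fortune stylesheet handles many more fortune types.
my $xsl_source = <<'EOF';
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:template match="/fortunes">
    <html>
      <head><title>Fortunes</title></head>
      <body><xsl:apply-templates select="fortune"/></body>
    </html>
  </xsl:template>
  <!-- Each fortune becomes a div whose id can be linked to directly. -->
  <xsl:template match="fortune">
    <div class="fortune" id="{@id}">
      <pre><xsl:value-of select="raw"/></pre>
    </div>
  </xsl:template>
</xsl:stylesheet>
EOF

# Parse and compile the stylesheet once, then reuse it.
my $stylesheet = XML::LibXSLT->new->parse_stylesheet(
    XML::LibXML->new->parse_string($xsl_source)
);

sub fortunes_to_xhtml {
    my ($xml) = @_;
    my $doc    = XML::LibXML->new->parse_string($xml);
    my $result = $stylesheet->transform($doc);
    return $stylesheet->output_string($result);
}

print fortunes_to_xhtml(
    '<fortunes><fortune id="f1"><raw>Hello, world</raw></fortune></fortunes>'
);
```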

When I started, I only had one fortune type, <raw>, which is a gigantic <pre> block with some meta-data. I gradually implemented more fortune types: irc, quote and screenplay, whose RNG and XSLT were based on XML-Grammar-Screenplay, with a lot of ugly copying and pasting.

I gradually converted more and more fortunes to richer XML semantics. The XML grammar requires an id for each fortune, and also allows specifying a title element and some fields in the <info> tag, like "author" or "work". For example, all the "Friends" fortunes were converted to XML by first normalising the screenplay text and then running a script I wrote to convert it to XML.

So I had all the fortunes as XML, but now the plaintext versions were out of sync. So I wrote a Perl module to convert them from XML back to plaintext.
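
The module itself is not shown here. As an illustration only, a minimal sketch of the XML-to-plaintext direction might dispatch on each fortune's child element and join the results with "%" separator lines. The `<fortunes>`/`<raw>` layout, the `%renderers` table and `xml_to_fortune_file` are hypothetical, and only the raw type is handled.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical per-type renderers. Only a simplified <raw> type is
# handled; irc, quote, screenplay, etc. would each need their own.
my %renderers = (
    raw => sub {
        my ($node) = @_;
        my $text = $node->textContent;
        $text =~ s/\A\n+//;    # drop leading newlines
        $text =~ s/\s+\z//;    # drop trailing whitespace
        return $text;
    },
);

sub xml_to_fortune_file {
    my ($xml) = @_;
    my $doc = XML::LibXML->new->parse_string($xml);
    my @cookies;
    for my $fortune ($doc->findnodes('/fortunes/fortune')) {
        # In this simplified layout, the first child element is the
        # fortune's type (e.g. <raw>).
        my ($body) = $fortune->findnodes('./*');
        die "Fortune without a body\n" unless defined $body;
        my $render = $renderers{ $body->localname }
            or die "Unsupported fortune type <" . $body->localname . ">\n";
        push @cookies, $render->($body);
    }
    # Emit the classic format: cookies separated by lines holding "%".
    return join("\n%\n", @cookies) . "\n";
}

print xml_to_fortune_file(
    '<fortunes><fortune id="a"><raw>One</raw></fortune>'
  . '<fortune id="b"><raw>Two</raw></fortune></fortunes>'
);
```

Keeping one renderer per fortune type mirrors the grammar's split into sub-node types, so adding a type means adding one entry to the table.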

I should note that, due to a problem with XML-LibXSLT and perl-5.10.0, I haven't uploaded it to CPAN yet, because I don't want to receive a flood of failure reports.

On a different note: a former co-worker of mine read "Perl for Perl Newbies" in order to learn Perl, liked it a lot, and told me I should add more to it. That also feels good.
