
Shlomi Fish (918)

I'm a hacker of Perl, C, Shell, and occasionally other languages. Perl is my favourite language by far. I'm a member of the Israeli Perl Mongers, and contribute to and advocate open-source technologies.

Journal of Shlomi Fish (918)

Wednesday May 16, 2007
02:39 PM

The Tale of XML-Grammar-Screenplay

[ #33293 ]

Being a creative writer, I recently started working on a screenplay titled "Star Trek: We the Living Dead". I wrote it in a well-formed plain-text format, using vim, and planned on creating a way to translate it into HTML. Eventually this resulted in the newly released XML-Grammar-Screenplay distribution on CPAN. Here's how I got there.

I decided to first translate the well-formed text into a custom XML format, and from that into DocBook/XML and, in the future, possibly other formats. The well-formed text had XML-style tags, descriptions ([David walks to Goliath.]), paragraphs and other elements, and so I looked to Damian's Parse-RecDescent for help in parsing it. Working with P::RD on something so complex proved to be very time-consuming. I was once again bitten by the fact that P::RD silently skips text before each token based on $Parse::RecDescent::skip (whitespace, by default), and had to assign an empty string ("") to it. (After banging my head for several hours.)
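To illustrate the skip behaviour, here is a minimal sketch, assuming Parse::RecDescent is installed. The toy grammar, the rule name pair and the helper parse_with_skip are hypothetical and are not taken from XML-Grammar-Screenplay; the only real knob shown is $Parse::RecDescent::skip.

```perl
use strict;
use warnings;
use Parse::RecDescent;

# Hypothetical toy grammar: two words followed by the end of the input.
my $grammar = q{
    pair : /\w+/ /\w+/ /\z/ { $return = "$item[1]+$item[2]" }
};

# Build a parser under the given global skip pattern and parse $text.
# Returns the rule's value, or undef when the parse fails.
sub parse_with_skip {
    my ($skip, $text) = @_;
    local $Parse::RecDescent::skip = $skip;
    my $parser = Parse::RecDescent->new($grammar)
        or die "Bad grammar";
    return $parser->pair($text);
}

# With the default pattern, the space between the words is skipped silently.
my $default = parse_with_skip('\s*', 'foo bar');

# With an empty pattern, nothing is skipped, so the grammar would have to
# match the space explicitly; this toy grammar does not, and the parse fails.
my $strict = parse_with_skip('', 'foo bar');

print defined $default ? "default skip: $default\n" : "default skip: failed\n";
print defined $strict  ? "empty skip: $strict\n"    : "empty skip: failed\n";
```

Once the skip pattern is empty, every piece of whitespace that matters (such as the blank lines separating paragraphs) has to appear in the grammar itself, which is the price of making it significant.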

I also found out that it only reported that it had failed to parse, and not exactly why, at least with the rudimentary logic that I built for it. While at first I tried to use the $::RD_TRACE display, I eventually found that it is often more effective to bisect the input, repeatedly halving it, until one isolates the offending text. Sometimes I found that I had a syntax error in the screenplay. In other cases, I found that the grammar I had defined for P::RD was lacking, including many problems in its regular expressions.
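The isolation-by-halving approach can be sketched in plain Perl. This is only an illustration under assumptions: first_failing_line and toy_ok are hypothetical names, toy_ok stands in for a real call to the parser, and the search assumes that once a prefix of the input fails, every longer prefix fails too (which is not always true of a real grammar).

```perl
use strict;
use warnings;

# Return the 0-based index of the first line whose inclusion makes the
# prefix fail, or -1 if the whole input passes.
# Assumes that once a prefix fails, every longer prefix fails too.
sub first_failing_line {
    my ($lines, $parses_ok) = @_;
    return -1 if $parses_ok->(join '', @$lines);
    my ($lo, $hi) = (0, $#$lines);    # the culprit lies in [$lo, $hi]
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($parses_ok->(join '', @{$lines}[0 .. $mid])) {
            $lo = $mid + 1;    # prefix through $mid is fine; look later
        }
        else {
            $hi = $mid;        # prefix through $mid already fails
        }
    }
    return $lo;
}

# Hypothetical stand-in for the real parser: the text is "valid" unless
# some line opens a [description] and never closes it on that line.
sub toy_ok {
    my ($text) = @_;
    return $text !~ /\[[^\]\n]*$/m;
}

my @lines = (
    "[David walks to Goliath.]\n",
    "David: Hello.\n",
    "[Goliath falls.\n",
    "Goliath: Ouch.\n",
);

my $bad = first_failing_line(\@lines, \&toy_ok);
print "first failing line: ", $bad + 1, "\n" if $bad >= 0;
```

In practice the callback would run the real parser on the candidate text and return whether it succeeded, so each round costs one parse and the number of rounds grows only logarithmically with the length of the screenplay.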

After a lot of trying, I got the parser to work. Then I wrote an XSLT stylesheet to translate the resultant XML to DocBook/XML. That was very easy by comparison, but I got stumped after writing <xsl:apply-templates select="*" /> instead of <xsl:apply-templates /> without any attributes: selecting "*" matches only child elements, which caused the text nodes not to render. Having solved that, everything worked fine.

As a bit of sugar, I created two XML::Grammar::Screenplay::App:: modules that can be loaded with perl -M and then invoked with -e 'run()' -- [ARGS] to process files from the command line.

I have already discovered one :utf8 bug in it since uploading it, which I have fixed in the trunk.

In other news, I've received and cashed my cheque for the XML::RSS grant. I'm planning to spend some of the money on extending the Web-CPAN T-shirts offer to include one T-shirt of choice from "Think Geek" as well as the "Ozy&Millie" T-shirt. I have also spent quite a lot of time working on Test::Run.
