
All the Perl that's Practical to Extract and Report

rjbs (4671)

rjbs
  (email not shown publicly)
http://rjbs.manxome.org/
AOL IM: RicardoJBSignes
Yahoo! ID: RicardoSignes

I'm a Perl coder living in Bethlehem, PA and working in Philadelphia. I'm a philosopher and theologian by training, but I was shocked to learn upon my graduation that these skills don't have many associated careers. Now I write code.

Journal of rjbs (4671)

Tuesday January 17, 2006
11:55 AM

toward the promised land of playlists, through the valley of plists

[ #28386 ]

I am trying to produce a SAX-based parser for Apple Property List documents. It's been a frustrating, educational experience. BDFOY has a module, Mac::PropertyList, but it's not what I want. First of all, it can't just produce a simple Perl data structure from a deep plist. That's fine: I could patch it to do so, and it would only take a few lines. The bigger concern is that it does its parsing with regular expressions, and takes more than twelve hours to turn my iTunes Music Library file into an object. I'm not sure how much more than twelve hours, because I gave up at that point; I didn't want my CPU pegged while I was on the bus back to work.

Maybe XML::SAX will not do any better. I'll probably write a do-nothing SAX handler to see how long it takes just to dispatch all the events to no-op subs. I really hope it's less than half a day, since the hard work should be getting done by expat.
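A do-nothing handler of that sort can be sketched in a few lines. This sketch uses Python's standard xml.sax module (which is also backed by expat) rather than Perl's XML::SAX, purely for illustration. The names NoOpHandler, time_noop_parse, and SAMPLE are made up for this example, and the sample document is a hypothetical stand-in for a real library file, so the timing it prints says nothing about the real one.

```python
import time
import xml.sax


class NoOpHandler(xml.sax.ContentHandler):
    """Receives every SAX event and deliberately does nothing with it."""

    def startElement(self, name, attrs):
        pass

    def endElement(self, name):
        pass

    def characters(self, content):
        pass


def time_noop_parse(xml_bytes):
    """Return the seconds spent parsing xml_bytes and dispatching its events."""
    started = time.perf_counter()
    xml.sax.parseString(xml_bytes, NoOpHandler())
    return time.perf_counter() - started


# Hypothetical stand-in for a large library file: 10,000 small track dicts.
SAMPLE = (
    b"<plist version='1.0'><array>"
    + b"<dict><key>Name</key><string>Airbag</string></dict>" * 10000
    + b"</array></plist>"
)

elapsed = time_noop_parse(SAMPLE)
print(f"dispatched all events in {elapsed:.3f}s")
```

Whatever number this prints is a floor: any real handler can only add to the cost of parsing and dispatch.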

Anyway, this has been my first experience with SAX, and, really, with much XML work at all. I know a fair bit about XML, but I've done nearly nothing with regard to handling it with something other than a specialized module. (I have, for example, used XML::RSS a lot, but that mostly shields me from the XML.) SAX seems like a really cool system. I'm not entirely clear on how to do what I want, yet. My first pass at a solution nearly did what it needed to, but was completely hideous. I think my biggest problem is that I want to change handlers and perform a recursive descent. That is, I want each element handler to say something like: "now that I'm handling a dict element, I want to use the dict parser until I see the end_element event, and then I will invoke some callback with the value generated by the dict parser."

I think this is possible -- and not a wildly misinformed plan -- but I'm not sure how to do it from the XML::SAX documentation. I think those docs assume that I'm familiar with SAX itself, which I'm not.
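One way to get the effect of that recursive descent without actually swapping handlers is to keep a stack of open containers inside a single handler. Each dict or array start pushes a new container, and its end event pops it and hands the finished value to the parent, which plays the role of the callback. The sketch below is a different technique from the handler-switching plan, written with Python's xml.sax instead of XML::SAX. PlistHandler, parse_plist, and SAMPLE are hypothetical names, dates are left as plain strings, and data elements are ignored.

```python
import xml.sax


class PlistHandler(xml.sax.ContentHandler):
    """Builds plain Python data from plist XML using a stack of open containers."""

    def __init__(self):
        super().__init__()
        self.result = None
        self._stack = []  # containers currently being filled (dict or list)
        self._keys = []   # pending key for each open container (unused for lists)
        self._text = []   # character data seen since the last start tag

    def _open(self, container):
        self._stack.append(container)
        self._keys.append(None)

    def _deliver(self, value):
        """Hand a finished value to its parent container, or keep it as the result."""
        if not self._stack:
            self.result = value
        elif isinstance(self._stack[-1], list):
            self._stack[-1].append(value)
        else:
            self._stack[-1][self._keys[-1]] = value
            self._keys[-1] = None

    def startElement(self, name, attrs):
        self._text = []
        if name == "dict":
            self._open({})
        elif name == "array":
            self._open([])

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        text = "".join(self._text)
        if name in ("dict", "array"):
            self._keys.pop()
            self._deliver(self._stack.pop())
        elif name == "key":
            self._keys[-1] = text
        elif name in ("string", "date"):  # dates stay as strings in this sketch
            self._deliver(text)
        elif name == "integer":
            self._deliver(int(text))
        elif name == "real":
            self._deliver(float(text))
        elif name in ("true", "false"):
            self._deliver(name == "true")


def parse_plist(xml_bytes):
    handler = PlistHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.result


SAMPLE = b"""<plist version="1.0">
<dict>
  <key>Tracks</key>
  <array>
    <dict>
      <key>Name</key><string>Airbag</string>
      <key>Track Number</key><integer>1</integer>
      <key>Compilation</key><false/>
    </dict>
  </array>
</dict>
</plist>"""

library = parse_plist(SAMPLE)
print(library)
# {'Tracks': [{'Name': 'Airbag', 'Track Number': 1, 'Compilation': False}]}
```

The stack does the bookkeeping that a chain of temporary handlers would otherwise do: the top of the stack is always "the parser for the element I'm inside", and popping it is the moment the callback would fire.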

John R. suggested that I should just write some sort of XS interface to Apple's plist parser, but I'm pretty sure that my C skills are nowhere near the required level. I guess that could be a reason to brush up on my C!

In the end, this really wouldn't be so ridiculous, if Apple had just done something more reasonable for plists. Why aren't they traditional Lisp-form-like plists? If they wanted to translate them to XML, why didn't they translate them to sensible XML? It's weird to imagine that the programmers in charge just failed to understand XML, but I guess I shouldn't be surprised, anymore, at bad XML applications.

I just want my iPod to suggest that I should listen to all of "OK Computer" once in a while. Is that so wrong?

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Mac::PropertyList definitely sucks for big files.

    In my implementation, I actually wanted to parse the files on a FreeBSD machine, so I didn't want to do any Mac-specific coding. For small files, which is what I had to deal with, my approach was fine. Everyone seems to want to parse their iTunes Library file, and those are ridiculously huge. I tried to get around that with an iTunes Music Library binary parser and was most of the way there until they changed the format from odd to completely goofy.

    If anyone co