
Tuesday January 06, 2004
10:20 AM

Binary file visualizers?

[ #16666 ]

My Mac::iTunes module reads the binary "iTunes Music Library" files that iTunes uses as its database (why can't iPhoto do that?). Each iTunes version changes it slightly, even for the minor versions. That is really no big whoop, but I still have to do a little work to discover how it changed.

I think a good way to handle all of these version issues in the module would be to step back a bit. I want to write a binary file description, in human-readable text, then have a Perl module turn that into the parsing code. It does not need anything as complicated as Parse::RecDescent, because the description is just a list of fields in the order they appear and how long each one is (with a few exceptions). For instance, a GIF image can be described the same way, because it is block-oriented too.

The file for iTunes would look something like this, where the numbers are field lengths in bytes, and the names somehow relate to Perl variables the program can access later.

MAJOR_VERSION 1
MINOR_VERSION 1     # the second byte
FOO           1.3.7 # some bit field, bits 3 to 7
HTIM_LENGTH   4
HTIM_BLOCK    4 * HTIM_LENGTH

I would write a different description file for each version of iTunes, and I would not have to change the code when iTunes adds a couple of bytes to a block.
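
A minimal sketch of how such a description could drive a parser, written in Python purely for illustration (the same idea maps onto Perl's unpack). Several things here are assumptions and not part of the post: the names `parse_spec` and `read_record` are hypothetical, integers are read big-endian, bit 0 is the least significant bit, a bit field such as `1.3.7` consumes its whole byte, and `4 * HTIM_LENGTH` means "a raw block of four times the value read for HTIM_LENGTH".

```python
import re

# The sample description from the post, unchanged.
SPEC = """\
MAJOR_VERSION 1
MINOR_VERSION 1     # the second byte
FOO           1.3.7 # some bit field, bits 3 to 7
HTIM_LENGTH   4
HTIM_BLOCK    4 * HTIM_LENGTH
"""

def parse_spec(text):
    """Turn description lines into (name, kind, args) tuples."""
    fields = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop trailing comments
        if not line:
            continue
        name, expr = line.split(None, 1)
        expr = expr.strip()
        bits = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", expr)
        scaled = re.fullmatch(r"(\d+)\s*\*\s*(\w+)", expr)
        if bits:                                 # BYTES.FIRST_BIT.LAST_BIT
            fields.append((name, "bits", tuple(int(g) for g in bits.groups())))
        elif scaled:                             # FACTOR * EARLIER_FIELD
            fields.append((name, "scaled", (int(scaled.group(1)), scaled.group(2))))
        else:                                    # plain byte count
            fields.append((name, "bytes", (int(expr),)))
    return fields

def read_record(fields, data, byte_order="big"):
    """Walk the fields in order and return a dict of name -> value."""
    values, pos = {}, 0
    for name, kind, args in fields:
        # A scaled length depends on a value parsed earlier in the record.
        size = args[0] * values[args[1]] if kind == "scaled" else args[0]
        if pos + size > len(data):
            raise ValueError(f"{name}: needs {size} bytes at offset {pos}")
        chunk = data[pos:pos + size]
        if kind == "scaled":
            values[name] = chunk                 # keep variable-length blocks raw
        else:
            number = int.from_bytes(chunk, byte_order)
            if kind == "bits":
                _, low, high = args
                number = (number >> low) & ((1 << (high - low + 1)) - 1)
            values[name] = number
        pos += size
    return values
```

Under these assumptions, supporting a new iTunes version would mean swapping in a different SPEC string; `read_record` itself stays the same.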

I have not really thought about this, so it is a very rough idea, but since I like to grok binary files (just for giggles, you know), this is a really interesting side project for me.

Perhaps, if I can figure out that stuff, I can make a color-coded hex dump utility that visualises all the pieces. That would really be something I would like to have since I am at the level of print-outs and multiple colors of highlighting pens. Even if I could just get it to print out with the colors I wanted I would be happy, but I would rather be thrilled.
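
As a rough illustration of the color-coded dump, here is a small Python sketch that colors each known field with ANSI terminal escape codes. The function name `colored_hexdump`, the `(name, offset, length)` span layout, and the six-color palette are all assumptions made for this example; bytes outside any span are printed uncolored.

```python
PALETTE = [31, 32, 33, 34, 35, 36]   # ANSI foreground codes: red through cyan
RESET = "\x1b[0m"

def colored_hexdump(data, spans, width=16):
    """Return a hex dump in which each (name, offset, length) span has one color."""
    color_of = {}
    for index, (_name, offset, length) in enumerate(spans):
        code = PALETTE[index % len(PALETTE)]     # reuse colors if there are many spans
        for i in range(offset, offset + length):
            color_of[i] = code

    lines = []
    for row in range(0, len(data), width):
        cells = []
        for i in range(row, min(row + width, len(data))):
            cell = f"{data[i]:02x}"
            if i in color_of:
                cell = f"\x1b[{color_of[i]}m{cell}{RESET}"
            cells.append(cell)
        lines.append(f"{row:08x}  " + " ".join(cells))

    # Legend: each field name printed in the color used for its bytes.
    legend = [
        f"\x1b[{PALETTE[n % len(PALETTE)]}m{name}{RESET}"
        for n, (name, _offset, _length) in enumerate(spans)
    ]
    return "\n".join(lines + ["  ".join(legend)])

if __name__ == "__main__":
    sample = bytes([4, 1, 0xA8, 0, 0, 0, 2]) + bytes(range(8))
    layout = [("MAJOR_VERSION", 0, 1), ("MINOR_VERSION", 1, 1),
              ("FOO", 2, 1), ("HTIM_LENGTH", 3, 4), ("HTIM_BLOCK", 7, 8)]
    print(colored_hexdump(sample, layout))
```

The spans could come from whatever parser reads the description file, as long as it records where each field started and how long it was.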

So, worthy audience, tell me where this has already been done and why it is a stupid idea!

  • The file(1) program has a magic(5) database that talks about byte offsets and bit positions. You could probably start with that format and add in more semantic hooks about what to do when you get there.

    Or, a more perlish solution would be to define "unpack" values for byte offsets and bit positions.

    --
    Randal L. Schwartz
    Stonehenge
    • You mean a more C-ish solution? :)

      magic(5) looks for specific things at specific places, and I need a little bit of logic. For instance, in the iTunes file, the first 100 bytes or so may have fixed meaning, but after that there are variable length blocks, so the offsets now have offsets. I will have to think about that.

      I might be able to describe things in terms of pseudo-pack type things, but that would require a lot of byte-fiddling I think since some of the values are odd numbers of bytes. That may
  • I've actually implemented a backend for this, and am now looking to rewrite the frontend. What I'm using now to specify the format is a large data structure. I've been looking around for an existing mini-language to adapt, but have yet to find one.

    My code is here [shoebox.net]. The binary file parser is File::ComplexFormat [shoebox.net] (after I rewrite with a new mini-language I intend to rename it to Parse::Binary, or something, and release to CPAN); the parser for the data structure is File::ComplexFormat::SpecParser::Array [shoebox.net].