All the Perl that's Practical to Extract and Report

Thursday February 05, 2004
05:09 PM

A tale of two orders

[ #17236 ]

I have been developing Mac::iTunes on a Mac (no surprise there), but I meant for parts of it to work anywhere. Perl is mostly architecture independent, but I got bitten by one of the bits it cannot help with.

As part of the parser that reads the binary iTunes database file, I read two or four bytes and then unpack them into shorts or longs, which, in Perl, are just numbers.

Now, on a PowerPC, the big bits show up on the left, just like our thousands place shows up to the left of the hundreds place. This is not true on all processors, though. Some mix up the bytes, putting the least significant byte first, so the same bytes read back as a different number. Ick!
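To make the two layouts concrete, here is a small sketch of my own (it is not part of Mac::iTunes) that packs the same 16-bit number both ways and shows the resulting bytes. The value 0x1234 is arbitrary, picked only because its two bytes are easy to tell apart.

```perl
use strict;
use warnings;

# An arbitrary 16-bit value, chosen only for illustration.
my $number = 0x1234;

# "n" packs an unsigned short in big-endian (network) order,
# "v" packs it in little-endian (VAX) order.
my $big    = pack( "n", $number );
my $little = pack( "v", $number );

# "H*" turns the packed bytes into hex digits, in the order they are stored.
my $big_hex    = unpack( "H*", $big );
my $little_hex = unpack( "H*", $little );

print "big-endian bytes:    $big_hex\n";     # 1234
print "little-endian bytes: $little_hex\n";  # 3412
```

The number is the same in both cases; only the order of the two bytes in the string differs.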

On the Mac, my parsing stuff worked because I do not have to worry about byte ordering, and the binary file has the bytes in the right order. When I tested the module on another Unix account yesterday, everything exploded, sending goofy characters all over the screen and messing up my terminal (a Windows client that is far inferior to Terminal). Somewhere the parser read something wrong, got confused, and started slurping bytes it should not have slurped. Things that were not strings became strings with weird characters.

At first I suspected a Unicode boo-boo, since I had started to muck around with the Unicode strings that iTunes uses, but "use bytes" did not do anything to help.

Now, since this explosion messed up my terminal, I had a hard time getting output I could read. I tried to redirect stderr to stdout and then stdout to a file (in bash), but that did not work. I found a little bash trick that did, though, and I still do not know why it worked and the other did not.

# does not work
% perl script 2>&1 > test.out

# works
% perl script &> test.out

I took the output back to my computer and stared at it for a bit. Really, just looked at it while I ate some Spaghettios I heated in my canteen cup. I wondered how many bytes the first string was, for some reason. I counted 1280. That number does not look special to me, but I knew it was supposed to be 5.

I thought for a moment. I wondered "What is 5 if the byte order is reversed?", figuring I would get some other strange number. If I have the string "\000\005", and I turn it into a short with unpack, I get 5 on my PowerPC. If I turn it into a short on an Intel, I get...yep, I get 1280.
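The arithmetic can be checked on any machine, because Perl's "n" and "v" formats force big-endian and little-endian reads regardless of the processor. This is only a sketch to reproduce the two numbers; the variable names are mine.

```perl
use strict;
use warnings;

# The two bytes from the file: a zero byte followed by a byte with value 5.
my $data = "\000\005";

# Big-endian read: first byte is the high byte, so 0*256 + 5.
my $as_big = unpack( "n", $data );

# Little-endian read: first byte is the low byte, so 0 + 5*256.
my $as_little = unpack( "v", $data );

print "big-endian reading:    $as_big\n";     # 5
print "little-endian reading: $as_little\n";  # 1280
```

The native "S" format gives whichever of these two answers matches the machine it runs on, which is why the same code behaved differently on the PowerPC and the Intel box.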

Huzzah.

I now have to deal with this in my code. I know unpack can figure this out because there are network-order and VAX-order formats, although I always get them mixed up. I need to turn this unpack into something that gives the same answer on both architectures.

unpack( "S", $data );

With two options, either the network or VAX order, I get it on the second try.

unpack( "n", $data );
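The same idea extends to the four-byte reads: "N" is the 32-bit counterpart of "n". Below is a rough sketch of reading two and four bytes at a time with fixed byte order. The record layout is invented for the example and is not the real iTunes format; it only assumes, as above, that the file stores its numbers big-endian.

```perl
use strict;
use warnings;

# Hypothetical record layout, invented for this example (NOT the real
# iTunes format): a 2-byte length, that many bytes of string, then a
# 4-byte number, all in big-endian order.
my $record = pack( "n a* N", 5, "Hello", 123_456 );

my $offset = 0;

# Read two bytes and unpack them as a big-endian short.
my $length = unpack( "n", substr( $record, $offset, 2 ) );
$offset += 2;

# Read the string that the length refers to.
my $string = substr( $record, $offset, $length );
$offset += $length;

# Read four bytes and unpack them as a big-endian long.
my $number = unpack( "N", substr( $record, $offset, 4 ) );

print "length=$length string=$string number=$number\n";
```

Because neither "n" nor "N" depends on the host, this should give the same result on a PowerPC and on an Intel machine.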

Thus, with this fix, *BSD folks (and, if they play nicely, maybe the Linux folks) can parse the iTunes database format too. They will even find a spiffy new Makefile.PL that does not bother them with all the Mac-specific bits.

  • Bash stuph (Score:3, Insightful)

    by Ovid (2709) on 2004.02.05 17:32 (#28113) Homepage Journal
        # does not work
        % perl script 2>&1 > test.out

    I'm not entirely certain why, but when you do that, only what would originally have been sent to STDOUT is redirected to test.out and your warnings will go to STDOUT. Change that to:

        % perl script > test.out 2>&1

        # works
        % perl script &> test.out

    Never seen that before, but then, I still don't understand Linux terribly well.

    • Re:Bash stuph (Score:4, Interesting)

      by vsergu (505) on 2004.02.05 17:35 (#28115) Journal

      From man bash:

      Note that the order of redirections is significant. For example, the command

      ls > dirlist 2>&1

      directs both standard output and standard error to the file dirlist, while the command

      ls 2>&1 > dirlist

      directs only the standard output to file dirlist, because the standard error was duplicated as standard output before the standard output was redirected to dirlist.

      • by Ovid (2709) on 2004.02.05 17:43 (#28116) Homepage Journal

        man bash: directions for using the feminist shell :)

      • It is easy to figure out if you remember that bash does the redirections in the order specified, and that it uses dup to do the redirections, making a copy of the current file descriptor. You will get the same behavior from Perl code.

        # like: ls 2>&1 > dirlist
        open(STDERR, ">&STDOUT"); open(STDOUT, ">dirlist");

        # like: ls > dirlist 2>&1
        open(STDOUT, ">dirlist"); open(STDERR, ">&STDOUT");

        BTW, bash has a shorthand for redirecting stdout and stderr together: ls &> file

        • It only seems easy to remember, but in my mind it does not work properly. If I redirect something to stdout, then redirect stdout, in my mind anything in stdout should go to the new place. Alas, that is not the case.
  • Your post got me thinking...

    In my File::SAUCE module [cpan.org] I have a pack template. A test on a Solaris machine [perl.org] was giving me messages like this:

    t/20-read.........#     Failed test (t/20-read.t at line 65)
    #          got: '256'
    #     expected: '1'

    I was using 'S' in my template. So, on my Win32 box, I plugged in 'n' just to see what it would do. It was giving me the same errors as above.

    So, as per perlport [perldoc.com], I now do an endian-ness check and use 'n' or 'S' wh

  • I thought it was pretty interesting how you puzzled out that byte orders were different on different platforms and how to work around it. I thought all programmers knew about little endian and big endian byte orders.

    Then I realized that Perl does an excellent job of hiding byte order. With Perl, it is much less common to read binary structures than in C. Since pack does a good job of handling the differences, byte order just becomes part of the specification of the format.

    Basically, there are two d

    • I do know about byte orders. I just have not had to deal with it for a long time, so I was not thinking about it. All I knew when I started was that the function was reading too many bytes, and, as usual, I started by looking at changes to the code I had made recently.

      My first battle with endianness was moving a couple of gigabytes of data from an Intel machine to a Motorola-based one. To make the analysis of this data easier, I needed to flip around the byte order of the longs. C was taking too long, s
      • I've had a little more experience than most people with byte order and Perl, primarily because of MacPerl work. But it also came into play with stuff like Storable and DB_File output when going from SPARCs or PPCs to Intels, or vice versa. Hurray for Storable's nstore()!