ethan (3163)

  reversethis-{ed. ... rap.nov.olissat}

Being a 25-year-old chap living in the western-most town of Germany. Studying communication and information science and being a huge fan of XS-related things.

Journal of ethan (3163)

Wednesday February 19, 2003
06:10 AM

putting perl on diet

[ #10662 ]

I got this idea from a thread on comp.lang.perl.misc (and notably some encouragement from Janek Schleicher, who considered the idea very cool): creating a module that allows data to be stored zlib-compressed.

I initially thought this could be done via tie(), but Perl's tying interface is too limited to do that effectively. For scalars you only have FETCH() and STORE(). This defeats the purpose of compression, for example in the following code:

        $string = "string" x 1_000_000;
        print substr $string, 1, 1;

Obviously, via tie() this would result in uncompressing the whole data in memory. It would also be very slow.
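To make the problem concrete, here is a minimal sketch of what a tied compressed scalar might look like. It assumes Compress::Zlib is installed; the package name TiedCompressed and the $fetches counter are made up for illustration and are not part of any real module.

```perl
use strict;
use warnings;

{
    package TiedCompressed;    # hypothetical name, for illustration only
    use Compress::Zlib qw(compress uncompress);

    our $fetches = 0;          # counts how often the whole value was inflated

    sub TIESCALAR { my ($class) = @_; my $data = compress(''); return bless \$data, $class }
    sub STORE     { my ($self, $value) = @_; $$self = compress($value) }
    sub FETCH     { my ($self) = @_; $fetches++; return uncompress($$self) }
}

tie my $string, 'TiedCompressed';
$string = "string" x 1_000_000;

# substr() has no hook of its own in the tie interface, so Perl calls
# FETCH, which inflates the entire value just to return one character.
my $char = substr $string, 1, 1;
print "$char (full decompressions so far: $TiedCompressed::fetches)\n";
```

Every read, however small, goes through FETCH, so the compressed form only helps while the variable is not being touched.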

The obvious solution, therefore (apart from adding SUBSTR and all the other string operators to the tie interface), is a class of its own with a little bit of overloading of "", .= etc.
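A rough idea of what such a class could look like in pure Perl, assuming Compress::Zlib and the overload pragma. CompressedString is a hypothetical name and this is not the XS code described in this entry; it also still inflates the whole value on every operation, so it only shows the overloading part.

```perl
use strict;
use warnings;

{
    package CompressedString;    # hypothetical name, for illustration only
    use Compress::Zlib qw(compress uncompress);

    use overload
        '""' => sub { $_[0]->get },                 # stringification inflates the data
        '.=' => sub {
            my ($self, $other) = @_;
            $self->store($self->get . $other);      # naive: inflate, append, deflate again
            return $self;                           # keep the object, not a plain string
        },
        fallback => 1;

    sub new {
        my ($class, $value) = @_;
        my $self = bless { data => undef }, $class;
        $self->store(defined $value ? $value : '');
        return $self;
    }
    sub store { my ($self, $value) = @_; $self->{data} = compress($value) }
    sub get   { my ($self) = @_; return uncompress($self->{data}) }
}

my $s = CompressedString->new("hallo");
$s .= " welt";
print "$s\n";
```

Because the .= handler returns the object itself, $s remains a CompressedString after the append instead of decaying into an ordinary scalar.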

It sounds much more trivial than it is, as I had to realize. I started hacking away at the XS part until I could at least store and get the data. The string becomes a linked list of buffers: the original large string is divided into CHUNK_SIZE-sized pieces, which are then compressed into those buffers. After that I was eager to do a little benchmark:

        use Benchmark qw(cmpthese);

        my $uncompressed;
        my $compressed = String::Compress->new;

        cmpthese (-2, {
                        compressed => sub {
                                $compressed->store("hallo" x 1023);
                                my $d = $compressed->get;
                        },
                        uncompressed => sub {
                                $uncompressed = "hallo" x 1023;
                                my $d = $uncompressed;
                        },
        });

Urmmh, here's the embarrassing part now:

        compressed: 5 wallclock secs ( 1.02 usr + 1.18 sys = 2.20 CPU) @ 509.09/s (n=1120)
        uncompressed: 4 wallclock secs ( 2.05 usr + 0.00 sys = 2.05 CPU) @ 40707.32/s (n=83450)
                        Rate   compressed uncompressed
        compressed     509/s           --         -99%
        uncompressed 40707/s        7896%           --

So it's slightly slower.

On the other hand, "hallo" x 1_000_000 eats about half the memory an ordinary Perl scalar would need. When increasing CHUNK_SIZE to a really large value such as 500_000 (it's just 4096 right now), it could probably be dropped further to less than 10 KB (for a repetitive string like the above only, of course).
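The effect of the chunk size can be estimated without any XS at all. The following sketch assumes Compress::Zlib and only counts the bytes zlib produces for each piece; compressed_bytes is a made-up helper, and per-buffer bookkeeping (pointers, allocator overhead) is not counted, so real memory use will be higher.

```perl
use strict;
use warnings;
use Compress::Zlib qw(compress);

# Total zlib output when $value is cut into $chunk_size pieces and each
# piece is compressed on its own. List overhead is deliberately ignored.
sub compressed_bytes {
    my ($value, $chunk_size) = @_;
    my $total = 0;
    $total += length(compress($_)) for unpack "(a$chunk_size)*", $value;
    return $total;
}

my $value = "hallo" x 1_000_000;
for my $chunk_size (4096, 500_000) {
    printf "CHUNK_SIZE %7d: %d bytes compressed\n",
        $chunk_size, compressed_bytes($value, $chunk_size);
}
```

Larger chunks give zlib more repetition to work with and mean fewer buffers, at the price of inflating more data for every small access.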

But my actual concern is something else: I reimplement the string operators as methods, which is at least feasible for things like chomp, substr etc. But what about regular expressions? I'd need to reimplement Perl's RE engine (working on segmented, compressed little strings which form one large string!). I think I'll leave that to someone else (Janek perhaps :-).
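For an operator like substr, the method version is manageable because only the chunks overlapping the requested range have to be inflated. Here is a pure-Perl sketch of that idea, assuming Compress::Zlib, byte strings, and non-negative offsets. ChunkedString and its slice method are hypothetical names, and a plain array stands in for the linked list of the XS code.

```perl
use strict;
use warnings;

{
    package ChunkedString;    # hypothetical pure-Perl stand-in, not the XS module
    use Compress::Zlib qw(compress uncompress);

    sub new {
        my ($class, %args) = @_;
        return bless {
            chunk_size => $args{chunk_size} || 4096,
            chunks     => [],
            length     => 0,
        }, $class;
    }

    # Cut the value into chunk_size pieces and compress each one separately.
    sub store {
        my ($self, $value) = @_;
        my $size = $self->{chunk_size};
        $self->{length} = length $value;
        $self->{chunks} = [ map { compress($_) } unpack "(a$size)*", $value ];
        return $self;
    }

    # Inflate every chunk and glue the original string back together.
    sub get {
        my ($self) = @_;
        return join '', map { uncompress($_) } @{ $self->{chunks} };
    }

    # Like substr($string, $offset, $len), but inflates only the chunks
    # that overlap the requested range.
    sub slice {
        my ($self, $offset, $len) = @_;
        return '' if $len <= 0 || $offset >= $self->{length};
        my $size = $self->{chunk_size};
        my $end  = $offset + $len;
        $end = $self->{length} if $end > $self->{length};
        my $first = int($offset / $size);
        my $last  = int(($end - 1) / $size);
        my $buf = join '', map { uncompress($_) } @{ $self->{chunks} }[$first .. $last];
        return substr $buf, $offset - $first * $size, $end - $offset;
    }
}

my $big = ChunkedString->new->store("string" x 1_000_000);
print $big->slice(1, 1), "\n";    # touches a single 4096-byte chunk
```

A regular expression has no such bounded range to ask for, since a match may start in one chunk and end in another, which is why that case is so much harder.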

  • I wonder whether it's possible to implement this using magic. Look up PERL_MAGIC_uvar in the perlguts manpage to see what I mean.
    • I wonder whether it's possible to implement this using magic. Look up PERL_MAGIC_uvar in the perlguts manpage to see what I mean.

      I am not sure whether the U magic is powerful enough. The ufuncs struct simply contains a pointer to a get and a set function. The third member, uf_index, is just an IV that doesn't seem to be used for anything other than as an identifier (that is what grepping through the 5.8.0 sources suggests).

      There is a whole lot more magic available, but I am not sure whether I am su