
All the Perl that's Practical to Extract and Report

  • I think both of the prior comments are on target.

    If you have such quantities of data that scaling and calling log repeatedly is a problem - and only if - there are old integer bitbang routines for log2 that could be done with XS or Inline::C or PDL.
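
    For reference, the integer approach can be sketched in pure Perl before reaching for XS or Inline::C. This is only an illustration of the shift-and-compare idea, not one of the old routines themselves; the name ilog2 is made up here, and it assumes the input is a positive integer that fits in 32 bits.

    ```perl
    use strict;
    use warnings;

    # Floor of log2 for a positive integer, using only shifts and compares.
    # ilog2 is a hypothetical name; assumes the argument fits in 32 bits.
    sub ilog2 {
        my ($n) = @_;
        die "ilog2 needs a positive integer\n" unless $n > 0;
        my $log = 0;
        # Binary search for the position of the highest set bit.
        for my $shift (16, 8, 4, 2, 1) {
            if ($n >> $shift) {
                $n >>= $shift;
                $log += $shift;
            }
        }
        return $log;
    }

    print ilog2($_), "\n" for 1, 2, 3, 1024, 65535;
    ```

    That prints 0, 1, 1, 10 and 15. The same five-step loop translates almost line for line into C if the pure-Perl version turns out to be the bottleneck.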

    Or you could (ab)use Perl 5.10 pack() to grab the floating point representation's exponent.

    Note that 256 buckets is a lot, hi-res, for loglinear data unless it was already floating point or Math::BigFloat, as log2(MAXLONG) - log2(1) is well under 256. 256 values, or 8 bits, is the size of a plain float's exponent. BUT grabbing it is hard since it's offset one bit by the sign.

    If your data is not log-clean but may run the gamut from -INF through 0 to +INF, grabbing the sign and the top 7 bits of the exponent is nice, but the result is not a single numeric range: the exponent is a UCHAR biased by 127 [or by 63 after we nip one bit taking a byte with the sign], while the sign prefixed to it inverts the direction. A crass fix:

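    my $s;
    my $n= unpack("C",pack("f>",$data)); # top byte: sign + 7 exponent bits; ">" forces big-endian order (needed on x86)
    $n ^= 0xff if $s=$n & 0x80; # negative: flip all bits so larger magnitudes sort lower (0..127)
    $n ^= 0x80 if !$s;          # non-negative: set the top bit so these land in 128..255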

    If it's all positive, and you want the full 256 buckets without XS or PDL or log(), the best I can see is to unpack with B9, discard the leading sign bit, repack with B8, and unpack with C. But if you're going to do that, you might as well get the full dynamic range from the 11-bit 'double' exponent.
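
    The B9 / B8 / C round trip for positive data might look like the sketch below. The name float_bucket is invented for illustration, and it assumes IEEE 754 single precision, so anything outside float range collapses into the end buckets (0 or 255).

    ```perl
    use strict;
    use warnings;
    use 5.010;   # the ">" modifier in pack templates needs Perl 5.10

    # Bucket index 0..255 for a positive number: the raw 8-bit exponent
    # of its single-precision representation (still biased by 127).
    # float_bucket is a hypothetical name.
    sub float_bucket {
        my ($x) = @_;
        my $bits = unpack("B9", pack("f>", $x));  # sign bit + 8 exponent bits
        $bits = substr($bits, 1);                 # discard the leading sign bit
        return unpack("C", pack("B8", $bits));    # 8 bits back to an integer
    }

    print float_bucket($_), "\n" for 0.5, 1, 2, 1000;
    ```

    That prints 126, 127, 128 and 136; subtract 127 to get floor(log2) back.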

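    sub log2{
      require 5.010; # for the ">" modifier; assumes IEEE 754 doubles
      # should die if arg <= 0 ...
      # instead gives log2(abs())
      # "d" rather than "F", so a perl built with long doubles still works
      my $str =unpack("B12",pack("d>", shift)); # sign + 11 exponent bits
         $str =~ s/^[01]/00000/;  # drop sign and pad to 16 bits
      my $exp =  unpack("s>",pack("B*", $str )); # biased exponent, 0..2047
      return -1023 + $exp; # remove the bias: floor(log2) for normal numbers
    }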

    With a bit of magic number abuse, we can easily squeeze out a single fractional bit as well, which could give up to 4096 buckets for positive numbers, and as many more for negatives if you rescale somewhere.
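
    One possible reading of that fractional bit, as a sketch only: take 13 bits instead of 12, so the top mantissa bit rides along under the exponent. The name half_octave_bucket is made up, and it assumes 64-bit IEEE 754 doubles. Note the extra bit splits each octave at 1.5 x 2**e, not at sqrt(2) x 2**e, so the half steps are linear within the octave, not truly logarithmic.

    ```perl
    use strict;
    use warnings;
    use 5.010;   # the ">" modifier in pack templates needs Perl 5.10

    # Bucket index 0..4095: the 11-bit biased exponent followed by the
    # top mantissa bit. half_octave_bucket is a hypothetical name;
    # the sign is ignored, so this really buckets abs($x).
    sub half_octave_bucket {
        my ($x) = @_;
        my $bits = unpack("B13", pack("d>", $x)); # sign, 11 exponent bits, 1 mantissa bit
        substr($bits, 0, 1, "0000");              # drop the sign, pad to 16 bits
        return unpack("n", pack("B16", $bits));   # big-endian unsigned 16-bit
    }

    print half_octave_bucket($_), "\n" for 0.75, 1, 1.5, 2, 3;
    ```

    That prints 2045, 2046, 2047, 2048 and 2049, so halving the bucket number and subtracting 1023 recovers the plain exponent.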

    Why 5.10? Because on any commodity platform (x86) we need to coerce to a sensible bit/byte order. I am sure I could work out an x86-specific way with older pack, but life is too short.

    # I had a sig when sigs were cool
    use Sig;