
All the Perl that's Practical to Extract and Report

  • If I haven't been misinformed, -B uses the current STDIO buffer of the filehandle for its value, so you could seek near the end and possibly get a different result for -B as the file became "more binary".
    --
    • Randal L. Schwartz
    • Stonehenge
    • For the record, I decided to just match each line against \0 as I read it, and that seems to work fine for now. Not quite as advanced a heuristic as -B, but good enough.

      --
      J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
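For anyone wanting to try the per-line \0 check described above, here is a minimal sketch. The sub name has_nul_byte and the in-memory filehandles are assumptions made for illustration, not code from the poster. It is a cruder test than -B: a single NUL anywhere marks the input as binary.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): returns 1 if any line
# read from the filehandle contains a NUL byte, 0 otherwise.
sub has_nul_byte {
    my ($fh) = @_;
    binmode $fh;                      # read raw bytes, no layer translation
    while (my $line = <$fh>) {
        return 1 if $line =~ /\0/;    # stop at the first NUL seen
    }
    return 0;
}

# In-memory filehandles keep the sketch self-contained.
open my $text_fh, '<', \"plain text\nsecond line\n" or die "open: $!";
open my $bin_fh,  '<', \"GIF89a\0\x01\x02\n"        or die "open: $!";

print has_nul_byte($text_fh) ? "binary\n" : "text\n";
print has_nul_byte($bin_fh)  ? "binary\n" : "text\n";
```

Because it returns at the first hit, a binary file is usually rejected after very little reading, while a text file is read to the end.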
  • The heuristic seems pretty simple. If you ignore fancy bits like Unicode, locales, EBCDIC, and MS-DOG line endings, and also accept the vertical tab as whitespace, the test for -B is pretty much

    3 * tr/\0-\x07\x0e-\x1a\x1c-\x1f\x7f-\xff/\0-\x07\x0e-\x1a\x1c-\x1f\x7f-\xff/ > length

    That is, printable ASCII, ASCII whitespace, and ESC are okay, everything else is not,
    and if more than 1/3 of the characters are not okay, call it binary.
    • That seems like the kind of thing that would be nice to expose in a function, in much the same way uc, lc, and glob started their lives.

      --
      J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
      • > That seems like the kind of thing that would be nice to expose in a function, in much the same way uc, lc, and glob started their lives.

        I dunno... I think the heuristic is so weak (false positives for arguably "text" data, for example), and by definition a binary test (ta-dah!), as opposed to a multivalued one, that I find little value in exposing that logic. I think that, e.g., adding my snippet to the FAQ should be quite enough for those who need it.
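For readers who want the tr/// snippet from this thread in reusable form, here is a minimal sketch wrapping it in a sub. The name looks_binary is hypothetical, and this is only the simplified heuristic quoted above (no Unicode, locale, or EBCDIC handling), not Perl's actual -B implementation.

```perl
use strict;
use warnings;

# Hypothetical wrapper (name is an assumption) around the tr/// heuristic
# quoted in the thread. Expects a string of raw bytes, for example the
# first block read from a file.
sub looks_binary {
    my ($bytes) = @_;
    # Count bytes outside printable ASCII, the control characters
    # \x08-\x0d (backspace plus whitespace), and ESC (\x1b).
    # An empty replacement list makes tr/// count without modifying.
    my $odd = ($bytes =~ tr/\0-\x07\x0e-\x1a\x1c-\x1f\x7f-\xff//);
    # More than one third "odd" bytes: call it binary.
    # Empty input yields 0 here, since 3 * 0 > 0 is false.
    return 3 * $odd > length($bytes) ? 1 : 0;
}

print looks_binary("Just some ordinary text.\n") ? "binary\n" : "text\n";  # prints "text"
print looks_binary("\x00\x01\x02\x03text")        ? "binary\n" : "text\n";  # prints "binary"
```

As noted above, the result is a plain yes/no, so arguably textual data in other encodings can still be flagged as binary.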