I want to be able to apply the -B test to the contents of an arbitrary scalar to see if it's binary or not. I've got files that are occasionally spewing junk at me; the first N,000,000 records may be just fine, but toward the end they turn into gibberish. I want to print out erroneous records, unless they are binary garbage, in which case I just want to print a statement that says so.
Thinking of looking into IO::Scalar or something...
Uses the current buffer (Score:2)
Re:Uses the current buffer (Score:2)
For the record, I decided to just match each line against \0 as I read it, and that seems to work fine for now. Not quite as advanced a heuristic as -B, but good enough.
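For anyone wanting to do the same, here is a minimal sketch of that NUL check. The helper names has_nul and describe_bad_record are made up for illustration, and the "[binary garbage]" notice is just a placeholder message.

```perl
use strict;
use warnings;

# Hypothetical helper: treat a record as binary if it contains a NUL byte.
sub has_nul {
    my ($record) = @_;
    return index($record, "\0") >= 0 ? 1 : 0;
}

# Hypothetical reporting helper: a text record is returned unchanged,
# a binary one is replaced by a short notice.
sub describe_bad_record {
    my ($record) = @_;
    return has_nul($record) ? "[binary garbage]" : $record;
}
```

This only catches garbage that happens to contain \0, so a record full of high-bit bytes but no NUL would still be printed as-is.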
J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
-B cheap plastic imitation (Score:2)
3 * tr/\0-\x07\x0e-\x1a\x1c-\x1f\x7f-\xff/\0-\x07\x0e-\x1a\x1c-\x1f\x7f-\xff/ > length
That is, printable and whitespace ASCII plus ESC are okay, everything else is not,
and if more than 1/3 of the characters are not okay, call it binary.
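To apply this to an arbitrary scalar rather than $_, the expression can be wrapped in a small sub. This is only a sketch: the name looks_binary is made up, it assumes the scalar holds bytes (characters above \xff are never counted as suspect), and returning false for an empty string is my own choice, not something taken from -B.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the tr/// heuristic above.
sub looks_binary {
    my ($data) = @_;

    # Assumption: undefined or empty input is reported as "not binary".
    return 0 unless defined $data && length $data;

    # With an empty replacement list and no /d, tr/// leaves the
    # string unchanged and just returns how many characters matched.
    my $suspect = ($data =~ tr/\0-\x07\x0e-\x1a\x1c-\x1f\x7f-\xff//);

    # Binary if more than one third of the characters are suspect.
    return 3 * $suspect > length($data) ? 1 : 0;
}
```

Usage would be something like: print looks_binary($record) ? "binary garbage\n" : $record;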
Re:-B cheap plastic imitation (Score:2)
That seems like the kind of thing that would be nice to expose in a function, in much the same way uc, lc, and glob started their lives.
Re:-B cheap plastic imitation (Score:2)
I dunno... I think the heuristic is so weak (false positives for arguably "text" data, for example), and by definition a binary test (ta-dah!), as opposed to multivalued, that I find little value in exposing that logic. I think adding my snippet to the FAQ, for example, should be quite enough for those who need it.