Stories, comments, journals, and other submissions on use Perl; are Copyright 1998-2006, their respective owners.
Thanks (Score:2)
A quick way to upgrade yourself from level 2 to level 3 is to read http://juerd.nl/site.plp/perluniadvice
Re: (Score:2)
So, I've been thinking there should be some standard way for CPAN modules to declare whether they accept/return text strings or byte strings (if they need to handle both).
For instance, HTML::Parser has an instance method called utf8_mode.
Another example (the one that triggered me to write this entry) is Catalyst's uri_for() method. In one release the developers changed the implementation to accept only strings (UTF-8 flagged or not) in its %query_values hash.
Based on the complaints and patches made by
Strings or bytes (Score:2)
One trap is the UTF-8 string: a byte string that represents characters and has "the flag" off (which confuses perluninewbies, because that flag is called UTF8). Compare this with the result of pack "N*", LIST, which is a byte string representing numbers. You'll note that in practice UTF-32 looks a lot like pack "N*".
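A small illustration of the point above, assuming only the core Encode module: the same text encoded as UTF-8 and as UTF-32BE gives two byte strings, and the UTF-32BE one is byte-for-byte what pack "N*" produces from the code points. The sample string is made up for the demonstration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A text string of 6 characters: "caf", e-acute, a space, and U+263A.
my $text = "caf\x{e9} \x{263A}";

my $utf8_bytes  = encode('UTF-8', $text);                  # byte string representing characters
my $utf32_bytes = encode('UTF-32BE', $text);               # 4 bytes per character, big-endian
my $packed      = pack 'N*', map { ord } split //, $text;  # byte string representing numbers

printf "characters:  %d\n", length $text;         # 6
printf "UTF-8 bytes: %d\n", length $utf8_bytes;   # 9
print  "UTF-32BE eq pack N*: ", ($utf32_bytes eq $packed ? "yes" : "no"), "\n";   # yes
```

Nothing in $utf8_bytes tells you it "means" characters; only the programmer knows that, which is exactly why it is a trap.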
I strongly believe that accepting both UTF-8 encoded strings and SvUTF8-flagged strings in the same function is wrong.
I also strongly suggest that any use of "use bytes" or of functions in the utf8:: namespace is misguided. If you want to use bytes, either have a function that deals only with byte input, or have a function that deals only with text input and encode it yourself. The bytes.pm stuff does not encode; it provides a view into perl's internal byte buffer, and for text strings the encoding of that buffer may be either utf8 or latin1.
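A minimal sketch of the "two functions" approach, using core Digest::MD5 as the byte-oriented operation. The names checksum_of_bytes and checksum_of_text are hypothetical, chosen for illustration; the text variant picks UTF-8 explicitly instead of relying on whatever perl's internal buffer happens to contain.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Digest::MD5 qw(md5_hex);

# Byte interface (hypothetical name): the caller promises octets.
# md5_hex croaks if handed characters above 255, so misuse is loud.
sub checksum_of_bytes {
    my ($bytes) = @_;
    return md5_hex($bytes);
}

# Text interface (hypothetical name): the caller passes characters and
# this function does the encoding itself, explicitly as UTF-8.
sub checksum_of_text {
    my ($text) = @_;
    return checksum_of_bytes(encode('UTF-8', $text));
}

# U+263A as text, and the same character as its three UTF-8 bytes.
print checksum_of_text("\x{263A}"), "\n";
print checksum_of_bytes("\xE2\x98\xBA"), "\n";   # same checksum as the line above
```

Each function has exactly one contract, so callers never have to guess which kind of string it wants.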
For DBIx::Simple I have a similar dilemma. I can easily add automatic decoding/encoding for database values, and would love to do so. But databases can also be used for storing binary values. My current plan is to release a very simple CPAN module:
(Typed in my browser, untested.) Then, DBIx::Simple doesn't have to parse SQL and know which columns are blobs: the user can mark a string as a BLOB and I can just skip encoding for those values. PerlIO layers could be told to skip things marked as BLOB when encoding too.
Functions that for some reason need to accept both kinds of string (which can be necessary to support existing stuff, or in heavily abstracted code, but should generally be avoided), can then just tell the user to mark byte strings as blobs before passing them.
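The module the comment describes is not shown here, so the following is only a sketch of the marking idea under my own assumptions: a hypothetical Blob wrapper class (not the proposed CPAN module and not DBIx::Simple's API) and a hypothetical to_database_value function that encodes text but passes marked byte strings through untouched.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Scalar::Util qw(blessed);

# Hypothetical marker class: wrapping bytes in a Blob means "do not encode".
package Blob {
    sub new   { my ($class, $bytes) = @_; return bless \$bytes, $class }
    sub bytes { my ($self) = @_; return $$self }
}

# Hypothetical function that must accept both kinds of string:
# Blob objects pass through as raw bytes, everything else is text
# and gets encoded to UTF-8.
sub to_database_value {
    my ($value) = @_;
    return $value->bytes if blessed($value) && $value->isa('Blob');
    return encode('UTF-8', $value);
}

my $name  = to_database_value("caf\x{e9}");                # text: 4 chars -> 5 bytes
my $image = to_database_value(Blob->new("\xFF\xD8\xFF"));  # marked bytes, unchanged
printf "%d %d\n", length $name, length $image;             # 5 3
```

Wrapping in an object is just one way to "mark" a value; it keeps the check to a single isa test and avoids any guessing based on the UTF8 flag.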
To take it one step further, it should have a mechanism like encoding::warnings in place to disallow (fatal error would be best, I believe) upgrading a BLOB.
Re: (Score:2)
I'm also a bit afraid that you misunderstood what I meant by mentioning bytes.pm. I didn't mean we should call "use bytes" in this situation to force string operations to be bytewise. Not at all.
I meant that declaring "use bytes" *might be* a good way for programmers to tell module authors "Hey, I want this module to do what
Re: (Score:2)
Experience has shown so far that the only workable way of supporting both byte strings and text strings in your function is to provide two separate functions, or a mecha
Re: (Score:2)