Stories, comments, journals, and other submissions on use Perl; are Copyright 1998-2006, their respective owners.
Thanks (Score:2)
A quick way to upgrade yourself from level 2 to 3 is reading http://juerd.nl/site.plp/perluniadvice [juerd.nl]
Re:Thanks (Score:2)
So, I've been thinking there need to be some standards for CPAN modules to declare whether they accept/return strings or bytes (if they need to handle both).
For instance, HTML::Parser has an instance method called utf8_mode [cpan.org].
Another example (the one that triggered me to write this entry) is Catalyst's uri_for() method [cpan.org]. In one release the developers changed the implementation to accept only strings (UTF-8 flagged or not) in its %query_values hash.
Based on complaints and patches from Japanese developers, they changed the code to accept both strings and UTF-8 bytes, by calling utf8::encode() only if utf8::is_utf8() is true. As said in the post, this might break latin-1 strings unless users explicitly upgrade them with utf8::upgrade() before passing them to the method.
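To make the latin-1 problem concrete, here is a minimal sketch. The helper to_bytes_if_flagged is a made-up name that only imitates the "encode if flagged" check described above; it is not Catalyst's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper imitating the "encode only if the UTF8 flag is on" check.
sub to_bytes_if_flagged {
    my ($value) = @_;
    utf8::encode($value) if utf8::is_utf8($value);
    return $value;
}

# A latin-1 character string: U+00E9 is stored as one byte, flag off.
my $latin1 = "caf\xE9";

# The same characters, but explicitly upgraded so the flag is on.
my $upgraded = $latin1;
utf8::upgrade($upgraded);

my $from_latin1   = to_bytes_if_flagged($latin1);    # left untouched
my $from_upgraded = to_bytes_if_flagged($upgraded);  # encoded to UTF-8 octets

printf "equal as characters: %s\n", ($latin1 eq $upgraded ? "yes" : "no");
printf "not upgraded: %d bytes\n", length $from_latin1;
printf "upgraded:     %d bytes\n", length $from_upgraded;
```

The two inputs compare equal as character strings, yet only the upgraded one comes out as UTF-8 octets. The other one passes through with a bare \xE9, which is not valid UTF-8.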
I suggested that they add another method, like uri_for_bytes, that doesn't do any utf8::encode() inside the module and treats everything as bytes. But then another idea struck me: "Hey, perl has a core pragma to say that".
bytes.pm.
Does it sound crazy to change the behavior of these modules by looking at the %^H hash values to see if bytes.pm is enabled? (Maybe we can wrap it in something like bytes::enabled.) I know enabling bytes.pm affects functions like index(), substr() and length() everywhere in its lexical scope, so this might not be what you want. Users might just want to pass one argument as bytes, and let the other modules/behaviors stay in Unicode semantics. Maybe some package-scoped bytes pragma?
Hmm.
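A rough sketch of what such a check might look like. As far as I can tell, bytes.pm sets a bit in $^H (0x08) rather than a key in %^H, and caller() returns the caller's compile-time hints as element 8 of its list. perlfunc says that value is not meant for external use, so treat this as an experiment. caller_wants_bytes is a made-up name, not something bytes.pm provides.

```perl
use strict;
use warnings;

# Assumption: this is the $^H bit that "use bytes" turns on (taken from bytes.pm).
use constant HINT_BYTES => 0x00000008;

# Hypothetical helper: true if the code calling us was compiled under "use bytes".
sub caller_wants_bytes {
    my $hints = (caller(0))[8];   # caller's compile-time $^H
    return ($hints & HINT_BYTES) ? 1 : 0;
}

my $outside = caller_wants_bytes();

my $inside;
{
    use bytes;                    # lexically scoped to this block
    $inside = caller_wants_bytes();
}

print "outside 'use bytes': $outside\n";
print "inside 'use bytes':  $inside\n";
```

A method like uri_for() could in principle make the same check on its own caller and skip utf8::encode() when the bit is set.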
Strings or bytes (Score:2)
A trap is the UTF-8 string, which is a byte string representing characters, and has "the flag" off (which to perluninewbies is confusing because this flag is called UTF8). Compare this with the result of pack "N*", LIST, which is a byte
Re: (Score:2)
And also, I'm a bit afraid that you misunderstood what I meant by mentioning bytes.pm. I didn't mean we should call "use bytes" in this situation to force string operations to be byte-wise. Not at all.
I meant declaring "use bytes" *might be* a good way for programmers to tell the module authors "Hey I want this module to do what
Re: (Score:2)
Experience has shown so far that the only workable way of supporting both byte strings and text strings in your function is to provide two separate functions, or a mecha
Re: (Score:2)