  • That's a very clear explanation of what I've thought for some time but had been unable to put into words.

    A quick way to upgrade yourself from level 2 to level 3 is to read http://juerd.nl/site.plp/perluniadvice ;-).
    • Thank *you* for the good tutorial.

      So, I've been thinking there should be some standard way for CPAN modules to declare whether they accept/return strings or bytes (for modules that need to handle both).

      For instance, HTML::Parser has an instance method called utf8_mode.
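A mode switch like this is one way for a module to let the caller declare what it is passing in. The sketch below uses a hypothetical class, My::Tokenizer, whose utf8_mode accessor only borrows HTML::Parser's method name; the semantics (true means "input is UTF-8 encoded bytes, decode it first") are an assumption made for illustration, not HTML::Parser's documented behaviour.

```perl
package My::Tokenizer;    # hypothetical class, NOT HTML::Parser
use strict;
use warnings;
use Encode qw(decode_utf8);

sub new { my ($class) = @_; return bless { utf8_mode => 0 }, $class }

# Getter/setter. Assumed meaning: true = input arrives as UTF-8 encoded bytes.
sub utf8_mode {
    my $self = shift;
    $self->{utf8_mode} = shift if @_;
    return $self->{utf8_mode};
}

# Split input into words; the result is always text strings (characters).
sub words {
    my ($self, $input) = @_;
    my $text = $self->{utf8_mode} ? decode_utf8($input) : $input;
    return split ' ', $text;
}

package main;

my $tok = My::Tokenizer->new;
$tok->utf8_mode(1);                            # caller declares: these are bytes
my @w = $tok->words("caf\xC3\xA9 au lait");    # UTF-8 bytes for "café au lait"
print scalar(@w), " words, first has ", length($w[0]), " chars\n";
```

With utf8_mode on, the first word comes back as the 4-character text string "café"; with it off, the same input would yield the 5-byte string unchanged.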

      Another example (the one that prompted me to write this entry) is Catalyst's uri_for() method. In one release, the developers changed the implementation to accept only strings (UTF-8 flagged or not) in its %query_values hash.
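To make the contract concrete, here is a minimal sketch, assuming a function that treats every query key and value as a text string (flagged or not) and encodes it to UTF-8 itself before percent-escaping. build_query and escape_text are hypothetical helpers, not Catalyst's actual code. The second call shows what happens when a caller passes bytes that are already UTF-8 encoded: they get encoded a second time.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Percent-encode one TEXT string: characters -> UTF-8 bytes -> %XX escapes.
sub escape_text {
    my ($text) = @_;
    my $bytes = encode_utf8($text);
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $bytes;
}

# Hypothetical helper (not Catalyst's code): every key and value
# is assumed to be a text string, whether or not it is UTF-8 flagged.
sub build_query {
    my (%params) = @_;
    return join '&',
        map { escape_text($_) . '=' . escape_text($params{$_}) }
        sort keys %params;
}

print build_query(name => "caf\x{e9}"), "\n";               # name=caf%C3%A9
print build_query(name => encode_utf8("caf\x{e9}")), "\n";  # name=caf%C3%83%C2%A9 (double-encoded)
```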

      Based on the complaints and patches made by
      • "Strings or bytes" is not the right distinction, because both kinds are strings. I usually call them "text string" and "binary string", or "character string" and "byte string". Sometimes I call the former "Unicode string" to emphasize that all text strings are Unicode strings.

        A trap is the UTF-8 string, which is a byte string representing characters and has "the flag" off (this confuses perluninewbies, because that flag is called UTF8). Compare this with the result of pack "N*", LIST, which is a byte
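The distinction can be seen with the core Encode module. This short sketch compares a text string, the UTF-8 string produced by encoding it, and a binary string from pack; the sample value "café" is chosen only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

my $text   = "caf\x{e9}";          # text string: 4 characters
my $utf8   = encode_utf8($text);   # UTF-8 string: 5 bytes, flag OFF
my $binary = pack "N*", 1, 2;      # binary string: 8 bytes, flag OFF

printf "text:   %d chars\n", length $text;
printf "utf8:   %d bytes, flag %s\n", length $utf8,
    utf8::is_utf8($utf8) ? 'on' : 'off';
printf "binary: %d bytes, flag %s\n", length $binary,
    utf8::is_utf8($binary) ? 'on' : 'off';

# Only decoding turns the UTF-8 bytes back into the 4-character text string.
print decode_utf8($utf8) eq $text ? "round trip ok\n" : "mismatch\n";
```

To Perl, the UTF-8 string and the pack result are the same kind of thing: byte strings with the flag off. Only the programmer knows that one of them happens to hold encoded characters.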
        • Hm, just to clarify: like you, I prefer to say characters vs. bytes. Where I wrote "strings", it was either a slip of the keyboard or I meant Unicode strings.

          Also, I'm afraid you may have misunderstood what I meant by mentioning bytes.pm. I didn't mean we should call "use bytes" in this situation to force string operations to be byte-wise. Not at all.

          I meant declaring "use bytes" *might be* a good way for programmers to tell the module authors "Hey I want this module to do what
          • "use bytes;" is lexical: it cannot influence what a module does. I don't know who to thank for this, but I'm happy that at least my code won't be broken at a distance by the numerous uninformed and misinformed people who throw a "use bytes" at their code to replace one kind of (for them) vague behavior with another kind of vague behavior. :)
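The lexical scoping is easy to demonstrate. In this sketch, char_length is a hypothetical stand-in for code compiled elsewhere (such as in a module): it keeps counting characters even when called from inside a "use bytes" block, while length written directly in that block counts the string's internal bytes. This only illustrates the scope of the pragma; it is not a recommendation to use it.

```perl
use strict;
use warnings;

my $s = "\x{263A}";   # one character (U+263A), stored internally as 3 UTF-8 bytes

# Compiled outside any 'use bytes' scope, like code in another module.
sub char_length { return length $_[0] }

my ($inside, $from_sub);
{
    use bytes;                     # in effect only until the closing brace
    $inside   = length $s;         # counts internal bytes: 3
    $from_sub = char_length($s);   # the sub is unaffected: 1
}
my $outside = length $s;           # back to characters: 1

print "inside=$inside from_sub=$from_sub outside=$outside\n";
```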

            Experience has shown so far that the only workable way of supporting both byte strings and text strings in your function is to provide two separate functions, or a mecha
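A minimal sketch of the two-function approach, assuming a hex-dump utility as the example task; to_hex_bytes and to_hex_text are hypothetical names. The text entry point encodes to UTF-8 and delegates, and the byte entry point refuses characters above 0xFF instead of guessing what the caller meant.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical API: two entry points instead of one function that guesses.

# Takes a BYTE string; refuses characters above 0xFF.
sub to_hex_bytes {
    my ($bytes) = @_;
    die "to_hex_bytes: wide character in byte string\n"
        if $bytes =~ /[^\x00-\xFF]/;
    return unpack 'H*', $bytes;
}

# Takes a TEXT string; encodes it as UTF-8, then delegates.
sub to_hex_text {
    my ($text) = @_;
    return to_hex_bytes(encode_utf8($text));
}

print to_hex_text("\x{e9}"), "\n";       # c3a9: the character é, as UTF-8
print to_hex_bytes("\xC3\xA9"), "\n";    # c3a9: the same two bytes given directly
print to_hex_bytes("\xe9"), "\n";        # e9:   one byte, not a character
```

The same Perl value "\xe9" gives different results depending on which function receives it, which is exactly why a single function cannot tell from the value alone whether it was meant as text or as bytes.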
            • Agreed on both: we should use two different functions to accept characters or bytes, and BLOB.pm would also be useful to DWIM. :)