Stories
Slash Boxes
Comments
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More | Login | Reply
Loading... please wait.
  • as long as you're using an 8-bit character set. Unicode kinda makes this sub-optimal :)
    • It should do just as well on UTF8, no? Except for its probable confusion on string length vs. number of bytes..
      • Nope. UTF-8 is particularly bad. In addition to potentially having combining characters in the match, which is a Unicode issue, you can run into lots of cases where the second (and third, and fourth, and fifth...) byte of two multibyte characters are identical but the characters aren't the same because the first byte isn't.

        This technique only works on fixed-width characters in sets with no combining characters (In which case it works really well) or in cases where you can ensure that you're only using the subset of characters that don't allow combining characters.