
All the Perl that's Practical to Extract and Report

  • Actually they were standard Unicode characters. And I wasn't actually trying to find edge cases; I was just aiming for nice typography and stumbled upon the bug by accident!

    For the record, I'd like it to be known I wasn't anywhere near Windows! I was actually using Ubuntu Linux running Gnome. The keyboard preferences let you define a 'compose' key (I chose Caps Lock, cos that isn't something I ever use); then you can type sequences like Compose --- to get an em dash, or Compose "< to get opening curly quotes. The sequences are reasonably mnemonic.

    And those are legitimate Unicode characters. Latin-1 doesn't have them, but then Latin-1 is only an 8-bit encoding, so it has room for just 256 code points and lacks most characters. Windows CP1252 caused problems by being kind-of like Latin-1, but with additional printable characters (curly quotes and dashes among them) in the 0x80-0x9F range, where Latin-1 has no printable characters; CP1252 text often got mislabelled as Latin-1, messing things up for non-Windows users.
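
    To make the mislabelling concrete, here is a minimal Python sketch (the sample string and variable names are mine, purely for illustration). It encodes curly quotes and an em dash as CP1252, then decodes the same bytes as if they were Latin-1:

```python
# Sample text: curly quotes and an em dash, written as escapes.
text = "\u201cquoted\u201d \u2014 dash"

# CP1252 has slots for these characters: 0x93, 0x94 and 0x97.
cp1252_bytes = text.encode("cp1252")

# Decoding the same bytes as Latin-1 never fails, but the three
# bytes come out as invisible C1 control characters instead.
mislabelled = cp1252_bytes.decode("latin-1")

# Latin-1 itself cannot represent the original text at all.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(cp1252_bytes)
print(repr(mislabelled))
print(latin1_ok)
```

    The decode step succeeds silently, which is presumably why the mislabelling went unnoticed so often: nothing errors, the punctuation just vanishes or shows up as junk.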

    But all the CP1252 characters are in Unicode, and today you're much better off using the Unicode UTF-8 encoding than either Latin-1 or CP1252, especially on the web.
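
    As a rough check of that claim, the following Python sketch decodes every byte value that Python's own cp1252 codec defines and round-trips the result through UTF-8. The set of five undefined bytes reflects that codec's table and is an assumption about the codec, not something taken from the discussion above:

```python
# Bytes that Python's cp1252 codec leaves undefined (assumption:
# this matches the codec table shipped with CPython).
UNDEFINED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}

defined = bytes(b for b in range(256) if b not in UNDEFINED)

# Every defined CP1252 byte maps to a Unicode character...
chars = defined.decode("cp1252")

# ...and every one of those characters survives a UTF-8 round trip.
utf8 = chars.encode("utf-8")
round_trip_ok = utf8.decode("utf-8") == chars

# Example: the em dash is one byte in CP1252, three in UTF-8.
em_dash = bytes([0x97]).decode("cp1252")
em_dash_utf8 = em_dash.encode("utf-8")

print(len(chars), round_trip_ok, em_dash_utf8)
```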

    (And apologies for the delayed response; my feed backlog built up while I was away.)