All the Perl that's Practical to Extract and Report
Stories, comments, journals, and other submissions on use Perl; are Copyright 1998-2006, their respective owners.
Unicode (Score:1)
Actually they were standard Unicode characters. And I wasn't actually trying to find edge cases; I was just aiming for nice typography and stumbled upon the bug by accident!
For the record, I'd like it to be known I wasn't anywhere near Windows! I was actually using Ubuntu Linux running Gnome. Keyboard preferences lets you define a 'compose' key (I chose Caps Lock, cos that isn't something I ever use), then you can type sequences like Compose --- to get an em dash, or Compose "< to get opening curly quotes; the sequences are reasonably mnemonic.
And those are legitimate Unicode characters. Latin-1 doesn't have them, but then Latin-1 is only an 8-bit encoding so doesn't have most characters. Windows CP1252 caused problems by being kind-of like Latin-1, but with additional characters filling in slots Latin-1 left unused for printable characters (bytes 0x80 to 0x9F); CP1252 text often got mislabelled as Latin-1, messing things up for non-Windows users.
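To make the mislabelling concrete, here is a rough Python sketch (not anything from the site's own code; the helper name encodable and the sample string are made up for illustration). It assumes Python's built-in latin-1 and cp1252 codecs and shows that the em dash and opening curly quote fit in CP1252 but not Latin-1, and what happens when the CP1252 bytes are read as Latin-1.

```python
# Characters from the comment: em dash and opening curly double quote.
em_dash = "\u2014"
open_quote = "\u201c"


def encodable(text, encoding):
    """Hypothetical helper: True if every character fits the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Latin-1 has no slot for either character; CP1252 does.
latin1_ok = encodable(em_dash + open_quote, "latin-1")
cp1252_ok = encodable(em_dash + open_quote, "cp1252")

# CP1252 stores them in the 0x80-0x9F range (0x93 and 0x97 here).
cp1252_bytes = (open_quote + "hi" + em_dash).encode("cp1252")

# Mislabelled as Latin-1, the same bytes decode to C1 control characters
# (U+0093 and U+0097) instead of the quote and the dash.
mislabelled = cp1252_bytes.decode("latin-1")

# Labelled correctly, the original text comes back.
correct = cp1252_bytes.decode("cp1252")

print(latin1_ok, cp1252_ok, cp1252_bytes, repr(mislabelled), correct)
```

The decode as Latin-1 never fails, since every byte is valid Latin-1, which is presumably why the mislabelling went unnoticed so often: the text just comes out with invisible or junk characters where the typography should be.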
But all the CP1252 characters are in Unicode, and today you're much better off using the Unicode UTF-8 encoding than either Latin-1 or CP1252, especially on the web.
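As a quick check of that claim, the sketch below (again just an illustration, with the function name cp1252_extras invented here) walks the 0x80 to 0x9F range, decodes each byte with Python's cp1252 codec, and round-trips the result through UTF-8. One assumption worth flagging: Python's codec treats a handful of bytes in that range as unassigned and raises an error for them, so the sketch skips those.

```python
def cp1252_extras():
    """Map each assigned CP1252 byte in 0x80-0x9F to its Unicode character."""
    extras = {}
    for byte in range(0x80, 0xA0):
        try:
            extras[byte] = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            # Unassigned slot in Python's cp1252 table; nothing to map.
            continue
    return extras


extras = cp1252_extras()

# Every extra character is an ordinary Unicode code point, so UTF-8
# can carry all of them and give back exactly the same text.
text = "".join(extras.values())
utf8_bytes = text.encode("utf-8")
round_tripped = utf8_bytes.decode("utf-8")

for byte, char in sorted(extras.items()):
    print(f"0x{byte:02X} -> U+{ord(char):04X} {char}")
```

In UTF-8 these characters take two or three bytes each rather than one, which is the price of an encoding that covers everything rather than one vendor's choice of 256 slots.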
(And apologies for the delayed response; feed backlog built up while away.)