Stories, comments, journals, and other submissions on use Perl; are Copyright 1998-2006, their respective owners.
Accents (Score:2)
I've been surprised at how consistently my name turns up on official documents here in France. The spelling Rafaël is completely abnormal, of course, and no one ever spells it correctly (even I can't be bothered to spell it correctly most of the time), but on my passport it's right.
I remember when I registered François last year, I got asked quite precise questions about the spelling: a dash or no dash between Garcia and Suarez? They care about that kind of stuff.
Re: (Score:2)
Bah, apparently the use.perl comment boxes are not friends with my browser :/
Those were, in order:
00EB LATIN SMALL LETTER E WITH DIAERESIS
00E7 LATIN SMALL LETTER C WITH CEDILLA
Re: (Score:1)
To spell Rafaël and François properly you need to entity-encode the, uh, extravagant characters: use.perl is a Latin-1 Only Zone. Quelle bêtise…
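For what it's worth, here is a minimal Python sketch of that entity-encoding. The helper name `entity_encode` is made up for illustration, and targeting ASCII by default is an assumption, not a statement of what use.perl actually accepts:

```python
def entity_encode(text: str, charset: str = "ascii") -> str:
    """Replace every character that `charset` cannot represent
    with a decimal numeric character reference (&#NNN;)."""
    # "xmlcharrefreplace" is a standard Python codec error handler.
    return text.encode(charset, errors="xmlcharrefreplace").decode(charset)


print(entity_encode("Rafaël"))    # Rafa&#235;l
print(entity_encode("François"))  # Fran&#231;ois
```

Passing `"latin-1"` as the charset leaves ë and ç alone and only rewrites characters outside Latin-1.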
Re: (Score:1)
Hehe, I would say ASCII-only. Rafael's accents fit perfectly in the Latin-1 charset.
Re:Accents (Score:2)
I suspect the browser's character set headers on the form submission, because I can paste in a literal ć with no problem. It looks like his browser sent UTF-8 but either described it as ISO-8859-1 or didn't say, so the far end treated it as ISO-8859-1.
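That suspected mix-up is easy to reproduce. A minimal Python sketch, assuming the browser really did send UTF-8 bytes that the server then read as ISO-8859-1 (the function names are made up for illustration):

```python
def misread_as_latin1(text: str) -> str:
    """Encode as UTF-8, then decode the same bytes as ISO-8859-1,
    which is the suspected mistake on the receiving end."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(garbled: str) -> str:
    """Undo the mistake: recover the original bytes, decode them as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


print(misread_as_latin1("Rafaël"))            # RafaÃ«l
print(repair(misread_as_latin1("François")))  # François
```

Each accented letter takes two bytes in UTF-8, so it shows up as two Latin-1 characters after the misreading.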
Ho ho ho. When that ć comes back to me on preview, the HTML source has turned into ć.
Which reminds me. Currently, does pod2text use man as an intermediate step when generating its output?
Re: (Score:1)
The initial problem is that the use.perl.org pages declare iso-8859-1
as their charset. So form data also has to be sent as iso-8859-1. Maybe
a browser shouldn't accept any non-latin1 characters when entering or
pasting data into form fields, but at least gecko-based browsers
don't do this. To do something with non-latin1 characters,
gecko-based browsers on Unix systems seem to use this heuristic:
* codepoints below 256 are fine
* if there are codepoints in the 0x80-0x9f range of win1252, then they