All the Perl that's Practical to Extract and Report

  • I've been surprised at how consistently my name turns up on official documents here in France. The spelling Rafaël is completely abnormal, of course, and no one ever spells it correctly (even I can't be bothered to spell it correctly most of the time), but on my passport it's right.

    I remember when I registered François last year, I got asked quite precise questions about the spelling: a dash or no dash between Garcia and Suarez? They care about that kind of stuff.

    • Bah, apparently the use.perl comment boxes are not friends with my browser :/

      Those were, in order:

      00EB LATIN SMALL LETTER E WITH DIAERESIS

      00E7 LATIN SMALL LETTER C WITH CEDILLA

      • To spell Rafaël and François properly you need to entity-encode the, uh, extravagant characters: use.perl is a Latin-1 Only Zone. Quelle bêtise…

        • Hehe, I would say ASCII-only. Rafael's accents perfectly fit in the latin-1 charset.

          • I'm suspecting browser character set headers on the form submission, because I can paste a literal ć in no problem. It looks like his browser sent UTF-8, but either described it as ISO-8859-1, or didn't say, resulting in the far end treating it as ISO-8859-1.
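A minimal Python sketch of the failure mode suspected here, assuming the browser really did send UTF-8 bytes and the server read them as ISO-8859-1. The function name `mislabel_utf8_as_latin1` is hypothetical and only illustrates the effect; it is not anything use.perl itself runs.

```python
def mislabel_utf8_as_latin1(text: str) -> str:
    """Encode as UTF-8, then decode the same bytes as ISO-8859-1.

    This mimics a server that treats UTF-8 form data as Latin-1.
    """
    return text.encode("utf-8").decode("iso-8859-1")


# Each accented letter becomes two Latin-1 characters.
print(mislabel_utf8_as_latin1("Rafaël"))    # RafaÃ«l
print(mislabel_utf8_as_latin1("François"))  # FranÃ§ois
```

Every byte value is valid in ISO-8859-1, so the decode step never raises an error, which is why this kind of mislabelling goes unnoticed until someone reads the page.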

            Ho ho ho. When that ć comes back to me on preview, the HTML source has turned into &#263;.

            Which reminds me. Currently, does pod2text use man as an intermediate step when generating its output?

            • by srezic (8057) on 2009.07.31 14:44 (#69832)

              The initial problem is that the use.perl.org pages declare iso-8859-1
              as their charset, so form data also has to be sent as iso-8859-1. Maybe
              a browser shouldn't accept any non-latin1 characters when entering or
              pasting data into form fields, but at least gecko-based browsers
              don't do this. To handle non-latin1 characters,
              gecko-based browsers on Unix systems seem to use this heuristic:

              * codepoints below 256 are fine

              * codepoints that win1252 maps into its 0x80-0x9f range are sent as
                  those win1252 bytes (try LATIN CAPITAL LETTER S WITH CARON for a test)

              * every other codepoint is sent as a numerical HTML entity
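The three rules above can be sketched in Python. This is an illustration of the observed heuristic only, not Gecko's actual code: `encode_form_value` is a hypothetical name, and Python's `cp1252` codec is assumed to be a close enough stand-in for win1252.

```python
def encode_form_value(text: str) -> bytes:
    """Encode form text for a page that declares iso-8859-1 (sketch)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 256:
            # Rule 1: codepoints below 256 go out as a single byte.
            out.append(cp)
            continue
        try:
            # Rule 2: characters win1252 places in 0x80-0x9f are sent
            # as that win1252 byte.
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Rule 3: everything else becomes a numeric HTML entity.
            out += f"&#{cp};".encode("ascii")
    return bytes(out)


print(encode_form_value("Rafaël"))  # b'Rafa\xebl'
print(encode_form_value("Š"))       # b'\x8a'  (LATIN CAPITAL LETTER S WITH CARON)
print(encode_form_value("ć"))       # b'&#263;'
```

The last case matches what the parent comment saw on preview: a character outside both latin1 and win1252 arrives at the server as a numeric entity.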

              About pod2text: no, *pod2text* does not use man, but *perldoc* uses
              pod2man by default. The plan was to fix pod2text encoding issues (there
              are still some, but they are fixable, in contrast to pod2man) and then
              to use something like Pod::Text::Overstrike or Pod::Text::Termcap
              instead of Pod::Man.

              I have just created and uploaded
              Pod-Perldoc-ToTextTermcap-0.00_50.tar.gz to CPAN. Just install it and
              set

                  export PERLDOC=-MPod::Perldoc::ToTextTermcap

              or

                  export PERLDOC=-MPod::Perldoc::ToTextOverstrike

              and perldoc will use the new renderer. The output looks somewhat
              different from man output, but at least bold and underline are
              rendered (unlike with stock Pod::Perldoc::ToText).