All the Perl that's Practical to Extract and Report

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • The encodings handling needs some cleaning up. Fortunately it doesn’t appear to be broken so badly as to be hard to fix.

    My “Expressiveness matters” [perl.org] post gets its curly quotes encoded as &#147; and &#148;, respectively, which are undefined in the ISO-8859-1 charset the pages claim to be encoded in. They are only defined in Windows Codepage 1252. It still works because browsers have generally given up and just treat the two as equal (which is doable because Win1252 is a true superset of Latin1),

    • by pudge (1) on 2005.04.14 14:17 (#39690)
      Not only are you not forgiven for claiming to be Latin1 when you are Win1252 there

      And you are not forgiven for *using* Win1252 in the first place. I am not sure it is correct for me to try to fix your mistake and guess at what character you intended. How can I know you meant those to be curly quotes, and not something else? Sure, those are undefined in Latin-1, but how do I know what charset you are using, if you're not using Latin-1?
      • Ugh! You are correct. The problem is precisely the aforementioned fact that browsers treat Latin1 as Win1252: the form is Latin1, so when I paste curly quotes, my browser throws its arms up and sends Win1252, instead of telling me. Gahhhh.

        Can we please have UTF-8 as soon as manageably possible? :-(

        • In Slash right now, we have special casing for high-bit chars, for sites that want plain ASCII. What I can probably do is add to that, for sites like useperl that are more open, special-casing those few chars from 128-159. It should catch most cases, like this one. It sucks, but ... so does the web. :-)

          As to UTF, we tried it once and it messed us up in various ways, largely due to browser support, so I am not eager to try again any time soon. I think this is the best way for now, converting everything
        • I implemented the special-casing for those few non-Latin-1 chars that browsers like to send. Your journal entry title now has the proper encoding.
          • It had it before the fix as well; after our exchange, I went and fixed the entities manually. If you want, I can change the entities back and see what happens.
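
The special-casing discussed in this thread (bytes 128-159 arriving in a form declared as Latin-1) can be sketched roughly as follows. This is only an illustration in Python, not Slash's actual code (Slash is written in Perl); the function name `fix_c1_range` and the choice to emit numeric character references are assumptions made for the example, as is the use of U+FFFD for the five bytes that Windows-1252 leaves unassigned.

```python
# Hypothetical sketch: treat bytes 128-159 in "Latin-1" form input as
# Windows-1252 and emit them as numeric character references.
# Not Slash's real implementation; names and policy are assumptions.

REPLACEMENT = "&#65533;"  # U+FFFD, used here for unassigned bytes (assumption)


def fix_c1_range(raw: bytes) -> str:
    """Decode input declared as ISO-8859-1, special-casing bytes 0x80-0x9F."""
    out = []
    for b in raw:
        if 0x80 <= b <= 0x9F:
            try:
                # Browsers commonly send Windows-1252 here, so decode as such.
                ch = bytes([b]).decode("cp1252")
            except UnicodeDecodeError:
                # 0x81, 0x8D, 0x8F, 0x90 and 0x9D have no Windows-1252 mapping.
                out.append(REPLACEMENT)
                continue
            # Emit the real Unicode code point, which is valid on any page.
            out.append(f"&#{ord(ch)};")
        else:
            # Everything else is genuine Latin-1 (or ASCII) and passes through.
            out.append(bytes([b]).decode("latin-1"))
    return "".join(out)


if __name__ == "__main__":
    title = b"\x93Expressiveness matters\x94"
    # The same byte means different things in the two encodings:
    print(repr(b"\x93".decode("latin-1")))  # a C1 control character
    print(repr(b"\x93".decode("cp1252")))   # a left curly quote
    print(fix_c1_range(title))
```

Under these assumptions, the curly quotes come out as `&#8220;` and `&#8221;`, which render correctly regardless of whether the page is served as Latin-1. Whether guessing Windows-1252 is the right call is exactly the concern raised above: the input never says which charset it actually used.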