
All the Perl that's Practical to Extract and Report

  • The encoding handling needs some cleaning up. Fortunately it doesn’t appear to be broken so badly as to be hard to fix.

    My “Expressiveness matters” post gets its curly quotes encoded as “ and ”, respectively, which are undefined in the ISO-8859-1 charset the pages claim to be encoded in. They are only defined in Windows Codepage 1252. It still works because browsers have generally given up and just treat the two as equal (which is doable because Win1252 is a true superset of Latin1).
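    To make the mismatch concrete, here is a small Python sketch. It assumes the curly quotes arrive as the bytes 0x93 and 0x94 (their Windows-1252 positions); the sample string is only an illustration. Python's latin-1 codec maps those bytes to C1 control characters, not to anything printable, while the cp1252 codec yields the quotes.

```python
import unicodedata

# Assumption: the curly quotes arrive as bytes 0x93 and 0x94,
# which is where Windows-1252 puts them.
raw = b"\x93Expressiveness matters\x94"

as_cp1252 = raw.decode("cp1252")   # what the poster meant
as_latin1 = raw.decode("latin-1")  # what the page claims to be

print(repr(as_cp1252))  # starts with U+201C, ends with U+201D
print(repr(as_latin1))  # starts with U+0093, a C1 control character

# "Cc" is the Unicode category for control characters.
first_category = unicodedata.category(as_latin1[0])
```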

    • Not only are you not forgiven for claiming to be Latin1 when you are Win1252 there

      And you are not forgiven for *using* Win1252 in the first place. I am not sure it is correct for me to try to fix your mistake and guess at what character you intended. How can I know you meant those to be curly quotes, and not something else? Sure, those are undefined in Latin-1, but how do I know what charset you are using, if you're not using Latin-1?
      • Ugh! You are correct. The problem is precisely the aforementioned fact that browsers treat Latin1 as Win1252: the form is Latin1, so when I paste curly quotes, my browser throws its arms up and sends Win1252, instead of telling me. Gahhhh.

        Can we please have UTF-8 as soon as manageably possible? :-(

        • by pudge (1) on 2005.04.14 14:35 (#39693) Homepage Journal
          In Slash right now, we have special casing for high-bit chars, for sites that want plain ASCII. What I can probably do is add to that, for sites like useperl that are more open, special-casing those few chars from 128-159. It should catch most cases, like this one. It sucks, but ... so does the web. :-)

          As to UTF, we tried it once and it messed us up in various ways, largely due to browser support, so I am not eager to try again any time soon. I think this is the best way for now, converting everything to an entity. That assumes the browser sends us good data, which is an unfortunate assumption, which maybe for this one set of cases I can try to handle separately.
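The approach pudge describes (turn high-bit characters into entities, with a special case for the 128-159 range) might look roughly like the Python sketch below. This is not Slash's code: the function name `to_entities` is hypothetical, and it assumes the form data was decoded as Latin-1 while the browser actually sent Windows-1252.

```python
def to_entities(text: str) -> str:
    """Return ASCII-only text with high-bit characters as numeric entities.

    Hypothetical helper, not Slash's implementation. Assumes `text` was
    decoded as Latin-1 although the browser really sent Windows-1252.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 128 <= cp <= 159:
            # Special case: reinterpret the byte as Windows-1252.
            try:
                cp = ord(bytes([cp]).decode("cp1252"))
            except UnicodeDecodeError:
                # A few bytes (such as 0x81) are unassigned in cp1252;
                # this sketch keeps the original code point for those.
                pass
        if cp < 128:
            out.append(chr(cp))
        else:
            out.append(f"&#{cp};")
    return "".join(out)


# Curly quotes sent as 0x93/0x94, plus an ordinary Latin-1 e-acute.
submitted = b"\x93hi\x94 caf\xe9".decode("latin-1")
print(to_entities(submitted))  # &#8220;hi&#8221; caf&#233;
```

The sketch leaves `&`, `<` and `>` alone; escaping markup is a separate step. As the comment says, it also still depends on guessing what the browser meant by those bytes.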