All the Perl that's Practical to Extract and Report

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • I've yet to have a bad experience with the entity conversion issue, and we deliver a web interface to a couple of million users in different languages, using AxKit.

    Where are all these bad browsers?
    • by ziggy (25) on 2004.08.19 8:06 (#33494)
      Yes, you're right. The browser is where the problem manifests itself, not the cause of the problem.

      It's been a while since I looked at AxKit, but I think the (XML) output that is serialized to the browser is properly re-escaped. This is the correct behavior according to the XML character model.

      For some reason, this system goes to extreme measures to do as little work as possible in each transaction. That includes passing around raw XML text instead of higher level data structures, and making as few modifications to that buffer as technically possible.

      I'm not going to comment as to whether this "performance optimization" is good or bad, useful or not, or even whether I agree with it. But that's the way this system is designed. And that's why it was ripe for this integration bug once a real version of expat was loaded.

      • I think libxslt's HTML renderer just sends raw UTF-8 to the output - it doesn't go to any effort to re-encode it. I'm still not seeing any problems though - maybe things have changed in browser terms, but opinions of what you have to do haven't?
        • There's probably a charset issue at the root of it all. I saw the problem again the other day -- an entity that came in as &ndash; got parsed into a Unicode character and needed to be output as either &ndash; or a numeric character reference such as &#8211; to render properly. When it came out as a raw Unicode character, it was unrenderable (the obligatory question mark instead of an ndash). Most likely a Unicode multibyte sequence coming out as ISO-8859-1 or even ASCII.
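
To make the suspected failure concrete, here is a minimal C sketch of how a single code point becomes a multibyte UTF-8 sequence. It is an illustration only, not code from the system under discussion, and the function name utf8_encode is hypothetical. For the en dash (U+2013) it produces the three bytes E2 80 93, none of which is an ASCII character, so a consumer that assumes ISO-8859-1 or ASCII cannot render them as a dash.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * out must have room for at least 4 bytes.
 * Returns the number of bytes written, or 0 if cp cannot be encoded
 * (a surrogate, or a value above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {                       /* plain ASCII: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* two-byte sequence */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* three-byte sequence */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* four-byte sequence */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling utf8_encode(0x2013, buf) fills buf with 0xE2, 0x80, 0x93 and returns 3.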

          The tool chain is old enough that there are likely lots of XM

          • Just a thought, but could you use Encode::encode('us-ascii',$xml,Encode::FB_XMLCREF)?

            -Dom

            • I could, except this project is a mixture of Tcl and C. [*]
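
For a C code base, the same idea as the Encode suggestion above (replace every non-ASCII character with a decimal numeric character reference) could be sketched roughly as follows. This is a sketch under assumptions, not the project's code: the names utf8_decode and escape_non_ascii are hypothetical, the input is assumed to be UTF-8 text whose markup characters are already escaped, and malformed input is simply rejected.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s,
 * with n bytes available. Stores the code point in *cp and returns
 * the sequence length, or 0 if the bytes are not well-formed UTF-8. */
static size_t utf8_decode(const unsigned char *s, size_t n, unsigned long *cp)
{
    size_t len, i;
    unsigned long min;

    if (n == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; *cp = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; *cp = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; *cp = s[0] & 0x07; min = 0x10000; }
    else return 0;                         /* stray continuation or invalid lead byte */

    if (n < len) return 0;                 /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates, and out-of-range values. */
    if (*cp < min || *cp > 0x10FFFF || (*cp >= 0xD800 && *cp <= 0xDFFF))
        return 0;
    return len;
}

/* Hypothetical helper: copy in to out, replacing each non-ASCII
 * character with a decimal reference of the form &#N;.
 * Returns the number of characters written (excluding the NUL),
 * or -1 on malformed input or if out is too small. */
long escape_non_ascii(const char *in, size_t in_len, char *out, size_t out_cap)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t i = 0, o = 0;

    while (i < in_len) {
        unsigned long cp;
        size_t len = utf8_decode(s + i, in_len - i, &cp);
        if (len == 0) return -1;

        if (cp < 0x80) {
            if (o + 1 >= out_cap) return -1;   /* keep room for the NUL */
            out[o++] = (char)cp;
        } else {
            int w = snprintf(out + o, out_cap - o, "&#%lu;", cp);
            if (w < 0 || (size_t)w >= out_cap - o) return -1;
            o += (size_t)w;
        }
        i += len;
    }
    if (o >= out_cap) return -1;
    out[o] = '\0';
    return (long)o;
}
```

Given the input "a \xE2\x80\x93 b" (an en dash between two letters), escape_non_ascii writes "a &#8211; b", which is pure ASCII and therefore safe regardless of the charset the page is served with.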

              The problem came about because the standard version of expat was munged, and linking in a new module with a clean expat broke one or the other of the dependencies. And expat was munged in the first place explicitly to avoid re-escaping previously escaped entities on output once they had been parsed into unicode characters. (A performance optimization to do as little work as possible, and work at the lowest layer possible, to enable high throu