All the Perl that's Practical to Extract and Report

  • Reading your post, I had an "I wonder if..." thought and checked the bug report. Yup. Eleven months ago I found and reported the same bug, noting that bug 27 had the fix, and adding a pointer to the W3C recommendation. Sad that the (one line) fix hasn't been applied, since URI is part of the core distribution.
    • I looked at the fix to the URI module, and after about an hour stopped working on it. There are several problems with the one-character patch:

      * It only breaks URIs apart; it doesn't put them back together.

      * The parser needs to break on either a ; or a &, not both of them at the same time. Although there shouldn't be both, I'm painfully aware that "shouldn't be" means "is".

      * There is no way for the programmer to tell URI which delimiter to use. This is the rather troublesome part because it has implications
      • We needed to scan URLs, not generate them. So, I'm embarrassed to admit, I completely ignored the generation issue when figuring out a one-line patch and writing the bug report. Generating URLs is more complicated, because you'd need a way to specify whether you're going to emit them into HTML or XHTML. And the W3C recommendation isn't crystal clear on what the rules are. Oh yeah, and tests.

        Sigh.

        • Generating URLs is more complicated, because you'd need a way to specify whether you're going to emit them into HTML or XHTML.

          That appears to make no sense whatsoever.

          • More context. If you're generating a URL to go into HTML, you typically use & to separate parameters. For XHTML, if you're playing by the rules, you have the option of using & or ;.

            Surprised me, too, but it's in the W3C recommendation.
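
To make the two sides of the problem concrete, here is a minimal, language-neutral sketch in Python. It is not the URI module's code or the patch from the bug report: `split_query` and `join_query` are hypothetical helper names, and the idea of passing the separator as a `sep` argument is only one possible answer to "how does the programmer tell the library which delimiter to use".

```python
import re
from urllib.parse import quote_plus, unquote_plus


def split_query(query):
    """Split a query string on either ';' or '&' and decode each pair.

    Mixed input such as "a=1;b=2&c=3" is tolerated, because what
    "shouldn't be" in the wild often is.
    """
    pairs = []
    for field in re.split(r"[;&]", query):
        if not field:
            continue  # skip empty fields, e.g. from a trailing separator
        key, _, value = field.partition("=")
        pairs.append((unquote_plus(key), unquote_plus(value)))
    return pairs


def join_query(pairs, sep="&"):
    """Rebuild a query string using the separator the caller asks for."""
    if sep not in ("&", ";"):
        raise ValueError("separator must be '&' or ';'")
    # quote_plus percent-encodes ';' and '&' inside keys and values,
    # so the chosen separator stays unambiguous.
    return sep.join(f"{quote_plus(k)}={quote_plus(v)}" for k, v in pairs)


if __name__ == "__main__":
    parsed = split_query("a=1;b=2&c=x%3By")
    print(parsed)
    print(join_query(parsed, sep=";"))
```

Parsing can afford to be liberal because it never has to choose; generating is where the caller has to state an intent, which is why the one-character patch only covers half of the job.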

            • But that rule applies to any content you put in XHTML or HTML documents. The fact that it’s a URI is a red herring.

              Putting entity escaping into the URI processing code is bad distribution of responsibilities. It is the caller’s job to put the URI through entity escaping when the output necessitates it.

              • HTML doesn't require that the ampersand (when used to separate key/value pairs) be escaped in hrefs; XHTML does.

                See http://www.w3.org/TR/xhtml1/#C_12 [w3.org]

                • It should be escaped in HTML too. That it mostly works when you don't escape it is thanks to browsers that try to accept anything people throw at them and make sense of it.

                  But, like someone else said, the HTML escaping has nothing to do with the fact that it's a URL. Any attribute of an HTML tag ought to be escaped. It is an extra layer on top of the content, but it is not part of the content itself. For example, the content of the attribute bar in the tag <foo bar="a&amp;b"> is "a&b". At least, that is how I understand it.
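
The layering described here can be sketched as two separate steps, again in Python purely for illustration. The function names `build_url` and `link_tag` are hypothetical and `example.com` is a placeholder host; the point is only that the URL code knows nothing about HTML, and the escaping happens once, at the moment the value is written into an attribute.

```python
from html import escape, unescape
from urllib.parse import urlencode


def build_url(base, params):
    """URL layer: produce a plain URL with a raw '&' between parameters."""
    return base + "?" + urlencode(params)


def link_tag(url, text):
    """Markup layer: escape the value only when it goes into an attribute."""
    return f'<a href="{escape(url, quote=True)}">{escape(text)}</a>'


if __name__ == "__main__":
    # Placeholder host and parameters, chosen only for the example.
    url = build_url("http://example.com/search", [("q", "perl"), ("page", "2")])
    print(url)
    print(link_tag(url, "next page"))
    # A parser reading the attribute back undoes the escaping and
    # recovers exactly the URL that was passed in.
    print(unescape(escape(url, quote=True)) == url)
```

The same `link_tag` step would be applied to any attribute value, URL or not, which is the sense in which the URI is a red herring.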