Stories, comments, journals, and other submissions on use Perl; are Copyright 1998-2006, their respective owners.
This one is sad (Score:2)
Re: (Score:2)
* It only breaks apart URIs, it doesn't put them back together
* The parser needs to break on either a ; or a &, not both of them at the same time. Although there shouldn't be both, I'm painfully aware that "shouldn't be" means "is".
* There is no way for the programmer to tell URI which delimiter to use. This is the rather troublesome part because it has implications
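To make the second and third points concrete, here is a minimal sketch of a query parser that breaks on either delimiter by default but lets the caller pick exactly one. It is written in Python only for illustration; `parse_query` and its `delimiter` parameter are hypothetical names, not part of the URI module or any real library.

```python
import re
from urllib.parse import unquote_plus

def parse_query(query, delimiter=None):
    """Split a query string into (key, value) pairs.

    delimiter: "&", ";", or None to break on either one (hypothetical option).
    """
    if delimiter is None:
        parts = re.split(r"[&;]", query)   # accept both, since "shouldn't be" means "is"
    else:
        parts = query.split(delimiter)     # caller says which delimiter is in use
    pairs = []
    for part in parts:
        if not part:
            continue                       # skip empty segments, e.g. a trailing delimiter
        key, _, value = part.partition("=")
        pairs.append((unquote_plus(key), unquote_plus(value)))
    return pairs

print(parse_query("a=1;b=2&c=3"))
# [('a', '1'), ('b', '2'), ('c', '3')]
print(parse_query("a=1;b=2&c=3", delimiter="&"))
# [('a', '1;b=2'), ('c', '3')]
```

The second call shows why the choice matters: with a fixed delimiter, the other character is treated as ordinary data inside a value.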
Re: (Score:2)
We needed to scan URLs, not generate them. So, I'm embarrassed to admit, I completely ignored the generation issue when figuring out a one-line patch and writing the bug report. Generating URLs is more complicated, because you'd need a way to specify whether you're going to emit them into HTML or XHTML. And the W3C recommendation isn't crystal clear on what the rules are. Oh yeah, and tests.
Sigh.
Re: (Score:1)
That appears to make no sense whatsoever.
Re:This one is sad (Score:2)
More context. If you're generating a URL to go into HTML, you typically use & to separate parameters. For XHTML, if you're playing by the rules, you have the option of using &amp; or ;.
Surprised me, too, but it's in the W3C recommendation.
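As a rough illustration of the two options, the sketch below builds the same query string with each separator and checks what happens when the result is entity-escaped for markup. It is Python, not Perl, and `build_query` is a hypothetical helper, not an existing API.

```python
from html import escape
from urllib.parse import quote_plus

def build_query(params, separator="&"):
    """Join (key, value) pairs with the chosen separator (hypothetical helper)."""
    return separator.join(
        f"{quote_plus(str(key))}={quote_plus(str(value))}" for key, value in params
    )

params = [("q", "perl uri"), ("page", 2)]
amp = build_query(params)         # q=perl+uri&page=2
semi = build_query(params, ";")   # q=perl+uri;page=2

print(escape(amp))    # q=perl+uri&amp;page=2  (changed by escaping)
print(escape(semi))   # q=perl+uri;page=2      (unchanged by escaping)
```

Because `quote_plus` percent-encodes any `;` or `&` that appears inside a key or value, the separator is the only place either character shows up unencoded.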
Re: (Score:1)
But that rule applies to any content you put in XHTML or HTML documents. The fact that it’s a URI is a red herring.
Putting entity escaping into the URI processing code is bad distribution of responsibilities. It is the caller’s job to put the URI through entity escaping when the output necessitates it.
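A minimal sketch of that division of responsibilities, assuming a caller that writes HTML by hand: the URI is built with no knowledge of markup, and the output layer escapes every value it emits, URI or not. This is Python for illustration only; `render_link` and the example.com address are made up.

```python
from html import escape
from urllib.parse import urlencode

def render_link(href, text):
    """HTML layer: escape every value on output, whether or not it is a URI."""
    return f'<a href="{escape(href, quote=True)}">{escape(text)}</a>'

# URI layer: knows nothing about HTML or XHTML.
uri = "http://example.com/search?" + urlencode({"q": "a b", "lang": "en"})
print(uri)
# http://example.com/search?q=a+b&lang=en

# Output layer: the same escaping is applied to the link text and the href.
print(render_link(uri, "Q&A"))
# <a href="http://example.com/search?q=a+b&amp;lang=en">Q&amp;A</a>
```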
Re: (Score:2)
HTML doesn't require that the ampersand (when used to separate key/value pairs) be escaped in hrefs; XHTML does.
See http://www.w3.org/TR/xhtml1/#C_12 [w3.org]
Re: (Score:2)
Arg. That was supposed to be a preview, not a post. I'd intended to add that while ensuring correct escaping for XHTML is the programmer's responsibility, adding another layer of escaping into the pipeline after URI, rather than having a flavor of URI that knows how to escape for XHTML, seems to me as though it's putting the burden in the wrong place.
Re: (Score:1)
That makes the least sense yet. If the program is outputting URIs in HTML, it is outputting HTML, and so it has to deal with properly escaping content in other contexts anyway. What differentiates URIs from other content such that piercing the separation of concerns is sensible in their case?
Re: (Score:1)
The advantage of using semicolon is that a properly encoded URI is always valid HTML or XHTML and doesn't need to be escaped. The downside is that s
Re: (Score:2)
But, like someone else said, the HTML escaping has nothing to do with the fact that it's a URL. Any attribute of an HTML tag ought to be escaped. It is an extra layer on top of the content, but it is not part of the content itself. For example, the content of the attribute bar in the tag <foo bar="a&b"> is "a