Stories, comments, journals, and other submissions on use Perl; are Copyright 1998-2006, their respective owners.
This one is sad (Score:2)
Re:This one is sad (Score:2)
* It only breaks apart URIs, it doesn't put them back together
* The parser needs to break on either a ; or a &, not both of them at the same time. Although there shouldn't be both, I'm painfully aware that "shouldn't be" means "is".
* There is no way for the programmer to tell URI which delimiter to use. This is the rather troublesome part because it has implications across the interface.
* There are no tests to go along with the patch. Specifically, tests have to ensure that the semicolon splitting only works on the query string and not anything else that might be in the URL. I don't think this is a problem, but a patch isn't a patch until it does the whole task.
* You simply can't change the behavior of an often used module and expect it not to upset a lot of people. When CGI.pm went to the ; by default, it broke scripts for some of my clients. I expect that a change to URI might do the same.
It's not as simple as adding a single character to a file. There's actually a lot of work that needs to happen to ensure that things don't break, that other software won't break because of the change, that the interface stays consistent, and that the feature works the way it should.
So far, I haven't seen anyone who's stepped forward to do all of that.
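To make the requirements in the list above concrete, here is a minimal sketch in Python. It is an illustration only, not the URI module's code and not a proposed patch. The names split_query and join_query and the delimiters/delimiter parameters are hypothetical. It assumes the caller picks the delimiter, that splitting touches only the query string, and that empty fields (as in "a=1&;b=2") are skipped.

```python
import re
from urllib.parse import urlsplit, quote_plus, unquote_plus


def split_query(url, delimiters="&;"):
    """Split only the query part of url into (key, value) pairs.

    `delimiters` (hypothetical parameter) lists the characters that
    separate fields; any one of them ends a field.
    """
    query = urlsplit(url).query  # path and fragment are never touched
    if not query:
        return []
    pattern = "[" + re.escape(delimiters) + "]"
    pairs = []
    for field in re.split(pattern, query):
        if not field:
            continue  # tolerate doubled delimiters such as "a=1&;b=2"
        key, _, value = field.partition("=")
        pairs.append((unquote_plus(key), unquote_plus(value)))
    return pairs


def join_query(pairs, delimiter="&"):
    """Put (key, value) pairs back together with the chosen delimiter."""
    return delimiter.join(
        f"{quote_plus(k)}={quote_plus(v)}" for k, v in pairs
    )


url = "http://example.com/a;b/c?x=1;y=2&z=3#frag;more"
print(split_query(url))                  # semicolons in path/fragment ignored
print(split_query(url, delimiters="&"))  # old behaviour: split on & only
print(join_query(split_query(url), delimiter=";"))
```

Keeping the delimiter an explicit argument with "&" as the output default is one way to avoid changing behaviour under existing callers, which is the compatibility worry in the last bullet.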
Re: (Score:2)
We had need of scanning URLs, not generating them. So, I'm embarrassed to admit, I completely ignored the generation issue when figuring out a one-line patch and generating the bug report. Generating URLs is more complicated, because you'd need a way to specify whether you're going to emit them into HTML or XHTML. And the W3C recommendation isn't crystal clear on what the rules are. Oh yeah, and tests.
Sigh.
Re: (Score:1)
That appears to make no sense whatsoever.
Re: (Score:2)
More context. If you're generating a URL to go into HTML, you typically use & to separate parameters. For XHTML, if you're playing by the rules, you have the option of using &amp; or ;
Surprised me, too, but it's in the W3C recommendation.
Re: (Score:1)
But that rule applies to any content you put in XHTML or HTML documents. The fact that it’s a URI is a red herring.
Putting entity escaping into the URI processing code is bad distribution of responsibilities. It is the caller’s job to put the URI through entity escaping when the output necessitates it.
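A minimal sketch of that division of responsibilities, in Python for illustration (the helper names build_url and link are hypothetical): the URI code produces a plain URI with a bare &, and the code that writes the markup escapes every attribute value and text node the same way, whether or not it happens to be a URI.

```python
from html import escape
from urllib.parse import urlencode


def build_url(base, params):
    """URI layer: knows nothing about HTML, emits a bare '&'."""
    return base + "?" + urlencode(params)


def link(url, text):
    """Output layer: escapes whatever it is handed, URI or not."""
    return f'<a href="{escape(url, quote=True)}">{escape(text)}</a>'


url = build_url("http://example.com/search", {"a": "1", "b": "2"})
print(url)               # http://example.com/search?a=1&b=2
print(link(url, "R&D"))  # both the href and the text get &amp;
```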
Re: (Score:2)
HTML doesn't require that the ampersand (when used to separate key value pairs) be escaped in hrefs; XHTML does.
See http://www.w3.org/TR/xhtml1/#C_12
Re: (Score:2)
Arg. That was supposed to be a preview, and not a post. I'd intended to add that while ensuring correct escaping for XHTML is the programmer's responsibility, adding another layer of escaping into the pipeline after URI, rather than having a flavor of URI that knows how to escape for XHTML, seems to me as though it's putting the burden in the wrong place.
Re: (Score:1)
That makes the least sense yet. If the program is outputting URIs in HTML, it is outputting HTML, and so it has to deal with properly escaping content in other contexts anyway. What differentiates URIs from other content such that piercing the separation of concerns is sensible in their case?
Re: (Score:1)
The advantage of using semicolon is that a properly encoded URI is always valid HTML or XHTML and doesn't need to be escaped. The downside is that s
Re: (Score:2)
But, like someone else said, the HTML escaping has nothing to do with the fact that it's a URL. Any attribute of an HTML tag ought to be escaped. It is an extra layer on top of the content, but it is not part of the content itself. For example, the content of the attribute bar in the tag <foo bar="a&b"> is "a