Stories
Slash Boxes
Comments
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

rhesa (5696)

rhesa
  (email not shown publicly)

Journal of rhesa (5696)

Friday July 29, 2005
07:38 AM

HTTP spec problem for lwp scripts

[ #25952 ]

I have a script that fetches an rss feed from blogs.com. A couple of days ago it broke. Reason: server is sending gzip encoded content. I felt that was mightly impolite, but it turns out the HTTP spec is on their side:

If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding. In this case, if "identity" is one of the available content-codings, then the server SHOULD use the "identity" content-coding, unless it has additional information that a different content-coding is meaningful to the client.

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html , Section 14.3

Since LWP by default does not send any Accept-Encoding request header, this means sending gzip'ed content is perfectly acceptable. But LWP doesn't handle that, and simply gives us the undecoded content.

It would seem - to me at least - that the quickest fix to LWP would be to include a 'Accept-Encoding: identity' in the request headers by default:

If an Accept-Encoding field is present in a request, and if the server cannot send a response which is acceptable according to the Accept-Encoding header, then the server SHOULD send an error response with the 406 (Not Acceptable) status code.

ibid.
(Although I'd have to say that a 406 response would be pretty useless.)

Unfortunately, with this particular server, that does not do the trick:

[user@host]% GET -H 'Accept-Encoding: identity' -U -e -d http://$blog_id.blogs.com/index.rdf
GET http://$blog_id.blogs.com/index.rdf
Accept-Encoding: identity
User-Agent: lwp-request/2.06

Connection: close
Date: Thu, 28 Jul 2005 18:46:27 GMT
Accept-Ranges: bytes
Age: 63019
ETag: "1cc057-75c4-427c6400"
Server: Apache
Content-Encoding: gzip
Content-Length: 8956
Content-Type: application/rdf+xml
Last-Modified: Wed, 20 Jul 2005 14:00:48 GMT
Client-Date: Fri, 29 Jul 2005 12:16:46 GMT
Client-Peer: 216.129.107.21:80
Client-Response-Num: 1
X-Cache: HIT from www.sixapart.com

If you don't have a plain text version, then where's the 406, guys?

Which means that it would be nice if LWP could decode the content when it sees the Content-Encoding: gzip header, and if a suitable Compress module is available.

I'd like to point out that - in my opinion - the content-encoding is a transport detail, and should not have to matter to the user. In other words, I would appreciate LWP to handle that for me transparently.

Maybe I should send in a patch....

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More | Login | Reply
Loading... please wait.