All the Perl that's Practical to Extract and Report

use Perl

Journal of rhesa (5696)

Friday July 29, 2005
07:38 AM

HTTP spec problem for LWP scripts

[ #25952 ]

I have a script that fetches an RSS feed from blogs.com. A couple of days ago it broke. The reason: the server is now sending gzip-encoded content. I felt that was mighty impolite, but it turns out the HTTP spec is on their side:

If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding. In this case, if "identity" is one of the available content-codings, then the server SHOULD use the "identity" content-coding, unless it has additional information that a different content-coding is meaningful to the client.

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html , Section 14.3

Since LWP does not send an Accept-Encoding request header by default, sending gzip'ed content is perfectly acceptable. But LWP doesn't handle that, and simply hands us the undecoded content.

It would seem - to me at least - that the quickest fix to LWP would be to include an 'Accept-Encoding: identity' header in requests by default:

If an Accept-Encoding field is present in a request, and if the server cannot send a response which is acceptable according to the Accept-Encoding header, then the server SHOULD send an error response with the 406 (Not Acceptable) status code.

ibid.
(Although I'd have to say that a 406 response would be pretty useless.)
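
For a single script, the header can also be set by hand without waiting for a change in LWP. A minimal sketch, assuming a hypothetical helper name identity_request and a placeholder feed URL (neither comes from the original script):

```perl
use strict;
use warnings;
use HTTP::Request;

# Hypothetical helper: build a GET request that explicitly asks
# for unencoded ("identity") content.
sub identity_request {
    my ($url) = @_;
    my $req = HTTP::Request->new( GET => $url );
    $req->header( 'Accept-Encoding' => 'identity' );
    return $req;
}

# Placeholder URL, for illustration only.
my $req = identity_request('http://example.com/index.rdf');
print $req->as_string;
```

The resulting request can be handed to LWP::UserAgent's request method as usual.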

Unfortunately, with this particular server, that does not do the trick:

[user@host]% GET -H 'Accept-Encoding: identity' -U -e -d http://$blog_id.blogs.com/index.rdf
GET http://$blog_id.blogs.com/index.rdf
Accept-Encoding: identity
User-Agent: lwp-request/2.06

Connection: close
Date: Thu, 28 Jul 2005 18:46:27 GMT
Accept-Ranges: bytes
Age: 63019
ETag: "1cc057-75c4-427c6400"
Server: Apache
Content-Encoding: gzip
Content-Length: 8956
Content-Type: application/rdf+xml
Last-Modified: Wed, 20 Jul 2005 14:00:48 GMT
Client-Date: Fri, 29 Jul 2005 12:16:46 GMT
Client-Peer: 216.129.107.21:80
Client-Response-Num: 1
X-Cache: HIT from www.sixapart.com

If you don't have a plain text version, then where's the 406, guys?

So it would be nice if LWP decoded the content whenever it sees a Content-Encoding: gzip header and a suitable Compress module is available.

I'd like to point out that - in my opinion - the content encoding is a transport detail, and should not have to matter to the user. In other words, I would appreciate LWP handling that for me transparently.
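
Until LWP does this itself, a script can undo the encoding by hand. A minimal sketch, assuming Compress::Zlib is the "suitable Compress module" and using a hypothetical helper name body_without_gzip. It only handles a single gzip coding and falls back to the raw body in every other case:

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: return the body of $res with a gzip
# content-coding undone, or the raw body when that is not possible.
sub body_without_gzip {
    my ($res) = @_;
    my $body     = $res->content;
    my $encoding = lc( $res->header('Content-Encoding') || '' );

    # Anything other than a plain gzip coding is passed through untouched.
    return $body unless $encoding eq 'gzip' || $encoding eq 'x-gzip';

    # Only attempt decoding when Compress::Zlib can be loaded.
    return $body unless eval { require Compress::Zlib; 1 };

    # Work on a copy so the raw body is still intact if decoding fails.
    my $copy  = $body;
    my $plain = Compress::Zlib::memGunzip($copy);
    return defined $plain ? $plain : $body;
}

# Demo with a hand-built response, so no network access is needed.
if ( eval { require Compress::Zlib; 1 } ) {
    my $xml = '<rdf/>';
    my $res = HTTP::Response->new( 200, 'OK',
        [ 'Content-Encoding' => 'gzip' ],
        Compress::Zlib::memGzip($xml) );
    print body_without_gzip($res), "\n";
}
```

Where the installed libwww-perl offers the decoded_content method of HTTP::Message, calling that on the response is probably the simpler route.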

Maybe I should send in a patch....
