I have a script that fetches an RSS feed from blogs.com. A couple of days ago it broke. Reason: the server is sending gzip-encoded content. I felt that was mighty impolite, but it turns out the HTTP spec is on their side:
If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding. In this case, if "identity" is one of the available content-codings, then the server SHOULD use the "identity" content-coding, unless it has additional information that a different content-coding is meaningful to the client.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html , Section 14.3
Since LWP by default does not send any Accept-Encoding request header, the server is within its rights to send gzip'ed content. But LWP doesn't handle that, and simply gives us the undecoded content.
It would seem - to me at least - that the quickest fix to LWP would be to include an 'Accept-Encoding: identity' header in every request by default, because the spec continues:
If an Accept-Encoding field is present in a request, and if the server cannot send a response which is acceptable according to the Accept-Encoding header, then the server SHOULD send an error response with the 406 (Not Acceptable) status code.
(Although I'd have to say that a 406 response would be pretty useless.)
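Until LWP does this by default, a script can ask for it itself. Here is a minimal sketch, assuming LWP::UserAgent's default_header and prepare_request methods; the feed URL is a made-up placeholder, and the request is only built, not sent, so the header can be inspected without touching the network.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Ask for unencoded content on every request made through this agent.
my $ua = LWP::UserAgent->new;
$ua->default_header('Accept-Encoding' => 'identity');

# Hypothetical feed URL, for illustration only.
my $url = 'http://example.com/index.rdf';

# Let the agent fill in its default headers on a fresh request.
my $request = $ua->prepare_request(HTTP::Request->new(GET => $url));

# Show the header the server would see.
print 'Accept-Encoding: ', $request->header('Accept-Encoding'), "\n";
```

In a real script, $ua->get($url) would send the same header along with the request.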
Unfortunately, with this particular server, that does not do the trick:
[user@host]% GET -H 'Accept-Encoding: identity' -U -e -d http://$blog_id.blogs.com/index.rdf
Date: Thu, 28 Jul 2005 18:46:27 GMT
Last-Modified: Wed, 20 Jul 2005 14:00:48 GMT
Client-Date: Fri, 29 Jul 2005 12:16:46 GMT
X-Cache: HIT from www.sixapart.com
If you don't have a plain text version, then where's the 406, guys?
So it would be nice if LWP decoded the content itself whenever it sees a Content-Encoding: gzip header and a suitable Compress module is available.
I'd like to point out that - in my opinion - the content-encoding is a transport detail, and should not have to matter to the user. In other words, I would appreciate it if LWP handled that for me transparently.
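To make the idea concrete, here is a rough sketch of the decoding step, not a patch to LWP. It assumes the IO::Compress::Gzip and IO::Uncompress::Gunzip modules are installed; decode_body is a hypothetical helper name, and the feed body is made-up sample data that is compressed locally to stand in for the server's response.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: return the body with a gzip content-coding removed.
# Any other (or missing) Content-Encoding value is passed through untouched.
sub decode_body {
    my ($content_encoding, $body) = @_;
    return $body
        unless defined $content_encoding
            && $content_encoding =~ /^\s*(?:x-)?gzip\s*$/i;

    gunzip(\$body => \my $decoded)
        or die "gunzip failed: $GunzipError";
    return $decoded;
}

# Stand-in for a gzip-encoded response body (sample data, not a real feed).
my $feed = '<rss>example feed</rss>';
gzip(\$feed => \my $compressed)
    or die "gzip failed: $GzipError";

print decode_body('gzip', $compressed), "\n";   # decoded feed text
print decode_body(undef,  $feed),       "\n";   # passed through as-is
```

In an LWP script the two arguments would come from $response->header('Content-Encoding') and $response->content. If your libwww-perl provides the decoded_content method on the response object, that should make a helper like this unnecessary.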
Maybe I should send in a patch....