I have a script that fetches an RSS feed from blogs.com. A couple of days ago it broke. The reason: the server is sending gzip-encoded content. I felt that was mighty impolite, but it turns out the HTTP spec is on their side:
If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding. In this case, if "identity" is one of the available content-codings, then the server SHOULD use the "identity" content-coding, unless it has additional information that a different content-coding is meaningful to the client.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html, Section 14.3
Since LWP by default does not send any Accept-Encoding request header, this means sending gzip'ed content is perfectly acceptable. But LWP doesn't handle that, and simply gives us the undecoded content.
It would seem - to me at least - that the quickest fix to LWP would be to include an 'Accept-Encoding: identity' header in the request by default:
If an Accept-Encoding field is present in a request, and if the server cannot send a response which is acceptable according to the Accept-Encoding header, then the server SHOULD send an error response with the 406 (Not Acceptable) status code.
ibid.
(Although I'd have to say that a 406 response would be pretty useless.)
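Until LWP does something like this itself, a script can opt in on its own. The following is only a minimal sketch: identity_only_agent is a hypothetical helper name, and it assumes an LWP::UserAgent recent enough to provide the default_header method.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper (not part of LWP): build a user agent that asks for
# uncompressed ("identity") responses on every request it makes.
sub identity_only_agent {
    my $ua = LWP::UserAgent->new;
    $ua->default_header('Accept-Encoding' => 'identity');
    return $ua;
}

my $ua = identity_only_agent();

# Show the header that will be added to each outgoing request.
print "Accept-Encoding: ", $ua->default_header('Accept-Encoding'), "\n";
```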
Unfortunately, with this particular server, that does not do the trick:
[user@host]% GET -H 'Accept-Encoding: identity' -U -e -d http://$blog_id.blogs.com/index.rdf
GET http://$blog_id.blogs.com/index.rdf
Accept-Encoding: identity
User-Agent: lwp-request/2.06
Connection: close
Date: Thu, 28 Jul 2005 18:46:27 GMT
Accept-Ranges: bytes
Age: 63019
ETag: "1cc057-75c4-427c6400"
Server: Apache
Content-Encoding: gzip
Content-Length: 8956
Content-Type: application/rdf+xml
Last-Modified: Wed, 20 Jul 2005 14:00:48 GMT
Client-Date: Fri, 29 Jul 2005 12:16:46 GMT
Client-Peer: 216.129.107.21:80
Client-Response-Num: 1
X-Cache: HIT from www.sixapart.com
If you don't have a plain text version, then where's the 406, guys?
That means it would be nice if LWP decoded the content itself whenever it sees a Content-Encoding: gzip header and a suitable Compress module is available.
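In the meantime the decoding can be done by hand in the script. This is a rough sketch, not how LWP does or should do it: body_without_gzip is a hypothetical helper, it assumes IO::Uncompress::Gunzip is installed (Compress::Zlib would serve as well), and the response is built locally as a stand-in for the real feed so that the example runs without a network.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip     qw(gzip   $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: return the body of $res, gunzipped if the server
# labelled it with "Content-Encoding: gzip", untouched otherwise.
sub body_without_gzip {
    my ($res) = @_;
    my $body     = $res->content;
    my $encoding = lc($res->header('Content-Encoding') // '');
    return $body unless $encoding eq 'gzip' || $encoding eq 'x-gzip';

    gunzip(\$body => \my $plain)
        or die "gunzip failed: $GunzipError";
    return $plain;
}

# Stand-in for the real response: a tiny placeholder document, gzipped and
# labelled the way the server in this post labels its feed.
my $xml = "<rdf:RDF>...</rdf:RDF>\n";
gzip(\$xml => \my $packed) or die "gzip failed: $GzipError";
my $res = HTTP::Response->new(200, 'OK',
    [ 'Content-Type' => 'application/rdf+xml', 'Content-Encoding' => 'gzip' ],
    $packed);

print body_without_gzip($res);
```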
I'd like to point out that - in my opinion - the content encoding is a transport detail and should not have to matter to the user. In other words, I would appreciate it if LWP handled that for me transparently.
Maybe I should send in a patch....
LWP does it (Score:1)
LWP does it if you ask for the $response->decoded_content instead of $response->content. The decoded_content method was introduced in LWP-5.802.
From: http://www.issociate.de/board/post/155483/Downloading_a_page_compressed.html (look for message #559404)
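The difference between the two methods can be seen without touching the network. This is only a sketch: the response is built by hand as a stand-in for what the server sends, the feed body is a placeholder, and charset => 'none' is passed so that only the Content-Encoding is undone.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Stand-in for the real response: a placeholder feed, gzipped and labelled
# with Content-Encoding: gzip.
my $feed = "<rdf:RDF>...</rdf:RDF>\n";
gzip(\$feed => \my $packed) or die "gzip failed: $GzipError";
my $res = HTTP::Response->new(200, 'OK',
    [ 'Content-Type' => 'application/rdf+xml', 'Content-Encoding' => 'gzip' ],
    $packed);

my $raw     = $res->content;                             # still the gzip bytes
my $decoded = $res->decoded_content(charset => 'none');  # Content-Encoding undone

print "raw bytes:     ", length($raw), "\n";
print "decoded bytes: ", length($decoded), "\n";
```

In a real script the response would come from something like $ua->get($url), and the only change needed is to read $response->decoded_content where the script used to read $response->content.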