Stories, comments, journals, and other submissions on use Perl; are Copyright 1998-2006, their respective owners.
Sounds like a job for The Demoroniser (Score:2)
Sounds like another job for The Demoroniser [fourmilab.ch]... Or an equivalent tool.
-- "It's not magic, it's work..."
Re:Sounds like a job for The Demoroniser (Score:2)
It's probably a programmer somewhere who thinks that iso-8859-1 and Windows-1252 are the same thing.
Re:Sounds like a job for The Demoroniser (Score:2)
Fair point. You can probably assume that the main problem is when Windows sends out a Windows encoding, say cp-1252, but lies and calls it iso-8859-1.
One possible solution is to scan any files labelled iso-8859-1 for the diagnostic code points (bytes 0x80-0x9F, which are control codes in iso-8859-1 but printable characters in Windows-1252; Demoroniser has some suggestions for that), and then either ask a recoding program to convert the file to a civilised scheme or set the Content-Type correctly.
I have to deal with this all the time: users think that Windows is "correct", then they output a document, incorrectly tagged, and when they don't see a trademark symbol (™) or micro symbol (µ) on the web site, it's my fault! Yesterday several hours were wasted because of the lies that Windows tells...
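For what it's worth, here is a rough sketch of that scan-and-recode idea in Python. It is only an illustration, not what Demoroniser itself does: the function names `looks_like_cp1252` and `recode_to_utf8` are made up for this example, and it assumes the input is a single-byte encoding that is either genuine iso-8859-1 or mislabelled Windows-1252. The heuristic relies on bytes 0x80-0x9F being C1 control codes in iso-8859-1 (almost never present in real text) but printable characters in Windows-1252.

```python
import re

# Bytes 0x80-0x9F: control codes in ISO-8859-1, but curly quotes, dashes,
# the trademark sign and similar printable characters in Windows-1252.
C1_RANGE = re.compile(rb"[\x80-\x9f]")


def looks_like_cp1252(data: bytes) -> bool:
    """Hypothetical helper: True if the data contains any byte in 0x80-0x9F."""
    return C1_RANGE.search(data) is not None


def recode_to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: recode data labelled iso-8859-1 to UTF-8.

    If the diagnostic bytes are present, assume the label is wrong and
    decode as Windows-1252 instead.
    """
    if looks_like_cp1252(data):
        # A few bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) are undefined in
        # Python's cp1252 codec, so replace them rather than raise.
        text = data.decode("cp1252", errors="replace")
    else:
        text = data.decode("iso-8859-1")
    return text.encode("utf-8")
```

For example, `recode_to_utf8(b"Acme\x99")` treats 0x99 as the Windows-1252 trademark sign rather than an iso-8859-1 control code. The alternative fix mentioned above, setting the Content-Type charset to windows-1252, avoids recoding altogether but leaves the file in a Windows encoding.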
-- "It's not magic, it's work..."