Stories, comments, journals, and other submissions on use Perl; are Copyright 1998-2006, their respective owners.
That code is painful to read (Score:1)
Why in Larry’s name are you fiddling with the UTF8 flag at all? And for what, turning the flag on and then downgrading the string? That’s pure obfuscation.
Re: (Score:1)
That doesn't work. I didn't know it was supposed to.
Re: (Score:1)
Actually, that would apparently work if the value being decoded were fully valid, but it doesn't, because the input was an abuse of Unicode.
Re: (Score:1)
It is indeed supposed to work.
So what does your data look like, then? Does it contain a mixture of encoding levels at once?
Re:That code is painful to read (Score:1)
Ah. OK, it works provided the value is valid UTF-X, Perl's more permissive variant of UTF-8. When I tried your snippet, I copied the string from my original post, but the rendered blog post had a space inserted into the middle of it, which made the value no longer valid UTF-X.
Also, utf8::decode returns a boolean indicating whether it did anything. Presumably this means your function should read as follows. This both leaves the interpretation of the string up to Perl and lets us eventually abort when, according to perl, there's no more work to be done.
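Roughly like this, as a minimal sketch rather than the original function: the name decode_repeatedly and the pass counter are made up for illustration. One assumption is baked in. As I read the utf8 docs, utf8::decode returns true for any well-formed input, including plain ASCII where nothing changes, so the loop checks for a non-ASCII character first to avoid spinning forever.

```perl
use strict;
use warnings;

# Hypothetical helper: undo UTF-8 encoding that may have been applied
# more than once. Returns the decoded string and the number of passes.
sub decode_repeatedly {
    my ($str) = @_;
    my $passes = 0;

    # utf8::decode works in place and returns false once the string is
    # no longer well-formed (or holds characters above 0xFF). The regex
    # guard stops the loop when only ASCII is left, since utf8::decode
    # would keep returning true on such a string.
    while ( $str =~ /[^\x00-\x7F]/ && utf8::decode($str) ) {
        $passes++;
    }

    return ( $str, $passes );
}

# "e acute" (U+00E9) encoded to UTF-8 twice: C3 A9 -> C3 83 C2 A9
my ( $out, $passes ) = decode_repeatedly("\xC3\x83\xC2\xA9");
printf "U+%04X after %d passes\n", ord($out), $passes;
```

Each successful pass over a string with non-ASCII content shortens it, so the loop has to end. What it can't tell you is whether the last successful pass was one too many, which is the usual risk with guessing at encoding levels.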
As a note, utf8::decode is implemented by sv_utf8_decode in sv.c, which will abort when the string stops being valid UTF-X.