Stories, comments, journals, and other submissions on use Perl; are Copyright 1998-2006, their respective owners.
Wrong wrong wrong (Score:1)
Don’t look at the UTF8 flag. The UTF8 flag does not mean what you think it means. You can have a perfectly valid Unicode string that does not have its UTF8 flag set, and you can have a JPEG image in a string that does have its UTF8 flag set. The UTF8 flag is a lie. It should not have been called the UTF8 flag. There is no flag in Perl that means what you think the UTF8 flag means. Don’t look at the UTF8 flag.
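To make that concrete, here is a rough standalone C model of the idea. It does not use the Perl API: `model_str` and `model_upgrade` are hypothetical names invented for this sketch. It assumes the flag only records how the buffer stores characters (one byte per character, or UTF-8 encoded), which is why arbitrary binary data can end up with the flag on.

```c
#include <stddef.h>

/* Hypothetical stand-in for a Perl string buffer: raw bytes plus the flag
   that says how those bytes encode characters. Not the real SV layout. */
typedef struct {
    unsigned char buf[16];
    size_t len;
    int utf8;   /* 0: one byte per character; 1: buffer holds UTF-8 */
} model_str;

/* Re-encode a flag-off string as UTF-8 and turn the flag on. The logical
   characters are unchanged; only the storage format differs. Input longer
   than 8 bytes would be truncated by this fixed-size sketch. */
static model_str model_upgrade(model_str in) {
    model_str out = { {0}, 0, 1 };
    if (in.utf8) return in;
    for (size_t i = 0; i < in.len && out.len + 2 <= sizeof out.buf; i++) {
        unsigned char c = in.buf[i];
        if (c < 0x80) {
            out.buf[out.len++] = c;
        } else {
            /* Characters 0x80..0xFF become two bytes in UTF-8. */
            out.buf[out.len++] = (unsigned char)(0xC0 | (c >> 6));
            out.buf[out.len++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

Feeding the first four bytes of a JPEG (FF D8 FF E0) through `model_upgrade` gives a string with the flag on and eight bytes of storage, yet it still represents the same four "characters". The flag says nothing about whether the content is text.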
What you want to do is very simple:
Re: (Score:1)
Re:Wrong wrong wrong (Score:1)
I misunderstood where the problem is in the code, but it’s still wrong. Since it’s XS, you specifically do need to look at the flag, explicitly:
The problem is that you’re using utf8_to_uvuni unconditionally. But the PV of a string with SvUTF8 off has a different format than when the flag is on. You should be using utf8_to_uvuni only if the flag is on; otherwise, you should just take one byte at a time from the string and use that directly.

FWIW, since there are only three ranges and three single codepoints, I wouldn’t use a loop for the conditionals. Just unroll the whole thing.
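The code originally posted with this comment is not preserved here, so the following is only a standalone C sketch of the approach described, not the actual XS. It does not call the Perl API: `decode_utf8` is a hypothetical stand-in for utf8_to_uvuni, the `utf8_flag` parameter stands in for SvUTF8, and `xml_string_ok` is an invented name. The ranges are the XML 1.0 Char production, written as unrolled conditionals.

```c
#include <stddef.h>
#include <stdint.h>

/* XML 1.0 Char: three single code points and three ranges, unrolled. */
static int is_xml_char(uint32_t cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
}

/* Hypothetical minimal UTF-8 decoder. Returns the number of bytes consumed,
   or 0 if the sequence is truncated or malformed. It does not reject
   overlong forms, which a production decoder should. */
static size_t decode_utf8(const unsigned char *p, size_t avail, uint32_t *cp) {
    size_t n;
    if (avail == 0) return 0;
    if (p[0] < 0x80) { *cp = p[0]; return 1; }
    else if ((p[0] & 0xE0) == 0xC0) { *cp = p[0] & 0x1F; n = 2; }
    else if ((p[0] & 0xF0) == 0xE0) { *cp = p[0] & 0x0F; n = 3; }
    else if ((p[0] & 0xF8) == 0xF0) { *cp = p[0] & 0x07; n = 4; }
    else return 0;
    if (avail < n) return 0;
    for (size_t i = 1; i < n; i++) {
        if ((p[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (p[i] & 0x3F);
    }
    return n;
}

/* Branch on the flag: decode UTF-8 only when it is on; otherwise each
   byte is already one code point. */
int xml_string_ok(const unsigned char *s, size_t len, int utf8_flag) {
    size_t i = 0;
    while (i < len) {
        uint32_t cp;
        size_t step;
        if (utf8_flag) {
            step = decode_utf8(s + i, len - i, &cp);
            if (step == 0) return 0;
        } else {
            cp = s[i];
            step = 1;
        }
        if (!is_xml_char(cp)) return 0;
        i += step;
    }
    return 1;
}
```

The same bytes can give different answers depending on the flag. For example, EF BF BE is three valid characters with the flag off, but with the flag on it decodes to U+FFFE, which XML does not allow.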
So add the above code to the module as 02_utf8_flag.t, remove Char.h, and replace Char.xs with the following code. After that, all tests will pass.

On an API stylistic note, I really really hate when modules expect me to call functions as methods. How about renaming the XS function to is_valid_xml_string and making it exportable? Then people have the option to either write XML::Char->valid($foo) or export it and write is_valid_xml_string($foo).
Re: (Score:1)