I'm processing the OpenDirectory (dmoz) RDF dump, but couldn't find a decent way to handle the Unicode characters in it.
After a week of coming up empty on Google and CPAN, I finally found a way to transliterate UTF-8 text to Latin-1:
My previous home-grown version is slightly more complete for some characters, e.g. Romanian ones. It's 100% manual, though: I log missing characters and add them by looking them up on the dmoz.org web site.