
Journal of jplindstrom (594)

Saturday May 11, 2002
11:30 AM

Unicode transliteration

[ #4855 ]

I'm doing some processing of the Open Directory (dmoz) RDF dump, but couldn't find a decent way to deal with the Unicode characters in it.

After a week of coming up empty on Google and CPAN, I finally found a way to transliterate UTF-8 text to Latin-1:

http://groups.google.com/groups?selm=note-18266%40php.net

My earlier home-grown version is slightly more complete when it comes to e.g. Romanian characters. It's 100% manual though... (I log missing chars and add them to the table by looking at the dmoz.org web site :)
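For illustration, here is a minimal sketch of the "manual table plus log of missing chars" idea. It is written in Python rather than Perl, and it is not the code from the linked post or my actual script: the table entries, the `to_latin1` name, and the `?` placeholder are all assumptions made up for the example. As a convenience it also tries a Unicode decomposition before giving up on a character.

```python
import unicodedata

# Hypothetical hand-maintained table; these entries are illustrative only.
MANUAL_MAP = {
    "\u0103": "a",  # a with breve
    "\u0219": "s",  # s with comma below
    "\u021b": "t",  # t with comma below
    "\u015f": "s",  # s with cedilla
    "\u0163": "t",  # t with cedilla
}

def to_latin1(text, missing=None):
    """Return text with every character mapped into the Latin-1 range.

    Characters that cannot be mapped become '?' and, if a set is
    passed as `missing`, are recorded there for later manual review.
    """
    out = []
    for ch in text:
        if ord(ch) < 256:
            # Already representable in Latin-1.
            out.append(ch)
        elif ch in MANUAL_MAP:
            out.append(MANUAL_MAP[ch])
        else:
            # Fallback: decompose and drop combining marks.
            decomposed = unicodedata.normalize("NFKD", ch)
            base = "".join(c for c in decomposed if not unicodedata.combining(c))
            if base and all(ord(c) < 256 for c in base):
                out.append(base)
            else:
                if missing is not None:
                    missing.add(ch)
                out.append("?")
    return "".join(out)
```

Passing a set as `missing` collects the characters that still need a table entry, which is the "log and add by hand" step.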
