
All the Perl that's Practical to Extract and Report


darobin (1316)

http://berjon.com/

Journal of darobin (1316)

Sunday February 17, 2002
12:25 PM

UTF-8 on search.cpan.org

[ #2922 ]

It would seem that search.cpan.org (and possibly other parts of the CPAN system as this particular problem may have roots elsewhere in the chain) is having a few problems with some characters outside the Latin-1 range.

If you go to http://search.cpan.org/search?mode=module&query=SOAP::Client, you will note that the second entry seems rather garbled. I know not the author, but judging from his entries it would seem that his name may be Ethiopian.

Not that this is a real problem, hardly a buglet in fact, as the site works nevertheless, but it shows once more how hard it is to deal properly with encodings, even for experienced programmers. At first this touched mostly XML, and people were making fun of us for the problems we'd thrown ourselves into to get things right encodings-wise. But now people are expecting -- and rightly so -- the entire web to be Unicode safe. This puts quite a few of us in trouble, as few tools are ready for this yet and few people know how to deal with it properly. I have been burnt many times already, so I can only advise anyone who is doing web stuff to pay attention to such issues, as it won't be possible to dodge them for long :)
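To make this kind of garbling concrete, here is a minimal sketch in Python of one common way it arises: text is encoded as UTF-8 but decoded as ISO-8859-1 somewhere along the pipeline. This is only an illustration under that assumption. Nothing here confirms that this is what happened on search.cpan.org. The sample string is a made-up Ethiopic word, not the author's actual name, and the helper names `mis_decode` and `repair` are hypothetical.

```python
def mis_decode(text: str) -> str:
    """Simulate a pipeline that emits UTF-8 bytes but labels them ISO-8859-1."""
    utf8_bytes = text.encode("utf-8")
    # Every byte value is valid ISO-8859-1 in Python's codec, so this never
    # raises: it silently yields one wrong character per byte.
    return utf8_bytes.decode("iso-8859-1")


def repair(garbled: str) -> str:
    """Undo mis_decode, assuming exactly one wrong decoding step occurred."""
    return garbled.encode("iso-8859-1").decode("utf-8")


if __name__ == "__main__":
    # Hypothetical sample: three Ethiopic characters, written as escapes so
    # the source file's own encoding cannot interfere with the demonstration.
    name = "\u1230\u120b\u121d"

    garbled = mis_decode(name)
    # Each Ethiopic character takes three bytes in UTF-8, so the garbled
    # string is three times as long as the original.
    print(len(name), len(garbled))
    print(repr(garbled))
    print(repair(garbled) == name)
```

The repair only works because a single, known mistake was made. Once a string has been mis-decoded twice, or had its unmappable characters replaced along the way, the original bytes may no longer be recoverable.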

  • of those and you can see them for yourself on the who's who list [cpan.org] at the bottom. I'm not sure what the cause of the mangling is as it would appear to be more than just an encoding issue.

    • I'm not sure either, especially as they appear fine on the who's who list. It could be an encoding problem, as a single error anywhere in the pipeline is sufficient to rot the whole thing (and it doesn't take much to have just that one bug :-). It might simply have been because search.cpan.org was sending the correct content, but with charset iso-8859-1. Anyway, it looks like you fixed it by using his Englishified name now :)

      PS: I should have said this more clearly in my journal: I'm not trying to say anythi

      --

      -- Robin Berjon [berjon.com]

      • Hmm? They appear the same on the list and on search.cpan in my browser. I didn't fix anything :)

        • Now that's strange... When I posted earlier, Daniel Yacob (DYACOB) was listed on search.cpan as a garbled string of chars but is now "Daniel Yacob". On the who's who list I see him listed as <something I had to delete to get this post to be accepted> (i.e. a name in what looks like Hebrew but which could be in something else, as the fonts on this box are very unclear). search.cpan has definitely changed over the past few hours (as seen from here at least)!

          --

          -- Robin Berjon [berjon.com]