All the Perl that's Practical to Extract and Report

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • We all want Unicode to work, and there is no question that it is the Right Thing (tm) to do, being open, allowing other cultures to join us and use their own writing scheme and all.

    The sad truth is that it is actually a huge pain in the ass to implement for most coders, at least in the US and especially in Europe, and I would be really interested to know whether it really makes things easier for Asian coders.

    Plus Unicode is usually being forced upon us by XML, which is never a nice thing when you are already f

    • Is it worse in Europe specifically because UTF-8 and 8-bit Latin-1 are incompatible?
      • Yes. XML parsers tend to die a swift but painful death when they encounter a Latin-1 (or Latin-2, and so on) character, even in a CDATA section. On top of that, XML::Parser at least converts everything to UTF-8, even if the rest of the environment is entirely Latin-n. This is extremely annoying: it adds an extra level of complexity to all applications, and forces people to care about encodings when they really don't want to.

    • The problem with the Asian languages is that most of them already have a perfectly serviceable local standard: Big5 (traditional) and GB2312 (simplified) for Chinese, and Shift-JIS (amongst others) for Japanese. Korean and Vietnamese also have standards that work just fine.

      Unicode's in some ways more of a change for them than for us--while ASCII maps to Unicode (especially the utf8 encoding) with no change, the same cannot be said for the Asian languages. For them Unicode's more than just an annoyance, it's something t
  • Why not just write things as Latin-1 if they consist only of characters [\x00-\xFF], and UTF8 otherwise?
    • Won't those characters show up wrongly when you expect to see UTF-8 characters, then? I don't really understand. Let's say I have ÿ, \xFF. I assume that character has some other byte representation in UTF-8. But how is that byte represented in UTF-8? Do you understand what it is that I don't understand?
      • I'm assuming all mp3-readers auto-detect encoding, so there's no "expecting to see UTF8" -- if you see UTF8, you see it and decode it as such, otherwise you assume it's something else. Remember, pretty much only UTF8 looks like UTF8.

        Or: if mp3s have an explicit setting that says what encoding something is, then presumably there's no guesswork involved at all.

        • MP3 tags aren't just for MP3 readers, they are for web browsers, databases, text files of various kinds, etc.
          • By "mp3 reader", I mean anything that accesses the tag data in the files, including libraries that just pass it on to other applications.

            But anyway. Ideally, calling applications (like a CGI that passes on the tag data) should make clear what kinds of data-encoding they can or can't cope with.
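
For anyone following the ÿ question above, here is a rough sketch in Python of the scheme proposed in this thread: write Latin-1 when every character fits in [\x00-\xFF], UTF-8 otherwise, and have the reader guess by trying strict UTF-8 first. The function names encode_tag and decode_tag are made up for illustration; they are not part of any MP3 or tagging library, and real tag formats may carry an explicit encoding marker instead of relying on guesswork.

```python
def encode_tag(text: str) -> bytes:
    """Hypothetical writer: Latin-1 if every character fits in one byte, else UTF-8."""
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        return text.encode("utf-8")


def decode_tag(raw: bytes) -> str:
    """Hypothetical reader: try strict UTF-8 first, fall back to Latin-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this fallback cannot fail.
        return raw.decode("latin-1")


y_umlaut = "\u00ff"  # the character ÿ

# The same character has different byte representations in each encoding.
print(y_umlaut.encode("latin-1"))  # b'\xff'
print(y_umlaut.encode("utf-8"))    # b'\xc3\xbf'

# The lone byte 0xFF is never valid UTF-8, so the reader falls back to Latin-1.
print(decode_tag(encode_tag(y_umlaut)) == y_umlaut)  # True

# Text outside Latin-1 is written as UTF-8 and read back unchanged.
kanji = "\u65e5\u672c"  # 日本
print(decode_tag(encode_tag(kanji)) == kanji)  # True

# The catch: some Latin-1 strings happen to be valid UTF-8 byte sequences.
tricky = "\u00c3\u00a9"  # the two characters Ã©
print(decode_tag(encode_tag(tricky)) == "\u00e9")  # True: misread as é
```

The last case is the limit of "pretty much only UTF8 looks like UTF8": the guess is usually right for real-world text, but it is still a guess, which is why an explicit encoding setting or a caller that states what it can handle is the safer option.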