

Ovid (2709)


Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Monday January 22, 2007
10:32 AM

MySQL Default Collation

[ #32231 ]

In researching a database problem, I just discovered that MySQL's default collation is latin1_swedish_ci. Apparently, this is considered OK because English doesn't have accented characters and will therefore sort the same way. Still, sounds a bit dodgy to me.

  • Any default that isn't UTF-8 is wrong. Yes, really.
    • Uhm, that makes no sense. UTF-8 is an encoding, Unicode is a charset, and neither is a collation. F.ex., Ü will sort to different places depending on whether the collation is English or German (and may sort somewhere else again in one of the Scandinavian languages, or in Turkish, or what have you). Whether you represent this character in Latin-1 or Unicode (it happens to be the same codepoint in both charsets) and whether you encode the Unicode codepoint using UTF-8 or another encoding all has nothing to do w
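
The charset / encoding / collation distinction drawn in the reply above can be seen in a few lines of Python. This is only a rough sketch: the word list and the `accent_insensitive_key` function are made up for illustration, and the key is a crude stand-in for a real collation, not what MySQL's latin1_swedish_ci or any German locale actually does.

```python
import unicodedata

ch = "\u00dc"  # Ü

# Charset: the codepoint is the same number in Latin-1 and in Unicode.
codepoint = ord(ch)

# Encoding: how that codepoint is written out as bytes.
latin1_bytes = ch.encode("latin-1")  # one byte
utf8_bytes = ch.encode("utf-8")      # two bytes

# Collation: the ordering rule, independent of the two points above.
words = ["Zebra", "\u00dcber", "Apfel", "Uhr"]


def accent_insensitive_key(word):
    """Hypothetical collation key: ignore accents and case."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.casefold()


by_codepoint = sorted(words)                            # Ü lands after Z
by_base_letter = sorted(words, key=accent_insensitive_key)  # Ü sorts with U

print(hex(codepoint), latin1_bytes, utf8_bytes)
print(by_codepoint)
print(by_base_letter)
```

Same strings, same bytes, two different orderings: only the sort rule changed.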

  • Arguably we have at least three accented characters - the o with diaeresis, o with circumflex and e with acute. At least, they occur in Modern English words in the OED even if in practice most people omit them. But we undeniably have two non-ASCII letters both of which occur both capitalised and not - the ae and oe which I dare not type here because your browser will almost certainly get them wrong.

    AE occurs in words like encyclopaedia and aesc, and OE in, for example, the proper name OEdipus and a few s

    • And even then, we have words like résumé which also have accents, depending on who is spelling them.

      • Yes, that's e with acute. Some would argue that furrin words adopted into English lose the accents though. For example, when we stole théâtre from the French it became theatre and café is normally cafe.
    • Don’t forget naïve. Æsthetics and co. work just fine in browsers, btw.

      • It's worth noting that the funny dots in ï and ö in English are diaeresis marks, not umlauts. I wonder if they're spelt differently in Unicode ...
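
On that last question, a quick check with Python's standard `unicodedata` module suggests the answer is no: Unicode uses one combining mark for both, and calls it a diaeresis either way. A small sketch (the helper name `combining_marks` is just made up for this example):

```python
import unicodedata


def combining_marks(ch):
    """Names of the combining marks left after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", ch)
    return [unicodedata.name(c) for c in decomposed if unicodedata.combining(c)]


english_mark = combining_marks("\u00ef")  # ï, as in naïve (diaeresis)
german_mark = combining_marks("\u00fc")   # ü, as in über (umlaut)

print(unicodedata.name("\u00ef"), english_mark)
print(unicodedata.name("\u00fc"), german_mark)
```

Both letters decompose to a base letter plus U+0308 COMBINING DIAERESIS, so the difference between the two marks lives in the language, not in the codepoints.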