Stories
Slash Boxes
Comments
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

Smylers (2592)

Journal of Smylers (2592)

Tuesday August 28, 2007
07:19 AM

When MySQL Bites: Quirks to Watch Out For

[ #34246 ]

This is the paper that accompanies the talk I gave at Yapc::EU in Vienna this morning. It's a draft there are a few things I plan to tidy up shortly, and some more references to be added.

If you're using MySQL, or considering using it, here are some things you probably should be aware of.

Dealing with Invalid Data

MySQL lets you specify restict what can be stored in fields, for example the number of characters in a string. If you try to insert data that doesn't meet the restrictions then MySQL tries really hard to insert someting anyway -- such as by truncating the string:

mysql> CREATE TABLE creature (name VARCHAR(3) NOT NULL);
Query OK, 0 rows affected (0.25 sec)

mysql> INSERT INTO creature SET name = 'caterpillar';
Query OK, 1 row affected, 1 warning (0.02 sec)

mysql> SELECT * FROM creature;
+------+
| name |
+------+
| cat  |
+------+

Though you can often spot when text has been truncated; noticing numbers changing can be trickier:

mysql> CREATE TABLE exam (mark TINYINT(2) NOT NULL);
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO exam SET mark = 172;
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> SELECT * FROM exam;
+------+
| mark |
+------+
|  127 |
+------+
1 row in set (0.00 sec)

At least with strings and numbers those cases the value that gets stored is plausible for the field type, if wrong. With an ENUM field it can be surprising to find the empty string being stored there at all:

mysql> CREATE TABLE game (shape ENUM('scissors', 'paper', 'stone') NOT NULL);
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO game SET shape = 'rock';
Query OK, 1 row affected, 1 warning (0.01 sec)

mysql> SELECT * FROM game;
+-------+
| shape |
+-------+
|       |
+-------+
1 row in set (0.00 sec)

If you're using the interactive MySQL client then you do get notified when data has been tweaked on insertion like this: the warning count. Use SHOW WARNINGS to see what it's done:

mysql> INSERT INTO game SET shape = 'rock';
Query OK, 1 row affected, 1 warning (0.01 sec)

mysql> SHOW WARNINGS;
+---------+------+--------------------------------------------+
| Level   | Code | Message                                    |
+---------+------+--------------------------------------------+
| Warning | 1265 | Data truncated for column 'shape' at row 1 |
+---------+------+--------------------------------------------+
1 row in set (0.00 sec)

Putting this into ~/.my.cnf:

[client]
show-warnings

makes it less tedious to see the warnings, and more obvious when you have some:

mysql> INSERT INTO game SET shape = 'rock';
Query OK, 1 row affected, 1 warning (0.02 sec)

Warning (Code 1265): Data truncated for column 'shape' at row 1

But you'd have much less chance of discovering corrupt data in your database if MySQL wouldn't let it be inserted in the first place. You can achieve this by putting the server into strict mode: add this to the server's config file (/etc/mysql/my.cnf or similar):

[mysqld]
sql-mode = STRICT_ALL_TABLES

and restart the server (merely reloading it isn't sufficient), with something like:

$ sudo /etc/init.d/mysql restart

Then it refuses to insert any invalid data:

mysql> INSERT INTO game SET shape = 'rock';
ERROR 1265 (01000): Data truncated for column 'shape' at row 1

That should stop you inadvertently inserting any bad data. It doesn't however prevent anybody from sticking some bad data in there if they really want to. Firstly, the SQL modes are per-connection, so somebody could simply turn off strict mode:

mysql> INSERT INTO game SET shape = 'rock';
ERROR 1265 (01000): Data truncated for column 'shape' at row 1

mysql> SET sql_mode = '';
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO game SET shape = 'rock';
Query OK, 1 row affected, 1 warning (0.00 sec)

And even with strict mode on MySQL helpfully provides the IGNORE keyword for circumventing it:

mysql> INSERT INTO game SET shape = 'rock';
ERROR 1265 (01000): Data truncated for column 'shape' at row 1

mysql> INSERT IGNORE INTO game SET shape = 'rock';
Query OK, 1 row affected, 1 warning (0.05 sec)

So far as I can tell it's impossible to guarantee that an ENUM field can't contain the empty string. But it's pretty hard to code something which does that accidentally, which is probably good enough.

MySQL Server Upgrades

Except that strict mode requires MySQL 5.0, which is less than 2 years old and so isn't being used yet in many production systems.

For software 2 years might seem quite old; certainly many people try to keep their systems more up to date than that, especially desktop applications such as web browsers or even workstation operating systems. So it's easy to be unsympathetic to somebody who hasn't upgraded.

But production database servers are things you want to be the most conservative about upgrading; they are running continuously and are critical to many functions, so stability is important. Upgrades need serious planning and aren't to be undertaken on a whim; if the database is currently running fine then upgrading it merely to get new features may not be a priority, or something your sys-admins want to do often.

As such, a cycle of only upgrading your MySQL servers every couple of years sounds quite reasonable.

It's also reasonable to wish to stick with the stable version of MySQL provided in your operating system; this introduces a lag after a new MySQL release before there's an OS release with it. And you probably don't want to risk putting a new OS version on to your production DB servers the day it's released, instead waiting for user reaction to it. Then you might want to do some testing, trying out the upgrade on a staging server and running it there for a while before committing it to the live one.

Together these factors can mean that somebody who's been making entirely reasonable choices is running a MySQL version that's 3 year's old.

For example, consider somebody currently running version 4.1 and who doesn't want to upgrade more often than once every 2 years. Should she upgrade now? That would give her version 5.0, but means she'd still be running 5.0 in 2 years' time. Or should she hold on for version 5.1 to be released meaning she continues with 4.1 currently?

Lusting After the Next MySQL Version

What makes this particularly frustrating with MySQL is how many issues with it can be resolved by using version 'n + 1', one higher than you're currently using and what's particularly disturbing is that this seems to be perpetually true, no matter how many years pass or how many upgrades you do.

In the days of version 3.22 we were keen to upgrade to 3.23, yet when we got there we found ourselves frustrated by the lack of things like foreign keys and subselects. 4.0 brought foreign keys, but subselects were put back to 4.1, still leaving us wanting. Then once we had subselects we found ourselves continually hitting up against problems that could only be solved by strict mode, or a view, or a trigger, and so on.

Often these things conspire in groups: you can think of a workaround for an 'n + 1' feature, but only by using another feature that's also 'n + 1'! For example, the above problem with invalid ENUMs inserting empty strings is solved by strict mode, which requires 5.0; you could get avoid it with a trigger, but that's also 5.0; and then you realize that you're only using an ENUM in the first place because you don't have views, which are also 5.0:

Initially we thought we just wanted one upgrade; the 'new' version of MySQL we'd read about sounded like it had finally become a 'proper' DBMS and would remedy the major frustrations we'd found with it. Tantalizingly, many years later that's still the case.

MySQL Warnings in Perl

If you aren't running a recent enough version of MySQL to have strict mode, you can at least get the same warnings in Perl that you do interactively. Consider the query:

$db->do(q[INSERT INTO game SET shape = 'rock']);

You can see if there are any warnings with the {mysql_warning_count} attribute on a statement handle, and then do SHOW WARNINGS to get them, perhaps turning the above into:

use Lingua::EN::Inflect qw<inflect>;

my $insert = $db->prepare(q[INSERT INTO game SET shape = 'rock']);
$insert->execute;
if ($insert->{mysql_warning_count})
{
  warn inflect "NUM($insert->{mysql_warning_count}) PL_N(warning):\n";
  my $warning_query = $db->prepare(q[SHOW WARNINGS]);
  $warning_query->execute;
  while (my $warning = $warning_query->fetchrow_hashref)
  {
    warn "$warning->{Message}\n";
  }
}

which yields this output:

1 warning:
Data truncated for column 'shape' at row 1

You'd probably want to abstract the warning-checking into a routine used by all your queries. Rather than just printing the warnings it could abort the program or rollback the current transaction, emulating the behaviour you get with strict mode.

Something like the above could also be useful even if you do have strict mode, for ensuring that other MySQL warnings aren't ignored.

Dates

Strict mode dissuades MySQL from treating nonsense as dates:

mysql> CREATE TABLE log (noted DATE NOT NULL);
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO log SET noted = 'kapow';
ERROR 1292 (22007): Incorrect date value: 'kapow' for column 'noted' at row 1

But it still thinks that a bunch of zeros is a cromulent date, without even so much as a warning:

mysql> INSERT INTO log SET noted = '0000-00-00';
Query OK, 1 row affected (0.00 sec)

What's particularly special is that MySQL treats these zeros as being NULL:

mysql> SELECT * FROM log WHERE noted IS NULL;
+------------+
| noted      |
+------------+
| 0000-00-00 |
+------------+
1 row in set (0.02 sec)

And remember that that's in a field declared NOT NULL! That zero value is still NOT NULL as well of course:

mysql> SELECT * FROM log WHERE noted IS NOT NULL;
+------------+
| noted      |
+------------+
| 0000-00-00 |
+------------+
1 row in set (0.00 sec)

In fact it is both NULL and NOT NULL at the same time:

mysql> SELECT * FROM log WHERE noted IS NULL AND noted IS NOT NULL;
+------------+
| noted      |
+------------+
| 0000-00-00 |
+------------+
1 row in set (0.00 sec)

That's probably not a date you want in your database. Even through strict mode permits it there's another SQL mode, NO_ZERO_DATE, which stops it being inserted:

mysql> SET sql_mode = 'STRICT_ALL_TABLES,NO_ZERO_DATE';
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO log SET noted = '0000-00-00';
ERROR 1292 (22007): Incorrect date value: '0000-00-00' for column 'noted' at row 1

But even with that, dates with zeroes in a part of them can still get into your database:

mysql> INSERT INTO log SET noted = '2007-07-00';
Query OK, 1 row affected (0.00 sec)

To prevent that there's yet another SQL mode, NO_ZERO_IN_DATE! No, really:

mysql> SET sql_mode = 'STRICT_ALL_TABLES,NO_ZERO_DATE,NO_ZERO_IN_DATE';
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO log SET noted = '2007-07-00';
ERROR 1292 (22007): Incorrect date value: '2007-07-00' for column 'noted' at row 1

SQL Modes

All these modes are getting a bit tedious. Fortunately it isn't necessary to remember them all individually: MySQL kindly provides a single 'shorthand' mode for turning on all these various sorts of strictures. It's called TRADITIONAL, and that's probably what you want in your server config:

[mysqld]
sql-mode = TRADITIONAL

Yup MySQL considers all this checking to be something traditionally people have wanted to do with databases; perhaps they think it's even a little quaint or old-fashioned, given that it all defaults to being off.

Timezones

Back to dates, beware that TIMESTAMP values are returned (compared, manipulated, etc) in the server's local timezone. Often this is convenient, but since no timezone is returned it can pose problems with daylight-saving time.

When the clocks are turned back in the autumn there are two consecutive hours which have the same local time, and you have no way of distinguishing them in the database.

One way of dealing wtih this would be to tell MySQL to operate in UTC.

But after all the above my preferred option is simply not to let MySQL get anywhere near my dates, storing everything as epoch seconds in integer fields, which can be stored and retrieved reasonably sanely and without giving MySQL a chance to fiddle with them it can't even tell they are dates!

The Perl DateTime module makes writing and reading epoch times easy (as well as generally being excellent at dealing with dates in Perl), and if interactively you ever need to peek into a date field you can use the MySQL functions From_UnixTime() and Unix_Timestamp() to convert between epoch times and something readable.

Foreign Keys

MySQL has often been accused at being poor at referential integrity. This requires foreign keys, introduced in MySQL 4.0. They work fine, just so long as you use the correct syntax.

Just as MySQL by default really tries hard to insert data it's given, even if it has to tweak some values, it wants to be accommodating of valid SQL to such an extent that if passed something it recognizes as being an SQL construct it doesn't support, it will quietly ignore it and just get on with processing the rest of the statement. So MySQL has a background of happily ignoring foreign key references it can't do anything with.

With InnoDB though it can do something with them, and indeed this does what you want:

mysql> CREATE TABLE supplier
       (
         id INTEGER AUTO_INCREMENT PRIMARY KEY,
         name TINYTEXT NOT NULL
       ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE product
       (
         id INTEGER AUTO_INCREMENT PRIMARY KEY,
         supplier_id INTEGER NOT NULL,
         name TINYTEXT NOT NULL,
         FOREIGN KEY (supplier_id) REFERENCES supplier(id)
       ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.01 sec)

This alternative syntax for specifying the foreign key is also valid SQL, and possibly a little more convenient:

mysql> CREATE TABLE product
       (
         id INTEGER AUTO_INCREMENT PRIMARY KEY,
         supplier_id INTEGER NOT NULL REFERENCES supplier(id),
         name TINYTEXT NOT NULL
       ) ENGINE=InnoDB;
Query OK, 0 rows affected (0.01 sec)

There's only one problem with it: it doesn't do anything. For reasons that currently remain mysterious, MySQL continues to treat the above REFERENCES as something which it knows about but doesn't do anything with despite this being a situation in which it can do something with it! And note the lack of a warning in response to it: it doesn't even admit to having ignored a crucial part of what you typed in.

Unfortunately there doesn't appear to be a way round this. The best you can hope for is finding the situation so horrendous that it embeds itself deep in your psyche such that you'll always remember its existence and to check very carefully whenever adding foreign keys.

MySQL Features and Table Types

To get foreign key support you need to use InnoDB tables. Unfortunately that prevents you from using some of MySQL's widely touted features: full-text search only works with MyIsam tables, and clustering uses its own table type, NDB.

Or to put it the other way round, if your system requires either full-text searching or clustering (and you can only pick one) then you can't have enforced referential integrity though a future release may improve the situation.

Something else MySQL has a reputation for is its speed, though again that's usually using MyIsam tables, and therefore without foreign keys.

Full-Text Searching

The full-text searching is quite nice, the only tricky bit being coaxing MySQL into actually indexing all of your text. The first issue is that by default it only indexes words of 4 or more letters which can be irritating when you're trying to search your wiki for ``FTP'' (and of course when you search for a word too short to have been indexed it merely returns no results; there's no error).

The minimum word length can be reduced in my.cnf:

[mysqld]
ft_min_word_len=3
[myisamchk]
ft_min_word_len=3

If you've already indexed some content before lowering this value then 'repairing' the relevant tables will activate the change:

mysql> REPAIR TABLE archive QUICK;

The other part of MySQL's default set-up which clobbers your indexing attempts is its extensive list of stop-words. It often makes sense not to index occurrences of really common words with little semantic value, such as ``the'', ``of'', and ``and''; but it's far from clear that you'd want to avoid words such as ``immediate'', ``appropriate'', ``sensible'', and ``unfortunately'' from bothering your index, yet they are among dozens of default stop-words.

(For some reason the stop-words are displayed in the manual with lines round them, making them look like those fridge magnets which can be used for spelling out phrases. Perhaps we should have a competition for creating the longest valid paragraph entirely out of MySQL stop-words?)

You can remedy the situation by creating a file containing only the stop-words you want and setting ft_stopword_file to its path.

Alternatives to MySQL

If you don't have a particular need to use MySQL, then you could consider avoiding the quirks highlighted by picking a different DMBS.

Postgres seems to get referential integrity right and is suitable for many situations where you want a ‘proper’ database; it's a plausible alternative to MySQL if you'd only be using MySQL InnoDB tables. SQLite is good for the niche of wanting something small and low-hassle.

Of course other software has other issues, and it may be that on balance MySQL is the best DBMS to use for your situation.

A common reason for this is because you're already using MySQL, and it'd be awkward to switch. I'd be interested to see any case studies of non-trivial live systems which have switched from MySQL to Postgres how much effort it was, and whether it was felt to be worth it.

Conclusion

MySQL has a few surprises. If you're going to use it then it's better to be aware of them. How much they affect you depends on your circumstances, but in many cases merely knowing about the potential issues is a big help in avoiding being hurt by them. MySQL is widely deployed and many people use it successfully despite all the above.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More | Login | Reply
Loading... please wait.
  • It is pretty interesting to see all those quirks together like that. Particularly the date handling.

    One of the things I actually like about MySQL is the idea of having different Storage Engines tuned for different purposes. I've often thought it would be interesting to some how use Postgres as a MySQL Storage Engine. I think Falcon will be close to that goal. I am looking forward to seeing how 6.0 turns out.
  • Excellent article, thank you!

    With regard to MySQL ignoring the REFERENCES field, I found it preferable to write my schema using REFERENCES and then convert it with SQL::Translator [cpan.org] to the FOREIGN KEY style like so:

    sqlt --from MySQL --to MySQL schema.sql > schema.mysql
  • A little dated, but the MySQL gotchas [sql-info.de] cover similar ground.
  • Saying MySQL 5 has only been available for 2 years is a bit questionable. It's been out in production for 2.5 and in (widely-used) beta for 3. It's standard on any modern Linux distro at this point.

    You seem to be implying that InnoDB tables are slower than MyISAM. For my workload it's been quite the opposite: InnoDB tables have been significantly faster.

    The right way to do ENUMs with enforcement is to use a lookup table with foreign keys, just like you would in many other databases. You made a really