Smylers's Journal http://use.perl.org/~Smylers/journal/ Smylers's use Perl Journal en-us use Perl; is Copyright 1998-2006, Chris Nandor. Stories, comments, journals, and other submissions posted on use Perl; are Copyright their respective owners. 2012-01-25T02:29:17+00:00 pudge pudge@perl.org Technology hourly 1 1970-01-01T00:00+00:00 Smylers's Journal http://use.perl.org/images/topics/useperl.gif http://use.perl.org/~Smylers/journal/ Talking at Conferences: A Beginners' Guide http://use.perl.org/~Smylers/journal/34284?from=rss <p> <em>I've just talked about this at Yapc in Vienna. This paper is still a draft, and will be changed a little over the next few days.</em> </p><p>You've been to a Perl conference and seen other people give presentations, and you think you might like to have a go next time. Or you're not sure, you're wondering whether you could do it; perhaps you've briefly entertained the thought, but you're not sure you've got what it takes to become a speaker, sharing the bill with the likes of Larry, Damian, and MJD. And anyway, what would you speak about?</p><p> <strong>Getting Accepted </strong> </p><p>For this conference the organizers published the algorithm they used to work out which talks to accept and reject. They particularly wanted to encourage first-time speakers, and accepted at least one talk from <em>everybody</em> who submitted one; the only rejected talks were from those who made multiple proposals.</p><p>Other conferences have other organizers, who may use other criteria, but this gives an idea of how keen the Perl community are to welcome newcomers. Don't be concerned you aren't already well-known; if you've got an idea for a talk, submit it anyway.</p><p> <strong>What to Talk About? </strong> </p><p>If you've recently created some amazing software that you're eager to evangelize then that's an obvious thing to talk about. But what if you haven't? That's OK -- there's lots of good stuff you can talk about that's produced by others who aren't at the conference, or don't have time to create presentations about it, or are so close to the project that they can't relate to beginners, or<nobr> <wbr></nobr>...</p><p>Think about what software you've been using recently, and more generally what you've been doing. Good topics for talks include:</p><dl> <dt> <strong>Introducing some software </strong></dt> <dd> <p>Tell others about some great Perl software you've recently been using.</p></dd><dt> <strong>Comparisons </strong></dt> <dd> <p>In many areas there's a choice of several ways of achieving something with Perl. For example, what's the best way of validating parameters? Of the many ways of sending mail using Perl, which ones suit which circumstances? How do Apache's <code>mod_perl</code> and FastCGI compare? Many people are only familiar with one way of doing something, even the creators of these systems; if you do the research and present a useful comparison then you become the <em>de facto</em> expert in the field.</p></dd><dt> <strong>Productivity tips </strong></dt> <dd> <p>Everybody likes to learn something which makes their lives easier, so talks on getting the best out of commonly used software are popular, and gratefully received. The first time I tried this I felt a little fraudulent, giving Bash tips at a Linux conference; I was surrounded by all these clever kernel hackers, my talk was on about software I hadn't contributed to and had no claim to being an expert on beyond reading the publicly available manpage. But it turns out that, despite using it on a daily basis, not many kernel hackers have read and digested the Bash manpage, and the talk went down well.</p></dd></dl><p>Mostly it's just a case of finding a subject on which you know a little more than most others at the conference. At a Perl conference people are already going to know a fair bit about Perl, so one option for finding something new to share with them is to <em>talk about something other than Perl</em>.</p><p>There's lots of related technology used by Perl people, such as version control systems. SVK is written in Perl, so is obviously on-topic; but other systems such as Git, Mercurial, Bazaar, Monotone are also of interest to Perl programmers, and a talk on one would be handy for those considering using it. Obviously that isn't everybody, but then that's going to be the case for any talk -- that's why a conference has multiple simultaneous talks.</p><p>Databases are also used a lot in Perl programs, so there's interest in talks on MySQL, Postgres, Microsoft Sql Server, SQLite, and so on -- or some combination, such as experiences in converting a system from MySQL to Postgres.</p><p>Perl is often used in creating web interfaces, which suggests topics such as Ajax, web accessibility, CSS, and HTML5.</p><p>On a productivity theme could be tips for using Emacs or Vim well with Perl, or your favourite Windows editor/IDE. Or people who mostly do their Perl hacking on Unix systems might be interested in what to install on Windows to give a comfortable Perl-hacking experience, and what software that Unix users know has Windows ports available.</p><p>You could talk about another programming language. There are several reasons why another language could be of interest:</p><ul> <li> alternatives to Perl, such as Python, Ruby, PHP, and C# </li><li> used in parts of projects which also use Perl, such as JavaScript, Bash, or Windows PowerShell </li><li> demonstration of other paradigms, such as Lisp, Haskell, or Prolog </li></ul><p>Many people at a Perl conference develop software for a living, and there's a bunch of non-software things which are likely to be relevant to their lives, such as extreme programming, team working, time management, project management, line management, recruitment, interviewing, persuading management to use Perl, and writing good technical English. Any of those could be the source of a good talk.</p><p>Wouldn't the above risk making this into a generic technical conference, rather than specifically a Perl one? Well obviously it would if <em>none</em> of the talks were about Perl; I'm throwing out the above as suggestions of things to have as well as unambiguously Perl talks. Though I do think it would be interesting for a Perl conference to embrace this notion and have one of its three streams explicitly devoted to talks on things which aren't Perl but are of interest to Perl people.</p><p>Also note that there's a big difference between a talk that's an introduction to Python and one that's an introduction to Python specifically for Perl programmers.</p><p> <strong>Submitting a Proposal </strong> </p><p>One you have a topic you need to submit a proposal. Don't fret about this too much: there isn't some secret formal language that you have to use when writing an abstract. Just imagine chatting a colleague in the pub at lunchtime who asks you what your talk is about, and write down whatever answer you'd give. (Or of course instead of imagining this you could use an actual colleague in an actual pub.)</p><p>Do note the deadline though. Conferences can receive so many on-time proposals that they don't consider those that arrive late.</p><p> <strong>Talk Duration </strong> </p><p>You'll need to pick a duration for your talk. 20 minutes is almost always the right answer, and definitely for any 'introduction to' talks: if a topic can't be given a beginner-friendly overview in 20 minutes then it's too complicated.</p><p>When you looked at this conference's programme there were probably a few talks you immediately knew you wanted to see, then for the rest of the time you looked around and picked things you thought might be interesting. My guess is that most of those were 20-minute talks; you're much more likely to chance 20 minutes on a topic you aren't sure about than risk the possibility of being stuck for 40 minutes or longer in a talk that turns out to be of no interest to you. So if you keep your talk short, you're likely to get more people checking it out.</p><p> <strong>Writing </strong> </p><p>When it comes to writing your talk <em>first write a paper</em>.</p><p>That may seem like odd advice, a rather roundabout way of getting a presentation created, but there are at least two good reasons for starting with a paper rather than slides:</p><ul> <li> <p>It's something you already have experience of, so you know how to do it, and are better at it. If you don't think you have experience of writing a paper then instead pretend it's an article, a blog entry, a wiki page, an e-mail to a colleague, or whatever it is you <em>do</em> have experience of.</p></li><li> <p>It's easier to concentrate on the content, without being distracted by the form. Slides make you divide your thoughts into a sequence of chunks, and once you've put content on a slide it can seem like too much hassle to re-arrange it to elsewhere. With a text document it's trivial to keep making changes as you write, to cut-and-paste a few paragraphs to somewhere else, and it doesn't really matter how long your paper ends up.</p></li></ul><p>Once you have your paper, you can turn it into a talk. Or more accurately, you can create a talk <em>about it</em>.</p><p> <strong>Considering Your Audience </strong> </p><p>As I mentioned above, in any conference there are only going to be a few 'must-see' talks for an attendee. There will be some people who are really keen to see your talk, but equally there'll be a significant number in your audience who are mainly there because they're left over from the previous talk, they're waiting for the next one, or because the talk in the other room right now is on XML.</p><p>For the people who've come to see your goal is simple: <em>make them want to read your paper</em>. A talk, especially a short one, is not a good way transferring detailed technical information to the audience, and especially not syntax. After your talk people are not going to remember the exact regular expression that you recommended, the particular command-line options needed to do something, or that nifty Vim keystroke; and after lunch they're going to be struggling even with simple things like the name of the module you were talking about or the URL of the website you said had more information.</p><p>But that's OK: all those details are in your paper. People who are inspired by your talk and want to do something that you mentioned can look at your paper and find out what they need to know.</p><p>So, if the paper contains all the relevant information, why bother with the talk at all? The talk's purpose is to inspire and enthuse people on your chosen topic, to dazzle your audience so much with what they see in your presentation that they are keen to try it out for themselves. Many of your audience won't be aware of your paper's existence, and of those who do it won't have occured to most to read it; your talk is your chance to bring it to their attention.</p><p>You can demonstrate what some software does, show how easy it is to use, or how magnificent its output is. By all means include code samples and command-lines and the like -- just remember that their purpose is show the audience that they exist, to plant the seed of what is possible, and that it isn't desirable for you to dwell on them, explaining every last punctuation character.</p><p>Hopefully the people who've come specifically to see your talk will be interested in following it up, but maybe some of the others present, randomly passing time, will also take an interest. That's more likely to happen if your talk is engaging, interesting, or even entertaining. For some people your talk simply won't be relevant, but they'll still appreciate it if you entertained them while they sat there waiting for the speaker after you.</p><p>As long as the hardcore technical information is available in your paper for anybody who wants them, I'd always go for entertainment over detail in your talk. Again think of the talks you've enjoyed seeing at the conference; they probably include some by speakers who were fun to experience even though what they were talking about has no interest to you.</p><p> <strong>Live Demos </strong> </p><p>A live demo is an excellent way of getting across what a piece of software does -- people are much more impressed seeing something for themselves rather than just hearing about it. Unfortunately it's also an excellent way of catching the speaker unawares and ruining what would otherwise be a perfectly good presentation.</p><p>Some people recommend never to risk a truly live demo, and instead to use a pre-recorded demo, a video created in advance with screen-capturing software.</p><p>The main problem with live demos is the difficulty of driving the computer and talking to the audience, at the same time. This gets worse if there's a minor problem with the demo, as sorting that out takes over your available brain cells and the audience are left witnessing you stammering a few incoherent words between frantic keypresses.</p><p>So an alternative to screen captures is to get an assistant who's responsible for driving the computer. You only have to concentrate on the audience, and your assistant makes things appear on the screen at the right time. If something goes wrong then you can continue to give the audience your undivided attention while assistant fiddles with the computer.</p><p>And even if your talk doesn't feature a live demo, it's useful to have a 'keyboard monkey' merely for controlling the slides; it frees up your hands, and means you can wander around the stage or move closer to the audience.</p><p> <strong>Slides </strong> </p><p>Talking of slides, what should you put on them? Firstly let's dismiss a couple of bad reasons for putting something on a slide:</p><dl> <dt> <strong>For the audience to refer to later </strong></dt> <dd> <p>You've already written a paper. Anything the audience (or even, if the buzz about your talk travels far enough, people not in the audience, possibly not even that this conference) want to look up later is in your paper, so there's no need to clutter the slides used during your talk with anything which doesn't directly help the talk at that moment.</p><p>And when following along at home it's much easier to read paragraphs of text in a paper than to run through a slideshow.</p></dd><dt> <strong>To remind you what to say </strong></dt> <dd> <p>Some conference talks merely consist of reading bullet points out loud. Those are rarely the good talks. A reminder of what you want to say is good, but there's no need for the whole room to see it. Some laptops can display different things to the speaker from what is being projected, or you can use paper notes. My preference is for a print-out of my slides, annotated with reminders; if you've got somebody else driving the computer then clutching some notes isn't a problem.</p><p>One of the worst presentations I've seen had bullet points being complete sentences, each appearing one-at-a-time on slides but with not-yet-reached bullet points displayed in pale grey (on the white background), presumably to remind the presenter what was coming up. But even the greyed-out bullet points were readable from the audience, so on each slide change the audience would quickly scan the contents then sit there waiting for the speaker to read them all out loud.</p><p>Then one slide was a full-screen diagram. The speaker ad-libbed, drawing the audience's attention to various points of interest in the diagram; this was much more engaging, and showed what the speaker was capable of. Unfortunately the following slide was bullet point sentences of commentary on the diagram; the presenter went through each one, in most cases reading the out the full bullet point and then adding ``as I just said''. Somehow he'd been trapped by the bullet points, under some compulsion to read each one even though he knew he'd already conveyed the relevant material. Don't do this!</p></dd></dl><p>So what should you put on you slides then? As a first approximation, <em>nothing</em>.</p><p>Imagine that instead of presenting this at a conference you're explaining the same material to a colleague. You would be able to do this without your colleague seeing large bullet points appearing on the wall behind you. Guess what? You don't need the bullet points at a conference either. 'Death by PowerPoint' is a recognized phenomena, and definitely something you want to avoid.</p><p>Slides are good for some things though, the kind of things which when you're trying to explain them to a colleague would cause you to reach for a notepad or a web-browser:</p><dl> <dt> <strong>diagrams &amp; graphs </strong></dt> <dd> both of which can convey things much more clearly than mere words </dd><dt> <strong>lists </strong></dt> <dd> so the audience can glance through the items without you having to read them all out </dd><dt> <strong>code samples, URLs, regular expressions, etc </strong></dt> <dd> things which are tedious to spell out </dd></dl><p>Slides can also be used for emphasis, to re-iterate particularly important points -- but this has to be done sparingly for it to be effective.</p><p>Doing the above may mean that there are sections of your talk without any slides. Don't worry about it: if you've decided that the best way of conveying information about something is just to talk about it then adding in some unnecessary bullet points is only going to make it worse.</p><p>Most conferences give all attendees papers, on either paper or a CD. If yours doesn't, then make sure that your talk contains memorable instructions of where the audience can find your paper.</p><p> <strong>Content Order </strong> </p><p>Nearly everything can be improved given a little more time to work on them, but given that most speakers are preparing talks in their own time it seems a little harsh to criticize a presentation you've seen. But there is one metric which can be applied to a talk safe in the knowledge that following it would have taken exactly the same preparation time: would the talk have been better had exactly the same content been presented <em>but in the reverse order</em>.</p><p>That sounds ludicrous, but since thinking of it I've been surprised by how many talks that applies to. It's natural to think of presenting material in the order in which you first encountered it, or in the order which people will need to use it. But those often aren't the best order for a live talk:</p><ul> <li> <p>A talk on using a graphical user interface toolkit showed all the coding first with the live demo at the very end. So he was trying to explain what the effect of various lines of code would be without us having seen any output at all.</p></li><li> <p>A beginner's testing talk covered in detail commands for running tests, with the speaker trying to explain the effect various bits of code would have on the output to an audience who hadn't yet seen a test run or any Tap.</p></li><li> <p>An introduction to some label-generating software started with instructions on how to download and compile it; until the audience have seen what the program does they aren't going to be interested in running it (and anyway this is exactly the sort of detail which can be left to the paper and not mentioned in the talk at all). The talk then proceeded through configuring and running the program, all the time the speaker alluding to what he was trying to generate with it. Finally the last slide showed the finished product, a nice sheet of produce labels. Far better to have those (or, even better, visual aids of actual jars of rhubarb conserve and tomato chutney sporting the labels) as the starting point, so the audience can see what the point of the talk is, and those whose interest has been piqued can follow the details of how the speaker gets there.</p></li></ul><p>If at any point you find yourself struggling to explain something, trying to convince the audience that they'll see what your talking about shortly, then that's a big clue that the order of your content needs swapping over. Remember that a live presentation, unlike a book or webpage, is strictly linear; people can't jump straight to particular sections.</p><p> <strong>Practising </strong> </p><p>Unsurprisingly talks benefit from being practised. If you're speaking at a conference then try to find a local Perl Monger's or similar group to present your talk at first; most are very grateful for any offers of talks.</p><p> <strong>Setting Up </strong> </p><p>You'd've thought that by now technology would have reached a state where any laptop works with any projector. Unfortunately that appears not to be the case, and every conference features talks delayed at the start as flummoxed speakers graple with laptops and cables.</p><p>To avoid this happening to you, scope out the room well in advance of your talk, the day before if possible. Sneak in during a quiet period and try connecting up your computer and make sure everything works.</p><p>On the day turn up to the room during the break that precedes the block of talks you're in, and get set up (though obviously you'll have to disconnect from the projector again if you aren't the first talk in your block).</p><p> <strong>Timing </strong> </p><p>One of the hardest things to get right in a talk is the timing. There's no magic fix for this, but practice runs definitely help -- and user groups can usually be more flexible than conferences about timing, so it doesn't matter as much if a talk presented there is longer or shorter than you were hoping for.</p><p>At a conference overrunning is much worse than underrunning, and can be very rude. Here are some overrunning sins I've seen committed at conferences:</p><ul> <li> <p>A speaker checks with the following speaker in the room that overrunning is acceptable, delaying the start of the next talk; this ignores the audience members who were planning on changing streams after this talk to one in another room, which isn't being delayed.</p></li><li> <p>A speaker sees there's a five-minute gap between talks so he can continue into that; this doesn't leave the following speaker any time to set up his equipment, and can result in him being very flustered as he's trying to start his talk.</p></li><li> <p>A speaker notes that his talk is followed by the lunchbreak, so concludes it's OK to run into it; this distresses audience members with plans to meet others for lunch, and who don't know where to go by themselves in this foreign country</p></li></ul><p>In short, don't overrun.</p><p>Many conferences provide a 'session chair' in each room, who will give keep track of time and signal to you when you have 5 minutes remaining. If this catches you unaware, say you're only halfway through your material, then do not do what many people seem to: attempt to whizz at high speed through all the rest (yes, all of it); the rushing bamboozles the audience, and you end up not finishing anyway, so they never get to hear your conclusions. Instead, remember that all the information is in your paper (so it doesn't matter that you say everything); point this out to the audience, try to decide what is the most important point in your remaining material, cover that, and then deliver your conclusion as planned.</p><p> <strong>Ending </strong> </p><p>There appear to be exactly two things you can say to end a talk: either "Any questions?" or "Thank you".</p><p>Jumping straight to asking for questions can initiall seem OK, but after a few questions it can become unclear whether the talk has finished yet; the next speaker is getting jumpy because your final slide is still on the screen and he wants to get plugged in; there are people in the audience who want to dash off to a talk in another room but who feel uneasy about leaving before clapping your talk; and there are people arriving from the other room who are chatting on the way in, thinking that you're just engaged in post-presentation chatting with a few individuals. It can all get quite messy and awkward.</p><p>The better way is to end simply with ``Thank you'', and a pause. The audience will applaud at this point, and you can disconnect your computer. Then you can seek questions while the next speaker is setting up and the audience are shuffling between rooms.</p><p> <strong>Further Reading </strong> </p><dl> <dt> <strong>'Conference Presentation Judo' -- Mark Jason Dominus </strong></dt> <dd> <p>While this talk is specifically about giving 3-hour conference tutorials, much of what MJD says is also useful for 20-minute presentations, and you get to find out why he talks with his mouth full. <a href="http://perl.plover.com/yak/presentation/">http://perl.plover.com/yak/presentation/</a> </p></dd><dt> <strong>'Presentation Aikido' -- Damian Conway </strong></dt> <dd> <p>This talk isn't on the web (and I haven't seen it), but Google finds several people who have seen it writing about it and offering useful nuggets.</p></dd><dt> <strong>'The Cognitive Style of PowerPoint' -- Edward Tufte </strong></dt> <dd> <p>Tufte's short book analyses how squashing thoughts into PowerPoint layouts harms what we're trying to say. <a href="http://www.edwardtufte.com/tufte/powerpoint">http://www.edwardtufte.com/tufte/powerpoint</a> </p></dd><dt> <strong>Presentation Zen </strong></dt> <dd> <p>A blog with tips on giving successful presentations. <a href="http://www.presentationzen.com/">http://www.presentationzen.com/</a> </p></dd></dl> Smylers 2007-08-30T14:03:40+00:00 journal When MySQL Bites: Quirks to Watch Out For http://use.perl.org/~Smylers/journal/34246?from=rss <p> <em>This is the paper that accompanies the talk I gave at Yapc::EU in Vienna this morning. It's a draft there are a few things I plan to tidy up shortly, and some more references to be added.</em> </p><p>If you're using MySQL, or considering using it, here are some things you probably should be aware of.</p><p> <strong>Dealing with Invalid Data </strong> </p><p>MySQL lets you specify restict what can be stored in fields, for example the number of characters in a string. If you try to insert data that doesn't meet the restrictions then MySQL tries really hard to insert <em>someting</em> anyway -- such as by truncating the string:</p><blockquote><div><p> <tt>mysql&gt; CREATE TABLE creature (name VARCHAR(3) NOT NULL);<br>Query OK, 0 rows affected (0.25 sec)<br> <br>mysql&gt; INSERT INTO creature SET name = 'caterpillar';<br>Query OK, 1 row affected, 1 warning (0.02 sec)<br> <br>mysql&gt; SELECT * FROM creature;<br>+------+<br>| name |<br>+------+<br>| cat&nbsp; |<br>+------+</tt></p></div> </blockquote><p>Though you can often spot when text has been truncated; noticing numbers changing can be trickier:</p><blockquote><div><p> <tt>mysql&gt; CREATE TABLE exam (mark TINYINT(2) NOT NULL);<br>Query OK, 0 rows affected (0.01 sec)<br> <br>mysql&gt; INSERT INTO exam SET mark = 172;<br>Query OK, 1 row affected, 1 warning (0.00 sec)<br> <br>mysql&gt; SELECT * FROM exam;<br>+------+<br>| mark |<br>+------+<br>|&nbsp; 127 |<br>+------+<br>1 row in set (0.00 sec)</tt></p></div> </blockquote><p>At least with strings and numbers those cases the value that gets stored is plausible for the field type, if wrong. With an <code>ENUM</code> field it can be surprising to find the empty string being stored there at all:</p><blockquote><div><p> <tt>mysql&gt; CREATE TABLE game (shape ENUM('scissors', 'paper', 'stone') NOT NULL);<br>Query OK, 0 rows affected (0.01 sec)<br> <br>mysql&gt; INSERT INTO game SET shape = 'rock';<br>Query OK, 1 row affected, 1 warning (0.01 sec)<br> <br>mysql&gt; SELECT * FROM game;<br>+-------+<br>| shape |<br>+-------+<br>|&nbsp; &nbsp; &nbsp; &nbsp;|<br>+-------+<br>1 row in set (0.00 sec)</tt></p></div> </blockquote><p>If you're using the interactive MySQL client then you do get notified when data has been tweaked on insertion like this: the warning count. Use <code>SHOW WARNINGS</code> to see what it's done:</p><blockquote><div><p> <tt>mysql&gt; INSERT INTO game SET shape = 'rock';<br>Query OK, 1 row affected, 1 warning (0.01 sec)<br> <br>mysql&gt; SHOW WARNINGS;<br>+---------+------+--------------------------------------------+<br>| Level&nbsp; &nbsp;| Code | Message&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; |<br>+---------+------+--------------------------------------------+<br>| Warning | 1265 | Data truncated for column 'shape' at row 1 |<br>+---------+------+--------------------------------------------+<br>1 row in set (0.00 sec)</tt></p></div> </blockquote><p>Putting this into <i>~/.my.cnf</i>:</p><blockquote><div><p> <tt>[client]<br>show-warnings</tt></p></div> </blockquote><p>makes it less tedious to see the warnings, and more obvious when you have some:</p><blockquote><div><p> <tt>mysql&gt; INSERT INTO game SET shape = 'rock';<br>Query OK, 1 row affected, 1 warning (0.02 sec)<br> <br>Warning (Code 1265): Data truncated for column 'shape' at row 1</tt></p></div> </blockquote><p>But you'd have much less chance of discovering corrupt data in your database if MySQL wouldn't let it be inserted in the first place. You can achieve this by putting the server into <strong>strict mode</strong>: add this to the server's config file (<em>/etc/mysql/my.cnf</em> or similar):</p><blockquote><div><p> <tt>[mysqld]<br>sql-mode = STRICT_ALL_TABLES</tt></p></div> </blockquote><p>and restart the server (merely reloading it isn't sufficient), with something like:</p><blockquote><div><p> <tt>$ sudo<nobr> <wbr></nobr>/etc/init.d/mysql restart</tt></p></div> </blockquote><p>Then it refuses to insert any invalid data:</p><blockquote><div><p> <tt>mysql&gt; INSERT INTO game SET shape = 'rock';<br>ERROR 1265 (01000): Data truncated for column 'shape' at row 1</tt></p></div> </blockquote><p>That <em>should</em> stop you inadvertently inserting any bad data. It doesn't however prevent anybody from sticking some bad data in there if they really want to. Firstly, the SQL modes are per-connection, so somebody could simply turn off strict mode:</p><blockquote><div><p> <tt>mysql&gt; INSERT INTO game SET shape = 'rock';<br>ERROR 1265 (01000): Data truncated for column 'shape' at row 1<br> <br>mysql&gt; SET sql_mode = '';<br>Query OK, 0 rows affected (0.00 sec)<br> <br>mysql&gt; INSERT INTO game SET shape = 'rock';<br>Query OK, 1 row affected, 1 warning (0.00 sec)</tt></p></div> </blockquote><p>And even with strict mode on MySQL helpfully provides the <code>IGNORE</code> keyword for circumventing it:</p><blockquote><div><p> <tt>mysql&gt; INSERT INTO game SET shape = 'rock';<br>ERROR 1265 (01000): Data truncated for column 'shape' at row 1<br> <br>mysql&gt; INSERT IGNORE INTO game SET shape = 'rock';<br>Query OK, 1 row affected, 1 warning (0.05 sec)</tt></p></div> </blockquote><p>So far as I can tell it's impossible to guarantee that an <code>ENUM</code> field can't contain the empty string. But it's pretty hard to code something which does that accidentally, which is probably good enough.</p><p> <strong>MySQL Server Upgrades </strong> </p><p>Except that strict mode requires MySQL 5.0, which is less than 2 years old and so isn't being used yet in many production systems.</p><p>For software 2 years might seem quite old; certainly many people try to keep their systems more up to date than that, especially desktop applications such as web browsers or even workstation operating systems. So it's easy to be unsympathetic to somebody who hasn't upgraded.</p><p>But production database servers are things you want to be the most conservative about upgrading; they are running continuously and are critical to many functions, so stability is important. Upgrades need serious planning and aren't to be undertaken on a whim; if the database is currently running fine then upgrading it merely to get new features may not be a priority, or something your sys-admins want to do often.</p><p>As such, a cycle of only upgrading your MySQL servers every couple of years sounds quite reasonable.</p><p>It's also reasonable to wish to stick with the stable version of MySQL provided in your operating system; this introduces a lag after a new MySQL release before there's an OS release with it. And you probably don't want to risk putting a new OS version on to your production DB servers the day it's released, instead waiting for user reaction to it. Then you might want to do some testing, trying out the upgrade on a staging server and running it there for a while before committing it to the live one.</p><p>Together these factors can mean that somebody who's been making entirely reasonable choices is running a MySQL version that's 3 year's old.</p><p>For example, consider somebody currently running version 4.1 and who doesn't want to upgrade more often than once every 2 years. Should she upgrade now? That would give her version 5.0, but means she'd still be running 5.0 in 2 years' time. Or should she hold on for version 5.1 to be released meaning she continues with 4.1 currently?</p><p> <strong>Lusting After the Next MySQL Version </strong> </p><p>What makes this particularly frustrating with MySQL is how many issues with it can be resolved by using version 'n + 1', one higher than you're currently using and what's particularly disturbing is that this seems to be perpetually true, no matter how many years pass or how many upgrades you do.</p><p>In the days of version 3.22 we were keen to upgrade to 3.23, yet when we got there we found ourselves frustrated by the lack of things like foreign keys and subselects. 4.0 brought foreign keys, but subselects were put back to 4.1, still leaving us wanting. Then once we had subselects we found ourselves continually hitting up against problems that could only be solved by strict mode, or a view, or a trigger, and so on.</p><p>Often these things conspire in groups: you can think of a workaround for an 'n + 1' feature, but only by using another feature that's also 'n + 1'! For example, the above problem with invalid <code>ENUM</code>s inserting empty strings is solved by strict mode, which requires 5.0; you could get avoid it with a trigger, but that's also 5.0; and then you realize that <a href="http://use.perl.org/comments.pl?sid=34758&amp;cid=53488">you're only using an <code>ENUM</code> in the first place because you don't have views</a>, which are also 5.0:</p><p>Initially we thought we just wanted one upgrade; the 'new' version of MySQL we'd read about sounded like it had finally become a 'proper' DBMS and would remedy the major frustrations we'd found with it. Tantalizingly, many years later that's still the case.</p><p> <strong>MySQL Warnings in Perl </strong> </p><p>If you aren't running a recent enough version of MySQL to have strict mode, you can at least get the same warnings in Perl that you do interactively. Consider the query:</p><blockquote><div><p> <tt>$db-&gt;do(q[INSERT INTO game SET shape = 'rock']);</tt></p></div> </blockquote><p>You can see if there are any warnings with the <code>{mysql_warning_count}</code> attribute on a statement handle, and then do <code>SHOW WARNINGS</code> to get them, perhaps turning the above into:</p><blockquote><div><p> <tt>use Lingua::EN::Inflect qw&amp;lt;inflect&gt;;<br> <br>my $insert = $db-&gt;prepare(q[INSERT INTO game SET shape = 'rock']);<br>$insert-&gt;execute;<br>if ($insert-&gt;{mysql_warning_count})<br>{<br>&nbsp; warn inflect "NUM($insert-&gt;{mysql_warning_count}) PL_N(warning):\n";<br>&nbsp; my $warning_query = $db-&gt;prepare(q[SHOW WARNINGS]);<br>&nbsp; $warning_query-&gt;execute;<br>&nbsp; while (my $warning = $warning_query-&gt;fetchrow_hashref)<br>&nbsp; {<br>&nbsp; &nbsp; warn "$warning-&gt;{Message}\n";<br>&nbsp; }<br>}</tt></p></div> </blockquote><p>which yields this output:</p><blockquote><div><p> <tt>1 warning:<br>Data truncated for column 'shape' at row 1</tt></p></div> </blockquote><p>You'd probably want to abstract the warning-checking into a routine used by all your queries. Rather than just printing the warnings it could abort the program or rollback the current transaction, emulating the behaviour you get with strict mode.</p><p>Something like the above could also be useful even if you do have strict mode, for ensuring that other MySQL warnings aren't ignored.</p><p> <strong>Dates </strong> </p><p>Strict mode dissuades MySQL from treating nonsense as dates:</p><blockquote><div><p> <tt>mysql&gt; CREATE TABLE log (noted DATE NOT NULL);<br>Query OK, 0 rows affected (0.01 sec)<br> <br>mysql&gt; INSERT INTO log SET noted = 'kapow';<br>ERROR 1292 (22007): Incorrect date value: 'kapow' for column 'noted' at row 1</tt></p></div> </blockquote><p>But it still thinks that a bunch of zeros is a cromulent date, without even so much as a warning:</p><blockquote><div><p> <tt>mysql&gt; INSERT INTO log SET noted = '0000-00-00';<br>Query OK, 1 row affected (0.00 sec)</tt></p></div> </blockquote><p>What's particularly special is that MySQL treats these zeros as being <code>NULL</code>:</p><blockquote><div><p> <tt>mysql&gt; SELECT * FROM log WHERE noted IS NULL;<br>+------------+<br>| noted&nbsp; &nbsp; &nbsp; |<br>+------------+<br>| 0000-00-00 |<br>+------------+<br>1 row in set (0.02 sec)</tt></p></div> </blockquote><p>And remember that that's in a field declared <code>NOT NULL</code>! That zero value is still <code>NOT NULL</code> as well of course:</p><blockquote><div><p> <tt>mysql&gt; SELECT * FROM log WHERE noted IS NOT NULL;<br>+------------+<br>| noted&nbsp; &nbsp; &nbsp; |<br>+------------+<br>| 0000-00-00 |<br>+------------+<br>1 row in set (0.00 sec)</tt></p></div> </blockquote><p>In fact it is both <code>NULL</code> and <code>NOT NULL</code> at the same time:</p><blockquote><div><p> <tt>mysql&gt; SELECT * FROM log WHERE noted IS NULL AND noted IS NOT NULL;<br>+------------+<br>| noted&nbsp; &nbsp; &nbsp; |<br>+------------+<br>| 0000-00-00 |<br>+------------+<br>1 row in set (0.00 sec)</tt></p></div> </blockquote><p>That's probably not a date you want in your database. Even through strict mode permits it there's another SQL mode, <code>NO_ZERO_DATE</code>, which stops it being inserted:</p><blockquote><div><p> <tt>mysql&gt; SET sql_mode = 'STRICT_ALL_TABLES,NO_ZERO_DATE';<br>Query OK, 0 rows affected (0.00 sec)<br> <br>mysql&gt; INSERT INTO log SET noted = '0000-00-00';<br>ERROR 1292 (22007): Incorrect date value: '0000-00-00' for column 'noted' at row 1</tt></p></div> </blockquote><p>But even with that, dates with zeroes in a <em>part</em> of them can still get into your database:</p><blockquote><div><p> <tt>mysql&gt; INSERT INTO log SET noted = '2007-07-00';<br>Query OK, 1 row affected (0.00 sec)</tt></p></div> </blockquote><p>To prevent that there's yet another SQL mode, <code>NO_ZERO_IN_DATE</code>! No, really:</p><blockquote><div><p> <tt>mysql&gt; SET sql_mode = 'STRICT_ALL_TABLES,NO_ZERO_DATE,NO_ZERO_IN_DATE';<br>Query OK, 0 rows affected (0.00 sec)<br> <br>mysql&gt; INSERT INTO log SET noted = '2007-07-00';<br>ERROR 1292 (22007): Incorrect date value: '2007-07-00' for column 'noted' at row 1</tt></p></div> </blockquote><p> <strong>SQL Modes </strong> </p><p>All these modes are getting a bit tedious. Fortunately it isn't necessary to remember them all individually: MySQL kindly provides a single 'shorthand' mode for turning on all these various sorts of strictures. It's called <code>TRADITIONAL</code>, and that's probably what you want in your server config:</p><blockquote><div><p> <tt>[mysqld]<br>sql-mode = TRADITIONAL</tt></p></div> </blockquote><p>Yup MySQL considers all this checking to be something traditionally people have wanted to do with databases; perhaps they think it's even a little quaint or old-fashioned, given that it all defaults to being off.</p><p> <strong>Timezones </strong> </p><p>Back to dates, beware that <code>TIMESTAMP</code> values are returned (compared, manipulated, etc) in the server's local timezone. Often this is convenient, but since no timezone is returned it can pose problems with daylight-saving time.</p><p>When the clocks are turned back in the autumn there are two consecutive hours which have the same local time, and you have no way of distinguishing them in the database.</p><p>One way of dealing wtih this would be to <a href="http://dev.mysql.com/doc/refman/5.1/en/time-zone-support.html">tell MySQL to operate in UTC</a>.</p><p>But after all the above my preferred option is simply not to let MySQL get anywhere near my dates, storing everything as epoch seconds in integer fields, which can be stored and retrieved reasonably sanely and without giving MySQL a chance to fiddle with them it can't even tell they <em>are</em> dates!</p><p>The Perl <em>DateTime</em> module makes writing and reading epoch times easy (as well as generally being excellent at dealing with dates in Perl), and if interactively you ever need to peek into a date field you can use the MySQL functions <code>From_UnixTime()</code> and <code>Unix_Timestamp()</code> to convert between epoch times and something readable.</p><p> <strong>Foreign Keys </strong> </p><p>MySQL has often been accused at being poor at referential integrity. This requires foreign keys, introduced in MySQL 4.0. They work fine, just so long as you use the correct syntax.</p><p>Just as MySQL by default really tries hard to insert data it's given, even if it has to tweak some values, it wants to be accommodating of valid SQL to such an extent that if passed something it recognizes as being an SQL construct it doesn't support, it will quietly ignore it and just get on with processing the rest of the statement. So MySQL has a background of happily ignoring foreign key references it can't do anything with.</p><p>With InnoDB though it can do something with them, and indeed this does what you want:</p><blockquote><div><p> <tt>mysql&gt; CREATE TABLE supplier<br>&nbsp; &nbsp; &nbsp; &nbsp;(<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;id INTEGER AUTO_INCREMENT PRIMARY KEY,<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name TINYTEXT NOT NULL<br>&nbsp; &nbsp; &nbsp; &nbsp;) ENGINE=InnoDB;<br>Query OK, 0 rows affected (0.01 sec)<br> <br>mysql&gt; CREATE TABLE product<br>&nbsp; &nbsp; &nbsp; &nbsp;(<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;id INTEGER AUTO_INCREMENT PRIMARY KEY,<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;supplier_id INTEGER NOT NULL,<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name TINYTEXT NOT NULL,<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;FOREIGN KEY (supplier_id) REFERENCES supplier(id)<br>&nbsp; &nbsp; &nbsp; &nbsp;) ENGINE=InnoDB;<br>Query OK, 0 rows affected (0.01 sec)</tt></p></div> </blockquote><p>This alternative syntax for specifying the foreign key is also valid SQL, and possibly a little more convenient:</p><blockquote><div><p> <tt>mysql&gt; CREATE TABLE product<br>&nbsp; &nbsp; &nbsp; &nbsp;(<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;id INTEGER AUTO_INCREMENT PRIMARY KEY,<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;supplier_id INTEGER NOT NULL REFERENCES supplier(id),<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;name TINYTEXT NOT NULL<br>&nbsp; &nbsp; &nbsp; &nbsp;) ENGINE=InnoDB;<br>Query OK, 0 rows affected (0.01 sec)</tt></p></div> </blockquote><p>There's only one problem with it: it doesn't do anything. For reasons that currently remain mysterious, MySQL continues to treat the above <code>REFERENCES</code> as something which it knows about but doesn't do anything with despite this being a situation in which it can do something with it! And note the lack of a warning in response to it: it doesn't even admit to having ignored a crucial part of what you typed in.</p><p>Unfortunately there doesn't appear to be a way round this. The best you can hope for is finding the situation so horrendous that it embeds itself deep in your psyche such that you'll always remember its existence and to check very carefully whenever adding foreign keys.</p><p> <strong>MySQL Features and Table Types </strong> </p><p>To get foreign key support you need to use InnoDB tables. Unfortunately that prevents you from using some of MySQL's widely touted features: full-text search only works with MyIsam tables, and clustering uses its own table type, NDB.</p><p>Or to put it the other way round, if your system requires either full-text searching or clustering (and you can only pick one) then you can't have enforced referential integrity though a future release may improve the situation.</p><p>Something else MySQL has a reputation for is its speed, though again that's usually using MyIsam tables, and therefore without foreign keys.</p><p> <strong>Full-Text Searching </strong> </p><p>The full-text searching is quite nice, the only tricky bit being coaxing MySQL into actually indexing all of your text. The first issue is that by default it only indexes words of 4 or more letters which can be irritating when you're trying to search your wiki for ``FTP'' (and of course when you search for a word too short to have been indexed it merely returns no results; there's no error).</p><p>The minimum word length can be reduced in <em>my.cnf</em>:</p><blockquote><div><p> <tt>[mysqld]<br>ft_min_word_len=3<br>[myisamchk]<br>ft_min_word_len=3</tt></p></div> </blockquote><p>If you've already indexed some content before lowering this value then 'repairing' the relevant tables will activate the change:</p><blockquote><div><p> <tt>mysql&gt; REPAIR TABLE archive QUICK;</tt></p></div> </blockquote><p>The other part of MySQL's default set-up which clobbers your indexing attempts is its extensive list of stop-words. It often makes sense not to index occurrences of really common words with little semantic value, such as ``the'', ``of'', and ``and''; but it's far from clear that you'd want to avoid words such as ``immediate'', ``appropriate'', ``sensible'', and ``unfortunately'' from bothering your index, yet they are among <a href="http://dev.mysql.com/doc/refman/5.1/en/fulltext-stopwords.html">dozens of default stop-words</a>.</p><p>(For some reason the stop-words are displayed in the manual with lines round them, making them look like those fridge magnets which can be used for spelling out phrases. Perhaps we should have a competition for creating the longest valid paragraph entirely out of MySQL stop-words?)</p><p>You can remedy the situation by creating a file containing only the stop-words you want and <a href="http://dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.html">setting <code>ft_stopword_file</code> to its path</a>.</p><p> <strong>Alternatives to MySQL </strong> </p><p>If you don't have a particular need to use MySQL, then you could consider avoiding the quirks highlighted by picking a different DMBS.</p><p>Postgres seems to get referential integrity right and is suitable for many situations where you want a &#8216;proper&#8217; database; it's a plausible alternative to MySQL if you'd only be using MySQL <code>InnoDB</code> tables. SQLite is good for the niche of wanting something small and low-hassle.</p><p>Of course other software has other issues, and it may be that on balance MySQL is the best DBMS to use for your situation.</p><p>A common reason for this is because you're already using MySQL, and it'd be awkward to switch. I'd be interested to see any case studies of non-trivial live systems which have switched from MySQL to Postgres how much effort it was, and whether it was felt to be worth it.</p><p> <strong>Conclusion </strong> </p><p>MySQL has a few surprises. If you're going to use it then it's better to be aware of them. How much they affect you depends on your circumstances, but in many cases merely knowing about the potential issues is a big help in avoiding being hurt by them. MySQL is widely deployed and many people use it successfully despite all the above.</p> Smylers 2007-08-28T12:19:54+00:00 journal