Smylers's Journal
http://use.perl.org/~Smylers/journal/
Smylers's use Perl Journalen-ususe Perl; is Copyright 1998-2006, Chris Nandor. Stories, comments, journals, and other submissions posted on use Perl; are Copyright their respective owners.2012-01-25T02:29:17+00:00pudgepudge@perl.orgTechnologyhourly11970-01-01T00:00+00:00Smylers's Journalhttp://use.perl.org/images/topics/useperl.gif
http://use.perl.org/~Smylers/journal/
Talking at Conferences: A Beginners' Guide
http://use.perl.org/~Smylers/journal/34284?from=rss
<p> <em>I've just talked about this at Yapc in Vienna. This paper is still a
draft, and will be changed a little over the next few days.</em> </p><p>You've been to a Perl conference and seen other people give presentations, and
you think you might like to have a go next time. Or you're not sure, you're
wondering whether you could do it; perhaps you've briefly entertained the
thought, but you're not sure you've got what it takes to become a speaker,
sharing the bill with the likes of Larry, Damian, and MJD. And anyway, what
would you speak about?</p><p>
<strong>Getting Accepted </strong>
</p><p>For this conference the organizers published the algorithm they used to work
out which talks to accept and reject. They particularly wanted to encourage
first-time speakers, and accepted at least one talk from <em>everybody</em> who
submitted one; the only rejected talks were from those who made multiple
proposals.</p><p>Other conferences have other organizers, who may use other criteria, but this
gives an idea of how keen the Perl community are to welcome newcomers. Don't
be concerned you aren't already well-known; if you've got an idea for a talk,
submit it anyway.</p><p>
<strong>What to Talk About? </strong>
</p><p>If you've recently created some amazing software that you're eager to
evangelize then that's an obvious thing to talk about. But what if you
haven't? That's OK -- there's lots of good stuff you can talk about that's
produced by others who aren't at the conference, or don't have time to create
presentations about it, or are so close to the project that they can't relate
to beginners, or<nobr> <wbr></nobr>...</p><p>Think about what software you've been using recently, and more generally what
you've been doing. Good topics for talks include:</p><dl>
<dt> <strong>Introducing some software </strong></dt>
<dd>
<p>Tell others about some great Perl software you've recently been using.</p></dd><dt> <strong>Comparisons </strong></dt>
<dd>
<p>In many areas there's a choice of several ways of achieving something with
Perl. For example, what's the best way of validating parameters? Of the many
ways of sending mail using Perl, which ones suit which circumstances? How do
Apache's <code>mod_perl</code> and FastCGI compare? Many people are only familiar with
one way of doing something, even the creators of these systems; if you do the
research and present a useful comparison then you become the <em>de facto</em> expert
in the field.</p></dd><dt> <strong>Productivity tips </strong></dt>
<dd>
<p>Everybody likes to learn something which makes their lives easier, so talks on
getting the best out of commonly used software are popular, and gratefully
received. The first time I tried this I felt a little fraudulent, giving Bash
tips at a Linux conference; I was surrounded by all these clever kernel
hackers, my talk was on about software I hadn't contributed to and had no claim
to being an expert on beyond reading the publicly available manpage. But it
turns out that, despite using it on a daily basis, not many kernel hackers have
read and digested the Bash manpage, and the talk went down well.</p></dd></dl><p>Mostly it's just a case of finding a subject on which you know a little more
than most others at the conference. At a Perl conference people are already
going to know a fair bit about Perl, so one option for finding something new to
share with them is to <em>talk about something other than Perl</em>.</p><p>There's lots of related technology used by Perl people, such as version control
systems. SVK is written in Perl, so is obviously on-topic; but other systems
such as Git, Mercurial, Bazaar, Monotone are also of interest to Perl
programmers, and a talk on one would be handy for those considering using it.
Obviously that isn't everybody, but then that's going to be the case for any
talk -- that's why a conference has multiple simultaneous talks.</p><p>Databases are also used a lot in Perl programs, so there's interest in talks on
MySQL, Postgres, Microsoft Sql Server, SQLite, and so on -- or some
combination, such as experiences in converting a system from MySQL to Postgres.</p><p>Perl is often used in creating web interfaces, which suggests topics such as
Ajax, web accessibility, CSS, and HTML5.</p><p>On a productivity theme could be tips for using Emacs or Vim well with Perl, or
your favourite Windows editor/IDE. Or people who mostly do their Perl hacking
on Unix systems might be interested in what to install on Windows to give a
comfortable Perl-hacking experience, and what software that Unix users know
has Windows ports available.</p><p>You could talk about another programming language. There are several reasons
why another language could be of interest:</p><ul>
<li>
alternatives to Perl, such as Python, Ruby, PHP, and C#
</li><li>
used in parts of projects which also use Perl, such as JavaScript, Bash, or
Windows PowerShell
</li><li>
demonstration of other paradigms, such as Lisp, Haskell, or Prolog
</li></ul><p>Many people at a Perl conference develop software for a living, and there's a
bunch of non-software things which are likely to be relevant to their lives,
such as extreme programming, team working, time management, project management,
line management, recruitment, interviewing, persuading management to use Perl,
and writing good technical English. Any of those could be the source of a good
talk.</p><p>Wouldn't the above risk making this into a generic technical conference, rather
than specifically a Perl one? Well obviously it would if <em>none</em> of the talks
were about Perl; I'm throwing out the above as suggestions of things to have as
well as unambiguously Perl talks. Though I do think it would be interesting
for a Perl conference to embrace this notion and have one of its three streams
explicitly devoted to talks on things which aren't Perl but are of interest to
Perl people.</p><p>Also note that there's a big difference between a talk that's an introduction
to Python and one that's an introduction to Python specifically for Perl
programmers.</p><p>
<strong>Submitting a Proposal </strong>
</p><p>One you have a topic you need to submit a proposal. Don't fret about this too
much: there isn't some secret formal language that you have to use when writing
an abstract. Just imagine chatting a colleague in the pub at lunchtime who
asks you what your talk is about, and write down whatever answer you'd give.
(Or of course instead of imagining this you could use an actual colleague in an
actual pub.)</p><p>Do note the deadline though. Conferences can receive so many on-time proposals
that they don't consider those that arrive late.</p><p>
<strong>Talk Duration </strong>
</p><p>You'll need to pick a duration for your talk. 20 minutes is almost always the
right answer, and definitely for any 'introduction to' talks: if a topic can't
be given a beginner-friendly overview in 20 minutes then it's too complicated.</p><p>When you looked at this conference's programme there were probably a few talks
you immediately knew you wanted to see, then for the rest of the time you
looked around and picked things you thought might be interesting. My guess is
that most of those were 20-minute talks; you're much more likely to chance 20
minutes on a topic you aren't sure about than risk the possibility of being
stuck for 40 minutes or longer in a talk that turns out to be of no interest to
you. So if you keep your talk short, you're likely to get more people checking
it out.</p><p>
<strong>Writing </strong>
</p><p>When it comes to writing your talk <em>first write a paper</em>.</p><p>That may seem like odd advice, a rather roundabout way of getting a
presentation created, but there are at least two good reasons for starting with
a paper rather than slides:</p><ul>
<li>
<p>It's something you already have experience of, so you know how to do it, and
are better at it. If you don't think you have experience of writing a paper
then instead pretend it's an article, a blog entry, a wiki page, an e-mail to a
colleague, or whatever it is you <em>do</em> have experience of.</p></li><li>
<p>It's easier to concentrate on the content, without being distracted by the
form. Slides make you divide your thoughts into a sequence of chunks, and once
you've put content on a slide it can seem like too much hassle to re-arrange it
to elsewhere. With a text document it's trivial to keep making changes as you
write, to cut-and-paste a few paragraphs to somewhere else, and it doesn't
really matter how long your paper ends up.</p></li></ul><p>Once you have your paper, you can turn it into a talk. Or more accurately, you
can create a talk <em>about it</em>.</p><p>
<strong>Considering Your Audience </strong>
</p><p>As I mentioned above, in any conference there are only going to be a few
'must-see' talks for an attendee. There will be some people who are really
keen to see your talk, but equally there'll be a significant number in your
audience who are mainly there because they're left over from the previous talk,
they're waiting for the next one, or because the talk in the other room right
now is on XML.</p><p>For the people who've come to see your goal is simple: <em>make them want to read
your paper</em>. A talk, especially a short one, is not a good way transferring
detailed technical information to the audience, and especially not syntax.
After your talk people are not going to remember the exact regular expression
that you recommended, the particular command-line options needed to do
something, or that nifty Vim keystroke; and after lunch they're going to be
struggling even with simple things like the name of the module you were talking
about or the URL of the website you said had more information.</p><p>But that's OK: all those details are in your paper. People who are inspired by
your talk and want to do something that you mentioned can look at your paper
and find out what they need to know.</p><p>So, if the paper contains all the relevant information, why bother with the
talk at all? The talk's purpose is to inspire and enthuse people on your
chosen topic, to dazzle your audience so much with what they see in your
presentation that they are keen to try it out for themselves. Many of your
audience won't be aware of your paper's existence, and of those who do it won't
have occured to most to read it; your talk is your chance to bring it to their
attention.</p><p>You can demonstrate what some software does, show how easy it is to use, or how
magnificent its output is. By all means include code samples and command-lines
and the like -- just remember that their purpose is show the audience that they
exist, to plant the seed of what is possible, and that it isn't desirable for
you to dwell on them, explaining every last punctuation character.</p><p>Hopefully the people who've come specifically to see your talk will be
interested in following it up, but maybe some of the others present, randomly
passing time, will also take an interest. That's more likely to happen if your
talk is engaging, interesting, or even entertaining. For some people your talk
simply won't be relevant, but they'll still appreciate it if you entertained
them while they sat there waiting for the speaker after you.</p><p>As long as the hardcore technical information is available in your paper for
anybody who wants them, I'd always go for entertainment over detail in your
talk. Again think of the talks you've enjoyed seeing at the conference; they
probably include some by speakers who were fun to experience even though what
they were talking about has no interest to you.</p><p>
<strong>Live Demos </strong>
</p><p>A live demo is an excellent way of getting across what a piece of software does
-- people are much more impressed seeing something for themselves rather than
just hearing about it. Unfortunately it's also an excellent way of catching
the speaker unawares and ruining what would otherwise be a perfectly good
presentation.</p><p>Some people recommend never to risk a truly live demo, and instead to use a
pre-recorded demo, a video created in advance with screen-capturing software.</p><p>The main problem with live demos is the difficulty of driving the computer and
talking to the audience, at the same time. This gets worse if there's a minor
problem with the demo, as sorting that out takes over your available brain
cells and the audience are left witnessing you stammering a few incoherent
words between frantic keypresses.</p><p>So an alternative to screen captures is to get an assistant who's responsible
for driving the computer. You only have to concentrate on the audience, and
your assistant makes things appear on the screen at the right time. If
something goes wrong then you can continue to give the audience your undivided
attention while assistant fiddles with the computer.</p><p>And even if your talk doesn't feature a live demo, it's useful to have a
'keyboard monkey' merely for controlling the slides; it frees up your hands,
and means you can wander around the stage or move closer to the audience.</p><p>
<strong>Slides </strong>
</p><p>Talking of slides, what should you put on them? Firstly let's dismiss a couple
of bad reasons for putting something on a slide:</p><dl>
<dt> <strong>For the audience to refer to later </strong></dt>
<dd>
<p>You've already written a paper. Anything the audience (or even, if the buzz
about your talk travels far enough, people not in the audience, possibly not
even that this conference) want to look up later is in your paper, so there's
no need to clutter the slides used during your talk with anything which doesn't
directly help the talk at that moment.</p><p>And when following along at home it's much easier to read paragraphs of text in
a paper than to run through a slideshow.</p></dd><dt> <strong>To remind you what to say </strong></dt>
<dd>
<p>Some conference talks merely consist of reading bullet points out loud. Those
are rarely the good talks. A reminder of what you want to say is good, but
there's no need for the whole room to see it. Some laptops can display
different things to the speaker from what is being projected, or you can use
paper notes. My preference is for a print-out of my slides, annotated with
reminders; if you've got somebody else driving the computer then clutching some
notes isn't a problem.</p><p>One of the worst presentations I've seen had bullet points being complete
sentences, each appearing one-at-a-time on slides but with not-yet-reached
bullet points displayed in pale grey (on the white background), presumably to
remind the presenter what was coming up. But even the greyed-out bullet points
were readable from the audience, so on each slide change the audience would
quickly scan the contents then sit there waiting for the speaker to read them
all out loud.</p><p>Then one slide was a full-screen diagram. The speaker ad-libbed, drawing the
audience's attention to various points of interest in the diagram; this was
much more engaging, and showed what the speaker was capable of. Unfortunately
the following slide was bullet point sentences of commentary on the diagram;
the presenter went through each one, in most cases reading the out the full
bullet point and then adding ``as I just said''. Somehow he'd been trapped by
the bullet points, under some compulsion to read each one even though he knew
he'd already conveyed the relevant material. Don't do this!</p></dd></dl><p>So what should you put on you slides then? As a first approximation,
<em>nothing</em>.</p><p>Imagine that instead of presenting this at a conference you're explaining the
same material to a colleague. You would be able to do this without your
colleague seeing large bullet points appearing on the wall behind you. Guess
what? You don't need the bullet points at a conference either. 'Death by
PowerPoint' is a recognized phenomena, and definitely something you want to
avoid.</p><p>Slides are good for some things though, the kind of things which when you're
trying to explain them to a colleague would cause you to reach for a notepad or
a web-browser:</p><dl>
<dt> <strong>diagrams & graphs </strong></dt>
<dd>
both of which can convey things much more clearly than mere words
</dd><dt> <strong>lists </strong></dt>
<dd>
so the audience can glance through the items without you having to read them
all out
</dd><dt> <strong>code samples, URLs, regular expressions, etc </strong></dt>
<dd>
things which are tedious to spell out
</dd></dl><p>Slides can also be used for emphasis, to re-iterate particularly important
points -- but this has to be done sparingly for it to be effective.</p><p>Doing the above may mean that there are sections of your talk without any
slides. Don't worry about it: if you've decided that the best way of conveying
information about something is just to talk about it then adding in some
unnecessary bullet points is only going to make it worse.</p><p>Most conferences give all attendees papers, on either paper or a CD. If yours
doesn't, then make sure that your talk contains memorable instructions of where
the audience can find your paper.</p><p>
<strong>Content Order </strong>
</p><p>Nearly everything can be improved given a little more time to work on them, but
given that most speakers are preparing talks in their own time it seems a
little harsh to criticize a presentation you've seen. But there is one metric
which can be applied to a talk safe in the knowledge that following it would
have taken exactly the same preparation time: would the talk have been better
had exactly the same content been presented <em>but in the reverse order</em>.</p><p>That sounds ludicrous, but since thinking of it I've been surprised by how many
talks that applies to. It's natural to think of presenting material in the
order in which you first encountered it, or in the order which people will need
to use it. But those often aren't the best order for a live talk:</p><ul>
<li>
<p>A talk on using a graphical user interface toolkit showed all the coding first
with the live demo at the very end. So he was trying to explain what the
effect of various lines of code would be without us having seen any output at
all.</p></li><li>
<p>A beginner's testing talk covered in detail commands for running tests, with
the speaker trying to explain the effect various bits of code would have on the
output to an audience who hadn't yet seen a test run or any Tap.</p></li><li>
<p>An introduction to some label-generating software started with instructions on
how to download and compile it; until the audience have seen what the program
does they aren't going to be interested in running it (and anyway this is
exactly the sort of detail which can be left to the paper and not mentioned in
the talk at all). The talk then proceeded through configuring and running the
program, all the time the speaker alluding to what he was trying to generate
with it. Finally the last slide showed the finished product, a nice sheet of
produce labels. Far better to have those (or, even better, visual aids of
actual jars of rhubarb conserve and tomato chutney sporting the labels) as the
starting point, so the audience can see what the point of the talk is, and
those whose interest has been piqued can follow the details of how the speaker
gets there.</p></li></ul><p>If at any point you find yourself struggling to explain something, trying to
convince the audience that they'll see what your talking about shortly, then
that's a big clue that the order of your content needs swapping over. Remember
that a live presentation, unlike a book or webpage, is strictly linear; people
can't jump straight to particular sections.</p><p>
<strong>Practising </strong>
</p><p>Unsurprisingly talks benefit from being practised. If you're speaking at a
conference then try to find a local Perl Monger's or similar group to present
your talk at first; most are very grateful for any offers of talks.</p><p>
<strong>Setting Up </strong>
</p><p>You'd've thought that by now technology would have reached a state where any
laptop works with any projector. Unfortunately that appears not to be the
case, and every conference features talks delayed at the start as flummoxed
speakers graple with laptops and cables.</p><p>To avoid this happening to you, scope out the room well in advance of your
talk, the day before if possible. Sneak in during a quiet period and try
connecting up your computer and make sure everything works.</p><p>On the day turn up to the room during the break that precedes the block of
talks you're in, and get set up (though obviously you'll have to disconnect
from the projector again if you aren't the first talk in your block).</p><p>
<strong>Timing </strong>
</p><p>One of the hardest things to get right in a talk is the timing. There's no
magic fix for this, but practice runs definitely help -- and user groups can
usually be more flexible than conferences about timing, so it doesn't matter as
much if a talk presented there is longer or shorter than you were hoping for.</p><p>At a conference overrunning is much worse than underrunning, and can be very
rude. Here are some overrunning sins I've seen committed at conferences:</p><ul>
<li>
<p>A speaker checks with the following speaker in the room that overrunning is
acceptable, delaying the start of the next talk; this ignores the audience
members who were planning on changing streams after this talk to one in another
room, which isn't being delayed.</p></li><li>
<p>A speaker sees there's a five-minute gap between talks so he can continue into
that; this doesn't leave the following speaker any time to set up his
equipment, and can result in him being very flustered as he's trying to start
his talk.</p></li><li>
<p>A speaker notes that his talk is followed by the lunchbreak, so concludes it's
OK to run into it; this distresses audience members with plans to meet others
for lunch, and who don't know where to go by themselves in this foreign country</p></li></ul><p>In short, don't overrun.</p><p>Many conferences provide a 'session chair' in each room, who will give keep
track of time and signal to you when you have 5 minutes remaining. If this
catches you unaware, say you're only halfway through your material, then do not
do what many people seem to: attempt to whizz at high speed through all the
rest (yes, all of it); the rushing bamboozles the audience, and you end up not
finishing anyway, so they never get to hear your conclusions. Instead,
remember that all the information is in your paper (so it doesn't matter that
you say everything); point this out to the audience, try to decide what is the
most important point in your remaining material, cover that, and then deliver
your conclusion as planned.</p><p>
<strong>Ending </strong>
</p><p>There appear to be exactly two things you can say to end a talk: either "Any
questions?" or "Thank you".</p><p>Jumping straight to asking for questions can initiall seem OK, but after a few
questions it can become unclear whether the talk has finished yet; the next
speaker is getting jumpy because your final slide is still on the screen and he
wants to get plugged in; there are people in the audience who want to dash off
to a talk in another room but who feel uneasy about leaving before clapping
your talk; and there are people arriving from the other room who are chatting
on the way in, thinking that you're just engaged in post-presentation chatting
with a few individuals. It can all get quite messy and awkward.</p><p>The better way is to end simply with ``Thank you'', and a pause. The audience
will applaud at this point, and you can disconnect your computer. Then you can
seek questions while the next speaker is setting up and the audience are
shuffling between rooms.</p><p>
<strong>Further Reading </strong>
</p><dl>
<dt> <strong>'Conference Presentation Judo' -- Mark Jason Dominus </strong></dt>
<dd>
<p>While this talk is specifically about giving 3-hour conference tutorials, much
of what MJD says is also useful for 20-minute presentations, and you get to
find out why he talks with his mouth full.
<a href="http://perl.plover.com/yak/presentation/">http://perl.plover.com/yak/presentation/</a> </p></dd><dt> <strong>'Presentation Aikido' -- Damian Conway </strong></dt>
<dd>
<p>This talk isn't on the web (and I haven't seen it), but Google finds several
people who have seen it writing about it and offering useful nuggets.</p></dd><dt> <strong>'The Cognitive Style of PowerPoint' -- Edward Tufte </strong></dt>
<dd>
<p>Tufte's short book analyses how squashing thoughts into PowerPoint layouts
harms what we're trying to say. <a href="http://www.edwardtufte.com/tufte/powerpoint">http://www.edwardtufte.com/tufte/powerpoint</a> </p></dd><dt> <strong>Presentation Zen </strong></dt>
<dd>
<p>A blog with tips on giving successful presentations.
<a href="http://www.presentationzen.com/">http://www.presentationzen.com/</a> </p></dd></dl>Smylers2007-08-30T14:03:40+00:00journalWhen MySQL Bites: Quirks to Watch Out For
http://use.perl.org/~Smylers/journal/34246?from=rss
<p> <em>This is the paper that accompanies the talk I gave at Yapc::EU in Vienna
this morning. It's a draft there are a few things I plan to tidy up
shortly, and some more references to be added.</em> </p><p>If you're using MySQL, or considering using it, here are some things you
probably should be aware of.</p><p>
<strong>Dealing with Invalid
Data </strong>
</p><p>MySQL lets you specify restict what can be stored in fields, for example the
number of characters in a string. If you try to insert data that doesn't meet
the restrictions then MySQL tries really hard to insert <em>someting</em> anyway --
such as by truncating the string:</p><blockquote><div><p> <tt>mysql> CREATE TABLE creature (name VARCHAR(3) NOT NULL);<br>Query OK, 0 rows affected (0.25 sec)<br> <br>mysql> INSERT INTO creature SET name = 'caterpillar';<br>Query OK, 1 row affected, 1 warning (0.02 sec)<br> <br>mysql> SELECT * FROM creature;<br>+------+<br>| name |<br>+------+<br>| cat |<br>+------+</tt></p></div> </blockquote><p>Though you can often spot when text has been truncated; noticing numbers
changing can be trickier:</p><blockquote><div><p> <tt>mysql> CREATE TABLE exam (mark TINYINT(2) NOT NULL);<br>Query OK, 0 rows affected (0.01 sec)<br> <br>mysql> INSERT INTO exam SET mark = 172;<br>Query OK, 1 row affected, 1 warning (0.00 sec)<br> <br>mysql> SELECT * FROM exam;<br>+------+<br>| mark |<br>+------+<br>| 127 |<br>+------+<br>1 row in set (0.00 sec)</tt></p></div> </blockquote><p>At least with strings and numbers those cases the value that gets stored is
plausible for the field type, if wrong. With an <code>ENUM</code> field it can be
surprising to find the empty string being stored there at all:</p><blockquote><div><p> <tt>mysql> CREATE TABLE game (shape ENUM('scissors', 'paper', 'stone') NOT NULL);<br>Query OK, 0 rows affected (0.01 sec)<br> <br>mysql> INSERT INTO game SET shape = 'rock';<br>Query OK, 1 row affected, 1 warning (0.01 sec)<br> <br>mysql> SELECT * FROM game;<br>+-------+<br>| shape |<br>+-------+<br>| |<br>+-------+<br>1 row in set (0.00 sec)</tt></p></div> </blockquote><p>If you're using the interactive MySQL client then you do get notified when data
has been tweaked on insertion like this: the warning count. Use <code>SHOW
WARNINGS</code> to see what it's done:</p><blockquote><div><p> <tt>mysql> INSERT INTO game SET shape = 'rock';<br>Query OK, 1 row affected, 1 warning (0.01 sec)<br> <br>mysql> SHOW WARNINGS;<br>+---------+------+--------------------------------------------+<br>| Level | Code | Message |<br>+---------+------+--------------------------------------------+<br>| Warning | 1265 | Data truncated for column 'shape' at row 1 |<br>+---------+------+--------------------------------------------+<br>1 row in set (0.00 sec)</tt></p></div> </blockquote><p>Putting this into <i>~/.my.cnf</i>:</p><blockquote><div><p> <tt>[client]<br>show-warnings</tt></p></div> </blockquote><p>makes it less tedious to see the warnings, and more obvious when you have
some:</p><blockquote><div><p> <tt>mysql> INSERT INTO game SET shape = 'rock';<br>Query OK, 1 row affected, 1 warning (0.02 sec)<br> <br>Warning (Code 1265): Data truncated for column 'shape' at row 1</tt></p></div> </blockquote><p>But you'd have much less chance of discovering corrupt data in your database
if MySQL wouldn't let it be inserted in the first place. You can achieve this
by putting the server into <strong>strict mode</strong>: add this to the
server's config file (<em>/etc/mysql/my.cnf</em> or similar):</p><blockquote><div><p> <tt>[mysqld]<br>sql-mode = STRICT_ALL_TABLES</tt></p></div> </blockquote><p>and restart the server (merely reloading it isn't sufficient), with something
like:</p><blockquote><div><p> <tt>$ sudo<nobr> <wbr></nobr>/etc/init.d/mysql restart</tt></p></div> </blockquote><p>Then it refuses to insert any invalid data:</p><blockquote><div><p> <tt>mysql> INSERT INTO game SET shape = 'rock';<br>ERROR 1265 (01000): Data truncated for column 'shape' at row 1</tt></p></div> </blockquote><p>That <em>should</em> stop you inadvertently inserting any bad data. It
doesn't however prevent anybody from sticking some bad data in there if they
really want to. Firstly, the SQL modes are per-connection, so somebody could
simply turn off strict mode:</p><blockquote><div><p> <tt>mysql> INSERT INTO game SET shape = 'rock';<br>ERROR 1265 (01000): Data truncated for column 'shape' at row 1<br> <br>mysql> SET sql_mode = '';<br>Query OK, 0 rows affected (0.00 sec)<br> <br>mysql> INSERT INTO game SET shape = 'rock';<br>Query OK, 1 row affected, 1 warning (0.00 sec)</tt></p></div> </blockquote><p>And even with strict mode on MySQL helpfully provides the
<code>IGNORE</code> keyword for circumventing it:</p><blockquote><div><p> <tt>mysql> INSERT INTO game SET shape = 'rock';<br>ERROR 1265 (01000): Data truncated for column 'shape' at row 1<br> <br>mysql> INSERT IGNORE INTO game SET shape = 'rock';<br>Query OK, 1 row affected, 1 warning (0.05 sec)</tt></p></div> </blockquote><p>So far as I can tell it's impossible to guarantee that an <code>ENUM</code>
field can't contain the empty string. But it's pretty hard to code something
which does that accidentally, which is probably good enough.</p><p>
<strong>MySQL Server Upgrades </strong>
</p><p>Except that strict mode requires MySQL 5.0, which is less than 2 years old
and so isn't being used yet in many production systems.</p><p>For software 2 years might seem quite old; certainly many people try to keep
their systems more up to date than that, especially desktop applications such
as web browsers or even workstation operating systems. So it's easy to be
unsympathetic to somebody who hasn't upgraded.</p><p>But production database servers are things you want to be the most conservative
about upgrading; they are running continuously and are critical to many
functions, so stability is important. Upgrades need serious planning and
aren't to be undertaken on a whim; if the database is currently running fine
then upgrading it merely to get new features may not be a priority, or
something your sys-admins want to do often.</p><p>As such, a cycle of only upgrading your MySQL servers every couple of years
sounds quite reasonable.</p><p>It's also reasonable to wish to stick with the stable version of MySQL provided
in your operating system; this introduces a lag after a new MySQL release
before there's an OS release with it. And you probably don't want to risk
putting a new OS version on to your production DB servers the day it's
released, instead waiting for user reaction to it. Then you might want to do
some testing, trying out the upgrade on a staging server and running it there
for a while before committing it to the live one.</p><p>Together these factors can mean that somebody who's been making entirely
reasonable choices is running a MySQL version that's 3 year's old.</p><p>For example, consider somebody currently running version 4.1 and who doesn't
want to upgrade more often than once every 2 years. Should she upgrade now?
That would give her version 5.0, but means she'd still be running 5.0 in 2
years' time. Or should she hold on for version 5.1 to be released
meaning she continues with 4.1 currently?</p><p>
<strong>Lusting After the Next
MySQL Version </strong>
</p><p>What makes this particularly frustrating with MySQL is how many issues with it
can be resolved by using version 'n + 1', one higher than you're currently
using and what's particularly disturbing is that this seems to be
perpetually true, no matter how many years pass or how many upgrades you do.</p><p>In the days of version 3.22 we were keen to upgrade to 3.23, yet when we got
there we found ourselves frustrated by the lack of things like foreign keys and
subselects. 4.0 brought foreign keys, but subselects were put back to 4.1,
still leaving us wanting. Then once we had subselects we found ourselves
continually hitting up against problems that could only be solved by strict
mode, or a view, or a trigger, and so on.</p><p>Often these things conspire in groups: you can think of a workaround for an
'n + 1' feature, but only by using another feature that's also 'n + 1'! For
example, the above problem with invalid <code>ENUM</code>s inserting empty
strings is solved by strict mode, which requires 5.0; you could get avoid it
with a trigger, but that's also 5.0; and then you realize that
<a href="http://use.perl.org/comments.pl?sid=34758&cid=53488">you're only
using an <code>ENUM</code> in the first place because you don't have
views</a>, which are also 5.0:</p><p>Initially we thought we just wanted one upgrade; the 'new' version of MySQL
we'd read about sounded like it had finally become a 'proper' DBMS and would
remedy the major frustrations we'd found with it. Tantalizingly, many years
later that's still the case.</p><p>
<strong>MySQL Warnings in Perl </strong>
</p><p>If you aren't running a recent enough version of MySQL to have strict mode,
you can at least get the same warnings in Perl that you do interactively.
Consider the query:</p><blockquote><div><p> <tt>$db->do(q[INSERT INTO game SET shape = 'rock']);</tt></p></div> </blockquote><p>You can see if there are any warnings with the
<code>{mysql_warning_count}</code> attribute on a statement handle, and then do
<code>SHOW WARNINGS</code> to get them, perhaps turning the above into:</p><blockquote><div><p> <tt>use Lingua::EN::Inflect qw&lt;inflect>;<br> <br>my $insert = $db->prepare(q[INSERT INTO game SET shape = 'rock']);<br>$insert->execute;<br>if ($insert->{mysql_warning_count})<br>{<br> warn inflect "NUM($insert->{mysql_warning_count}) PL_N(warning):\n";<br> my $warning_query = $db->prepare(q[SHOW WARNINGS]);<br> $warning_query->execute;<br> while (my $warning = $warning_query->fetchrow_hashref)<br> {<br> warn "$warning->{Message}\n";<br> }<br>}</tt></p></div> </blockquote><p>which yields this output:</p><blockquote><div><p> <tt>1 warning:<br>Data truncated for column 'shape' at row 1</tt></p></div> </blockquote><p>You'd probably want to abstract the warning-checking into a routine used by
all your queries. Rather than just printing the warnings it could abort the
program or rollback the current transaction, emulating the behaviour you get
with strict mode.</p><p>Something like the above could also be useful even if you do have strict
mode, for ensuring that other MySQL warnings aren't ignored.</p><p>
<strong>Dates </strong>
</p><p>Strict mode dissuades MySQL from treating nonsense as dates:</p><blockquote><div><p> <tt>mysql> CREATE TABLE log (noted DATE NOT NULL);<br>Query OK, 0 rows affected (0.01 sec)<br> <br>mysql> INSERT INTO log SET noted = 'kapow';<br>ERROR 1292 (22007): Incorrect date value: 'kapow' for column 'noted' at row 1</tt></p></div> </blockquote><p>But it still thinks that a bunch of zeros is a cromulent date, without even so
much as a warning:</p><blockquote><div><p> <tt>mysql> INSERT INTO log SET noted = '0000-00-00';<br>Query OK, 1 row affected (0.00 sec)</tt></p></div> </blockquote><p>What's particularly special is that MySQL treats these zeros as being
<code>NULL</code>:</p><blockquote><div><p> <tt>mysql> SELECT * FROM log WHERE noted IS NULL;<br>+------------+<br>| noted |<br>+------------+<br>| 0000-00-00 |<br>+------------+<br>1 row in set (0.02 sec)</tt></p></div> </blockquote><p>And remember that that's in a field declared <code>NOT NULL</code>! That
zero value is still <code>NOT NULL</code> as well of course:</p><blockquote><div><p> <tt>mysql> SELECT * FROM log WHERE noted IS NOT NULL;<br>+------------+<br>| noted |<br>+------------+<br>| 0000-00-00 |<br>+------------+<br>1 row in set (0.00 sec)</tt></p></div> </blockquote><p>In fact it is both <code>NULL</code> and <code>NOT NULL</code> at the same
time:</p><blockquote><div><p> <tt>mysql> SELECT * FROM log WHERE noted IS NULL AND noted IS NOT NULL;<br>+------------+<br>| noted |<br>+------------+<br>| 0000-00-00 |<br>+------------+<br>1 row in set (0.00 sec)</tt></p></div> </blockquote><p>That's probably not a date you want in your database. Even through strict
mode permits it there's another SQL mode, <code>NO_ZERO_DATE</code>, which
stops it being inserted:</p><blockquote><div><p> <tt>mysql> SET sql_mode = 'STRICT_ALL_TABLES,NO_ZERO_DATE';<br>Query OK, 0 rows affected (0.00 sec)<br> <br>mysql> INSERT INTO log SET noted = '0000-00-00';<br>ERROR 1292 (22007): Incorrect date value: '0000-00-00' for column 'noted' at row 1</tt></p></div> </blockquote><p>But even with that, dates with zeroes in a <em>part</em> of them can still
get into your database:</p><blockquote><div><p> <tt>mysql> INSERT INTO log SET noted = '2007-07-00';<br>Query OK, 1 row affected (0.00 sec)</tt></p></div> </blockquote><p>To prevent that there's yet another SQL mode, <code>NO_ZERO_IN_DATE</code>! No, really:</p><blockquote><div><p> <tt>mysql> SET sql_mode = 'STRICT_ALL_TABLES,NO_ZERO_DATE,NO_ZERO_IN_DATE';<br>Query OK, 0 rows affected (0.00 sec)<br> <br>mysql> INSERT INTO log SET noted = '2007-07-00';<br>ERROR 1292 (22007): Incorrect date value: '2007-07-00' for column 'noted' at row 1</tt></p></div> </blockquote><p> <strong>SQL Modes </strong>
</p><p>All these modes are getting a bit tedious. Fortunately it isn't necessary
to remember them all individually: MySQL kindly provides a single 'shorthand'
mode for turning on all these various sorts of strictures. It's called
<code>TRADITIONAL</code>, and that's probably what you want in your server
config:</p><blockquote><div><p> <tt>[mysqld]<br>sql-mode = TRADITIONAL</tt></p></div> </blockquote><p>Yup MySQL considers all this checking to be something traditionally
people have wanted to do with databases; perhaps they think it's even a little
quaint or old-fashioned, given that it all defaults to being off.</p><p>
<strong>Timezones </strong>
</p><p>Back to dates, beware that <code>TIMESTAMP</code> values are returned
(compared, manipulated, etc) in the server's local timezone. Often this is
convenient, but since no timezone is returned it can pose problems with
daylight-saving time.</p><p>When the clocks are turned back in the autumn there are two consecutive
hours which have the same local time, and you have no way of distinguishing
them in the database.</p><p>One way of dealing wtih this would be to <a href="http://dev.mysql.com/doc/refman/5.1/en/time-zone-support.html">tell
MySQL to operate in UTC</a>.</p><p>But after all the above my preferred option is simply not to let MySQL get
anywhere near my dates, storing everything as epoch seconds in integer fields,
which can be stored and retrieved reasonably sanely and without giving MySQL a
chance to fiddle with them it can't even tell they <em>are</em>
dates!</p><p>The Perl <em>DateTime</em> module makes writing and reading epoch times easy
(as well as generally being excellent at dealing with dates in Perl), and if
interactively you ever need to peek into a date field you can use the MySQL
functions <code>From_UnixTime()</code> and <code>Unix_Timestamp()</code> to
convert between epoch times and something readable.</p><p>
<strong>Foreign Keys </strong>
</p><p>MySQL has often been accused at being poor at referential integrity. This
requires foreign keys, introduced in MySQL 4.0. They work fine, just so long
as you use the correct syntax.</p><p>Just as MySQL by default really tries hard to insert data it's given, even
if it has to tweak some values, it wants to be accommodating of valid SQL
to such an extent that if passed something it recognizes as being an
SQL construct it doesn't support, it will quietly ignore it and just get on
with processing the rest of the statement. So MySQL has a background of
happily ignoring foreign key references it can't do anything with.</p><p>With InnoDB though it can do something with them, and indeed this does what
you want:</p><blockquote><div><p> <tt>mysql> CREATE TABLE supplier<br> (<br> id INTEGER AUTO_INCREMENT PRIMARY KEY,<br> name TINYTEXT NOT NULL<br> ) ENGINE=InnoDB;<br>Query OK, 0 rows affected (0.01 sec)<br> <br>mysql> CREATE TABLE product<br> (<br> id INTEGER AUTO_INCREMENT PRIMARY KEY,<br> supplier_id INTEGER NOT NULL,<br> name TINYTEXT NOT NULL,<br> FOREIGN KEY (supplier_id) REFERENCES supplier(id)<br> ) ENGINE=InnoDB;<br>Query OK, 0 rows affected (0.01 sec)</tt></p></div> </blockquote><p>This alternative syntax for specifying the foreign key is also valid SQL,
and possibly a little more convenient:</p><blockquote><div><p> <tt>mysql> CREATE TABLE product<br> (<br> id INTEGER AUTO_INCREMENT PRIMARY KEY,<br> supplier_id INTEGER NOT NULL REFERENCES supplier(id),<br> name TINYTEXT NOT NULL<br> ) ENGINE=InnoDB;<br>Query OK, 0 rows affected (0.01 sec)</tt></p></div> </blockquote><p>There's only one problem with it: it doesn't do anything. For reasons that
currently remain mysterious, MySQL continues to treat the above
<code>REFERENCES</code> as something which it knows about but doesn't do
anything with despite this being a situation in which it can do
something with it! And note the lack of a warning in response to it: it
doesn't even admit to having ignored a crucial part of what you typed in.</p><p>Unfortunately there doesn't appear to be a way round this. The best you can
hope for is finding the situation so horrendous that it embeds itself deep in
your psyche such that you'll always remember its existence and to check very
carefully whenever adding foreign keys.</p><p>
<strong>MySQL Features and Table
Types </strong>
</p><p>To get foreign key support you need to use InnoDB tables. Unfortunately
that prevents you from using some of MySQL's widely touted features: full-text
search only works with MyIsam tables, and clustering uses its own table type,
NDB.</p><p>Or to put it the other way round, if your system requires either full-text
searching or clustering (and you can only pick one) then you can't have
enforced referential integrity though a future release may improve the
situation.</p><p>Something else MySQL has a reputation for is its speed, though again that's
usually using MyIsam tables, and therefore without foreign keys.</p><p>
<strong>Full-Text Searching </strong>
</p><p>The full-text searching is quite nice, the only tricky bit being coaxing
MySQL into actually indexing all of your text. The first issue is that by
default it only indexes words of 4 or more letters which can be
irritating when you're trying to search your wiki for ``FTP'' (and of course
when you search for a word too short to have been indexed it merely returns no
results; there's no error).</p><p>The minimum word length can be reduced in
<em>my.cnf</em>:</p><blockquote><div><p> <tt>[mysqld]<br>ft_min_word_len=3<br>[myisamchk]<br>ft_min_word_len=3</tt></p></div> </blockquote><p>If you've already indexed some content before lowering this value then
'repairing' the relevant tables will activate the change:</p><blockquote><div><p> <tt>mysql> REPAIR TABLE archive QUICK;</tt></p></div> </blockquote><p>The other part of MySQL's default set-up which clobbers your indexing attempts
is its extensive list of stop-words. It often makes sense not to index
occurrences of really common words with little semantic value, such as ``the'',
``of'', and ``and''; but it's far from clear that you'd want to avoid words such as
``immediate'', ``appropriate'', ``sensible'', and ``unfortunately'' from bothering your
index, yet they are among
<a href="http://dev.mysql.com/doc/refman/5.1/en/fulltext-stopwords.html">dozens
of default stop-words</a>.</p><p>(For some reason the stop-words are displayed in the manual with lines round
them, making them look like those fridge magnets which can be used for spelling
out phrases. Perhaps we should have a competition for creating the longest
valid paragraph entirely out of MySQL stop-words?)</p><p>You can remedy the situation by creating a file containing only the
stop-words you want and
<a href="http://dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.html">setting
<code>ft_stopword_file</code> to its path</a>.</p><p>
<strong>Alternatives to MySQL </strong>
</p><p>If you don't have a particular need to use MySQL, then you could consider
avoiding the quirks highlighted by picking a different DMBS.</p><p>Postgres seems to get referential integrity right and is suitable for many
situations where you want a ‘proper’ database; it's a plausible
alternative to MySQL if you'd only be using MySQL <code>InnoDB</code> tables.
SQLite is good for the niche of wanting something small and low-hassle.</p><p>Of course other software has other issues, and it may be that on balance
MySQL is the best DBMS to use for your situation.</p><p>A common reason for this is because you're already using MySQL, and it'd be
awkward to switch. I'd be interested to see any case studies of non-trivial
live systems which have switched from MySQL to Postgres how much effort it
was, and whether it was felt to be worth it.</p><p>
<strong>Conclusion </strong>
</p><p>MySQL has a few surprises. If you're going to use it then it's better to be
aware of them. How much they affect you depends on your circumstances, but in
many cases merely knowing about the potential issues is a big help in avoiding
being hurt by them. MySQL is widely deployed and many people use it
successfully despite all the above.</p>Smylers2007-08-28T12:19:54+00:00journal