Journal of miyagawa (1653)

Tuesday April 07, 2009
06:51 PM

DBD::SQLite and Unicode

[ #38770 ]

Attention, anyone using DBD::SQLite with the $dbh->{unicode} attribute set to 1.

This module has a long-standing bug: when inserting values into the database, it assumes that the internal encoding of any string passed to it is UTF-8. I'm trying to fix it.


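use DBI;
use Encode;

# A character string (Perl keeps it as UTF-8 internally, with the UTF8 flag on)
my $utf8_string = "This is \x{30c6}\x{30b9}\x{30c8}"; # "Test" in Japanese
# The same text, already encoded into UTF-8 octets
my $utf8_bytes = encode_utf8($utf8_string);
# A character string that Perl keeps internally as Latin-1 (UTF8 flag off)
my $lat1_string = "H\xe9llo World"; # Héllo

my $dbh = DBI->connect("DBI:SQLite:...", ...);
$dbh->{unicode} = 1;

my $sth = $dbh->prepare("INSERT INTO foo (bar) VALUES (?)");

$sth->execute($utf8_string); # (1) Good
$sth->execute($utf8_bytes); # (2) ???
$sth->execute($lat1_string); # (3) ???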

The current version of DBD::SQLite (anything prior to 1.21) assumes that the given string's INTERNAL encoding is UTF-8 and stores that octet stream in the database as-is, without calling encode_utf8 or utf8::upgrade. As a result, #2 PASSes and #3 FAILs (an invalid UTF-8 octet ends up in the database), which is not correct.
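
To see why the three cases behave differently, here is a rough sketch that needs only core modules. internal_octets() and is_valid_utf8() are hypothetical helpers made up for illustration; internal_octets() only approximates what a driver sees when it reads Perl's string buffer without checking the UTF8 flag, and is not the actual DBD::SQLite code.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode);

# Hypothetical helper: the octets Perl holds internally for a string.
# Flagged strings are held as UTF-8; unflagged strings are held byte-for-byte.
sub internal_octets {
    my ($string) = @_;
    my $copy = $string;
    utf8::encode($copy) if utf8::is_utf8($copy);
    return $copy;
}

# Hypothetical helper: true if the octets form well-formed UTF-8.
sub is_valid_utf8 {
    my ($octets) = @_;
    my $copy = $octets;    # decode() with FB_CROAK may modify its argument
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

my $utf8_string = "This is \x{30c6}\x{30b9}\x{30c8}";
my $utf8_bytes  = encode_utf8($utf8_string);
my $lat1_string = "H\xe9llo World";

for my $case ([1, $utf8_string], [2, $utf8_bytes], [3, $lat1_string]) {
    my ($n, $value) = @$case;
    printf "(%d) UTF8 flag: %s, internal octets are valid UTF-8: %s\n",
        $n,
        (utf8::is_utf8($value) ? 'on' : 'off'),
        (is_valid_utf8(internal_octets($value)) ? 'yes' : 'no');
}
```

Only #3 yields octets that are not valid UTF-8, because the lone \xe9 byte is copied through unchanged.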

My patch fixes this: #2 ($utf8_bytes) will now be double-encoded and FAIL, but #3 will PASS with a correct UTF-8 octet stream.

The fact that #2 FAILs might break your (potentially already broken) app if you save UTF-8 encoded byte strings into the database with the 'unicode' option on, but I believe making it FAIL is the right fix.
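
As an illustration of the new behavior, the sketch below assumes the patched driver acts like encode_utf8() on every bound value, whatever its internal representation. stored_octets() is a hypothetical stand-in for that, not the code in the patch.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

my $utf8_string = "This is \x{30c6}\x{30b9}\x{30c8}";
my $utf8_bytes  = encode_utf8($utf8_string);
my $lat1_string = "H\xe9llo World";

# Hypothetical stand-in for the patched driver: treat every value as a
# character string and encode its characters as UTF-8.
sub stored_octets {
    my ($value) = @_;
    return encode_utf8($value);
}

# (2) Already-encoded bytes get encoded a second time. Each high byte becomes
# two octets, and decoding what was stored does not give the original text back.
my $double = stored_octets($utf8_bytes);
printf "(2) %d octets stored, %d expected, text recovered: %s\n",
    length($double), length($utf8_bytes),
    (decode_utf8($double) eq $utf8_string ? 'yes' : 'no');

# (3) The Latin-1 character string now becomes proper UTF-8 (\xe9 -> \xc3\xa9).
my $fixed = stored_octets($lat1_string);
printf "(3) stored octets correct: %s\n",
    ($fixed eq "H\xc3\xa9llo World" ? 'yes' : 'no');
```

In other words, with 'unicode' on you pass character strings and leave the encoding to the driver.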

http://svn.ali.as/cpan/trunk/DBD-SQLite/t/rt_25371_asymmetric_unicode.t is a failing test by Juerd, and http://fisheye2.atlassian.com/changelog/cpan/trunk/DBD-SQLite?cs=6077 is my patch to fix it. With the patch, all tests still pass, including 12_unicode.t and 20_blobs.t, and DBD::SQLite's unicode option becomes compatible with what DBD::mysql's mysql_enable_utf8 option and similar options do.

Note that if you REALLY want to store raw octets without having them encoded as UTF-8, you can still define the column as BLOB and use the 3-arg bind_param, as explained in the DBD::SQLite POD. The 'unicode' section there remains entirely correct with this patch.

Let me know what you think in #dbd-sqlite on irc.perl.org. Testing your app with my patch and reporting the results would be highly appreciated too.
