Stories
Slash Boxes
Comments
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

ddick (5726)

ddick
  (email not shown publicly)

I'm based out of Melbourne, Australia. I attend the excellent melbourne.pm.org meetings whenever i get the chance, which is not often enough.

Journal of ddick (5726)

Monday June 08, 2009
07:49 PM

UTF-8, perl and Microsoft SQL Server

[ #39102 ]

Attempting to note things without going insane with frustration. SQL Server cannot handle UTF-8. It can however handle UCS-2 (great for windows or java programmers), but in a bizarre fashion.

  • First, char, varchar and text fields do not hold UCS-2 data. nvarchar, nchar and ntext do.
  • Secondly, quoting of data is somewhat different. Instead of quoting data like so 'foo', it becomes N'foo'.
  • Now finally, when connecting with DBD::Sybase with underlying freetds libraries, you need to define "Client Charset = UTF-8" AND use the Encode module to encode your statement as 'UTF-8' before passing it to DBI->prepare.

Salutations to the DBD::Sybase and freetds team for their excellent work in allowing perl to talk UTF-8 (at least the UCS-2 compat portion of it) to a database that doesn't even support it.

As a side note, i think that when constructing a test suite for a program that uses a database for a backend, it's essential to test the database itself for edge case conditions, such as in this case, making sure that you can fill a varchar(10) with ten three byte unicode characters AND retrieve it AND the retrieved value matches your expected result. Cos maybe your definitions are behaving slightly differently than you thought.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More | Login | Reply
Loading... please wait.
  • I've been having some difficulties with this. In my case, I have trouble pulling data out of the database when it uses Windows special characters. Any idea how to go the opposite direction?

    --
    J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
    • What sort of trouble are you having? In my case, I Encode::encode the statement before calling $dbh->prepare and when retrieving $row = $sth->fetchrow_hashref() i run Encode::decode on each field in $row.
      • It appears to blow up for me on fetchrow_arrayref. I'm not sure I get a chance to use Encode on what came back. I might be able to hack into DBD::Sybase and do it in there.

        I haven't had a need to do anything to the statement; everything I'm passing is straight ASCII. The only problems I face are when I deal with records where people have inserted Windows charset members. Then I get this message:

        DBD::Sybase::st fetchrow_arrayref failed: OpenClient message: LAYER = (0) ORIGIN = (0) SEVERITY = (9) NUMBER

        --
        J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
        • the client charset is set in freetds.conf and defaults to ISO-8859-1 (i think) which would die if windows characters are used. set "Client Charset" to "UTF-8" and see what happens. also, check the TDSDUMP file. Mine has the following lines at the start

          log.c:190:Starting log file for FreeTDS 0.82
                  on 2009-06-11 04:50:38 with debug flags 0x4fff.
          iconv.c:363:iconv to convert client-side data to the "UTF-8" character set
          iconv.c:516:tds_iconv_info_init: converting "UTF-8"->"UCS-2L

          • A year later, I'm still struggling with this issue. Basically when I add client charset = UTF-8 to my freetds.conf file, I get "Out of memory!" thrown at me, with no apparent reason.

            And strangely enough, my query runs just fine from the tsql freetds command-line program.

            Do you happen to have any ideas?

            --
            J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
            • First off, if you are hitting memory issues, are you trying to retrieve a text (as distinct from a /(?:var)?char/ field?

              Secondly, can you set the TDSDUMP environment variable and post the output? checkout http://www.freetds.org/userguide/logging.htm [freetds.org]

              • Found my answer. It was a text field, and the default length was massively too long. Apparently SQLServer defaults to something huge. I could change it in freetds.conf, and that fixed the problem.

                --
                J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
              • Also, TDSDUMP is really helping me ... thanks!

                --
                J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
            • Even better, use the TDSDUMP variable when running against DBD::Sybase (the failure) and tsql (the success) and compare the debug logs to find the difference.

    • also, i found setting

      export TDSDUMP=/tmp/freetds.log

      to be a huge help, as you can then see what your data is encoded as it travels over the network.