All the Perl that's Practical to Extract and Report

Journal of ambs (3914)

Monday July 26, 2004
01:56 PM

DB_File and MLDBM

[ #20070 ]

I really love ties (Perl ties, of course), especially DB_File and MLDBM. I use them very often (especially the second, which implies I use the first too!).

Every time I need persistence, I call MLDBM. I just hate relational databases. I know some SQL and I have written some code that talks to MySQL, but I hate them.

Today was another MLDBM day. I was preparing a system for WebPapers. A WebPaper is a game where you have to answer a set of questions using the Internet, ICQ, MSN, Jabber, MOO, MUD, IRC and so on.

The problem is: I have a set of questions which use Unicode characters. Things like an s with an upside-down ^ (a caron) and so on. The same happens with the answers. I was storing those answers in an MLDBM file, but it stores Unicode strings as sequences of bytes. That means that when I read a value back, Perl treats it as a plain byte string instead of a UTF-8 string. Does anybody have any idea how I can solve this (easily, please)?
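
The symptom can be reproduced without a DBM file at all. The sketch below is only an illustration, assuming that what comes back from the store is the UTF-8 byte encoding of the original string. The helper `simulate_round_trip` is a hypothetical stand-in for "store, then fetch again"; it is not part of MLDBM or DB_File.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for "store in the DBM, then fetch again".
# Assumption: the value comes back as UTF-8 bytes, with no character semantics.
sub simulate_round_trip {
    my ($text) = @_;
    return encode('UTF-8', $text);
}

my $original = "\x{161}";                      # s with caron: one character
my $fetched  = simulate_round_trip($original);

print length($original), "\n";                 # length in characters
print length($fetched), "\n";                  # length in bytes
print utf8::is_utf8($fetched) ? "flagged" : "not flagged", "\n";
```

Under that assumption the fetched value is two bytes long and does not carry the UTF-8 flag, which is why Perl treats it as a plain byte string.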

  • I guess the 'right' answer would be to have the DBM layer preserve the UTF-8 flag on the strings. In the absence of that solution, if you have a string that contains UTF-8 byte sequences but is not flagged as UTF-8, you can turn the flag on like this (Perl 5.8 required):

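    use strict;
    use warnings;
    use Encode qw(_utf8_on);

    my $string = "\xE2\x82\xAC";    # the UTF-8 bytes of the Euro symbol

    print length($string), "\n";    # counted as bytes

    _utf8_on($string);              # flag the bytes as UTF-8 (no validation is done)

    print length($string), "\n";    # counted as characters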

    Which prints:

    3
    1

    This is documented in the Encode man page.

    • Thanks. This can do the trick, especially if it can be used on strings that contain no UTF-8 characters (I think it can). I'll try it tomorrow morning.
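
A hedged aside on the reply above: for pure ASCII strings, turning the flag on changes nothing visible, but `_utf8_on` does not check its input, so bytes that are not well-formed UTF-8 would produce a malformed string. A more defensive sketch uses `Encode::decode`, which validates the bytes and, with `FB_CROAK`, dies on invalid input. The helpers `to_bytes` and `from_bytes` are hypothetical names, and the plain hash `%store` is only a stand-in for the tied MLDBM hash, assumed to keep nothing but bytes.

```perl
use strict;
use warnings;
use Encode qw(encode decode FB_CROAK);

# Hypothetical helpers: encode on the way in, decode (with validation) on the way out.
sub to_bytes   { my ($text)  = @_; return encode('UTF-8', $text,  FB_CROAK) }
sub from_bytes { my ($bytes) = @_; return decode('UTF-8', $bytes, FB_CROAK) }

my %store;    # stand-in for the tied hash; assumed to keep only bytes

$store{euro}  = to_bytes("\x{20AC}");    # Euro symbol
$store{plain} = to_bytes("hello");       # pure ASCII is handled the same way

my $euro  = from_bytes($store{euro});
my $plain = from_bytes($store{plain});

print length($euro), "\n";               # length in characters
print $plain, "\n";

# Bytes that are not valid UTF-8 are rejected instead of being silently flagged.
my $ok = eval { from_bytes("\xE9t\xE9"); 1 };
print $ok ? "accepted" : "rejected", "\n";
```

The design choice here is to do the conversion explicitly at the boundary of the store, so every string inside the program is a character string and every stored value is bytes.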