NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

  • Unfortunately, Unicode support in Perl still isn't as good as it should be. I've had lots of problems too.

    My current favourite is POSTing XML to a server using LWP. You send the XML and it looks fine from the client, but when the server reads it in, the final few characters have been chopped off. Why? Because when LWP calculates the Content-Length header, it gets the length in characters, not bytes. So you have to make sure you convert to bytes before you use LWP to send information across a network.
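    Here is a minimal sketch of the workaround, using only the core Encode module. The XML payload is hypothetical and is held as a Perl character string. Encoding it to UTF-8 octets first makes length(), and so any Content-Length computed from it, count bytes rather than characters.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical payload: a Perl *character* string containing non-ASCII
# characters (U+00E9 and U+263A).
my $xml = "<msg>caf\x{e9} \x{263a}</msg>";

# On a character string, length() counts characters, not octets.
my $char_len = length $xml;

# Encode to UTF-8 octets before handing the body to anything that
# computes Content-Length or writes it to a socket.
my $body     = encode('UTF-8', $xml);
my $byte_len = length $body;

printf "characters: %d, bytes: %d\n", $char_len, $byte_len;
# prints "characters: 17, bytes: 20"
```

    The encoded $body is what you would then pass as the request content. For example, with an LWP::UserAgent object $ua: $ua->post($url, Content_Type => 'text/xml; charset=UTF-8', Content => $body). Here $url and the content type are placeholders.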

    • You mention that getting UTF-8 into URIs doesn't work. That's because there's no defined standard for doing so.

      The HTML 4.01 spec (w3.org) says:

      We recommend that user agents adopt the following convention for handling non-ASCII characters in such cases: 1. Represent each character in UTF-8 (see [RFC2279]) as one or more bytes. 2. Escape these bytes with the URI escaping mechanism (i.e., by converting each byte to %HH, where HH is the hexadecimal notation of the byte value).

      If you enter non-ASCII characters, the latest versions of both IE and Opera encode them correctly (i.e. by converting them to UTF-8 first and then converting each byte to %HH). Mozilla 1.0 doesn't (I have not tried the latest releases yet).
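      Here is a sketch of the convention quoted from the HTML 4.01 spec, with the matching decoding step a server would need. Both subroutine names are hypothetical and only core Encode is used. As a simplifying assumption, only bytes >= 0x80 (exactly the bytes produced by non-ASCII characters) and '%' itself are escaped.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Encoder side, per the HTML 4.01 recommendation:
#   1. represent each character as UTF-8 bytes;
#   2. escape those bytes as %HH.
# '%' is escaped as well so the result can be unescaped unambiguously.
sub uri_escape_nonascii {
    my ($text) = @_;
    my $bytes = encode('UTF-8', $text);
    $bytes =~ s/([%\x80-\xFF])/sprintf('%%%02X', ord $1)/ge;
    return $bytes;
}

# Decoder side: turn every %HH back into a byte, then decode the
# bytes as UTF-8 to get a Perl character string.
sub uri_unescape_utf8 {
    my ($escaped) = @_;
    (my $bytes = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return decode('UTF-8', $bytes);
}

my $escaped = uri_escape_nonascii("caf\x{e9}");
print "$escaped\n";                 # prints "caf%C3%A9"
my $back = uri_unescape_utf8($escaped);
print length($back), "\n";          # prints "4"
```

      On CPAN, URI::Escape provides uri_escape_utf8 and uri_unescape for the same job. By default they escape a wider set of characters than this sketch does.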

      And then I have no idea how to make Apache::Request or CGI do the right thing.

      I use my own wrapper around Apache::Request:

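      package Datamodel::Request;

      use strict;
      use warnings;

      use base qw(Apache::Request);

      # utf8_upgrade(): turns UTF-8 byte strings into Perl character strings
      use Datamodel::Tools qw(utf8_upgrade);

      sub new {
          my $class = shift;
          # Re-bless the Apache::Request object into this subclass so the
          # overridden uri() and param() below are the ones that get called.
          my $self = bless $class->SUPER::new(@_), $class;
          return $self;
      }

      # The request URI, passed through utf8_upgrade().
      sub uri {
          my $self = shift;
          return utf8_upgrade($self->SUPER::uri(@_));
      }

      # Parameter value(s), passed through utf8_upgrade().
      # List context: all values. Scalar context: the single value, or an
      # array reference when the parameter occurs more than once.
      sub param {
          my $self = shift;

          my @ret = utf8_upgrade($self->SUPER::param(@_));
          if (wantarray) {
              return @ret;
          } else {
              return @ret > 1 ? [ @ret ] : $ret[0];
          }
      }

      1;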

      Datamodel::Tools::utf8_upgrade is a sub that converts a byte string containing UTF-8 text into a native Perl character string. I think it can be replaced with one of the subroutines from the utf8 module, but I have not tried it. Part of this code was written before I decided it is a waste of time trying to work around Unicode problems in 5.6.1, and the utf8 subroutines are only available in 5.8.0.
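      The original Datamodel::Tools is not shown in the post. Below is a hypothetical re-implementation of utf8_upgrade for Perl 5.8+. It assumes Encode's decode('UTF-8', ...) as the decoding step and keeps the calling conventions the wrapper above relies on. It is a sketch, not the code the wrapper actually uses.

```perl
package Datamodel::Tools;

use strict;
use warnings;
use Encode qw(decode);
use Exporter 'import';

our @EXPORT_OK = qw(utf8_upgrade);

# Hypothetical re-implementation: decode every argument from UTF-8
# octets into a Perl character string; undef passes through unchanged.
# With Encode's default CHECK, malformed bytes become U+FFFD.
# List context returns all values; scalar context returns the first,
# which matches how the uri() and param() wrappers call it.
sub utf8_upgrade {
    my @out = map { my $s = $_; defined $s ? decode('UTF-8', $s) : undef } @_;
    return wantarray ? @out : $out[0];
}

1;
```

      Among the utf8 module's subroutines, the counterpart is utf8::decode, which decodes a string in place. Despite its name, utf8::upgrade does something different: it only changes the internal representation and does not decode UTF-8 bytes.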
      --

      Ilya Martynov (http://martynov.org/)