Hey internet, ⠸⠙⠱ ⠝⠉⠁⠈ ⠅⠝⠁⠕⠕⠉⠃ ⠝⠆⠏⠍⠞?
A year or more ago I was fixing work's web site to handle Unicode as entered by users into fields. We don't use CGI.pm because...? Well ok, we just don't. It also doesn't handle Unicode properly. Or at least almost no version does. Huh.
If a user types "Coatıcook" you'll probably get the dotless "i" character (U+0131) as either %C4%B1 or %u0131, but most of the time the CGI.pm supplied with perl won't do something reasonable with it.
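To make the two wire forms concrete, here is a minimal sketch of decoding both into the same Perl character string. `decode_form_value` is a hypothetical helper of mine, not anything CGI.pm provides, and it assumes the standard escapes carry UTF-8. It ignores `+`-as-space and surrogate pairs for code points above U+FFFF.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper (not part of CGI.pm): turn a percent-escaped form
# value into a Perl character string, accepting both escape styles.
sub decode_form_value {
    my ($raw) = @_;

    # Single pass so that freshly decoded bytes are never decoded twice.
    #   %uXXXX (non-standard): re-encode the code point as UTF-8 octets
    #   %XX    (standard):     one raw octet
    (my $octets = $raw) =~ s/%(?:u([0-9A-Fa-f]{4})|([0-9A-Fa-f]{2}))/
        defined $1 ? encode('UTF-8', chr hex $1) : chr hex $2
    /ge;

    # Assumption: the octets are UTF-8. Interpret them as characters.
    return decode('UTF-8', $octets);
}

binmode STDOUT, ':encoding(UTF-8)';
for my $raw ('Coat%C4%B1cook', 'Coat%u0131cook') {
    my $value = decode_form_value($raw);
    printf "%-16s -> %s (%d characters)\n", $raw, $value, length $value;
}
```

Both inputs should come out as the same nine-character string, which is roughly the "something reasonable" I was hoping for.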
Wut?
for v in 5.11.3 5.10.1 5.10.0 5.8.9 5.6.2; do
    /opt/perl-$v-64-thr-dbg/bin/perl \
        -le '
            use CGI;
            # %u2021 should come back as the single character U+2021
            my $input  = "a=%u2021";
            my $expect = "\x{2021}";
            my $got    = CGI->new( $input )->param( "a" );
            print $expect eq $got
                ? "ok $] $CGI::VERSION"
                : "not ok $] $CGI::VERSION"
        ';
done
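A minimal sketch of one way that comparison can fail, assuming (my assumption, the loop doesn't print `$got`) that the value comes back as UTF-8 octets rather than as decoded characters. It uses only the core Encode module, and `$octets` is a stand-in for whatever `param()` returns.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $expect = "\x{2021}";                # one character: U+2021 DOUBLE DAGGER
my $octets = encode('UTF-8', $expect);  # stand-in for an undecoded param() value

# Octets and characters are different strings until the octets are decoded.
my $octets_match  = $octets eq $expect;
my $decoded_match = decode('UTF-8', $octets) eq $expect;

printf "expect: %d character(s)\n", length $expect;
printf "octets: %d octet(s): %s\n", length $octets,
    join ' ', map { sprintf '%02X', ord } split //, $octets;
print "octets eq expect:  ", ($octets_match  ? "yes" : "no"), "\n";
print "decoded eq expect: ", ($decoded_match ? "yes" : "no"), "\n";
```

If that assumption holds, the test is comparing three octets against one character, and an explicit decode step is what closes the gap.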
CGI.pm (Score:1)
CGI.pm decodes the non-standard (and, according to RFC 3986, invalid) %u percent-escape into a UTF-8 octet string, but it doesn't decode that into a Perl Unicode string. I think the current behavior is desirable since the data can contain any octets in any encoding.
--
chansen
Re: Unicode URLs, wtf? (Score:1)
What sort of encoding is that? I mean, I can see it's the Unicode codepoint preceded by %u, but which standard backs this? I've never encountered this before.
Here's my take on it:
Re: (Score:1)
It usually comes from broken JavaScript applications that use escape() instead of encodeURI():
escape("\u263A") -> %u263A
encodeURI("\u263A") -> %E2%98%BA
--
chansen
:utf-8 ? (Score:1)
Did you try 'use CGI qw/ :utf8 /;'? That seems to work the way you want with CGI 3.49 (at least it seems to on my box).
Re: (Score:1)
Nope. I'd never noticed the option. My bad!