"The mod_perl debugger you've always wanted" is on my new blog at: http://bit.ly/agzWSU
Hi,
This blog post is actually at http://blogs.perl.org/users/josh_ben_jore/2010/07/unicode-in-perl-ftw.html.
Hi, I posted to my new blog at blogs.perl.org:
http://blogs.perl.org/users/josh_ben_jore/2010/07/under-the-covers-of-perldoc.h
In http://nerds-central.blogspot.com/2009/09/tuning-jvm-for-unusual-uses-have-some
Allegedly adding -XX:-DontCompileHugeMethods to the command line eliminates this limit.
In "current" versions of JVM which don't have the very new invokedynamic bytecode operation and languages implemented on top of the JVM like JRuby, I assume their implementation does all the dynamism as ordinary code. I assume then this means the bytecode sizes are artificially inflated and then possibly just missing out on this nice optimization.
Allegedly, this is one of those heuristics that moved Java from the "slow" category to "faster than C++".
Last year I found that MRI Ruby-1.9 and JRuby were equivalently fast as Perl 5 but now I wonder if the JRuby implementation I was using was missing out and possibly could go even *faster*.
FWIW, MRI Ruby-1.8 is 1/10 the speed of Perl 5.
I was debugging some test failures earlier this morning but they were in the middle of a long test script. I thought it'd be nice if I could step through the test with the debugger and just stop whenever a test failed.
Here's the result. A test failure immediately followed by being in the debugger:
not ok 236
14: return $result;
DB<1>
and then I could ask for the stack trace which is nice:
DB<1> T
. = main::ok(undef) called from file `t/02_methods.t' line 796
. = main::check_tar_extract(ref(Archive::Tar), ref(ARRAY)) called from file `t/02_methods.t' line 507
I wrote the below snippet to automatically stop the debugger whenever something like ok() failed. Currently you can just drop this right after your "use Test::More" statement. I think possibly this kind of idea should be packaged in a no-muss CPAN module or perhaps directly right into Test::More.
# Wrap Test::More functions with debugger breakpoints
BEGIN {
use B;
# Enumerate over all symbols in the main:: symbol table.
SYMBOL:
for my $symbol ( sort keys %main:: ) {
# Choose only things that are functions.
{
no strict 'refs';
next SYMBOL if ! defined &$symbol;
}
# Get a reference to the function.
my $code = do {
no strict 'refs';
\&$symbol;
};
# Get a B::CV object so I can get at meta-data about the
# function.
my $cv = B::svref_2object( $code );
# Get the path to the compilation file. Will often be a path like
# '/usr/share/perl/5.10/Test/More.pm'.
#
# To visually inspect other meta-data possibly available about this function:
#
# use Devel::Peek;
# Dump( $code );
#
my $src_file = $cv->FILE;
# Select only functions coming from files named like
#.../Test/More.pm.
next SYMBOL if $src_file !~ m{/Test/More\.pm$};
# Propagate the old function's prototype to the new function.
my $prototype = prototype $code;
my $new_prototype =
defined $prototype
? "($prototype)"
: '';
# Generate the new function and replace the old function. The new function
# has access to the original via the closed-over $old variable.
my $src = qq{
no warnings 'redefine';
sub $symbol $new_prototype {
# Call the original function and get the test pass/fail.
my \$result = \$code->( \@_ );
# Debugger single-stepping mode if the test failed.
if ( ! \$result ) {
\$DB::single = 1;
}
# Return the test/failure.
return \$result;
}
# Compilation of the new function succeeded?
1;
};
eval $src
or die $@;
}
}
I figured out last night that I could use allocate memory within another process for the purposes of using more complicated APIs. In the below example, I attached to one of my shells and queried the groups out of it.
In this snippet, the target shell tells me its process id is 40728.
Last login: Fri Jan 8 16:51:00 on ttys000
w3m211a:~ jjore$ echo $$
40728
I'm calling the (int)getgroups( int, int* ) kernel function. I start by getting the size of an integer because I'm going to allocate some multiple of this.
w3m211a:~ jjore$ gdb -p 40728
(gdb) print sizeof(int)
$1 = 4
Call (int)getgroups(int,int*) once to get the number of groups I'm a member of.
(gdb) print (int)getgroups(0,0)
$2 = 16
Allocate 64 bytes of memory to hold my groups[16] array.
(gdb) print (void*)malloc(64)
$3 = (void *) 0x10010fca0
Call (int)getgroups(int,int*) again but this time the kernel will write to my memory with the group ids.
(gdb) print (int)getgroups($2,$3)
$4 = 16
Print the 16 decimal numbers out of memory
(gdb) x/16d $3
0x10010fca0: 2042662593 401 1612991420 1178352327
0x10010fcb0: 102 800006232 204 100
0x10010fcc0: 98 81 80 79
0x10010fcd0: 62 12 155049148 2133303823
Clean up
(gdb) print (void)free($3)
$5 = void
Hey internet, ⠸⠙⠱ ⠝⠉⠁⠈ ⠅⠝⠁⠕⠕⠉⠃ ⠝⠆⠏⠍⠞?
A year or more ago I was fixing work's web site to handle Unicode as entered by users into fields. We don't use CGI.pm because....? Well ok, we just don't. It also doesn't handle Unicode properly either. Or at least almost no version. Huh.
If a user types "Coatıcook" you'll probably get the dotless "i" character as either %C4%B1 or %u131 but CGI.pm as supplied by perl almost most of the time won't do something reasonable.
Wut?
for v in 5.11.3 5.10.1 5.10.0 5.8.9 5.6.2;do
/opt/perl-$v-64-thr-dbg/bin/perl\
-le '
use CGI;
my $input = "a=%u2021";
my $expect = "\x{2021}";
my $got = CGI->new( $input )->param( "a" );
print $expect eq $got
? "ok $] $CGI::VERSION"
: "not ok $] $CGI::VERSION"
';
done
I'd recently encountered some data that had been re-encoded five times. Ugh. The key was to guess that if a character 0xC0-0xFF is followed by 0x80-xBF, it's likely that the bytes are actually utf-8. What follows is a function which guessed a reasonable way to deal with the data and turn it into the right utf-8.
use Test::More tests => 1;
use Encode ();
my $bad = "\xc3\x83\xc2\x83\xc3\x82\xc2\x83\xc3\x83\xc2\x82\xc3\x82\xc2\x83\xc3\x83\xc2\x8 3\xc3\x82\xc2\x82\xc3\x83\xc2\x82\xc3\x82\xc2\xa9";
my $good = "\xe9";
is( multiple_decode( $bad ), $good );
sub multiple_decode {
my ( $str ) = @_;
Encode::_utf8_on( $str );
while ( $str =~/[\xc0-\xff][\x80-\xbf]/ ) {
utf8::downgrade( $str );
Encode::_utf8_on( $str );
}
return $str;
}
I tried doing this in Ruby because I actually needed this for an EventMachine (http://rubyeventmachine.com/) server but never quite got it working. Iconv seemed to want to be strict about rejecting the originally ostensibly invalid input.
# Iconv::IllegalSequence: "\303\203\302\203\303\202\302\203\303\203\302\202\303\202\302\203"...
require 'test/unit'
require 'iconv'
class MDecode < Test::Unit::TestCase
def test_multiple_decode
conv = Iconv.new( 'UTF-8', 'ASCII' )
bad = "\xc3\x83\xc2\x83\xc3\x82\xc2\x83\xc3\x83\xc2\x82\xc3\x82\xc2\x83\xc3\x83\xc2\x8 3\xc3\x82\xc2\x82\xc3\x83\xc2\x82\xc3\x82\xc2\xa9"
good = "\xe9"
assert_equal( good, conv.multiple_decode( bad ) )
end
end
class Iconv
def multiple_decode( str )
while str =~/[\xc0-\xff][\x80-\xbf]/
str = iconv( str )
end
return str
end
end
#!/bin/bash
set -x
set -e
perl="$HOME/src/perl"
function build-it () {
bversion=$1
shift 1
dir=/opt/$bversion
tar=/opt/$bversion.tar.bz2
if [[ ! -e $tar ]]; then
echo "Clobbering $dir"
rm -rf $dir
mkdir $dir
cd $HOME/src/perl
echo "Configure "$(date) > $dir/stamp.log
./Configure -des -Dcc='ccache gcc' -Dprefix=$dir $* 2>&1 | tee $dir/config.log
echo "make "$(date) >> $dir/stamp.log
make 2>&1 | tee $dir/make.log
echo "make test "$(date) >> $dir/stamp.log
make test 2>&1 | tee $dir/test.log
echo "make install "$(date) >> $dir/stamp.log
make install 2>&1 | tee $dir/install.log
echo "End "$(date) >> $dir/stamp.log
chown -R jbenjore $dir/
cd/
tar cjf $tar $dir
cd $HOME/src/perl
git clean -xdf
git reset --hard
fi
}
git clean -xdf
git reset --hard
for tag in\
v5.11.3\
perl-5.10.1 perl-5.10.1\
perl-5.8.9 perl-5.8.8 perl-5.8.7 perl-5.8.6 perl-5.8.5 perl-5.8.4 perl-5.8.3 perl-5.8.2 perl-5.8.2 perl-5.8.1 perl-5.8.0
do
git checkout $tag
build-it $tag-64-dbg -DDEBUGGING -Duse64bitint
build-it $tag-thr-dbg -DDEBUGGING -Dusethreads
build-it $tag-64-thr-dbg -DDEBUGGING -Duse64bitint -Dusethreads
build-it $tag-dbg -DDEBUGGING
build-it $tag -DDEBUGGING=-g
done
for tag in\
perl-5.6.2 perl-5.005_04
do
git checkout $tag
build-it $tag-64-dbg -DDEBUGGING -Duse64bitint
build-it $tag-thr-dbg -DDEBUGGING -Dusethreads
build-it $tag-64-thr-dbg -DDEBUGGING -Duse64bitint -Dusethreads
build-it $tag-dbg -DDEBUGGING
build-it $tag -g
done
I've been seriously neglecting my CPAN code. It's been pretty overwhelming having too many modules up there and no time to really get into anything. I figured it was worth at least an hour and a half's time to read the CPAN tester reports.
I just filed 34 bugs based purely on CPAN testers output. Bleagh.
Acme-Anything
Acme-Nothing
Alien-Judy
App-Perldoc-Search
B-Deobfuscate
B-Generate
B-Lint
B-Utils
Devel-Dt
Devel-OptreeDiff
Devel-StackBlech
ESPPlus-Storage
Enbugger
Geo-TigerLine-Abbreviations
Judy
Regexp-NamedCaptures
Runops-Trace
Term-HiliteDiff
UNIVERSAL-ref
overload-eval
perl-lint-mode