Stories
Slash Boxes
Comments
NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

use Perl Log In

Log In

[ Create a new account ]

Journal of jjore (6662)

Thursday July 22, 2010
12:53 PM

The mod_perl debugger you've always wanted

"The mod_perl debugger you've always wanted" is on my new blog at: http://bit.ly/agzWSU

Wednesday July 07, 2010
01:01 AM

Unicode in Perl, FTW (Я очень ра

12:44 AM

Under the covers of perldoc

Wednesday May 26, 2010
05:02 PM

JVM's -XX:-DontCompileHugeMethods & dynamic languages

In http://nerds-central.blogspot.com/2009/09/tuning-jvm-for-unusual-uses-have-some. html which I just picked out of reddit.com's front page I notice there's an ordinary 8K bytecode size limit for JVM heuristics.

Allegedly adding -XX:-DontCompileHugeMethods to the command line eliminates this limit.

In "current" versions of JVM which don't have the very new invokedynamic bytecode operation and languages implemented on top of the JVM like JRuby, I assume their implementation does all the dynamism as ordinary code. I assume then this means the bytecode sizes are artificially inflated and then possibly just missing out on this nice optimization.

Allegedly, this is one of those heuristics that moved Java from the "slow" category to "faster than C++".

Last year I found that MRI Ruby-1.9 and JRuby were equivalently fast as Perl 5 but now I wonder if the JRuby implementation I was using was missing out and possibly could go even *faster*.

FWIW, MRI Ruby-1.8 is 1/10 the speed of Perl 5.

Sunday May 09, 2010
12:43 PM

Stop the debugger when a test fails

I was debugging some test failures earlier this morning but they were in the middle of a long test script. I thought it'd be nice if I could step through the test with the debugger and just stop whenever a test failed.

Here's the result. A test failure immediately followed by being in the debugger:

not ok 236
14:                    return $result;
  DB<1>

and then I could ask for the stack trace which is nice:

  DB<1> T
. = main::ok(undef) called from file `t/02_methods.t' line 796
. = main::check_tar_extract(ref(Archive::Tar), ref(ARRAY)) called from file `t/02_methods.t' line 507

I wrote the below snippet to automatically stop the debugger whenever something like ok() failed. Currently you can just drop this right after your "use Test::More" statement. I think possibly this kind of idea should be packaged in a no-muss CPAN module or perhaps directly right into Test::More.

# Wrap Test::More functions with debugger breakpoints
BEGIN {
    use B;
 
    # Enumerate over all symbols in the main:: symbol table.
    SYMBOL:
    for my $symbol ( sort keys %main:: ) {
 
        # Choose only things that are functions.
        {
            no strict 'refs';
            next SYMBOL if ! defined &$symbol;
        }
 
        # Get a reference to the function.
        my $code = do {
            no strict 'refs';
            \&$symbol;
        };
 
        # Get a B::CV object so I can get at meta-data about the
        # function.
        my $cv   = B::svref_2object( $code );
 
        # Get the path to the compilation file. Will often be a path like
        # '/usr/share/perl/5.10/Test/More.pm'.
        #
        # To visually inspect other meta-data possibly available about this function:
        #
        #     use Devel::Peek;
        #     Dump( $code );
        #
        my $src_file = $cv->FILE;
 
        # Select only functions coming from files named like
        # .../Test/More.pm.
        next SYMBOL if $src_file !~ m{/Test/More\.pm$};
 
        # Propagate the old function's prototype to the new function.
        my $prototype = prototype $code;
        my $new_prototype =
            defined $prototype
            ? "($prototype)"
            : '';
 
        # Generate the new function and replace the old function. The new function
        # has access to the original via the closed-over $old variable.
        my $src = qq{
            no warnings 'redefine';
            sub $symbol $new_prototype {
 
                # Call the original function and get the test pass/fail.
                my \$result = \$code->( \@_ );
 
                # Debugger single-stepping mode if the test failed.
                if ( ! \$result ) {
                    \$DB::single = 1;
                }
 
                # Return the test/failure.
                return \$result;
            }
 
            # Compilation of the new function succeeded?
            1;
        };
        eval $src
            or die $@;
    }
}

Friday January 08, 2010
08:30 PM

Attaching to a process and calling a kernel function

I figured out last night that I could use allocate memory within another process for the purposes of using more complicated APIs. In the below example, I attached to one of my shells and queried the groups out of it.

In this snippet, the target shell tells me its process id is 40728.

Last login: Fri Jan  8 16:51:00 on ttys000
w3m211a:~ jjore$ echo $$
40728

I'm calling the (int)getgroups( int, int* ) kernel function. I start by getting the size of an integer because I'm going to allocate some multiple of this.

w3m211a:~ jjore$ gdb -p 40728
 
(gdb) print sizeof(int)
$1 = 4

Call (int)getgroups(int,int*) once to get the number of groups I'm a member of.

(gdb) print (int)getgroups(0,0)
$2 = 16

Allocate 64 bytes of memory to hold my groups[16] array.

(gdb) print (void*)malloc(64)
$3 = (void *) 0x10010fca0

Call (int)getgroups(int,int*) again but this time the kernel will write to my memory with the group ids.

(gdb) print (int)getgroups($2,$3)
$4 = 16

Print the 16 decimal numbers out of memory

(gdb) x/16d $3
0x10010fca0:    2042662593    401    1612991420    1178352327
0x10010fcb0:    102    800006232    204    100
0x10010fcc0:    98    81    80    79
0x10010fcd0:    62    12    155049148    2133303823

Clean up

(gdb) print (void)free($3)
$5 = void

Thursday January 07, 2010
12:43 AM

Unicode URLs, wtf?

Hey internet, ⠸⠙⠱ ⠝⠉⠁⠈ ⠅⠝⠁⠕⠕⠉⠃ ⠝⠆⠏⠍⠞?

A year or more ago I was fixing work's web site to handle Unicode as entered by users into fields. We don't use CGI.pm because....? Well ok, we just don't. It also doesn't handle Unicode properly either. Or at least almost no version. Huh.

If a user types "Coatıcook" you'll probably get the dotless "i" character as either %C4%B1 or %u131 but CGI.pm as supplied by perl almost most of the time won't do something reasonable.

  • not ok 5.11.3 CGI-3.48
  • not ok 5.10.1 CGI-3.43
  • ok 5.10.0 CGI-3.29
  • not ok 5.8.9 CGI-3.42
  • not ok 5.6.2 CGI-2.752

Wut?

for v in 5.11.3 5.10.1 5.10.0 5.8.9 5.6.2;do
  /opt/perl-$v-64-thr-dbg/bin/perl\
    -le '
      use CGI;
 
      my $input  = "a=%u2021";
      my $expect = "\x{2021}";
      my $got = CGI->new( $input )->param( "a" );
 
      print $expect eq $got
        ? "ok $] $CGI::VERSION"
        : "not ok $] $CGI::VERSION"
    ';
done

Wednesday January 06, 2010
10:57 PM

Decoding multiple encoded utf-8 in perl or ruby

I'd recently encountered some data that had been re-encoded five times. Ugh. The key was to guess that if a character 0xC0-0xFF is followed by 0x80-xBF, it's likely that the bytes are actually utf-8. What follows is a function which guessed a reasonable way to deal with the data and turn it into the right utf-8.

use Test::More tests => 1;
use Encode ();
 
my $bad = "\xc3\x83\xc2\x83\xc3\x82\xc2\x83\xc3\x83\xc2\x82\xc3\x82\xc2\x83\xc3\x83\xc2\x8 3\xc3\x82\xc2\x82\xc3\x83\xc2\x82\xc3\x82\xc2\xa9";
my $good = "\xe9";
is( multiple_decode( $bad ), $good );
 
sub multiple_decode {
  my ( $str ) = @_;
 
  Encode::_utf8_on( $str );
  while ( $str =~ /[\xc0-\xff][\x80-\xbf]/ ) {
    utf8::downgrade( $str );
    Encode::_utf8_on( $str );
  }
 
  return $str;
}

I tried doing this in Ruby because I actually needed this for an EventMachine (http://rubyeventmachine.com/) server but never quite got it working. Iconv seemed to want to be strict about rejecting the originally ostensibly invalid input.

# Iconv::IllegalSequence: "\303\203\302\203\303\202\302\203\303\203\302\202\303\202\302\203"...
 
require 'test/unit'
require 'iconv'
 
class MDecode < Test::Unit::TestCase
  def test_multiple_decode
    conv = Iconv.new( 'UTF-8', 'ASCII' )
 
    bad = "\xc3\x83\xc2\x83\xc3\x82\xc2\x83\xc3\x83\xc2\x82\xc3\x82\xc2\x83\xc3\x83\xc2\x8 3\xc3\x82\xc2\x82\xc3\x83\xc2\x82\xc3\x82\xc2\xa9"
    good = "\xe9"
 
    assert_equal( good, conv.multiple_decode( bad ) )
  end
end
 
class Iconv
  def multiple_decode( str )
    while str =~ /[\xc0-\xff][\x80-\xbf]/
      str = iconv( str )
    end
 
    return str
  end
end

Tuesday December 29, 2009
04:49 PM

Building a lot of perl today...

#!/bin/bash
 
set -x
set -e
perl="$HOME/src/perl"
 
function build-it () {
  bversion=$1
  shift 1
 
  dir=/opt/$bversion
  tar=/opt/$bversion.tar.bz2
 
  if [[ ! -e $tar ]]; then
    echo "Clobbering $dir"
 
    rm -rf $dir
    mkdir $dir
 
    cd $HOME/src/perl
    echo "Configure "$(date) > $dir/stamp.log
    ./Configure -des -Dcc='ccache gcc' -Dprefix=$dir $* 2>&1 | tee $dir/config.log
    echo "make "$(date) >> $dir/stamp.log
    make 2>&1 | tee $dir/make.log
    echo "make test "$(date) >> $dir/stamp.log
    make test 2>&1 | tee $dir/test.log
    echo "make install "$(date) >> $dir/stamp.log
    make install 2>&1 | tee $dir/install.log
    echo "End "$(date) >> $dir/stamp.log
    chown -R jbenjore $dir/
 
    cd /
    tar cjf $tar $dir
 
    cd $HOME/src/perl
    git clean -xdf
    git reset --hard
  fi
}
 
git clean -xdf
git reset --hard
 
for tag in\
    v5.11.3\
    perl-5.10.1 perl-5.10.1\
    perl-5.8.9 perl-5.8.8 perl-5.8.7 perl-5.8.6 perl-5.8.5 perl-5.8.4 perl-5.8.3 perl-5.8.2 perl-5.8.2 perl-5.8.1 perl-5.8.0
  do
  git checkout $tag
 
  build-it $tag-64-dbg      -DDEBUGGING -Duse64bitint
  build-it $tag-thr-dbg     -DDEBUGGING -Dusethreads
  build-it $tag-64-thr-dbg  -DDEBUGGING -Duse64bitint -Dusethreads
  build-it $tag-dbg         -DDEBUGGING
  build-it $tag             -DDEBUGGING=-g
done
 
for tag in\
    perl-5.6.2 perl-5.005_04
  do
  git checkout $tag
 
  build-it $tag-64-dbg      -DDEBUGGING -Duse64bitint
  build-it $tag-thr-dbg     -DDEBUGGING -Dusethreads
  build-it $tag-64-thr-dbg  -DDEBUGGING -Duse64bitint -Dusethreads
  build-it $tag-dbg         -DDEBUGGING
  build-it $tag             -g
done

Sunday December 06, 2009
06:11 PM

Bugging myself

I've been seriously neglecting my CPAN code. It's been pretty overwhelming having too many modules up there and no time to really get into anything. I figured it was worth at least an hour and a half's time to read the CPAN tester reports.

I just filed 34 bugs based purely on CPAN testers output. Bleagh. :-(

Acme-Anything

Acme-Nothing

Alien-Judy

App-Perldoc-Search

B-Deobfuscate

B-Generate

B-Lint

B-Utils

Devel-Dt

Devel-OptreeDiff

Devel-StackBlech

ESPPlus-Storage

Enbugger

Geo-TigerLine-Abbreviations

Judy

Regexp-NamedCaptures

Runops-Trace

Term-HiliteDiff

UNIVERSAL-ref

overload-eval

perl-lint-mode