Try out this test program in a Perl prior to 5.8.8:
use Test::More tests => 3;
my $line = "\x{4E00}();" . ' ';
is(length substr($line, 1, 1), 1);
is(length substr($line, 1, 4), 4);
is(length substr($line, 1, 1), 1);
You'd expect that substrings of length 1 are always length 1, right? On my Mac (perl5.8.6) it produces:
1..3
ok 1
ok 2
not ok 3
# Failed test at utf8_substr.t line 5.
# got: '4'
# expected: '1'
# Looks like you failed 1 test of 3.
This should surprise you, unless perhaps you were aware of the UTF-8 length caching bug(s) that haunted much of the 5.8.x series.
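For anyone who wants to probe their own Perl, here is a slightly more talkative variant of the test above. It is only a sketch: the loop, the extra length of 2, and the test descriptions are my own additions for illustration, and I have not verified that this exact form triggers the bug on an affected Perl. On a fixed Perl every check should pass.

```perl
use strict;
use warnings;
use Test::More tests => 4;

# Report which Perl is running, so a failure can be tied to a version.
diag("Running under Perl $]");

# Same string as the original test: one wide character, then "();" and a space.
my $line = "\x{4E00}();" . ' ';

# Ask for substrings of several lengths from the same offset and check
# that length() agrees with the length that was requested.
for my $want (1, 4, 1, 2) {
    is(length(substr($line, 1, $want)), $want,
       "substr of $want char(s) reports length $want");
}
```

The string is five characters long, so every requested substring fits inside it and the expected length is simply the length asked for.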
The program above is a minimal reduction of a failure in the PPI test suite (see RT#35917 - charsets.t eats all available VM). This bug is only triggered in the following case:
We would probably never have noticed, except that 5.8.6 is the default Perl for Mac OS X 10.4 (i.e., a popular point release) and a PPI side effect of the bug was an infinite loop with a memory leak.
I'm VERY grateful that the core Perl developers include people smart enough to find and fix subtle bugs in the Unicode implementation like this one.