bart's Journal bart's use Perl Journal en-us use Perl; is Copyright 1998-2006, Chris Nandor. Stories, comments, journals, and other submissions posted on use Perl; are Copyright their respective owners. 2012-01-25T02:07:07+00:00 pudge Technology hourly 1 1970-01-01T00:00+00:00 bart's Journal Headsup on command line for shortcuts in Windows XP <p>For Strawberry Perl, and Padre, I use a custom entry in the Start Menu, which technically is a shortcut (*.LNK file). For example, for Padre the command line in the shortcut file was:</p><blockquote><div><p> <tt>C:\WINDOWS\system32\cmd.exe /c PATH=c:\strawberry\perl\bin;c:\strawberry\c\bin;%PATH% &amp;&amp; padre</tt></p></div> </blockquote><p>Likewise, the command line for my Strawberry shell was:</p><blockquote><div><p> <tt>C:\WINDOWS\system32\cmd.exe /k PATH=c:\strawberry\perl\bin;c:\strawberry\c\bin;%PATH%</tt></p></div> </blockquote><p>Overnight, these both stopped working.</p><p>After a bit of puzzling, I figured out that a new program had been installed by Windows Update, and this had added a new directory to PATH. 
As a result, after the environment variable was substituted with its real value, the expanded command line exceeded 256 bytes, and PATH got truncated.</p><p>Remember, folks:</p><blockquote><div><p> <b>The length of the command line for a shortcut, after expansion, should never be longer than 256 bytes.</b></p></div> </blockquote><p>If you put the code for modification of PATH in a *.BAT file, no such restriction applies.</p><p>So now, my shortcut to start the command shell is:</p><blockquote><div><p> <tt>C:\WINDOWS\system32\cmd.exe /k c:\strawberry\strawberry_path.bat</tt></p></div> </blockquote><p>where c:\strawberry\strawberry_path.bat contains the lines</p><blockquote><div><p> <tt>@echo off<br>PATH=C:\strawberry\c\bin;C:\strawberry\perl\bin;%PATH%</tt></p></div> </blockquote><p>which makes it easier for me to add more Perl-based tools that depend on these entries in PATH: I now have a central location for the definition, if I ever need to add or modify a directory.</p> bart 2010-03-02T14:26:35+00:00 journal Installing Wx on ActivePerl 5.8.9 <p>Two days ago, after successfully installing <a href="">Alien::wxWidgets</a>, <a href="">Wx</a> and <a href="">Wx::Demo</a> on Strawberry Perl, with a bit of trouble and a lot of time, I was curious to see how <a href="">Wx::Demo</a> works on ActivePerl, and if it shows the same screwed up result in the wxComboCtrl demo. (For the impatient: it does.)</p><p>ActivePerl has <code>PPM</code>, right? So this should be a piece of cake. Let me see... Uh, nothing. Ooh yes, I forgot to add repositories. I'll add my favorites: TCool, Bribes, Trouchelle, UWinnipeg.</p><p>Good, now both <a href="">Alien::wxWidgets</a> and <a href="">Wx</a> appear. Smooth: in just a few minutes, both are installed. Now to check, run</p><p> <code>perl -MWx -le "print Wx-&gt;VERSION" </code> </p><p>Uh oh... 
I get a Windows dialog box telling me some DLL (I forgot its exact name, something with "custom" in its name) can't be found, and the above command line just produces a syntax error, saying the module can't be loaded.</p><p>What next? Well, ActivePerl now supports the MinGW compiler, and I've got one installed as it came with the Strawberry distro... just add "<code>c:\strawberry\c\bin</code>" back to <code>PATH</code>, and I'm good to go.</p><p>So, uninstall Wx again, and try to install it with <code>CPAN</code>. Wait a minute, now it says I don't have a (usable) <a href="">Alien::wxWidgets</a>? But I installed 0.45 with PPM? Aargh! So if you install <a href="">Alien::wxWidgets</a> with PPM, you <em>can't use</em> it to install <a href="">Wx</a> with <code>CPAN</code>, and that's the main purpose of that module! Right, uninstall that too, and install it with <code>CPAN</code>, too.</p><p>I was not expecting a smooth ride with <code>CPAN</code>, and a smooth ride is exactly what I didn't get. Installing in one go didn't work, obviously, so I broke it down into smaller tasks, installing troublesome dependencies first. That did go smoothly, apart from one module that failed tests: <a href="">Module::Build</a>. I don't get why a module that is seen by many as the future of <code>CPAN</code> can be so much trouble. I assumed it would work well enough, so I <code>force install</code>ed it.</p><p>Installing <a href="">Alien::wxWidgets</a> took... forever. I gave up waiting and went to do something else, even forgetting all about it. I was surprised to see, accidentally stumbling back onto this console window, that it was still actively chugging along. Anyway, it tested smoothly, and installed with no problems.</p><p>Installing <a href="">Wx</a> next went a lot faster. In a matter of minutes it was installed. Well, sort of... </p><p>I got some trouble trying to open <a href="">Wx</a>. 
Even just "<code>look Wx</code>" tried to unpack the package, ending in the error message</p><blockquote><div><p> <tt>gzip: stdout: broken pipe</tt></p></div> </blockquote><p>with <code>CPAN</code> barfing and refusing to open a shell.</p><p>I assume this means that the pipe between the external <code>gzip</code> and <code>tar</code> programs is treated in text mode by mistake, or maybe that my ports of these programs are broken. But replacing the <a href="">UnxUtils</a> programs with those from <a href="">GnuWin32</a> offers no improvement whatsoever.</p><p>Strawberry Perl didn't have that problem. Let me see what's in its "<code>o conf</code>" settings... a space?? In ActivePerl's <code>CPAN</code>, both are empty. How can you do that...</p><blockquote><div><p> <tt>o conf gzip ' '<br>o conf tar ' '</tt></p></div> </blockquote><p>That works. I don't know what it means; I assume it will now try to use Perl modules instead of external programs, but at least it no longer produces a broken pipe. That's what matters.</p><p>And now: no more broken pipe!</p><p>There was a problem with <a href="">ExtUtils::XSpp</a>, which refused to be installed as a dependency; it looks like it didn't even try (??), so I had to install that manually, and retry.</p><p>Argh, <code>CPAN</code>, I hate you! If in the <code>CPAN</code> shell, some step in the build process fails, and you retry, it'll happily assume that step worked and go to the next step, and then croak. Exiting <code>CPAN</code> and relaunching it is often enough to really make it start over, but sometimes you may have to manually delete the built files.</p><p>Anyway, installing <a href="">Wx</a> finally worked. So did installing <a href="">Wx::Demo</a>. 
Running the demos shows that it all worked.</p><p>Conclusion: <strong>yes</strong>, you <em>can</em> install Wx on ActivePerl with the MinGW C compiler, but it'll take a lot of time, and some kicking of the (<code>CPAN</code>) engine in strategic spots.</p><p> Oh, and yes, the combo control shows the exact same weird behavior as it did in Strawberry Perl (here's a <a href="">screenshot</a> to show what I mean). I'm curious to hear if it works better on other platforms than it does on Windows. I'm trying to decide if it's a bug in <a href="">Wx</a>, or in <a href="">Wx::Demo</a>, and if it's tied to the platform. </p><p>p.s. Just earlier today I found there's a PPM repository on WxPerl's own site. Aargh! Well, it was an interesting experience anyway. And now I've got Wx version 0.93, which is a bit more recent than the version on that repository (0.89)... :)</p> bart 2009-10-31T23:19:55+00:00 journal Strawberry Perl and the nightmare of installing Padre <p>Yesterday evening I felt like it was time for something new, and I decided to install <a href="">Strawberry Perl</a> and <a href="">Padre</a>. So I grabbed the MSI installer file from Strawberry Perl's website, and ran it. Soon enough it finished, and I felt simply lost. Was that it?</p><p>I glanced in the Start Menu, and I found a few links to docs, 2 to CPAN, and one to the help channel on IRC. I've never used IRC in my life, I don't even have IRC software, so that didn't feel welcoming. No welcome message, no introduction message "What now?", nothing. I'm an experienced Perl user, and as I felt lost, I can't imagine what kind of hell it must feel like to people who are new to Perl.</p><p>So I opened a command line window, and I typed "<code>perl -v</code>". Wow! At least that was something: apparently Strawberry Perl had put itself in my <code>PATH</code>, and in front of my default Perl install. 
(The reason is that my default perl is in the user environment variable PATH, and Strawberry put itself in the system environment variable PATH, and the latter comes in front of the former. It doesn't feel right to me, but at least, it's an annoyance caused by Windows.)</p><p>But I had, half and half, expected that Padre would have been there. It wasn't. So I went to the win32 Wiki (following the link in the Start Menu and initially ending up in the wrong place), and tried to find a "What now?" page. Still nothing. There's a (very incomplete) page on editors usable for Perl (I can name, off the top of my head, 3 free Windows text editors that aren't listed). I needed an editor: this is a fresh Windows install and there's no editor there yet apart from Notepad. </p><p>I decided to grab <a href="">Padre</a> anyway. So I dropped down to the command line window, and typed "<code>CPAN</code>", followed by "<code>install Padre</code>". That should go swiftly, shouldn't it? It didn't. I got all sorts of weird errors, most of them relating to "missing prerequisite", and most of those due to a module that failed to install because of a "missing file" error. Say what??</p><p>So I decided to figure out <em>which</em> module failed to install, and installed, by hand, the first one that produced such an error: <a href="">IPC::Run</a>. Hmm, that installed without a glitch. But "<code>install Padre</code>" still doesn't work. So, digging down, I decided to try "<code>install Wx</code>" first. Big mistake. After a very long compilation time (Wx appears to consist of trillions of little C files), it still failed to install successfully. Again, the reason appears to be prerequisites that failed to install.</p><p>So I dug further down, installing module after module by hand: starting with the huge ones like <a href="">PPI</a>, and ending up with really tiny ones, like <a href="">Class::Accessor</a>. Eventually, they all installed. 
</p><p>So what was the culprit? <em>Nothing</em>. They all installed. But I get the distinct impression <em> <a href="">CPAN</a> throws away part of the build before it even finishes</em>, and therefore, installing huge dependency trees fails. You have to split it up into smaller chunks and install them one at a time.</p><p>Seriously, I can't expect somebody new to perl and CPAN to get this far. From installing Strawberry Perl to finishing installing Padre through CPAN took <em>over 2 hours</em>.</p><p>So after everything installed fine, I sighed a sigh of relief, and typed "padre" at the command line. A spinning cursor for 1 second, and the command line prompt was back. That was it. No error, no error message. Nothing.</p><p>After a pause of more than an hour, I decided to tackle the problem, and started with the most likely culprit: <a href="">Wx</a>, which I had more or less forced to install. I dropped back to the CPAN command line and entered "<code>test Wx</code>". Again, some mysterious error message about a missing file (something like "can't copy file ... to ...: file does not exist"). "<code>look Wx</code>", in order to unpack the distro, died with the same error. After trying a few times, it finally worked, as it seems <a href="">CPAN</a> had decided to start afresh. This reinforces the idea that <a href="">CPAN</a> <em>is</em> the culprit, apparently cleaning up before it even completes building. Eventually, after going through "<code>perl Makefile.PL</code>" and "<code>dmake</code>" manually, "<code>dmake test</code>"... succeeded? So, back in the CPAN shell, I installed it again. I dropped out of CPAN, hopefully typed "<code>padre</code>" and... still nothing.</p><p>So I tried one more module: <a href="">Wx::Demo</a>. (Seriously, guys, the docs of that demo module are seriously lacking. You run the demo through the script it installs: <a href=""></a>, but I really had to browse the repository to figure that out.) 
Anyway: the demos all worked. (One demo looks like shit: "WxComboCtrl", but that's a problem for another day.)</p><p>Back to "<code>padre</code>": still nothing. Okay, debugging time. "<code>perl -x -S padre.bat</code>". Huh, I get an editor window?? "<code>padre</code>"... It works??</p><p>It no longer fails. It all works.</p><p>Okay, the nightmare is over. It took me nearly 3 hours to get this far. This is not something you want to make everybody go through. And the fault is, most likely, in the module <a href="">CPAN</a>.</p><p> <small>p.s. This post was written using <a href="">Markdown</a> using <a href="">Showdown</a> and posted after conversion to HTML.</small> </p> bart 2009-10-28T08:56:09+00:00 journal Death of a newsgroup <p>When I downloaded the headers of the usenet groups I follow, this morning, I saw no new headers in comp.lang.perl.misc. <em>None at all??</em> </p><p>So I doublechecked. In the last 24 hours, there have been 3 (no, 4) automated FAQ postings, and 2 spam messages. That is all.</p> bart 2009-09-14T07:30:57+00:00 journal Bands I'd like to see in a summer music festival <p>If I were a festival promoter (but I'm not) these are the bands I'd like to see, because they did impressive things (more than 1 song) in the last year:</p><ul> <li> <a href="">School of Seven Bells</a> </li><li> <a href="">Friendly Fires</a> </li><li> <a href="">Fever Ray</a> </li><li> <a href="">Juana Molina</a> </li></ul><p>and for the dance tent:</p><ul> <li> <a href="">Pogo</a> </li><li> <a href="">Deadmau5</a> </li></ul><p>However, it doesn't look very likely that I'll see them all (or even half of them) in a single event.</p> bart 2009-07-02T14:01:59+00:00 journal "avoid grep in boolean context" is premature optimization <p> <em>(Note: a first draft of this post has been lying on my shelf for over a year. Now, I finally got around to finishing it.)</em> </p><p>Now and then in Perl forums, the discussion comes up about how bad it is to use grep in boolean context. 
And now it's even been poured into a <a href="">Perl::Critic rule</a>, based on Damian Conway's PBP book, and the argument they bring up is always the same (see the POD for the module):</p><blockquote><div><p>Using <code>grep</code> in boolean context is a common idiom for checking if any elements in a list match a condition. This works because boolean context is a subset of scalar context, and grep returns the number of matches in scalar context. A non-zero number of matches means a match.</p><p>But consider the case of a long array where the first element is a match. Boolean <code>grep</code> still checks all of the rest of the elements needlessly. Instead, a better solution is to use the <code>any</code> function from <a href="">List::MoreUtils</a>, which short-circuits after the first successful match to save time.</p></div> </blockquote><p>Now, <em>why</em> would you expect the item you're looking for to be the first one in the array? I see no reason for such an assumption, at all. On average, you still have to look through about half of the array items, <em>if</em> the item is even in the array. If it isn't, you still have to look through all items, anyway.</p><p>So, depending on the likelihood that an item is in the array, you might save between 0% and 50% of execution time by leaving the loop early. Personally, I don't find that overly impressive... As O(n) is still O(n).</p><p>The implied assumption is that the overhead of <code>grep</code> and <code>any</code> is negligible, or at least, that it is the same for both, something that nobody actually bothered to verify. Well, I bothered. I decided to benchmark <code>grep</code> vs. <code>any</code>.</p><p>The kind of code that I benchmarked is the simple common problem of testing for the presence of a string in an array, just like <code>IN</code> in (some dialects of) SQL, and <code>in_array</code> of PHP. 
Here are the prerequisites and the functions I tested:</p><blockquote><div><p> <tt>my @letters = 'A'<nobr> <wbr></nobr>.. 'Z';<br>my %letter; $letter{$_} = 1 foreach @letters;<br> <br># List::MoreUtils' any<br>any =&amp;gt; sub { my $x = any { $_ eq $search } @letters },<br># grep with expression<br>grepE =&amp;gt; sub { my $x = grep $_ eq $search, @letters },<br># grep with block<br>grepB =&amp;gt; sub { my $x = grep { $_ eq $search } @letters },<br># explicitly written code with foreach and last<br>foreach =&amp;gt; sub { my $x=0; $_ eq $search and $x=1, last foreach @letters; },<br># the expected overall winner: prefilled hash<br>hash =&amp;gt; sub { my $x = exists $letter{$search} },<br># hash on the fly, rebuilt on every test<br>temphash =&amp;gt; sub { my %h; @h{@letters}=(); my $x = exists $h{$search} },</tt></p></div> </blockquote><p>I searched for 'Z' (last item), 'M' (center item), 'C' (pretty up front), 'A' (first item), and 'banana' (not in the list). And here are the results (ActivePerl 5.8.8, Windows XP, 2.4GHz):</p><blockquote><div><p> <tt>Search for Z<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Rate&nbsp; &nbsp; &nbsp;Time&nbsp; &nbsp;any temphash foreach grepB grepE hash<br>any&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;50172/s&nbsp; 19.93&amp;#181;s&nbsp; &nbsp; --&nbsp; &nbsp; &nbsp;-60%&nbsp; &nbsp; -65%&nbsp; -69%&nbsp; -71% -98%<br>temphash&nbsp; &nbsp;125555/s&nbsp; 7.965&amp;#181;s&nbsp; 150%&nbsp; &nbsp; &nbsp; &nbsp;--&nbsp; &nbsp; -12%&nbsp; -22%&nbsp; -27% -96%<br>foreach&nbsp; &nbsp; 142037/s&nbsp; &nbsp;7.04&amp;#181;s&nbsp; 183%&nbsp; &nbsp; &nbsp; 13%&nbsp; &nbsp; &nbsp; --&nbsp; -11%&nbsp; -18% -95%<br>grepB&nbsp; &nbsp; &nbsp; 160313/s&nbsp; 6.238&amp;#181;s&nbsp; 220%&nbsp; &nbsp; &nbsp; 28%&nbsp; &nbsp; &nbsp;13%&nbsp; &nbsp; --&nbsp; &nbsp;-7% -94%<br>grepE&nbsp; &nbsp; &nbsp; 172645/s&nbsp; 5.792&amp;#181;s&nbsp; 244%&nbsp; &nbsp; &nbsp; 38%&nbsp; &nbsp; &nbsp;22%&nbsp; &nbsp; 8%&nbsp; &nbsp; -- -94%<br>hash&nbsp; &nbsp; 
&nbsp; 2828893/s 0.3535&amp;#181;s 5538%&nbsp; &nbsp; 2153%&nbsp; &nbsp;1892% 1665% 1539%&nbsp; &nbsp;--<br> <br>Search for M<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Rate&nbsp; &nbsp; &nbsp;Time&nbsp; &nbsp;any temphash grepB grepE foreach hash<br>any&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;65938/s&nbsp; 15.17&amp;#181;s&nbsp; &nbsp; --&nbsp; &nbsp; &nbsp;-47%&nbsp; -59%&nbsp; -62%&nbsp; &nbsp; -74% -98%<br>temphash&nbsp; &nbsp;124203/s&nbsp; 8.051&amp;#181;s&nbsp; &nbsp;88%&nbsp; &nbsp; &nbsp; &nbsp;--&nbsp; -22%&nbsp; -29%&nbsp; &nbsp; -51% -95%<br>grepB&nbsp; &nbsp; &nbsp; 160049/s&nbsp; 6.248&amp;#181;s&nbsp; 143%&nbsp; &nbsp; &nbsp; 29%&nbsp; &nbsp; --&nbsp; &nbsp;-8%&nbsp; &nbsp; -36% -94%<br>grepE&nbsp; &nbsp; &nbsp; 174201/s&nbsp; &nbsp;5.74&amp;#181;s&nbsp; 164%&nbsp; &nbsp; &nbsp; 40%&nbsp; &nbsp; 9%&nbsp; &nbsp; --&nbsp; &nbsp; -31% -94%<br>foreach&nbsp; &nbsp; 251362/s&nbsp; 3.978&amp;#181;s&nbsp; 281%&nbsp; &nbsp; &nbsp;102%&nbsp; &nbsp;57%&nbsp; &nbsp;44%&nbsp; &nbsp; &nbsp; -- -91%<br>hash&nbsp; &nbsp; &nbsp; 2733577/s 0.3658&amp;#181;s 4046%&nbsp; &nbsp; 2101% 1608% 1469%&nbsp; &nbsp; 988%&nbsp; &nbsp;--<br> <br>Search for C<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Rate&nbsp; &nbsp; &nbsp;Time&nbsp; &nbsp;any temphash grepB grepE foreach hash<br>any&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;87975/s&nbsp; 11.37&amp;#181;s&nbsp; &nbsp; --&nbsp; &nbsp; &nbsp;-29%&nbsp; -45%&nbsp; -50%&nbsp; &nbsp; -85% -97%<br>temphash&nbsp; &nbsp;124136/s&nbsp; 8.056&amp;#181;s&nbsp; &nbsp;41%&nbsp; &nbsp; &nbsp; &nbsp;--&nbsp; -22%&nbsp; -29%&nbsp; &nbsp; -78% -96%<br>grepB&nbsp; &nbsp; &nbsp; 158854/s&nbsp; 6.295&amp;#181;s&nbsp; &nbsp;81%&nbsp; &nbsp; &nbsp; 28%&nbsp; &nbsp; --&nbsp; &nbsp;-9%&nbsp; &nbsp; -72% -95%<br>grepE&nbsp; &nbsp; &nbsp; 175068/s&nbsp; 5.712&amp;#181;s&nbsp; &nbsp;99%&nbsp; &nbsp; &nbsp; 41%&nbsp; &nbsp;10%&nbsp; &nbsp; --&nbsp; &nbsp; -69% -94%<br>foreach&nbsp; &nbsp; 571092/s&nbsp; 1.751&amp;#181;s&nbsp; 549%&nbsp; 
&nbsp; &nbsp;360%&nbsp; 260%&nbsp; 226%&nbsp; &nbsp; &nbsp; -- -80%<br>hash&nbsp; &nbsp; &nbsp; 2926393/s 0.3417&amp;#181;s 3226%&nbsp; &nbsp; 2257% 1742% 1572%&nbsp; &nbsp; 412%&nbsp; &nbsp;--<br> <br>Search for A<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Rate&nbsp; &nbsp; &nbsp;Time&nbsp; &nbsp;any temphash grepB grepE foreach hash<br>any&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;94486/s&nbsp; 10.58&amp;#181;s&nbsp; &nbsp; --&nbsp; &nbsp; &nbsp;-23%&nbsp; -40%&nbsp; -47%&nbsp; &nbsp; -90% -96%<br>temphash&nbsp; &nbsp;123096/s&nbsp; 8.124&amp;#181;s&nbsp; &nbsp;30%&nbsp; &nbsp; &nbsp; &nbsp;--&nbsp; -22%&nbsp; -30%&nbsp; &nbsp; -87% -95%<br>grepB&nbsp; &nbsp; &nbsp; 158407/s&nbsp; 6.313&amp;#181;s&nbsp; &nbsp;68%&nbsp; &nbsp; &nbsp; 29%&nbsp; &nbsp; --&nbsp; -10%&nbsp; &nbsp; -84% -94%<br>grepE&nbsp; &nbsp; &nbsp; 176637/s&nbsp; 5.661&amp;#181;s&nbsp; &nbsp;87%&nbsp; &nbsp; &nbsp; 43%&nbsp; &nbsp;12%&nbsp; &nbsp; --&nbsp; &nbsp; -82% -93%<br>foreach&nbsp; &nbsp; 978762/s&nbsp; 1.022&amp;#181;s&nbsp; 936%&nbsp; &nbsp; &nbsp;695%&nbsp; 518%&nbsp; 454%&nbsp; &nbsp; &nbsp; -- -64%<br>hash&nbsp; &nbsp; &nbsp; 2687559/s 0.3721&amp;#181;s 2744%&nbsp; &nbsp; 2083% 1597% 1422%&nbsp; &nbsp; 175%&nbsp; &nbsp;--<br> <br>Search for banana<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Rate&nbsp; &nbsp; &nbsp;Time&nbsp; &nbsp;any temphash foreach grepB grepE hash<br>any&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;51867/s&nbsp; 19.28&amp;#181;s&nbsp; &nbsp; --&nbsp; &nbsp; &nbsp;-58%&nbsp; &nbsp; -62%&nbsp; -69%&nbsp; -72% -98%<br>temphash&nbsp; &nbsp;123717/s&nbsp; 8.083&amp;#181;s&nbsp; 139%&nbsp; &nbsp; &nbsp; &nbsp;--&nbsp; &nbsp; &nbsp;-9%&nbsp; -25%&nbsp; -34% -95%<br>foreach&nbsp; &nbsp; 136015/s&nbsp; 7.352&amp;#181;s&nbsp; 162%&nbsp; &nbsp; &nbsp; 10%&nbsp; &nbsp; &nbsp; --&nbsp; -18%&nbsp; -28% -95%<br>grepB&nbsp; &nbsp; &nbsp; 165077/s&nbsp; 6.058&amp;#181;s&nbsp; 218%&nbsp; &nbsp; &nbsp; 33%&nbsp; &nbsp; &nbsp;21%&nbsp; &nbsp; --&nbsp; -12% -94%<br>grepE&nbsp; 
&nbsp; &nbsp; 187691/s&nbsp; 5.328&amp;#181;s&nbsp; 262%&nbsp; &nbsp; &nbsp; 52%&nbsp; &nbsp; &nbsp;38%&nbsp; &nbsp;14%&nbsp; &nbsp; -- -93%<br>hash&nbsp; &nbsp; &nbsp; 2670255/s 0.3745&amp;#181;s 5048%&nbsp; &nbsp; 2058%&nbsp; &nbsp;1863% 1518% 1323%&nbsp; &nbsp;--</tt></p></div> </blockquote><p>Well, that is looking bad for <code>any</code>: it is the worst performer in all cases. In the average case ('M'), it is almost 3 times slower than <code>grep</code> (grep with expression is, as I expected, a bit better than grep with a block, as the latter has an overhead of entering/exiting a lexical block). Even <em>in its best case</em>, <code>any</code> is <em>still</em> almost twice as slow as <code>grep</code>. So much for saving. The manually written loop is, in the average case, about a third faster than <code>grep</code>. But, if the item isn't found, it is actually slower.</p><p>No surprise that, if you <em>really</em> want a high speed test, and you need to test against the same array often, it is best to prepare a hash and simply test if the item is in the hash.</p><p>What I found rather surprising, is that populating a hash and using it once ('temphash'), isn't such a bad performer.</p><p>For completeness' sake, here's the same benchmark on Strawberry Perl 5.10.</p><blockquote><div><p> <tt>Search for Z<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Rate&nbsp; &nbsp; &nbsp;Time&nbsp; &nbsp;any temphash foreach grepE grepB hash<br>any&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;46995/s&nbsp; 21.28&amp;#181;s&nbsp; &nbsp; --&nbsp; &nbsp; &nbsp;-60%&nbsp; &nbsp; -62%&nbsp; -65%&nbsp; -67% -98%<br>temphash&nbsp; &nbsp;117471/s&nbsp; 8.513&amp;#181;s&nbsp; 150%&nbsp; &nbsp; &nbsp; &nbsp;--&nbsp; &nbsp; &nbsp;-6%&nbsp; -13%&nbsp; -19% -96%<br>foreach&nbsp; &nbsp; 125313/s&nbsp; &nbsp;7.98&amp;#181;s&nbsp; 167%&nbsp; &nbsp; &nbsp; &nbsp;7%&nbsp; &nbsp; &nbsp; --&nbsp; &nbsp;-7%&nbsp; -13% -95%<br>grepE&nbsp; &nbsp; &nbsp; 135294/s&nbsp; 7.391&amp;#181;s&nbsp; 
188%&nbsp; &nbsp; &nbsp; 15%&nbsp; &nbsp; &nbsp; 8%&nbsp; &nbsp; --&nbsp; &nbsp;-6% -95%<br>grepB&nbsp; &nbsp; &nbsp; 144589/s&nbsp; 6.916&amp;#181;s&nbsp; 208%&nbsp; &nbsp; &nbsp; 23%&nbsp; &nbsp; &nbsp;15%&nbsp; &nbsp; 7%&nbsp; &nbsp; -- -95%<br>hash&nbsp; &nbsp; &nbsp; 2740628/s 0.3649&amp;#181;s 5732%&nbsp; &nbsp; 2233%&nbsp; &nbsp;2087% 1926% 1795%&nbsp; &nbsp;--<br> <br>Search for M<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Rate&nbsp; &nbsp; &nbsp;Time&nbsp; &nbsp;any temphash grepE grepB foreach hash<br>any&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;63959/s&nbsp; 15.64&amp;#181;s&nbsp; &nbsp; --&nbsp; &nbsp; &nbsp;-46%&nbsp; -54%&nbsp; -55%&nbsp; &nbsp; -71% -97%<br>temphash&nbsp; &nbsp;118040/s&nbsp; 8.472&amp;#181;s&nbsp; &nbsp;85%&nbsp; &nbsp; &nbsp; &nbsp;--&nbsp; -15%&nbsp; -18%&nbsp; &nbsp; -46% -95%<br>grepE&nbsp; &nbsp; &nbsp; 138646/s&nbsp; 7.213&amp;#181;s&nbsp; 117%&nbsp; &nbsp; &nbsp; 17%&nbsp; &nbsp; --&nbsp; &nbsp;-3%&nbsp; &nbsp; -36% -94%<br>grepB&nbsp; &nbsp; &nbsp; 143388/s&nbsp; 6.974&amp;#181;s&nbsp; 124%&nbsp; &nbsp; &nbsp; 21%&nbsp; &nbsp; 3%&nbsp; &nbsp; --&nbsp; &nbsp; -34% -94%<br>foreach&nbsp; &nbsp; 218040/s&nbsp; 4.586&amp;#181;s&nbsp; 241%&nbsp; &nbsp; &nbsp; 85%&nbsp; &nbsp;57%&nbsp; &nbsp;52%&nbsp; &nbsp; &nbsp; -- -91%<br>hash&nbsp; &nbsp; &nbsp; 2516377/s 0.3974&amp;#181;s 3834%&nbsp; &nbsp; 2032% 1715% 1655%&nbsp; &nbsp;1054%&nbsp; &nbsp;--<br> <br>Search for C<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Rate&nbsp; &nbsp; &nbsp;Time&nbsp; &nbsp;any temphash grepE grepB foreach hash<br>any&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;86710/s&nbsp; 11.53&amp;#181;s&nbsp; &nbsp; --&nbsp; &nbsp; &nbsp;-25%&nbsp; -36%&nbsp; -39%&nbsp; &nbsp; -85% -96%<br>temphash&nbsp; &nbsp;115756/s&nbsp; 8.639&amp;#181;s&nbsp; &nbsp;33%&nbsp; &nbsp; &nbsp; &nbsp;--&nbsp; -15%&nbsp; -19%&nbsp; &nbsp; -79% -95%<br>grepE&nbsp; &nbsp; &nbsp; 135977/s&nbsp; 7.354&amp;#181;s&nbsp; &nbsp;57%&nbsp; &nbsp; &nbsp; 17%&nbsp; &nbsp; --&nbsp; 
&nbsp;-5%&nbsp; &nbsp; -76% -94%<br>grepB&nbsp; &nbsp; &nbsp; 142657/s&nbsp; &nbsp;7.01&amp;#181;s&nbsp; &nbsp;65%&nbsp; &nbsp; &nbsp; 23%&nbsp; &nbsp; 5%&nbsp; &nbsp; --&nbsp; &nbsp; -75% -94%<br>foreach&nbsp; &nbsp; 559639/s&nbsp; 1.787&amp;#181;s&nbsp; 545%&nbsp; &nbsp; &nbsp;383%&nbsp; 312%&nbsp; 292%&nbsp; &nbsp; &nbsp; -- -76%<br>hash&nbsp; &nbsp; &nbsp; 2315599/s 0.4319&amp;#181;s 2571%&nbsp; &nbsp; 1900% 1603% 1523%&nbsp; &nbsp; 314%&nbsp; &nbsp;--<br> <br>Search for A<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Rate&nbsp; &nbsp; &nbsp;Time&nbsp; &nbsp;any temphash grepE grepB foreach hash<br>any&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;85902/s&nbsp; 11.64&amp;#181;s&nbsp; &nbsp; --&nbsp; &nbsp; &nbsp;-26%&nbsp; -36%&nbsp; -41%&nbsp; &nbsp; -90% -96%<br>temphash&nbsp; &nbsp;115752/s&nbsp; 8.639&amp;#181;s&nbsp; &nbsp;35%&nbsp; &nbsp; &nbsp; &nbsp;--&nbsp; -14%&nbsp; -21%&nbsp; &nbsp; -87% -95%<br>grepE&nbsp; &nbsp; &nbsp; 134479/s&nbsp; 7.436&amp;#181;s&nbsp; &nbsp;57%&nbsp; &nbsp; &nbsp; 16%&nbsp; &nbsp; --&nbsp; &nbsp;-8%&nbsp; &nbsp; -85% -94%<br>grepB&nbsp; &nbsp; &nbsp; 146193/s&nbsp; &nbsp;6.84&amp;#181;s&nbsp; &nbsp;70%&nbsp; &nbsp; &nbsp; 26%&nbsp; &nbsp; 9%&nbsp; &nbsp; --&nbsp; &nbsp; -84% -94%<br>foreach&nbsp; &nbsp; 902933/s&nbsp; 1.108&amp;#181;s&nbsp; 951%&nbsp; &nbsp; &nbsp;680%&nbsp; 571%&nbsp; 518%&nbsp; &nbsp; &nbsp; -- -62%<br>hash&nbsp; &nbsp; &nbsp; 2405894/s 0.4156&amp;#181;s 2701%&nbsp; &nbsp; 1978% 1689% 1546%&nbsp; &nbsp; 166%&nbsp; &nbsp;--<br> <br>Search for banana<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Rate&nbsp; &nbsp; &nbsp;Time&nbsp; &nbsp;any temphash foreach grepE grepB hash<br>any&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;51088/s&nbsp; 19.57&amp;#181;s&nbsp; &nbsp; --&nbsp; &nbsp; &nbsp;-56%&nbsp; &nbsp; -61%&nbsp; -69%&nbsp; -71% -98%<br>temphash&nbsp; &nbsp;116528/s&nbsp; 8.582&amp;#181;s&nbsp; 128%&nbsp; &nbsp; &nbsp; &nbsp;--&nbsp; &nbsp; -12%&nbsp; -29%&nbsp; -33% -96%<br>foreach&nbsp; &nbsp; 
132626/s&nbsp; &nbsp;7.54&amp;#181;s&nbsp; 160%&nbsp; &nbsp; &nbsp; 14%&nbsp; &nbsp; &nbsp; --&nbsp; -19%&nbsp; -24% -95%<br>grepE&nbsp; &nbsp; &nbsp; 164236/s&nbsp; 6.089&amp;#181;s&nbsp; 221%&nbsp; &nbsp; &nbsp; 41%&nbsp; &nbsp; &nbsp;24%&nbsp; &nbsp; --&nbsp; &nbsp;-6% -94%<br>grepB&nbsp; &nbsp; &nbsp; 175040/s&nbsp; 5.713&amp;#181;s&nbsp; 243%&nbsp; &nbsp; &nbsp; 50%&nbsp; &nbsp; &nbsp;32%&nbsp; &nbsp; 7%&nbsp; &nbsp; -- -93%<br>hash&nbsp; &nbsp; &nbsp; 2655952/s 0.3765&amp;#181;s 5099%&nbsp; &nbsp; 2179%&nbsp; &nbsp;1903% 1517% 1417%&nbsp; &nbsp;--</tt></p></div> </blockquote><p>As an aside: I've got the distinct impression that overall, Strawberry Perl 5.10 is slower than ActivePerl 5.10, by about 5-10%. The biggest surprise to me, however, is that <code>grep EXPRESSION</code> is no longer <em>faster</em> than <code>grep BLOCK</code>: in these runs the block form comes out slightly ahead. So weird.</p><p>Now, the thing you would test for need not always be that simple. If it is very slow or otherwise expensive, you might still, and rightfully so, want to leave the testing loop early. But, although elegant as a solution, you should not blindly assume that <code>any</code> is the best solution for every such problem... Though I wish that <code>grep</code> would be made smarter, and that it either <em>knows</em> to leave the test loop early, or that you could manually do so, for example, with <code>last</code>.</p><p>p.s. The code to convert Rate to Time was kindly supplied by GrandFather on Perlmonks.</p><p>p.p.s. I'm sorry about the formatting problem, that "&amp;#181;" should look like "&#181;", but even though it is a "&#181;" character in the text I posted, the journaling system shows it as "&amp;#181;". It's a bug, I'm sure.</p> bart 2009-02-13T22:23:16+00:00 journal Why is such a pig? <p>Running the CPAN shell on ActivePerl 5.8.x on Windows XP, I can see in <a href="">Process Explorer</a> that the bare CPAN shell uses a whopping 195MB of RAM (size of working set: 120MB). I mean... 
wauw!!</p><p>That does not even include the programs, including perl, that are invoked from <code>make</code>, that actually test or install stuff. No, just the bare shell.</p><p>WTF is going on? Does the CPAN shell perhaps try to keep the whole module database in RAM? And if so: why? It's not as if looking up a distro by module name is <em>so</em> system critical, that it requires a sub-millisecond response time. So a slightly slower system, that greps through the data on file, would work just as well.</p><p>Fact is: memory consumption is 3 times lower before the metadata is first loaded. And it doesn't ever go down again... (As if it could.)</p><p>CPAN is now using several times more RAM than the average computer had, when first came out. I doubt it used that much RAM, in those days.</p><p>p.s. For some bizarre reason, Strawberry Perl 5.10.x uses "only" 120MB. I have no idea why.</p> bart 2009-02-11T21:23:43+00:00 journal From markdown to POD <p>The verbose and arcane syntax of POD always distracts me from what I want to write, whenever I write POD directly. I prefer <a href="">markdown</a>, which doesn't get in the way. </p><p>And with going from <a href="">markdown</a> through <a href="">html2pod</a>, I get a reasonable headstart. It works pretty well.</p><p>The one thing I commonly need in POD that markdown is lacking, is itemized bulleted lists, that in HTML you'd write with DL/DT/DD lists, and in POD you write as</p><blockquote><div><p> <tt>=over<br> <br>=item one<br> <br>This is item one.<br> <br>=item two<br> <br>This is item two.<br> <br>=back</tt></p></div> </blockquote><p>(Gah! POD is ghastly!)</p><p>In a reference manual, I need them a lot. 
</p><p>But, starting from a plain bulleted list, you can get the basic POD list syntax, and by just tweaking the generated POD a little (replacing the bullets with the item text), I get where I want.</p> bart 2009-02-10T04:22:26+00:00 journal I like Markdown <p>In the last few weeks I've taken up the habit of writing my posts and replies on as plain text, using <a href="">markdown</a> markup. I then run the markdown script from the CPAN module redirecting the output to an HTML file, preview in a browser, and finally copy/paste the HTML into the textarea, and submit.</p><p>You may know markdown as the markup system utilized by websites such as <a href="">Reddit</a> and <a href="">StackOverflow</a>.</p><p>The difference in effort in marking up prose text in markdown rather than in any other system allows (such as the allowed subset of plain HTML) would appear to be slight, yet it is quite relevant to me. Even my rather clumsy workflow is still preferable to me, than using the site's standard HTML markup.</p><p>It's a really nice thought that <a href="">markdown</a> <a href="">originated in Perl</a>, although it has been ported to Javascript, <a href="">complete with a live preview</a>, which is still primitive compared to the system used on <a href="">StackOverflow</a>.</p><p>And maybe it's an idea to modernize the comment system in <a href=""></a> a little, as more and more people apparently utter their dislike for the current system.</p><p>But likely they think there's still a lot more wrong with it, than just the markup system.<nobr> <wbr></nobr>:)</p> bart 2009-01-18T21:44:06+00:00 journal CPAN like it's 1995 <p>(title inspired by a <a href="">blog post</a>)</p><p> used to ask a bunch of questions the first time it is run. One of the questions is what CPAN mirrors to use.</p><p>Now it doesn't any more: it comes preconfigured. 
But that comes at a price: a lot of distros simply assume use of <a href=""></a> or of <a href=""></a> (while several perl ports use their own private CPAN mirror by default, such as <a href="">Strawberry Perl</a> for Windows, and, I thought, <a href="">ActiveState's ActivePerl</a>).</p><p>Is the idea of using CPAN mirrors simply outdated? Or, should the CPAN client be smarter, and figure out for itself which mirrors to use? The latter feels like overkill to me. It presumes inclusion of a geolocator module and database, like <a href="">Geo::IP</a> (the free version of that database is far more than sufficient for this purpose, so the license price is no objection). But having that module and database on every Perl installation, just to get a list of mirrors once, or maybe a few times, in the lifetime of a perl installation, really is far too much.</p><p>I can remember how <a href=""></a>, thanks to Tom Christiansen IIRC, used to have a built in redirector, where <em>it</em> figured out where in the world you are, and hence, which (single) mirror to use. But if that one mirror was offline, you were out of luck. It didn't check the status of the mirror, it just redirected you there.</p><p>If we still wish to use mirrors, why not drag CPAN into the age of webservices? (actually we're already late for that, as the age of webservices seems to have passed, already...<nobr> <wbr></nobr>:)) Set up a main page on a site, for example on <a href=""></a>, where the CPAN client can simply ask "Can you suggest me what mirrors to use?" (pun intended). Then only the central site needs to have this geolocation database, to check what part of the world the request comes from, and compose a list of preferable mirrors. The output could be as simple as a <code>text/plain</code> page with one URL of a mirror per line, returning maybe 5 or 10 URLs in total. 
Easy to generate, and dead easy to parse.</p><p>(Note: the order of mirrors that are close to each other in level of preference could be randomly shuffled for each request, to avoid that all users in one area all hammer the same mirror.)</p><p> can still be made a bit smarter, and for example, use <code>ping</code> to test responsiveness of the mirror, or, simpler still, time the fetch time of a page from the currently chosen mirror, and check if it's fast enough (depending on your internet connection; it should keep track of responsiveness of the mirrors, so it can compare them); and switch the order of mirrors, if that may, likely, seriously improve matters.</p> bart 2009-01-08T22:29:25+00:00 journal Fixing world writable files in tarball before upload to CPAN <p>Fairly recently, CPAN changed its policy regarding uploaded distributions: if the distribution contains world writable files and/or directories (I'm not entirely clear about its exact rules), then <a href="">CPAN won't index it</a>.</p><p>That is a problem that bites authors who create their distributions on Windows: as Windows doesn't know Unix file permissions, a typical <code>tar</code> on Windows will simply set all file modes to 0777. Well, duh!</p><p>Some people have reconsidered fixes, such as <a href="">Burak</a> who <a href="">claims</a> that if you exclude directories from explicitly mentioning them, when creating the tar file, that then the problem will not occur.</p><p>My idea instead would be to fix the stupid behaviour in <code>tar</code>.</p><p>A second best approach, for now, until it gets a definite solution, is to clean up the tarball you just created, going over every file and directory in it, and fix its file mode.</p><p>And that's what I did <a href="">here</a>. I've used <a href="">Archive::Tar</a>, which turned out to be slightly more problematic than I thought, but I seem to have gotten it to behave. 
One nasty problem is backward compatibility of the tar files: by default <a href="">Archive::Tar</a> strips the path away from the file name, and stuffs it in a nonstandard "prefix" field. I've seen tar archive tools fall over this. Setting <code>$Archive::Tar::DO_NOT_USE_PREFIX</code> to 1 stops this behaviour, and you get backward compatible tar files, as long as the full name of the entry (including relative path) is at most 100 ASCII characters long. I do not expect this to be a problem in a typical CPAN upload.</p><p> <a href="">Archive::Tar</a> keeps the entire archive in memory, which may pose a problem for huge tar files, but most likely not for any archive to be uploaded to CPAN.</p> bart 2008-12-22T00:38:08+00:00 journal The Perl Advent Calendar 2008 is up! Did anybody yet mention that the <a href="">Perl Advent Calendar 2008</a> is live? Take a look: one article per day, until Christmas, each introducing a module that is not as well known as it deserves. <p>Thanks to the hard work of belg4mit and several volunteers, with support from the group.</p> bart 2008-12-05T07:49:27+00:00 journal YAPC::EU first impressions I just came home from my first YAPC, in Copenhagen, a few days ago. <p>Overall impressions are good: I've had a very busy schedule, I've been to a talk for every single time slot, and there is not a talk I've been to that I considered a waste of time. So that is good. </p><p>To be honest: I didn't really expect catering, so that was a pleasant surprise, not in the least because Denmark is so expensive. An even better surprise was that the quality of the food was good. It was a lot better than anything I've ever been offered to eat on an airplane, for example. </p><p>I'm not going to discuss "the incident", even though I was in the middle of it, because I had already largely forgotten about it. It's not that important. </p><p>But there still is something that irks me. 
I feel that some people who have been to YAPC more than once, use it to put themselves in the spotlight. I'm not going to mention names, I'm not even going to say how many people I am talking about, because I do not want this to degenerate into a mindless, meritless flamewar, plus, I am not the one who is going to draw the line of what is or is not acceptable. You have to think about that for yourself. </p><p>In short, I do think that it should not always be the same people who are in the center of attention. Especially people of whom I think they don't really deserve it. I think it's time to put a stop to the ego-tripping. </p><p>So, for next time... Please don't always let the same people present the show. And don't let all the talks be given by all the same people every year, if they have nothing new to say. It is time for fresh blood.</p> bart 2008-08-19T16:47:14+00:00 journal Frustrations about Oracle <p>I've been a professional user of the Oracle database (now at 10g2) for almost 2 years now. It appears to me to be a very solid and fast database. Its query analysis tools to profile slow SQL queries are excellent. PL/SQL is a nice, "modern" programming language with good features, high speed, and a nice integration of SQL in between procedural programming statements.</p><p>Its price does not really bother me... because I don't have to pay for it. I do think that if a company can afford to pay several professional programmers to work with Oracle full-time, they ought to be able to afford Oracle itself too...</p><p>And yet, there are a few things that I find rather frustrating... </p><p>Oracle comes with PL/SQL libraries for virtually everything that you can think of: fetching web pages over HTTP, sending mail over SMTP, processing XML with XSLT... But these libraries are not exactly bug free. </p><p>Take the XSLT processor as an example. I don't know what it is based on... 
(probably some open source project, as appears to often be the case with Oracle...<nobr> <wbr></nobr>:)). It claims to be XSLT 2.0 compatible, yet several basic functions (like <a href="">lower-case</a> ) are simply not working. <code>disable-output-escaping</code> does not work, and it always indents the output html tree, even when you try to turn it off. </p><p>Oracle provides a special way to store XML files, in what it calls "XMLDB". On the surface it's quite impressive... (which is its main purpose, that I can see...<nobr> <wbr></nobr>:)) You can access it through WebDAV, i.e. using a http URL in a Windows Explorer window, which supports drag and drop to manage the files in it. </p><p>And in XSLT, you can access the files in XMLDB using that http URL. But in these files, for example in a <code>document()</code> call, relative URLs to other files simply do not work. You <em>have</em> to use an absolute URL to link to files next to it. That is very impractical, as it requires that you modify the content of the files if you move the repository, or simply, if you add a set of files after you tested them with an XSLT processor on the command line... </p><p>What is quite unpleasant, is dealing with VARCHAR2 (strings up to 32k) vs. CLOB (any size) (and their counterparts for binary strings: RAW and BLOB). Those LOBs are a pain to work with: you have to use file-access-like library function calls to work with them, while for VARCHAR2, you can use simple straightforward string manipulation operators just like in other languages. LOBs are just too low level. </p><p>And what's <em>really</em> frustrating is that Oracle's PL/SQL libraries virtually all only work with VARCHAR2 and RAW. For example, if you are building a MIME mail message with attachments, there is a library to base64-encode your binary strings at your disposal.... but its parameters and return values are limited to 32k, so you can't use it for larger files than about 24k! 
</p><p>For any real world task that goes beyond a simple demo, every single developer has to begin by spending hours writing routines for simple housekeeping tasks, routines that IMO just ought to be part of a standard library. Sure, you <em>can</em> find example code for about any task on the internet, but it's often of dubious quality. </p><p>For example, look at the function <code>replaceClob()</code> on <a href=""></a> (which is, AFAIK, quite a respectable site). It implements an emulation of the core VARCHAR2 function <code>replace()</code>, but for a CLOB. It works by doing the replacements in one chunk at a time, and then simply concatenates the results. Thus, it will skip matching substrings that straddle the boundaries between chunks. That is quite a serious bug. This is a typical simple example. </p><p>So not only do developers have to waste hours of time on housekeeping tasks, the results of their efforts are commonly still of a rather poor quality. I doubt that the code I wrote for it is so much better. That is a doubly unpleasant price, for whoever has to pay for it.</p> bart 2008-06-25T11:15:42+00:00 journal Images in Spreadsheet files <p>Earlier this week, somebody who shall remain anonymous, was wondering out loud on the Perlmonks Chatterbox why he can't put an image in a CSV file.</p><p>I was simply baffled.</p><p>How can anybody claim to be a (junior) programmer, and so totally lack any insight into why this is not possible, or not even imaginable?</p><p>I asked him if he even knew what a CSV file was. Yes, it's a text file containing the text to put in fields in a table. 
<em>Just the text.</em></p><p>It now makes me wonder if he knew what an image was, then.</p><p>It's things like that that really make me wonder if there can be <em>any</em> hope at all that anybody like that, may ever pick up enough technical insight and skills, to have any future as a programmer, at all.</p> bart 2008-06-14T00:02:03+00:00 journal Last-Modified and If-Modified-Since Recently I have been experimenting with how browsers (all on Windows XP) react to the presence of a Last-Modified header in the HTTP reply from a web server, in the context of generating semi-static content. I found the response of Firefox 2 most intriguing. <p> It appears that Firefox doesn't even look at the contents of this header, it just stores it for later. You can put a nonsense string in the Last-Modified header (from the server to the browser), and the next time a browser tries to fetch the file, it'll send the <em>exact</em> same string back in an If-Modified-Since header (from the browser to the server). I used "The bananas in my cellar are still quite green" as a test value, which, I hope you agree, looks nothing like a date. And that is exactly what I got back. </p><p> As a result, it acts like a private cookie, but just for this one URL, not for the whole domain, and not even for the siblings of this URL on the same path. </p><p> I found Opera 9 apparently behaves the same. </p><p> Now, MSIE(7) and Safari are something else. MSIE <em>does</em> appear to look at the contents of this header, it simply drops it if it can't make a date out of it. The format it accepts is quite flexible, I sent it something that's close to ISO-formatted: 'YYYY-MM-DD HH24:MI:SS "GMT"', to put it in Oracle's date formatting terms (for example: '2008-06-05 12:34:56 GMT'). 
But what it sent back was not the same string, but a date string that is converted back to the http standard form: 'Dy, DD-Mon-YYYY HH24:MI:SS "GMT"' (for example: 'Thu, 05-Jun-2008 12:34:56 GMT') (which, BTW, doesn't make sense to me as a standard, looking at it from a date parsing point of view: it's too complex). </p><p> Safari takes this even one step further: if the header isn't a date in http standard form, then the header is simply dropped. It simply doesn't send an If-Modified-Since header on the next request. </p><p> But it is safe to say that if your date in the Last-Modified header is in http standard form, <em>you will get the exact same string back</em> in the If-Modified-Since header. No browser appears to change the value of the date. It doesn't matter if your clock is off, or you're in the wrong time zone... All that matters is that you'll get the exact same date string back as you sent out. </p><p> So, as a rule of thumb, how do I recommend using it? I am now feeling that you ought not try to convert the date back to seconds-since-epoch, or whatever internal format you may be using, and next compare it to the file modification date. Instead, you ought to convert the file modification date to a standard http date string, and compare this string to the If-Modified-Since header. If it's the exact same string, then you may safely send out a "304 Not Modified" header and not much else (no body). If it's a different string, then send the whole file, headers and body, again. </p><p> It doesn't matter if your clock is off. All that matters, is that you must be consistent, and <em>always</em> format the same date/time into the same string. And then, it'll just work. </p><p> Note that using this scheme, it'll also send out the body if the Last-Modified header is technically later than the date in the If-Modified-Since header. 
That's not bad, instead, it's better: all too often I find that someone replaced a file on a webserver with a copy of an older file, and if you do check which date is the latest, then you will miss this change.</p> bart 2008-06-05T20:54:38+00:00 journal What I like/dislike about Perl Inspired by <a href="">jarich's post</a> (and thus indirectly by <a href="">brian_d_foy's request</a>) and by <a href="">a question on arrays on Perlmonks</a>, I've made a list of things I particularly like or dislike about Perl. Sometimes, items in both lists can be caused by the same things, so these things are the result of a compromise, and I don't think they can easily be improved. <p>Note that these are things from the top of my head, so it's likely that I've forgotten about some stuff that I normally have a huge axe to grind with. So, here goes: </p><p> <b>Likes about Perl:</b> </p><ul> <li>Regexps! Hashes! <br>Obviously, to me these were Perl's immediate selling points, 12 years ago. <br>But, obviously, since then, many other languages have copied both features.</li> <li>Simplicity/transparency of passing arguments to subs, using @_ <br>Data in <code>@_</code> is "passed by reference", it's assigning to local variables that makes it "pass by value"</li> <li>Flattening of lists: this makes <code>join(', ', @foo, @bar)</code> possible, and implementing a <code>min</code> function that can be used both for <code>min(@foo)</code> and for <code>min($x, $y, $z)</code></li> <li>scalar/list context</li> <li>sort function, compared to the mess in PHP (<code>array_multisort</code>) <br>Ease of implementing Schwartzian Transform, or-cache etc.</li> <li>regular expressions as first class objects, as opposed to languages where a regexp is a string, until you use it as a regexp (i.e. 
most of them)</li> <li>interpolation (doublequotish strings)</li> <li>garbage collection, using reference counting</li> <li>general syntax: no obsession with uniformity of syntax, the "adapted to humans" ad hoc syntax of Perl generally works very well</li> <li>transparency of implementation, introspection. Extending/modifying Perl from within Perl, is generally quite easy... For example: Hook::LexWrap, Carp, Fatal, Memoize</li> <li>closures</li> <li> <code>map</code>, <code>grep</code></li> <li> <a href="">overload</a></li> <li>The fact that "use" of a module happens on the Perl level, for example import, but also <code>BEGIN</code> (<code>INIT</code>, <code>CHECK</code>)</li> <li> <code>AUTOLOAD</code> for possibly loading modules or generating subs on the fly</li> </ul><p> <b>Dislikes:</b> </p><ul> <li>nested subs and lexicals in the outer sub just don't work well together (DWIM? Are you kidding me?) <p>What I really want, is the way it works in Javascript: inner subs are not visible outside the outer sub, and lexical variables from the outer sub are visible/accessible/shared in the inner subs </p><p>The reason for how it works in Perl, is likely because a <code>BEGIN</code> block is a <code>sub</code> (you may even use the keyword "<code>sub</code>", though nobody does that), that's executed immediately and subs used in a <code>BEGIN</code> block must be considered as global subs. </p><p>As a result, this makes implementing a framework similar to mod_perl, where a plug-in looks like a Perl script but is loaded from file once and next can be called many times, unnecessarily hard. 
</p><p>Hate, hate, hate!</p></li> <li>lack of formal parameters</li> <li>complex data structures can be hard, confusion between array and array reference: <code>['a', 'b', 'c']</code> is called an "anonymous array" but actually it's an array reference. The fact that Perl distinguishes between arrays and array references can be a good thing, but it has its disadvantages.</li> <li>lack of proper native support for aliasing, for scalars you can fake it with <code>for my $x ($y) {<nobr> <wbr></nobr>... }</code> and inside the block, <code>$x</code> is an alias to <code>$y</code>. There's no equivalent trick for aggregates</li> <li>Passing arrays/hashes by reference (of course) results in the need to access the hash in the sub through the reference. The above points make that not easy to remedy.</li> <li>lack of clean way to interpolate functions/method calls in doublequotish strings, @{[...]} is a hack</li> <li>constants, which are argument-less subs, and thus, you can't interpolate them in strings</li> <li>need for $, @, % for variables in not-interpolating context (syntax pollution)</li> <li>local doesn't work with lexical scalars -- it does work on individual items in arrays/hashes, even lexical ones.</li> <li>lack of ordering in hashes, the implicit "same order as insertion order" in PHP/Javascript, and as implemented in the modules Tie::IxHash/Tie::Hash::Indexed, works very well for me.</li> <li>for OO, lack of proper instance variables. Access to attributes is low level and looks ugly, like <code>$obj-&gt;{x}</code>. Direct access with <code>$x</code> instead of <code>$obj-&gt;{x}</code> would be most welcome, even if only as syntactic sugar.</li> <li>hard to parse syntax with tools. If you can't modify Perl from within, it's virtually impossible to change it on the source level. Source filters are generally considered a poor idea, because the chance of getting it wrong, is huge.</li> <li>lack of embedding of custom "small languages", like SQL. 
For example, for DBI, I'd really prefer it if the syntax of embedded SQL could be checked at compile time (maybe assuming a broad SQL syntax, even if this particular database doesn't like it). This is not practically feasible because of the previous point (hard to parse syntax / source filters bad).</li> <li>Exceptions are a hack. "eval BLOCK" is a terrible name, only used because "eval STRING" also catches errors... so it's a historical choice, not a functional one. It should have something like try/catch, or even the "<code>on error goto ERRLABEL</code>" from VB.</li> <li> <code>$SIG{__DIE__}</code> is called in <code>eval</code></li> <li>no functional/chaining versions of <code>s///</code> and <code>tr///</code> (as in Javascript)</li> </ul><p>Hmm, and that off the top of my head... It has become quite a long list, actually. </p><p> <b>update</b> I <em>knew</em> I had to forget something. So, without delay, the addendum: </p><p> <b>Pro:</b> </p><ul> <li>General execution speed (apart from a slight delay at startup)</li> </ul><p> <b>Contra:</b> </p><ul> <li>Memory footprint, even for tiny scripts: at least several megabytes. It's enough to have me often convert short, frequently run scripts into another language with a much smaller footprint.</li> </ul> bart 2008-05-10T06:39:09+00:00 journal LOLcat <p>I don't follow what's going on in the LOLcat world closely, but when I recently saw this entry, it really made me laugh. It's both so disrespectful and so... ordinary, at the same time.</p><p>Enjoy.</p><p><a href="">All he ever wanted...</a></p> bart 2008-05-08T19:40:59+00:00 journal More silliness The author used the regex snippet<nobr> <wbr></nobr><code>/(?!\n)\Z/</code> 17 times in the main source, instead of the simpler and equivalent<nobr> <wbr></nobr><code>/\z/</code>. 
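<p>A quick way to convince yourself of that equivalence, as a minimal sketch: the helper <code>match_offset</code> below is a made-up name for illustration, not anything taken from the module's source. It reports the offset at which each pattern matches in a few sample strings, with and without trailing newlines.</p>

```perl
use strict;
use warnings;

# Hypothetical helper (not from the module under discussion):
# returns the offset where $re matches in $string, or -1 if it doesn't match.
sub match_offset {
    my ($string, $re) = @_;
    return $string =~ $re ? $-[0] : -1;
}

for my $sample ("abc", "abc\n", "abc\n\n", "") {
    my $long  = match_offset($sample, qr/(?!\n)\Z/);
    my $short = match_offset($sample, qr/\z/);
    (my $shown = $sample) =~ s/\n/\\n/g;    # make newlines visible in the output
    printf "%-8s (?!\\n)\\Z at %d, \\z at %d, %s\n",
        qq("$shown"), $long, $short, $long == $short ? "same" : "DIFFERENT";
}
```

<p>For each of these samples both patterns match only at the very end of the string. A plain <code>\Z</code>, by contrast, would already match at offset 3 in <code>"abc\n"</code>, which is exactly the case the lookahead was added to exclude.</p>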
<p>From <a href="">perlre</a>:</p><blockquote><div><dl> <dt>\Z</dt><dd>Match only at end of string, or before newline at the end</dd> <dt>\z</dt><dd>Match only at end of string</dd> </dl></div> </blockquote><p>Duh? In a core module? Doesn't anybody but the maintainers ever <em>check</em> what goes into a core module?</p> bart 2008-03-06T16:56:12+00:00 journal weirdness <p>The latest official release of, version 1.9205, contains a null byte in its source file.</p><p>I am somewhat surprised that perl doesn't trip over it, as there have been far more innocuous things that have made it stumble in the past, like line endings of another platform (Mac/Unix).</p><p>In case you're wondering: it's in sub CPAN::Shell::recent, at the start of the line with contents (that appear in the source only once):</p><blockquote><div><p> <tt>$desc =~ s/.+? -<nobr> <wbr></nobr>//;</tt></p></div> </blockquote><p>p.s. It still is there in the latest developer release (1.92_57).</p> bart 2008-03-04T20:13:10+00:00 journal The end of a meme? <p>Oh no! According to <a href="">this news article</a>, Duke Nukem Forever <em>will</em> be released at the end of this year. Will this be the end of a meme? DNF is the <a href="">prototypical example of eternal vaporware</a>. Computer geeks love to make fun of it.</p><p>But, we're not there yet. It might still turn out right. Er, wrong. If not... we'll have to find something else to make fun of.</p> bart 2008-02-07T11:44:20+00:00 journal The pain of updating Perl <p>A few days ago I decided to upgrade ActivePerl on my laptop. Not the major upgrade to 5.10.0, not yet, I just wanted to have the new GUI version of PPM, just like I already had on my other computer. It's just a minor upgrade between builds of perl 5.8.8, from build 817 to build 822. That should be relatively painless... Not so. </p><p>Well, despite the fact that XS modules are binary compatible, the new build refuses to install on top of the older build. 
That means I'll have to uninstall perl, install the new version, and reinstall every module I had added. Ouch. </p><p>I remember having taken a Bundle snapshot with over a year ago, and it wasn't pretty: installing that bundle resulted in wanting to reinstall core modules. I didn't want to live through that again, besides, this being Windows, installing through CPAN would probably not be trivial for some modules. So this time, I was going to try to use PPM, and, preferably, automate it. </p><p>It's easy to get a list of modules installed with PPM, complete with version numbers into a file, with <code>ppm query *</code> or (is this new?) <code>ppm list</code>. (Oh, fun, apparently the output format has changed.). </p><p>But after that, I'm stuck. How the hell do you use that list to install those packages automatically? I'm stumped. I want to: </p><ul> <li>install modules I don't have yet, and</li> <li>upgrade modules that are out of date.</li> </ul><p>Simple enough. But it looks like having PPM just do that by feeding it that list, simply isn't in the list of supported features. </p><p>So I ended up installing most of these modules by hand, list in hand. Well, I tried. It turned out some of the modules were still not properly installed. For example, Crypt::SSLeay was missing its DLL, and Win32::API just didn't work. </p><p>So now, days later, I'm still stuck with an incomplete set of reinstalled, and possibly broken, modules. I now just have to install additional modules when I find some script is broken. Oh, joy. </p><p>And then, there are still some modules (WWW::Mechanize and HTML::TokeParser::Simple) of which the API had changed, so, with freshly installed (and upgraded) modules, my scripts just didn't work any more. I've had to figure out what changed, and modify the script. Not fun. </p><p>I'm not looking forward to upgrading to 5.10. </p><p>p.s. 
I have some vague plans, if necessary, to write a shell script, controlling ppm through the command line, to install or upgrade the whole list.</p> bart 2008-01-30T00:01:59+00:00 journal Badmouthing Perl Today, when browsing through the popular sites of the day, I found several sites where the author found it necessary to sneer at Perl, where it wasn't even the subject of the post. And I wasn't even searching for it. Is this the new custom? <ol> <li> <a href=""> GTK Hello World in Six Different Languages</a> <p>This is actually the least worrisome of the lot:</p><blockquote><div><p>Although I find writing Perl to be painful for everything but processing text files in a terminal, I found the Perl GTK bindings to be relatively straightforward.</p></div></blockquote></li> <li> <a href=""> "If you don't know how compilers work, then you don't know how computers work"</a> <p> Steve Yegge writes:</p><blockquote><div><p>You discover that jsdoc is a miserable sod of a Perl script that seg faults on about 50% of your code base, and bear with me here you've vowed never to write another line of Perl, because, well, it's Perl. Pick your favorite reason.</p></div></blockquote></li> <li> <a href=""> Can Dynamic Languages Scale?</a> <p>This is by far the worst of the lot, gratuitous Perl bashing:</p><blockquote><div><p>It's as Marx said, lo these many years ago: "From each language, according to its abilities, to each project, according to its needs." </p><p>Oh, except Perl. Perl just sucks, period.<nobr> <wbr></nobr>:-)</p></div> </blockquote></li> </ol><p> These people don't appear to even know Perl, or at least, don't appear to know it well enough. </p><p>Just stop it, please. It's not funny.</p> bart 2008-01-24T22:52:46+00:00 journal Video game skills <p>To relax, I occasionally play a little video game like Teagame's <a href="">TG Motocross 3</a>. 
I currently don't have any trouble finishing the game until I start to do stunts to gain more points.</p><p>My son, who is 7, tried it out, and he doesn't succeed in staying on the bike for longer than about 10 seconds. It's totally unplayable to him.</p><p>I tried my hand at <a href="">Fancy Pants Adventures</a>, but this game is unplayable to me. I barely manage to stay alive for more than 15 seconds. On the other hand, he has no trouble with that game at all, and within the first session (of a few hours, I admit), he succeeded in finishing the complete game more than once.</p><p>Different people can be more skilled in one game than in another, while it's the other way around for other people, even when they had never played these games before, and superficially, the games appear to require a similar skillset. It just goes to show that being good at video games is not a simple one-dimensional skill, one that you could just get a single score at. Neither is the difficulty level of such a game. Am I better at video games than him, or is it the reverse? Actually it's neither. It just is not that cut and dried.</p><p>n.b. if you like Fancy Pants (and apparently a lot of people do), note that the second game is out: <a href="">Fancy Pants Adventures World 2</a></p><p>Another game that we recently discovered, and which my son lately really is addicted to, is <a href="">Falling Sand Game</a> (requires Java). It has no aim at all but to enjoy the experience. There's no score, and it doesn't end.</p> bart 2008-01-20T21:07:07+00:00 journal comment on MJD's Clubbing someone to death with a loaded Uzi <p>Mark Jason Dominus is a Perl hacker and book author with a blog. In one of his latest entries,<br><a href="">Clubbing someone to death with a loaded Uzi</a>, he rather harshly critiques other people's (beginners) code. 
But whenever you do that, you should make sure your replacement code isn't dodgy itself.</p><p>There is no way to let people comment on the blog, so I'm posting my remarks here.</p><p>He writes:</p><blockquote><div><p>It could have been written like this:</p><blockquote><div><p> <tt>printf FILE "$LOCATION{$location}\,";<br>printf FILE "%4s", "$min3\,";<br>printf FILE "%4s", "$max3\,";<br>printf FILE "%1s", "$wx3\n";</tt></p></div> </blockquote></div> </blockquote><p>Eww. There's a few red flags in there:</p><ul> <li>Don't use <code>printf</code> where your <em>data</em> is used as a <em>template</em>. Granted, in this example, the data comes from a hash that is initialized with literal data stored in the script, but projects tend to evolve, and data just is not a template. If ever the data would contain a "%" sign, this code will blow up.</li><li>Why first join variables with other characters (such as commas), and <em>then</em> format the result with <code>printf</code>? Put the comma in the template.</li><li>There's no need for this to be broken up into 4 statements.</li></ul><p>In summary: this code could have become:</p><blockquote><div><p> <tt>printf FILE "%s,%3s,%3s,%1s\n", $LOCATION{$location}, $min3, $max3, $wx3;</tt></p></div> </blockquote><p>which is quite a bit shorter, and cleaner too, IMO.</p><p>But there are still more things wrong with the original code that he didn't discuss. For example:</p><blockquote><div><p> <tt>foreach $location_name (%LOCATION ) {<br>&nbsp; &nbsp; $location_code = $LOCATION{$location_name};<br>&nbsp; &nbsp;<nobr> <wbr></nobr>...</tt></p></div> </blockquote><p>H*ll, if you want to loop over the keys of a hash, at least don't loop over both the keys and the values! 
Granted, in MJD's replacement code, the loop is gone, so this problem has disappeared too, but this is a major mistake that shouldn't just be skipped over.</p><blockquote><div><p> <tt>foreach $location_name (keys %LOCATION ) {<br>&nbsp; &nbsp;<nobr> <wbr></nobr>...</tt></p></div> </blockquote><p>Otherwise, if one of your hash values ever happens to be used as a key in this hash too, you'll get noise output.</p><p>I am guessing that one reason why this person bothers to loop over a hash trying to find a particular hash item may be to avoid outputting anything in case the item isn't in the hash. MJD just drops this, and outputs a line <em>anyway</em>. So I'm adding that conditional back in:</p><blockquote><div><p> <tt>printf FILE "%s,%3s,%3s,%1s\n", $LOCATION{$location}, $min3, $max3, $wx3<br>&nbsp; if exists $LOCATION{$location};</tt></p></div> </blockquote><p>There. Comments? Anything that <em>I</em> overlooked?</p> bart 2008-01-10T22:51:42+00:00 journal tinyperl and perlbin (perltobin) (and PAR) <p>For my job I need to do the same few repetitive actions on about 35 Windows PCs. It's a kind of work that's quite tricky to do using the normal Windows GUI, but that is quite easy to script in Perl. There's still one problem: none of these PCs has Perl, and I'm not going to install it for a simple, one-time task.</p><p>So I was thinking of building a binary package from the script. I've always been quite impressed with <a href="">Graciliano M. P.</a>'s work on <a href="">TinyPerl</a>. So I decided to try to use it for this task.</p><p>I found it works pretty well, though there are a few warts:</p><ol> <li>The special variable <code>$0</code> has the value "-e", which makes it impossible for the script to find out its own directory, as I haven't found any other way to find that out</li><li>Use of <code>glob</code> makes it crash. 
I suspect there's an incompatibility between the DLL of <code>File::Glob</code> and <code>tinyperl.exe</code>, because it crashes also when you just use <code>tinyperl.exe</code> instead of plain <code>perl.exe</code>, and not just when running the packed script.</li><li>Its perl version is 5.8.0, which is not only quite old, but you know what they say about first releases of programs... that goes for Perl 5.8, too.</li><li>It decompresses its lib archive (which is in a separate ZIP file) in the directory the executable is in, which is an activity that's frowned upon ever since Windows XP came out, now 5 years ago... yet all too many programs still try to write to the directory the executable is in.<p>At least, PAR decompresses its archive (that is embedded in the executable itself instead of in a separate file) in a directory for temporary files. (Why do these files need to be decompressed to a file anyway? Why can't they be loaded from RAM instead?)</p></li><li>The project appears abandoned, in fact, everything GMPASSOS has put on CPAN hasn't been updated since somewhere in 2005. He hasn't been on Perlmonks in 2 years, either. Where has he gone to? I have no idea...</li></ol><p>Well that shatters any hope of getting any bugs fixed, at least, in a timely manner...</p><p>But somewhere on this project's homepage was a link to another, related Sourceforge project: <a href="">PerlBin</a>, from the formerly very active <a href="">crazyinsomniac</a> AKA <a href="">PodMaster</a>, who also vanished from the face of this earth...</p><p>I find it quite impressive. The project is only a Perl module, with a package of just 16k... and a script, called <code>perltobin</code>. You can allegedly use it for a vast array of platforms, provided you have a C compiler... 
or you can use the "binary" package for ActivePerl on Windows, either 5.6.x or 5.8.x, which doesn't require a C compiler at all: the pre-compiled to-be-linked-in library is included in the distribution.</p><p>It has a different set of properties, from the same (limited) point of view (for this project):</p><ol><li> <code>glob</code> works, though you need to explicitly invoke <code>File::Glob</code> to include the necessary files; that appears to be because <a href=";mode=module">Module::Dependency</a> doesn't seem to catch it by default.</li><li> <code>$0</code> is set to the path of the executable file, so it can deduce its own file path</li><li>Module files are copied into a file tree, not into a ZIP file. At least it doesn't require decompression into temp files, though for distribution it is quite large</li><li>Even though the project hasn't been updated in, what, 4 years, it actually still works, combined with the most recent distribution of ActivePerl</li><li>You have to manually delete the lib tree when you want to recompile the binary, after an edit.</li></ol><p>BTW it appears to be built upon GMPassos' work, but with a different focus, apparently.</p><p>And then, there is <a href="">PAR</a>. I haven't tried it for this project, but even though it has the great advantage of producing just one executable file, it also has the disadvantage of building temp files for the modules when it runs, which makes the behaviour less than professional; just like TinyPerl, but unlike <a href="">PerlApp</a>/<a href="">perl2exe</a> (AFAIK). Somehow I always found <code>PAR</code> quite daunting, so this has always been a bit of a showstopper to me.</p><p>But I like having the choice between alternatives, and I can dream of a merged project, with a "best of all worlds", and one that doesn't need any tempfiles, as I'm quite sure that it must be possible. I just don't know how, yet.</p> bart 2007-10-31T00:19:08+00:00 journal Belgian Perl Workshop Oct 27... 
anybody I know going? <p>In two weeks' time, the <a href="">first Belgian Perl Workshop</a> is going to take place. There are a few talks that interest me, although the schedule keeps shifting all the time... Are there going to be 2 rooms, or just one? I'm not sure. But that would not be the main reason for me to go, <em>if</em> I go; I'm still undecided.</p><p>The main reason to go for me would be maybe to finally meet up with some other Perl people that I may have known online, sometimes for many years already.</p><p>But it feels to me like I'm one of just a handful of people in Belgium that are actually using Perl and are engaged in the international Perl community. So... Are you going?</p> bart 2007-10-13T16:20:45+00:00 journal Installing ActivePerl and MinGW from scratch <p>As you may well know, in modern releases of ActivePerl 5.8.x, ActiveState has provided automatic support for (free) alternative C compilers to build XS modules in. I've been doing that for a while now... but I had no idea how hard it is these days, to install them both from scratch.</p><p>It's now the second time in about a month's time that I'm doing this. This time, I took extra care to carefully remove old installations, so I can safely say I'm doing a blank install. The only thing I already have, is MS nmake, and it's in PATH.</p><p>First I'm installing MinGW. Boy has that become easy! Just go to the <a href="">MinGW download page</a>, get the installer (MinGW-5.1.3.exe, the top link in the table), and run it. Choose what you want installed, and it'll just fetch the archives and install them. Wow. There's no longer a need for a "metre of beer" contest, this is just too simple.</p><p>The previous time I installed ActivePerl (build 820) I had a few problems: solvable, but problems nevertheless. This time, most (all?) of these glitches seem to have gone. That is good news.</p><p>I'm still using Win98 on my secondary computer, the primary being an XP laptop. 
Last time, I had problems running scripts (PPM, perldoc, CPAN) straight from the command line. They refused to run; the only error message I got was "syntax error", which doesn't sound like a Perl error message. I could still run them using</p><blockquote><div><p> <tt>perl -S PPM</tt></p></div> </blockquote><p>so I think it may have been related to something pl2bat did. I thought the resulting scripts were Unix text files, or mixed half Unix text files, half Windows text files. But that seems to have been fixed, it works now.</p><p>Next test: CPAN. The previous time building XS files didn't work out of the box, I got weird C compiler errors. After some hours of digging and finally fixing the problem, I found out on ActiveState's bug tracking system it <a href="">had already been solved months earlier</a>. What the...? But, looking at the <a href="">patch list for this build</a>, it looks like Jan Dubois has <a href="">followed his own advice</a>, as all compiler related values have been disabled in configpm, the Perl script that in the source distribution is used to build Config.pm. That implies that now, at least, MinGW <em>should</em> be able to build XS modules out of the box. Excellent.</p><p>Except: the version of CPAN.pm that comes with it (1.9102) is broken. It complains about an unimplemented flock call. So I downgraded by copying the previous version out of the older distribution (1.7602). Did I mention I use the AS distro, not the MSI file? I do. It's a plain ZIP file with a relocation script. Nice and transparent for cases like this. After undergoing a bit of panicking by CPAN.pm about the<nobr> <wbr></nobr><code>.lock</code> file, and manually deleting it a few times, it finally ran.</p><p>I had to first manually add the bin directory for MinGW to my PATH, because ActivePerl didn't see my gcc. Perhaps that is better in XP; the mechanism to permanently set environment variables is different there.</p><p>But then, MinGW can indeed build XS modules out of the box. 
I tried Text::CSV_XS, HTML::Parser, DBI, and DBD::SQLite as test cases; only DBI had some tests fail, but I suppose that is, again, because of the platform. There were some complaints about flock not being implemented.</p><p>All in all, things are looking good, except that the included CPAN version refuses to work on Win98. But you probably will not have that problem on XP.</p> bart 2007-08-22T00:58:44+00:00 journal CMS So you're thinking about using a content management system for a new website? Here are a few thoughts on some popular CMS systems on the <a href=""></a> site: <a href="">poofygoof hates software: content management systems</a>. <p>A few quotes I particularly liked: </p><dl> <dt>Plone</dt><dd>however, after reading further, it appears that plone has its own idea of how databases should work, and doesn't use a SQL backend out of the box. thanks, zopers, but I have better things to do than wank in python.</dd><dt>Drupal</dt><dd>web input methods are _STILL_ the simplistic hack they always were, and the last thing I want to do is handle content generation WITHIN A WEB BROWSER. I want to view it in a browser, but not generate it there. I'll edit XML, even. drupal does not appear to accomodate.</dd></dl> bart 2007-07-12T22:58:46+00:00 journal Functional approach for weighted random picks out of a list <p>I have been trying to find a good way to pick a random item from a list, where items do not have the same probability of being picked: using weighted probabilities.</p><p>A way to visualize the problem, which is also a way to tackle it, is by representing each item by a book (or plank), with books of unequal thickness. 
You make a pile of all the books, measure the total height of the pile, randomly choose a height between zero and that total height, and see which book lies at that height.</p><p>But I feel this is a very clumsy way: you have to assign an order to the items, calculate a running total for the heights of the items that come in front of it to determine at what height each item lies, and all that just to pick one item. If an item is added to or removed from the list, you have to start over with all the administrative work of ordering and summing. What a contrast with equally weighted probabilities, where you just have to pick an index number.</p><p>So I thought, surely there must be a more straightforward, functional way? One that could possibly even work in plain SQL? Because there is such a way to randomly shuffle a list of items with equal probabilities: just add an extra column with a random value (<code>rnd</code>, or <code>rand</code> in Perl, <code>dbms_random.value</code> in Oracle), and sort the rows according to this column. In Oracle:</p><blockquote><div><p> <tt>select dbms_random.value, T.* from mytable T order by 1</tt></p></div> </blockquote><p>All items have an equal chance of winning, i.e. coming out first, or getting the lowest random value, irrespective of the distribution of the randomness, provided the subsequent random values are sufficiently uncorrelated (meaning you can't really predict the next random value based on the previous values, which actually is an illusion in pseudo-random generators) and the probabilities are the same for all, as there is nothing skewing the chance of winning in favor of any of the items. Hence: they must all have an equal chance of winning.</p><p>But is there such a way to do the same with weighted probabilities? There must be. So I decided to explore. 
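</p><p>For comparison, the pile-of-books approach described above might look like this in Perl. This is only a minimal sketch: the name <code>pick_by_pile</code>, the optional third argument (a fixed height, handy for testing) and the sample weights are assumptions made for illustration, not code from anywhere else.</p>

```perl
use strict;
use warnings;

# Pile-of-books pick: choose a random height in the pile, then walk
# the pile, subtracting each thickness until the height falls inside
# a book. pick_by_pile is a hypothetical name.
sub pick_by_pile {
    my ($items, $weights, $height) = @_;   # $height is optional
    my $total = 0;
    $total += $_ for @$weights;            # total height of the pile
    $height = rand($total) unless defined $height;
    for my $i (0 .. $#$items) {
        return $items->[$i] if $height < $weights->[$i];
        $height -= $weights->[$i];
    }
    return $items->[-1];    # guard against floating point rounding
}

my @items   = qw(a b c);
my @weights = (1, 2, 3);
print pick_by_pile(\@items, \@weights), "\n";
```

<p>Every pick first sums the whole pile and then walks it in order, which is the bookkeeping the rest of this entry tries to avoid.</p><p>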
I am planning on comparing the values of <code>$rand[$i]</code>, which is a weighted random function: a random number where the probability distribution depends on the value of the index <code>$i</code>. So I'm dusting off my math skills. As my confidence in them is not too great (I know I'm bound to make plenty of mistakes), I'll use experimental results to verify the calculations, so any blatant mistakes should stand out.</p><p>Eventually, by pure luck, I <em>did</em> find a formula that works, but not at first. Here's how I found it.</p><p>I'll start with just 2 items. What is the probability that item 2 will win? The way you calculate that is like this: you find out what values X you can possibly have for item 1's random value, determining the probability that the value is X (or, for continuous functions, in a small range around X between X and X+dx, which for small enough values of dx becomes virtually proportional to dx). Next, you have to determine the probability that item 2 beats this. You multiply the two, to calculate the probability of both happening at the same time, and finally, you add all these results for all possible values of X (integrate, in case of continuous functions) to take care of all possible cases: and you end up with the probability of item 2 winning, period.</p><p>The common generic representation is a formula</p><blockquote><div><p> <tt>Sum(P(A)*P(B|A))</tt></p></div> </blockquote><p>(the probability of A happening, times the probability of B happening provided A did happen, summed over all possible cases for A) which is, applied here:</p><blockquote><div><p> <tt>Sum(P(x1 between X and X+dx)*P(x2 &lt; X))</tt></p></div> </blockquote><p>To work. My first thought was to just multiply (or divide) <code>rand()</code> by its weight: </p><blockquote><div><p> <tt>$rand[$i] = $weight[$i]*rand;</tt></p></div> </blockquote><p> Surely this would result in a weighted probability; only, I had no idea what the relation is to the weight factors. 
With just 2 items, the formula is equivalent to:</p><p>Item 1 wins if <code>$rand[1] &lt; $rand[2]</code>, which is the same result I would get with</p><blockquote><div><p> <tt>$k = $weight[2]/$weight[1];<br>$rand[1] = rand;<br>$rand[2] = $k*rand;</tt></p></div> </blockquote><p>With <code>k &gt;= 1</code> (if this isn't the case, just swap the two items), with <code>X=x1</code>, the chance that <code>x2&lt;X</code> is <code>X/k</code>. Integrated over X from 0 to 1, this leads to a probability of item 2 winning of <code>1/(2*k)</code>. This doesn't feel right: with a factor <code>k=2</code> item 2 has a probability 1/4 of winning, and item 1 has 3/4. That's a ratio of 1/3, not 1/2. Experiments confirm this result.</p><p>So let's try again for <code>k&lt;=1</code>. There's an asymmetry for the value of <code>k</code> above or below 1, and the reason for this asymmetry is clipping: with <code>k&lt;1</code>, there's a threshold for $rand[1] above which item 2 cannot possibly lose, and that threshold value is k. So the probability of item 2 winning for a value x for $rand[1] is broken into 2 pieces:</p><blockquote><div><p> <tt>x/k&nbsp; &nbsp; for 0 &lt;= x &lt;= k<br>1&nbsp; &nbsp; &nbsp; for k &lt;= x &lt; 1</tt></p></div> </blockquote><p>Integration over the range 0 to 1 for x yields <code>k**2/(2*k)+(1-k) = 1-k/2</code>. (for <code>k=1/2</code>, this is 3/4)</p><p>Time to confirm the results:</p><blockquote><div><p> <tt>use Math::Random::MT qw(rand);<br>my $k = 2;<br>my @n = ( 0 , 0 );<br>for my $i (1<nobr> <wbr></nobr>.. 1E5) {<br>&nbsp; &nbsp; my @rand;<br>&nbsp; &nbsp; $rand[0] = rand;<br>&nbsp; &nbsp; $rand[1] = $k*rand;<br>&nbsp; &nbsp; $n[index_with_min_value(@rand)]++;<br>}<br>use Data::Dumper;<br>print Dumper \@n;<br> &nbsp; <br>sub index_with_min_value {<br>&nbsp; &nbsp; # what index has the lowest value in a list?<br>&nbsp; &nbsp; my $ix = 0;<br>&nbsp; &nbsp; my $min = $_[0];<br>&nbsp; &nbsp; for my $i (1<nobr> <wbr></nobr>.. 
$#_) {<br>&nbsp; &nbsp; &nbsp; &nbsp; if($_[$i] &lt; $min) {<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $ix = $i;<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $min = $_[$i];<br>&nbsp; &nbsp; &nbsp; &nbsp; }<br>&nbsp; &nbsp; }<br>&nbsp; &nbsp; return $ix;<br>}</tt></p></div> </blockquote><p>The result of a test run is:</p><blockquote><div><p> <tt>$VAR1 = [<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 75088,<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 24912<br>&nbsp; &nbsp; &nbsp; &nbsp; ];</tt></p></div> </blockquote><p>I'm using a better random number generator than the one that comes with Perl. The standard random number generator in ActivePerl/Win32 has a meager 15 bits of resolution, so you get at most 32000-something different values, and it has a repetition period in the same area. Thus for the above experiment, you'd go through the sequence of all the possible random values 3 times. The generator I chose, an implementation of <a href="">Mersenne Twister</a>, has a period of 4.315425E6001, which is like forever, and it has (in this implementation) a resolution of 32 bits. By just importing the function <code>rand</code>, it can replace the built-in <code>rand</code>. So I'm hoping it is better.<nobr> <wbr></nobr>:)</p><p>The results are very skewed in favor of the values with the smaller factor: a sample run with 3 items with weight factors (1, 2, 3) produced the result:</p><blockquote><div><p> <tt>$VAR1 = [<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 63890,<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 22216,<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 13894<br>&nbsp; &nbsp; &nbsp; &nbsp; ];</tt></p></div> </blockquote><p>Thus item 3, with a weight factor that's 3 times larger, is about 5 times less likely to win.</p><p>So I was thinking, what if I picked a distribution that is free of clipping, one that has no upper limit? 
A nice candidate distribution is the exponentially damped function, a function that you can see a lot in nature:</p><blockquote><div><p> <tt>P[X &gt; x] = exp(-x)</tt></p></div> </blockquote><p>with <code>X &gt; 0</code> and <code>x &gt; 0</code>, but with no upper bound.</p><p>This function's graph looks pretty much alike independent of the actual value of x: P just scales when x shifts.</p><p>So I wanted to compare results for probability distributions of <code>exp(-x)</code> for item 1, with <code>exp(-k*x)</code> for item 2. (This time, I decided to let the larger value for x win, because P[x &lt; X] doesn't look as nice as a function.)</p><p>The probability for x1 (value for item 1) to be between x and x+dx is <code>exp(-x)*dx</code>. The probability of item 2 beating that (having a bigger value) is <code>exp(-k*x)</code>. (A larger k results in a faster decaying probability function.)</p><p>Integration of <tt>&#8747; exp(-k*x)*exp(-x)*dx [0 &#8734;]</tt> yields the value <code>1/(1+k)</code>.</p><p>That is nice: with k=2 I get 1/3, and its counterpart, with k=0.5, gets 1/(1.5) = 2/3. Those are the numbers I'm after, as their ratio is 2! But do note that the larger factor yields a smaller probability, so you have to <em>divide</em> by the weight, not <em>multiply</em>.</p><p>So how can you get a distribution that's exponential, out of a plain uniform distribution? By using a transforming function: <code>x=fn(rand)</code>. With $y=rand; that is plainly following a uniform distribution between 0 and 1, and the required equivalence <code>$y==exp(-$x)</code>, we end up with the transforming function <code>$x=-log($y)</code>. Since the probability that <code>$y&lt;Y</code> (with Y between 0 and 1) is Y, the probability that <code>($x=-log($y)) &gt; (X=-log(Y))</code> is <code>Y == exp(-X)</code>. (Note the swapping of the directions: a smaller $y yields a larger $x.) So this is exactly the result I am after, with just one nitpick: the border cases. 
<code>rand</code> can become zero, which is a value that <code>log()</code> chokes on; but it'll never be exactly 1, which is a value that <code>log()</code> doesn't mind. So I'm reversing the direction for y, and replacing <code>-log(rand)</code> with <code>-log(1-rand)</code>. For the rest, nothing changes, as the probability for in-between values remains unchanged.</p><p>That's enough theory, so let's confirm these results through experimenting. As item 2 has a weight of 2, and item 3 a weight of 3:</p><blockquote><div><p> <tt>use Math::Random::MT qw(rand);<br>my @w = (1, 2, 3);<br>my @n = (0, 0, 0);<br>for my $i (1<nobr> <wbr></nobr>.. 1E5) {<br>&nbsp; &nbsp; my @x;<br>&nbsp; &nbsp; $x[$_] = -log(1 - rand)/$w[$_] for 0<nobr> <wbr></nobr>.. $#w;<br>&nbsp; &nbsp; $n[index_with_min_value(@x)]++;<br>}<br>use Data::Dumper;<br>print Dumper \@n;<br> &nbsp; <br>sub index_with_min_value {<br>&nbsp; &nbsp; # what index has the lowest value in a list?<br>&nbsp; &nbsp; my $ix = 0;<br>&nbsp; &nbsp; my $min = $_[0];<br>&nbsp; &nbsp; for my $i (1<nobr> <wbr></nobr>.. $#_) {<br>&nbsp; &nbsp; &nbsp; &nbsp; if($_[$i] &lt; $min) {<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $ix = $i;<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $min = $_[$i];<br>&nbsp; &nbsp; &nbsp; &nbsp; }<br>&nbsp; &nbsp; }<br>&nbsp; &nbsp; return $ix;<br>}</tt></p></div> </blockquote><p>Results:</p><blockquote><div><p> <tt>$VAR1 = [<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 16621,<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 33235,<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 50144<br>&nbsp; &nbsp; &nbsp; &nbsp; ];</tt></p></div> </blockquote><p>Bingo. As you can see, the second item is twice as likely to win as the first item, with a weighting factor of 2 vs. 1; and the third item has 3/2 times the chance of winning compared to the second one, with weighting factors 3 vs. 2. It's just perfect.</p><p>And I'm sure you can use this safely in (Oracle) SQL.</p> bart 2007-06-29T00:30:17+00:00 journal
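<p>The formula from the weighted random picks entry above could be wrapped in a small reusable helper. What follows is a minimal sketch under stated assumptions: the name <code>weighted_pick</code>, the optional <code>$draw</code> code ref (any function returning a uniform number from 0 up to, but not including, 1) and the skipping of non-positive weights are additions for illustration, not part of the experiment above. It uses the built-in <code>rand</code> by default.</p>

```perl
use strict;
use warnings;

# weighted_pick (hypothetical name): returns the index of the winner.
# Each index gets the value -log(1 - u)/weight with u uniform in [0, 1),
# and the lowest value wins, as in the experiment above.
sub weighted_pick {
    my ($weights, $draw) = @_;
    $draw ||= sub { rand() };              # assumption: any uniform [0, 1) source
    my ($best_ix, $best_x);
    for my $i (0 .. $#$weights) {
        next unless $weights->[$i] > 0;    # a non-positive weight can never win
        my $x = -log(1 - $draw->()) / $weights->[$i];
        if (!defined $best_x or $x < $best_x) {
            ($best_ix, $best_x) = ($i, $x);
        }
    }
    return $best_ix;                       # undef if nothing has a positive weight
}

my @w = (1, 2, 3);
my @n = (0) x @w;
$n[ weighted_pick(\@w) ]++ for 1 .. 1e5;
print "@n\n";
```

<p>Unlike the pile approach, no ordering or running total is needed: each item only needs its own weight and its own random draw, which is what should make the same expression usable as a sort key.</p>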