ethan's Journal ethan's use Perl Journal en-us use Perl; is Copyright 1998-2006, Chris Nandor. Stories, comments, journals, and other submissions posted on use Perl; are Copyright their respective owners. 2012-01-25T02:10:10+00:00 pudge Technology hourly 1 1970-01-01T00:00+00:00 ethan's Journal CPAN detergent needed <p>There was a time when a pathetic addition to the CPAN was merely a minor annoyance, easy to tolerate because such a module did not interfere with one's own work.</p><p>Nowadays, however, <a href="">one rotten CPAN egg</a> is capable of making the tests of your own modules fail, as can be seen <a href="">here</a> and <a href="">here</a>.</p><p>Mind you, the offending module is neither used nor mentioned anywhere in the modules whose tests it causes to fail. The failures happen because this pathetic piece of shit <a href="">doesn't adhere</a> to any CPAN packaging standards, and as a result its files and directories somehow get merged with those of other modules on tarball extraction.</p><p>And now I am waiting for those who still claim that CPAN's policy should be as liberal as possible when it comes to uploads. No, it shouldn't. Instead, uploads that obviously breach CPAN conventions should be rejected. Furthermore, those that were uploaded in the past ought to be deleted immediately.</p><p>I suppose it's not going to happen. That means that my modules will continue to fail their tests on that particular machine until its admin finally removes the offending files by hand.</p> ethan 2005-09-29T05:56:39+00:00 journal Test::LongString <p>For a long time I didn't quite see the purpose of <code>Test::LongString</code>.</p><p>Until I had to compare two binary strings of around 170K with each other and C garbled my terminal when they turned out to be unequal. Rafael++.</p> ethan 2005-05-16T08:34:35+00:00 journal toke.c <p>I never quite understood why Perl offered no hooks into its lexer and parser.
They're contained in the interpreter, the very same program that runs my Perl scripts.</p><p>So I snuck a peek at the dreaded <code>toke.c</code>. My initial thought was that it was merely a matter of calling <code>yylex()</code> after initializing a few of the global <code>PL_*</code> variables appropriately. On closer inspection, however, there turned out to be exactly 99 of these global variables involved in the lexing process, including those dealing with the various perl stacks, control OPs and symbol tables.</p><p>So what I did was create a C++ class with 99 member variables. Each function in <code>toke.c</code> became a method that no longer works on <code>PL_variable</code> but on <code>this-&gt;pl_variable</code> instead. Some functions unrelated to the lexer had to be modified likewise, such as <code>Perl_init_stacks()</code> and a handful of the <code>Perl_save_*()</code> functions in <code>scope.c</code>. The whole purpose of this was to make the lexer re-entrant.</p><p>With these adjustments (and a few hundred #undefs/#defines), the actual XS code is tiny:</p><blockquote><div><p> <tt>MODULE = Perl::Lexer&nbsp; &nbsp; &nbsp; &nbsp; PACKAGE = Perl::Lexer<br> &nbsp; <br>Lexer *<br>Lexer::new ()<br>&nbsp; &nbsp; CODE:<br>&nbsp; &nbsp; {<br>&nbsp; &nbsp; &nbsp; &nbsp; RETVAL = new Lexer();<br>&nbsp; &nbsp; &nbsp; &nbsp; RETVAL-&gt;Pinit_stacks(aTHX);<br>&nbsp; &nbsp; }<br>&nbsp; &nbsp; OUTPUT:<br>&nbsp; &nbsp; &nbsp; &nbsp; RETVAL<br>&nbsp; &nbsp; CLEANUP:<br>&nbsp; &nbsp; &nbsp; &nbsp; RETVAL-&gt;ME = newSVsv(ST(0));<br> &nbsp; <br>void<br>Lexer::set_string (SV *line)<br>&nbsp; &nbsp; CODE:<br>&nbsp; &nbsp; {<br>&nbsp; &nbsp; &nbsp; &nbsp; THIS-&gt;lex_start(aTHX_ line);<br>&nbsp; &nbsp; }<br> &nbsp; <br>void<br>Lexer::next_token ()<br>&nbsp; &nbsp; CODE:<br>&nbsp; &nbsp; {<br>&nbsp; &nbsp; &nbsp; &nbsp; int tok = THIS-&gt;yylex(aTHX);<br> &nbsp; <br>&nbsp; &nbsp; &nbsp; &nbsp; /* skip empty lines */<br>&nbsp; &nbsp; &nbsp; &nbsp; if
(tok &amp;&amp; THIS-&gt;bufptr)<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; while (*THIS-&gt;bufptr == '\n') THIS-&gt;bufptr++;<br> &nbsp; <br>&nbsp; &nbsp; &nbsp; &nbsp; if (tok == 0)<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; XSRETURN_EMPTY;<br> &nbsp; <br>&nbsp; &nbsp; &nbsp; &nbsp; EXTEND(SP, 2);<br>&nbsp; &nbsp; &nbsp; &nbsp; ST(0) = sv_2mortal(newSViv(tok));<br>&nbsp; &nbsp; &nbsp; &nbsp; ST(1) = sv_2mortal(newSVpv(TOKENNAME(tok), 0));<br>&nbsp; &nbsp; &nbsp; &nbsp; XSRETURN(2);<br>&nbsp; &nbsp; }<br> &nbsp; <br>void<br>Lexer::DESTROY ()</tt></p></div> </blockquote><p>And here is a sample script along with its output:</p><blockquote><div><p> <tt>use blib;<br>use Perl::Lexer;<br> &nbsp; <br>my $string = &lt;&lt;'EOS';<br>$a{1} = 1;<br>print keys %a;<br>EOS<br> &nbsp; <br>my $lexer = Perl::Lexer-&gt;new;<br>$lexer-&gt;set_string($string);<br>while (my $l = $lexer-&gt;next_token) {<br>&nbsp; &nbsp; print $l, " ";<br>}<br>print "\n";<br> &nbsp; <br>__END__<br>$ WORD { THING ; } ASSIGNOP THING ; LSTOP UNIOP % WORD ; ;</tt></p></div> </blockquote><p>A couple of problems still exist: once the lexer sees a comment, an empty line or a shebang line, it seems to gobble up all characters up to the end of the string and thus finishes scanning. The shebang line is handled in <code>S_find_beginning()</code> in <code>perl.c</code> before parsing even starts. As for empty lines, I suppose they are handled by perl's parser and not its lexer.</p><p>The last thing that needs to be done is making the attributes belonging to each token available. Ideally, this is just a matter of exposing <code>yylval</code> to the outside world.</p> ethan 2005-04-18T07:55:19+00:00 journal Kwalitee <p>Lately I poked around a bit at <a href="">CPANTS</a>.
I should perhaps say that I don't really believe in this kwalitee thing and mostly agree with what Schwern wrote about it on <a href="">cpanratings</a>.</p><p>Nonetheless, I couldn't resist looking up the scores of my modules. They weren't as high as they could be because so far I hadn't done any POD-coverage testing. This can easily be rectified. Then I noticed an annoying thing about <code>Test::Pod</code> and <code>Test::Pod::Coverage</code>: they claim to depend on <code>Test::More</code>. This is a problem for some of my modules which are supposed to run on older perls too, so their tests use only the plain <code>Test</code> module. Yet CPANTS somehow tickled my vanity, so I came up with test suites that use <code>Test::Pod</code> and <code>Test::Pod::Coverage</code> respectively without any need for <code>Test::More</code>:</p><blockquote><div><p> <tt>eval "use Test::Pod";<br>if ($@) {<br>&nbsp; &nbsp; print "1..0 # Skip Test::Pod not installed\n";<br>&nbsp; &nbsp; exit;<br>}<br> &nbsp; <br># 'make test' runs from the distribution's top directory<br>my @PODS = qw#blib#;<br> &nbsp; <br>all_pod_files_ok( all_pod_files(@PODS) );</tt></p></div> </blockquote><p>and in a similar fashion for <code>Test::Pod::Coverage</code>.</p><p>Of course, this exposes my modules to other potential problems, such as a change in the <code>Test::Harness</code> protocol, and thus makes them more flaky. Still, the kwalitee score is now higher, which in itself says something about the value of the kwalitee measurement.</p><p>Also, I think this test from CPANTS:</p><blockquote><div><p> <tt>is_prereq<br>&nbsp; &nbsp; Shortcoming: This distribution is only required by 2 or less other distributions.<br>&nbsp; &nbsp; Defined in: Module::CPANTS::Generator::Prereq</tt></p></div> </blockquote><p>is plain wrong and worthless. Why should it be a good thing that a module is used as a prerequisite by other modules?
There's a whole class of modules on the CPAN that provide very high-level functionality and will therefore never be a prerequisite (think of modules such as <a href="">Mail::Box</a> or <a href="">MPEG::MP3Play</a>). This applies to all modules that are used to write applications rather than other modules.</p> ethan 2005-03-31T11:51:27+00:00 journal Increasing turn-around times <p>What is it with g++?</p><p>I am currently wrapping the fairly huge <a href="">SndObj library</a>, written in C++, into an XS module. I'm done with maybe a third of all the classes and compilation is already painful:</p><blockquote><div><p> <tt>ethan@ethan:~/Projects/dists/sound-object/Sound-Object$ time make<br>/usr/bin/perl /usr/share/perl/5.8/ExtUtils/xsubpp&nbsp; -C++ -typemap /usr/share/perl/5.8/ExtUtils/typemap -typemap typemap&nbsp; Object.xs &gt; Object.xsc &amp;&amp; mv Object.xsc Object.c<br>g++ -c&nbsp; -I. -D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2&nbsp; &nbsp;-DVERSION=\"0.01\" -DXS_VERSION=\"0.01\" -fPIC "-I/usr/lib/perl/5.8/CORE"&nbsp; -DOSS Object.c<br>rm -f blib/arch/auto/Sound/Object/<br>LD_RUN_PATH="" g++&nbsp; -shared -L/usr/local/lib Object.o&nbsp; -o blib/arch/auto/Sound/Object/; &nbsp;-lsndobj -lpthread<br>chmod 755 blib/arch/auto/Sound/Object/<br> &nbsp; <br>real&nbsp; &nbsp; 0m38.023s<br>user&nbsp; &nbsp; 0m31.130s<br>sys&nbsp; &nbsp; &nbsp;0m0.570s</tt></p></div> </blockquote><p>Each additional method in the XS adds roughly one second of compilation time. Once in a while (mostly after rearranging the order of the packages in the XS file), compilation takes up to one minute. I am quite aware that what really slows down C++ compilation is the extensive use of templates. But SndObj uses next to no templates (not even string objects).
The XS glue uses only one template worth mentioning: a map of maps for ref-counting the Perl objects when a C++ object is constructed on behalf of another one.</p><p>Also, this slowness seems to be most pronounced with g++. Microsoft's C++ compiler is dramatically faster: it hardly ever needs more than 5 seconds for a far more complex Windows Forms .NET application which uses templates all over the place. This might be due to its use of precompiled headers.</p> ethan 2005-03-15T08:24:51+00:00 journal TIOCLINUX and gpm <p>For ages I've been looking for a way to make gpm's selection on the console available under X and vice versa. Yesterday I decided to do something about it. Looking at gpm's sources, it all seemed straightforward. With the <code>TIOCLINUX</code> ioctl one can put the current selection into the kernel's selection buffer or make the kernel paste it to a given file descriptor.</p><p>At this point, however, the idiocy begins: there is no way to get a copy of the kernel's selection buffer. All you can do is have the kernel write it to a file descriptor, which has to be attached to a device capable of TIOCLINUX. This is not very helpful. I want the actual content of this buffer so that I can pass it to an application such as <code>xclip</code>. But there is no infrastructure in the kernel for that. Even worse, after googling around a little I noticed that a few years ago someone actually submitted a patch to <code>console.c</code> which would allow that. For some reason the kernel nitwits rejected it.</p><p>Fortunately my kernel is out of date anyway, so I have to compile a new one.
This time I'll make the necessary adjustments to the source, extend gpm a bit to pass the selection to xclip, and then hopefully have a more useful mouse.</p> ethan 2005-03-03T06:56:17+00:00 journal Benchmarking perls <p>Partly because I didn't have anything interesting to do, I wrote a little set of modules to benchmark perls against each other. There already is <a href="">perlbench</a> on the CPAN, but I found the way its tests have to be written inconvenient.</p><p>According to my results, real-world programs seem to run faster on recent perls. Some other things, on the other hand, are much slower, most notably regexes (by almost a factor of 2 when comparing 5.5.4 with a threaded 5.8.6). The results (run times relative to perl5.5.4 = 1000, so lower is faster):</p><blockquote><div><p> <tt>Benchmarks&nbsp; &nbsp; &nbsp;| perl5.5.4 | perl5.6.2 | perl | perl5.8.6 | perl5.8.6th | Weight<br>---------------+-----------+-----------+------+-----------+-------------+-------<br>bench/loops&nbsp; &nbsp; |&nbsp; &nbsp;1000&nbsp; &nbsp; |&nbsp; &nbsp; 906&nbsp; &nbsp; |&nbsp; 814 |&nbsp; &nbsp; 731&nbsp; &nbsp; |&nbsp; &nbsp; &nbsp;782&nbsp; &nbsp; &nbsp;|&nbsp; &nbsp; &nbsp;70<br>bench/regex&nbsp; &nbsp; |&nbsp; &nbsp;1000&nbsp; &nbsp; |&nbsp; &nbsp;1258&nbsp; &nbsp; | 1978 |&nbsp; &nbsp;1457&nbsp; &nbsp; |&nbsp; &nbsp; 1977&nbsp; &nbsp; &nbsp;|&nbsp; &nbsp; 116<br>bench/recurse&nbsp; |&nbsp; &nbsp;1000&nbsp; &nbsp; |&nbsp; &nbsp; 961&nbsp; &nbsp; |&nbsp; 983 |&nbsp; &nbsp; 956&nbsp; &nbsp; |&nbsp; &nbsp; &nbsp;991&nbsp; &nbsp; &nbsp;|&nbsp; &nbsp; &nbsp;73<br>bench/wave&nbsp; &nbsp; &nbsp;|&nbsp; &nbsp;1000&nbsp; &nbsp; |&nbsp; &nbsp; 860&nbsp; &nbsp; |&nbsp; 994 |&nbsp; &nbsp; 894&nbsp; &nbsp; |&nbsp; &nbsp; &nbsp;925&nbsp; &nbsp; &nbsp;|&nbsp; &nbsp; 236<br>bench/autoload |&nbsp; &nbsp;1000&nbsp; &nbsp; |&nbsp; &nbsp; 968&nbsp; &nbsp; | 1095 |&nbsp; &nbsp;1078&nbsp; &nbsp; |&nbsp; &nbsp; 1159&nbsp; &nbsp; &nbsp;|&nbsp; &nbsp; 139<br>bench/substr&nbsp; &nbsp;|&nbsp; &nbsp;1000&nbsp; &nbsp; |&nbsp; &nbsp;
947&nbsp; &nbsp; |&nbsp; 984 |&nbsp; &nbsp; 986&nbsp; &nbsp; |&nbsp; &nbsp; &nbsp;957&nbsp; &nbsp; &nbsp;|&nbsp; &nbsp; 164<br>bench/mail&nbsp; &nbsp; &nbsp;|&nbsp; &nbsp;1000&nbsp; &nbsp; |&nbsp; &nbsp; 698&nbsp; &nbsp; |&nbsp; 903 |&nbsp; &nbsp; 748&nbsp; &nbsp; |&nbsp; &nbsp; &nbsp;959&nbsp; &nbsp; &nbsp;|&nbsp; &nbsp; 198<br>---------------+-----------+-----------+------+-----------+-------------+-------<br>Overall&nbsp; &nbsp; &nbsp; &nbsp; |&nbsp; &nbsp;1000&nbsp; &nbsp; |&nbsp; &nbsp; 910&nbsp; &nbsp; | 1085 |&nbsp; &nbsp; 960&nbsp; &nbsp; |&nbsp; &nbsp; 1082&nbsp; &nbsp; &nbsp;|&nbsp; &nbsp;1000</tt></p></div> </blockquote><p>The fourth column is the ordinary Debian testing perl (perl5.8.4 with threads). The benchmarks can be found <a href="">here</a>. I consider some tests almost irrelevant, namely <code>autoload</code>, an OO version of the Fibonacci-number generator in which each recursive call is autoloaded. Also, <code>loops</code> is not interesting because empty loops are unlikely to show up often in real code. The real-world programs are <code>wave</code>, which decreases the volume of an 18 MB WAV file; <code>mail</code>, which walks through a 28 MB mailbox; and <code>substr</code>, which calculates the length of the longest common substring of <code>perldoc -tT perlfunc</code>. It's 647, by the way; would you have guessed?</p><p>The rightmost column is the relative amount of time (per mille of the total) each test took on the leftmost perl. This number is used for calculating the weighted mean in the bottom row.
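</p><p>The Overall row can be reproduced with a few lines of Perl. The following is a minimal sketch, not the actual bookkeeping module: the helper <code>weighted_mean</code> and its optional divisor argument are hypothetical names chosen for illustration. Multiplying each score by its weight, summing, dividing by a nominal 1000 and truncating gives the Overall figures in the table (for example 960 for perl5.8.6); dividing by the actual sum of the weights (996) instead yields a true weighted mean that comes out slightly higher.</p>

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Weight column from the table: per-mille share of total run time on perl5.5.4.
my @weights = (70, 116, 73, 236, 139, 164, 198);

# Figures for perl5.8.6, in the same order as @weights
# (loops, regex, recurse, wave, autoload, substr, mail).
my @perl586 = (731, 1457, 956, 894, 1078, 986, 748);

# Hypothetical helper: sum of score * weight, divided by $divisor
# (defaults to the sum of the weights, i.e. a true weighted mean).
sub weighted_mean {
    my ($scores, $weights, $divisor) = @_;
    $divisor = sum(@$weights) unless defined $divisor;
    my $total = sum(map { $scores->[$_] * $weights->[$_] } 0 .. $#$scores);
    return $total / $divisor;
}

# Nominal divisor of 1000, truncated by %d: matches the Overall row.
printf "%d\n", weighted_mean(\@perl586, \@weights, 1000);    # prints 960
# Actual sum of the weights (996) as divisor.
printf "%.1f\n", weighted_mean(\@perl586, \@weights);        # prints 964.5
```

<p>Truncation rather than rounding is what matches the table: rounding would give 961 for perl5.8.6 and 911 for perl5.6.2 instead of the 960 and 910 shown above.</p><p>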
If the Weight column doesn't add up to exactly 1000 (it sums to 996), that's due to some clumsy rounding in my code.</p><p>Finally, this is the module that does the timing and is loaded by each benchmark script:</p><blockquote><div><p> <tt>package Perl::Benchmark::Lib;<br> &nbsp; <br>use strict;<br>use Time::HiRes;<br> &nbsp; <br>use base qw/Exporter/;<br>use vars qw/$VERSION @EXPORT/;<br>$VERSION = '0.01';<br> &nbsp; <br>$SIG{__WARN__} = sub {};<br>$SIG{__DIE__} = sub {};<br> &nbsp; <br>@EXPORT = qw/bench_adjust/;<br> &nbsp; <br># Seconds to subtract from the measured run time (accumulated by bench_adjust)<br>my $corrective = 0;<br> &nbsp; <br># Start the clock as soon as the module is loaded<br>bench_adjust();<br> &nbsp; <br># bench_adjust($code, $times): time $code repeated $times times, add that<br># to $corrective, then restart the clock<br>sub bench_adjust {<br>&nbsp; &nbsp; my ($code, $times) = @_;<br>&nbsp; &nbsp; if (defined $code) {<br>&nbsp; &nbsp; &nbsp; &nbsp; $times ||= 1;<br>&nbsp; &nbsp; &nbsp; &nbsp; # tv_interval() is not imported, so it must be fully qualified<br>&nbsp; &nbsp; &nbsp; &nbsp; $corrective += eval sprintf &lt;&lt;EOEVAL, "$code\n" x $times;<br>my \@td = Time::HiRes::gettimeofday();<br>%s<br>Time::HiRes::tv_interval(\@td);<br>EOEVAL<br>&nbsp; &nbsp; }<br>&nbsp; &nbsp; @Perl::Benchmark::Lib::PB_timer = Time::HiRes::gettimeofday();<br>}<br> &nbsp; <br>sub report_timing {<br>&nbsp; &nbsp; print Time::HiRes::tv_interval(\@Perl::Benchmark::Lib::PB_timer) - $corrective;<br>&nbsp; &nbsp; print "\n";<br>}<br> &nbsp; <br># Print the corrected run time when the benchmark script exits<br>END {<br>&nbsp; &nbsp; report_timing();<br>}<br> &nbsp; <br>1;</tt></p></div> </blockquote><p>It prints the run time in seconds to stdout, where it is picked up by another module that does all the bookkeeping and eventually produces the table above with the help of <code>Text::Table</code>. A very nice module, by the way.</p> ethan 2004-12-14T08:11:55+00:00 journal Two Larries and one Groucho <p>Compare <a href="">him</a> with <a href=";method=display_body">him</a> and <a href="">this chap</a>.</p> ethan 2004-12-08T17:40:19+00:00 journal RSS-feeds <p>Nowadays there are RSS feeds for virtually every conceivable bit of crap. However, you are out of luck if you want German TV listings as RSS feeds.
Each TV station has its own, but I'd like a single feed covering all of the 20-odd German stations. I suppose I'll have to do some website parsing again (and this is the year 2004, mind you!).</p> ethan 2004-12-07T07:26:23+00:00 journal Will it ever end? <p>For over a week now I have been preparing the next <code>List::MoreUtils</code> release. It takes so long because I am incorporating everything from <code>List::MoreUtil</code> into it. For each new function I had to write an equivalent XSUB, and some of them turned out to be a bit tricky.</p><p>Then I noticed that <code>List::MoreUtil</code>'s tests (which I simply took over) are incomplete in that they don't test some of the key characteristics of each function. Some map-like functions pass aliases to the original values to their code argument, so that modifying <code>$_</code> inside the block modifies the original list element. Others pass copies. As it happened, I also noticed that the pure-Perl implementation and the XSUB one sometimes differed in this respect, so I had to make them consistent, too.</p><p>The only thing left now is copying the documentation for these functions over from <code>List::MoreUtil</code>. Most probably I will be deeply unhappy with it, so I might end up rewriting it.</p> ethan 2004-11-28T06:54:44+00:00 journal Busy again <p>It never ceases to amaze me how busyness is something that happens irregularly and unexpectedly. For a couple of weeks I didn't have anything in particular to do programming-wise. That has changed this week. Eric J. Roode, author of the ill-named List::MoreUtil, apparently became aware of my List::MoreUtils (which was there first) and sent me a note suggesting that I take the stuff out of his module and put it into mine. Similarly, I received a suggestion from Slaven Rezic to add a Python-like function <code>zip</code>. So now I have to translate both to XS.</p><p>Apart from that, I thought it might be useful to add a new language to my repertoire. It's C# this time. It's really a very nice language.
It's often compared to Java and has been called a plagiarism of it, which is entirely untrue. The only thing familiar from Java is its sluggishness: there is a noticeable delay before a C# application starts. Once it's running, however, it feels faster.</p><p>The good things about C#: it's convenient in that it has some syntactic sugar which makes the code less chatty than its Java equivalent. It hooks very tightly into the operating system, so you can access all the things Java doesn't know about. With Java, I always had the impression that I was programming a toy machine and not a real one, in that I could only do what the Java API offers. To a certain extent, the same is true for C#, except that this time I can do anything the operating system offers me. Also, it can easily be extended by embedding C/C++ code (I think the official lingo for that is 'unsafe code', which is entirely fine with me). And finally, you can use the familiar Windows widget set and don't have to put up with crippled Swing.</p><p>One remark about .NET: I simply can't understand why people hate it so much. It's really just a large API with the additional bonus that it is identical for C++, Visual Basic, C# and J# (whatever that is). Cunning as I am, I have already come up with a useful pattern that can be deduced from that: write your prototype in C# and, once it's all working, translate it to C++, which is a relatively smooth transition under .NET. An additional bonus is that you're probably using Visual Studio. I am not much of an IDE guy, in that I try to confine myself to vim and a couple of consoles, but I have to say there's something quite appealing about the way that IDE bundles source files in the various .NET languages, documentation, GUI design, the debugger etc.
all together.</p><p>I also have a few rants, of course, but I'll save those for a later entry.</p> ethan 2004-11-17T06:38:52+00:00 journal The occasional bugreport rant <p>This morning I received:</p><blockquote><div><p> <tt>Subject: Mail::Transport:Dbx<br>Date: Sat, 13 Nov 2004 20:07:06 -0500<br> &nbsp; <br>Hi I use your program and i like it but I have found that when my .dbx file is<br>at a very large file size it does not work. So that may be a bug right?<br>thought you might like to know if you don't already.<br> &nbsp; <br>xxxxx xxxxxxx</tt></p></div> </blockquote><p>Well, no, I didn't know about it. Furthermore, I don't think I ever will with reports like that.</p><p>Even better, I don't seem to be able to reply. My mail just bounced. Interestingly enough, though, the bounce message gave me a different recipient address from the one I sent it to. So there's probably a mail-forwarding service out of control.</p> ethan 2004-11-14T05:28:28+00:00 journal I love XP <p>Windows XP, that is. And no, I am not ashamed to admit it.</p><p>The question is why I am running XP in the first place. It has to do with my university taking part in Microsoft's MSDNAA program, which makes most of their software available to academic circles. For the first time in my life, this gave me the chance to use a Microsoft product legally. So I copied the Windows XP Professional installation CD from a friend of mine who had kindly already downloaded it, got myself a registration key and installed it.</p><p>It didn't go very well at first. XP was unable to talk to my DSL router at all, and it somehow made the router block so that not even Debian could connect to it any more without first restarting the router. Then, after XP was installed, Windows ME was gone. There was no way to tell lilo that there were Windows ME, XP and Debian on my two hard disks.
It turned out that the XP partitioner insisted on creating the NTFS partition as a logical partition inside an extended one (ME was on the first partition of this disk, and that was a primary one). So I screwed up the partitioning and had to wipe out XP again.</p><p>After that, ME would still not boot. It was recognized by lilo again, but apparently XP had made changes to the ME installation. Fortunately, I found an old ME boot disk and was able to do a <code>sys c:</code> to fix things.</p><p>Now smarter about partitions, I partitioned the ME disk from Debian with cfdisk, making sure that this time there were only two primary partitions. When I started the XP installer, however, it did not accept the partition created this way. I tried all kinds of funny things to make it work, to no avail of course. I even managed to wipe out the MBR temporarily. I say temporarily because it reappeared after a while. It must have reappeared while I was feverishly copying some data over to Debian from the ME partition, which I was doing because I was very sure I had destroyed the ME installation. Why it reappeared, I can't say. But I was relieved.</p><p>After a while, I figured out why XP refused to accept my partitions: I had forgotten to reboot after partitioning. I somehow assumed that repartitioning the disk and then starting the XP installer by booting from CD would register the new disk layout. But apparently that was not the case. Anyway, after those annoyances, I installed XP again and this time was delighted to see that it had none of the previous problems with my router.</p><p>So why do I like XP so much? It seems to be a very well-conceived system. Each and every bit of hardware in my machine was detected. I particularly loved it when I switched on my printer for the first time (a very old OKI laser printer). It gave me a 'plock' sound and a pop-up in the system tray informing me that it had detected and configured a new printer. And indeed it had.
Lovely!</p><p>Also, the whole user interface makes a lot more sense than the old ME/2000 style. It manages to hide distracting information from me without hiding vital information. It feels very lean and responsive, even though one would assume that it's more bloated than previous Windows releases. Also, I feel reasonably safe even when using Internet Explorer (because I am behind a router and Microsoft's new firewall appears to work). Thanks to SP2, IE finally has a popup blocker that works just as well as the one in Firefox that I got so used to.</p><p>There are only a few annoyances, mostly related to software installations. I install as admin, and most of the time the software is properly set up for my user account, too. But some software fails in this respect, probably software that was written with the old 98/ME setup in mind. It's not such a big problem, though.</p> ethan 2004-11-10T17:53:44+00:00 journal An 'unreportable' perl bug revisited <p>In my last entry I mumbled something about a tricky bug in perl that was allegedly unreportable. Just as well, because as it turned out, it wasn't a bug at all. I had this kind of setup:</p><blockquote><div><p> <tt>&nbsp; &nbsp; sub search {<br>&nbsp; &nbsp; my ($self, $state) = @_;<br>&nbsp; &nbsp; ...<br>&nbsp; &nbsp; # $state-&gt;{last_search} being an array-ref of previous<br>&nbsp; &nbsp; # search patterns.
You were supposed to cycle through them<br>&nbsp; &nbsp; # with TAB<br>&nbsp; &nbsp; my $pattern = $self-&gt;get_input($state, $state-&gt;{last_search});<br>&nbsp; &nbsp; ...<br>&nbsp; &nbsp; # once a list of matching songs has been returned and the user<br>&nbsp; &nbsp; # chose one of them to play<br>&nbsp; &nbsp; $state-&gt;playlist-&gt;play;<br>&nbsp; &nbsp; }<br> &nbsp; <br>&nbsp; &nbsp; sub get_input {<br>&nbsp; &nbsp; my ($self, $state, $searches) = @_;<br>&nbsp; &nbsp; ...<br>&nbsp; &nbsp; if ($hit_key == KEY_TAB) {<br>&nbsp; &nbsp; &nbsp; &nbsp; push @$searches, shift @$searches;<br>&nbsp; &nbsp; &nbsp; &nbsp; goto &amp;search;<br>&nbsp; &nbsp; }<br>&nbsp; &nbsp; ...<br>&nbsp; &nbsp; return $search_pattern;<br>&nbsp; &nbsp; }</tt></p></div> </blockquote><p>The above is naturally just a simplification. There are several event loops going on at once, so I more or less had to choose such an awkward approach. Now the thing is that <code>goto &amp;func</code> replaces the running instance of a function with <code>&amp;func</code>, which then returns directly to the original caller. Looking at <code>search()</code>, it becomes obvious that this would eventually expand to:</p><blockquote><div><p> <tt>&nbsp; &nbsp; my $pattern = $self-&gt;search($state);</tt></p></div> </blockquote><p>which is wrong, as <code>$pattern</code> would now contain the return value of <code>search</code>, i.e. the last value it evaluated (<code>$state-&gt;playlist-&gt;play</code>, which always returns 1). So I changed things so that <code>get_input</code> does a <code>goto &amp;get_input</code> instead.</p><p>Until now I had always thought that <code>goto &amp;func</code> was one of the less harmful constructs. But in fact, it's even worse than anything Dijkstra described in his pamphlet.</p> ethan 2004-10-23T06:11:37+00:00 journal An 'unreportable' perl bug <p>Over the last two days I have been chasing a bug in my rewritten mp3 player.
Eventually, with the help of <code>Carp::confess</code>, I was able to see what was going on: a method received an additional argument although that should have been impossible. The line that <code>confess</code> included in its stack trace was this:</p><blockquote><div><p> <tt>&nbsp; &nbsp; my $pattern = $self-&gt;get_input($state, $compl ? $compl : $state-&gt;search);</tt></p></div> </blockquote><p>This call, according to <code>confess</code>, resulted in <code>ActionLoop::search(ActionLoop=HASH(0x814cbb8), 1)</code>. I wonder under what contrived circumstances <code>$obj-&gt;method</code> can produce an additional argument. Writing <code>$state-&gt;search()</code> did not fix it. The only way was passing <code>undef</code> explicitly.</p><p>Unfortunately, there's no obvious way for me to report this bug, as the bigger picture involves three methods, one of which calls another sort of tail-recursively using <code>goto &amp;func</code>. I am quite sure that the bug would disappear as soon as I tried to strip it down enough to include it in a bug report.</p><p>On an entirely different front, I polished up my freshly installed Debian a little. I can finally watch TV. It turned out that the lock-ups were caused by an apparently buggy Nvidia driver. Using an older one fixed it. The amazing thing is that my whole Debian partition holds only 1.3 GB of data, yet it can do everything that my Linux From Scratch could do with 26 GB. As a result, it's blazingly fast, even though none of Debian's packages were compiled with the optimization settings I used for Linux From Scratch. Even the default perl (which is an ithreaded one) feels quicker than its non-threaded Linux From Scratch counterpart.</p> ethan 2004-10-21T05:04:32+00:00 journal Back to Debian <p>Ever since the smaller of my two hard disks went up in smoke, getting a new hard disk and installing a new Linux had been rather high on my to-do list.
Both are now done.</p><p>As for the hard drive, I got one of those nice Seagate Barracudas. Speed-wise they are only average, but you can hardly hear them operate, which is huge progress compared to my other drive (a Maxtor).</p><p>Installing Linux unfortunately became a necessity, too. The setup I'd been using so far was a Linux From Scratch. I have to admit that it worked splendidly, but the fact that I needed to compile everything by hand and that upgrading any package is virtually impossible made it impractical in the long run. But as I did like the idea of having a lean system compiled specifically for my machine, I installed Gentoo over the weekend. At first everything went fine. A network install over DSL was a snap.</p><p>But yesterday it became apparent that many things weren't that bright after all. It was impossible to get my precious mp3 player working. It's a bit heavy on dependencies in that it requires MPEG::MP3Play, Ogg::Vorbis::Decoder, Ogg::Vorbis::Header, MP3::Info, some terminal-related modules and a piece of Inline::C I wrote around libao. The second bad thing about Gentoo is xawtv: depending on which version I installed, it either segfaulted right on start-up or merely produced a blue screen. I was able to get it working without Xvideo support, but that meant I couldn't make it run fullscreen at a higher resolution without black areas around the picture. Thirdly, Gentoo makes it virtually impossible to set up mail properly. It does offer the various MTAs such as sendmail, postfix or exim, but apparently I have to configure them myself. For exim, this would be fine if it were exim3, but they only have exim4, which I was never able to get running. Their suggestion of using nbsmtp would prevent my newsreader slrn from installing, as it requires a proper sendmail binary. So that is unacceptable as well.</p><p>So eventually I turned back to Debian, which I had used happily for years (until this hard disk was nuked).
Installation wasn't quite as nice: the minimal installer image does allow you to install over a modem but, strangely enough, not over DSL. Some manual intervention was therefore necessary. Another thing that I hated was the bootloader. By default the installer sets up grub. It told me that it had detected two more bootable partitions (Windows, Linux From Scratch) and assured me that it could easily integrate those into the MBR, so I hit 'yes' and upon reboot ended up with error 18 or so. I knew this would happen. One reason why I always stick with lilo is that I know it works.</p><p>After a while those initial annoyances were cleared up. Everything's now up and running (including exim; it's exim4, but Debian kindly offered to configure it for me). There's one thing that's still troubling me: xawtv does work, but it locks up the machine after a few minutes. And by lock-up I really mean lock-up. Not even Magic SysRq works any longer. Things like that, I am sure, are notoriously hard to fix, with no helpful logfile entries whatsoever. But apart from that, my old opinion stands: Debian isn't quite as flashy and hip as other distributions, but it's always the one that reliably works.</p> ethan 2004-10-20T05:11:01+00:00 journal ogg-vorbis stinks <p>I have to say that pretty much everything about ogg-vorbis files and streams stinks as far as Perl modules are concerned. For years, I've been using my own mp3 player, backed by the wonderful <a href="">MPEG::MP3Play</a> module. Unfortunately, the player itself was written in such a dreadful way that extending it was impossible.</p><p>Now it happened that I got hold of a few <code>.ogg</code> files that I wanted to add to my playlist and, naturally, play as if they were ordinary mp3s. Because my player couldn't be extended, that wasn't so easy, at which point I decided to rewrite it.
A song becomes a class, and there's now one subclass for each file type.</p><p>And then came the ordeal of finding a suitable module for decoding and playing back ogg-vorbis files. There's <code>Ogg::Vorbis</code>, which only lets you read PCM streams. That would be fine in its own right, and I immediately added an Inline::C section that opened <code>/dev/dsp</code> and configured it to play PCM streams with 2 channels, 16 bits and 44100 Hz. That didn't work, and I still can't say why. I also noticed that <code>Ogg::Vorbis</code> doesn't provide methods for reporting the position in the current bitstream, so with this module it's impossible to display a playing-time counter for each song.</p><p>Then I found <code>Ogg::Vorbis::Decoder</code>, which doesn't suffer from those limitations and actually comes with proper documentation. Yet it still didn't work together with my Inline::C code. So I tried two other modules, both wrappers around libao. The first one (<a href="">Ao</a>) didn't even compile, as it was evidently written for a very old version of libao. The other one (<code>Audio::Ao</code>) compiled and passed its tests. Even better, it worked for me at first.</p><p>Then I looked for OGG-tagging modules. <code>Ogg::Vorbis::Header</code> was the first one I tried. It didn't work, although it passed its own tests. The reason turned out to be a strange interaction with <code>Audio::Ao</code>: both are Inline::C modules, and I received many <code>"One or more DATA sections were not processed by Inline"</code> messages. Apparently it's not trivial to use two such modules at the same time. I had to get rid of one of them, so I tried <code>Ogg::Vorbis::Header::PurePerl</code>. Now my player worked again and could display the tags. 
But I couldn't edit them, as <code>Ogg::Vorbis::Header::PurePerl</code> only provides read-only access.</p><p>The solution I eventually came up with is clunky: I copied the Inline code of <code>Audio::Ao</code> into my OGG class and added an <code>Inline-&gt;init</code> right next to it. It's this kind of work-around that just confirms my hatred of anything Inline-related.</p><p>Now compare that to mp3 handling in Perl. I only need two modules: one for the tags (<code>MP3::Info</code> in my case) and one that plays the files back (<code>MPEG::MP3Play</code>). The latter gives me a unified event API that reports the timecode, supports pausing playback and adjusting the volume, comes with an equalizer, etc. And of course it's written in proper XS, so it just works out of the box.</p> ethan 2004-10-10T05:34:40+00:00 journal Internship <p>For the next six weeks I'll be interning at Gr&#252;nenthal, a German pharmaceutical company (it gained some infamy through the Contergan (thalidomide) affair of the late 1950s and early 1960s, which left many newborn babies with deformed limbs). I'll be working in their communications department, which is the place to be if one wants to learn quickly how companies of that size function.</p><p>The only drawback is that my work won't involve a lot of computers. Other than that, it's splendid. There are four nice female colleagues and one other chap who's currently on holiday. What I have to get used to, though, are the working hours: from 8.30 in the morning to around 6 in the evening, plus two hours of public transport for me. Incidentally, this week I received quite a few more mails than usual concerning my CPAN modules, which I now have to leave pending until the weekend.</p> ethan 2004-09-03T04:59:25+00:00 journal Looking for a new occupation <p>Every once in a while I reach a stage where most of my ongoing work is done, for instance when all or at least most of the modules I am working on have been released to the CPAN. 
It is then that I look for something new, and while doing so I try all kinds of things.</p><p>This happens in an unstructured way. A while ago I was talking with a fellow about this and that, mostly computer games, and we agreed that it would be sort of cool to write one's own. Since I've never done anything like that, I had a poke around several available libraries and ended up taking a closer look at SDL. It's less complex than I thought, but I lack quite a bit of background knowledge on some apparently vital topics, mostly involving image processing and such.</p><p>So I made a mental note to return to this domain later and moved on. The next item on my to-try list was writing a module for the Linux kernel. I had in mind a simple module that would let me mount audio CDs as ordinary file systems and thus see the tracks as virtual WAV files that spring into existence once I copy them onto a hard disk partition (grabbing by simply doing a <code>cp</code>). I'm sure this would be an entertaining thing to program, but during my first attempts I realized that it's exceptionally easy to freeze the whole system. I had a few of those lock-ups and concluded that programming with a reboot every three minutes is less fun than programming without reboots.</p><p>So I returned to SDL. Quite soon I reached a point where things, albeit mostly working, got annoying because they required a lot of code for tasks that, in my estimation, should not take that much work. Rescanning other libraries revealed that some of them offer slightly more high-level functionality, for example <a href="">clanlib</a>, which looks quite promising. The only drawback is that it's a C++ library, not a C one.</p><p>Since I've wanted to improve my C++ skills for a long time anyway, I decided to start all over again, this time in C++, with a view to switching to clanlib a little later. 
Now I already regret having postponed learning C++ for such a long time. It's all very nice and smooth. In particular, classes and namespaces make it much easier to keep the program relatively tidy. In C, a lot of time goes into coming up with a clever naming scheme to avoid name clashes and such. I'm glad this is now something I have to worry about much less.</p> ethan 2004-08-08T08:18:34+00:00 journal Having fun with <p>Hmmh, let's see. A while ago, Randal L. Schwartz gave one-star ratings to all modules by Domizio Demichelis due to a rather questionable practice in all their Makefile.PLs. Now their author seems to have launched a counter-attack by having two people (Massimiliano Ciancio and Carlos Molina Garcia, two Italian-sounding names, go figure) give all of his modules five stars.</p><p>I am now almost tempted to create a couple of alter egos to downgrade his modules. It might be fun to see new Italian CPAN raters spring into existence afterwards to hand out five-star ratings to these modules again.</p><p>Really very tempting, if only it weren't so childish.</p> ethan 2004-07-09T21:13:54+00:00 journal little pigeons' photos <p>By popular demand, you can now see a <a href="">few photos</a> of the two little pigeons. snapshot0003 is one of the less blurry shots.</p><p>Since they don't have all their feathers yet, they look a little like porcupines.</p> ethan 2004-07-06T09:02:12+00:00 journal $pigeons *= 2 <p>Four days ago the two little pigeons hatched from the two eggs on my ledge. So now I have two baseball-sized, furry, yellowish things in the flower pot. It took them around two days to open their eyes, and they've already grown considerably since. For the first few days one of the parent birds was always sitting on them to keep them warm. Now they are apparently old enough to be left alone for an hour or so in between.</p><p>They are actually quite cute. 
If you touch one lightly on the head, it apparently mistakes this for an attack from its brother (or sister, who knows), and they start to fight with each other for a few seconds. Another funny thing is their proportions: their claws and beaks are already full-sized, which gives them a rather comical look.</p> ethan 2004-07-05T09:52:17+00:00 journal Inline::Lua out <p>Finally I am through with 0.01 and have released it to the CPAN. As always with a module due to be released, I found some bugs after writing the test cases (not too many tests yet; I need to add a few more). Yesterday I made some last-minute additions that turned out to be quite tricky, too.</p><p>I am glad this is done. Often enough I have modules that are, in theory, ready for shipping to CPAN, but then it just doesn't happen because small things turn out to be hard or annoying to fix, so I put them into the postpone queue and later forget about them.</p> ethan 2004-06-24T08:03:28+00:00 journal Working around bugs <p>Doing that is something I absolutely hate, because it can make you look quite stupid. See this code:</p><blockquote><div><p> <tt>sub validate {<br>&nbsp; &nbsp; my $o = shift;<br> &nbsp; <br>&nbsp; &nbsp; while (@_) {<br>&nbsp; &nbsp; &nbsp; &nbsp; my ($key, $val) = splice @_, 0, 2;<br>&nbsp; &nbsp; &nbsp; &nbsp; if ($key eq 'UNDEF') {<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Inline::Lua-&gt;register_undef(\@$val), next if ref $val eq 'ARRAY';<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Inline::Lua-&gt;register_undef(\%$val), next if ref $val eq 'HASH';<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Inline::Lua-&gt;register_undef(\*$val), next if ref $val eq 'GLOB';<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Inline::Lua-&gt;register_undef(\&amp;$val), next if ref $val eq 'CODE';<br>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Inline::Lua-&gt;register_undef($val);<br>&nbsp; &nbsp; &nbsp; &nbsp; }<br>&nbsp; &nbsp; }<br>}</tt></p></div> </blockquote><p>Now, why would I dereference <code>$val</code> just to pass a reference to the dereferenced value? <code>register_undef</code> is an XSUB. 
When I just pass the raw reference, what it receives is, for some reason, an SV of type PVIV:</p><blockquote><div><p> <tt>SV = PVIV(0x82a5ab8) at 0x81b9134<br>&nbsp; REFCNT = 1<br>&nbsp; FLAGS = (PADBUSY,PADMY,ROK)<br>&nbsp; IV = 0<br>&nbsp; RV = 0x814a75c<br>&nbsp; PV = 0x814a75c ""<br>&nbsp; CUR = 0<br>&nbsp; LEN = 0</tt></p></div> </blockquote><p>Note that the <code>ROK</code> flag is still set.</p><p>It should, of course, be a plain <code>RV</code>. I am not yet sure whom to blame. The above <code>validate</code> method is called by <code>Inline</code>, so I suspect that Inline might be doing funny things to its arguments. But its code doesn't confirm this. Odd.</p><p>On a related note, <code>Inline::Lua</code> is done. 80% of the perldocs are there as well (and 0% of the tests, naturally). The ambitious plans laid out in my last journal entry all turned out to be feasible, with even less effort than I had expected. Perl and Lua can now happily exchange basic types, arrays, hashes, tables, filehandles and functions without getting confused. I may be able to release it to the CPAN tomorrow.</p> ethan 2004-06-22T09:13:04+00:00 journal Lua <p>A few days ago I had the bright idea of writing an Inline module, just because I was curious how that would work. I chose a language that came to mind quite spontaneously, partly because I vaguely remembered that it was meant to be embedded into other applications.</p><p>What I've seen of Lua so far is extremely impressive. The language is very clean and, despite offering only a handful of features and concepts, very powerful. Some interesting things have been integrated into it quite well, such as coroutines and closures. The latter make it feel a bit like a functional programming language with the very nice touch of an imperative syntax. It's even object-oriented.</p><p>Its C API is a bit confusing to me as of now. That is probably because I haven't yet written a single program in the language. 
But the Inline part already works quite well for some of the basic Lua/Perl types. The nice thing about Lua is that its types map quite well onto Perl's. It treats functions as a data type, so a little bit of currying looks like this:</p><blockquote><div><p> <tt>function foo (a)<br>&nbsp; &nbsp; return function (b) return a * b end<br>end<br> &nbsp; <br>io.write( foo(5)(3) )</tt></p></div> </blockquote><p>Very neat! I already have some ideas on how inlined Lua functions could return Lua closures back to Perl, as in</p><blockquote><div><p> <tt>use Inline Lua;<br> &nbsp; <br>print foo(5)-&gt;(3);<br>__END__<br>__Lua__<br>function foo (a)<br>&nbsp; &nbsp; return function (b) return a * b end<br>end</tt></p></div> </blockquote> ethan 2004-06-15T07:10:37+00:00 journal Counting the minutes <p>Just 63 minutes till the kick-off of the opening match of the European football championship. The opener will be Portugal versus Greece. After that, Spain faces Russia, which I am looking forward to even more. Tomorrow France plays England, which is better still. Oh, and on Tuesday we (Germany) face the Netherlands, which will be interesting as I happen to be in Aachen, within walking distance of the Dutch border. I think a humiliating 3-0 for Germany would be in order, although I am afraid the other way round is more likely.</p> ethan 2004-06-12T14:56:07+00:00 journal pigeons on my ledge <p>Living in the very centre of a city, I am quite used to pigeons being all over the place, usually making annoying sounds etc. Lately I noticed that the same couple of pigeons was always on the ledge outside my windows. There is a flower tub on this ledge (without any flowers; they shrivelled long ago), and they seemed very preoccupied with it.</p><p>After a while I got curious and looked into the tub to see what was so special about it. To my surprise I found a pigeon egg in it! 
So apparently they have chosen my ledge and tub as their breeding ground.</p><p>Pigeons themselves aren't very interesting animals, you may say, and I would mostly agree. They make the most annoying sounds and have an obnoxious way of walking (with their head moving forward on each step like a woodpecker). However, having two pigeons breed just four meters from where I am sitting right now is quite interesting.</p><p>Of course, in the beginning I tried all the funny experiments that spring to mind immediately, like putting an additional chicken egg into the tub to see how they would react. Apparently they aren't so easy to irritate, as they continued breeding without much fuss.</p><p>I looked up pigeons in the dictionary and found out that they are monogamous. According to the entry, both the female and the male take part in brooding, although I think I've only seen the female in the past two days. It also says that there are always two eggs in the nest. As there was only one in the tub, I was rather sceptical about how successful their breeding would eventually be.</p><p>But yesterday I had another look, and now there are indeed two eggs. From now on it should take around two weeks for their offspring to hatch.</p> ethan 2004-06-12T05:35:43+00:00 journal Testing vim 6.3 <p>Just installed the latest vim and now have to check that everything still works. For a reason beyond me, 6.3 no longer reads <code>/etc/vimrc</code>, or maybe my old 6.1 only did so because it was configured that way.</p> ethan 2004-06-09T05:11:53+00:00 journal Dealing with bounces <p>Looking at the annoyance factor of unwanted mail, bounce messages (caused by some insane worms randomly sending mails with arbitrary From addresses) seem to have overtaken ordinary spam. 
The problem with them is that they get past my spam filters, so now I have to take steps.</p><p>I figure it should be possible to detect those bounces almost flawlessly with a specially tailored Bayesian filter. Note that I don't want to use an existing Bayes filter (such as the one in SpamAssassin). I would first have to train it, and besides, I suspect that real spam and bounces don't have much in common in terms of the words they use.</p><p>So I have started writing a Bayesian filter for bounces. The first thing I wrote was a flex scanner that detects valid RFC 822 mail addresses. The scanner is fed one message at a time. It opens a pipe to another process (the one that does the actual filtering) and writes the mail to it. The only change the scanner makes is to replace every email address it can find in the body with <code>T_MAILADDR</code> or some such token. If I am reading RFC 822 correctly, these should be the rules for a valid email address:</p><blockquote><div><p> <tt>&nbsp; &nbsp; atom&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; [!#$%&amp;'*+\-/0-9=?A-Z^_`a-z{|}~]+<br>&nbsp; &nbsp; dtext&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;[\x00-\x0C\x0E-\x5A\x5E-\x7F]*<br>&nbsp; &nbsp; qtext&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;[\x00-\x0C\x0E-\x21\x23-\x5B\x5D-\x7F]*<br>&nbsp; &nbsp; quoted_pair&nbsp; &nbsp; &nbsp;"\\"[\x00-\x7F]<br>&nbsp; &nbsp; quoted_string&nbsp; &nbsp;"\""({qtext}|{quoted_pair})*"\""<br>&nbsp; &nbsp; word&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; {atom}|{quoted_string}<br> &nbsp; <br>&nbsp; &nbsp; domain_literal&nbsp; "["({dtext}|{quoted_pair})*"]"<br>&nbsp; &nbsp; domain_ref&nbsp; &nbsp; &nbsp; {atom}<br>&nbsp; &nbsp; sub_domain&nbsp; &nbsp; &nbsp; {domain_ref}|{domain_literal}<br>&nbsp; &nbsp; domain&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; {sub_domain}("."{sub_domain})*<br>&nbsp; &nbsp; local_part&nbsp; &nbsp; &nbsp; {word}("."{word})*<br> &nbsp; <br>&nbsp; &nbsp; addr_spec&nbsp; &nbsp; &nbsp; 
&nbsp;{local_part}"@"{domain}</tt></p></div> </blockquote><p>This should be a huge advantage for a Bayesian filter: instead of every single email address being a word of its own, they all get mapped onto one word.</p><p>The idea behind this is, of course, that bounce messages tend to have a lot of email addresses in their body. Some of them even include whole header fields, so I could extend the scanner to detect those and generate another token for them.</p><p>For now I'll prototype in Perl the program that the scanner opens a pipe to, and see whether the approach makes any sense at all. If it does, I can rewrite it in C and have a fairly fast Bayesian filter that I can plug into my <code>.procmailrc</code> before SpamAssassin is even triggered.</p> ethan 2004-05-29T07:07:40+00:00 journal How to test this thing? <p>I just finished wrapping <a href="">libstatgrab</a> into <code>Unix::Statgrab</code>. Now it would be time to add the tests (yes, I don't write them beforehand). But how am I supposed to write tests for a library that is designed to return different results on each platform, and even on each machine?</p><p>Maybe I'll just have the tests call each function/method and make sure that they at least do not segfault. I think pulling out values from <code></code> and testing some of them against what the library figures out is a bit too hairy.</p><p>On the upside, this C library is deliberately portable across several Unices, so I hope I won't have to worry about compilation issues.</p> ethan 2004-05-24T05:21:42+00:00 journal
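<p>Returning to the bounce filter from the "Dealing with bounces" entry: its two core steps, mapping every address onto one token and combining per-token probabilities, can be sketched in a few lines of C. This is a minimal sketch under stated assumptions, not the filter described there. The address test is a crude, hypothetical stand-in for the flex <code>addr_spec</code> rules (exactly one <code>@</code>, text on both sides, a dot after it); the helper names <code>looks_like_addr</code>, <code>normalize_token</code> and <code>combine</code> are made up for illustration; and the combination rule is the naive-Bayes formula popularized by Paul Graham's "A Plan for Spam", with probabilities clamped to [0.01, 0.99].</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a crude stand-in for the flex addr_spec rules.
 * A token counts as an address if it contains exactly one '@', has
 * text on both sides of it and at least one '.' after it. */
int looks_like_addr(const char *tok)
{
    const char *at = strchr(tok, '@');

    if (at == NULL || at == tok || at[1] == '\0')
        return 0;
    if (strchr(at + 1, '@') != NULL)   /* more than one '@' */
        return 0;
    return strchr(at + 1, '.') != NULL;
}

/* Map every address-like token onto the single token "T_MAILADDR",
 * so that all addresses share one probability in the filter. */
const char *normalize_token(const char *tok)
{
    assert(tok != NULL);
    return looks_like_addr(tok) ? "T_MAILADDR" : tok;
}

/* Keep probabilities away from 0 and 1 so that no single token can
 * force the result and the denominator in combine() never becomes 0. */
static double clamp_prob(double p)
{
    if (p < 0.01)
        return 0.01;
    if (p > 0.99)
        return 0.99;
    return p;
}

/* Combine per-token probabilities p[i] = P(bounce | token i) with
 * P = prod(p) / (prod(p) + prod(1 - p)).  Meant for a small number of
 * tokens (e.g. the most "interesting" ones); long lists would need
 * log-space arithmetic to avoid underflow. */
double combine(const double *p, size_t n)
{
    double prod = 1.0, inv = 1.0;

    for (size_t i = 0; i < n; i++) {
        double q = clamp_prob(p[i]);
        prod *= q;
        inv *= 1.0 - q;
    }
    return prod / (prod + inv);
}
```

<p>For example, <code>normalize_token("MAILER-DAEMON@host.example.org")</code> returns <code>"T_MAILADDR"</code>, and <code>combine</code> on the probabilities 0.9, 0.8 and 0.2 gives 0.144 / (0.144 + 0.016) = 0.9. Real token probabilities would come from counts over a corpus of known bounces and known good mail, which this sketch leaves out.</p>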