Journal of Simon (89)

Monday June 04, 2001
04:55 AM

Someone corrupted the malloc arena and all I got was this lousy segfault

[ #247 ]
Part of the advantage of keeping all my work in version control is that I can very easily tell what I've been working on; it's like a surrogate memory. So, by the power of Perforce, I can tell you that over this weekend, I've been working a lot on the bytecode compiler and, in parallel, upgrading B::Generate to provide what it needs: this generally means implementing a lot of the op-manipulation functions from op.c in Perl.

This weekend saw Perl implementations of the op list utilities append_elem and prepend_elem; convert (which required a patch to Perl which may not get applied to the main tree; argh); a constructor for conditional operators (which implements if-then-else in bytecode); a small hack to get assignment working; and a way of creating subroutines. I also implemented binary and logical operators in B::AST.

Progress from the compiler side of things was spotty; at one point, I had it compiling subroutines and being able to run them from Perl. The next, I had it segfaulting left, right and center. (That was traced to an error transcribing prepend_elem, which incidentally is some of the more twisted code in the world ever, from C into Perl.) At the moment, it's still giving strange segfaults, and I don't understand why, although, if I remember rightly, they're not as strange as the ones I had before.

Perhaps I should release the changes to B::Gen, because it's had a bit of a road test now.

Speaking of surrogate memory, one thing I've been thinking about recently is how to extract useful information from the 120M of mail I've got stored here. I've already indexed it with glimpse, which is a help, but it strikes me that there are only a few things I use it for: stored URLs, phone numbers, addresses and so on; finding out where I've heard of someone from; and useful mails about, for instance, technical topics. It's probably impossible to automatically extract all 'useful' data from an email, but the second thing, finding out from what context I know someone, is easily performed.

I came up with the idea of mail maps. The idea is simple: the mailbox is represented as a graph. If X mails Y, then X and Y are nodes on the graph with an edge between them. A Cc: gets a dotted line. The lines are coloured according to frequency. Do this for all your mailboxes, and you can see everyone who's emailed you, where they're coming from and who else they've emailed. (Hrm, maybe if it was coloured by mailbox, it would be easier to see who's where.) Neat, eh? As for implementation, Perl makes it easy. (Of course.)
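To make the graph idea concrete, here is a minimal sketch in core Perl only. It is not the actual script: the message hashrefs, the function names (count_edges, colour_for, edges_to_dot), the colour thresholds and the choice of a directed graph are all assumptions made for illustration. It counts how often each sender/recipient pair occurs, keeps To: and Cc: apart, and writes the result out as Graphviz DOT text.

```perl
use strict;
use warnings;

# Each message is a hypothetical hashref:
#   { from => 'a@x.org', to => [ ... ], cc => [ ... ] }
# Returns a hash keyed by "sender\trecipient\tstyle" holding message counts.
sub count_edges {
    my @messages = @_;
    my %count;
    for my $msg (@messages) {
        my $from = lc $msg->{from};
        # To: recipients get a solid line, Cc: recipients a dotted one.
        $count{ join "\t", $from, lc($_), 'solid'  }++ for @{ $msg->{to} || [] };
        $count{ join "\t", $from, lc($_), 'dotted' }++ for @{ $msg->{cc} || [] };
    }
    return %count;
}

# Colour by frequency. The thresholds are arbitrary, picked for illustration.
sub colour_for {
    my ($n) = @_;
    return $n >= 10 ? 'red' : $n >= 3 ? 'orange' : 'grey';
}

# Render the edge counts as DOT text (directed edges are an assumption).
sub edges_to_dot {
    my %count = @_;
    my $dot = "digraph mailmap {\n";
    for my $key (sort keys %count) {
        my ($from, $to, $style) = split /\t/, $key;
        $dot .= sprintf qq{  "%s" -> "%s" [style=%s, color=%s];\n},
            $from, $to, $style, colour_for($count{$key});
    }
    return $dot . "}\n";
}

# Usage: two made-up messages from the same sender.
print edges_to_dot(count_edges(
    { from => 'x@example.org', to => ['y@example.org'], cc => ['z@example.org'] },
    { from => 'x@example.org', to => ['y@example.org'] },
));
```

Colouring by mailbox instead of by frequency would only mean carrying a mailbox name in each message and swapping colour_for for a lookup on that name.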

Start with Mail::Util for the read_mbox function; then Mail::Internet to parse the headers, and GraphViz to connect the nodes and produce the graph. All told, it took 70 lines of Perl, most of which was to sensibly map names to addresses. (I used Mail::Address, and included a cache mapping addresses to names, so that if a naked address comes along, it can find a name for it.)
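The name-mapping part is the fiddly bit, so here is a rough sketch of the caching idea, again in core Perl rather than the real 70 lines. The regex in parse_address is a deliberately naive stand-in for Mail::Address, and parse_address, label_for and %name_for are hypothetical names; the only point is to show how a naked address picks up a name seen earlier.

```perl
use strict;
use warnings;

my %name_for;   # cache: lower-cased address => first display name seen for it

# Naive header parsing, standing in for Mail::Address. Handles
# 'Name <addr>', '"Name" <addr>' and a bare 'addr', one address per call.
sub parse_address {
    my ($raw) = @_;
    if ($raw =~ /^\s*"?([^"<]*?)"?\s*<([^>]+)>\s*$/) {
        return ($1, lc $2);
    }
    $raw =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    return ('', lc $raw);
}

# Return the label to draw for a node, learning names as they go by.
sub label_for {
    my ($raw) = @_;
    my ($name, $addr) = parse_address($raw);
    $name_for{$addr} = $name
        if length($name) and not exists $name_for{$addr};
    return exists($name_for{$addr}) ? $name_for{$addr} : $addr;
}

# Usage: the second, naked address is resolved through the cache.
print label_for('Some Body <Some.Body@example.org>'), "\n";
print label_for('some.body@example.org'), "\n";
```

A naked address that turns up before any named copy of it still gets labelled by the address alone, so a first pass over all the mailboxes just to fill the cache would give more consistent labels.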