All the Perl that's Practical to Extract and Report


inkdroid (3294)

AOL IM: inkdroid
Yahoo! ID: summe_e
Jabber: inkdroid

inkdroid is a person, not a robot. however, inkdroid likes ink. inkdroid likes perl too.

Journal of inkdroid (3294)

Thursday June 19, 2003
11:18 AM

my $day = YAPC::NA->new( day => 3 );

[ #12954 ]
  • My last day in Boca started out listening to Piers Cawley talk about refactoring. Refactoring is the process of improving the design of some code without changing its functionality, which includes looking for repetition in your code and eliminating it. Piers spoke a bit about refactoring in general, and then set out to refactor some of his own code (I think it was Class::Builder). He used the audience as his pair-programming partner, and people were more than happy to point out missing parens or quotes. Piers used Test::Class to write his tests. Apparently Test::Class is very much like the JUnit framework (which my colleague Mike O'Regan really likes a lot). I don't know much about JUnit, so I was interested to learn that the original was SUnit, written by Kent Beck for Smalltalk. I need to check out Test::Class now. It seemed to me that Piers was talking more about testing than refactoring, but perhaps the two are so intertwined it's impossible to talk about one without the other.
  • Next I headed off to hear Peter Chines talk about exceptions. Unfortunately I missed a significant chunk of the beginning of the talk. Peter went over the various standard functions for throwing exceptions and warnings in Perl (croak, carp, confess) and explained why it is important to use them. He also showed how signals could be used to automatically add information to all die messages (a handy trick), and talked about CGI::Carp, which automatically logs the name of the generating program and a timestamp to STDERR. As 2shortplanks pointed out, it would've been nice to see some examples of throwing objects as exceptions. And someone piped up at the end about Damian's Coy module, which translates Perl's regular messages into soothing haikus.
  • Directly following Peter was Mark Fowler talking about extending Template::Toolkit. Mark motored through an amazing amount of material about TT. The main thing I took away is that it's remarkably easy to extend TT by subclassing Template::Plugin. For some reason I find the TT framework much easier to grok than Mason. Perhaps it is because writing the templates feels like working in a new language, totally independent of Perl. Perhaps the comparison doesn't hold, since they really are quite different in some ways: Mason is an Apache application development environment, and TT is a more generalized templating system. I'm really looking forward to seeing the new ORA book on TT.
  • After lunch I headed over to hear Ken Williams talk about Machine Learning. This was a huge topic, and Ken really could've had a whole day to talk about this fascinating stuff. Ken summarized ML as any system that improves (or changes) as it receives training examples. Clustering, categorization, recognition, and filtering are all examples of ML systems. It becomes feasible to use ML when you have too much data to sift through, people are too slow at doing the sifting, and you can afford to be wrong occasionally. Perl is a useful ML language since you've got CPAN at your fingertips, it is a quick prototyping language (and you may not throw away the prototype :), and if you need it you can drop down to C via XS for speed. As an example Ken described how the use of decision trees could improve SpamAssassin. SA has over 600 attributes that it uses to identify a spam message. The attributes all have (+/-) weights associated with them, and an email is marked as spam if the sum of the weights for the attributes it matches exceeds a certain threshold. The problem with 600 attributes is that they all need to have rules associated with them, and these rules must individually be applied to come up with the final sum...which requires a fair amount of time and processing. Ken wrote a program that uses the 600 SA attributes, and a collection of spam/ham available from SA, to generate a decision tree to identify spam. The benefit of a decision tree is that it doesn't need to process every rule, but only the subset of rules on the path that each email takes through the tree. Ken used the AI::DecisionTree module to do the hard stuff, and plugged the tree into GraphViz for displaying the nodes. Pressed for time, Ken quickly went through an example of collaborative filtering, which most people experience when they go on Amazon and are presented with a list of recommendations. Ken wrote another program which used Search::ContextGraph to analyze the CPAN non-core modules that 30 people had installed. This allowed him to say that if you have LWP::UserAgent installed, you are likely to also like URI :-) which makes sense. It would've been cool to have half a day or even a full day of this stuff...I felt like we were just scratching the surface.
  • I managed to stay awake with the help of some sugar-rich cookies and hear Michael Rodriguez talk about XML Modules. He provided a really nice summary of a large chunk of the XML modules that are available on CPAN. They all descend from two C ancestors: expat and libxml2. They are generally divided into two camps: event-based parsers (SAX), which process the XML as it comes in and throw events (callbacks); and DOM-based parsers, which read the entire file into an in-memory data structure and then allow you to query it. One of the things that Michael said which really stuck with me is that XML should really only be used as an interchange format, and not as a live data format. What he meant was that XML is really good for exchanging data with other people, but once you've got it you should parse it and get the data into your relational database ASAP. Another thing that I liked was his joke that XML was really the revenge of Java programmers who need angle brackets around everything so that they could parse text easily. Of course Java now has Perl-like regular expressions, so things have probably improved (or have they?).
  • Unfortunately I wasn't able to make it to Damian's closing talk because I had an early flight. I took the bus from FAU to the Tri-Rail, got off at the wrong station, had to take a bus into Ft. Lauderdale, and then another bus to the airport. It was good to get to see the city at any rate. It was cool to hear what I thought was French on the buses, and everyone seemed so relaxed and friendly...perhaps it's the fine weather. I finally got to the airport and my flight was delayed a couple of hours, so I probably could've heard Damian speak after all! Overall it was a great/informative time. Thanks to the Perl Foundation and Follett for sending me this year.
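
Going back to Peter's exceptions talk for a moment, here is a minimal sketch of the two ideas mentioned there: croak for reporting an error from the caller's point of view, and a handler that adds information to every die message. It assumes the "signals" trick was a $SIG{__DIE__} handler, which is a guess on my part. The My::Config package, the parse_port function, and the "[pid ...]" prefix are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical package, invented for this example.
package My::Config;
use Carp qw(croak);

# croak dies like die does, but reports the error at the caller's
# file and line rather than at the line inside this function.
sub parse_port {
    my ($value) = @_;
    croak "port must be a number, got '$value'" unless $value =~ /^\d+$/;
    return $value + 0;
}

package main;

# A __DIE__ handler is called with the message of every die (including
# those inside eval) and can decorate it before re-throwing.
# The "[pid ...]" prefix is an assumed format, not anything standard.
local $SIG{__DIE__} = sub {
    my ($message) = @_;
    die "[pid $$] $message";
};

# eval catches the exception; the decorated message ends up in $@.
my $port  = eval { My::Config::parse_port('eighty') };
my $error = $@;
print "caught: $error";
```

The handler only prepends text here; throwing objects as exceptions, as 2shortplanks suggested, would be a different design and is not shown.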
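
The contrast Ken drew between summing weighted rules and walking a decision tree can be sketched roughly as follows. This is not Ken's program and it does not use AI::DecisionTree: the three rules, their weights, the threshold, and the tree are all invented for illustration, and the tree is written by hand rather than learned from a spam/ham corpus.

```perl
use strict;
use warnings;

# Hypothetical rules and weights, made up for this sketch.
# They are not real SpamAssassin rules or scores.
my %rules = (
    all_caps_subject => { weight =>  1.5, test => sub { $_[0]{subject} eq uc $_[0]{subject} } },
    mentions_viagra  => { weight =>  3.0, test => sub { $_[0]{body} =~ /viagra/i } },
    known_sender     => { weight => -2.0, test => sub { $_[0]{known_sender} } },
);
my $threshold = 4.0;    # assumed cutoff

# Weighted-sum approach: every rule runs against every message.
sub score_message {
    my ($msg) = @_;
    my $score = 0;
    for my $name (sort keys %rules) {
        $score += $rules{$name}{weight} if $rules{$name}{test}->($msg);
    }
    return $score;
}

# Decision-tree approach: a hand-written tree standing in for a learned one.
# Each inner node names one rule to test; leaves are plain verdict strings.
my $tree = {
    rule => 'known_sender',
    yes  => 'ham',
    no   => {
        rule => 'mentions_viagra',
        yes  => 'spam',
        no   => 'ham',
    },
};

# Walk from the root to a leaf, counting how many rules were actually run.
sub classify {
    my ($node, $msg) = @_;
    my $tests_run = 0;
    while (ref $node) {
        $tests_run++;
        $node = $rules{ $node->{rule} }{test}->($msg) ? $node->{yes} : $node->{no};
    }
    return ($node, $tests_run);
}

my $msg = { subject => 'BUY NOW', body => 'Cheap Viagra here', known_sender => 0 };
my $score = score_message($msg);
my ($verdict, $tests_run) = classify($tree, $msg);
printf "sum: %.1f (%s), tree: %s after %d of %d rules\n",
    $score, ($score >= $threshold ? 'spam' : 'ham'),
    $verdict, $tests_run, scalar keys %rules;
```

With only three rules the saving is trivial, but the same shape shows why a tree over 600 attributes would touch far fewer rules per message than the full sum does.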
  • Ed, thanks again for the ongoing notes. I can't go to EITHER con this year (product testing, got to go to a conference already, yadda). Having eyes on the ground was very cool. Maybe you could do a 30 minute pre-talk at
  • Does anybody know of any online code examples/tutorials of Ken Williams' AI modules in use?

    I find the documentation somewhat hard to understand if you are not already deep in the topic, so it'd be nice to see the modules put to work in a real-life situation.

  • If I'd been cunning and prepared properly, we'd've started the session with a failing test in order to help concentrate on the process of refactoring. And the talk would have been longer.