Journal of chromatic (983)

Thursday February 07, 2002
11:59 PM

Scary Perl Refactoring

[ #2715 ]

I first formally encountered the idea of refactoring while studying XP. I'd been doing it already (especially when giving advice to new programmers), but didn't have a vocabulary for it. As my study progressed, I learned that Smalltalk (among other languages) has a Refactoring Browser -- since refactorings can be considered mechanical transformations, why wouldn't a machine be able to do them?

At Schwern's Refactoring talk at TPC 5.0, he demonstrated the beginnings of a Perl parser that warned about dubious constructs. Parsing Perl with anything but perl obviously isn't a simple task -- Damian hasn't released Parse::Perl, though I'm impressed with perltidy.

Looking at Smalltalk in more detail made me think that operating on source code is the hard way. Working on bytecode would be much easier -- except for associating lines of code with opcodes. (I worked up a source filter that would insert a target with comments and code on appropriate occasions. This data can then be extracted as necessary.) When I talked to Ned Konz and Simon Cozens, they both thought that bytecode was the right track.

Another piece of the puzzle came in writing an article about the Linux Kernel Janitors. The idea behind the Stanford Checker really stuck out. If you can demonstrate an error pattern, the compiler can look for it in the code. Obviously, I can take a bit of bad Perl, compile it to bytecode, and have a tree that marks a bad pattern. As Simon pointed out, though, searching a tree for a tree is a difficult problem.
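To make the "look for a known bad pattern in the compiled tree" idea concrete, here is a minimal sketch using only the core B module. It handles the simplest possible pattern, a single op name, rather than a whole subtree. The helper find_ops, the sample subs, and the choice of string eval (op name "entereval") as the dubious construct are all assumptions made up for illustration; none of this comes from the Stanford Checker or from Schwern's parser.

```perl
use strict;
use warnings;
use B qw(svref_2object OPf_KIDS);

# Hypothetical helper: collect every op under $op whose name equals $wanted.
sub find_ops {
    my ($op, $wanted, $found) = @_;
    return $found unless $$op;                 # B::NULL marks "no op here"
    push @$found, $op if $op->name eq $wanted;
    if ($op->flags & OPf_KIDS) {               # only ops with kids have ->first
        for (my $kid = $op->first; $$kid; $kid = $kid->sibling) {
            find_ops($kid, $wanted, $found);
        }
    }
    return $found;
}

# Made-up samples: one sub with a string eval, one without.
my $suspect = sub { my $src = shift; return eval $src };
my $clean   = sub { my $n   = shift; return $n + 1 };

for my $case ([suspect => $suspect], [clean => $clean]) {
    my ($label, $code) = @$case;
    my $hits = find_ops(svref_2object($code)->ROOT, 'entereval', []);
    printf "%s: %d string eval(s)\n", $label, scalar @$hits;
}
```

Matching one op name is the easy case. Matching a subtree (this op, with that kind of child, under that kind of parent) is where the tree-in-a-tree search gets hard.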

I didn't entirely agree. Though it's usually stupid to disagree with someone that smart, sometimes it can lead to a good idea. For some reason, I thought it was doable. Walking near a koi pond one afternoon, it hit me.

The XML guys have, more or less, solved this.

Okay, the LISP guys may be able to make a better case, but the important thing is that it's solvable. I thought about sending Matt Sergeant an e-mail, asking him how to take some of the rules of XPath and apply them to a non-XML tree. For a few months, the whole idea was on the back burner.

Nearly all of the pieces are in place already. There's a bytecode decompiler in B::Terse (and I knew a bit about it, having written tests for it). There's a bytecode generator in B::Generate (thanks to Simon). There's a bytecode to Perl converter in B::Deparse (thanks to a lot of people, especially Rafael Garcia-Suarez, lately). I just needed to find some way to apply something like XSLT to Perl bytecode.
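As a small illustration of the last of those pieces, B::Deparse can take an already compiled code reference and hand back Perl source. The sample sub is made up for the example, and the exact text that comes back (pragma lines, parentheses, whitespace) varies between Perl versions, so treat this as a sketch rather than a reference.

```perl
use strict;
use warnings;
use B::Deparse;

# A made-up sub to round-trip through the compiler.
my $code = sub { my $x = shift; return $x + 1 };

# coderef2text rebuilds Perl source from the compiled op tree.
my $text = B::Deparse->new->coderef2text($code);
print "sub $text\n";
```

The point is that the trip from tree back to source already exists, so a tool that rewrites the tree would not have to invent its own pretty-printer.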

This morning, it hit me. Maybe it was the Perl XML fans talking about SAX being important for more than XML, but I realized that if I could write a backend module to turn bytecode into XML, the tree matching and conversions would be solved. The only tricky part that's left is generating XSLT or XPathScript or whatever syntax to refactor an error pattern. The same XML guys who provided the final nudge can probably help out in that respect.

So now I have B::ToXML that can XMLize a code reference, and it works pretty well. If I knew the internals better, I'd be able to tell what kind of information is important. I don't yet have any way to say "go from this to this, keeping this but changing this", but I'm one step closer.
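For readers who want a feel for what "XMLizing a code reference" might involve, here is a minimal sketch built on the core B module. It is not B::ToXML and makes no claim about that module's output. The function names op_to_xml and coderef_to_xml, the `<op>` element, and the choice to record only each op's name and B class are assumptions for illustration.

```perl
use strict;
use warnings;
use B qw(svref_2object OPf_KIDS);

# Hypothetical serializer: one <op> element per op, children nested inside.
# Op names and B class names are plain identifiers, so no XML escaping is done.
sub op_to_xml {
    my ($op, $depth) = @_;
    return '' unless $$op;                     # B::NULL marks "no op here"
    my $pad  = '  ' x $depth;
    my $open = sprintf '%s<op name="%s" class="%s"',
        $pad, $op->name, B::class($op);
    my @kids;
    if ($op->flags & OPf_KIDS) {
        for (my $kid = $op->first; $$kid; $kid = $kid->sibling) {
            push @kids, op_to_xml($kid, $depth + 1);
        }
    }
    return "$open/>\n" unless @kids;           # leaf op: self-closing element
    return "$open>\n" . join('', @kids) . "$pad</op>\n";
}

# Hypothetical entry point: serialize the op tree behind a code reference.
sub coderef_to_xml {
    my ($code) = @_;
    return op_to_xml(svref_2object($code)->ROOT, 0);
}

print coderef_to_xml(sub { my $x = shift; return $x + 1 });
```

Once the tree is in this shape, any XPath-capable tool could in principle search it. A real converter would need to carry far more per-op detail (flags, constants, pad entries, line numbers) before anything could be turned back into working code.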

I could still be on the wrong track, but I really think I'm on to something here. Drop me a line if you have a strong opinion either way.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • The problem I had with working at a bytecode level when I was writing my refactoring engine (that I must get back to working on) was that you lose comments.

    Now, I'm generally of the opinion that if your code needs comments then you have a problem right there, but some people like them...

    The approach I took (and will probably continue to take) is a bastard hybrid that uses bytecode where necessary, but which still goes and mucks about with the source as strings and is generally quite scary.

    But I do hav