All the Perl that's Practical to Extract and Report

Thursday November 20, 2003
01:20 PM

How readable is your program?

[ #15906 ]

I read a book about a bunch of kids who beat up Vegas at blackjack with an elaborate card-counting system involving several people and lots of statistical tables. Somehow, out of that, I wondered how dense Perl programs look, that is, when we view them in an editor, just as characters.

So, I started to write a tiny program to compare the amount of visual whitespace (e.g. tabs count more than spaces) to the number of characters. That is pretty useless though, except as a rough measure of overall density. Programs are hard to read because they have islands of high density, so the overall approach doesn't work.
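
A minimal sketch of that overall measure, written in Python because the original program is not shown. The function name `visual_density` and the `tab_width` default of 8 are assumptions for illustration. Tabs are expanded to the columns they occupy, so they weigh more than single spaces.

```python
def visual_density(text, tab_width=8):
    """Return the fraction of visual columns holding non-whitespace characters.

    Hypothetical helper: tabs are expanded to tab stops first, so one tab
    counts as several columns of whitespace.
    """
    inked = 0    # columns holding a visible character
    columns = 0  # all columns, visible or not
    for line in text.splitlines():
        expanded = line.expandtabs(tab_width)
        columns += len(expanded)
        inked += sum(1 for ch in expanded if not ch.isspace())
    # An empty file has no columns at all; call its density zero.
    return inked / columns if columns else 0.0
```

A single number like this hides the islands of high density, which is the limitation the paragraph above describes.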

Next, I wrote what I call my "minesweeper" program. I would show it here, but the program is on my laptop and I am using a restricted community computer. Basically, I go through a text file and look at all of the positions around each position. Each character can have up to eight characters around it. For each non-whitespace character around a position, I add 1 to that position. Whitespace and edges (including the parts of lines longer than the ones around them) get 0. The output from that is not very illuminating: it is just a big matrix, so it is denser than the program itself. Mathematically it works, but visually it is worse.
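
Since the original program is not shown, here is a rough Python sketch of the counting step as described. It assumes that "whitespace and edges get 0" means whitespace neighbors and missing positions contribute nothing to the count. The name `neighbor_counts` and the tab handling are hypothetical.

```python
def neighbor_counts(text, tab_width=8):
    """Count non-whitespace characters among the eight neighbors of each position.

    Assumption: short lines are padded with spaces, so positions past the
    end of a line behave like whitespace, and positions outside the file
    contribute nothing.
    """
    lines = [line.expandtabs(tab_width) for line in text.splitlines()]
    width = max((len(line) for line in lines), default=0)
    grid = [line.ljust(width) for line in lines]
    height = len(grid)

    counts = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # a position is not its own neighbor
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < height and 0 <= cc < width
                    if inside and not grid[rr][cc].isspace():
                        total += 1
            counts[r][c] = total
    return counts
```

For a solid 3x3 block of characters this gives 8 in the middle, 5 along the sides, and 3 in the corners.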

So, from there, I created a density plot using GD::Graph. I create a canvas with the same number of rows and columns as the script, then color the pixels. Positions with higher densities show up darker. Positions with zero density show up white. Right now it is grayscale, but I would like to use colors at some point.
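
The plotting step might look something like the sketch below. It swaps in the plain-text PGM image format for GD::Graph so that it needs no libraries, and it takes a matrix of counts such as the minesweeper output as input. The function name `counts_to_pgm` and the `max_count` parameter are assumptions.

```python
def counts_to_pgm(counts, max_count=8):
    """Render a matrix of counts as a plain-text PGM (P2) grayscale image.

    One pixel per character position: a count of 0 is white (255) and
    max_count or more is black (0). Not the author's GD::Graph code.
    """
    height = len(counts)
    width = len(counts[0]) if counts else 0
    pixels = []
    for row in counts:
        for value in row:
            clipped = min(value, max_count)
            # Integer math keeps the gray levels exact and repeatable.
            pixels.append(str(255 - (255 * clipped) // max_count))
    header = f"P2\n{width} {height}\n255\n"
    # One value per line keeps every line short.
    return header + "".join(p + "\n" for p in pixels)
```

Writing the returned string to a file such as `density.pgm` (a made-up name) produces an image with one pixel per character position.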

On one run, I ran the program using its source as input. The results were surprising. The islands of high density are where I would expect them (where all the typing is, silly), but their contour really shows where I am putting a lot of characters close together, which, I contend, makes the program harder to read there, just like it is harder to read proportional fonts (at least I think so).

Some programs are long (Shocked! Shocked I say!), so I break up the program into several images. Putting a bunch of small images on a single piece of paper can represent the entire program quickly.

Next, I want to make several images of the same script from different versions, then create a movie out of them to see how the density changes as we code. It probably varies from coder to coder, but I think for my stuff I would see a lot of random stuff, then points of gravity pulling code towards them, then a big bang where bits of code travel long distances as they get relegated to subroutines at the end of the script, maybe a text version of the Oregon Trail.

For some people, the gravity centers will keep attracting more and more characters, so I am also thinking about adding long distance effects. A character two positions away counts partially, although I have not decided if it should be a second or third power effect. If I really want to waste a lot of time, I can figure out how to calculate a program's Big-G gravitational constant, or shoehorn special relativity (some piece of code must bend the code around it somehow).
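
One way the long distance effect could be sketched, under several assumptions: distance is measured in rings around the position (one step in any direction, diagonals included, is distance 1), and a character at distance d contributes 1/d^power, with `power` set to 2 or 3 to compare the two options. The names `weighted_density`, `radius`, and `power` are hypothetical.

```python
def weighted_density(text, radius=2, power=2, tab_width=8):
    """Score each position by nearby non-whitespace, weighted by 1 / d**power.

    Assumption: d is the ring number, max(|row offset|, |column offset|),
    so the eight adjacent positions all sit at distance 1.
    """
    lines = [line.expandtabs(tab_width) for line in text.splitlines()]
    width = max((len(line) for line in lines), default=0)
    grid = [line.ljust(width) for line in lines]
    height = len(grid)

    # Precompute every offset within the radius along with its weight.
    offsets = []
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            d = max(abs(dr), abs(dc))
            if d:
                offsets.append((dr, dc, 1.0 / d ** power))

    scores = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            for dr, dc, weight in offsets:
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < height and 0 <= cc < width
                if inside and not grid[rr][cc].isspace():
                    scores[r][c] += weight
    return scores
```

With `radius=1` this reduces to the plain eight-neighbor count, so the two versions can be compared on the same file.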

Some people may have density islands that seem to pulse as they add or subtract code. Who knows?

I still have a lot of small technical problems to decide. Do I keep the POD in or out? Or do I color it differently?

Oh well. Six minutes left on this computer.

  • A grayscale "minesweeper" bitmap of code sounds like a nifty visualization, and a lot more straightforward than trying to extract history from CVS to "age" code, painting lines in different colors depending on how recently they've been touched.
    • Oh no, I'm not trying to age code, just show the migration of clumps of characters. I am curious how different the clumps look from start to finish. I might even be able to identify distinct coding styles.
      • I'm trying to age code, but progress is slow. Extracting the right info from CVS and collating it is messy.
        • Have you looked at what things like viewcvs and cvsweb do? That might help you identify chunks.
          • Getting coarse-grained chunks is relatively easy. Doing finer grain, say by using Algorithm::Diff within chunks, is messier. I started this thinking I could age each character. That's proven to be very difficult.
            • what we need is radioactive labeling.

              i was thinking last night that a journaling editor would make this easier because you could see the file keystroke to keystroke.
              • Even with radioactive labeling (or the equivalent), there are some interesting edge cases. How do you score (or relabel) XY becoming YX, especially when X and Y are substantial blocks of code?
        • After I read this I had a thought that whilst it may be interesting to display the age of code visually, it may be more worthwhile to consider displaying the variability of the code in a visual form, that is, plotting the number of revisions and changes made to code against the code age. My rationale for this is that while working with code age would provide an incidental overview of code stability, variability assessed by revisions and changes, potentially as a secondary measure to age, could provide a mo
  • The results were surprising. The islands of high density are where I would expect them (where all the typing is, silly), but their contour really shows where I am putting a lot of characters close together, which, I contend, makes the program harder to read there, just like it is harder to read proportional fonts (at least I think so).

    This sounds very similar to some image analysis which I employed for a research thesis.

    The topic of my research was investigating differences in the vasculature of benign t

    • Very interesting---I hadn't thought to look at things like distances and areas.

      Now it's looking like a GIS problem. I bet they have all sorts of nifty software to analyse this sort of stuff.

      I should be able to post some stuff next week.
  • Looking forward to the source :-)

    I'm interested in quick ways to identify problem areas of code, often being tasked with having to pile through tons of "legacy" (spelt sh*t) code.

    Have you come across Ward Cunningham's Signature Survey [c2.com] method? Completely different method, but useful.