Ovid (2709)

http://publius-ovidius.livejournal.com/
AOL IM: ovidperl

Stuff with the Perl Foundation. A couple of patches in the Perl core. A few CPAN modules. That about sums it up.

Journal of Ovid (2709)

Monday September 10, 2007
03:38 AM

Finding Unused Files

[ #34402 ]

As part of a code cleanup, we are going to eliminate unused files in a standard Perl/CGI app which has been around for 7 to 10 years. With thousands of files, my first thought was something like this hack:

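#!/usr/bin/perl

use strict;
use warnings;

# Every regular file below the current directory, minus CVS metadata,
# with the leading "./" stripped
my @files =  map  { s{^\./}{}; $_ }
    `find . -type f | grep -v CVS`;

chomp(@files);

my $count = @files;
my $curr  = 1;

foreach my $file (@files) {
    print "Processing $file.  $curr out of $count.\n";
    $curr++;

    # Bare name: no directory, no extension (lib/Foo/Bar.pm -> Bar)
    my $no_ext = $file;
    $no_ext =~ s{^.*?([^./]+)(?:\.\w+)?$}{$1};

    # Find all files | Exclude this file, text_r5_c2 and CVS metadata
    # (each CVS/Entries file names every file in its directory) |
    # find files which reference it
    my $command = "find . -type f |grep -Ev '$file|text_r5_c2|CVS'|xargs grep -l '$no_ext'";

    # No output: nothing else mentions this file, so report it
    unless ( `$command` ) {
        warn $file,$/;
    }
}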

It's pretty ugly and *nix specific, but the basic idea is this:

  1. Find all files
  2. For each file, find all files whose names don't match that file name
  3. Search those remaining files for the bare filename (without path or extension); if none of them contains it, we have an orphan file
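For comparison, here is a minimal sketch of the same three steps in pure Perl, using the core File::Find module instead of shelling out to find, grep and xargs. It is not the script above: it reads each file only once instead of running a full search per file, skips CVS directories everywhere, and matches bare names as plain substrings. The script name `find_orphans.pl` and the helper names are made up for illustration.

```perl
#!/usr/bin/perl
# find_orphans.pl (hypothetical name): a pure-Perl sketch of the
# "find files nobody mentions" idea.
use strict;
use warnings;
use File::Find qw(find);
use File::Basename qw(basename);

# Bare name, using the same regex as the shell version:
# directory and final extension removed, so lib/Foo/Bar.pm becomes Bar.
sub bare_name {
    my ($path) = @_;
    ( my $name = $path ) =~ s{^.*?([^./]+)(?:\.\w+)?$}{$1};
    return $name;
}

# Step 1: every regular file under $root, pruning CVS directories.
sub list_files {
    my ($root) = @_;
    my @files;
    find(
        {
            no_chdir => 1,    # $_ holds the full path
            wanted   => sub {
                if ( -d $_ && basename($_) eq 'CVS' ) {
                    $File::Find::prune = 1;
                    return;
                }
                push @files, $_ if -f $_;
            },
        },
        $root
    );
    return @files;
}

# Steps 2 and 3, reading each file once: a file counts as referenced
# when its bare name occurs, as a plain substring, in some *other* file.
sub find_orphans {
    my @files = @_;
    my %named;    # bare name => files that have that bare name
    push @{ $named{ bare_name($_) } }, $_ for @files;

    my %referenced;
    for my $file (@files) {
        open my $fh, '<:raw', $file or do { warn "Can't read $file: $!\n"; next };
        my $content = do { local $/; <$fh> } // '';
        close $fh;
        for my $name ( keys %named ) {
            next if index( $content, $name ) < 0;
            $referenced{$_} = 1 for grep { $_ ne $file } @{ $named{$name} };
        }
    }
    my @orphans = grep { !$referenced{$_} } @files;
    return sort @orphans;
}

# Usage: perl find_orphans.pl /path/to/app
if (@ARGV) {
    warn "$_\n" for find_orphans( list_files( $ARGV[0] ) );
}
```

Possible orphans go to STDERR, as in the shell version. It is faster, but it keeps the same blind spot: files that only mention each other all count as referenced.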

It seemed reasonable, but, setting aside the fact that it's very slow (it runs a fresh find and grep over the whole tree for every single file), the obvious problem kicked in: if an entire section of code is no longer used, its files can still refer to one another, so none of them is likely to show up on this list. This app doesn't have a robust enough test suite to catch that either. Time for another strategy.
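One way to catch such self-referential sections (a sketch, not something tried in this post) is to treat "file A mentions file B" as a directed graph and ask which files can be reached from known entry points, such as the CGI scripts the web server actually runs. The graph and the file names below are hypothetical; in practice the edges would come from a scan like the one above, and choosing the entry points is the hard part.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Given a reference graph (file => [files it mentions]) and a list of
# entry points, return every file that cannot be reached from any entry
# point.  A dead cluster whose files only mention each other ends up
# here, because nothing outside the cluster leads into it.
sub unreachable_files {
    my ( $graph, @entry_points ) = @_;
    my %seen;
    my @queue = @entry_points;
    while (@queue) {    # breadth-first walk; %seen stops cycles
        my $file = shift @queue;
        next if $seen{$file}++;
        push @queue, @{ $graph->{$file} || [] };
    }
    # Every file that appears as a node or as a reference target
    my %all = map { $_ => 1 } keys %$graph, map { @$_ } values %$graph;
    my @dead = grep { !$seen{$_} } keys %all;
    return sort @dead;
}

# Hypothetical example: old_report.cgi and OldReport.pm mention each
# other, so each looks "used" to a per-file grep, yet neither can be
# reached from the live entry point index.cgi.
my %graph = (
    'index.cgi'      => ['Site.pm'],
    'Site.pm'        => [],
    'old_report.cgi' => ['OldReport.pm'],
    'OldReport.pm'   => ['old_report.cgi'],
);
print "$_\n" for unreachable_files( \%graph, 'index.cgi' );
```

The per-file check asks "does anyone mention me?", which a cycle always answers yes; reachability asks "does anything live lead to me?", which a dead cycle answers no.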

A coworker suggested grepping the access logs. Now I feel really stupid, since this is so obvious. If a file shows up there, we probably want to keep it. If it doesn't, it merits further investigation.
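A minimal sketch of the access-log check, assuming Apache's common or combined log format. It counts hits per URL path; mapping those paths back to files on disk depends on the server configuration (DocumentRoot, ScriptAlias and so on) and is left out. The script name `log_hits.pl` is made up for illustration.

```perl
#!/usr/bin/perl
# log_hits.pl (hypothetical name): count requests per URL path in
# Apache common/combined format access logs.
use strict;
use warnings;

# A line looks like (combined format adds referer and user agent):
#   1.2.3.4 - - [10/Sep/2007:03:38:00 -0700] "GET /cgi-bin/a.cgi?x=1 HTTP/1.1" 200 512
# The query string is dropped, so /a.cgi?x=1 and /a.cgi?x=2 count as one
# path.  Paths stay exactly as logged (no URL-decoding).
sub count_requests {
    my %hits;
    for my $line (@_) {
        next unless $line =~ m{"[A-Z]+ (\S+)[^"]*"};
        ( my $path = $1 ) =~ s/\?.*//s;
        $hits{$path}++;
    }
    return \%hits;
}

# Of the given URL paths, return those that never appear in the log.
sub never_requested {
    my ( $hits, @paths ) = @_;
    return grep { !$hits->{$_} } @paths;
}

if (@ARGV) {
    my $hits = count_requests(<>);    # reads every log named on the command line
    printf "%7d %s\n", $hits->{$_}, $_
        for sort { $hits->{$b} <=> $hits->{$a} || $a cmp $b } keys %$hits;
}
```

For example, `perl log_hits.pl access_log access_log.1` prints each requested path with its hit count, busiest first. Only files the server hands out directly can appear; a module loaded by a CGI script never shows up in the log, which is why a missing entry calls for a closer look rather than proving the file is dead.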

Comments
  • If it's a Linux box, you can probably put the atime values to use for once.
    • Thought about that, but there are problems there. First, they fail completely if anyone has just been perusing the code in vim, yes? (I've been doing a lot of that while learning the code base.) Also, robots such as Googlebot might request old files which haven't been linked to in a while. Of course, I'm not much of an administrator, so if I'm wrong, let me know!
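For reference, the atime check suggested above can be sketched with Perl's `-A` file test, which reports how many days before the script started a file was last accessed. It inherits exactly the problems raised in the reply: any read, including opening a file in vim, resets the clock, and filesystems mounted with `noatime` never update it. The script name and the 30-day threshold in the usage line are arbitrary choices for illustration.

```perl
#!/usr/bin/perl
# stale_atime.pl (hypothetical name): list files whose last access
# time is older than a given number of days.
use strict;
use warnings;
use File::Find qw(find);

# -A is measured in days from $^T (when this script started).  stat
# itself does not change atime, but any earlier read of the file did.
sub stale_by_atime {
    my ( $root, $days ) = @_;
    my @stale;
    find(
        {
            no_chdir => 1,
            wanted   => sub { push @stale, $_ if -f $_ && -A _ > $days },
        },
        $root
    );
    return sort @stale;
}

# Usage: perl stale_atime.pl /path/to/app 30
if ( @ARGV == 2 ) {
    print "$_\n" for stale_by_atime(@ARGV);
}
```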