NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

  • Thoughts on your code:

    File::Compare already does the size comparison for you, so there's no need to collect file sizes yourself.
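    A minimal sketch of leaning on File::Compare directly, assuming binary mode (the default when no buffer size or comparison sub is passed); `same_content` is a hypothetical helper name. compare() returns 0 for identical files, 1 for different ones and -1 on error, and when both arguments are file names it checks the two sizes first and returns 1 without reading the contents if they differ.

```perl
use strict;
use warnings;
use File::Compare qw(compare);

# Hypothetical helper: true if the two files have identical contents.
# compare() returns 0 (same), 1 (different) or -1 (error). In binary
# mode it bails out early when the sizes differ, so no separate size
# pass is needed before calling it.
sub same_content {
    my ($x, $y) = @_;
    my $rc = compare($x, $y);
    die "compare($x, $y) failed: $!\n" if $rc < 0;
    return $rc == 0;
}
```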

    You never check for symbolic links and you are missing the opportunity to compare inodes. If two filenames are links (hard or symbolic) to the same file, there's no need to compare the file to itself.
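    The inode check could look like this sketch; `same_file` is a hypothetical helper name. It assumes a POSIX-style filesystem where the device and inode fields returned by stat identify a file (on some platforms, such as Windows, the inode field may not be meaningful).

```perl
use strict;
use warnings;

# Hypothetical helper: true if both paths resolve to the same file.
# stat() follows symlinks, so a symlink matches its target, and hard
# links match because they share one (device, inode) pair.
# Returns false if either path cannot be stat'ed.
sub same_file {
    my ($x, $y) = @_;
    my ($dev_x, $ino_x) = stat $x or return 0;
    my ($dev_y, $ino_y) = stat $y or return 0;
    return $dev_x == $dev_y && $ino_x == $ino_y;
}
```

    Calling this before compare() lets you skip pairs that are really one file under two names.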

    Another reason checking for symlinks is important: if your code follows a symlink to .. while recursing, it will go round in circles, revisiting the same directories over and over instead of finishing. It's probably easier to leave recursion to File::Find.
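    A sketch of handing the walk to File::Find; `regular_files` is a hypothetical helper name. File::Find only follows symlinks when given `follow => 1` (or `follow_fast`), so leaving that option off keeps the traversal out of `..` loops, and the explicit `-l` test also skips symlinks to plain files.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Hypothetical helper: collect every regular file under the given
# directories. Symlinks are skipped entirely: File::Find does not
# descend through them by default, and the -l test keeps links to
# files from showing up as "duplicates" of their own targets.
sub regular_files {
    my @dirs = @_;
    my @files;
    find({
        no_chdir => 1,   # $_ then holds the full path, like $File::Find::name
        wanted   => sub {
            return if -l $_;                         # lstat: skip symlinks
            push @files, $File::Find::name if -f _;  # reuse that stat buffer
        },
    }, @dirs);
    return @files;
}
```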

    Invoking compare() with the same file as the source over and over is a horrible waste of work. You should hash your files first (Digest::SHA1 or Digest::MD5 are handy here) and then compare their hash values. That way each file is read only once, and you can compare a file to ten other files with hardly any extra work, which could speed things up by orders of magnitude. (If you are paranoid, you can hash the files twice with different algorithms; the probability that two different files hash to identical values then goes from “very close to zero” to “vanishingly close to zero for this universe”.)
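    A minimal sketch of the hash-first approach using Digest::MD5; `duplicate_groups` is a hypothetical helper name, and Digest::SHA could be swapped in the same way. Each file is read once, and files sharing a digest come back together as a group of candidate duplicates.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: group files by the MD5 digest of their contents.
# Each file is read exactly once, instead of once per pairwise compare().
# Returns array refs, one per group of two or more same-digest files.
sub duplicate_groups {
    my @files = @_;
    my %by_digest;
    for my $file (@files) {
        open my $fh, '<', $file or do { warn "skip $file: $!\n"; next };
        binmode $fh;
        my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
        close $fh;
        push @{ $by_digest{$digest} }, $file;
    }
    return grep { @$_ > 1 } values %by_digest;
}
```

    If you are paranoid, you can still run compare() on the members of each group before deleting or linking anything.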

    • Everything you say makes sense :-)

      I had no links, so that was not a problem. I managed to shrink 3.7G of a Portuguese TV show down to 2.8G with this... awesome :-)

      I'll take a look at dupmerge, as you suggest (it might end up in my ~/bin)

      Thanks :-)