All the Perl that's Practical to Extract and Report
Stories, comments, journals, and other submissions on use Perl; are Copyright 1998-2006, their respective owners.
Re: (Score:1)
Thoughts on your code:
File::Compare already does the size comparison thing for you, so there's no need for you to collect file sizes.
You never check for symbolic links, and you are missing the opportunity to compare inodes. If two filenames are links (hard or symbolic) to the same file, there's no need to compare the file to itself. Another reason checking for symlinks is important is that if your code encounters a symlink to .. while recursing, it will just sit there twiddling thumbs. It's probably easier to leave recursion to File::Find.
Invoking compare() with the same file as source over and over is a horrible waste of work. You should hash your files first (Digest::SHA1 or Digest::MD5 are handy here), and then compare their hash values. That way you can compare a file to ten other files with hardly any work, which could speed things up by orders of magnitude. (If you are paranoid, you can hash the files twice with different algorithms, and then the probability that different files will hash to identical values goes from “very close to zero” to “vanishingly close to zero for this universe”.)
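To make the suggestions above concrete, here is a rough sketch of how the pieces might fit together; it is not a drop-in replacement for your script. The name find_duplicates is made up for illustration, Digest::MD5 is just one of the two digest modules mentioned, and the device:inode check assumes a Unix-like filesystem where inode numbers are meaningful.

```perl
use strict;
use warnings;
use File::Find;
use Digest::MD5;

# Hypothetical helper (not from the original script): returns a list of
# array refs, each holding the paths of regular files under $dir whose
# contents hash to the same MD5 digest.
sub find_duplicates {
    my ($dir) = @_;
    my (%seen_inode, %by_digest);

    find({
        no_chdir => 1,    # keep $File::Find::name usable as a path
        wanted   => sub {
            my $path = $File::Find::name;
            return if -l $path;        # skip symbolic links
            return unless -f $path;    # regular files only

            # Assumption: device:inode identifies a file uniquely, so a
            # hard link to something already seen is the same file.
            my @st = stat($path) or return;
            return if $seen_inode{"$st[0]:$st[1]"}++;

            # Hash each file once; later comparisons are string compares.
            open my $fh, '<', $path or return;
            binmode $fh;
            my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
            close $fh;

            push @{ $by_digest{$digest} }, $path;
        },
    }, $dir);

    # Only digests shared by more than one file are duplicates.
    return grep { @$_ > 1 } values %by_digest;
}
```

Something like "print join(' ', @$_), qq{\n} for find_duplicates('.');" would then list each group of duplicates on one line. File::Find does not follow symlinks unless asked to, so the symlink-to-.. problem should not come up during the traversal.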
Re: (Score:1)
I had no links, so that was not a problem. I managed to downsize 3.7G of a Portuguese TV show down to 2.8G with this... awesome
I'll take a look at dupmerge, as you say (it might end up in my ~/bin)
Thanks