Here's one of the tasks I came across: a directory with almost 200 files, some of which I know are identical; I just don't know which ones.
Here's samefile, the script I created to find those files:
#!/usr/bin/perl
use strict;
use warnings;
use File::Compare;
use Getopt::Std;
our %opts = get_options();
show_help() if $opts{h};
show_version() if $opts{V};
get_sizes();
find_copies();
# subroutines
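sub find_copies {
    our %sizes;
    # only files of the same size can be equal, so compare within each size group
    for (values %sizes) {
        my @files = @{$_};
        # defined() keeps a file literally named "0" from ending the loop early
        while (defined(my $f = shift @files)) {
            @files || next;
            # compare() returns 0 when the two files are equal
            my @copies = grep {! compare($f, $_)} @files or next;
            print "\"$f\"", (map {" \"$_\""} @copies), "\n";
            # drop the copies just reported so they are not compared again
            for my $s (@copies) {@files = grep {$_ ne $s} @files}
        }
    }
}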
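sub get_sizes {
    our %sizes;
    # @ARGV grows while we iterate, so directories found with -r get scanned too
    for (@ARGV) {
        if (-f) {
            # group file names by size in bytes
            push @{$sizes{(stat)[7]}}, $_;
        }
        elsif ($opts{r} && -d) {
            push @ARGV, <"$_/*">;
        }
    }
}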
sub get_options {
    my %opts;
    getopts('rhV', \%opts );
    for my $key ( keys %opts ) {
        $opts{$key} = 1 unless defined $opts{$key}
    }
    %opts;
}
sub show_help {
    die "Usage: samefile file1 file2
or: samefile -r *
samefile: identifies equal files
Options:
-h display this message and exit
-r recursive mode
-V show version and exit
"
}
sub show_version {
    die "samefile version 0.01\n";
}
It currently prints something like this:
$ samefile *
"file1" "file3" "file6"
"file4" "file5"
Meaning that file1, file3 and file6 are identical to one another, and so are file4 and file5.
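For anyone wanting to post-process that output, here is a minimal sketch of reading it back in. It is only an illustration: `parse_group` is a hypothetical helper that is not part of samefile, and it assumes file names contain no double quotes or newlines. The sample lines are the ones shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of samefile): split one output line
# into the file names it lists.
# Assumption: names contain no double quotes or newlines.
sub parse_group {
    my ($line) = @_;
    return $line =~ /"([^"]*)"/g;    # list context: every quoted name
}

# Sample lines copied from the output shown above.
my @sample = (
    '"file1" "file3" "file6"',
    '"file4" "file5"',
);

for my $line (@sample) {
    my ($keep, @extra) = parse_group($line);
    next unless @extra;              # nothing to report for a lone name
    print "keep $keep; duplicates: @extra\n";
}
# prints:
# keep file1; duplicates: file3 file6
# keep file4; duplicates: file5
```

Treating the first name in each group as the one to keep is an arbitrary choice made for this sketch; samefile itself does not rank the names.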
Comments on the output or anything else are welcome...
Re: (Score:1)
Thoughts on your code:
File::Compare already does the size comparison thing for you, so there's no need for you to collect filesizes.
You never check for symbolic links and you are missing the opportunity to compare inodes. If two filenames are links (hard or symbolic) to the same file, there's no need to compare the file to itself.
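A rough sketch of what the symlink and inode check could look like, in case it helps. `unique_files` is a hypothetical helper, not something samefile has, and it assumes a POSIX-style filesystem where the device and inode numbers from `stat` identify a file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of samefile): drop symbolic links and keep
# only one name per (device, inode) pair, so hard links to the same file
# are never compared against each other.
# Assumption: POSIX-style filesystem with meaningful inode numbers.
sub unique_files {
    my @names = @_;
    my (%seen, @unique);
    for my $name (@names) {
        next if -l $name;                  # skip symbolic links entirely
        my ($dev, $ino) = (stat $name)[0, 1];
        next unless defined $ino;          # stat failed, e.g. file vanished
        next if $seen{"$dev:$ino"}++;      # this inode was already listed
        push @unique, $name;
    }
    return @unique;
}

# Print the names that survive the filter, one per line.
print "$_\n" for unique_files(@ARGV);
```

Filtering the list this way before grouping by size would mean each underlying file is read at most once per comparison, and symlinks never reach the `-d` test.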
Another reason checking for symlinks is important is that if your code encounters a symlink to .. while recursing, it will just sit there twiddling thumbs. It's probably easi
Re: (Score:1)
I had no links, so that was not a problem. I managed to downsize 3.7G of a Portuguese TV show down to 2.8G with this... awesome
I'll take a look at dupmerge, as you say (it might end up in my ~/bin)
Thanks