
All the Perl that's Practical to Extract and Report

use Perl


Saturday April 24, 2004
12:18 PM

Remove duplicate files

[ #18467 ]

Transferring several files at once via Bluetooth from my phone to my PowerBook is buggy, so I transfer them one at a time. Doing it that way, I end up transferring some twice when I forget to advance to the next image, video, or message.

This would not be so bad if the phone, like my digital camera, never reused a file name. Once I clear out the photos from my phone, the next photo gets the name Image(01).jpg again.

Now I have a problem. I have a folder full of photos. Some names are duplicates (with things like #1 or #2 appended to them), and some files are duplicates with different names. To make things even worse, I renamed some of the good photos to have descriptive names.

I decided to fix that up today. I took the MD5 digest of each file, grouped the files that share a digest, sorted each group by name length, and kept the file with the longest name. That choice is arbitrary, since I still have to look at the photos anyway.

#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my %hash;    # MD5 digest => [ files with that content ]
my @files = glob( "*" );

foreach my $file ( @files ) {
    next if -d $file;
    my $data = do {
        local $/;    # slurp the whole file
        open my($fh), '<:raw', $file or die "Could not open $file: $!";
        <$fh>;
    };
    my $digest = md5_hex($data);
    push @{ $hash{$digest} }, $file;    # autovivifies the array ref
}

foreach my $key ( sort keys %hash ) {
    # shortest names first, so the longest name ends up last
    my @dupes = sort { length $a <=> length $b } @{ $hash{$key} };
    next if @dupes == 1;
    pop @dupes;    # keep the file with the longest name
    print "Would remove: @dupes\n";
    # don't run this without thinking about the next line
    #unlink @dupes;
}
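The loop above slurps every photo and video completely into memory before hashing it. Here is a minimal sketch of the same keep-the-longest-name idea packaged as a sub, assuming Digest::MD5's `addfile` method to read each file in chunks instead. The sub name `files_to_remove` is hypothetical, and it only returns the names it would delete; it never calls `unlink`.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: returns the files in $dir whose contents duplicate
# another file, leaving out the longest name in each group of duplicates.
sub files_to_remove {
    my ($dir) = @_;
    my %by_digest;    # hex digest => [ file names with that content ]

    opendir my $dh, $dir or die "Could not open $dir: $!";
    my @names = readdir $dh;
    closedir $dh;

    foreach my $name ( sort @names ) {
        my $path = "$dir/$name";
        next unless -f $path;    # skips ., .., and subdirectories
        open my $fh, '<:raw', $path or die "Could not open $path: $!";
        # addfile reads the handle in blocks rather than slurping it
        my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
        close $fh;
        push @{ $by_digest{$digest} }, $name;
    }

    my @remove;
    foreach my $group ( values %by_digest ) {
        next if @$group == 1;    # unique content, nothing to do
        my @sorted = sort { length $a <=> length $b } @$group;
        pop @sorted;             # keep the longest name
        push @remove, @sorted;
    }
    return sort @remove;
}
```

For example, `print "$_\n" for files_to_remove('.');` lists the candidates in the current directory so you can look them over before deleting anything. As in the original, two names of the same length in one group leave the choice of survivor unspecified.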
