
Journal of Ovid (2709)

Tuesday October 06, 2009
03:52 AM

Finding Duplicate Files

[ #39720 ]

I'm trying to find which files in one directory also exist (by name) in another directory. The following works, but I assume there's an easier way?

for file in `ls aggtests/pips/api/v1/xml/*.t | cut -d/ -f6`; do find aggtests/pips/api/builder/ -name $file; done

Update: Rewrote to fix a tiny grep bug.
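
For comparison, here is a minimal Perl sketch of the same task, reusing the two directory paths from the one-liner above (everything else, such as the hash name and the output format, is illustrative): collect the basenames of the .t files in the first directory, then walk the second directory and print any file whose name matches.

use strict;
use warnings;
use File::Basename qw(basename);
use File::Find;

# Basenames of the .t files in the first directory.
my %wanted = map { basename($_) => 1 } glob 'aggtests/pips/api/v1/xml/*.t';

# Walk the second directory and print any file with a matching name.
find(
    sub { print "$File::Find::name\n" if -f && $wanted{$_} },
    'aggtests/pips/api/builder',
);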

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Unless your definition of duplicate files includes files with the same name but possibly different contents, fdupes (available in Ubuntu, for instance) could be what you need. It uses md5sum to identify duplicate files, so it'll even work for identical files with different names.

    A homegrown version can be found here (http://www.perlmonks.org/?node_id=703798); it uses File::Find::Duplicate.

  • There are lots of hash-based solutions that can do this; some of them are easily written in Perl (a rough sketch follows the comments)...

    http://en.wikipedia.org/wiki/Fdupes [wikipedia.org] also lists alternatives (including one of mine!).

    --
    -- "It's not magic, it's work..."
  • (ls aggtests/pips/api/v1/xml/*.t; ls aggtests/pips/api/builder/*.t) | sort | uniq -d

    • "ls aggtests/pips/api/v1/xml/*.t"

      "ls... glob" FAIL. Please don't do that.

      --
      Randal L. Schwartz
      Stonehenge

      • I just realized that message is probably insufficient. Here's the "dangerous use of ls" message, spelled out a bit better: http://groups.google.com/group/comp.unix.shell/msg/5d19dadaf9329f87 [google.com]

        --
        Randal L. Schwartz
        Stonehenge
        • OK, so had I thought a bit more I might have written

          echo .../*.t .../*.t | sort | uniq -d

          The other point, that filenames can contain special characters, I was aware of, but I tend to assume that 'my' files won't (unless I know that they do). If I were working on some arbitrary set of files I would have done the job in Perl (I was going to say find -print0 and {sort,uniq} -z would work, but apparently (my) uniq doesn't have a -z option. Weird.). Thanks for the correction, though, since it's important to be aware of these pitfalls.

          • echo .../*.t .../*.t | sort | uniq -d

            That won’t do what you wanted because echo will output the whole shebang on a single line. What you want instead is

            printf '%s\n' .../*.t .../*.t | sort | uniq -d

            But then that still won’t do what you wanted, because you aren’t chopping the base path off the file names, so no two lines will have the same content anyway. You need to do something like this:

            printf '%s\n' .../*.t .../*.t | cut -d/ -f2- | sort | uniq -d

            Of course, as mentioned, that still doesn't cope with filenames containing special characters such as newlines.

  • ...do a find in each directory and then diff the results.

    cd firstdir && find . -type f > ~/first
    cd seconddir && find . -type f > ~/second
    cd ~ && diff first second

    You can filter the output of the diff, of course :)
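
As a couple of the comments above note, the hash-by-content approach that fdupes takes is easy to write in Perl. Below is a rough sketch using the core Digest::MD5 and File::Find modules; the script name, the reporting format, and the %seen hash are illustrative, and this is not the code behind fdupes or File::Find::Duplicate.

#!/usr/bin/perl
# finddupes.pl -- report files with identical contents under the given directories
use strict;
use warnings;
use Digest::MD5;
use File::Find;

my %seen;    # MD5 digest => first path seen with that content

find(
    sub {
        return unless -f;
        open my $fh, '<', $_ or return;
        binmode $fh;
        my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
        if ( exists $seen{$digest} ) {
            print "$File::Find::name duplicates $seen{$digest}\n";
        }
        else {
            $seen{$digest} = $File::Find::name;
        }
    },
    @ARGV ? @ARGV : '.',
);

Run it as, for example, perl finddupes.pl aggtests/pips/api/v1/xml aggtests/pips/api/builder. Unlike the name-based one-liner in the original post, this pairs files by content, so renamed copies are caught as well.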