NOTE: use Perl; is on undef hiatus. You can read content, but you can't post it. More info will be forthcoming forthcomingly.

All the Perl that's Practical to Extract and Report

  • Unless your definition of duplicate files includes files with the same name but possibly different contents, fdupes (available in Ubuntu, for instance) could be what you need. It uses md5sum to identify duplicate files, so it'll even work for identical files with different names.

    A homegrown version can be found here (http://www.perlmonks.org/?node_id=703798); it uses File::Find::Duplicates.
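    To illustrate the md5sum approach the parent describes, here is a minimal sketch; the /tmp paths and file names are invented for the demo, and it assumes coreutils md5sum plus awk:

    ```shell
    # Create three throwaway files, two with identical contents.
    mkdir -p /tmp/duptest
    printf 'same\n'  > /tmp/duptest/a.txt
    printf 'same\n'  > /tmp/duptest/b.txt
    printf 'other\n' > /tmp/duptest/c.txt

    # Hash every file, sort by digest, and print each file whose digest
    # has already been seen -- i.e. every duplicate after the first.
    md5sum /tmp/duptest/*.txt | sort | awk 'seen[$1]++ { print $2 }'
    # prints /tmp/duptest/b.txt
    ```

    Like fdupes, this catches identical contents under different names; unlike fdupes it trusts the hash alone rather than confirming with a byte-by-byte compare.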

  • There are lots of hash-based solutions that can do this; some of them are easily written in Perl...

    http://en.wikipedia.org/wiki/Fdupes [wikipedia.org] also lists alternatives (including one of mine!).
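    As a sketch of such a hash-based solution in Perl (one-liner form; the /tmp/dupdemo file names are invented for the demo, and Digest::MD5 is a core module):

    ```shell
    mkdir -p /tmp/dupdemo
    printf 'x\n' > /tmp/dupdemo/one
    printf 'x\n' > /tmp/dupdemo/two
    printf 'y\n' > /tmp/dupdemo/three

    # Group paths by the MD5 of their contents; print any group of two or more.
    perl -MDigest::MD5=md5_hex -e '
        for my $f (@ARGV) {
            open my $fh, "<", $f or next;
            local $/;                       # slurp the whole file
            push @{ $by{ md5_hex(<$fh>) } }, $f;
        }
        @$_ > 1 and print "@$_\n" for values %by;
    ' /tmp/dupdemo/*
    # prints: /tmp/dupdemo/one /tmp/dupdemo/two
    ```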

    --
    -- "It's not magic, it's work..."
  • (ls aggtests/pips/api/v1/xml/*.t; ls aggtests/pips/api/builder/*.t) | sort | uniq -d

    • "ls aggtests/pips/api/v1/xml/*.t"

      "ls... glob" FAIL. Please don't do that.

      --
      Randal L. Schwartz
      Stonehenge
      • I just realized that message is probably insufficient. Here's the "dangerous use of ls" message, spelled out a bit better: http://groups.google.com/group/comp.unix.shell/msg/5d19dadaf9329f87 [google.com]
        --
        Randal L. Schwartz
        Stonehenge
        • OK, so had I thought a bit more I might have written

          echo .../*.t .../*.t | sort | uniq -d

          The other point, that filenames can contain special characters, I was aware of; but I tend to assume that 'my' files won't (unless I know that they do). If I were working on some arbitrary set of files, I would have done the job in Perl. (I was going to say find -print0 and {sort,uniq} -z would work, but apparently my uniq doesn't have a -z option. Weird.) Thanks for the correction, though, since it's important to be
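          For what it's worth, newer GNU coreutils do ship a -z/--zero-terminated option for uniq as well as sort, so the NUL-separated pipeline works there. A sketch, with invented directory names, assuming GNU find's -printf:

          ```shell
          mkdir -p /tmp/nuldemo/a /tmp/nuldemo/b
          touch /tmp/nuldemo/a/dup.t /tmp/nuldemo/a/solo.t /tmp/nuldemo/b/dup.t

          # Emit NUL-terminated basenames, find repeats NUL-safely,
          # then convert NULs to newlines only for display.
          find /tmp/nuldemo/a /tmp/nuldemo/b -name '*.t' -printf '%f\0' \
              | sort -z | uniq -zd | tr '\0' '\n'
          # prints: dup.t
          ```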

          • echo .../*.t .../*.t | sort | uniq -d

            That won’t do what you wanted because echo will output the whole shebang on a single line. What you want instead is

            printf '%s\n' .../*.t .../*.t | sort | uniq -d

            But then that still won’t do what you wanted, because you aren’t chopping the base path off the file names, so no two lines will have the same content anyway. You need to do something like this:

            printf '%s\n' .../*.t .../*.t | cut -d/ -f2- | sort | uniq -d

            Of course, as mentioned, that still doesn’t cope with filenames containing special characters.
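            A quick demonstration of that corrected pipeline, with throwaway directories standing in for the real ones (the names are invented, and the -f2- assumes both globs are exactly one directory deep):

            ```shell
            mkdir -p /tmp/cutdemo/xml /tmp/cutdemo/builder
            touch /tmp/cutdemo/xml/shared.t /tmp/cutdemo/xml/only.t \
                  /tmp/cutdemo/builder/shared.t
            cd /tmp/cutdemo

            # Strip the leading directory component, then report repeated names.
            printf '%s\n' xml/*.t builder/*.t | cut -d/ -f2- | sort | uniq -d
            # prints: shared.t
            ```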

  • ...do a find in each directory and then diff the results:

        cd firstdir && find . -type f > ~/first
        cd seconddir && find . -type f > ~/second
        cd ~ && diff first second

    You can filter the output of the diff of course :)
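    A runnable sketch of that approach (directory and file names invented for the demo; sorting both lists first keeps the diff meaningful, and a nonzero exit from diff just means the lists differ):

    ```shell
    mkdir -p /tmp/first/sub /tmp/second/sub
    touch /tmp/first/sub/a.t /tmp/first/b.t /tmp/second/sub/a.t

    (cd /tmp/first  && find . -type f | sort) > /tmp/list1
    (cd /tmp/second && find . -type f | sort) > /tmp/list2

    # Lines prefixed "<" exist only under firstdir, ">" only under seconddir.
    diff /tmp/list1 /tmp/list2 || true
    ```

    Note this compares names only, not contents; pairing it with one of the hashing approaches above would catch same-name files that differ.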