All the Perl that's Practical to Extract and Report

use Perl

Mark Leighton Fisher (4252)

http://mark-fisher.home.mindspring.com/

I am a Systems Engineer at Regenstrief Institute [regenstrief.org]. I also own Fisher's Creek Consulting [fisherscreek.com].
Friday February 17, 2006
01:19 PM

setdiff - Difference of Two Sets of Filenames

[ #28717 ]

Trying to use "some filenames except those that match this pattern" is a multi-step process in common shells. You have to generate the two lists of filenames, put those lists into separate files, then run "fgrep -v -f set2 set1" to get the list of files in the first set that are not in the second set.
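As a rough sketch of that manual process, the steps might look like the following. The scratch directories, the sample filenames, and the manual_result variable are made up for illustration, and "grep -F" is used as the modern spelling of fgrep.

```shell
#!/bin/sh
# Hypothetical demo of the manual steps, run in throwaway directories.
demo_dir=$(mktemp -d "${TMPDIR:-/tmp}/setdiff-demo.XXXXXX") || exit 1
lists_dir=$(mktemp -d "${TMPDIR:-/tmp}/setdiff-lists.XXXXXX") || exit 1
cd "$demo_dir" || exit 1
touch Makefile README main.c util.h   # sample files (assumed names)

# Step 1: write the first set (every name) to a file.
ls -1d * > "$lists_dir/set1"
# Step 2: write the second set (the names to exclude) to another file.
ls -1d *.h *.c > "$lists_dir/set2"
# Step 3: keep the lines of set1 that match no line of set2.
manual_result=$(grep -F -v -f "$lists_dir/set2" "$lists_dir/set1")
echo "$manual_result"

# Clean up the scratch directories.
cd / && rm -rf "$demo_dir" "$lists_dir"
```

With these sample files, only Makefile and README should be printed.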

setdiff is a small shell program for obtaining the difference of two sets of filenames. You use it by:

    setdiff FIRSTSET SECONDSET

where FIRSTSET is a quoted glob for the first set of filenames, and SECONDSET is a quoted glob for the second set of filenames. (You can also use actual filename lists in place of the quoted globs.) For example, to get all files that are not C source or header files in your current directory, you would use:

    setdiff '*' '*.h *.c'

which would print all filenames not ending in .h or .c to standard output.

A slightly more complicated example is finding out what files are not source code files in a project that is a mixture of Perl, C, and Java:

    setdiff '*' '*.h *.c *.pl *.pm *.java'

A final example: find the XML-family files (*.xml, *.xsd, etc.) that are not XSL files, mixed in with a bunch of source code files:

    setdiff '/home/mycyc-0.22/*.x*' '/home/mycyc-0.22/*.xsl'

Here is the code:

#!/bin/sh
# Output difference between two sets of filenames,
# i.e. the set difference of the two filename sets.
# Names are assumed to be canonicalized already.
#
# This is the relative complement of B relative to A,
# also known as the set theoretic difference.
# Examples:
#   { 1, 2, 4 } - { 1, 2, 5 } = { 4 }
#   { 1, 2, 5 } - { 1, 2, 4 } = { 5 }

# check arguments
if [ -z "$1" ] || [ -z "$2" ]; then
  echo "usage: setdiff FILESETEXPR1 FILESETEXPR2" >&2
  exit 1
fi

# get a temporary filename for set #1
# (an explicit template works with both GNU and BSD mktemp)
set1=`mktemp "${TMPDIR:-/tmp}/setdiff1.XXXXXX"`
if [ -z "$set1" ]; then
  echo "can't get temporary filename for set1" >&2
  exit 1
fi

# get a temporary filename for set #2
# (an explicit template works with both GNU and BSD mktemp)
set2=`mktemp "${TMPDIR:-/tmp}/setdiff2.XXXXXX"`
if [ -z "$set2" ]; then
  echo "can't get temporary filename for set2" >&2
  exit 1
fi

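# get the sets into temporary files
# ($1 and $2 stay unquoted so the shell expands the globs;
# -d lists directories by name instead of listing their contents)
ls -1d $1 > "$set1"
ls -1d $2 > "$set2"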

# compute all elements of set #1 not in set #2
# (-x matches whole lines, so "a.c" does not also exclude "data.c")
fgrep -x -v -f "$set2" "$set1"

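By the way, set intersection is setdiff, only with "fgrep -f $set1 $set2" at the end, which prints the names found in both sets. (A union would instead be "sort -u $set1 $set2".)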
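The set operations touched on here can be sketched as small shell functions over two list files. This is only an illustration: the function names and sample data are hypothetical, "grep -F" stands in for fgrep, and the -x flag is my addition so that a name such as "a.c" matches only a whole line rather than also matching inside "data.c".

```shell
#!/bin/sh
# Hypothetical helpers; each takes two files holding one name per line.
set_diff()      { grep -F -x -v -f "$2" "$1"; }   # names in $1 but not in $2
set_intersect() { grep -F -x -f "$1" "$2"; }      # names in both files
set_union()     { sort -u "$1" "$2"; }            # names in either file

# Sample data (made up): "a.c" is a substring of "data.c" on purpose.
a=$(mktemp "${TMPDIR:-/tmp}/seta.XXXXXX") || exit 1
b=$(mktemp "${TMPDIR:-/tmp}/setb.XXXXXX") || exit 1
printf '%s\n' a.c data.c notes.txt > "$a"
printf '%s\n' a.c extra.h > "$b"

diff_out=$(set_diff "$a" "$b")
inter_out=$(set_intersect "$a" "$b")
union_out=$(set_union "$a" "$b")
rm -f "$a" "$b"

echo "difference:";   echo "$diff_out"
echo "intersection:"; echo "$inter_out"
echo "union:";        echo "$union_out"
```

Without -x, the difference would lose data.c as well, because fixed-string matching looks for the pattern anywhere in the line.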

  • setdiff '*' '*.h *.c'
    ls !(*.[ch])
  • BTW, you probably also want to be using comm(1) [freebsd.org] instead of fgrep.

    comm -23 $set1 $set2

    Also, you should probably add a line to delete those temp files:

    trap "rm -f $set1 $set2" EXIT HUP INT QUIT TERM

    That way, they get deleted on exit, or if some common signal gets delivered.

    -Dom
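Putting the comm(1) and trap suggestions together, a version of setdiff might look roughly like the sketch below. The function name setdiff_comm and the temp-file templates are hypothetical. The lists are piped through sort because comm expects sorted input, and the signal traps call exit so that the EXIT trap does the cleanup in shells that do not run it on signals.

```shell
#!/bin/sh
# Sketch only: setdiff rebuilt around comm(1), with temp-file cleanup.
# The body runs in a subshell, so the traps fire when the function ends.
setdiff_comm() (
  set1=$(mktemp "${TMPDIR:-/tmp}/setdiff1.XXXXXX") || exit 1
  set2=$(mktemp "${TMPDIR:-/tmp}/setdiff2.XXXXXX") || exit 1
  # Remove the temp files on exit; turn common signals into an exit.
  trap 'rm -f "$set1" "$set2"' EXIT
  trap 'exit 1' HUP INT QUIT TERM

  # $1 and $2 are unquoted on purpose so the shell expands the globs.
  ls -1d $1 | sort > "$set1"
  ls -1d $2 | sort > "$set2"

  # -23 suppresses lines unique to set2 and lines common to both,
  # leaving only the names that appear in set1 alone.
  comm -23 "$set1" "$set2"
)

# Usage, from a directory of mixed files:
#   setdiff_comm '*' '*.h *.c'
```

Because comm compares whole lines, this also avoids the substring matching that a plain fgrep pattern list can run into.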