All the Perl that's Practical to Extract and Report

Mark Leighton Fisher (4252)

http://mark-fisher.home.mindspring.com/

I am a Systems Engineer at Regenstrief Institute [regenstrief.org]. I also own Fisher's Creek Consulting [comcast.net].
Friday February 17, 2006
12:19 PM

setdiff - Difference of Two Sets of Filenames

[ #28717 ]

Using "some filenames except those that match this pattern" is a multi-step process in common shells. You have to generate the two lists of filenames, put those lists into separate files, then run "fgrep -v -f set2 set1" to get the list of files in the first set that are not in the second set.
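Spelled out by hand, that process might look like the sketch below. The scratch directory and its four sample files are made up for illustration, and grep -F is used as the POSIX spelling of fgrep.

```shell
#!/bin/sh
# Hand-rolled "all files except C sources and headers" in a scratch directory.
LC_ALL=C; export LC_ALL    # keep the ordering of ls output predictable

# Hypothetical sample directory, created only for this demonstration.
demo=$(mktemp -d "${TMPDIR:-/tmp}/setdiff-demo.XXXXXX")
touch "$demo/main.c" "$demo/util.h" "$demo/Makefile" "$demo/README"

set1=$(mktemp "${TMPDIR:-/tmp}/set1.XXXXXX")
set2=$(mktemp "${TMPDIR:-/tmp}/set2.XXXXXX")

# Steps 1 and 2: generate the two lists and put them into separate files.
(cd "$demo" && ls -1 *) > "$set1"
(cd "$demo" && ls -1 *.h *.c) > "$set2"

# Step 3: print the names in the first list that match nothing in the second.
result=$(grep -F -v -f "$set2" "$set1")
echo "$result"

rm -rf "$demo" "$set1" "$set2"
```

Three commands plus the temporary-file bookkeeping is a lot of typing for one listing, which is the motivation for wrapping it in a script.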

setdiff is a small shell program for obtaining the difference of two sets of filenames. Invoke it as:

    setdiff FIRSTSET SECONDSET

where FIRSTSET is a quoted glob for the first set of filenames, and SECONDSET is a quoted glob for the second set of filenames. (You can also use actual filename lists in place of the quoted globs.) For example, to get all files that are not C source or header files in your current directory, you would use:

    setdiff '*' '*.h *.c'

which would print all filenames not ending in .h or .c to standard output.

A slightly more complicated example is finding out what files are not source code files in a project that is a mixture of Perl, C, and Java:

    setdiff '*' '*.h *.c *.pl *.pm *.java'

A final example finds the XML-related files (*.xml, *.xsd, etc.) that are not XSL files, mixed in with a bunch of source code files:

    setdiff '/home/mycyc-0.22/*.x*' '/home/mycyc-0.22/*.xsl'

Here is the code:

#!/usr/bin/sh
# Output difference between two sets of filenames,
# i.e. the set difference of the two filename sets.
# Names are assumed to be canonicalized already.
#
# This is the relative complement of B relative to A,
# also known as the set theoretic difference.
# Examples:
#   { 1, 2, 4 } - { 1, 2, 5 } = { 4 }
#   { 1, 2, 5 } - { 1, 2, 4 } = { 5 }

# check arguments
if [ "$1x" = "x" -o "$2x" = "x" ]; then
  echo usage: setdiff FILESETEXPR1 FILESETEXPR2
  exit 1
fi

# get a temporary filename for set #1
set1=`mktemp -t`
if [ "${set1}x" = "x" ]; then
  echo "can't get temporary filename for set1"
  exit 1
fi

# get a temporary filename for set #2
set2=`mktemp -t`
if [ "${set2}x" = "x" ]; then
  echo "can't get temporary filename for set2"
  exit 1
fi

# get the sets into temporary files
ls -1 $1 > $set1
ls -1 $2 > $set2

# compute all elements of set #1 not in set #2
# (-x matches whole lines, so "a.c" in set #2 does not also exclude "a.c.bak")
fgrep -v -x -f $set2 $set1

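By the way, ending setdiff with "fgrep -f $set1 $set2" instead prints the names that appear in both sets. I had called that variant setunion, but what it computes is the intersection of the two sets, so setintersect is the more accurate name.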
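To make the distinction concrete, here is a small sketch that runs both operations on two short lists. The filenames are invented for the example; "sort -u" is one common way to get a true union and is my substitution, not part of the original script.

```shell
#!/bin/sh
# Intersection versus union of two filename lists (sample names are made up).
LC_ALL=C; export LC_ALL    # predictable sort order

set1=$(mktemp "${TMPDIR:-/tmp}/set1.XXXXXX")
set2=$(mktemp "${TMPDIR:-/tmp}/set2.XXXXXX")
printf '%s\n' a.c b.h notes.txt > "$set1"
printf '%s\n' b.h c.pl > "$set2"

# Names present in both lists. grep -F is the POSIX spelling of fgrep,
# and -x restricts matches to whole lines.
both=$(grep -F -x -f "$set1" "$set2")

# Names present in either list, with duplicates collapsed.
either=$(sort -u "$set1" "$set2")

echo "in both lists:"
echo "$both"
echo "in either list:"
echo "$either"

rm -f "$set1" "$set2"
```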

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • setdiff '*' '*.h *.c'
    ls !(*.[ch])
  • BTW, you probably also want to be using comm(1) [freebsd.org] instead of fgrep.

    comm -23 $set1 $set2

    Also, you should probably add a line to delete those temp files:

    trap "rm -f $set1 $set2" EXIT HUP INT QUIT TERM

    That way, they get deleted on exit, or if some common signal gets delivered.

    -Dom
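Putting Dom's two suggestions together, a comm-based version could look roughly like the following. This is a sketch under some assumptions rather than a drop-in replacement: the sample directory and files are hypothetical, the lists are piped through sort because comm expects sorted input, and the signal handling is split into two traps so the cleanup runs only once.

```shell
#!/bin/sh
# Set difference with comm(1) and automatic cleanup of temporary files.
LC_ALL=C; export LC_ALL    # sort and comm must agree on collation order

# Hypothetical sample directory, created only for this demonstration.
demo=$(mktemp -d "${TMPDIR:-/tmp}/setdiff-demo.XXXXXX")
set1=$(mktemp "${TMPDIR:-/tmp}/set1.XXXXXX")
set2=$(mktemp "${TMPDIR:-/tmp}/set2.XXXXXX")

# Remove the temporary files when the script exits. On common signals,
# exit with an error status, which in turn fires the EXIT trap.
trap 'rm -rf "$demo" "$set1" "$set2"' EXIT
trap 'exit 1' HUP INT QUIT TERM

touch "$demo/main.c" "$demo/util.h" "$demo/Makefile" "$demo/README"

# comm requires both inputs to be sorted.
(cd "$demo" && ls -1 *) | sort > "$set1"
(cd "$demo" && ls -1 *.h *.c) | sort > "$set2"

# -2 suppresses lines unique to set2, -3 suppresses lines common to both,
# leaving only the names unique to set1.
result=$(comm -23 "$set1" "$set2")
echo "$result"
```

Because comm compares whole lines, this form also avoids the substring matches that plain fgrep -f can produce.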