
richardc (1662)


Journal of richardc (1662)

Friday July 19, 2002
12:19 AM

The curious case of File::Find

[ #6467 ]
So there was a technical meeting last night, earlier today, something like that. Leon gave a short talk about some modules he particularly liked and some he didn't. On his naughty list was File::Find.

Personally, I don't have a problem with File::Find, but I know from my time on irc that people whine about it a lot. My hunch as to why people don't like it is that it makes you use callbacks, which can confuse people. To add insult to confusion, once you're in the callback you have to gyrate oddly with package variables.

That started me wondering: if so many people don't like the thing, why isn't there an alternative? Of course this took me nowhere, so I just wrote it off as insufficient JFDI in the world.

So now, until I either crack it or get bored, I'm trying to come up with a better way to do what File::Find gives you, and it's not proving terribly easy.

What I'm currently thinking of is something like:
my @found = NewFind->name( '*.mp3' )->type( 'file' )->size( '<1000K' )->find( '.' ); # finds all smallish mp3s

Each method returns an object apart from the find method, which returns a list of things that matched your specification. An or-like operation could look like this:
my $f = NewFind->type( 'file' )->or( NewFind->name( '*.mp3' ), NewFind->name( '*.ogg' ) ); # search for oggs and mp3s
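
A minimal sketch of how the chaining above could be wired up, purely as an assumption about one possible implementation. NewFind is the placeholder name from this post, not a real module; the rule storage, the glob handling (only * and ? are understood) and the use of File::Find underneath are all made up for illustration. Only name, type and find are shown.

```perl
use strict;
use warnings;

package NewFind;    # hypothetical: the placeholder name from the post
use File::Find ();
use File::Basename ();

sub new { my $class = shift; return bless { rules => [] }, ref($class) || $class }

# Rule methods may be called on the class or on an object,
# so NewFind->name(...) works without an explicit new().
sub _self { my $self = shift; return ref $self ? $self : $self->new }

sub _add {
    my ($self, $rule) = @_;
    $self = $self->_self;
    push @{ $self->{rules} }, $rule;
    return $self;    # returning the object is what makes chaining possible
}

# Each rule is called as $rule->($basename, $path).
sub name {
    my ($self, $glob) = @_;
    my $re = join '', map { $_ eq '*' ? '.*' : $_ eq '?' ? '.' : quotemeta($_) } split //, $glob;
    return $self->_add( sub { $_[0] =~ /\A$re\z/ } );
}

sub type {
    my ($self, $type) = @_;
    my %test = ( file => sub { -f $_[1] }, directory => sub { -d $_[1] } );
    my $check = $test{$type} or die "unknown type '$type'";
    return $self->_add($check);
}

# find is the one method that returns a list instead of the object.
sub find {
    my ($self, @dirs) = @_;
    $self = $self->_self;
    my @found;
    File::Find::find( { no_chdir => 1, wanted => sub {
        my $path = $File::Find::name;
        my $base = File::Basename::basename($path);
        for my $rule (@{ $self->{rules} }) { return unless $rule->($base, $path) }
        push @found, $path;
    } }, @dirs );
    return @found;
}

package main;
use File::Temp qw(tempdir);

my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(a.mp3 b.ogg c.txt)) {
    open my $fh, '>', "$dir/$name" or die "cannot create $dir/$name: $!";
    close $fh;
}
mkdir "$dir/d.mp3" or die "cannot mkdir: $!";    # matches the name rule but not the type rule

my @found = NewFind->name('*.mp3')->type('file')->find($dir);
print "$_\n" for @found;
```

Only a.mp3 should come back here: the d.mp3 directory passes the name rule but is rejected by type('file').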

How does that hit people - rampant wheel reinvention, or a good start? Comments and alternative api suggestions welcome, hopefully we can get to something that will suit how people want to find files.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Looks good. Grouping in the search clause might be awkward syntactically, but for fairly simple stuff, I like it. You might also offer a way to pass a code block that returns a boolean for custom stuff. So, adding to your example, try looking for smallish mp3s that have some specific ID3 tag:

    my @found = NewFind->name( '*.mp3' )->type( 'file' )->size( '<1000K' )->custom( sub { for id3 tags here... } )->find( '.' );

    "custom" is probably the wrong name for it, but (hope

  • Random thoughts :

    NewFind->file() instead of ->type('file')
    NewFind->name( '*.mp3', '*.ogg' ) to get a shorter 'or' condition (in this case, can be written as NewFind->name( qr/\.(mp3|ogg)$/ ))
    Provide an ->exec( \&command ) hook, similar to the -exec option to find(1) : i.e., gets the pathname as its only parameter, returns true or false.
    Think about -prune and finddepth.
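
A rough sketch of the two suggestions above, the multi-pattern name and the exec hook, written as plain closures rather than as NewFind methods. The helper globs_to_regex and the sample paths are hypothetical, and the glob handling only understands * and ?.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: fold several shell-style globs into one anchored
# regex, so name('*.mp3', '*.ogg') becomes a single "or" test.
sub globs_to_regex {
    my @alternatives;
    for my $glob (@_) {
        push @alternatives, join '',
            map { $_ eq '*' ? '.*' : $_ eq '?' ? '.' : quotemeta($_) } split //, $glob;
    }
    my $joined = join '|', @alternatives;
    return qr/\A(?:$joined)\z/;
}

my $name_re = globs_to_regex('*.mp3', '*.ogg');

# An exec-style hook: gets the pathname as its only parameter, returns true or false.
my $not_hidden = sub { my $path = shift; return basename($path) !~ /\A\./ };

my @paths = qw(music/a.mp3 music/b.ogg music/.c.ogg music/d.txt music/e.mp3.bak);
my @found = grep { basename($_) =~ $name_re && $not_hidden->($_) } @paths;
print "@found\n";    # prints "music/a.mp3 music/b.ogg"
```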

    • NewFind->file() instead of ->type('file')

      Yes, I was being very literal in a transliteration of a find(1) example, apart from making it longer of course.

      NewFind->name( '*.mp3', '*.ogg' ) to get a shorter 'or' condition (in this case, can be written as NewFind->name( qr/\.(mp3|ogg)$/ ))

      I like both of those.

      I still think there's need for a form of or. I just can't think of a good example right now.

      • my $finder = NewFind->or(
          NewFind->name( '*.pl' ),
          NewFind->exec( sub {
            my $file = shift; my $fh;
            if (open $fh, '<', $file) {
              my $shebang = <$fh>;
              close $fh;
              # true when the first line is a perl shebang
              return defined $shebang && $shebang =~ /^#!.*\bperl/;
            }
            return 0;
          } ),
        );
      • You still need an explicit or when your conditions are not on the same variable. The form shown here works for var == val1 or val2 but not var1 == val1 or var2 == val2. For example, file is greater than 500M or older than 3 days.

        J. David works really hard, has a passion for writing good software, and knows many of the world's best Perl programmers
        • I'm sorry, I don't follow. var == val1 or val2 seems to be like name('*.mp3', '*.ogg') and var1 == val1 or var2 == val2 is:
          # files greater than 500M or older than 3 days
          F->or( F->size( '>500M' ),
                 F->age ( '> 3 days' ) );

          As in rafael's "Good example for 'or'" post.

          • Yes, you followed what I was saying. I was giving an example where you have to have an explicit F->or method.

    • My thoughts about finddepth are to ignore it, as it confuses me.

      Can I get a quick show of hands as to whether people will really miss this? If so then I'll find a way to make it work.

  • Interface (Score:2, Informative)

    Yeah, I thought about doing a "nice" version of File::Find myself. I wanted to tie in a few things as well - the ability to get structured data as well as lists from it, and to cache data.

    I'm not sure I like the stream-y interface you've got here - powerful, but violates the KISS principle which makes File::Find such a pain in the arse to use at the moment. I'd pictured more of a hash-based interface, but I hadn't thought of options so much (but I guess they could be done either by regexps or arrays).


    • I'm not sure I like the stream-y interface you've got here - powerful,

      I wasn't either, hence the posting, but then I saw Rafael's beautiful "Good example for 'or'" for which powerful is certainly one of the words I'd use.

      By stream-y I assume you mean the chaining of method calls? If you don't like it you can always do it in longhand:

      my $f = NewFind->new();
      $f->name( '*.mp3', '*.ogg' );
      $f->size( '<10000' );
      my @potayto = $f->find('.');

      my @potahto = NewFind->name( '*.mp3', '*.ogg' )

      • Whenever I see method chaining, I take the opportunity to point out Robin Houston's Want module. Take a look. Maybe you could use it.
        • It's an interesting module, sure enough. I don't really imagine needing to bring the big guns in.

          My current plan is to call the module File::Find::Ruleset. This seems to do some of the hard explaining for me: there are two types of methods, those that add rules to the ruleset, and those that ask questions of it.

          The methods that add new rules (name, size, exec, or) return the ruleset object, which makes chaining easier. Those that ask questions (find, find_as_superbly_complex_hash) will return what is

      • The notation of chaining method calls like that is an unfamiliar one to me at least (see, told you I was crap), and probably to the target audience of the module (viz. those who find all those icky callbacks in File::Find too tricky).

        As for the tree, I envisaged some kind of structure depending on what parameters you pass to the find involving some quantities of stat()ing etc., thusly:

        $VAR1 = {
              name => '.',
              path => '.',
              abspath => '/h

    • It was almost the same thing I came up with when thinking about it.

      I'd like to see something like this:

      my @files = find('/tmp', { maxdepth => 10, mindepth => 5, name => qr/\.*\.c$/ });

      (Which I btw already have working in a small example I hacked together.) I really like your idea of using arrays for alternation, but how do you decide whether to AND or OR? (AND doesn't make much sense in your example though).

      Instead of returning the files I'd also consider an 'exec' like option that took a s
      • I've made a quick implementation of what I'd like to see File::Find offer instead. I think it might be a good idea to do a real print function and instead call the one I've used 'return'.

        It is of course not nearly done, but if anyone feels they like the concept and wants to use it, please feel free to do so.

        package Find;

        use strict;
        require Exporter;

        our @ISA = qw(Exporter);
        our @EXPORT = qw(find);
        our @EXPORT_OK = qw(find);

        our $VERSION = '0.1';

        sub find {
      • my @files = find('/tmp', { maxdepth => 10, mindepth => 5, name => qr/\.*\.c$/ });

        Warning: hashes do not have a predictable order. Evaluating the rules in a known order can have a huge effect on efficiency. Please look at rafael's "Good example for 'or'" comment: if the exec happened first, you'd really feel the hit of losing the short-circuiting that happens at name('*.pl').
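
To make the ordering point concrete, here is a small sketch that counts how often the expensive alternative runs under each order. Everything in it is an assumption for illustration: the rule table, the count_exec_calls helper and the sample filenames are invented, and the exec rule is a stand-in that always fails instead of actually opening a file.

```perl
use strict;
use warnings;

my $expensive_calls = 0;
my %test = (
    name => sub { $_[0] =~ /\.pl\z/ },                # cheap: looks at the name only
    exec => sub { $expensive_calls++; return 0 },     # stand-in for opening the file
);

# The alternatives of an "or" are kept as an ordered list of rule names,
# not as hash keys, so the evaluation order is under our control.
sub count_exec_calls {
    my ($order, @paths) = @_;
    $expensive_calls = 0;
    for my $path (@paths) {
        for my $rule (@$order) {
            last if $test{$rule}->($path);    # first alternative that matches short-circuits
        }
    }
    return $expensive_calls;
}

my @paths = qw(a.pl b.pl c.pl d.txt);
my $cheap_first     = count_exec_calls( [qw(name exec)], @paths );
my $expensive_first = count_exec_calls( [qw(exec name)], @paths );
print "name first: $cheap_first exec calls; exec first: $expensive_first exec calls\n";
# prints "name first: 1 exec calls; exec first: 4 exec calls"
```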

        (Which I btw already have working in a small example I hacked together.)

        Please, don't let me stop you releasing your

        • I thunk it out a bit more, and now it's possible to do this:

          # extract from the test suite
          # procedural form of the prune CVS demo
          use File::Find::Ruleset 'find_ruleset';
          $f = find_ruleset(or => [ find_ruleset( directory =>
                                                  name      => 'CVS',

  • MJD presents a File::Find replacement in his Programming with Iterators and Generators tutorial (at least he did when he gave his practice session). Perhaps his approach is worth investigating as well.
  • You need both OR and AND for sufficiently complex queries...

        (A or B) and ( (C and D) OR E )

        You could use chaining, but you'd need
        two objects in the above, and the logic
        might be clearer if you just had an AND

    you may want to include both OR and AND and NOT as well
    because some folks will find it easier to compose

        (A or B) and NOT( C or D or E)
        (A or B) and ( NOT C and NOT D and NOT E)

    ...especially when C, D, E may be mod
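
One way the AND, OR and NOT above could compose, sketched with rules as plain code refs. The names rule_and, rule_or and rule_not, and the five sample rules standing in for A to E, are hypothetical. The two forms at the end are the two spellings from the comment, which should select the same files.

```perl
use strict;
use warnings;

# Each rule is a code ref that takes a filename and returns true or false.
sub rule_and {
    my @rules = @_;
    return sub { my $f = shift; for my $r (@rules) { return 0 unless $r->($f) } return 1 };
}
sub rule_or {
    my @rules = @_;
    return sub { my $f = shift; for my $r (@rules) { return 1 if $r->($f) } return 0 };
}
sub rule_not {
    my $rule = shift;
    return sub { return $rule->( $_[0] ) ? 0 : 1 };
}

# Stand-ins for A .. E
my $A = sub { $_[0] =~ /\.mp3\z/ };
my $B = sub { $_[0] =~ /\.ogg\z/ };
my $C = sub { $_[0] =~ /\Atmp/ };
my $D = sub { $_[0] =~ /draft/ };
my $E = sub { $_[0] =~ /\A\./ };

# (A or B) and NOT( C or D or E )
my $form1 = rule_and( rule_or( $A, $B ), rule_not( rule_or( $C, $D, $E ) ) );
# (A or B) and ( NOT C and NOT D and NOT E )
my $form2 = rule_and( rule_or( $A, $B ), rule_and( map { rule_not($_) } $C, $D, $E ) );

my @names = qw(song.mp3 tmp.mp3 draft-take.ogg .hidden.ogg live.ogg notes.txt);
my @kept1 = grep { $form1->($_) } @names;
my @kept2 = grep { $form2->($_) } @names;
print "@kept1\n";    # prints "song.mp3 live.ogg"
```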
  • If there is a link to a directory, File::Find::Rule gives us the actual names and not the link names. Is there any way to get the links as well as the real names for directories? We do get the links for file names, of course.
    Ex: abc -> cde in directory /test/, and there is a file /test/cde/2222.
    When we do File::Find::Rule->new()->name(qr/\d\d\d\d/)->in("/test/"), we get only /test/cde/2222.
    Is there any way we can get /test/abc/2222 as well?