All the Perl that's Practical to Extract and Report

  • We've been using swish-e (http://www.swish-e.org/ [swish-e.org]). It's easy to set up and install, and it's fast.
    • It's hard to tell, but it looks like swish-e is only set up to index files. I don't have files!
      • The last time I used swish-e you could call an external program to 'fake' files. Something like swish-e -S prog.
      • Last time I used Swish-E I was indexing files, but they included things like MS Word and PDF documents so we used an indexing script to filter the files through X_to_text programs and feed the results to the indexer. There's no reason why your indexing script couldn't get its data from DBI or similar rather than files.

        The other thing I liked about the Swish-E indexing process was that you could feed arbitrary metadata fields to the indexer. This allowed you to get things like author name, publication da

        • Right, Xapian allows you to store arbitrary metadata and works fine under incremental indexing, which I consider key.
    • Is Swish-E working for you with Unicode? We've found it unsatisfactory once non-ASCII characters are used in the data being searched (and in the search terms and results).

      Smylers

  • We liked what Lucene had to offer, but Plucene left much to be desired. So we ended up creating a Java servlet so we could use Lucene proper as a web service (lucene-ws.net [lucene-ws.net]).

    There's a Perl client in the SVN repository, though it requires an as-yet-unreleased version of WWW::OpenSearch. Indexing is a bit slow mostly due to the HTTP overhead, but searching is pretty slick and it now includes search suggestions.

    We'd like to replace it, eventually, with something more native to Perl. KinoSearch [rectangular.com] is relatively

  • HyperEstraier [sf.net] with a little help from Search::Estraier [cpan.org] fits my needs quite nicely.

    I started using search engines with swish-e (which I still use quite a bit), but there is also another very interesting project, KinoSearch [cpan.org], which looks very promising if full control from Perl is required (it somewhat reminds me of WAIT, which powered CPAN).
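
The swish-e sub-thread above suggests that an indexing script run via swish-e -S prog could pull its data from a database rather than from files. Here is a minimal sketch of that idea, assuming (as I understand the swish-e documentation) that the external program writes each document to standard output as a few headers (Path-Name, Content-Length in bytes, optionally Last-Mtime), a blank line, and then the content. The `articles` table, its columns, and the `article/<id>` path scheme are hypothetical, and an in-memory SQLite database stands in for whatever DBI would connect to in a Perl version.

```python
import sqlite3
import sys


def format_record(path, mtime, text):
    """Return one document as bytes: headers, a blank line, then the body."""
    body = text.encode("utf-8")
    header = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(body)}\n"  # length in bytes, not characters
        f"Last-Mtime: {int(mtime)}\n"
        "\n"
    ).encode("utf-8")
    return header + body


def dump_rows(conn, out):
    """Write every row of the hypothetical `articles` table as one record."""
    rows = conn.execute("SELECT id, updated, body FROM articles ORDER BY id")
    for row_id, updated, body in rows:
        # The "path" is only a label for search results; no file has to exist.
        out.write(format_record(f"article/{row_id}", updated, body))


if __name__ == "__main__":
    # Stand-in data; a real script would connect to the existing database.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, updated INTEGER, body TEXT)"
    )
    conn.execute(
        "INSERT INTO articles VALUES (1, 1150000000, 'Perl search engines compared')"
    )
    dump_rows(conn, sys.stdout.buffer)
```

A script like this would be named in the swish-e configuration or on the command line as the program to run. The byte-counted Content-Length is the detail most likely to go wrong once non-ASCII text is involved, which is why the body is encoded before its length is measured.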