All the Perl that's Practical to Extract and Report

  • We've been using swish-e (http://www.swish-e.org/). It's easy to set up and install, and it's fast.
    • It's hard to tell, but it looks like swish-e is only set up to index files. I don't have files!
      • by grantm (164) on 2006.05.02 17:17 (#47518)

        Last time I used Swish-E I was indexing files, but they included things like MS Word and PDF documents so we used an indexing script to filter the files through X_to_text programs and feed the results to the indexer. There's no reason why your indexing script couldn't get its data from DBI or similar rather than files.
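To make the "get its data from DBI or similar" idea concrete, here is a minimal sketch of an indexing script that reads rows from a database and writes them in the header-plus-body format that, as far as I recall, Swish-E's "prog" input method expects (Path-Name and Content-Length headers, a blank line, then the content). It is in Python with sqlite3 standing in for DBI; the `articles` table, its columns and the `article/<id>` path scheme are made up for illustration, so check the header names against the Swish-E docs before relying on them.

```python
import sqlite3


def prog_record(path, text, mtime):
    """Format one document as headers, a blank line, then the content."""
    body = text.encode("utf-8")
    header = (
        f"Path-Name: {path}\n"
        f"Content-Length: {len(body)}\n"  # length in bytes, not characters
        f"Last-Mtime: {mtime}\n"
        "\n"
    )
    return header.encode("utf-8") + body


def feed(conn, out):
    """Write every row of the (hypothetical) articles table to a binary stream."""
    # Assumed schema: articles(id INTEGER, body TEXT, updated INTEGER epoch seconds)
    rows = conn.execute("SELECT id, body, updated FROM articles ORDER BY id")
    for doc_id, body, updated in rows:
        out.write(prog_record(f"article/{doc_id}", body, updated))
```

Calling `feed(conn, sys.stdout.buffer)` from a small script would stream the records to whatever runs it; Swish-E would then be pointed at that script with its prog input method (`-S prog`), and the exact invocation is best taken from the Swish-E docs.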

        The other thing I liked about the Swish-E indexing process was that you could feed arbitrary metadata fields to the indexer. This allowed you to get things like author name, publication date, title (and in our case all sorts of business-unit meta-fluff) directly in your search results so you didn't have to go back to the source documents when displaying a search results screen. Do you know if Xapian does that too?
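One way to feed metadata like that, sketched under assumptions: wrap each document in minimal HTML and put every field in a `<meta>` tag. My understanding is that Swish-E only picks these up when the names are declared in its config (the `MetaNames` and `PropertyNames` directives) and the document goes through the HTML parser, so treat this as an illustration and not a recipe. The field names and the `html_with_meta` helper are hypothetical.

```python
from html import escape


def html_with_meta(title, body, meta):
    """Wrap plain text in minimal HTML, with one <meta> tag per metadata field."""
    tags = "\n".join(
        f'<meta name="{escape(name, quote=True)}" '
        f'content="{escape(value, quote=True)}">'
        for name, value in sorted(meta.items())
    )
    return (
        "<html><head>\n"
        f"<title>{escape(title)}</title>\n"
        f"{tags}\n"
        "</head><body>\n"
        f"<pre>{escape(body)}</pre>\n"
        "</body></html>\n"
    )
```

For example, `html_with_meta("Q2 report", text, {"author": "grantm", "pubdate": "2006-05-02"})` yields a document whose author and publication date travel with it into the indexer.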

        The downside of the version of Swish-E that I was using is that it didn't support incremental indexing. You created an index by feeding in a bunch of documents. If you later wanted to add a document then you'd 'simply' recreate the index by feeding in all the original documents plus the new one. I know development versions of Swish-E claim to support incremental indexing but I don't know if that's in a stable release and I've never actually used it. Presumably Xapian supports this.
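To show why that matters, here is a toy in-memory inverted index contrasting the two approaches. It is not how Swish-E or Xapian work internally; `ToyIndex` and `rebuild` are invented names, and the point is only that a full rebuild touches every document again while an incremental add touches just the new one.

```python
from collections import defaultdict


class ToyIndex:
    """Word -> set of document ids; nothing like a real on-disk index."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Incremental step: only the new document is processed.
        for word in text.lower().split():
            self.postings[word].add(doc_id)

    def search(self, word):
        return sorted(self.postings.get(word.lower(), ()))


def rebuild(docs):
    """Full rebuild: every document is fed through the indexer again."""
    index = ToyIndex()
    for doc_id, text in docs.items():
        index.add(doc_id, text)
    return index
```

Both routes give the same search results; the rebuild just costs time proportional to the whole collection every time one document is added.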

        • Right, Xapian allows you to store arbitrary metadata and works fine with incremental indexing, which I consider key.