All the Perl that's Practical to Extract and Report

use Perl

Mark Leighton Fisher (4252)

http://mark-fisher.home.mindspring.com/

I am a Systems Engineer at Regenstrief Institute [regenstrief.org]. I also own Fisher's Creek Consulting [comcast.net].
Thursday June 17, 2010
05:58 AM

Stupid Lucene Tricks: Search case-insensitive, Retrieve ca

[ #40402 ]

Sometimes when you build an index in Lucene, you want people to be able to search without worrying about case (case-insensitive search), but you want the display to show the original mixed-case data (case-sensitive display). The trick is to split each Lucene field into two versions:

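  1. A case-insensitive field that is indexed but not stored (Lucene.Net.Documents.Field.Index.ANALYZED and Lucene.Net.Documents.Field.Store.NO). The case folding itself is done by the analyzer, so pick one that lowercases its tokens.
  2. A case-sensitive field that is stored for display, preferably with a field name similar to that of its case-insensitive cousin, like "Display_Title" for "Title" (Lucene.Net.Documents.Field.Index.NOT_ANALYZED and Lucene.Net.Documents.Field.Store.YES). Note that NOT_ANALYZED still indexes the value as a single untokenized term; Lucene.Net.Documents.Field.Index.NO is the setting for a field that is stored but not indexed at all.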

Storing only the case-sensitive version keeps the extra index storage down: I have seen around a 40% increase in index size with this trick, as compared to both storing and indexing one field.
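
To make the two-field split concrete, here is a minimal, library-free sketch of the idea in Python. It is a toy model, not Lucene's or Lucene.Net's API: the TinyIndex class, its postings and stored attributes, and the whitespace-plus-lowercase "analyzer" are all assumptions made up for illustration. The postings map plays the role of the indexed-but-not-stored "Title" field, and the stored map plays the role of the stored "Display_Title" field.

```python
from collections import defaultdict


class TinyIndex:
    """Toy stand-in for a search index (hypothetical; not Lucene's API)."""

    def __init__(self):
        # Search side: lowercased term -> set of doc ids (indexed, not stored).
        self.postings = defaultdict(set)
        # Display side: doc id -> original mixed-case text (stored, not searched).
        self.stored = {}

    def add(self, title):
        """Index a title for case-insensitive search and keep the original for display."""
        doc_id = len(self.stored)
        for term in title.lower().split():  # crude analyzer: lowercase + whitespace split
            self.postings[term].add(doc_id)
        self.stored[doc_id] = title
        return doc_id

    def search(self, query):
        """Return the original titles of documents containing every query term."""
        terms = query.lower().split()  # the query goes through the same analyzer
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.stored[doc_id] for doc_id in sorted(hits)]


index = TinyIndex()
index.add("Stupid Lucene Tricks")
index.add("use Perl Journal")
print(index.search("LUCENE tricks"))
```

The query is lowercased the same way as the indexed terms, which is why the search side never has to care about case, while the result comes back from the stored side with its original capitalization intact.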
