All the Perl that's Practical to Extract and Report

use Perl

Mark Leighton Fisher (4252)

http://mark-fisher.home.mindspring.com/

I am a Systems Engineer at Regenstrief Institute [regenstrief.org]. I also own Fisher's Creek Consulting [comcast.net].
Thursday July 29, 2010
11:13 AM

Stupid Lucene Tricks: Document Frequencies and NOT

[ #40471 ]
  1. You can get the document frequency of a term (i.e. how many documents contain that term) through Lucene.Net.Index.IndexReader.DocFreq(t As Term) As Integer.
  2. You can get the IndexReader behind a Lucene.Net.Search.IndexSearcher through IndexSearcher.GetIndexReader().
  3. If you want to display the document frequencies for the individual keywords of a search, and one of the keywords is a NOT term (like -antibiotic in antimicrobial -antibiotic), you cannot use DocFreq() directly: it counts the documents that contain the term, while the NOT term matches the documents that do not. In that case, the document frequency can be computed as:

          DOCFREQ = count of all documents - DocFreq(TERM_NO_NOT)

    as in:

          DOCFREQ = 60227 - DocFreq(New Term("all", "antibiotic"))

    where 60227 is the count of all documents in the index, the NOT piece was -antibiotic, and all is the Lucene document field in question.
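The three points above can be pulled together in a small VB.NET sketch. It is a sketch under assumptions, not tested against any particular index: KeywordDocFreq and the DocFreqSketch module are hypothetical names, the code assumes a Lucene.NET 2.x-style API, and it takes IndexReader.NumDocs() as the "count of all documents". NumDocs() leaves out deleted documents, while DocFreq() may still count them until segments are merged, so the result can be slightly off on an index with pending deletes.

```vbnet
Imports Lucene.Net.Index
Imports Lucene.Net.Search

Module DocFreqSketch
    ' Hypothetical helper: document frequency for one keyword of a search.
    ' A leading "-" marks a NOT keyword, as in "antimicrobial -antibiotic".
    Function KeywordDocFreq(ByVal searcher As IndexSearcher,
                            ByVal field As String,
                            ByVal keyword As String) As Integer
        ' Point 2: get the reader behind the searcher.
        Dim reader As IndexReader = searcher.GetIndexReader()

        If keyword.StartsWith("-") Then
            ' Point 3: documents NOT containing the term =
            ' count of all documents - documents containing the term.
            Dim bare As String = keyword.Substring(1)
            Return reader.NumDocs() - reader.DocFreq(New Term(field, bare))
        End If

        ' Point 1: plain keyword, DocFreq() answers directly.
        Return reader.DocFreq(New Term(field, keyword))
    End Function
End Module
```

With the example above, KeywordDocFreq(searcher, "all", "-antibiotic") would work out to 60227 - DocFreq(New Term("all", "antibiotic")). The keyword text has to match the term as it was indexed (for instance lowercased, if the analyzer lowercases), since DocFreq() does no analysis of its own.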

(Ob. Perl: Although PLucene is now 5 years out of date, Perlesque should eventually let you get at Lucene.NET via a strongly-typed Perl 6.)
