TeeJay (2309)

http://www.aarontrevena.co.uk/

Working in Truro
Graduate with BSc (Hons) in Computer Systems and Networks
pm : london.pm, bath.pm, devoncornwall.pm
lug : Devon & Cornwall LUG
CPAN : TEEJAY [cpan.org]
irc : TeeJay
skype : hashbangperl
livejournal : hashbangperl [livejournal.com]
flickr :hashbangperl [flickr.com]

Journal of TeeJay (2309)

Tuesday April 20, 2004
08:57 AM

object oriented fulltext indexing and searching

[ #18401 ]
I am having fun porting my Class::Indexed code to our Tangram-backed and very OO system at $work.

I have had moderate success making the superclass handle relationships and so on, and it now works well with 1 to (1 or 0) relationships.

The system lets you specify what is indexed for a particular class by giving an attribute name and a weighting for that attribute. It also lets you specify a relationship (along with a method to call instead of an accessor), or a lookup against the index database. Lookups are non-OO and are meant for orthogonal information, for speed and flexibility; they require the relevant information to be in the same database as the reverse index.
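As a rough illustration of the attribute-and-weighting idea, here is a minimal sketch. It is not the Class::Indexed or Tangram code: the `%index_spec` layout, the `score_words` helper and the plain-hash "object" are all hypothetical, made up for this example.

```perl
use strict;
use warnings;

# Hypothetical per-class index specification: attribute name => weighting.
my %index_spec = (
    title   => 3,
    summary => 2,
    body    => 1,
);

# Build weighted word scores for one object (a plain hash in this sketch).
sub score_words {
    my ($object, $spec) = @_;
    my %score;
    for my $attribute (keys %$spec) {
        my $text = $object->{$attribute};
        next unless defined $text;
        # Each occurrence of a word adds the weighting of its attribute.
        $score{lc $_} += $spec->{$attribute} for $text =~ /(\w+)/g;
    }
    return \%score;
}

my $scores = score_words(
    { title => 'Perl indexing', body => 'Indexing objects with Perl' },
    \%index_spec,
);
```

A word in the title counts three times as much as the same word in the body, so the weighting decides which attributes dominate the search results.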

On the whole I am rather pleased. I am working on fixing the gaps in our system and finishing the indexing class so that it can cope with 1 or more to many relationships, i.e. if an object on the many side is updated, it can find and update the word index for the objects related to it. So far this works for 1 to 1, 1 to 0 and many to 1.

I have also added some limiting to word scores (so that if a word occurs frequently in 1 place the score doesn't skew results), class-level stopwords (hopefully later I will be able to add object-level stopwords), and normalisation (so that all scores fit on a curve between 0 and 1).
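The three steps could be sketched roughly as follows. This is an assumption-laden illustration, not the real implementation: the `tidy_scores` name, the fixed cap value and the choice to normalise by dividing by the highest remaining score are all invented here, and the actual curve may well be different.

```perl
use strict;
use warnings;
use List::Util qw(max min);

# Hypothetical helper: drop stopwords, cap each score, then scale to 0..1.
sub tidy_scores {
    my ($scores, $stopwords, $cap) = @_;
    my %tidy;
    for my $word (keys %$scores) {
        next if $stopwords->{$word};                 # class-level stopwords
        $tidy{$word} = min($scores->{$word}, $cap);  # word score capping
    }
    my $highest = max(values %tidy);
    return {} unless $highest;                       # nothing left to normalise
    # Assumed normalisation: divide by the highest score (values are aliased).
    $_ /= $highest for values %tidy;
    return \%tidy;
}

my $tidy = tidy_scores(
    { perl => 40, index => 5, the => 90 },  # raw word scores
    { the => 1 },                           # stopword list for this class
    10,                                     # cap
);
```

With these inputs "the" is dropped, "perl" is capped from 40 to 10, and after scaling "perl" scores 1 and "index" scores 0.5.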

A lot of this should be back-ported into Class::Indexed some time soon, at least the class-level stopwords, normalisation and word score capping. The Tangram-related stuff is heavily dependent on our Tangram hacks and isn't much use elsewhere. Hopefully I can hack an alternative, more general solution into Class::Indexed, possibly subclassing for modules like Class::DBI.

edited:

Is there a better way of generating placeholders for DBI queries than:

my $placeholders = join (',' ,map('?',@array));

There has to be a nice way to do something that is so frequent. I was also hoping that there is a fast and elegant way to sum the contents of a hash or array somewhere in CPAN or Perl 6.
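For what it's worth, two idioms that may fit, shown as a sketch: the list repetition operator `x` for the placeholders, and `sum` from List::Util for the totals. The `word_index` table and its columns are hypothetical, used only to show where the placeholders would go.

```perl
use strict;
use warnings;
use List::Util qw(sum);

my @ids = (3, 5, 8);

# Repeat a one-element list once per array element, then join with commas.
my $placeholders = join ',', ('?') x @ids;

# Hypothetical query; @ids would be passed as the bind values to execute().
my $sql = "SELECT word, score FROM word_index WHERE object_id IN ($placeholders)";

# sum() works on any list, so hash values can be summed the same way.
my %score       = (perl => 4, index => 2);
my $array_total = sum(@ids);
my $hash_total  = sum(values %score);
```

Note that `sum` returns undef for an empty list, so `sum(0, @list)` is a safer form when the list might be empty.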
