All the Perl that's Practical to Extract and Report

TeeJay (2309)

TeeJay
  (email not shown publicly)
http://www.aarontrevena.co.uk/

Working in Truro
Graduate with BSc (Hons) in Computer Systems and Networks
pm : london.pm, bath.pm, devoncornwall.pm
lug : Devon & Cornwall LUG
CPAN : TEEJAY [cpan.org]
irc : TeeJay
skype : hashbangperl
livejournal : hashbangperl [livejournal.com]
flickr : hashbangperl [flickr.com]

Journal of TeeJay (2309)

Sunday June 01, 2003
03:21 PM

back to searching

[ #12547 ]
After getting distracted by the wonderful elegance and sheer funkiness of Hilbert's space-filling curve, I have poked about in my todo pile and released a documented (but apparently still hard to understand) Data::Iterator::EasyObj, and am now back to work on my restaurant search / portal site.

Hopefully I have about a week of evenings' worth of work left before I can put it on my server and start testing it in anger.

Right now I am playing with the searching, which, as anybody who reads my journal (in between the bitching about the Middle East, ASP and HTML::Template, the last of which hasn't bitten me recently) will know, I find ever so interesting.

Following links from that funky vector search article on perl.com, I have been reading through some of the research and plan to add a funky vector-based 'similar to this' search to the site.
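To make the 'similar to this' idea concrete, here is a minimal sketch of ranking documents by cosine similarity between sparse term vectors. It is written in Python purely for illustration and is not the perl.com article's code; the restaurant descriptions and the names `term_vector`, `cosine_similarity` and `similar_to` are all made up for the example.

```python
import math
from collections import Counter

def term_vector(text):
    """Build a sparse term-frequency vector (term -> count) from raw text."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)

def similar_to(doc_id, vectors, limit=5):
    """Rank every other document by similarity to the given one."""
    target = vectors[doc_id]
    scored = [(cosine_similarity(target, vec), other)
              for other, vec in vectors.items() if other != doc_id]
    scored.sort(reverse=True)
    return [(other, score) for score, other in scored[:limit]]

# Hypothetical restaurant descriptions, invented for illustration only.
DOCS = {
    "thai_house": "spicy thai noodles and green curry",
    "curry_corner": "indian curry and spicy tandoori",
    "pizza_place": "wood fired pizza and pasta",
}
VECTORS = {name: term_vector(text) for name, text in DOCS.items()}
```

Calling `similar_to("thai_house", VECTORS)` returns the other two entries ordered by how many weighted terms they share with the Thai one. With raw counts like these, common words such as "and" still contribute to the score, which is why term weighting matters.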

This means heavily customising the example code in the perl.com article and basically doing the extra stuff: local / global term weighting (already mostly done for the reverse index), and word finding and indexing (again mostly already done with the reverse index, but now with the added magic of Lingua::EN::Tagger, which is now available in version 0.02 and so clearly ready for production use).
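As a rough illustration of local / global weighting on top of a reverse index, the sketch below uses one common pairing: a log-dampened term frequency as the local weight and inverse document frequency as the global weight. That choice of formulas is an assumption, not necessarily what the site or the perl.com article uses, and the sample documents and function names are hypothetical. It is in Python for illustration only.

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    """Reverse index: term -> {doc_id: raw term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index

def weighted_vectors(index, doc_count):
    """Turn the index into per-document vectors of local * global weights."""
    vectors = defaultdict(dict)
    for term, postings in index.items():
        # Global weight (assumed formula): inverse document frequency.
        # A term that appears in every document gets weight 0.
        global_weight = math.log(doc_count / len(postings))
        for doc_id, count in postings.items():
            # Local weight (assumed formula): dampened term frequency.
            local_weight = 1.0 + math.log(count)
            vectors[doc_id][term] = local_weight * global_weight
    return vectors

# Hypothetical restaurant descriptions, invented for illustration only.
DOCS = {
    "thai_house": "spicy thai noodles and green curry",
    "curry_corner": "indian curry and spicy tandoori",
    "pizza_place": "wood fired pizza and pasta",
}
INDEX = build_index(DOCS)
WEIGHTS = weighted_vectors(INDEX, len(DOCS))
```

With these weights, a word found in every description (here "and") drops out entirely, while a word unique to one description (here "thai") counts for more than one shared between two of them (here "curry").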

I am also considering using Lingua::Stem, but I don't trust stemming of words to give the right results. The results seem a little over-zealous; hopefully it has a 'minimal' mode or option.
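For a sense of what a 'minimal' stemmer could look like, here is a deliberately conservative sketch that only strips simple plural endings. The rules are an assumption made up for this example; this is not how Lingua::Stem behaves, nor a mode it is known to offer, and it is in Python for illustration only.

```python
def light_stem(word):
    """Strip simple plural endings only; leave everything else untouched.

    Hypothetical rules, chosen to err on the side of under-stemming.
    """
    word = word.lower()
    if len(word) <= 3:
        return word  # too short to strip safely
    if word.endswith("ies"):
        return word[:-3] + "y"  # curries -> curry
    if word.endswith("s") and not word.endswith(("ss", "us")):
        return word[:-1]  # restaurants -> restaurant
    return word
```

The trade-off is that this misses plenty of legitimate conflations (for instance "dishes" becomes "dishe" and so never matches "dish"), but it will not merge unrelated words the way a more aggressive suffix stripper can.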

On a side note, I am fairly happy with my Debian installation apart from one small detail: I can't get graphviz to install. This is annoying, as graphviz is something I use a lot and I want to try out network *mumble* searching, which looks really interesting.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • When I posted on PerlMonks [perlmonks.org] about user experience with Lingua::Stem [cpan.org] (here [perlmonks.org]), there were some really good replies and links offered, including some which profiled differences in the incidence of stemming errors between stemming methods. There may be something of interest there for you too, including this link [rmit.edu.au], which references an article that compares stemmer performance. Notably, the results of that paper support the use of the Porter stemming technique, the same one implemented in the Lingua::Stem [cpan.org] module.