Hopefully I have about a week of evenings' worth of work left before I can put it on my server and start testing it in anger.
Right now I am playing with the searching, which (as anybody who reads my journal in between the bitching about the middle east, ASP and HTML::Template, the latter of which hasn't bitten me recently, will know) I find ever so interesting.
Following links from that funky vector search article on perl.com, I have been reading through some of the research and plan to add a funky vector-based 'similar to this' search to the site.
This means heavily customising the example code in the perl.com article and basically doing the extra stuff: local / global term weighting (already mostly done for the reverse index), and word finding and indexing (again mostly already done with the reverse index, but now with the added magic of Lingua::EN::Tagger, which is now available and in version 0.02, so clearly ready for production use).
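For the curious, here is a rough sketch of what local / global weighting plus a 'similar to this' lookup boils down to. It is in Python rather than Perl purely to keep it short, and it is not the code from the perl.com article: the function names are made up for illustration, and log term frequency (local) times log inverse document frequency (global) is just one common choice of weights, assumed here.

```python
import math
from collections import Counter


def tokenize(text):
    # Crude word finding: lowercase, split on whitespace, keep alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]


def build_vectors(docs):
    # docs maps a document name to its text.
    # Returns a sparse vector (term -> weight) for each document.
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)

    # Global statistic: how many documents contain each term.
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())

    vectors = {}
    for name, c in counts.items():
        vectors[name] = {
            # Local weight (1 + log tf) times global weight log(N / df).
            # A term that appears in every document gets weight 0.
            term: (1 + math.log(tf)) * math.log(n_docs / doc_freq[term])
            for term, tf in c.items()
        }
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_to(name, vectors):
    # Rank every other document by similarity to the named one, best first.
    target = vectors[name]
    scores = [(cosine(target, v), other)
              for other, v in vectors.items() if other != name]
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    docs = {
        "perl": "perl vector search engine",
        "search": "vector search with term weighting",
        "cats": "cats sleep all day",
    }
    for score, other in similar_to("perl", build_vectors(docs)):
        print(other, round(score, 3))
```

The vectors are kept sparse (a dict per document) because most terms never appear in most documents, which is the same reason the reverse index is handy for this in the first place.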
I am also considering using Lingua::Stem, but I don't trust stemming of words to give the right results. The results seem a little over-zealous; hopefully it has a 'minimal' mode or option.
On a side note - I am fairly happy with my Debian installation apart from one small detail - I can't get graphviz to install. This is annoying, as graphviz is something I use a lot and I want to try out network *mumble* searching, which looks really interesting.