
All the Perl that's Practical to Extract and Report

use Perl

pudge (1)

http://pudge.net/
AOL IM: Crimethnk

I run this joint, see?

Journal of pudge (1)

Thursday September 16, 2004
12:27 PM

Search Engines

[ #20903 ]

We are looking for a new search engine for Slashdot. Right now we use live searching in MySQL, which, well, sucks. I've been told to look into Plucene and Swish-e. Any other suggestions, or comments about those?

Plucene looks like a great, flexible system, though I am concerned about performance and scalability (those are concerns with any system, but Simon says Plucene is "much slower" than its cousin Lucene, and I can find no info on scalability). I am beginning to look into Swish-e now, and have no comments about it yet.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • I looked briefly at Plucene but never got very far and got discouraged by all the comments about slowness.

    I had heard of Swish-e but hadn't even considered it because I didn't think it could handle gigabyte collections of documents. After seeing Josh Rabinowitz's talk at YAPC, I decided to try it out, and we've been quite surprised at how fast it is. We're in the process of junking the Windows-based search software we were using (with a jerry-rigged Perl/MySQL system for splitting up searches and distributin
  • I know it's not Perl, but have you considered looking at Lucene [apache.org], together with some Inline-Java [cpan.org] glue? That way, you'd get the fairly nice Lucene architecture, and possibly better performance.

    -Dom

  • I can't comment on scalability for the likes of Slashdot, but Swish-e certainly worked well for the site we deployed it on. Its parsing is quite good (for HTML, XML, and whatever you can parse programmatically, so it can grab data from SQL) and the output is quite flexible (your choice of templating systems); it wasn't too tough to make very nice-looking output.

    We're not using SWISH::API under mod_perl, so I'm curious about whether it can stand up to Slashdot's load.
    --

    -DA [coder.com]

  • Pudge, whatever came of this? I'm curious whether you found a suitable one or are still looking. Have you looked at Texis' Webinator [thunderstone.com]?