sheriff_p's Journal http://use.perl.org/~sheriff_p/journal/ sheriff_p's use Perl Journal en-us use Perl; is Copyright 1998-2006, Chris Nandor. Stories, comments, journals, and other submissions posted on use Perl; are Copyright their respective owners. 2012-01-25T02:10:01+00:00 pudge pudge@perl.org Technology hourly 1 1970-01-01T00:00+00:00 sheriff_p's Journal http://use.perl.org/images/topics/useperl.gif http://use.perl.org/~sheriff_p/journal/ ORA's Safari Search sucks; the rant I just sent them... http://use.perl.org/~sheriff_p/journal/38600?from=rss <p>To whom it may concern,</p><p>Knowing that Safari has several books concerning Objective C, today I tried to find them using the search. After several tries, I gave up, used Google to find the books, and ended up typing in the author names.</p><p>This is obviously sub-optimal. Here are the three biggest problems with the search.</p><p>Problem the first: inappropriate stemming of search terms. If you're searching a corpus you don't know much about, then sure, some stemming is fine. If you're indexing technical books, then stemming 'Objective' to 'Object' is a poor choice. If you're going to insist on doing stemming, why not return results that contain the term unstemmed higher in the rankings?</p><p>There must be a huge archive of previous customer searches. Tokenize these into words, and create a stop list against common ones to avoid stemming on them.</p><p>Problem the second: not distinguishing between different result concepts. Again, if you're Google, you don't have much choice but to return URLs (although even they make a fair stab at returning different objects: videos, images, products, etc). If you're a book publisher, you have much more scope for returning dramatically more useful results.</p><p>Specify your use-cases here. The search form actually allows people to search individual result objects, ish. There's a drop-down option for Authors, Titles, etc. 
However, it's an untitled form element, and the selected option there is "Entire Site". I'd suggest that the user expectation there is that clicking on that is going to offer sub-site search specializations, rather than meta-data specializations. But this is a bit of a tangent:</p><p>As a member of the search public, what am I looking for? One of: whole books based on title; whole books based on category; whole books based on author; individual sections from books on topic. Returning these mixed together makes the results confusing, and largely irrelevant. Split them out. Let's see: First three categories that match your search (more on this in a minute); First three authors that match your search; First three titles that match your search; First three chapters that match your search. This covers all your bases, whatever your user was searching for.</p><p>Problem the third: not indexing categories. There's a category called "Objective C". It didn't come up when searching for Objective C. It's also not obviously navigable to from the Safari homepage without performing a different search first, and then clicking through the returned nav on the left (where it isn't highlighted, but other categories are). If I "Browse the Safari Library", I can't drill down. What kind of browsing is that?</p><p>People may well be searching for a category. Why not index the category name itself, rather than just returning the categories the top books come from? With the stemming issues mentioned above, it means you may as well not even have the category classifications.</p><p>There are a relatively small number of books on Safari. Using a naive word-matching search when you know so much about your content already is far from ideal. 
You have obvious distinct objects that people are searching for - ignoring this and treating it like you're searching flat and homogeneous content is the reason the search is so totally broken.</p><p>-P</p> sheriff_p 2009-03-06T06:44:36+00:00 journal Catalyst in 20 minutes http://use.perl.org/~sheriff_p/journal/28375?from=rss <p>I've been spamming everywhere else, so why not here?</p><p>'The purpose of this tutorial is to teach you enough Catalyst to be dangerous, as quickly as possible. It should take less than an hour to complete. Dangerous, in this case, means "able to make use of the core documentation".'</p><p>http://www.pjls.co.uk/tutorials/catalyst.html</p><p>+Pete</p> sheriff_p 2006-01-16T21:35:13+00:00 journal Javascript Goodness http://use.perl.org/~sheriff_p/journal/27093?from=rss I've been doing some pretty funky stuff with Javascript recently - I had a two week gig to produce a front end to an XMLRPC interface to a client's admin system - if you're a <a href="http://www.bytemark.co.uk/">Bytemark</a> customer you can have a play at: <p> <a href="https://secure.bytemark.co.uk/panel/">https://secure.bytemark.co.uk/panel/</a> </p><p> I really like how Javascript makes me think differently about some aspects of programming - the inheritance system is<nobr> <wbr></nobr>... different, but kinda funky, and I've been relying on closures more than I've had to before. 
So if anyone knows anyone who needs some Javascript contracting done<nobr> <wbr></nobr>...<nobr> <wbr></nobr>:-) pete@nospam.clueball.com</p> sheriff_p 2005-10-10T08:30:48+00:00 journal Filtering non-ASCII characters with procmail and mutt http://use.perl.org/~sheriff_p/journal/24526?from=rss Almost a year since my last entry, awesome...<nobr> <wbr></nobr>:-)<p> So here's a little snippet of my<nobr> <wbr></nobr>.procmailrc to remove characters I can't understand anyway from the Subject and From lines, as mutt is sending them straight to my terminal and messing up my display:</p><blockquote><div><p> <tt># Rewrite the subject and sender to remove foreign characters<br>OLDSUBJECT=`/usr/local/bin/formail -xSubject:`<br>NEWSUBJECT=`echo $OLDSUBJECT |<nobr> <wbr></nobr>/usr/bin/tr -cs '\11\12\40-\176' 'Z'`<br> <br>OLDSENDER=`/usr/local/bin/formail -xFrom:`<br>NEWSENDER=`echo $OLDSENDER |<nobr> <wbr></nobr>/usr/bin/tr -cs '\11\12\40-\176' 'Z'`<br> <br>:0fw<br>|/usr/local/bin/formail -i "Subject: $NEWSUBJECT"<br> <br>:0fw<br>|/usr/local/bin/formail -i "From: $NEWSENDER"</tt></p></div> </blockquote> sheriff_p 2005-05-04T09:04:19+00:00 journal mod_perl 2 guide http://use.perl.org/~sheriff_p/journal/19069?from=rss Sooo, <p> I'm putting together a very rudimentary <a href="http://grou.ch/modperl/">mod_perl 2 tutorial / guide</a>. The HTML sucks etc, but content suggestions are welcome<nobr> <wbr></nobr>... </p><p> +Pete</p> sheriff_p 2004-06-03T13:00:52+00:00 journal Bug Bonanza http://use.perl.org/~sheriff_p/journal/11572?from=rss My <a href="http://grou.ch/rtf.html">bug challenge</a> has proven to be quite popular, and the approach has picked up a <a href="http://penderel.state51.co.uk/pipermail/london.pm/Week-of-Mon-20030407/018245.html">couple</a> of <a href="/~ziggy/journal/11518">fans</a>... <p> So here's an idea: Why Doesn't Someone(tm) create an automated system where module authors can offer bounties in a central place? 
Bounties per bug can be set as high as authors want, in a sort of auction fashion... Sadly, as <a href="/~TorgoX/">TorgoX</a> will no doubt point out, I'm a bear of little action, and even less time, so I think this should become someone else's baby.</p> sheriff_p 2003-04-10T09:58:34+00:00 journal RTF::Tokenizer http://use.perl.org/~sheriff_p/journal/11487?from=rss I released <a href="http://search.cpan.org/author/SARGIE/RTF-Tokenizer-1.00/">RTF::Tokenizer v1.0</a> last night. All hail me. I used FIGLET for the README file again... It's a massive improvement. I stole the best parts from RTF::Parser and the old RTF::Tokenizer, and got a huge speed improvement. There are over three thousand tests too. The plan is now to build RTF::Reader using it, and also rewrite RTF::Parser to use it, which is easier said than done, but, done it shall be. sheriff_p 2003-04-07T11:09:50+00:00 journal