This means that a document matching '"quite a long phrase"' scores lower than one matching '+word other'. Your scores therefore mean completely different things from one query to the next, depending on how many distinct tokens each query contains.
The way around this is to calculate the maximum score a query could achieve and measure results against that. This snippet of code is quite handy for it:
#!/usr/bin/perl -w
use strict;

print "\nstarting...\n";
my @strings = ('"quite long phrase"', '+must optional', '"short phrase" word', '"quite long phrase" word');
foreach (@strings) {
    my $max = 0;
    print "string:$_\n";
    # split the query into tokens: a "quoted phrase" or a bare word
    my @tokens = m/(\"[\s\S]+\"|\S+)/g;
    print "tokens:\n";
    print join(":", @tokens), "\n\n";
    # quoted phrases and +required terms are worth 0.8, optional words 0.3
    foreach (@tokens) {
        $max += (m/[\"\+]/) ? 0.8 : 0.3;
    }
    print "max score : $max\n";
}
print "done...\n";
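Once you have the maximum, one way to use it is to divide each raw score by it, so that results from a one-token query and a three-token query land on the same 0..1 scale. The sketch below assumes your search engine hands you a raw score as a plain number; `max_score` and `normalised_score` are hypothetical helper names, and the weights (0.8 and 0.3) are just the ones from the snippet above. It tokenizes with `[^"]+` so that two quoted phrases in one query stay separate.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same weights as the snippet above: 0.8 for a "quoted phrase" or a
# +required term, 0.3 for an optional word.
sub max_score {
    my ($query) = @_;
    my $max = 0;
    # [^"]+ cannot cross a quote, so each phrase is its own token
    for my $token ($query =~ /("[^"]+"|\S+)/g) {
        $max += ($token =~ /["+]/) ? 0.8 : 0.3;
    }
    return $max;
}

# Hypothetical helper: scale a raw engine score by the query's
# maximum so scores from different queries are comparable.
sub normalised_score {
    my ($raw, $query) = @_;
    my $max = max_score($query);
    return $max > 0 ? $raw / $max : 0;
}

printf "%-28s max=%.1f\n", $_, max_score($_)
    for ('"quite long phrase"', '+must optional', '"short phrase" word');
```

With these weights '"quite long phrase"' tops out at 0.8 while '+must optional' tops out at 1.1, so a raw 0.55 is a middling hit for the second query but a much stronger one relative to the first.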
Bug?
Shouldn't that regex be m/(\"[^"]+\"|\S+)/g? Or, as I'd normally write it, /("[^"]+"|\S+)/g. The string might include two quoted phrases, and because [\s\S]+ is greedy it would run from the first quote all the way to the last one.
Re: Bug?
The second regex would do very nicely.
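To see the difference the reply is pointing at, here is a small test run of both patterns on a query containing two quoted phrases. The query string is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $query = '"first phrase" word "second phrase"';

# Original pattern: [\s\S]+ is greedy, so it runs from the first
# quote to the LAST quote and swallows everything in between.
my @greedy = $query =~ /(\"[\s\S]+\"|\S+)/g;

# Suggested pattern: [^"]+ cannot cross a quote, so each phrase
# becomes its own token.
my @fixed = $query =~ /("[^"]+"|\S+)/g;

print "greedy: ", join(' | ', @greedy), "\n";
print "fixed:  ", join(' | ', @fixed), "\n";
```

The greedy version returns a single token covering the whole string, so the max-score loop would count it once (0.8) instead of 0.8 + 0.3 + 0.8. A non-greedy [\s\S]+? would also stop at the first closing quote; the negated class just says what you mean more directly.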