  • I had a long reply using Text::Unidecode here, but use.perl.org *really* doesn't want to format things the way I want it to (half the time it seems to double-encode my Unicode, and it never renders multiline code or pre tags), so I'll try to explain what I mean with words instead of pictures. First, the easy question: how are those alternate readings sorted? It doesn't seem to be by the first manuscript with that reading, nor by number of readings. Is it just hash order? Second, the hard question: wh
    • Q1) Alternate readings are unsorted. That is intentional: I don't want to inadvertently give priority to the reading in ms A, or to the reading with the most words, or to anything else.

      Q2) Alignment can occur in one of two ways. The first is a small enough edit distance, as you say; that is how I keep the instances of "zsharagrakan" aligned, and it is what I meant by "fuzzy match." The second is what is called a "negative variant": the words aren't alike at all, but they coincide in placement. This is why "zhamanakakan" is lined up with "zsharagrakan". As for the 'i' at the beginning, none of the manuscripts besides C have a word in that place, so it is treated as a word that is sometimes absent.

      What this doesn't display (yet) are the "fuzzymatch" and "variant" relationships that have been calculated between words. I'm hoping it will become obvious what to do with that information once I start handling user input.
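      For illustration only, here is a minimal Python sketch of the two alignment relationships described above, plus an unranked grouping of the readings in one aligned slot. It is not the code behind the display. The cut-off MAX_RATIO, the function names, the "identical" label, and the use of None for a manuscript with no word at a position are all assumptions; plain Levenshtein distance stands in for whatever edit-distance measure is actually used.

```python
from collections import defaultdict

# Hypothetical cut-off: the thread only says "a small enough edit distance".
MAX_RATIO = 0.3


def levenshtein(a: str, b: str) -> int:
    """Edit distance where each insertion, deletion or substitution costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def relationship(a, b):
    """Classify two words that occupy the same aligned slot.

    None stands for a manuscript with no word at this position; for such a
    pair no word-to-word relationship is recorded (an assumption).
    """
    if a is None or b is None:
        return None
    if a == b:
        return "identical"
    if levenshtein(a, b) <= MAX_RATIO * max(len(a), len(b)):
        return "fuzzymatch"  # similar spelling: aligned by edit distance
    return "variant"         # unlike words that coincide in placement


def group_readings(column):
    """Map each distinct reading in one slot to the set of sigla attesting it.

    The result is deliberately unranked: compare it as a mapping of sets and
    read nothing into its iteration order.
    """
    groups = defaultdict(set)
    for siglum, word in column.items():
        groups[word].add(siglum)
    return {reading: frozenset(sigla) for reading, sigla in groups.items()}
```

      Under these assumptions, relationship("zhamanakakan", "zsharagrakan") returns "variant", because the two words are 5 edits apart, more than 0.3 * 12. A column such as {"A": None, "B": None, "C": "i"} (siglum B is made up) groups into None for A and B and "i" for C, with no ordering implied between the two readings.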