I now have a script which produces output that looks like this. Each capital letter represents a manuscript. (OK, so in real life the words are lined up in columns, but I can't make use.perl play nicely with Unicode characters inside an <ecode> tag, which is the one that would preserve spacing.)
Word variation! Context:
մինչեւ ցայս վայրս բազմաջան եւ եւ աշխատաւոր քննութեամբ գտեալ գրեցաք >> ի զշարագրական գրեալս զհարիւրից ամաց, զորս ի << բազում ժամանակաց հետա հետաքննեալ հասու եղաք։ ընդ այնքանեաց տեսողացն եւ
Base ի զշարագրական գրեալս զհարիւրից ամաց, զորս ի
----
ABH: զշարագրական գրեալս զհարիւրից ամաց զորս ի
G: զշարագրական գրեալսն հարիւրից ամաց զորս ի
C: ի ժամանակական գրեալս հարիւրից ամացն զորս
J: զշարագրական գրեալս զճից ամաց զոր ի
DFI: զշարագրական գրեալս զճից ամաց զորս ի
E: զշարագրական գրեալս զճ ամաց զորս ի
Of course it doesn't take any input yet. One thing at a time.
Formatting -- yours and use.perl.org's (Score:1)
Re: (Score:1)
Q1) Alternate readings are unsorted. That is intentional - I don't want to inadvertently give priority to the reading in ms A, or the reading with the most words, or anything.
Q2) Alignment can occur in one of two ways. The first is by a small enough edit distance, as you say - it's how I keep the instances of "zsharagrakan" aligned. That was my "fuzzy match." The second is what is called a "negative variant" - the words aren't alike at all, but they coincide in placement. This is why "zhamanakakan" is
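For readers who want to try the first kind of alignment, here is a minimal sketch of a fuzzy match based on edit distance. It is not the script's actual code: the names `edit_distance` and `fuzzy_match` and the one-third threshold are assumptions made for illustration only.

```perl
use strict;
use warnings;
use utf8;                      # the source below contains Armenian literals
use List::Util qw(min);

# Levenshtein distance by dynamic programming, counted in characters.
sub edit_distance {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my @prev = (0 .. scalar @t);            # distances from the empty prefix
    for my $i (1 .. scalar @s) {
        my @cur = ($i);
        for my $j (1 .. scalar @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,              # deletion
                $cur[$j - 1] + 1,           # insertion
                $prev[$j - 1] + $cost,      # substitution or match
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Hypothetical rule: two words "match" when the distance is at most
# one third of the longer word's length.
sub fuzzy_match {
    my ($s, $t) = @_;
    my $longer = length($s) > length($t) ? length($s) : length($t);
    return edit_distance($s, $t) <= $longer / 3;
}

# Two readings from the sample output above differ by one final letter.
print edit_distance('գրեալս', 'գրեալսն'), "\n";   # prints 1
```

A negative variant would not pass a test like this; it could only be aligned by position, which this sketch does not attempt.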
Re:Formatting – yours and use.perl.org's (Score:1)
Put posts and comments through “encode 'us-ascii', $your_post, Encode::HTMLCREF”. That will make them come out as intended.
That’s on purpose; Slashcode has its own special <ecode> tag for that purpose (whose distinguishing features are: 1. you can write raw angle brackets and ampersands inside, and Slash will turn them into entities for you; 2. it uses <pre>, so very long lines will wrap
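As a rough illustration of the encode call suggested above, the sketch below wraps it in a small helper. The helper name `to_ascii_entities` is hypothetical, and it uses `Encode::FB_HTMLCREF`, the documented fallback constant that also leaves the source string untouched, rather than the bare `Encode::HTMLCREF` flag.

```perl
use strict;
use warnings;
use utf8;                      # the source below contains Armenian literals
use Encode qw(encode);

# Turn every character that US-ASCII cannot represent into a decimal
# HTML character reference, leaving plain ASCII as it is.
sub to_ascii_entities {
    my ($text) = @_;
    return encode('us-ascii', $text, Encode::FB_HTMLCREF);
}

# The shortest reading from the sample output, manuscript E.
print to_ascii_entities('E: զճ'), "\n";   # prints: E: &#1382;&#1395;
```

The result is pure ASCII, so it should survive a form that is submitted in a single-byte encoding.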
Re: (Score:1)
"Slashcode has its own special <ecode> tag for that purpose (whose distinguishing features are: 1. you can write raw angle brackets and ampersands inside, and Slash will turn them into entities for you;"
This is the part that doesn't play nicely with UTF-8, actually, although the <ecode> tag is almost always what I want - the Armenian characters get converted into entities upon comment submit, and those entities themselves have their ampersands turned into entities upon ecode conversion.
Re: (Score:1)
The conversion to entities is your browser’s doing, actually. It sees that the form should be submitted in ISO-Latin1, so it turns all the non-Latin1 characters into entities. Slashcode can’t actually know that you didn’t mean to send them that way. There is therefore no way to get around this.
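The two steps can be modelled in a few lines to show where the double escaping comes from. This is only a sketch: `entities_on_submit` and `ecode_escape` are made-up names standing in for the browser's behaviour and for Slash's ecode handling, neither of which is reproduced here.

```perl
use strict;
use warnings;
use utf8;                      # the source below contains an Armenian literal
use Encode qw(encode);

# Model of the submit step: anything outside ISO-8859-1 becomes a
# decimal character reference.
sub entities_on_submit {
    my ($text) = @_;
    return encode('iso-8859-1', $text, Encode::FB_HTMLCREF);
}

# Model of the ecode step: raw ampersands and angle brackets become entities.
sub ecode_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

my $submitted = entities_on_submit('զ');
my $displayed = ecode_escape($submitted);
print "$submitted\n";   # prints: &#1382;
print "$displayed\n";   # prints: &amp;#1382;
```

The second line of output is what a reader ends up seeing as literal text instead of the Armenian letter.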
All you can do is use plain <code> tags with <br> tags for linebreaks, sequences of &nbsp; for tabs, and manual escaping for ampersands and less-thans. It’s a pain to do manuall
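A small helper can take some of the pain out of that manual escaping. The sketch below is one possible way to do it, not an established tool: the name `escape_for_code_tag` is invented, it assumes the sequences used for tabs are non-breaking-space entities, and four of them per tab is an arbitrary choice.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Prepare text for pasting between plain <code> tags.
sub escape_for_code_tag {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;                        # ampersands first, so later entities stay intact
    $text =~ s/</&lt;/g;                         # less-thans
    $text =~ s/\t/&nbsp;&nbsp;&nbsp;&nbsp;/g;    # assumed: four non-breaking spaces per tab
    $text =~ s/\n/<br>\n/g;                      # explicit line breaks
    # Non-ASCII characters go last, so their ampersands are not re-escaped.
    return encode('us-ascii', $text, Encode::FB_HTMLCREF);
}

print escape_for_code_tag("if (a < b) {\n\tx &= 1;\n}\n");
```

The order matters: escaping ampersands after the character references were generated would recreate the double-escaping problem described earlier in the thread.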