
All the Perl that's Practical to Extract and Report

  • I just read about your MCE. If I had had something like that while doing my PhD, it would have been a godsend. Anyway, one thing I wanted to point out, something that happens often in medieval Celtic Studies, is the situation where you have two texts that are titled the same or are very much alike but are not exactly the same text, even allowing for variant spellings. For instance, in the edition that I have done, one of the scribes moved two of the lines from the original elsewhere in the poem and filled

  • I have been concentrating on word-level variants, it's true, because it's easiest for the computer to find meaningful differences when you break the texts down to their smallest meaningful constituent parts. For most Western languages, that's a word.

    The text I'm working on also has sentence-long (or paragraph-long, or in one case section-long) additions/deletions appearing in certain texts. (It also has word transpositions, which the MCE can detect, but which I haven't decided how to treat.) As far as th
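A minimal word-level comparison can be sketched with Python's standard `difflib`, assuming plain whitespace tokenization. This illustrates the general technique only and is not the MCE's own algorithm; `word_variants` is a hypothetical helper name. Note that `difflib` reports replacements, insertions and deletions, but it does not label transpositions as such.

```python
import difflib


def word_variants(base, witness):
    """Return (tag, base_words, witness_words) for every non-matching run.

    Both texts are split on whitespace, so the unit of comparison is the word.
    Tags are difflib's opcode names: 'replace', 'delete' or 'insert'.
    """
    a, b = base.split(), witness.split()
    # autojunk=False keeps frequent words like "the" from being ignored
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            variants.append((tag, a[i1:i2], b[j1:j2]))
    return variants


if __name__ == "__main__":
    for v in word_variants("the king rode to the hill",
                           "the king went to the great hill"):
        print(v)
```

With the sample texts this reports one replacement (`rode` / `went`) and one insertion (`great`).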

    • I have been concentrating on word-level variants, it's true, because it's easiest for the computer to find meaningful differences when you break the texts down to their smallest meaningful constituent parts. For most Western languages, that's a word.

      Indeed. I would caution, however, that Celtic languages have initial mutation, in which grammatical meaning is encoded in the lenition or nasalization of the following word.

      I *can* envision a feature wherein the user defines a minimum word length for a "substantial" variant, and then for each "substantial" variant the editor program will look for similar lines elsewhere in the text and point them out. (The minimum length setting would be to prevent noise; you don't need the computer showing you where every instance of the word "and" is in a text, for example.) It would still be the user's (that is, the human editor's) job to note a definite correlation, in either the apparatus or the footnotes. Is that the sort of thing you'd be looking for?

      Well, for Old and Middle Irish, most of the spelling variations are recorded in the "Dictionary of the Irish Language based mainly on Old and Middle Irish Materials". The problem is that Irish has a d-for-t spelling change, among other changes, and because Old and Middle Irish forms mingle on the page, word length may not help in this case. It would
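One way to handle the d-for-t problem is to treat known spelling changes as interchangeable letters and expand a word into every spelling they allow: each position branches into the letters of its class, and the leaves of that tree are the candidate spellings. This is a sketch under stated assumptions: only the d/t alternation comes from the discussion, the `EQUIVALENT_LETTERS` table and both function names are hypothetical, input is assumed to be lowercase, and changes longer than one letter would need string-level rules rather than letter classes.

```python
from itertools import product

# Hypothetical rule table: each set lists letters treated as interchangeable.
# Only d/t comes from the discussion; an editor would add the other known
# Old/Middle Irish change patterns here.
EQUIVALENT_LETTERS = [{"d", "t"}]


def spelling_variants(word, classes=EQUIVALENT_LETTERS):
    """Enumerate the leaves of the variant tree for a lowercase word.

    Each letter belonging to a class branches into every member of that
    class; other letters stay fixed. A word with k such letters yields
    up to 2**k spellings for two-letter classes.
    """
    options = []
    for ch in word:
        for cls in classes:
            if ch in cls:
                options.append(sorted(cls))
                break
        else:  # letter is not in any class: only one choice
            options.append([ch])
    return {"".join(choice) for choice in product(*options)}


def same_word(a, b, classes=EQUIVALENT_LETTERS):
    """True if b is one of the spellings reachable from a."""
    return b in spelling_variants(a, classes)


if __name__ == "__main__":
    print(sorted(spelling_variants("dat")))
```

Because the classes are symmetric, `same_word(a, b)` and `same_word(b, a)` agree; a collation tool could then treat such pairs as orthographic rather than substantive variants.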

      • Indeed. I would caution, however, that Celtic languages have initial mutation, in which grammatical meaning is encoded in the lenition or nasalization of the following word.

        Yes, I should have been clearer. The word is the smallest meaningful difference that the computer can easily detect. Armenian also encodes grammatical meaning in suffixes (and a few prefixes), but for now a human still has to review those.

        Well, for Old and Middle Irish, most of the spelling variations are recorded in the "Dictionary of the Irish Language based mainly on Old and Middle Irish Materials". The problem is that Irish has a d-for-t spelling change, among other changes, and because Old and Middle Irish forms mingle on the page, word length may not help in this case. It would be easier to define the orthographic differences that may occur in variant spellings, so a tree of variant spellings based on known change patterns would be more useful. I could be wrong and misunderstanding you, though.

        Again, I was unclear; sorry about that. I was addressing large variants (e.g. your example of transplanted lines), so by "word length" I meant "number of words in the variant" rather than "number of characters in a word." So you might want to know if a contiguous set of, say,
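The "substantial variant" search discussed above, with length measured in number of words rather than characters, might be sketched like this. `find_relocations`, `min_words` and `threshold` are hypothetical names and defaults, and scoring with `difflib.SequenceMatcher.ratio()` over word lists is one reasonable choice, not the MCE's method. Deciding whether a hit is a genuine relocation remains the human editor's job.

```python
import difflib


def find_relocations(variant_words, lines, min_words=4, threshold=0.8):
    """Return (line_number, ratio) for lines resembling a substantial variant.

    min_words: variants with fewer words than this are skipped, so short
               words such as "and" do not flood the results (noise filter).
    threshold: minimum word-by-word SequenceMatcher ratio to report a line.
    Line numbers are 1-based.
    """
    if len(variant_words) < min_words:
        return []
    hits = []
    for number, line in enumerate(lines, start=1):
        matcher = difflib.SequenceMatcher(a=variant_words, b=line.split(),
                                          autojunk=False)
        ratio = matcher.ratio()
        if ratio >= threshold:
            hits.append((number, round(ratio, 2)))
    return hits


if __name__ == "__main__":
    poem = ["and he went to the hill",
            "the ship sailed over the sea at dawn",
            "and"]
    print(find_relocations("the ship sailed over the sea".split(), poem))
```

Comparing against whole lines penalizes long lines (a six-word variant inside an eight-word line scores about 0.86 here), so a fuller tool might compare against sliding windows of the same word count instead.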