
All the Perl that's Practical to Extract and Report


aurum (8572)

http://www.eccentricity.org/

Ex-Akamaite, ex-Goldmanite. Currently working on Ph.D. in Armenian/Byzantine history at Oxford. Spends more time these days deciphering squiggly characters than spaghetti code. Thinks that UTF-8 is the best thing since sliced bread.

Journal of aurum (8572)

Saturday September 06, 2008
05:56 PM

cpan module #3

[ #37376 ]

Today I released the first small piece of the Collation Project. (Yes, I have another research proposal I ought to be writing. Yes, I spent hours today writing documentation and formalizing tests. What's your point?)

This piece addresses the problem of transcribing manuscripts efficiently. It is my weird idea of a markup language for TEI XML. As an added bonus for people who aren't me, it exports a function that takes an existing TEI XML file (well, string), parses it, wraps all the whitespace-separated words in <w/> ("word") tags, and returns the new file. Identifying the words is, after all, step one in efficient word collation.
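
For the curious, the word-wrapping step can be sketched roughly as follows. This is not the module's code (the module is Perl, and its actual function name and behaviour are not shown here); it is a minimal Python illustration using the standard library's ElementTree. The names `wrap_words` and `wrap_words_in_tei` are hypothetical, and the sketch assumes an un-namespaced document, wraps words everywhere below the root, and normalises the spacing after each wrapped word to a single space.

```python
import xml.etree.ElementTree as ET


def _words_to_elements(text):
    """Turn a run of text into <w> elements, one per whitespace-separated word."""
    elements = []
    for word in (text or "").split():
        w = ET.Element("w")
        w.text = word
        w.tail = " "  # simplification: spacing after each word becomes one space
        elements.append(w)
    return elements


def wrap_words(element):
    """Recursively wrap every bare word below `element` in a <w> element."""
    if element.tag == "w":
        return  # already tagged, so leave it alone

    original_children = list(element)

    # Words that appear before the first child element.
    new_children = _words_to_elements(element.text)
    if new_children:
        element.text = None

    for child in original_children:
        wrap_words(child)
        # Words that follow this child (its "tail" in ElementTree terms).
        tail_words = _words_to_elements(child.tail)
        if tail_words:
            child.tail = None
        new_children.append(child)
        new_children.extend(tail_words)

    # Replace the child list with the re-ordered, word-wrapped one.
    for child in original_children:
        element.remove(child)
    element.extend(new_children)


def wrap_words_in_tei(xml_string):
    """Hypothetical helper: XML string in, word-wrapped XML string out."""
    root = ET.fromstring(xml_string)
    wrap_words(root)
    return ET.tostring(root, encoding="unicode")
```

Calling `wrap_words_in_tei("<p>In the <hi>beginning</hi> was</p>")` yields a document in which each of the four words sits inside its own `<w>` element, with the `<hi>` element preserved around the third. One known limitation of a sketch this simple: a word interrupted by markup (say, a highlighted initial letter) is split into two `<w>` elements, which a real transcription tool would need to handle.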

This also means that my collator should be able to handle pretty much any language or writing system, as long as the basic unit of meaning that ought to be collated is enclosed within a <w/> tag. When it's done, of course.

This also means that I am going to need a module name for the collator soon. Suggestions?
