I am going to give two talks at the forthcoming YAPC::Europe: one full-length talk and one lightning talk.
Coincidentally, both talks are about using Perl in linguistic applications.
How to make Google Books at home
Several years ago I built an online search for two books published by Art. Lebedev Studio. The main idea was to take the best from Google and make a better service.
After I left the Studio I completely rewrote the code because, as always, real understanding comes only after you have built a prototype of the system. The current engine presents search results as a graphical preview of the page with the search words underlined, like this: http://deeptext.net/booksearch/selected.gif
I am going to cover several elements:
* Working with book layout and converting it into a suitable format.
* Extracting paragraphs, phrases and words from the layout.
* Understanding the importance of separate words.
* Working out how to restore the word order if the source has damaged it.
* Restoring words split with hyphens.
* Indexing the text of a book.
* Which is better for the index: a dictionary or a morphology engine?
* Building a cloud of popular words.
* Generating previews and thumbnails.
* Highlighting words that are found.
* Caching search results.
* Adding hot word lists to search results.
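Two of the steps above, restoring words split with hyphens and indexing the text, can be sketched in a few lines. This is only an illustration, not the engine from the talk: it is written in Python rather than Perl, the function names `join_hyphenated` and `build_index` are made up, and the hyphen rule (a line ending in `-` followed by a line starting with a lowercase letter is one word) is an assumed heuristic that will be wrong for genuinely hyphenated compounds.

```python
import re
from collections import defaultdict


def join_hyphenated(lines):
    """Join layout lines into one string, gluing words split by a hyphen.

    Assumed heuristic: if the text so far ends with "-" and the next line
    starts with a lowercase letter, the hyphen was a line-break hyphen.
    """
    text = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty layout lines
        if text.endswith("-") and line[0].islower():
            text = text[:-1] + line  # "under-" + "standing" -> "understanding"
        elif text:
            text += " " + line
        else:
            text = line
    return text


def build_index(pages):
    """Build an inverted index: word -> list of (page number, word position)."""
    index = defaultdict(list)
    for page_no, lines in pages.items():
        words = re.findall(r"\w+", join_hyphenated(lines).lower())
        for pos, word in enumerate(words):
            index[word].append((page_no, pos))
    return index


# Hypothetical two-page "book" as it might come out of a layout file.
pages = {
    1: ["Real under-", "standing comes later."],
    2: ["Understanding the proto-", "type."],
}
index = build_index(pages)
```

With this sample data, `index["understanding"]` lists one hit on each page, which is the information a search page needs to decide which previews to render and which words to highlight.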
To demonstrate how all of this works, I will make a brochure of all my posts on use.perl.org and build an online search over them.
Translating human language with computer grammar
A couple of months before the German Perl Workshop in 2007 I started to learn German. My lightning talk there was about parsing URIs with Parse::RecDescent grammars. I am so excited about both German and the parsing module that I decided to create a very simple machine translator that should be able to translate basic phrases from my German textbook. In Perl, no doubt.
Although I am not going to dig too deeply into the theory of grammars and human languages, I bought a 600-page book, "Grammar of the Text", today.