tinman (2063)


tinman spent a few years mucking around in industry before going back to school for a Master's. Currently not enjoying the weather in the North of England.

He wrote Perl that looked suspiciously like C code in 1998, while working as an intern, and has been trying to cure that bad habit ever since.

Journal of tinman (2063)

Monday April 12, 2004
04:38 PM

strange .. or obvious ?

[ #18303 ]

Has no one ever done this before? Or is it just that no one is bothered, or, even simpler, that it doesn't make sense?

I need to search through documents written in Asian scripts. It's part of my language-independent entity extraction thingamajig (to put it technically, heh heh). As I mentioned before, people seem to author content in a variety of custom-built TrueType fonts. I'm just going to construct mapping tables for the more popular fonts so that I can convert them easily into Unicode. Voilà: I no longer need to worry about weird ASCII garbage (text whose bytes only look right when displayed in the original font). I can just run documents through a converter and expect the output to be readable by any application that understands ONLY Unicode.
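
For the simple cases, the mapping-table idea boils down to a dictionary lookup per character. Here is a minimal sketch in Python, assuming an imaginary legacy font that draws Tamil letters over ordinary ASCII slots; the table entries and the name `FONT_TO_UNICODE` are hypothetical and do not describe any real font.

```python
# Hypothetical mapping table for an imaginary legacy font.
# Keys: characters as stored in the document (the font reuses ASCII slots).
# Values: the Unicode text those glyphs actually depict.
# These entries are illustrative only; they do NOT describe any real font.
FONT_TO_UNICODE = {
    "a": "\u0B85",  # TAMIL LETTER A
    "b": "\u0B86",  # TAMIL LETTER AA
    "k": "\u0B95",  # TAMIL LETTER KA
    "m": "\u0BAE",  # TAMIL LETTER MA
}


def to_unicode(text: str, table: dict[str, str]) -> str:
    """Convert font-encoded text to Unicode, one character at a time.

    Characters with no entry in the table (spaces, digits, punctuation)
    are passed through unchanged.
    """
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    # "a", "m" and "k" are remapped; the space has no entry and survives as-is.
    print(to_unicode("am k", FONT_TO_UNICODE))
```

One table per font is the whole trick here; the converter itself stays the same.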

I see a couple of problems already, though.

There must be a language somewhere that breaks my converter: one with strange formatting or character modification rules. Still, I simply can't believe a simple hash (or HashMap, depending on the language) job like this hasn't been done before.
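
Two well-known ways an Indic script can break a pure one-character-to-one-character hash: a single font glyph may stand for several Unicode code points (a conjunct such as KA + VIRAMA + SSA), and some vowel signs are drawn to the left of their consonant, so a font-encoded document stores them first while Unicode stores them after the consonant. The sketch below is one possible way to handle both, using longest-match lookup followed by a reordering pass. Everything in the tables, including the two-character key `"ee"`, is hypothetical, and the reordering assumes each pre-base sign is immediately followed by its consonant.

```python
# Illustrative, hypothetical tables; not taken from any real font.
GLYPH_TO_UNICODE = {
    "k": "\u0B95",               # TAMIL LETTER KA
    "m": "\u0BAE",               # TAMIL LETTER MA
    "x": "\u0B95\u0BCD\u0BB7",   # one glyph for KA + VIRAMA + SSA (three code points)
    "e": "\u0BC6",               # TAMIL VOWEL SIGN E (drawn LEFT of its consonant)
    "ee": "\u0BC7",              # hypothetical two-byte sequence for VOWEL SIGN EE
}
# Signs stored before the consonant visually but after it in Unicode order.
PRE_BASE_SIGNS = {"\u0BC6", "\u0BC7"}


def convert(text: str, table: dict[str, str], pre_base: set[str]) -> str:
    """Convert font-encoded text to Unicode in logical order."""
    max_len = max(len(key) for key in table)

    # Pass 1: greedy longest-match tokenisation, so "ee" wins over "e" + "e".
    units = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                units.append(table[chunk])
                i += size
                break
        else:
            units.append(text[i])  # unmapped character: pass through
            i += 1

    # Pass 2: move each pre-base vowel sign after the unit that follows it.
    out = []
    pending = None  # a pre-base sign waiting for its consonant
    for unit in units:
        if unit in pre_base:
            if pending is not None:
                out.append(pending)  # malformed input: flush the earlier sign as-is
            pending = unit
            continue
        out.append(unit)
        if pending is not None:
            out.append(pending)
            pending = None
    if pending is not None:
        out.append(pending)  # dangling sign at the end of the text
    return "".join(out)


if __name__ == "__main__":
    # Visual order "e" + "k" becomes logical order KA + VOWEL SIGN E.
    print(convert("ek", GLYPH_TO_UNICODE, PRE_BASE_SIGNS))
```

Doing the lookup and the reordering as separate passes keeps the per-font table a plain hash; only the reordering rules need to know about the script.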

I wonder if someone is going to complain about me running their font through charmap to build a mapping table.

For extra mod/brownie points, I could even build an editor thingummy that converts between custom fonts and Unicode. OK, THAT I'm certain has been done before. Do these Unicode-aware applications also understand custom TrueType fonts?
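
Going the other way (Unicode back to a custom font) could reuse the same tables by inverting them. A rough sketch, assuming a hypothetical table like the ones above: the inversion refuses tables where two glyphs produce the same Unicode text, and the lookup again uses longest match so that a three-code-point conjunct maps back to its single glyph. Undoing the vowel-sign reordering is left out of this sketch.

```python
# Hypothetical font table; entries are illustrative, not from any real font.
GLYPH_TO_UNICODE = {
    "k": "\u0B95",               # TAMIL LETTER KA
    "m": "\u0BAE",               # TAMIL LETTER MA
    "x": "\u0B95\u0BCD\u0BB7",   # conjunct glyph: KA + VIRAMA + SSA
}


def invert(table: dict[str, str]) -> dict[str, str]:
    """Build a Unicode -> glyph table, rejecting ambiguous mappings."""
    inverse: dict[str, str] = {}
    for glyph, uni in table.items():
        if uni in inverse:
            raise ValueError(f"{uni!r} is produced by both {inverse[uni]!r} and {glyph!r}")
        inverse[uni] = glyph
    return inverse


def from_unicode(text: str, inverse: dict[str, str]) -> str:
    """Convert Unicode text to font encoding using longest-match lookup."""
    max_len = max(len(key) for key in inverse)
    out = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in inverse:
                out.append(inverse[chunk])
                i += size
                break
        else:
            out.append(text[i])  # no glyph for this code point: pass through
            i += 1
    return "".join(out)


if __name__ == "__main__":
    back = invert(GLYPH_TO_UNICODE)
    # KA + VIRAMA + SSA + MA comes back as the conjunct glyph followed by "m".
    print(from_unicode("\u0B95\u0BCD\u0BB7\u0BAE", back))
```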

More investigation needed...
