tinman spent a few years mucking around in industry before going back to school for a Master's. Currently not enjoying the weather in North England.
He wrote Perl that looked suspiciously like C code in 1998, while working as an intern, and has been trying to cure that bad habit ever since.
Has no one ever done this before? Or is it just that no one can be bothered or, even simpler, that it doesn't make sense?
I need to search through documents written in Asian scripts. It's part of my language-independent entity extraction thingamajig (to put it technically, heh heh). As I mentioned before, people seem to author content in a variety of custom-built TrueType fonts. I'm just going to construct mapping tables for the more popular fonts, so that I can convert them easily into Unicode. Voila, I don't need to worry about weird ASCII garbage; I can just run scripts through a converter and expect everything to be readable by any application which understands ONLY Unicode.
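A minimal sketch of the mapping-table idea in Python. The table below is made up for illustration (it does not describe any real font), and `LEGACY_TO_UNICODE` and `to_unicode` are hypothetical names; a real table would have to be built by inspecting each font's glyphs.

```python
# Hypothetical table for an imaginary legacy font: each ASCII character the
# font reuses is mapped to the Unicode character its glyph actually depicts.
LEGACY_TO_UNICODE = {
    "d": "\u0915",  # DEVANAGARI LETTER KA
    "e": "\u0916",  # DEVANAGARI LETTER KHA
    "x": "\u0917",  # DEVANAGARI LETTER GA
}


def to_unicode(text, table=LEGACY_TO_UNICODE):
    """Convert font-encoded text to Unicode, one character at a time.

    Characters missing from the table (spaces, digits, punctuation)
    are passed through unchanged.
    """
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(to_unicode("de x"))
```

One table per font keeps the converter itself trivial; all the real work is in building the tables.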
I see a couple of problems already, though.
There must be a language somewhere which breaks my converter; one with strange formatting or character modification rules. I simply can't believe a simple hash (or HashMap, depending on your programming language) job like this hasn't been done before.
I wonder if someone is going to complain about having me run their font through charmap and build a mapping table.
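On the first worry, here is one concrete case where a plain lookup falls short, sketched in Python. In Devanagari the vowel sign I (U+093F) is drawn to the left of its consonant, but Unicode stores it after the consonant. If a legacy font stores text in the order it is drawn (an assumption about such fonts, not something checked against a particular one), the converter needs a reordering pass after the table lookup. `PRE_BASE_SIGNS` and `visual_to_logical` are hypothetical names, and the sketch ignores conjuncts, where the sign would have to move past a whole consonant cluster.

```python
# Signs that are drawn before the consonant but stored after it in Unicode.
# Only one is listed here; a real converter would need the full set per script.
PRE_BASE_SIGNS = {"\u093F"}  # DEVANAGARI VOWEL SIGN I


def visual_to_logical(text):
    """Move each pre-base sign to just after the character that follows it."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in PRE_BASE_SIGNS:
            # Swap the sign with the following consonant, then skip both.
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2
        else:
            i += 1
    return "".join(chars)


if __name__ == "__main__":
    # Visual order: sign I, KA, TA, sign AA, BA
    print(visual_to_logical("\u093F\u0915\u0924\u093E\u092C"))
```

So the job is a hash plus a handful of per-script fix-up rules, which may be exactly where a given language breaks the simple approach.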
For extra mod/brownie points, I can even build an Editor thingummy which converts between custom fonts and Unicode. OK, THAT I am certain has been done before. Do these Unicode-aware applications also understand custom TrueType fonts?
More investigation needed.