
slanning (5049)

http://search.cpan.org/~slanning/

Scott Lanning is currently working in Amsterdam at a hotel-booking company. The following interviews and commentaries are for entertainment only. The views and opinions expressed therein do not necessarily represent the views of his employer or even himself.

Journal of slanning (5049)

Monday August 28, 2006
10:27 AM

chinese character components

[ #30778 ]

If for some strange reason you've looked on my User Info page, you might've noticed that I'm studying (Mandarin) Chinese. I started taking classes at Ecole-club Migros last August, so I've been at it for about a year. I'm still nowhere near being able to speak Chinese yet[1], but it's going pretty well I think.

So my point in bringing this up is... a long shot. I'm hoping that someone familiar with Chinese characters will have some information.

As I've been studying writing the characters, I've of course noticed there are a lot of patterns. Each character is basically composed of one or more "components". For example, if you look at 能 neng, which means "to be able to" or "can", it's made of four parts. The top-left looks like the bottom part of 云, "cloud". The bottom-left, 月, means "moon" or "month". Both top and bottom of the right side are the same; I'm not sure, but it might be 七, which means "seven".[2]

When you look up Chinese characters in a dictionary, you look them up by "radical", which is the main "component" of the character. I don't think there's a hard-and-fast rule for determining the radical of any given character; I don't even know which component of 能 above is the radical. Beginner dictionaries try to make this easier by listing a character under several of its components, not just the radical.
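Listing a character under every one of its components, as beginner dictionaries do, amounts to an inverted index from component to characters. Here is a minimal Python sketch of that idea. It assumes a small hand-written decomposition table; the names `DECOMPOSITIONS` and `build_component_index` are hypothetical, and the three sample entries are common textbook decompositions, not data taken from any real database.

```python
from collections import defaultdict

# Hypothetical, hand-written sample data: character -> its components.
# A real application would load this from a component database.
DECOMPOSITIONS = {
    "明": ["日", "月"],        # "bright": 日 (sun) + 月 (moon)
    "朋": ["月", "月"],        # "friend": two 月 side by side
    "京": ["亠", "口", "小"],  # "capital": lid + 口 (mouth) + 小 (small)
}

def build_component_index(decompositions):
    """Map each component to the set of characters that contain it,
    so a character can be found under any of its parts."""
    index = defaultdict(set)
    for char, components in decompositions.items():
        for comp in components:
            index[comp].add(char)
    return index

index = build_component_index(DECOMPOSITIONS)
print(sorted(index["月"]))  # ['明', '朋']
```

Using sets means a component that appears twice in one character (like 月 in 朋) still lists that character only once.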

There are programs to train yourself in Chinese characters, notably Hanzi Master [3]. I also got a lot of data on Chinese characters from the same guy's web site.

In addition to a relatively limited number of components (a few hundred total, I think) being shared among all the characters (a few tens of thousands), the strokes of each character are written in a specific order (top to bottom, left to right, and several other rules).

Here, finally, is my problem. I can't find a database with the stroke-order or component data. I found data on Chinese-English definitions, on the radicals of the characters (this is used in Hanzi Master), on the characters' stroke counts, etc. I'd really like to find a database with the stroke order, because I want to program a little flashcard application that teaches you how to write the characters. I want to find data on the character components more out of curiosity, though I could also use it in a flashcard-style app.
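To show what such a database would make possible, here is a minimal Python sketch of the core check in a stroke-order flashcard app: compare the strokes a learner entered with the expected sequence and report where they went wrong. The table `STROKE_ORDER`, its simplified English stroke labels, and the function `first_wrong_stroke` are all hypothetical. The three entries follow the standard orders for 十, 人 and 口, but they are typed in by hand as illustration and are not an existing data format.

```python
# Hypothetical stroke-order table: the kind of "raw data" the app would need.
# Stroke names are simplified English labels, not a standard encoding.
STROKE_ORDER = {
    "十": ["horizontal", "vertical"],
    "人": ["left-falling", "right-falling"],
    "口": ["vertical", "horizontal-turn", "horizontal"],
}

def first_wrong_stroke(char, attempt):
    """Return the 0-based index of the first stroke that differs from the
    expected order, or None if the attempt is complete and correct."""
    expected = STROKE_ORDER[char]
    for i, (want, got) in enumerate(zip(expected, attempt)):
        if want != got:
            return i
    # Every compared stroke matched; a too-short or too-long attempt is
    # wrong at the first position where the two lengths diverge.
    if len(attempt) != len(expected):
        return min(len(attempt), len(expected))
    return None

print(first_wrong_stroke("十", ["vertical", "horizontal"]))  # 0
print(first_wrong_stroke("口", ["vertical", "horizontal-turn"]))  # 2
```

Returning the position of the first mistake, rather than just True/False, lets a flashcard app highlight exactly which stroke to redraw.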

I know this kind of data must exist because I've seen applications that show how to write characters, but I don't know if the data is publicly usable. If anyone has any information, I'd appreciate it. (I just wonder if it's findable, if you know Chinese and know where to look for it. :)

--

[1] Cf. French, where I could get by after one year of classes (though I guess watching the news, reading papers, and hanging out on #perlfr helped a lot...)

[2] It also appears on the right side of 北 bei, which you're familiar with from the name 北京 Beijing, which literally means "north capital". Looking at the "jing" character, the top is a lid, the middle box means "mouth", and the bottom (小 xiao) means "small". So you can see how these components are put together to form characters.

[3] "hanzi", 汉字, means "Chinese character(s)"

  • i don't use chinese that much now (esp. for writing), so this is from memory..

    i do not remember learning stroke order for hanzi at school. many chinese characters are composed of simpler characters, such as 文, which then appears in 蚊. i think this is also called 偏旁部首 ("components and radicals"): one part may represent the meaning and the other part the sound (in 蚊, the left part 虫 "insect" gives the meaning and the right part 文 gives the sound). some of the simple characters have their own meaning too.

    one thing i like about chinese character is that the shape/structure of the
    • Thanks, the website does look useful. I liked reading your experience learning how to write Chinese.
  • I googled Chinese character stroke order and found this page, http://www.csulb.edu/~txie/character.htm
    where he has his own character-drawing program and mentions a program from this company (www.eon.com.hk) that will draw arbitrary characters for you. But I guess it has to be a Unicode character that the program has already analyzed, not your own made-up character.

    There are some general rules, like left to right, top to bottom, all based on ease/smoothness of movement of the brush.

    Sometimes there are differe
    • Thanks, the csulb.edu link looks close to what I'm looking for. I hadn't come across that one in searching, apparently. I know there are many sites online that help you learn Chinese. For example, from that site you linked to, there is a page to learn to write Chinese, where if you click on a character, an animated gif shows the order in which you draw the character. But that's the problem: it's an animated gif, not a database! Same with this Chinese Writing Master application, I'd like to find the "raw da

      • Most of the effects of the programs looked at here could be achieved with a drawing program and an image manipulation program.

        A more interesting area to look at for help would be hand-written text to character recognition. I imagine such programs exist, but don't know anything about them.

        They might analyze characters into strokes.

        On the other hand, a demo of a Chinese speech-to-text program I saw was quite impressive. And Chinese text-to-speech programs exist too.

        The example of 七 and 匕 indicat