[XeTeX] Polyglossia: Support for romanization of CJK

Gerrit z0idberg at gmx.de
Wed Jun 15 20:44:18 CEST 2011


Hello again, everyone,

I am currently writing an article that also contains some romanized 
Japanese. So far I have had to define the hyphenation manually, which 
is a bit of a nuisance.

So I wonder whether it would be possible to include at least 
hyphenation for romanized Japanese, Chinese and Korean. Full support 
for the CJK scripts may still be some way off, but I think that 
hyphenation patterns shouldn’t be that hard, because the romanizations 
are quite regular. Unfortunately, I don’t really know how to do that, 
so would someone be willing to help me with it? I think the basic rules 
would be something like this (just some preliminary thoughts):

Japanese - Hepburn:
Syllables are essentially consonant-vowel or consonant-vowel-n. 
Where there is a double consonant (e.g. “/asatte/”), the hyphenation 
point should fall between the two consonants.
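
As a rough illustration of these rules, here is a minimal Python sketch 
that marks candidate break points in a lowercase Hepburn word. It is not 
part of Polyglossia; the function name hyphenate_hepburn is made up for 
this example, and it additionally assumes that a syllable may consist of 
a vowel alone and that an n followed by a vowel or y starts the next 
syllable. Apostrophes and capital letters are not handled.

```python
# Candidate hyphenation points for lowercase Hepburn romanization.
# Assumptions (not from any existing package): syllables are (C)V or (C)Vn,
# and a doubled consonant is split between its two letters.

VOWELS = set("aeiouāīūēō")
STARTS_NEXT_SYLLABLE = VOWELS | {"y"}  # after "n": the n opens the next syllable


def hyphenate_hepburn(word: str) -> str:
    """Return the word with '-' inserted at every candidate break point."""
    pieces = []
    current = ""
    i = 0
    length = len(word)
    while i < length:
        ch = word[i]
        current += ch
        i += 1
        if ch not in VOWELS:
            continue  # keep collecting the syllable's initial consonants
        if i < length and word[i] == "n" and (
            i + 1 == length or word[i + 1] not in STARTS_NEXT_SYLLABLE
        ):
            # Syllable-final n: keep it with the current syllable.
            current += "n"
            i += 1
        elif i + 1 < length and word[i] == word[i + 1] and word[i] not in VOWELS:
            # Double consonant: its first half closes the current syllable.
            current += word[i]
            i += 1
        pieces.append(current)
        current = ""
    if current:
        pieces.append(current)  # trailing letters without a vowel
    return "-".join(pieces)


print(hyphenate_hepburn("asatte"))   # a-sat-te
print(hyphenate_hepburn("nihongo"))  # ni-hon-go
```

The sketch reports every syllable boundary, including ones that leave a 
single letter at the start or end of the word; in TeX such breaks would 
normally be suppressed by \lefthyphenmin and \righthyphenmin.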

Chinese - Pinyin:
Syllables can end in a vowel (/lai/), n (/wan/) or ng (/zhong/). Some 
words like /xian/ cannot be hyphenated, in contrast to words like 
/Xi’an/. To handle that, we could simply list all syllables (roughly 
400, ignoring tones) in the hyphenation file. Tone marks probably have 
to be ignored, so that /Zhōngwén/ is treated the same as /Zhongwen/.
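
To make the /xian/ versus /Xi’an/ point concrete, here is a small Python 
sketch of the syllable-list idea. Everything in it is an assumption for 
illustration: the names strip_tone_marks, split_chunk and 
hyphenate_pinyin are invented, and SYLLABLES is a tiny hand-picked 
sample, not the full Pinyin inventory. It relies on the Pinyin spelling 
convention that a syllable beginning with a, o or e inside a word is 
preceded by an apostrophe.

```python
import unicodedata

# Combining macron, acute, caron and grave: the four Pinyin tone marks.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}

# Hypothetical, deliberately tiny sample; a real list would hold all syllables.
SYLLABLES = {"xi", "an", "xian", "zhong", "wen", "lai", "wan", "fan", "fang", "gan"}


def strip_tone_marks(text: str) -> str:
    """Remove tone marks but keep other diacritics such as the dots on ü."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(c for c in decomposed if c not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def split_chunk(chunk: str, first: bool = True):
    """Split an apostrophe-free chunk into syllables, or return None."""
    if not chunk:
        return []
    # Without an apostrophe, a non-initial syllable cannot start with a, o, e.
    if not first and chunk[0] in "aoe":
        return None
    for end in range(len(chunk), 0, -1):  # try the longest candidate first
        if chunk[:end] in SYLLABLES:
            rest = split_chunk(chunk[end:], first=False)
            if rest is not None:
                return [chunk[:end]] + rest
    return None


def hyphenate_pinyin(word: str) -> str:
    """Show syllable boundaries of a Pinyin word, ignoring tones and case."""
    plain = strip_tone_marks(word).lower().replace("\u2019", "'")
    parts = []
    for chunk in plain.split("'"):  # an apostrophe is an explicit boundary
        syllables = split_chunk(chunk)
        parts.extend(syllables if syllables is not None else [chunk])
    return "-".join(parts)


print(hyphenate_pinyin("Zhōngwén"))  # zhong-wen
print(hyphenate_pinyin("xian"))      # xian
print(hyphenate_pinyin("Xi’an"))     # xi-an
```

A chunk that cannot be split with the given list is returned unbroken, 
so unknown words simply get no hyphenation points.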

Korean:
No idea, actually. :(

For Chinese, it would also be nice to have some kind of tone-mark 
escaping. Either, for ease of typing, it happens automatically whenever 
a syllable is followed by a number, so that Zhōngwén is typed as 
\textchinese{Zhong1wen2}, or it uses some kind of explicit escaping: 
\textchinese{\Zhong1\wen2} or something like that. The first method 
would be nicer to type, but could be a nuisance if you want to mix 
numbers with text, although I don’t think that will happen very often. 
For Wade-Giles, the same mechanism could put the tone numbers in 
superscript (Chung¹-wen²). For that to work, I think the writer has to 
choose the romanization system in advance.
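
The automatic variant could work roughly as in the following Python 
sketch, which only shows the string transformation and says nothing 
about how it would be wired into Polyglossia. The function names are 
invented for this example. The tone mark is placed by the usual Pinyin 
rule (a or e if present, the o of ou, otherwise the last vowel), and a 
digit is only treated as a tone number when it directly follows letters.

```python
import re
import unicodedata

# Tone number -> combining mark (macron, acute, caron, grave); 5 = neutral.
COMBINING = {"1": "\u0304", "2": "\u0301", "3": "\u030c", "4": "\u0300"}
SUPERSCRIPT = str.maketrans("12345", "¹²³⁴⁵")


def mark_vowel_index(syllable: str):
    """Index of the vowel that carries the tone mark, or None if no vowel."""
    lower = syllable.lower()
    for vowel in "ae":
        if vowel in lower:
            return lower.index(vowel)
    if "ou" in lower:
        return lower.index("o")
    for i in range(len(lower) - 1, -1, -1):  # otherwise the last vowel
        if lower[i] in "aeiouü":
            return i
    return None


def numbered_to_marked(text: str) -> str:
    """Turn Pinyin with tone numbers (Zhong1wen2) into tone marks."""
    def replace(match):
        syllable, tone = match.group(1), match.group(2)
        index = mark_vowel_index(syllable)
        if index is None:
            return match.group(0)  # no vowel: not a syllable, leave as typed
        if tone == "5":
            return syllable  # neutral tone: drop the number, add no mark
        marked = syllable[: index + 1] + COMBINING[tone] + syllable[index + 1:]
        return unicodedata.normalize("NFC", marked)

    return re.sub(r"([A-Za-zÜü]+)([1-5])", replace, text)


def wade_giles_superscript(text: str) -> str:
    """Raise tone numbers that directly follow a letter (Chung1 -> Chung¹)."""
    return re.sub(
        r"(?<=[A-Za-zÜü])[1-5]",
        lambda match: match.group(0).translate(SUPERSCRIPT),
        text,
    )


print(numbered_to_marked("Zhong1wen2"))       # Zhōngwén
print(wade_giles_superscript("Chung1-wen2"))  # Chung¹-wen²
```

Numbers that stand on their own, such as a year, are left untouched by 
both functions, which covers at least the simplest case of mixing 
numbers with text.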

What do you think about that? Currently, Polyglossia has a huge “hole” 
where the CJK languages should be. Even if the manpower for proper 
support of the scripts themselves is lacking right now, I think 
romanization is needed as well (maybe even more). If we could start 
with at least hyphenation support for romanization, we could gradually 
improve support for the other features (spacing, word breaking rules 
for Japanese, ruby, vertical writing etc.) as well. I think it is 
easier to start with some small, easy things than with the difficult 
features.

I think providing translations for the table of contents and so on 
would be easy as well; this could be the next step.

Gerrit