[XeTeX] Polyglossia: Support for romanization of CJK

Gerrit z0idberg at gmx.de
Thu Jun 16 12:38:57 CEST 2011


On 16.06.2011 01:41, mskala at ansuz.sooke.bc.ca wrote:
> I thought the original poster was talking about segments of text written
> in romanized Japanese as the only script - not phonetic guide texts
> (furigana) attached to Japanese script, nor equivalents in other
> languages.  The issues you describe are interesting for knowing how to
> break furigana when words are split at the end of a line, but I don't
> think they're relevant to the original poster's question; it sounded like
> they had a pretty clear, and simple, idea of the hyphenation they wanted
> to use for romanized text, and it would be a fair bit simpler than the
> existing algorithm currently used for languages like English.
>
> I'm not sure that romanized Japanese is used enough for texts of more than
> a few words, to justify a lot of development effort going into figuring
> out how to hyphenate it beyond the original poster's immediate
> application.  Do any established standards or traditions exist for such
> hyphenation at all?

Hello,

Yes, that is exactly what I had in mind: a few romanized words inside a
Western-language text.

For example, in a sentence like this: “A town where many hot springs
(/onsen/) are located is Beppu in Kyūshū.”
Here you have three Japanese words in the text: onsen, Beppu, Kyūshū. The
hyphenation rules would be quite simple: on-sen, bep-pu, kyū-shū.
Of course, you can do all of this manually, but if one writes a text in
Japanese studies or a similar field, romanized Japanese can occur quite
frequently. Also, if you list a Japanese book in a reference section,
you may need to give the complete title in romanization, where
hyphenation may be needed as well: “Wakabayashi Masahiro: Taiwan -
Henyō shi chūcho suru aidentiti.”
This is often not much of a problem, because many Japanese words are
not that long (though some are), but without hyphenation the line
breaks will still not be optimal.

Because the hyphenation rules seem to be very simple, I think it
shouldn’t be much of a problem to create patterns for them. I am not
sure whether standards or traditions exist for this, but I can imagine
that they do. Pinyin in particular has many orthographic rules, so I
would expect hyphenation to be covered there as well. Japanese, on the
other hand, is easy to break into syllables, so I think hyphenation is
fairly straightforward even in the absence of specific rules.
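
To illustrate how simple such a rule could be, here is a rough sketch in
Python (not TeX patterns, and not anything Polyglossia provides). It
assumes Hepburn-style romanization and a single word made of letters
only; the names hyphenate, is_onset, VOWELS and ONSET_DIGRAPHS are made
up for this example. The idea: for every consonant cluster between two
vowels, break before the longest ending of the cluster that can start a
Japanese syllable.

```python
# Hypothetical sketch: syllable break points for Hepburn-romanized Japanese.
# Assumption: input is one word, letters only (no apostrophes or hyphens).

VOWELS = set("aiueoāīūēōâîûêô")
ONSET_DIGRAPHS = {"sh", "ch", "ts"}


def is_onset(cluster):
    """True if the consonant cluster can begin a syllable (k, sh, ky, ...)."""
    if len(cluster) == 1:
        return True
    if len(cluster) == 2:
        return cluster in ONSET_DIGRAPHS or cluster[1] == "y"
    return False


def hyphenate(word):
    """Return the word with '-' inserted at the candidate break points."""
    breaks = []
    n = len(word)
    i = 0
    seen_vowel = False
    while i < n:
        if word[i].lower() in VOWELS:
            seen_vowel = True
            i += 1
            continue
        # Collect the run of consonants starting at position i.
        j = i
        while j < n and word[j].lower() not in VOWELS:
            j += 1
        # Only clusters that sit between two vowels get a break.
        if seen_vowel and j < n:
            k = i
            # Shrink from the left until the rest is a valid onset;
            # this splits geminates (p-p) and syllabic n (n-s).
            while not is_onset(word[k:j].lower()):
                k += 1
            breaks.append(k)
        i = j
    pieces = []
    prev = 0
    for b in breaks:
        pieces.append(word[prev:b])
        prev = b
    pieces.append(word[prev:])
    return "-".join(pieces)


for w in ["onsen", "beppu", "Kyūshū", "Wakabayashi", "aidentiti"]:
    print(hyphenate(w))
```

This gives on-sen, bep-pu, Kyū-shū, Wa-ka-ba-ya-shi and ai-den-ti-ti.
One known weakness: an "n" followed by "y" or a vowel is ambiguous
unless the romanization marks it with an apostrophe, so the sketch
simply treats that "n" as the start of the next syllable, which will be
wrong for some words. It also never breaks between two vowels and
ignores TeX's minimum fragment lengths.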

Furigana hyphenation etc. is an entirely different matter: for that, we
would first need furigana support, which seems to be very difficult (or
at least needs some work). But hyphenation of romanized text seems easy.

By the way, I think this kind of romanization hyphenation is needed not
only for Japanese or Chinese, but for other languages as well (Arabic,
Greek, Russian, Thai, etc.). For Japanese and Chinese the advantage is
that you have a universal romanization, whereas for other languages the
romanization seems to depend on the language of the surrounding text
(e.g. the romanization of Arabic used in German). Still, there also
seem to be scholarly romanizations which could serve as the standard.
After all, I guess LaTeX (and XeLaTeX) is most often used for Western
scientific texts, which often do include such romanized foreign terms
(if they deal with these areas).

Gerrit

