# [XeTeX] Japanese, Chinese, Korean support for Polyglossia

Gerrit z0idberg at gmx.de
Fri Jul 23 17:15:53 CEST 2010

Hello!

I will try to gather some information about Japanese, Chinese and Korean
support for Polyglossia over the next few days.

Because I do not understand TeX programming at all, I can only give some
information here. I will try to write it as detailed as possible, so
that the implementation should not be that hard :)

Here is what I understand so far about what is possible and what is too
different:

For all three languages:

1. Line spacing needs to be increased. All characters from these three
scripts are written in a square, which would be like writing in capitals
all the time in Latin fonts. Because of this, the line spacing would be
too narrow with the default setting.
I do not yet know how much the line spacing actually should be, but I
will try to figure that out.
Also, the line spacing should depend on the surrounding text. If the
default language of the document is a Western language, the line spacing
for e.g. \textkorean{} should not be increased. This is because one
would use this option to enter some Korean text in a western text, where
it is not desirable to increase the line spacing (you would not do that
if you enter an abbreviation in all caps, either).
If a CJK language is chosen with \setdefaultlanguage or \begin{korean},
the line spacing should be adjusted, though.

2. A date would be in this format: 2010 [word for year] 7 [word for
month] 23 [word for day].
In Chinese and Japanese, this would be: 2010年7月23日
In Korean it would be 2010년7월23일

3. Chapter names etc. are written with the number between two words:
ordinal prefix, number, “chapter”
e.g., “chapter 1”: 第1章 in Japanese or Chinese.
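
To make points 2 and 3 concrete, here is a small sketch in Python (not
TeX, since I only want to show the string patterns). The function names
format_cjk_date and format_chapter are made up for this illustration and
are not part of Polyglossia. The suffixes are the ones given above.

```python
from datetime import date

# Year / month / day words as listed above.
DATE_SUFFIXES = {
    "chinese": ("年", "月", "日"),
    "japanese": ("年", "月", "日"),
    "korean": ("년", "월", "일"),
}


def format_cjk_date(d: date, language: str) -> str:
    """Return the date as: year + word, month + word, day + word."""
    year_word, month_word, day_word = DATE_SUFFIXES[language]
    return f"{d.year}{year_word}{d.month}{month_word}{d.day}{day_word}"


def format_chapter(number: int) -> str:
    """Chinese/Japanese chapter label: ordinal prefix 第, number, 章."""
    return f"第{number}章"


if __name__ == "__main__":
    print(format_cjk_date(date(2010, 7, 23), "japanese"))
    print(format_cjk_date(date(2010, 7, 23), "korean"))
    print(format_chapter(1))
```

The Korean chapter wording is not given above, so the sketch leaves it
out.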

-----

For Chinese and Japanese:

1. There are calendar systems in Japan and Taiwan that count the years
from the founding of the Republic of China or from the accession of the
current emperor.
In Taiwan, one simply needs to subtract 1911 to get the current year.
Also, one needs to write 民國 (Mínguó = “Republic”) in front of the year.
E.g.: 2010-07-23 -> 民國99年7月23日
In Japan, the year depends on the current emperor.
From 1868 to 1911: subtract 1867 and add 明治 (Meiji) before the number.
e.g.: 1905 -> 明治38年
From 1912 to 1925: subtract 1911, add 大正 (Taishō)
From 1926 to 1988: subtract 1925, add 昭和 (Shōwa)
From 1989: subtract 1988, add 平成 (Heisei)
If it is the first year of the emperor, don’t write 1年, but write 元年,
e.g. 昭和元年.
I think only the current era, Heisei, is of practical relevance. It
would be nice to include the other ones, though.
Before 1868 it is too hard, because they still used the lunar calendar
at that time. I think nobody needs a calculation for that, though.

2. Both languages still use Chinese numerals, although to different
degrees.
They need to be converted from Arabic digits. The method is different
sometimes.
For year numbers and page numbers (seldom): Just replace every Arabic
digit with the appropriate Chinese digit (一二三四五六七八九〇). E.g.
page 354 = 三五四. Year 1980 = 一九八〇年. But: 民國九十八年 (十 = 10;
For other numbers: e.g. 1324 = 一千三百二十四

3. Another option: If Arabic numerals are used, they may need to be
converted to full-width digits. e.g. 3 = ３
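
The three conversions above could be sketched like this in Python. All
function names are invented for the illustration. The era table holds
only the eras listed above and works on whole years, so it ignores the
fact that an era can change in the middle of a year, and it knows
nothing about eras after Heisei. For the positional form, the handling
of zeros (零) and of 10 to 19 (十 instead of 一十) is my assumption based
on common Chinese usage; it is not specified above, and Japanese usage
differs.

```python
# (first Gregorian year, era name), newest first, as listed above.
JAPANESE_ERAS = [(1989, "平成"), (1926, "昭和"), (1912, "大正"), (1868, "明治")]

DIGITS = "〇一二三四五六七八九"


def minguo_year(year: int) -> str:
    """Taiwan: subtract 1911 and put 民國 in front."""
    return f"民國{year - 1911}年"


def japanese_era_year(year: int) -> str:
    """Japan: era name plus year within the era; the first year is 元年."""
    for first_year, name in JAPANESE_ERAS:
        if year >= first_year:
            n = year - first_year + 1
            return f"{name}{'元' if n == 1 else n}年"
    raise ValueError("years before 1868 are not covered")


def digits_to_chinese(number: int) -> str:
    """Digit-by-digit replacement, e.g. for years and page numbers."""
    return "".join(DIGITS[int(ch)] for ch in str(number))


def number_to_chinese(number: int) -> str:
    """Positional form with 十/百/千 for 1 to 9999 (assumed Chinese style)."""
    if not 1 <= number <= 9999:
        raise ValueError("this sketch only covers 1 to 9999")
    parts = []
    pending_zero = False
    for ch, unit in zip(f"{number:04d}", ["千", "百", "十", ""]):
        d = int(ch)
        if d == 0:
            # Remember a gap only if something was already written.
            pending_zero = bool(parts)
            continue
        if pending_zero:
            parts.append("零")
            pending_zero = False
        parts.append(DIGITS[d] + unit)
    text = "".join(parts)
    if 10 <= number <= 19:
        text = text[1:]  # 十五 rather than 一十五
    return text


def to_fullwidth_digits(text: str) -> str:
    """Replace ASCII digits 0-9 with full-width digits (U+FF10..U+FF19)."""
    return "".join(
        chr(ord(ch) - ord("0") + 0xFF10) if "0" <= ch <= "9" else ch
        for ch in text
    )


if __name__ == "__main__":
    print(minguo_year(2010))
    print(japanese_era_year(1905))
    print(digits_to_chinese(1980))
    print(number_to_chinese(1324))
    print(to_fullwidth_digits("3"))
```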

--------

For Japanese:

1. kinsoku shori (line breaking rules). In Japanese, a line cannot be
broken at every character (as it can be in Chinese). Some
punctuation marks are not allowed to start or end a line (e.g. 。、「),
just like in western languages. Also, some kana are not allowed to start
a line (ょ、－、っ etc.).
There are different levels of strictness. A line may never break in
front of punctuation marks like 。, but for e.g. ょ, the situation may be
different. There could e.g. be 3 levels of strictness besides off (break
everywhere): low (break everywhere except in front of 。 etc.), medium
(don’t break in front of ょ, but do break in front of －), high (don’t
break in front of ょ, － or any other similar character).
Because Japanese is written without spaces, it can be a little bit
difficult to achieve this effect. Characters like 。、 are just written
at the end of the line, so that the line becomes a little bit longer. In
other cases, it may be necessary to shorten or lengthen the spacing.
Usually, the only place where this is possible is before/after  。、「
and similar characters. Also, in some fonts, the characters are not
actually all the same size, so it may be possible to do that there (not
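
As a very rough illustration of the strictness levels, here is a Python
sketch of a greedy line breaker. The name break_lines and the exact
character sets are my assumptions (in particular, putting っ into the
medium level together with ょ). It simplifies things a lot: every
character that may not start a line is simply hung at the end of the
previous line, which makes that line longer, and an opening 「 at the end
of a line is moved to the next line. No spacing is stretched or shrunk.

```python
# Characters that may not start a line, per strictness level (assumed sets).
NO_LINE_START = {
    "off": "",
    "low": "。、",
    "medium": "。、ょっ",
    "high": "。、ょっ－",
}
# Characters that may not end a line (ignored when strictness is "off").
NO_LINE_END = "「"


def break_lines(text: str, width: int, strictness: str = "low") -> list:
    """Split text into lines of about `width` characters (width >= 1)."""
    no_start = NO_LINE_START[strictness]
    no_end = "" if strictness == "off" else NO_LINE_END
    lines = []
    i = 0
    while i < len(text):
        end = min(i + width, len(text))
        # Hang characters that must not start the next line.
        while end < len(text) and text[end] in no_start:
            end += 1
        # Move an opening bracket at the line end to the next line,
        # but always keep at least one character on the current line.
        while end < len(text) and end - 1 > i and text[end - 1] in no_end:
            end -= 1
        lines.append(text[i:end])
        i = end
    return lines


if __name__ == "__main__":
    for line in break_lines("これは文です。次の文", 6, "low"):
        print(line)
```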

For Chinese:

1. They still use the lunar calendar (I don’t yet quite understand the
calculation). But this is very optional. I don’t think that this is ever
used in academic writing. Even if it were, you could just write it by
hand. Would be a nice feature, though.

2. Support for simplified and traditional Chinese is needed. This would
some other, typographic effects.

Features that may not be easily achieved:

1. Vertical writing. Absolutely necessary, but I think extremely hard.
May need some drastic changes in XeTeX, if it should not be a dirty
hack (“put every character in a box and then put all the boxes under
each other”). Maybe not as necessary for academic writing, though. This
depends on the subject. In subjects where mathematics is used, vertical
writing is not useful. But I think it is still extensively used in
subjects like history etc.

2. Ruby characters. They are also extremely necessary (for Japanese).
They are smaller characters put on top of (or below) the Chinese
character to indicate the reading. Basically, they are put between the
lines (in the line spacing), with no change in the line spacing. There
are different kinds of ruby annotation, e.g. mono ruby (every character
has its own pronunciation) and group ruby (a complete word, consisting of
multiple Chinese characters, has the reading put on top). Also, the ruby
characters can overlap the characters next to the word. (Ruby
characters are printed at half the size of the base text, which gives
every Chinese character room for two ruby characters. There may be words
where the reading is longer than that, e.g. 承る with the ruby
characters うけたまわ.) Alternatively, a space can be put between the
characters of the word in compounds. E.g. 躊躇 (ちゅうちょ) would be too
long, so it may be stretched like 躊 躇.
In vertical writing, the ruby characters go on the right side of the line.

There are also ruby characters (Zhuyin Fuhao) in Taiwan, which are more
complicated. In vertical writing, they are written on the right side of
the line, as in Japanese. In horizontal writing, they are, unlike
Japanese, written on the right side of the character. It is more
difficult, because the characters forming a syllable themselves need to
be stacked vertically, even in horizontal writing, but the tone mark
goes on the right side of the syllable. It may be better to let an
OpenType font handle the composition of the syllables (for example via
ligatures), because I guess that XeTeX would not achieve a visually
pleasing result. The problem is that there are no OpenType fonts that do
that, as far as I know.

I think there is a ruby package for the old CJK package, but I don’t
know if that still works with XeTeX.

3. Emphasis. There is no italic writing in Chinese characters. In
Japanese, emphasis is done by putting 、 on top of every character (as a
ruby character). This method is quite easily achieved if ruby characters
are supported. I am not sure about Chinese, but I think they do that
with a dot, similarly to Japanese.

4. Footnotes: In Japanese, they are also done like the emphasis mark, as
a ruby character.
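
To illustrate the ruby sizing rule from point 2 (ruby at half size, so
two ruby characters fit over one base character), here is a tiny Python
sketch. The function name ruby_overflow is made up. It only counts
characters and assumes all base characters are full squares; it does not
decide whether to overlap the neighbours or to stretch the base word.

```python
def ruby_overflow(base: str, reading: str) -> int:
    """How many ruby characters do not fit over the base text.

    Each base character has room for two half-size ruby characters.
    A result of 0 means the reading fits without overlap or stretching.
    """
    return max(0, len(reading) - 2 * len(base))


if __name__ == "__main__":
    print(ruby_overflow("承", "うけたまわ"))
    print(ruby_overflow("躊躇", "ちゅうちょ"))
```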

OK, that is all that comes to my mind right now. I will gather more
information.

I wonder if polyglossia is the right approach for everything? Of course,
polyglossia, but what about ruby characters?
I think, it may be nice to have a CJK package which offers support for
vertical writing, ruby, maybe calculation of the calendars etc.
They are extremely necessary for these languages, but may not be needed
for other languages. Maybe it would be good if polyglossia loaded this
package if it detects one of these three languages. This would then make
it easy to actually use for example Japanese, because it is not
necessary to know which packages you need to load.

e.g. just load polyglossia and set Japanese, and it will automatically
load packages for vertical writing and ruby characters, without the need