[XeTeX] Several suggestions and bug reports concerning Asian language support

Yin Dian yindian at gmail.com
Thu Oct 18 16:39:39 CEST 2007


Hi,

I've been using XeTeX for a long time and I really like it for its native
Unicode support and its OpenType/TrueType capabilities, which give XeTeX
great advantages over traditional TeX engines. Recently I wrote the macro
package zhspacing for fine-tuning Chinese documents using
\XeTeXinterchartoks, and successfully typeset an ancient Chinese book
with vertical typesetting and warichu, using XeLaTeX + zhspacing + gezhu (my
macro package for typesetting warichu). Many thanks to all XeTeX developers
and contributors, especially Jjgod.

While using XeTeX I've found several bugs and had some thoughts about
typesetting documents in Asian languages. However, I know little about
the XeTeX source code, so I'm unable to make the changes myself. I'm posting
them here in the hope that we can make XeTeX better.


First, here are three suggestions.

1. I wish XeTeX 0.997 could provide a way to determine the character class
number of a given character, for example a primitive \XeTeXgetcharclass
that expands to the class number of the following char slot.

This is desirable because when typesetting documents in Asian languages
such as Chinese and Japanese, spaces between fullwidth characters in the
source tex file are usually ignored, as in the CJK* environment when
using LaTeX + CJK. With \XeTeXinterchartoks, a space between two fullwidth
characters first triggers the inter-char tokens between the class number
(say A) of the previous character and 255, then those between 255 and the
class number (say B) of the next character. However, to perform correct
spacing adjustment and prohibition, one needs to know class B already when
the inter-char tokens between A and 255 are triggered. Getting the character
after the space is easy: \futurenonspacelet, which can be found in Appendix D
of the TeXbook, is enough. The problem lies simply in getting the class
number from that character.
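
The sequence described above can be sketched in plain XeTeX as follows. This
is only an illustration: class 1 for the two ideographs and the \message
markers are arbitrary choices, and 255 is the boundary class as described in
this mail (check the documentation of your XeTeX version). A font covering
the two characters would have to be selected for visible output.

```tex
% Sketch only: class 1 and the \message markers are illustrative choices.
\XeTeXinterchartokenstate=1      % enable inter-char token insertion
\XeTeXcharclass "4E2D = 1        % U+4E2D goes into class 1 (playing "A")
\XeTeXcharclass "6587 = 1        % U+6587 goes into class 1 (playing "B")
% 255 is the boundary class assumed in this mail.
\XeTeXinterchartoks 1 255 = {\message{[A->255]}}
\XeTeXinterchartoks 255 1 = {\message{[255->B]}}
% A source space separates the two characters below. The A->255 tokens
% are inserted before the class of the character after the space is known.
中 文
\bye
```

The point of the sketch is the comment on the last lines: whatever the
A->255 token list does, it has to decide without access to class B.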

In zhspacing, to add skips between Chinese and Western characters when they
are separated by spaces, I used \ifcjkchar to test whether the character
after the spaces is a CJK character; it takes the character's Unicode
code point and compares it against several ranges. Dealing with punctuation
separated by spaces would be more difficult without a way to determine
the class number of a given character.
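
A range test of the kind described can be sketched as below. This is not
zhspacing's actual \ifcjkchar: the names \ifsketchcjk and \sketchtestcjk and
the chosen ranges are assumptions made for illustration, and the argument is
assumed to be a single ordinary character token.

```tex
% Hypothetical names; the ranges are rough and for illustration only.
\newif\ifsketchcjk
\def\sketchtestcjk#1{%
  \sketchcjkfalse
  % U+2E80..U+9FFF: radicals, CJK punctuation, kana, Ext-A, ideographs
  \ifnum`#1>"2E7F \ifnum`#1<"A000 \sketchcjktrue \fi\fi
  % U+FF00..U+FFEF: halfwidth and fullwidth forms
  \ifnum`#1>"FEFF \ifnum`#1<"FFF0 \sketchcjktrue \fi\fi
  % U+20000..U+2A6DF: CJK Ext-B
  \ifnum`#1>"1FFFF \ifnum`#1<"2A6E0 \sketchcjktrue \fi\fi
}
% Usage: set the flag, then branch on it.
\sketchtestcjk{中}\ifsketchcjk \message{CJK}\else \message{not CJK}\fi
\sketchtestcjk{a}\ifsketchcjk \message{CJK}\else \message{not CJK}\fi
```

Such a test only separates CJK from non-CJK characters. It cannot tell
which class a punctuation mark was assigned to, which is why a way to read
the class number would still be needed.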

2. I wish XeTeX or fontspec could provide a way to adjust the baseline of
characters in specified ranges or character classes. CJK's macro
\CJKhdef is an example of adjusting the baseline of CJK characters.

The need arises when typesetting documents that mix vertical and
horizontal fonts. Fonts loaded with the :vertical feature seem to have their
baselines in the center, while horizontal/normal fonts have their baselines
near the bottom. So a vertical Chinese document with intermixed English or
halfwidth numbers looks really ugly. The ability to adjust the baseline
of certain characters or font shapes would solve this problem.
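
Until such a feature exists, a manual shift is one possible stopgap. The
sketch below is an assumption, not an existing interface: \sketchshift is a
hypothetical helper and the -0.3em value is a guess that would have to be
tuned for each pair of fonts.

```tex
% Hypothetical helper: shift a run of Western text vertically.
% #1 = shift (positive raises, negative lowers), #2 = the text.
\def\sketchshift#1#2{\leavevmode\raise#1\hbox{#2}}
% Usage; the amount is a guess, not a measured value.
中文\sketchshift{-0.3em}{XeTeX 2007}中文
```

Because the text ends up inside an \hbox, no line break can occur within the
shifted run, and every run has to be marked by hand. An adjustment per
character class or range would avoid both drawbacks.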

3. I suggest that XeTeX provide interfaces to dvipdfmx's font slant and
stretch features, as well as control over whether or not to embed a given
font. Mr. Sun Wen-chang's LaTeX package ttfshape achieves these by
dynamically generating dvipdfmx map files.

This is needed because almost no Chinese fonts come with an
italic/slanted shape. In LaTeX + CJK, a slanted shape for Chinese can be
achieved using dvipdfmx's slant feature (adding -s 0.167 in cid-x.map), but
XeTeX currently seems to have no solution. Also, many Chinese fonts are
quite large, so users may not want to embed them in the PDF, to keep the
file size down; control over whether or not to embed a font is therefore
desirable.


Now, here is a list of the bugs I've found so far, with sample sources and
output PDFs attached.

1. Vertical layout of several Chinese fonts.

XeTeX can't correctly handle line breaking, \hbox enclosing, and punctuation
in vertical layout for several Chinese fonts shipped with Windows XP and
Office, including KaiTi and FangSong.

You can also see that none of the Chinese fonts has correct heights and
widths in vertical layout, which results in a misplaced \fbox.

This bug was reported to Jjgod long ago, but it does not seem to have been
fixed yet. He asked a question about it here:
http://tug.org/pipermail/xetex/2007-May/006423.html

2. Wrong width for CJK Ext-B characters.

This bug was introduced in revision 93 of xdvipdfmx. Revision 92 generates
the correct result, while later revisions do not.

3. Wrong \meaning result for some CJK Ext characters

\meaning produces "the letter ???". It should show the character itself
instead of three question marks.


That's all from me. Sorry for the trouble, and thanks for your patience in
reading such a long mail :-)

-Yin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vertbug.tex
Type: application/x-tex
Size: 910 bytes
Desc: not available
Url : http://tug.org/pipermail/xetex/attachments/20071018/45117cb1/attachment-0003.tex 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vertbug.pdf
Type: application/pdf
Size: 14313 bytes
Desc: not available
Url : http://tug.org/pipermail/xetex/attachments/20071018/45117cb1/attachment-0004.pdf 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cjkextbbug.tex
Type: application/x-tex
Size: 325 bytes
Desc: not available
Url : http://tug.org/pipermail/xetex/attachments/20071018/45117cb1/attachment-0004.tex 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cjkextbbug-correct.pdf
Type: application/pdf
Size: 6320 bytes
Desc: not available
Url : http://tug.org/pipermail/xetex/attachments/20071018/45117cb1/attachment-0005.pdf 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cjkextbbug.pdf
Type: application/pdf
Size: 6326 bytes
Desc: not available
Url : http://tug.org/pipermail/xetex/attachments/20071018/45117cb1/attachment-0006.pdf 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: meaningbug.tex
Type: application/x-tex
Size: 66 bytes
Desc: not available
Url : http://tug.org/pipermail/xetex/attachments/20071018/45117cb1/attachment-0005.tex 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: meaningbug.pdf
Type: application/pdf
Size: 4883 bytes
Desc: not available
Url : http://tug.org/pipermail/xetex/attachments/20071018/45117cb1/attachment-0007.pdf 

