[XeTeX] How to prevent Chinese chars from being treated as part of a TeX command?
Jonathan Kew
jfkthame at googlemail.com
Sat Oct 17 13:31:02 CEST 2009
On 17 Oct 2009, at 07:02, Joseph Wright wrote:
> mhbezine209 mhbezine2009 wrote:
>> I have found a problem with XeTeX: I often encounter errors like
>> "! Undefined control sequence l.6 \TeX你好"
>> when I typeset Chinese documents with XeTeX.
>> See the example below for an idea of the source of the errors.
>> ------cut from here----------
>> \documentclass{article}
>> \usepackage{xeCJK}
>> \begin{document}
>> \TeX你好Hello
>> \end{document}
>> -------end----------------------
>>
>> Such errors occur when Chinese characters (or any other non-ASCII
>> Unicode characters) immediately follow a valid command.
>> In other words, if there is no space between Chinese characters and a
>> command name, XeTeX treats the Chinese characters as part of the
>> command name, so it issues an error message. I do not know whether
>> this is a bug in XeTeX or intended behavior. Anyway, I find this
>> design very annoying, because I must manually add a space or {} after
>> each command name to avoid such errors. Does anybody have a good
>> solution to this problem? It would be desirable if this feature of
>> XeTeX could be disabled with one command or macro. I think it would
>> be better to restrict command names to ASCII characters.
>> Thanks for any discussion on this issue :-)
>>
>
> TeX treats any "letters" as part of a control sequence, so if I write:
>
> \TeXHello
>
> TeX will complain and I need to write
>
> \TeX Hello. All XeTeX is doing is extending this concept to UTF-8 by
> setting a lot more characters up as "letters". So everything seems
> pretty consistent to me. Most users want to use non-ASCII characters
> in csnames with XeTeX, in any case.
Right. Basically, the character (category) codes in the xetex/xelatex
formats are initialized based on Unicode character properties;
anything that is classified as a "letter" in Unicode is given \catcode
11, so that xetex also treats it as a letter. This includes the
Chinese characters, as well as letters in the various alphabetic
scripts. These assignments are made in the file unicode-letters.tex,
which is loaded during format file creation.
If you want to change this in your documents, you could write a macro
\MakeCJKother that changes the \catcode of all those characters from
11 to 12 ("other"). (Use a loop macro!) Then they will terminate
control-sequence names, just like punctuation characters, etc.
(On the other hand, this would prevent you from using multi-Chinese-character
macro names such as \你好 or \谢谢. Currently, these work just like
alphabetic equivalents such as \नमस्ते or \спасибо.)
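For comparison, with the default catcodes a name made of several Chinese characters is defined like any other macro. This is a minimal sketch; \谢谢 is only an illustrative name.

```latex
\documentclass{article}
% With the default xelatex catcodes (CJK ideographs = 11, "letter"),
% a run of Chinese characters forms a single control-sequence name.
\newcommand\谢谢{Thank you}
\begin{document}
\谢谢! % typesets "Thank you!"; the "!" (catcode 12) ends the name
\end{document}
```

After the catcodes are switched to 12 as in \MakeCJKother, the same input would instead be read as the one-character control symbol \谢 followed by the character 谢, which is the trade-off described above.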
JK