[XeTeX] Follow-up on CJK (Unicode) and XeTeX (xelatex)

Jonathan Kew jonathan_kew at sil.org
Thu Feb 24 18:34:34 CET 2005

On 24 Feb 2005, at 5:00 pm, Roger Hart wrote:

> First, XeTeX is absolutely amazing in how easy it is to set up fonts. 
> .....  Using XeTeX, simply by typing in a font name, it is possible to 
> change, for example, to the 60,000+ character font Simsun (Foundry 
> Extended), or to any other style of Chinese characters.

So nice to know that this works, even for people using completely 
different fonts and scripts than those I work with; making standard 
fonts easier to use was one of the goals of XeTeX. :-)

> Unfortunately, the wonderful package ps4pdf does not work under 
> xelatex.  I assume it would not be hard to fix or find a work-around.

Am I right in thinking that this is a package to allow pdflatex to 
include PS graphics, by converting them to pdf behind the scenes? If 
so, it ought to be possible for one of the LaTeX experts out there to 
figure out how to configure it to work with XeLaTeX, given that XeTeX 
supports pdf graphics, and supports the \write18 mechanism to run shell 
commands (provided you enable it in your configuration).
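
A minimal sketch of that idea, not of how ps4pdf itself works: it 
assumes shell escape is enabled, that the epstopdf command-line tool is 
installed, that graphicx is set up with a driver for XeTeX, and that a 
file named figure.eps (a hypothetical name) sits next to the document.

	\documentclass{article}
	\usepackage{graphicx}   % assumes a XeTeX-aware graphics driver
	\begin{document}
	% Run the external converter at once; by default epstopdf
	% writes figure.pdf alongside figure.eps.
	\immediate\write18{epstopdf figure.eps}
	% Include the PDF that the shell command just produced.
	\includegraphics{figure.pdf}
	\end{document}

If shell escape is disabled, the \write18 line does nothing and the 
\includegraphics will fail unless figure.pdf already exists.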

> Did I overlook something simple here, but is there a way to make 
> xelatex in Computer Modern (no roman font specified) recognize the 
> Unicode characters that seem to work under pdflatex, such as m-dashes, 
> smart quotes, German and French characters?

You mean the use of these characters as literal Unicode in the input 
file? For that to work under standard LaTeX, I assume you use something 
like \usepackage[utf8]{inputenc}, which would map the byte codes that 
represent the Unicode characters in the input file onto LaTeX commands 
to access the appropriate CM characters.
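
For comparison, this is the sort of pdflatex setup I have in mind; it 
is only a guess at the original poster's preamble, and the sample text 
is made up:

	\documentclass{article}
	% inputenc decodes the UTF-8 byte sequences and maps each one
	% to a LaTeX command (e.g. the bytes for ß become \ss).
	\usepackage[utf8]{inputenc}
	\begin{document}
	Straße — ¿Vamos?
	\end{document}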

That won't work as it stands under XeLaTeX, because you don't use the 
inputenc package to interpret UTF8 byte sequences; the em-dash, 
accented characters, etc., are simply individual characters, just like 
the ASCII ones.

What you could do is make these characters \active, and \def them to 
generate the appropriate output, e.g.:

	\catcode`—=\active \def—{---} % em-dash
	\catcode`ß=\active \defß{\ss{}} % es-zet
	\catcode`¿=\active \def¿{?`} % spanish begin-question
	% ...etc... for as many Unicode characters as you want to support in
	% CM output
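
Put together as a complete file, that might look like the sketch below. 
It assumes the file is saved as UTF-8, that XeLaTeX is using the legacy 
CM fonts (no Unicode font selected), and the sample text is made up; 
only the three characters defined here are handled.

	\documentclass{article}
	% Make each Unicode character active, then define it in terms
	% of the ASCII input that the CM fonts already understand.
	\catcode`—=\active \def—{---}    % em-dash via the --- ligature
	\catcode`ß=\active \defß{\ss{}}  % es-zet via \ss
	\catcode`¿=\active \def¿{?`}     % inverted question mark via ?`
	\begin{document}
	Straße — ¿Vamos?
	\end{document}

Any other non-ASCII character in the input would still go to the CM 
font unmapped, so it needs its own \catcode and \def line.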

This then becomes dependent on the CM font encoding. I'm sure there's a 
True LaTeX Way to do it, which someone like Ross would know, so that it 
interacts properly with the various legacy LaTeX font setups. But in 
general, I'd suggest it's simplest to use a Unicode font for your body 

> Finally, if I might very humbly submit a request -- and I am truly 
> humbled by the work you've done on XeTeX -- could you please make 
> proper wrapping of CJK a priority for the next release? I think that 
> line-wrapping is a very basic capability, it is easy for someone 
> installing XeTeX, on finding CJK lines don't wrap, to just assume that 
> XeTeX is not compatible with CJK.

I'm taking a look at the place where that code needs to be 
inserted..... stay tuned for developments. ;-)

