[XeTeX] Issue with CJK in pdf build

Wilfred van Rooijen wvanrooijen at yahoo.com
Thu Nov 19 03:25:43 CET 2009


> 
> What I don't understand is that the dblatex manual apparently states
> that mixing 'languages' (scripts, rather) is not possible. Since one
> really wonderful thing about Unicode is that you can mix different
> scripts in the same document, I wrongly concluded that it was time I
> look elsewhere for my fun. Must have gotten my eyes crossed since it
> seems you only need joints that are flexible enough to perform the
> contortions.

There are many "TeX engines". The oldest is TeX itself. LaTeX is a "macro package", which provides extra commands etc. for TeX. But there are more TeX engines, each with specific extra functionality (off the top of my head, so there may be some inaccuracies):

- etex: able to set text Right-to-Left as well as Left-to-Right
- Omega: more internationalization support (Lambda is latex for omega)
- pdftex: sets text directly into PDF, not the older DVI format of TeX (pdflatex is latex for pdftex)
- ptex: a Japanese variant capable of setting text top-to-bottom, right-to-left as well as left-to-right, top-to-bottom. As the old versions of TeX only supported 128-character alphabets, ptex is a separate branch on the TeX tree that supports the JIS character set (6100+ characters)
- several others

None of these flavors of TeX are able to read UTF-8 or use Unicode fonts. Only xetex can do that.

The moral of the story: in the Good Old Days, you could only use, say, Latin characters in a TeX document, and only with extra work could you include CJK, Hindi, Urdu, Sanskrit, Thai or whatever. One document with Chinese and Latin was possible, but Chinese, Korean and Hindi in one document was out of the question, or required some "non-standard" packages and style files. Hence the warning in the toolchain: use only one character set, otherwise your TeX will (likely) explode. I guess the maintainers of the toolchain should add a note about xetex and the possibility of using characters from any script without such limits.
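
To make the xetex point concrete, here is a minimal sketch of a XeLaTeX file that mixes Latin, Chinese, Korean and Hindi in one sentence. The font names are assumptions (any installed fonts covering those scripts will do), and the \zhfont, \kofont and \hifont command names are made up for this example. The file must be saved as UTF-8 and compiled with xelatex, not latex or pdflatex.

```latex
% Sketch only: compile with `xelatex`, source file saved as UTF-8.
\documentclass{article}
\usepackage{fontspec}

% Assumed font names; replace with fonts installed on your system.
\newfontfamily\zhfont{Noto Serif CJK SC}                        % Chinese
\newfontfamily\kofont{Noto Serif CJK KR}                        % Korean
\newfontfamily\hifont[Script=Devanagari]{Noto Serif Devanagari} % Hindi

\begin{document}
Latin text, then Chinese: {\zhfont 中文},
Korean: {\kofont 한국어},
and Hindi: {\hifont हिन्दी}.
\end{document}
```

Each script still needs a font that actually contains its glyphs, which is why the font is switched per script here.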

> No..!! What I expected is that the _asciidoc/a2x tool chain_ [...] what
> to do. But on second thoughts, since my locale is set to en_US.UTF-8 I
> have a feeling that's what it's doing and chooses a font that has all
> the glyphs that would be of interest, including arrows, box drawing,
> etc. on top of the ASCII range.

There are not many fonts that actually have glyphs for *all* the characters presently defined in the Unicode tables. I think only Code2000 comes close, and maybe Bitstream Cyberbit. So at present, there is no "catch-all" font.
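
If you want to see whether a particular font has a glyph for a given character, one option is the \iffontchar test, an e-TeX primitive that xetex also provides. The following is a rough sketch; the \checkglyph macro name is made up for this example, and it tests whatever font is current at the point of use.

```latex
% Sketch only: compile with `xelatex`.
\documentclass{article}
\usepackage{fontspec}

% Hypothetical helper: #1 is a four-digit hexadecimal code point, e.g. 4E2D.
% \iffontchar <font> <number> is true if <font> has a glyph for that character.
\newcommand{\checkglyph}[1]{%
  \iffontchar\font"#1 U+#1: present\else U+#1: missing\fi\par}

\begin{document}
\checkglyph{0041} % LATIN CAPITAL LETTER A
\checkglyph{4E2D} % a CJK ideograph
\checkglyph{2192} % RIGHTWARDS ARROW
\end{document}
```

Switching to another font before calling \checkglyph shows how coverage differs from font to font.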

> 
> In terms of functionality, probably. But then, is there an alternative?

I don't know what this toolchain does :-)). But given the tremendous number of text processors in the world, I'd say yes, there is an alternative :-))

Wilfred

