[XeTeX] xelatex: problem with dumping format
Jonathan Kew
jonathan_kew at sil.org
Fri Jun 4 16:04:57 CEST 2004
On 4 Jun 2004, at 2:11 pm, Gilles Pérez-Lambert wrote:
> Hello,
>
> I tried to create the format for xelatex (with fmtutil) after
> installing it in my texlive~2003 distribution and I got:
> .
> .
> ....many things...
> (/usr/local/texlive2003/texmf/tex/generic/hyphen/hrhyph.tex)
> \l@hungarian=\language16
>
> (/usr/local/texlive2003/texmf/tex/generic/hyphen/huhyph.tex
> Hungarian hyphenation patterns
> ! Nonletter.
> l.51 1b<C3><AF><C2><88><C2><B1>
> be
> ?
> ! Emergency stop.
> l.51 1b<C3><AF><C2><88><C2><B1>
> be
> End of file on the terminal!
> .
> .
> For now, I suppressed the Hungarian and the Russian hyphenation patterns
> in language.dat for xelatex to work.
>
> Any idea?
This is a known issue, mentioned on the XeTeX FAQ page at:
http://scripts.sil.org/xetex_faq
See the question "XeTeX is installed, but there's no xelatex.xfmt
format so I can't use it with LaTeX files. And fmtutil can't seem to
build this format. What's wrong?".
XeTeX reads all input files as Unicode text. This means that:
(a) Plain ASCII files are fine, because they're also valid UTF-8-encoded
Unicode.
(b) 8-bit files with non-ASCII characters cannot be read, in general,
and even if they can be read, they probably won't be interpreted as
expected.
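As a rough illustration of (a) and (b), here is a small Python sketch (not anything XeTeX itself runs). The byte strings are made-up samples, not the actual contents of huhyph.tex, and the helper name reads_as_utf8 is hypothetical.

```python
def reads_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# (a) Plain ASCII is also valid UTF-8.
ascii_sample = b"1b1e"
print(reads_as_utf8(ascii_sample))    # True

# (b) A lone 8-bit byte (0xF1 here, chosen arbitrarily) followed by an
# ASCII letter is not a valid UTF-8 sequence, so the file cannot be read.
eight_bit_sample = b"1b\xf1be"
print(reads_as_utf8(eight_bit_sample))  # False

# (b) Some 8-bit byte pairs do happen to be valid UTF-8, but then they are
# read as a different character than the 8-bit author intended.
ambiguous = b"\xc3\xa9"
print(len(ambiguous.decode("latin-1")))  # 2 (two separate 8-bit characters)
print(len(ambiguous.decode("utf-8")))    # 1 (a single Unicode character)
```

The first failure mode is the one behind the "Nonletter" error in the log: the pattern file contains literal 8-bit bytes that make no sense as UTF-8.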
There are some additional considerations that apply to some of the
multilingual hyphenation patterns (as well as to your input text files,
of course):
(c) Many of the 8-bit codes in TeX-Latin1 (Cork) encoding correspond
directly to Unicode codepoints. Therefore, if a file uses these
character codes, expressed as ^^xx sequences, it will work fine. (But
if it uses the literal 8-bit characters, XeTeX will try to interpret
them as UTF-8 sequences, and fail.)
(d) There are exceptions to (c); in particular, the codes 0x80..0xBF
don't match, nor do 0xDF and 0xFF, if I recall correctly. So the fact
that such a multilingual file can be read by the program doesn't
necessarily mean that it will be correct for the Unicode environment.
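To make (c) and (d) concrete, here is a hedged Python sketch that mimics how a ^^xx sequence names a character code and then treats that code as a Unicode code point. The function expand_hat_hat is hypothetical and handles only the two-hex-digit form; the remark about what 0xFF means in the Cork encoding is my understanding of that layout, so check it against the encoding table before relying on it.

```python
import re

def expand_hat_hat(text: str) -> str:
    """Replace each ^^xx (two lowercase hex digits) by the character whose
    Unicode code point is 0xXX, as a Unicode-based reader would."""
    return re.sub(
        r"\^\^([0-9a-f]{2})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )

# (c) Code 0xE9 is e-acute in the 8-bit encoding and also U+00E9 in
# Unicode, so the ^^e9 form comes out as intended.
print(expand_hat_hat("1b^^e9") == "1b\u00e9")  # True

# (d) Code 0xFF is (as far as I know) sharp s in the Cork encoding, but
# U+00FF is y-with-diaeresis. The file reads without error, yet the
# result is not the character the pattern author meant (U+00DF).
print(expand_hat_hat("^^ff") == "\u00ff")  # True
print(expand_hat_hat("^^ff") == "\u00df")  # False
```

This is why a pattern file that merely loads without error still needs checking: every code in the mismatched ranges has to be remapped to the right Unicode code point.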
I have adapted some of the pattern files that currently give trouble,
including Hungarian and Russian, to be readable in XeTeX and to load
the correct Unicode-encoded patterns; I expect these will be included
with the next package I release. And they'll provide a model for how
others can be updated, too. (I've tried to do this in such a way that
the same files can still be used in standard TeX as well as in XeTeX,
despite the different encodings in use.)
Note, however, that loading correct Unicode patterns will NOT give the
expected hyphenation if you try to use them in conjunction with text
in some other encoding! XeTeX is really designed to be purely a Unicode
system; it does try to continue working with older non-standard
encodings, to the extent that these can be treated as though they were
simply Unicode values re-used, but a mixed-encoding world is a messy
place to live.
> By the way, is there a way to have babel work with xetex?
I know essentially nothing about babel, but my impression is that it
is, partly at least, a solution for working with multiple input and
font encodings in the legacy 8-bit world, and so I suspect the marriage
of babel with XeTeX will be an untidy affair at best.
But someone who actually knows about it, and also understands Unicode
issues, may be able to answer more fully.
Hope this is helpful,
Jonathan