[XeTeX] Western encoding and XeTeX

Jonathan Kew jonathan_kew at sil.org
Tue Nov 13 21:45:41 CET 2007


On 13 Nov 2007, at 7:52 pm, Marcel Korpel wrote:
>
> When typing é and ü again, I noticed that the system expects Unicode
> characters. I ploughed through this mailing list and Google & Co., but
> couldn't find a useful solution. As a (former?) TeTeX user I use
> tex-text.tec to convert -- etc. to the correct 'ligatures'. I think
> that using TECkit it must be possible to convert é and ü etc. in the
> same way. Is there perhaps already a 'standard' solution
> (map/tec-file) to workaround this problem?

You can't deal with these in the same way, as a "font mapping" within  
xetex; if you have 8-bit Western Latin-1 character codes in your  
file, and xetex tries to interpret the file as UTF-8, the proper  
codes will be lost. (Really, it should probably stop with an error  
message in this case, rather than simply discarding data.)

My strong recommendation would be to convert the text files to  
Unicode (UTF-8), which can easily be done at the command line with  
iconv:

     iconv -f latin1 -t utf8 oldfilename.tex > newfilename.tex

or similar. If you want to use such files with an 8-bit TeX, you can  
just load the inputenc package with [utf8] option; xetex will read  
them directly with no special package.
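
As a rough sketch of the 8-bit case, a converted file might start like  
this. The document class and the sample text are placeholders of mine,  
not anything from the original question:

     % File saved as UTF-8, processed with an 8-bit engine (e.g. pdflatex)
     \documentclass{article}
     % Tell the 8-bit engine that the input bytes are UTF-8.
     % Under xetex, leave this line out; it reads UTF-8 natively.
     \usepackage[utf8]{inputenc}
     \begin{document}
     é and ü, typed directly
     \end{document}

The same file, minus the inputenc line, should then go through xetex  
unchanged.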

If you really don't want to convert your files to Unicode -- although  
that is the One True Way of the future :) -- then you'd have to use  
the (rather obscure) \XeTeXinputencoding command to tell xetex to  
convert the Latin-1 bytes to Unicode as it reads them. If you have  
\XeTeXinputencoding "ISO-8859-1" at the beginning of your document,  
or at least before any non-ASCII characters such as accented forms  
occur, it should work.
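
A minimal sketch of that approach, assuming the file itself is still  
stored as Latin-1 (the document class and sample text are only  
placeholders):

     % File saved as ISO-8859-1 (Latin-1), processed with xetex/xelatex
     % Must come before the first non-ASCII character in the file.
     \XeTeXinputencoding "ISO-8859-1"
     \documentclass{article}
     \begin{document}
     é and ü, stored as single Latin-1 bytes
     \end{document}

Note that a file like this will no longer be read correctly by tools  
that expect UTF-8, which is one more reason to prefer conversion.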

Converting to Unicode throughout is the better option in most cases,  
however.

JK


