[XeTeX] [6933] luatex output encoding

Gavin Smith gavinsmith0123 at gmail.com
Tue Jan 12 20:31:37 CET 2016


It has been suggested to me to let you know about a problem we had
with Texinfo with XeTeX about character encoding. XeTeX reads and
writes to files by default using a UTF-8 encoding. It's possible to
override the input encoding with \XeTeXdefaultencoding and
\XeTeXinputencoding, but as far as we can tell there's no
corresponding command for the output encoding. We managed to fix this
problem for LuaTeX for both input and output, but XeTeX appears to
have a setting for input only.

This is a much smaller problem than it would be if the
input encoding couldn't be set, but it is a problem when reading and
writing to auxiliary files to handle indices, cross-references and
tables of contents. For example, a chapter title may have a non-ASCII
character in it, e.g. "ü". When we set \XeTeXinputencoding "bytes",
this is read in as two tokens with hex values c3 bc. That's intended:
texinfo.tex needs the individual byte values. But then when it's
written out to build up the table of contents, this gets written out
as two UTF-8 characters (Ã¼, bytes c3 83 c2 bc), which isn't what is
needed: we want to write out the two original bytes c3 bc, which
together form a single UTF-8 character. As it is, when the table of
contents is typeset, the character "ü" comes out as "Ã¼".
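
To make the byte sequences concrete, here is a rough illustration in
Python. It only mimics the byte handling described above and is not
what XeTeX does internally; the variable names are made up for the
example, and Latin-1 is used as a stand-in for the "bytes" input mode
because it maps each byte to the code point with the same value.

```python
# Sketch only: Python stands in for the engine's reading and writing steps.
title = "\u00fc"                        # the character "ü" in a chapter title

# The source file stores it as UTF-8: two bytes, c3 bc.
source_bytes = title.encode("utf-8")

# Reading in "bytes" mode: each byte becomes one character token.
tokens = source_bytes.decode("latin-1")  # two tokens, U+00C3 and U+00BC

# Writing with a fixed UTF-8 output encoding re-encodes each token.
written = tokens.encode("utf-8")         # c3 83 c2 bc (the unwanted result)

# What is wanted instead: each token written back as a single raw byte.
wanted = tokens.encode("latin-1")        # c3 bc

print(source_bytes.hex(" "))
print(written.hex(" "))
print(wanted.hex(" "))
```

The last step is the one that appears to be missing: a way to tell
\write to emit one byte per token rather than one UTF-8 sequence per
token.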

If we're right in thinking there's no way to set the output encoding
(for \write), it might be a good idea to add one.

Best wishes,
Gavin


