[XeTeX] How to use EC font encoding in XeTeX?

Fri May 19 11:44:18 CEST 2006

On 19 May 2006, at 1:21 am, Mojca Miklavec wrote:

> However here's my next question (probably there will be another no as
> an answer):

Actually, the answer is "yes". :)

> Is it possible to use any other 8-bit input encoding (such as cp1250)
> or is only utf-8 currently supported? I tried to test it on some old
> documents, but the behaviour is completely random and depends on the
> surrounding characters.

The input is interpreted as UTF-8 by default, and so the bytes in  
your cp1250 text are (mis)interpreted as being part of UTF-8  
sequences, leading to apparently random results -- not truly random,  
of course, but certainly not useful!

However, you can say

	\XeTeXinputencoding "cp1250"

and the input will then be interpreted as codepage 1250 (and mapped  
to the corresponding Unicode character codes for processing within  
XeTeX).

To be precise, \XeTeXinputencoding will change the interpretation of  
the input bytes beginning at the *next* line of the input file.
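
As a minimal sketch of that timing (the comments are only there to mark  
which lines are read in which encoding):

	% this line, and the rest of the line below, are still read as UTF-8
	\XeTeXinputencoding "cp1250"
	% from this line onwards, the bytes are read as cp1250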

There is also \XeTeXdefaultencoding, which is similar, but it does  
not affect the reading of the *current* file; instead, it changes the  
initial encoding for any files that are *subsequently* opened.  
Therefore, you can use this to get XeTeX to read an existing file in  
a legacy encoding, without having to edit that file itself -- just  
set the default encoding from a "driver" file before doing \input.
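
A driver file along these lines should do it. This is only a sketch:  
"olddoc" is a made-up name standing in for your existing cp1250 file,  
and the driver itself is assumed to contain nothing but ASCII.

	% driver.tex (hypothetical name)
	\XeTeXdefaultencoding "cp1250"
	\input olddoc   % opened after the setting above, so read as cp1250
	\bye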

One caution about setting the default encoding: regardless of the  
\XeTeX...encoding settings, any files that XeTeX writes (with \write)  
will *always* be written as UTF-8. So if you're writing and then  
reading auxiliary files from within the job, these will be UTF-8 even  
if your main input text is a legacy codepage, and you may have to  
take care to switch the default encoding back to utf8 before opening  
such a file.
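
Roughly like this, assuming a plain TeX job that reads a legacy chapter  
and then reads back a file it wrote itself earlier (both file names are  
made up for illustration):

	\XeTeXdefaultencoding "cp1250"
	\input chapter1          % legacy text, read as cp1250
	\XeTeXdefaultencoding "utf8"
	\input \jobname.toc      % written by XeTeX with \write, so it is UTF-8
	\XeTeXdefaultencoding "cp1250"   % back to the legacy default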

> ConTeXt creates active characters for 128-255 when working with pdfTeX
> and maps the characters to \namedglyphs which works OK there, but
> XeTeX works completely differently with input encoding. I approximately
> understand why it doesn't work here, but is there a trick to make
> XeTeX work with those 8-bit encodings as well?

There's another possibility, too: if you use

	\XeTeXinputencoding "bytes"

then XeTeX will simply read the input text as byte values 0..255,  
with no attempt to map them to Unicode according to any specific  
codepage. (In practice, this is probably the same as using encoding  
"8859-1" or "latin1".) If you do this, then ConTeXt's active- 
character scheme for handling encodings within TeX macros should  
still work, as you'll be getting the same character codes as standard  
TeX would see.
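
To illustrate the idea (this is just a sketch in plain TeX, not  
ConTeXt's actual code), here is one byte of cp1250 handled by an active  
character. Byte 0x9A is s-caron in cp1250; the expansion \v{s} is only a  
placeholder for whatever your macros would normally map it to.

	\XeTeXinputencoding "bytes"
	% from the next line on, each input byte arrives as character code 0..255
	\catcode"9A=\active
	\def^^9a{\v{s}}   % a raw 0x9A byte in the text now expands to s-caron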

> I might have something misconfigured or too old version installed (I
> didn't install .992 yet), so I'm just curious: what do you get if you
> type
>     \catcode`ð=\active \defð{^^f0}
>     ð
> My version seems to have some problems with ^^f0.

It seems to work for me -- with font ec-lmr10, it prints the ð  
character, as expected. What do you get?

JK