[XeTeX] Handling of the ^^-input

Jonathan Kew jonathan at jfkew.plus.com
Wed Oct 8 18:22:53 CEST 2008


On 8 Oct 2008, at 8:45 PM, Ulrike Fischer wrote:

> After some thought I at least found this example (an ansinew file):
>
> \XeTeXinputencoding "cp1252"
> \documentclass[11pt,a4paper]{book}
> \begin{document}
> \catcode`\€=\active
> \def€{EuroSign}
>
> \catcode`\^^80=\active
> \def^^80{Roof notation}
> € ^^80
>
> \end{document}
>
> When run with LaTeX (without the \XeTeXinputencoding line) the
> definition of ^^80 overwrites the definition of €. With xetex € and
> ^^80 give different results. € is no longer accessible through the
> ^^80 notation with xetex (in 8-bit files), you must use ^^^^20ac
> instead.

Right. \XeTeXinputencoding "cp1252" will cause the literal bytes of  
input to be mapped to Unicode codepoints through the given codepage,  
so the € character (0x80 in cp1252, I guess) will be mapped to U+20AC.  
This happens at the very first level of input.

*Then* the ^^ notation will be handled (this depends on TeX \catcodes,  
of course), and sequences of the form ^^hh are replaced by the  
character code given. So ^^80 becomes the character U+0080 (which is a  
control code in Unicode, not something you usually want to use). This  
is unaffected by the \XeTeXinputencoding.
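
A quick way to check this is to ask TeX for the character codes directly. The following is only a minimal sketch for plain XeTeX (run with xetex); the file is pure ASCII, so the result should not depend on any \XeTeXinputencoding setting. The wording of the \message labels is my own.

```latex
% Plain XeTeX, pure ASCII file.
% `\^^80 asks for the code of the character produced by the ^^hh notation;
% `\^^^^20ac uses XeTeX's four-digit form (lowercase hex digits).
\message{two-caret 80 has code \number`\^^80}
\message{four-caret 20ac has code \number`\^^^^20ac}
\bye
```

This should report 128 (U+0080) for the first and 8364 (U+20AC) for the second.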

So to input the Unicode character for the Euro sign using ^^ notation  
in XeTeX, you need ^^^^20ac, regardless of the \XeTeXinputencoding.  
And the € character in input will be mapped to U+20AC if (and only if)  
you set the appropriate \XeTeXinputencoding.
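
As an illustration, here is an ASCII-only variant of the example above, written as a sketch for xelatex. Because it contains no 8-bit characters, it should behave the same with or without an \XeTeXinputencoding line; the article class and the replacement texts are just placeholders.

```latex
% Pure ASCII file, to be run with xelatex.
\documentclass{article}
\begin{document}
% U+20AC (Euro sign), reached through the four-caret notation
\catcode`\^^^^20ac=\active
\def^^^^20ac{EuroSign}

% U+0080 (a control code), reached through the two-caret notation
\catcode`\^^80=\active
\def^^80{Roof notation}

% Two different characters, so two different definitions:
^^^^20ac ^^80
\end{document}
```

The two definitions do not interfere, so this should typeset "EuroSign Roof notation".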

To make xetex behave as much like pdftex as possible, you can use  
\XeTeXinputencoding "bytes". This gives a "straight-through" mapping  
where the byte codes 0..255 become the Unicode codepoints
U+0000..U+00FF. This would let you read arbitrary byte data and get the same
numeric character codes as with pdftex. Just remember that it won't be  
valid Unicode! (In general, I wouldn't recommend doing this: if you  
want to work with text and fonts that use 8-bit, non-Unicode  
encodings, don't bother with xetex at all.)
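
For completeness, a sketch of the original example with the "bytes" setting. It assumes the file is saved as cp1252 (ansinew), so that the € below is the single byte 0x80 on disk; shown here in an email it is of course not that byte.

```latex
\XeTeXinputencoding "bytes"
% Assumption: this file is saved as cp1252, so the Euro character
% below is stored as the single byte 0x80 and is read as U+0080.
\documentclass[11pt,a4paper]{book}
\begin{document}
\catcode`\€=\active
\def€{EuroSign}

\catcode`\^^80=\active
\def^^80{Roof notation}
€ ^^80

\end{document}
```

With the straight-through mapping described above, both € and ^^80 refer to character code "80, so the second definition should replace the first, as it does with pdftex.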

JK


