# [XeTeX] First experiences with xetex and some bugs

Sun Jul 4 16:24:16 CEST 2004

In a message dated 4/7/04 1:37:33 pm, prohaska at zib.de writes:

> A design decision made in xetex is to completely switch to unicode and don't
> support any old tex encoding. The changes in the encoding of quotes (''),
> dashes (--), special characters (\"a, \"o) etc. make it really hard to switch.
> You either have to recode all your input files or you can't use xetex.

I believe there are at least two fairly simple solutions to this problem.

(1) You can use a Perl filter or shell script to convert your legacy files.
How you run it depends on the text editor you use; for BBEdit you could use
something like:

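#!perl
#use utf8;
while(<>)
{
# umlauts
s/\\"a/ä/g;
s/\\"u/ü/g;
s/\\"o/ö/g;
# dashes: the longer sequence must come first, or --- never matches
s/---/—/g;
s/--/–/g;
# quotes: doubled forms first, for the same reason
s/``/“/g;
s/''/”/g;
s/`/‘/g;
s/'/’/g;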
print "$_";
}

If you use a Cocoa text editor such as SubEthaEdit, you can do the same by
using the TextExtras service to put a conversion script into the menu bar of
any Cocoa app:

#! /bin/sh
#
# Converts legacy TeX abbreviations to UTF-8
#
# -- TextExtras User Script Info --
# %%%{TEName=Legacy to utf-8}%%%
# %%%{TEInput=Selection}%%%
# %%%{TEOutput=ReplaceSelection}%%%
# %%%{TEKeyEquivalent=@8}%%%
# %%%{TEArgument=-c}%%%

# Start with the standard input
INPUT=$(cat -)

# Convert the abbreviations. Longer sequences come first so that --- and the
# doubled quotes are not consumed by the shorter patterns. printf is used
# instead of echo because some echo implementations interpret backslashes.
OUTPUT=$(printf '%s\n' "${INPUT}" | sed \
  -e 's/\\"a/ä/g;s/\\"u/ü/g;s/\\"o/ö/g' \
  -e 's/---/—/g;s/--/–/g' \
  -e 's/``/“/g;s/`/‘/g' \
  -e "s/''/”/g;s/'/’/g")

printf '%s' '%%%{TESelection}%%%'
printf '%s\n' "${OUTPUT}"
printf '%s' '%%%{TESelection}%%%'

(2) You can redefine the commands \", \= etc. and turn quotation marks into
active characters (which effectively makes them commands). Much of this has
already been discussed on this list. For instance, to get ` and ' to produce
styled quotation marks you can say:
%
\font\testfont="Zapfino" at 12pt
%
\catcode`\'=\active\def'{’}
\catcode`\`=\active\def`{‘}
%
\testfont
%
`This is a test'.
%
\bye
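
The other half of this suggestion, redefining \", could be sketched in the same spirit. This is only an illustration under stated assumptions: it handles just a, o and u (capitals and other letters would need their own entries), and \legacyumlaut is a made-up name for the saved original macro. The fallback still uses plain TeX's \accent mechanism, which may well pick the wrong glyph from a Unicode font.

```tex
\font\testfont="Zapfino" at 12pt
% \legacyumlaut is a hypothetical name; it keeps plain TeX's original
% \accent-based \" as a fallback for letters not listed below.
\let\legacyumlaut=\"
% Assumption: only a, o and u are needed; extend the chain as required.
\def\"#1{%
  \if a#1ä\else
  \if o#1ö\else
  \if u#1ü\else
    \legacyumlaut{#1}%
  \fi\fi\fi}
\testfont
Sch\"on, f\"ur, M\"adchen.
\bye
```

With a definition like this in place the legacy input stays as it is, and the macro substitutes the precomposed Unicode character whenever it knows one.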

To get double quotes you need to do a test as can be seen on TeXbook page
395.
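
One way such a test could look is sketched below. It is not necessarily the TeXbook's own code: it simply uses \futurelet to peek at the token after the quote, and the helper names (\closetest, \closedouble, \opentest, \opendouble) are invented for this example.

```tex
\font\testfont="Zapfino" at 12pt
\catcode`\'=\active
\catcode`\`=\active
% \futurelet stores the token that follows the quote in \next
% without consuming it, so the macro can test for a doubled quote.
\def'{\futurelet\next\closetest}
\def\closetest{\ifx\next'\expandafter\closedouble\else ’\fi}
\def\closedouble#1{”}% #1 absorbs the second '
\def`{\futurelet\next\opentest}
\def\opentest{\ifx\next`\expandafter\opendouble\else ‘\fi}
\def\opendouble#1{“}% #1 absorbs the second `
\testfont
``Double'' and `single' quotes.
\bye
```

Note that once ' is redefined as an active character like this, plain TeX's math-mode primes (as in $f'$) no longer work, so a real style file would have to restore that behaviour inside math.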

I hope this might convince you that XeTeX is not inherently incompatible with
legacy TeX abbreviations.

Rather than urging Jonathan to implement this stuff in XeTeX directly, one can
just put it all into a style file tailored to one's needs. Much of this will
also remain language and font dependent. I would much rather see Jonathan add