<HTML><FONT FACE=arial,helvetica><HTML><FONT COLOR="#000000" FACE="Geneva" FAMILY="SANSSERIF" SIZE="2"><BR>

In a message dated 4/7/04 1:37:33 pm, prohaska@zib.de writes:<BR>

<BR>

<BR>

<BLOCKQUOTE CITE STYLE="BORDER-LEFT: #0000ff 2px solid; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px; PADDING-LEFT: 5px" TYPE="CITE"></FONT><FONT COLOR="#000000" FACE="Geneva" FAMILY="SANSSERIF" SIZE="2">A design decision made in xetex is to completely switch to unicode and don't support any old tex encoding. The changes in the encoding of quotes (''), dashes (--), special characters (\"a, \"o) etc. make it really hard to switch. You either have to recode all your input files or you can't use xetex.<BR>

<BR>

</BLOCKQUOTE></FONT><FONT COLOR="#000000" FACE="Geneva" FAMILY="SANSSERIF" SIZE="2"><BR>

<BR>

I believe there are at least two fairly simple solutions to this problem. <BR>

<BR>

(1) You can use a Perl filter or shell script to convert your legacy files. How you run it depends on the text editor you use; for BBEdit you could use something like:<BR>

<BR>

#!perl<BR>

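# "use utf8" is left out on purpose: the script works on raw bytes, so save it as UTF-8 and the replacements come out as UTF-8.<BR>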

while(&lt;&gt;)&nbsp;&nbsp;&nbsp;  <BR>

{<BR>

&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; s/\\"a/ä/g;<BR>

&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; s/\\"u/ü/g;<BR>

&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; s/\\"o/ö/g;<BR>

&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; # Longer patterns come first, otherwise --- and the doubled quotes are eaten by the shorter rules.<BR>

&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; s/---/—/g;<BR>

&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; s/--/–/g;<BR>

&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; s/\`\`/“/g;<BR>

&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; s/\`/‘/g;<BR>

&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; s/\'\'/”/g;<BR>

&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp; s/\'/’/g;<BR>

print&nbsp;  "$_";<BR>

}&nbsp;&nbsp;&nbsp;&nbsp; <BR>

<BR>

If you use a Cocoa text editor such as SubEthaEdit, you can do the same with TextExtras, which lets you put a conversion script into the menu bar of any Cocoa app:<BR>

<BR>

<BR>

#! /bin/sh<BR>

#<BR>

# Converts legacy TeX abbreviations to UTF-8<BR>

#<BR>

# -- TextExtras User Script Info --<BR>

# %%%{TEName=Legacy to utf-8}%%%<BR>

# %%%{TEInput=Selection}%%%<BR>

# %%%{TEOutput=ReplaceSelection}%%%<BR>

# %%%{TEKeyEquivalent=@8}%%%<BR>

# %%%{TEArgument=-c}%%%<BR>

# Start with the standard input<BR>

INPUT=`cat -`<BR>

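# Convert, longest patterns first. printf '%s' passes backslashes through unchanged, which echo does not guarantee.<BR>

OUTPUT=$(printf '%s\n' "${INPUT}" | sed -e 's/\\"a/ä/g' -e 's/\\"u/ü/g' -e 's/\\"o/ö/g' -e 's/---/—/g' -e 's/--/–/g' -e 's/``/“/g' -e 's/`/‘/g' -e "s/''/”/g" -e "s/'/’/g")<BR>

echo -n "%%%{TESelection}%%%"<BR>

printf '%s\n' "${OUTPUT}"<BR>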

echo -n "%%%{TESelection}%%%"<BR>

<BR>

<BR>

(2) You can redefine the commands \", \= etc. and turn quotation marks into active characters (which effectively makes them commands). Much of this has already been discussed on this list. For instance, to get ` and ' to produce styled quotation marks you can say:<BR>

%<BR>

\font\testfont="Zapfino" at 12pt<BR>

%<BR>

\catcode`\'=\active\def'{’}<BR>

\catcode`\`=\active\def`{‘}<BR>

%<BR>

\testfont<BR>

%<BR>

This is a `test'.<BR>

%<BR>

\bye&nbsp;  <BR>

<BR>

<BR>

To get double quotes you need to do a test as can be seen on TeXbook page 395.<BR>
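<BR>
One way to write such a test, as a minimal sketch and not necessarily the TeXbook's own code: each active quote peeks at the following token with \futurelet and absorbs it if it is a second quote. The helper names \lqtest, \lqdouble, \rqtest and \rqdouble are made up for this example, and the font is assumed to contain the curly quotes:<BR>
<BR>
%<BR>
\font\testfont="Zapfino" at 12pt<BR>
%<BR>
% make both quote characters active before defining them<BR>
\catcode`\'=\active<BR>
\catcode`\`=\active<BR>
% opening quotes: a doubled one gives “, a single one gives ‘<BR>
\def`{\futurelet\next\lqtest}<BR>
\def\lqtest{\ifx\next`\expandafter\lqdouble\else ‘\fi}<BR>
\def\lqdouble#1{“}% #1 absorbs the second opening quote<BR>
% closing quotes: a doubled one gives ”, a single one gives ’<BR>
\def'{\futurelet\next\rqtest}<BR>
\def\rqtest{\ifx\next'\expandafter\rqdouble\else ’\fi}<BR>
\def\rqdouble#1{”}% #1 absorbs the second closing quote<BR>
%<BR>
\testfont<BR>
%<BR>
This is a `test' and a ``test''.<BR>
%<BR>
\bye<BR>
<BR>
Bear in mind that once ` is active, constructions such as \catcode`\x can no longer be typed in the usual way, and an active ' no longer works as a prime in math mode, so definitions like these are best kept to text-only documents or switched on locally.<BR>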

<BR>
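The accent commands can be handled in the same spirit. The following is only a sketch: it covers just the three umlauts from the filter in (1), expects a single letter as argument, and assumes the font contains the precomposed letters. Anything else raises an error instead of being silently mangled:<BR>
<BR>
%<BR>
\font\testfont="Zapfino" at 12pt<BR>
%<BR>
% map \"a, \"o and \"u to precomposed letters, complain about anything else<BR>
\def\"#1{\ifx a#1ä\else\ifx o#1ö\else\ifx u#1ü\else<BR>
\errmessage{No precomposed letter set up for \string\"#1}\fi\fi\fi}<BR>
%<BR>
\testfont<BR>
%<BR>
M\"uller h\"ort sch\"one M\"archen.<BR>
%<BR>
\bye<BR>
<BR>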

I hope this might convince you that XeTeX is not inherently incompatible with legacy TeX abbreviations.<BR>

<BR>

Rather than urging Jonathan to implement this stuff in XeTeX directly, one can just put it all into a style file tailored to one's needs. Much of this will also remain language and font dependent. I would much rather see Jonathan add margin kerning and some kind of direct access to override Adobe's rather inadequate OpenType ligature mechanism.<BR>

<BR>

regards,<BR>

<BR>

Somadeva Vasudeva<BR>

Wolfson College<BR>

Oxford OX2 6UD<BR>

</FONT><FONT COLOR="#000000" FACE="Geneva" FAMILY="SANSSERIF" SIZE="2"></FONT></HTML>