[XeTeX] Converting legacy encodings to utf-8

Tue Jul 11 00:48:10 CEST 2006

On 10 Jul 2006, at 10:41 pm, Peter Heslin wrote:

> One of the issues involved with migrating to XeTeX is the
> incompatibility between old documents in various legacy encodings and
> new documents in utf-8.  While it's possible to compile the
> former with old TeX and the latter with XeTeX, there is a constant
> problem when you want to copy text from the former to the latter.
>
> The solution I have come up with for this is to use Emacs to convert
> the legacy encodings to utf-8.  You see, Emacs comes with many "input
> methods" which allow you to type \`a to get à (TeX input method) or
> h('| to get ᾕ (greek-ibycus4) or <'h| to get ᾕ (greek-babel).  There
> are also input methods for Cyrillic, and various Asian scripts which
> correspond quite exactly to various legacy 7-bit and 8-bit text
> encodings.
>
> I wrote some code that takes advantage of these methods to translate
> text in files rather than keystrokes, and I posted it to the
> emacs.devel list.  I would be happy to put it up on the web with a
> detailed explanation of how to use it for non-Emacs people, if there
> is interest.
>
> What do other people do to solve this conversion problem?

Not being an Emacs aficionado -- it's been too many years, and my
fingers have forgotten those keystrokes -- I would typically use
TECkit <http://scripts.sil.org/teckit> to create mappings for such
encodings.

If the source text is an industry-standard codepage such as MacRoman  
or Windows CP1250 or whatever, then tools like iconv will do nicely.
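
For anyone who would rather script that step than call iconv, here is a
minimal Python sketch of the same decode-then-re-encode idea. It is only
an illustration: the function names are made up for this example, and
"mac_roman" and "cp1250" are Python's codec names, which need not match
the spellings a given iconv build accepts.

```python
def convert_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a legacy codepage and re-encode them as UTF-8."""
    text = data.decode(source_encoding)
    return text.encode("utf-8")


def convert_file(src_path: str, dst_path: str, source_encoding: str) -> None:
    """Rewrite a whole file as UTF-8 (hypothetical helper, not part of iconv)."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

Something like convert_file("old.tex", "new.tex", "mac_roman") would then
give a file XeTeX can read directly, assuming the source really is in the
encoding you name; a wrong guess produces wrong characters rather than an
error for most 8-bit codepages.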

Or in the case of "standard" TeX input like \`a, an option is to just  
load the xunicode package (thanks, Ross!) and leave the source file  
as simple ASCII.
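
If you do want real UTF-8 in the source instead (which is what Peter's
Emacs approach produces), the substitution for simple accent commands can
be sketched as below. This is not what xunicode or the Emacs TeX input
method actually does internally; it is a rough stand-in that handles only
five accents on single ASCII letters, and the names in it are invented
for the example.

```python
import re
import unicodedata

# Combining marks for a few TeX accent commands (illustrative subset only).
COMBINING = {
    "`": "\u0300",  # grave
    "'": "\u0301",  # acute
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # diaeresis
    "~": "\u0303",  # tilde
}

# Matches \`a as well as the braced form \`{a}.
ACCENT = re.compile(r"""\\([`'^"~])(?:\{([A-Za-z])\}|([A-Za-z]))""")


def tex_accents_to_unicode(text: str) -> str:
    """Replace simple TeX accent sequences with precomposed characters."""
    def repl(match):
        letter = match.group(2) or match.group(3)
        # Base letter + combining mark, composed by NFC where possible.
        return unicodedata.normalize("NFC", letter + COMBINING[match.group(1)])
    return ACCENT.sub(repl, text)
```

Anything the pattern does not recognise (other commands, \i, nested
braces) is left exactly as it was, so the result still needs a check by
eye.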

Of course, I don't tend to do much back-and-forth interchange between
XeTeX and legacy TeX systems...

JK