[XeTeX] Converting legacy encodings to utf-8

Mon Jul 10 23:41:35 CEST 2006

One of the issues involved with migrating to XeTeX is the
incompatibility between old documents in various legacy encodings and
new documents in utf-8.  While it's possible to compile the former
with old TeX and the latter with XeTeX, a problem arises whenever you
want to copy text from the former to the latter.

The solution I have come up with for this is to use Emacs to convert the
legacy encodings to utf-8.  You see, Emacs comes with many "input
methods" which allow you to type \`a to get à (TeX input method) or h('|
to get ᾕ (greek-ibycus4) or <'h| to get ᾕ (greek-babel).  There are also
input methods for Cyrillic and various Asian scripts, and these
correspond quite exactly to various legacy 7-bit and 8-bit text encodings.

I wrote some code that takes advantage of these methods to translate
text in files rather than keystrokes, and I posted it to the emacs.devel
list.  If there is interest, I would be happy to put it up on the web
with a detailed explanation of how non-Emacs people can use it.
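
To make the idea of translating text rather than keystrokes concrete,
here is a minimal sketch in Python.  It is not the Emacs code mentioned
above and does not use Emacs input methods.  It only assumes a tiny,
illustrative table of TeX accent commands; the names COMBINING,
tex_accents_to_unicode and convert_file, and the default source
encoding, are made up for this example.  Braced forms such as \`{a} are
not handled.

```python
import unicodedata

# Illustrative table only (an assumption, far from complete):
# TeX accent command character -> Unicode combining mark.
COMBINING = {
    "`": "\u0300",  # grave
    "'": "\u0301",  # acute
    "^": "\u0302",  # circumflex
    "~": "\u0303",  # tilde
    '"': "\u0308",  # diaeresis
}


def tex_accents_to_unicode(text):
    """Replace sequences like \\`a with the precomposed character."""
    out = []
    i = 0
    while i < len(text):
        if (text[i] == "\\" and i + 2 < len(text)
                and text[i + 1] in COMBINING and text[i + 2].isalpha()):
            # Base letter followed by the combining mark, then composed
            # into a single code point where Unicode provides one.
            out.append(unicodedata.normalize(
                "NFC", text[i + 2] + COMBINING[text[i + 1]]))
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def convert_file(src, dst, src_encoding="latin-1"):
    """Read src in a legacy encoding, convert accents, write dst as utf-8."""
    with open(src, encoding=src_encoding) as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(tex_accents_to_unicode(text))
```

For example, tex_accents_to_unicode(r"voil\`a") gives "voilà".  A real
converter for Greek or Cyrillic would need a much larger table and
longest-match handling of multi-character sequences, which is what the
Emacs input methods already encode.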

What do other people do to solve this conversion problem?

-- 
Peter Heslin (http://www.dur.ac.uk/p.j.heslin)