[XeTeX] Converting legacy encodings to utf-8
Jonathan Kew
jonathan_kew at sil.org
Tue Jul 11 00:48:10 CEST 2006
On 10 Jul 2006, at 10:41 pm, Peter Heslin wrote:
> One of the issues involved with migrating to XeTeX is the
> incompatibility between old documents in various legacy encodings and
> new documents in utf-8. While it's possible to compile the
> former with old TeX and the latter with XeTeX, there is a constant
> problem when you want to copy text from the former to the latter.
>
> The solution I have come up with for this is to use Emacs to convert
> the legacy encodings to utf-8. You see, Emacs comes with many "input
> methods" which allow you to type \`a to get à (TeX input method) or
> h('| to get ᾕ (greek-ibycus4) or <'h| to get ᾕ (greek-babel). There
> are also input methods for Cyrillic, and various Asian scripts which
> correspond quite exactly to various legacy 7-bit and 8-bit text
> encodings.
>
> I wrote some code that takes advantage of these methods to translate
> text in files rather than keystrokes, and I posted it to the
> emacs.devel list. I would be happy to put it up on the web with a
> detailed explanation of how to use it for non-Emacs people, if there
> is interest.
>
> What do other people do to solve this conversion problem?

Not being an Emacs aficionado -- it's been too many years, and my
fingers have forgotten those keystrokes -- I would typically use
TECkit <http://scripts.sil.org/teckit> to create mappings for such
encodings.

If the source text is an industry-standard codepage such as MacRoman
or Windows CP1250 or whatever, then tools like iconv will do nicely.
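
For anyone who would rather script the codepage case than call iconv directly, here is a minimal sketch in Python of the same decode-then-re-encode step. The function names and the file-path arguments are made up for illustration; the codec names "mac_roman" and "cp1250" are the ones Python's standard library uses for MacRoman and Windows CP1250. It only covers standard codepages, not TeX-style ASCII input or custom font encodings.

```python
from pathlib import Path


def legacy_bytes_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a legacy codepage and re-encode them as UTF-8.

    Raises UnicodeDecodeError if a byte has no mapping in the named
    codepage, which is usually a sign the encoding was guessed wrongly.
    """
    return data.decode(source_encoding).encode("utf-8")


def convert_file(src: str, dst: str, source_encoding: str) -> None:
    """Read src in the given legacy encoding and write dst as UTF-8.

    Hypothetical helper: the paths are whatever the caller supplies.
    """
    converted = legacy_bytes_to_utf8(Path(src).read_bytes(), source_encoding)
    Path(dst).write_bytes(converted)


if __name__ == "__main__":
    # Build a small MacRoman sample in memory and convert it.
    sample = "déjà vu".encode("mac_roman")
    print(legacy_bytes_to_utf8(sample, "mac_roman").decode("utf-8"))
```

The source encoding has to be named explicitly; nothing here tries to detect it, since a wrong guess between two 8-bit codepages generally converts without error and silently produces the wrong characters.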

Or in the case of "standard" TeX input like \`a, an option is to just
load the xunicode package (thanks, Ross!) and leave the source file
as simple ASCII.

Of course, I don't tend to do much back-and-forth interchange between
XeTeX and legacy TeX systems....

JK