[XeTeX] Converting legacy encodings to utf-8

Firmicus firmicus at ankabut.net
Wed Jul 12 21:27:33 CEST 2006


Will Robertson wrote:
> But perhaps it's too weighed down with Aleph assumptions/dependence.  
> Note that I suspect a sort of equivalence between OCPs and TECkit  
> mappings...
I'd be delighted if someone could confirm that! Up to now I have only 
written a few very simple TECkit mappings, and my initial impression was 
that TECkit's functionality is not as ambitious as that of Omega 
translation processes (OTPs). But perhaps I should just read the TECkit 
documentation... ;-)
Last year I wrote a set of OTPs to convert ArabTeX input to UTF-8, 
admittedly not a simple task. The results were not yet perfect, but 
pretty decent. Since then I have more or less abandoned Aleph/Omega, at 
least for my own practical purposes: too many bugs and headaches.

Now if it indeed turns out that TECkit provides functionality equivalent 
to that of OTPs, I would be willing to rewrite my ArabTeX -> UTF-8 
conversion as TECkit mappings for the benefit of XeTeX's users. (Despite the 
availability of Unicode bidi editors nowadays, there are still 
compelling reasons why one -- in particular linguists, orientalists, or 
historians of science like myself -- would prefer to input a language 
such as Arabic by means of an intelligent ASCII encoding convention. But 
this is another story.)
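
To give a rough idea of what I have in mind, here is a purely 
illustrative fragment -- my own guess at the general form such a mapping 
would take, not a faithful rendering of ArabTeX's conventions -- covering 
only a few bare consonants and none of the contextual complications 
(vowels, hamza carriers, and so on) that make the real task hard:

    LHSName "ArabTeX-ASCII"        ; hypothetical name, for illustration only
    RHSName "UNICODE"

    pass(Byte_Unicode)

    ; a handful of plain consonants (the easy part)
    0x62        <>  U+0628   ; b  -> BEH
    0x74        <>  U+062A   ; t  -> TEH
    0x5F 0x74   <>  U+062B   ; _t -> THEH
    0x2E 0x68   <>  U+062D   ; .h -> HAH
    0x5F 0x68   <>  U+062E   ; _h -> KHAH
    0x73        <>  U+0633   ; s  -> SEEN
    0x5E 0x73   <>  U+0634   ; ^s -> SHEEN

If I understand correctly, one would compile such a mapping with 
teckit_compile and attach it to a font through XeTeX's mapping=... 
font option.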

BTW I have here a quick-and-dirty Perl script for converting traditional 
LaTeX input to UTF-8, covering about 650 glyphs. It is based on the data 
in the utf2any program, except of course that the conversion is done in 
the reverse direction. I can provide it to anyone who might be 
interested. I guess such a tool, once extended and improved, could 
eventually be "shipped" along with XeTeX.
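
Purely for illustration (the following is not taken from my script), the 
same kind of correspondence could also be written as TECkit rules, which 
might be the more natural form if it were ever to accompany XeTeX; the 
names and the handful of rules below are made up just to show the shape 
of such a fragment:

    LHSName "LaTeX-legacy"         ; hypothetical name, for illustration only
    RHSName "UNICODE"

    pass(Byte_Unicode)

    ; a few of the ~650 substitutions, spelled out as byte sequences
    0x5C 0x27 0x65            <>  U+00E9   ; \'e   -> e with acute
    0x5C 0x22 0x75            <>  U+00FC   ; \"u   -> u with diaeresis
    0x5C 0x63 0x7B 0x63 0x7D  <>  U+00E7   ; \c{c} -> c with cedilla
    0x5C 0x73 0x73 0x7B 0x7D  <>  U+00DF   ; \ss{} -> sharp s

Whether one converts the source files once and for all (as my script 
does) or maps them on the fly is of course a separate question.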

Regards,
François Charette


