[XeTeX] Re: New feature request for XeTeX

Jonathan Kew jonathan_kew at sil.org
Tue Jul 27 10:11:29 CEST 2004

On 27 Jul 2004, at 3:21 am, Ross Moore wrote:

> On 27/07/2004, at 4:31 AM, Somadevah at aol.com wrote:
>>  In a message dated 26/7/04 4:45:59 pm, ross at ics.mq.edu.au writes:
>>  > If you need something more than a simple character code remapping for
>>  > certain characters, perhaps those instances could be handled as
>>  > \active characters in TeX, while the "mapping=...." option would allow
>>  > you to remap the majority of simple characters without having to
>>  > \activate just about everything. Reasonable compromise?
>>  I suspect this might prove far more valuable than it first looks. 
>> Would it not allow one to remap from one script to another? So that 
>> one might type Hindi in unicode Roman transliteration and then be 
>> able to output it either as Roman transliterated text or Devanagari, 
>> depending on the "mapping" used for an environment?
> Yes, indeed.
> This is precisely one of the possible uses of a "font mapping".
> Another, more mundane, is a simple solution for the "--" and "---"
>  *lack of ligature* problem:
>    "---" could be mapped to ^^^^2014 (em dash)
>    "--"  could be mapped to ^^^^2013 (en dash)
> and similarly for other TeX-specific ligature sequences.

This all depends on what level of mapping is provided. My initial 
impression from your suggestion, Ross, was that we'd have a simple 
remapping of individual character codes, somewhat analogous to the .map 
files used by ttf2pt1, for example. This would allow each individual 
legacy codepoint to be mapped to a desired Unicode character in the 
font; but it wouldn't provide a solution for ligatures. And it might 
well be inadequate for many kinds of transliteration, where the 
relationship between scripts is not always a simple one-to-one mapping.
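
To make the limitation concrete, here is a minimal Python sketch of a 
per-codepoint remapping. It is only an illustration: the table entries 
and the name remap are assumptions for this example, not taken from any 
real .map file or from XeTeX itself.

```python
# Hypothetical one-to-one table: input code point -> Unicode code point.
# The entries are illustrative only.
CODEPOINT_MAP = {
    0x60: 0x2018,  # ` -> left single quotation mark
    0x27: 0x2019,  # ' -> right single quotation mark
}


def remap(text):
    """Remap each character independently; no context is consulted."""
    return text.translate(CODEPOINT_MAP)


print(remap("`quoted'"))  # each quote character is replaced on its own
print(remap("a---b"))     # unchanged: no single-character rule can see "---"
```

Because every rule looks at exactly one character, a sequence such as 
"---" can never be recognised as a unit with this kind of table.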

I'm also toying with ideas of more powerful mappings; it happens that I 
have a character-mapping engine, TECkit (see 
http://scripts.sil.org/teckit) that we could perhaps press into 
service. TECkit supports many-to-many mappings, contextually-determined 
mappings, even reordering of code sequences. It was developed primarily 
to support complex mappings between legacy byte encodings and Unicode, 
but can also operate as a transducer entirely within Unicode, which is 
what would be needed here as XeTeX will already have interpreted the 
input text as Unicode when it was initially read from the file.
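
As a rough sketch of what a many-to-one mapping within Unicode involves, 
the following Python fragment applies greedy longest-match rules to a 
string. This is not TECkit's API or its mapping language; the names 
RULES and transduce are made up for the example, and only the "---" and 
"--" rules come from this thread.

```python
# Hypothetical rule table: input sequence -> replacement.
RULES = {
    "---": "\u2014",  # em dash
    "--": "\u2013",   # en dash
}


def transduce(text, rules=RULES):
    """Greedy longest-match replacement, scanning left to right."""
    keys = sorted(rules, key=len, reverse=True)  # try longer sequences first
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(rules[key])
                i += len(key)
                break
        else:
            # No rule matched at this position: copy the character through.
            out.append(text[i])
            i += 1
    return "".join(out)


print(transduce("pages 10--12 --- see below"))
```

Trying the longer sequence first is what keeps "---" from being consumed 
as "--" followed by a stray hyphen. Contextual rules and reordering 
would need considerably more machinery than this.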

Anyway, it's an interesting concept and may actually be implementable, 
too.... stay tuned.... but don't hold your breath. :-)

